Setting up Zookeeper and Kafka

This was tested on an AWS m1.medium instance running Ubuntu 12.04.

Installing Zookeeper

Kafka needs Zookeeper. To quote the Apache Zookeeper project page:

"Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services."

In other words, Zookeeper maintains configuration for Kafka nodes in a multi-node environment. In this tutorial, we’ll only be setting up a single Kafka node, but Kafka still relies on Zookeeper for its configuration information.

We can start off by grabbing Zookeeper:

$ wget http://mirrors.advancedhosters.com/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz

Extract the files:

$ tar -zxvf zookeeper-3.4.6.tar.gz

Change to the zookeeper-3.4.6/conf directory:

$ cd zookeeper-3.4.6/conf

In this directory, create the zoo.cfg file from the provided template configuration file named zoo_sample.cfg:

$ cp zoo_sample.cfg zoo.cfg

Open up zoo.cfg and change the dataDir variable to point to /var/lib/zookeeper (I used the vi editor to make these changes). The line should read:

dataDir=/var/lib/zookeeper
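If you would rather script this edit than open an editor, the change amounts to rewriting a single key=value line. The following is a minimal sketch in Python, not part of the original setup: set_property is a hypothetical helper name, and the sample text only imitates the layout of a Zookeeper config file.

```python
def set_property(config_text, key, value):
    """Return config_text with key set to value, replacing or appending the line."""
    new_line = "%s=%s" % (key, value)
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Only touch active "key=..." lines; commented-out lines are left alone.
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        # The key was not present (or only present in a comment), so append it.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative input only; a real zoo.cfg contains more settings.
    sample = "tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n"
    print(set_property(sample, "dataDir", "/var/lib/zookeeper"))
```

To apply it for real, you would read zoo.cfg into a string, pass it through set_property, and write the result back. The same approach should work for the log.dirs and advertised.host.name properties in Kafka's server.properties, which uses the same key=value layout.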

Since this directory doesn’t exist yet, create it:

$ sudo mkdir /var/lib/zookeeper

Zookeeper needs Java to run. A fresh Ubuntu 12.04 install doesn’t include it, so install it:

$ sudo apt-get update
$ sudo apt-get install openjdk-7-jdk

In the zookeeper-3.4.6 directory, start Zookeeper:

$ sudo bin/zkServer.sh start

You should see a message similar to:

[Screenshot: zkServer.sh output reporting that Zookeeper started]

You can check that it accepts connections by using the bundled command-line client, which runs on Java:

$ bin/zkCli.sh -server 127.0.0.1:2181

[Screenshot: zkCli.sh connected to the server]

Great! Now you have Zookeeper running.
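The same check can be scripted. Zookeeper answers a few short text commands on its client port, and sending "ruok" to a healthy server should get "imok" back. The following is a minimal sketch in Python 3, assuming the default client port 2181 used above; zk_is_ok is a hypothetical helper name.

```python
import socket


def zk_is_ok(host="127.0.0.1", port=2181, timeout=2.0):
    """Return True if the server answers Zookeeper's 'ruok' command with 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            chunks = []
            while True:
                # The server closes the connection after replying,
                # so read until recv() returns an empty bytes object.
                data = conn.recv(16)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Connection refused, timed out, or reset: treat as "not healthy".
        return False
    return b"".join(chunks) == b"imok"


if __name__ == "__main__":
    print("Zookeeper healthy:", zk_is_ok())
```

This only confirms that the server process is up and responding. It says nothing about whether Kafka can use it, so the zkCli.sh check above is still worth doing once.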

Installing Kafka

We’ll set up Kafka 0.8.1.1 using the kafka_2.9.2 package (the build for Scala 2.9.2), since at the time of this writing, Storm 0.9.2 (which we will be using) plays nicer with this version of Kafka.

First grab Kafka, and then extract it:

$ wget http://mirror.cogentco.com/pub/apache/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz
$ tar xvzf kafka_2.9.2-0.8.1.1.tgz

Change to the kafka_2.9.2-0.8.1.1/config directory:

$ cd kafka_2.9.2-0.8.1.1/config

Here, change the log.dirs property in the server.properties file to /var/lib/kafka-logs:

log.dirs=/var/lib/kafka-logs

Create that directory:

$ sudo mkdir /var/lib/kafka-logs

In the server.properties file again, point the advertised.host.name property to the public DNS (the address that others can connect to) of your instance:

advertised.host.name=<public DNS name of your instance>

In the kafka_2.9.2-0.8.1.1 directory, you can start the Kafka server:

$ sudo bin/kafka-server-start.sh config/server.properties &

Or, start it as a daemon:

$ sudo bin/kafka-server-start.sh -daemon config/server.properties

You can stop the server anytime:

$ sudo bin/kafka-server-stop.sh

With the server running, create a “topic” named test with a single partition and one replica:

$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List the topics to confirm that it was created:

$ bin/kafka-topics.sh --list --zookeeper localhost:2181

[Screenshot: the topic list, containing test]

You can also type messages at the command line and read them back in another terminal. First, start a producer for the topic test:

$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Type a message and press Enter. For instance, the message can be “hello”.

[Screenshot: typing a message into the console producer]

Open another terminal, and in the kafka_2.9.2-0.8.1.1 directory, start the consumer:

$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning

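Because of the --from-beginning flag, this consumes every message still retained in the topic; omit the flag to see only messages that arrive after the consumer starts. Messages are retained for 168 hours (7 days) by default, and you can change this with the log.retention.hours property in server.properties.

You should see each message sent by the producer:

[Screenshot: the console consumer printing the messages]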

Talking to Kafka with Python

Instead of the command line, maybe we’d like to push messages to Kafka programmatically, say, from a Python script.

First, install pip so that you can install Python packages:

$ sudo apt-get install python-pip

Now install the kafka-python package:

$ sudo pip install kafka-python

While leaving your consumer terminal open, run the following Python script:
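A minimal sketch of such a script, under several assumptions: Python 3, a kafka-python release that provides KafkaProducer (older releases exposed a different producer API), the broker listening on localhost:9092, and the test topic created earlier. Whether a given kafka-python release talks cleanly to a 0.8.1.1 broker depends on the version installed. The file name push_message.py and the helper send_all are hypothetical.

```python
import sys


def send_all(producer, topic, messages):
    """Send each text message to the topic and return how many were sent."""
    count = 0
    for text in messages:
        # Kafka stores raw bytes, so encode the text before sending.
        producer.send(topic, text.encode("utf-8"))
        count += 1
    # Block until the buffered messages have been handed to the broker.
    producer.flush()
    return count


def main(argv):
    if not argv:
        print("usage: python push_message.py MESSAGE [MESSAGE ...]")
        return 1
    # Imported here so send_all can be exercised without kafka-python installed.
    from kafka import KafkaProducer

    # Assumption: a single broker on localhost:9092, as set up above.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    try:
        sent = send_all(producer, "test", argv)
        print("sent %d message(s)" % sent)
    finally:
        producer.close()
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as push_message.py, it could be run as python push_message.py "hello from python". Each argument is sent as one message to the test topic.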

You should see the message show up in the consumer:

[Screenshot: the console consumer showing the message sent from Python]

You’ve now successfully pushed messages to Kafka using a Python script. There are many more things you can do with the kafka-python package, and you are encouraged to look through the documentation on the kafka-python GitHub page.

Summary

In this tutorial, you set up Zookeeper and Kafka, and used a Python script to push messages to Kafka. If you’ve seen any errors in this tutorial, please feel free to leave comments in the section below.