This was tested on an AWS m1.medium instance running Ubuntu 12.04.
Installing Zookeeper
Zookeeper is needed for Kafka. To quote the Apache Zookeeper project page:
"Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services."
In other words, we need Zookeeper to maintain configuration for Kafka nodes in a multi-node environment. In this tutorial, we’ll only be setting up a single Kafka node, but Kafka still relies on Zookeeper for its configuration information.
We can start off by grabbing Zookeeper:
$ wget http://mirrors.advancedhosters.com/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz
Extract the files:
$ tar -zxvf zookeeper-3.4.6.tar.gz
Change to the zookeeper/conf directory:
$ cd zookeeper-3.4.6/conf
In this directory, create the zoo.cfg file from the provided template configuration file named zoo_sample.cfg:
$ cp zoo_sample.cfg zoo.cfg
Open zoo.cfg (I used the vi editor) and change the dataDir variable to point to /var/lib/zookeeper, so that the line reads:
dataDir=/var/lib/zookeeper
Since this directory doesn’t exist yet, create it:
$ sudo mkdir /var/lib/zookeeper
Zookeeper needs Java to run. On a fresh Ubuntu 12.04 install, add it with:
$ sudo apt-get update
$ sudo apt-get install openjdk-7-jdk
In the zookeeper-3.4.6 directory, start Zookeeper:
$ sudo bin/zkServer.sh start
You should see output ending with a line similar to "Starting zookeeper ... STARTED".
You can test that the server accepts connections using the bundled command-line client:
$ bin/zkCli.sh -server 127.0.0.1:2181
Great! Now you have Zookeeper running.
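As a scripted alternative to the interactive client, Zookeeper answers short "four-letter" commands on its client port: sending ruok should return imok when the server is running without error. The sketch below assumes Python 3 and the default client port 2181 from zoo.cfg. The function name zookeeper_is_ok is made up for illustration, and newer Zookeeper releases may require the command to be whitelisted before it answers.

```python
import socket


def zookeeper_is_ok(host="127.0.0.1", port=2181, timeout=2.0):
    """Send the 'ruok' four-letter command and report whether the reply is 'imok'.

    Returns False if the connection fails or times out, or if the reply
    is anything other than 'imok'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            reply = b""
            # The expected reply is exactly four bytes; the server then closes the connection.
            while len(reply) < 4:
                chunk = conn.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        return False
    return reply == b"imok"


if __name__ == "__main__":
    print("Zookeeper healthy:", zookeeper_is_ok())
```

A False result only means the check did not get imok back. It does not say why, so the zkCli.sh client above remains the better tool for debugging.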
Installing Kafka
We’ll set up Kafka 0.8.1.1 built for Scala 2.9.2 (the kafka_2.9.2-0.8.1.1 download) since, at the time of this writing, Storm-0.9.2 (which we will be using) plays nicer with this version of Kafka.
First grab Kafka, and then extract it:
$ wget http://mirror.cogentco.com/pub/apache/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz
$ tar xvzf kafka_2.9.2-0.8.1.1.tgz
Change to the kafka_2.9.2-0.8.1.1/config directory:
$ cd kafka_2.9.2-0.8.1.1/config
Here, change the log.dirs property in the server.properties file to /var/lib/kafka-logs, so that the line reads:
log.dirs=/var/lib/kafka-logs
Create that directory:
$ sudo mkdir /var/lib/kafka-logs
In the server.properties file again, set the advertised.host.name property to the public DNS name of your instance (the address that other machines use to connect to it).
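If you would rather script these edits than make them by hand, the idea can be sketched in Python. This is only an illustration: set_property is a hypothetical helper, the sample text is a stand-in for the real server.properties, and ec2-host.example.com is a placeholder for your instance's public DNS name.

```python
def set_property(text, key, value):
    """Return properties text with key set to value.

    The first active "key=..." line is replaced. If there is none, a new
    line is appended. Commented-out entries (starting with #) are left alone.
    """
    new_line = "{}={}".format(key, value)
    lines = text.splitlines()
    for i, line in enumerate(lines):
        name = line.split("=", 1)[0].strip()
        if "=" in line and name == key:
            lines[i] = new_line
            break
    else:
        # No active entry for this key was found, so add one at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Stand-in for the contents of config/server.properties.
sample = "broker.id=0\nlog.dirs=/tmp/kafka-logs\n"
updated = set_property(sample, "log.dirs", "/var/lib/kafka-logs")
updated = set_property(updated, "advertised.host.name", "ec2-host.example.com")
print(updated)
```

To apply it for real, read config/server.properties into a string, pass it through set_property, and write the result back. The same helper would work for the dataDir entry in zoo.cfg.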
In the kafka_2.9.2-0.8.1.1 directory, you can start the Kafka server:
$ sudo bin/kafka-server-start.sh config/server.properties &
Or, start it as a daemon:
$ sudo bin/kafka-server-start.sh -daemon config/server.properties
You can stop the server anytime:
$ sudo bin/kafka-server-stop.sh
Create a “topic” named test with a single partition and one replica:
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
List the topics at the command line to confirm that it was created:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
You can also send messages from the command line and read them in another terminal. First start a producer for the topic test:
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Type a message and press ‘enter’. For instance, the message can be “hello”.
Open another terminal, and in the kafka_2.9.2-0.8.1.1 directory, start the consumer:
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
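This will consume all messages from the beginning of the topic; omit --from-beginning to receive only messages produced after the consumer starts. Messages are retained for 168 hours (7 days) by default, which can be tuned with the log.retention.hours property in server.properties.
You should see each message sent from the producer, such as hello, printed in the consumer terminal.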
Talking to Kafka with Python
Instead of the command line, maybe we’d like to push messages to Kafka programmatically, say, from a Python script.
First, get ‘pip’ so that you can install Python packages.
$ sudo apt-get install python-pip
Now install the kafka-python package:
$ sudo pip install kafka-python
While leaving your consumer terminal open, run a Python script that sends a message to the test topic.
You should see the message show up in the consumer terminal.
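A minimal sketch of such a script follows. It assumes a kafka-python release that provides the KafkaProducer class; releases from the time of this writing exposed a different interface (SimpleProducer), so adjust it to the version that pip installed. The file name send_message.py and the helper publish are illustrative names only, and the broker address localhost:9092 matches the console producer used earlier.

```python
import sys


def publish(producer, topic, messages):
    """Send each text message to the topic as UTF-8 bytes, then flush.

    producer is any object with send(topic, value=...) and flush(),
    such as a kafka-python KafkaProducer. Returns the number of messages sent.
    """
    count = 0
    for text in messages:
        producer.send(topic, value=text.encode("utf-8"))
        count += 1
    # Block until buffered messages have been handed to the broker.
    producer.flush()
    return count


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python send_message.py MESSAGE [MESSAGE ...]")
    else:
        # Imported here so that publish() can be reused without kafka-python installed.
        from kafka import KafkaProducer

        producer = KafkaProducer(bootstrap_servers="localhost:9092")
        sent = publish(producer, "test", sys.argv[1:])
        producer.close()
        print("sent {} message(s) to topic test".format(sent))
```

For example, saving this as send_message.py and running python send_message.py hello should make hello appear in the consumer terminal, provided the broker is reachable on localhost:9092.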
You’ve now successfully pushed messages to Kafka using a Python script. There are many more things you can do with the kafka-python package, and you are encouraged to look through the documentation on its GitHub page.
Summary
In this tutorial, you set up Zookeeper and Kafka, and used a Python script to push messages to Kafka. If you spot any errors in this tutorial, please feel free to leave a comment in the section below.