Apache Kafka is a distributed message broker designed to handle large volumes of real-time data efficiently. Unlike traditional brokers such as ActiveMQ and RabbitMQ, Kafka runs as a cluster of one or more servers. This makes it highly scalable, and its distributed nature gives it built-in fault tolerance while delivering higher throughput than its counterparts.

Installing Kafka can involve a few tricky steps, though. This article will walk you through installing Kafka on Ubuntu 20.04 in 8 simple steps, along with a brief introduction to the setup. Let’s get started.

How to Install Kafka on Ubuntu 20.04

To begin the Kafka installation on Ubuntu, ensure you have the following prerequisites in place:

  • A server running Ubuntu 20.04 with at least 4 GB of RAM and a non-root user with sudo access. If you do not already have a non-root user, follow our Initial Server Setup tutorial to set it up. Installations with less than 4 GB of RAM may cause the Kafka service to fail.

  • OpenJDK 11 installed on your server. To install this version, refer to our post on How to Install Java using APT on Ubuntu 20.04. Kafka is written in Java, so it requires a JVM.

With the prerequisites in place, below are the steps you can follow to install Kafka on Ubuntu:

Simplify Integration Using Hevo’s No-code Data Pipeline

What if there is already a platform that uses Kafka and makes the replication so easy for you? Hevo Data helps you directly transfer data from Kafka and 150+ data sources (including 40+ free sources) to Business Intelligence tools, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner. Its fault-tolerant architecture ensures that the data is replicated in real-time and securely with zero data loss.

Sign up here for a 14-Day Free Trial!

Step 1: Install Java and ZooKeeper

Kafka is written in Java and Scala and requires a Java runtime to run (Java 8 or newer for recent Kafka releases). In this step, you need to ensure Java is installed.

sudo apt-get update
sudo apt-get install default-jre
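
To confirm that a suitable Java runtime is now available, you can run a quick version check (the exact output will vary depending on the OpenJDK build that was installed):

java -version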

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Kafka uses ZooKeeper for tracking the heartbeats of its nodes, maintaining configuration, and, most importantly, electing leaders.

sudo apt-get install zookeeperd

You will now need to check that ZooKeeper is alive and healthy:

telnet localhost 2181

At the telnet prompt, enter:

ruok

(short for “are you okay”). If everything is fine, ZooKeeper will end the telnet session and reply with

imok
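
If telnet is not installed on your server, you can send the same four-letter command through nc instead (assuming netcat is available on your system); it should print imok and return:

echo ruok | nc localhost 2181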

Step 2: Create a Service User for Kafka

As Kafka is a network application, creating a dedicated non-root sudo user for it minimizes the risk to your machine if the Kafka server is compromised.

$ sudo adduser kafka

Follow the prompts and set a password to create the kafka user. Next, add the user to the sudo group using the following command:

$ sudo adduser kafka sudo

Your user is now ready; log in to it using the following command:

$ su -l kafka

Step 3: Download Apache Kafka

Now, download and extract the Kafka binaries into your kafka user’s home directory. Start by creating a Downloads directory:

$ mkdir ~/Downloads

Download the Kafka binaries using curl:

$ curl "https://downloads.apache.org/kafka/2.6.2/kafka_2.13-2.6.2.tgz" -o ~/Downloads/kafka.tgz

Create a new directory called kafka and change into it; this will be Kafka’s base directory:

$ mkdir ~/kafka && cd ~/kafka

Now simply extract the archive you have downloaded using the following command:

$ tar -xvzf ~/Downloads/kafka.tgz --strip 1

The --strip 1 flag ensures that the archive’s contents are extracted directly into ~/kafka/ rather than into a nested subdirectory.
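
To confirm that the extraction worked as expected, you can list the base directory; you should see Kafka’s bin, config, and libs folders, among others:

$ ls ~/kafka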

Step 4: Configuring Kafka Server

A Kafka topic is the category, group, or feed name to which messages can be published. By default, Kafka does not allow you to delete a topic; to change this, you must edit the configuration file.

The server.properties file specifies Kafka’s configuration options. Use nano or your favorite editor to open this file:

$ nano ~/kafka/config/server.properties

First, add a setting that allows Kafka topics to be deleted. Append the following line to the bottom of the file:

delete.topic.enable = true

Next, change the directory where Kafka stores its logs by updating the log.dirs setting:

log.dirs=/home/kafka/logs

Save and close the file. The next step is to set up systemd unit files.
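
Before moving on, you can optionally confirm both settings with a quick grep of the file; it should show the values you just set:

$ grep -E 'delete.topic.enable|log.dirs' ~/kafka/config/server.properties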

Step 5: Setting Up Kafka Systemd Unit Files

In this step, you will create systemd unit files for the Kafka and ZooKeeper services. This allows you to start, stop, and restart them using the systemctl command.

Create the systemd unit file for ZooKeeper with the command below:

$ sudo nano /etc/systemd/system/zookeeper.service

Next, add the following content:

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close this file. Then create the Kafka systemd unit file using the following command:

$ sudo nano /etc/systemd/system/kafka.service

Now, you need to enter the following unit definition into the file:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

This unit file depends on zookeeper.service, as specified in the [Unit] section, which ensures that ZooKeeper is started whenever the Kafka service is launched.
The [Service] section specifies that systemd should start and stop the service using the kafka-server-start.sh and kafka-server-stop.sh shell scripts. It also indicates that Kafka should be restarted if it exits abnormally.
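
If systemd does not pick up the newly created unit files right away, reloading its configuration before starting the services is a safe, optional step:

$ sudo systemctl daemon-reload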
After you’ve defined the units, use the following command to start Kafka:

$ sudo systemctl start kafka

Check the Kafka unit’s status to see if the server has started successfully:

$ sudo systemctl status kafka

Output:

kafka.service
     Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-02-10 00:09:38 UTC; 1min 58s ago
   Main PID: 55828 (sh)
      Tasks: 67 (limit: 4683)
     Memory: 315.8M
     CGroup: /system.slice/kafka.service
             ├─55828 /bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1
             └─55829 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=>

Feb 10 00:09:38 cart-67461-1 systemd[1]: Started kafka.service.

You now have a Kafka server listening on port 9092.
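
If you would like to verify this, you can check that something is listening on port 9092 (assuming the ss utility from iproute2 is available, as it is on a default Ubuntu 20.04 install):

$ sudo ss -tlnp | grep 9092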

The Kafka service is now running. However, if you rebooted your server, Kafka would not restart automatically. To enable the Kafka and ZooKeeper services on server boot, run the following commands:

$ sudo systemctl enable zookeeper
$ sudo systemctl enable kafka
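
To confirm that both services are now enabled at boot, you can query their state; each should report enabled:

$ sudo systemctl is-enabled zookeeper kafka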

You have successfully installed and set up the Kafka server.

Step 6: Testing the Installation

In this stage, you’ll put your Kafka setup to the test. To ensure that the Kafka server is functioning properly, you will publish and consume a “Hello World” message.

To publish and consume messages in Kafka, you need two things:

  • A producer, which publishes records and data to topics.
  • A consumer, which reads messages and data from topics.

To get started, make a new topic called TutorialTopic:

$ ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic
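
To verify that the topic was created, you can list all topics on the cluster; TutorialTopic should appear in the output:

$ ~/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181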

The kafka-console-producer.sh script can be used to create a producer from the command line. It expects the Kafka server’s hostname and port, along with a topic name, as arguments.

The string “Hello, World” should now be published to the TutorialTopic topic:

$ echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

Next, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server’s hostname and port, along with a topic name, as arguments.

The command below consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows messages published before the consumer was launched to be consumed:

$ ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

Hello, World will appear in your terminal if there are no configuration issues:

Hello, World

The script will keep running while it waits for further messages to be published. Open a new terminal window and log into your server to try this.
Start a producer in this new terminal to send out another message:

$ echo "Hello World from Sammy at Hevo Data!" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

This message will appear in the consumer’s output:

Hello, World
Hello World from Sammy at Hevo Data!

To stop the consumer script, press CTRL+C once you’ve finished testing.
On Ubuntu 20.04, you’ve now installed and set up a Kafka server.

In the next step, you’ll perform a few quick operations to tighten the security of your Kafka server.

Step 7: Hardening Kafka Server

Now that your installation is complete, you can remove the kafka user’s admin privileges. Before proceeding, log out and back in as any other non-root sudo user. If you’re still in the same shell session you started this tutorial with, type exit.

Remove the Kafka user from the sudo group:

$ sudo deluser kafka sudo

Lock the Kafka user’s password with the passwd command to strengthen the security of your Kafka server even more. This ensures that no one may use this account to log into the server directly:

$ sudo passwd kafka -l
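
To confirm that the account is locked, you can check its password status; the second field should show L (locked). The exact date and policy fields will vary on your system:

$ sudo passwd -S kafka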

From now on, only root or a sudo user can log in as kafka, using the following command:

$ sudo su - kafka

If you want to unlock it in the future, use passwd with the -u option:

$ sudo passwd kafka -u

You’ve now successfully restricted the admin capabilities of the kafka user. You can start using Kafka now, or proceed to the next optional step, which adds KafkaT to your system.

Step 8: Installing KafkaT (Optional)

Airbnb created a tool called KafkaT. It allows you to view information about your Kafka cluster and perform administrative tasks directly from the command line. However, because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to build the other gems that KafkaT depends on. Install them using apt:

$ sudo apt install ruby ruby-dev build-essential

The gem command can now be used to install KafkaT:

$ sudo CFLAGS=-Wno-error=format-overflow gem install kafkat

The -Wno-error=format-overflow compiler flag is required to suppress warnings and errors from ZooKeeper’s gem during the kafkat installation process.
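
To confirm that the gem installed successfully, you can list it (an optional check; the version number will depend on what gem fetched):

$ gem list kafkat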

KafkaT uses a configuration file, .kafkatcfg, to determine the installation and log directories of your Kafka server. It should also include an entry pointing KafkaT to your ZooKeeper instance.

Create a new file named .kafkatcfg in your home directory:

$ nano ~/.kafkatcfg

To specify the required information about your Kafka server and Zookeeper instance, add the following lines:

{
  "kafka_path": "~/kafka",
  "log_path": "/home/kafka/logs",
  "zk_path": "localhost:2181"
}

You are now ready to use KafkaT. For a start, here’s how you would use it to view details about all Kafka partitions:

$ kafkat partitions

You will see the following output:

[DEPRECATION] The trollop gem has been renamed to optimist and will no longer be supported. Please switch to optimist as soon as possible.
/var/lib/gems/2.7.0/gems/json-1.8.6/lib/json/common.rb:155: warning: Using the last argument as keyword parameters is deprecated
...
Topic                 Partition   Leader      Replicas        ISRs    
TutorialTopic         0             0         [0]             [0]
__consumer_offsets    0             0         [0]             [0]
...
...

You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.

To learn more about KafkaT, refer to its GitHub repository.

Conclusion

This article gave you a comprehensive guide to installing Apache Kafka on Ubuntu 20.04 and walked you through each step of the process. Looking to install Kafka on Mac instead? Read through this blog for all the information you need.

Extracting complex data from a diverse set of data sources such as Apache Kafka can be a challenging task, and this is where Hevo saves the day!

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integrations with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load it into destinations such as data warehouses, but also transform & enrich your data & make it analysis-ready.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable Hevo pricing that will help you choose the right plan for your business needs.

Hope this guide has helped you successfully install Kafka on Ubuntu 20.04. Do let me know in the comments if you face any difficulty.

Software Engineer, Hevo Data

With around a decade of experience, Sarad has designed and developed fundamental components of Hevo. His expertise lies in building lean solutions for various software problems, mentoring fellow engineers, and exploring new technologies.
