Deploying Elasticsearch with Chef Solo


This tutorial has been revised and thoroughly updated in December 2012.

Elasticsearch is trivially easy to install and run: you just download and extract an archive and run a simple script.

It’s a long way from there to production, though. You have to customize the configuration. You want to install some plugins. You’d like to ensure Elasticsearch starts on system boot. You want to monitor that the Java process is running and doesn’t eat too many resources… and many other things.

And then you have to repeat all the steps for each and every node in your cluster.

It would be cool if you could do this in an automated, mechanized manner, wouldn’t it?

As it happens, there are plenty of infrastructure provisioning tools available, both open source and commercial or vendor-specific, which make tasks like these a snap.

We’ll focus on Chef, an open-source framework for infrastructure provisioning and management, maintained and supported by Opscode.

What Is Chef?

This article can’t be a full introduction to Chef. You’ll find many learning materials on the Chef wiki, but for our purposes, we’ll make do with the absolute minimum.

The first important thing to understand is that there are actually two different “chefs”:

  1. Chef Server, a central repository for all your infrastructure information and configuration data, which is used with the chef-client tool, and
  2. Chef Solo, which uses the standalone chef-solo tool and does not need a Chef server.

In the context of this article, we’ll be using Chef Solo, which means we can’t use certain advanced features, such as full text search of our server attributes, executing the same command over SSH on multiple servers at once, or using a web-based GUI, but we’ll still be able to automate without breaking a sweat.

The essential concepts of Chef are the same between the “server” and “solo” variants.

The first of these is a node. A node is simply an abstract configuration for a server, reachable by SSH. You can picture a node as a document containing some attributes, such as a name, the port number for an Apache web server or the list of software we want to have installed. The “physical representation” of the node is the virtual or physical server itself. (In Chef Server, things are a bit more complicated, but that doesn’t concern us right now.)

Every node can have one or multiple associated roles. A role joins together various configuration options for a certain type of machine: for instance, you can have a “webserver” role which describes that you want an Apache webserver, a Varnish proxy, etc. installed. A role contains recipes, or other roles. We won’t be using roles in this tutorial, though.
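
Just to give you an idea, a role definition in Chef’s Ruby DSL looks something like this hypothetical sketch:

# A hypothetical "webserver" role, for illustration only
name        "webserver"
description "A machine serving web traffic"

# Recipes (and other roles) applied to every node carrying this role
run_list    "recipe[apache2]", "recipe[varnish]"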

The most important concept is a cookbook, containing various recipes which describe, in detail, how we’d like our node to be set up. A recipe uses a variety of resources to describe these details, such as setting some default node properties, creating directories, creating configuration files with specific content, installing packages, downloading files from the internet, or executing arbitrary scripts and commands. Cookbooks hold together recipes, template files, Chef extensions, etc.
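
To make the idea of resources more concrete, here is a minimal sketch of what a recipe could contain. It is a simplified, hypothetical fragment, not the actual recipe from the Elasticsearch cookbook:

# A simplified, hypothetical recipe fragment, for illustration only

# Create a dedicated system user
user "elasticsearch" do
  comment "Elasticsearch user"
  shell   "/bin/false"
end

# Create the data directory, owned by that user
directory "/usr/local/var/data/elasticsearch" do
  owner     "elasticsearch"
  recursive true
end

# Generate the configuration file from an ERB template
template "/usr/local/etc/elasticsearch/elasticsearch.yml" do
  source "elasticsearch.yml.erb"
  owner  "elasticsearch"
  mode   0644
end

# Register the service and start it on boot
service "elasticsearch" do
  action [ :enable, :start ]
end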

Have a look at the Elasticsearch cookbook we’ll be using in this tutorial, to get a sense of how cookbooks are organized and how they work. The recipe is written in a simple Ruby-based domain specific language, and should be pretty understandable. Also check out the cookbook templates.
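
Templates are plain ERB files, which interpolate node attributes into configuration files. A tiny, hypothetical fragment of a template for elasticsearch.yml could look like this (the attribute names are illustrative):

# Hypothetical fragment of an elasticsearch.yml.erb template
cluster.name: <%= node[:elasticsearch][:cluster_name] %>
node.name:    <%= node[:elasticsearch][:node_name] %>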

A recipe can also load additional data from data bags. Data bags are simple JSON documents, and can contain arbitrary information, such as user credentials, API tokens and other things not specific to a certain recipe. We won’t be using data bags in this tutorial, because we will store all information directly in the node configuration, as a JSON document.
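
For the record, loading a data bag item inside a recipe is a one-liner; the bag and item names below are made up:

# Hypothetical example: load the "admin" item from a "users" data bag
admin = data_bag_item("users", "admin")
admin["password"] # => the stored credential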

Our goals

OK, we’re familiar with the essential parts of Chef. What are our goals, then? How would we like our Elasticsearch server to be set up?

In fact, we would like a number of things:

  • First of all, install a specific version of Elasticsearch on the node
  • Create an elasticsearch.yml file with custom configuration
  • Create a separate user to run Elasticsearch
  • Register a service to start Elasticsearch automatically on server boot
  • Increase the open files limit for the elasticsearch user
  • Configure the memory limits and other settings for the JVM
  • Install an Elasticsearch plugin
  • Monitor the Elasticsearch process and cluster health with Monit
  • Install the Nginx web server and use it as a proxy for Elasticsearch
  • Store user credentials for HTTP authentication with Nginx

And optionally:

  • Install the AWS Cloud plugin
  • Configure the AWS Cloud plugin with proper credentials to use the EC2 discovery
  • Create, format and mount an EBS volume to store our data
  • Use an existing EBS snapshot to create the volume from a data backup

As you can see, it’s not a short list of tasks. If we were doing them manually, we could easily spend a whole afternoon on them. By using Chef, we should be done in under five minutes, once we get the hang of it.

One important thing to emphasize is that we will use the Amazon EC2 service to create virtual servers to deploy Elasticsearch nodes, and we will use some features in Elasticsearch specific to Amazon Web Services (AWS).

You’re not limited to the EC2 platform in any way, though: any virtual or physical server accessible by SSH will be absolutely perfect for the purposes of this tutorial — you’ll just need to configure the node a little bit differently.

Preparation

Before we really start cooking, we must prepare all the tools and ingredients. Assuming EC2, we need to:

  • download and edit the scripts and configuration files used in this tutorial,
  • create a dedicated security group in AWS,
  • launch an instance which we’ll be provisioning via Chef,
  • download the SSH key used for accessing the instance.

We’ll begin by downloading the files needed for this tutorial from the following gist: http://gist.github.com/2050769. We might as well do it with one command:

mkdir deploy-elasticsearch-with-chef && cd deploy-elasticsearch-with-chef
curl -# -L -k https://gist.github.com/2050769/download | tar xz --strip 1 -C .

Your current directory should now contain a couple of files; let’s review them briefly:

The bootstrap.sh file is a generic Bash script, which we’ll use for basic setup of the machine (installing packages and Chef, downloading cookbooks, etc). The patches.sh script is used to fix some problems in community cookbooks (and will hopefully be removed from this tutorial soon). The solo.rb file contains configuration for Chef Solo. You don’t have to edit these files.

The node-example.json file contains an example configuration for the whole node: the list of cookbooks we want to install, the Elasticsearch configuration, the list of plugins we want to install, your AWS credentials, the username and password for the Nginx HTTP authentication, your e-mail address for Monit notifications, etc. We’ll start with a much smaller configuration, though.

Information for non-AWS environments

From now on, we will assume that we’re working with Amazon Elastic Compute Cloud (EC2), provisioning an Amazon Linux operating system. If you’d like to work in a different environment (a VPS at Rackspace or Linode, a local virtual machine in VirtualBox, or custom hardware), and with a different operating system, you’ll have to tweak a couple of things.

First, you’ll have to make sure you can access the server via SSH and update the SSH_OPTIONS environment variable according to your specific credentials.

Second, in this tutorial, we assume the server already has a working Java installation. When that’s not the case, it must be installed as part of the bootstrap process.

A bootstrap script and instructions for the Ubuntu operating system are available in this gist.

On Amazon EC2, we’ll start by creating a dedicated security group for our Elasticsearch cluster in the AWS console. We will name the group elasticsearch-test.

Make sure the security group allows the following connections:

  • Port 22 for SSH is open for external access (the default 0.0.0.0/0)
  • Port 8080 for the Nginx proxy is open for external access (the default 0.0.0.0/0)
  • Port 9300 for in-cluster communication is open for access only to servers running in the same security group (use the “Group ID” for this group, available on the “Details” tab, such as sg-1a23bcd)

The form for setting up the security group is pictured below.

Create Security Group

Important: Don’t forget to click “Apply Rule Changes” so the changes are, in fact, applied.

Now, we’ll launch a new server instance at EC2:

  • Use a meaningful name for the instance. We will use test-elasticsearch-chef-1.
  • Create a new “Key Pair” for the instance, and download it immediately. We will be using a key named elasticsearch-test.
  • Use the Amazon Linux AMI (ami-1b814f72). Amazon Linux comes with Ruby and Java pre-installed.
  • Use the m1.large instance type (you may use the small or even the micro instance type, but the process will take a bit longer).
  • Use the security group created in the first step (elasticsearch-test).

The quicklaunch screen for creating the instance is pictured below:

Create Server Instance

Don’t forget to download the newly created SSH key!

Don’t forget to click the “Edit Details” link on the next screen and set the proper instance type (“m1.large”) in the “Instance Details” pane, and the proper security group (“elasticsearch-test”) in the “Security Settings” pane.

Check Instance Details

Now you can click “Launch” to create and start your server.

While the server is being created in EC2, we will copy the SSH key downloaded from AWS console to the tmp/ directory of this project and make sure it has proper permissions:

mkdir -p ./tmp
cp ~/Downloads/elasticsearch-test.pem ./tmp/
chmod 600 ./tmp/elasticsearch-test.pem

Once our server is in the running state, copy its “Public DNS” value in the AWS console (e.g. ec2-123-40-123-50.compute-1.amazonaws.com) to the clipboard.

Bootstrapping the Machine

We can begin the bootstrap and install process now.

First, we’ll set up the connection details for convenient passing to the scp and ssh commands:

HOST=<REPLACE WITH YOUR PUBLIC DNS>
SSH_OPTIONS="-o User=ec2-user -o IdentityFile=./tmp/elasticsearch-test.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"

We’ll check that we can connect to the machine via secure shell:

ssh $SSH_OPTIONS $HOST

You should be successfully logged into the machine; log out by pressing Ctrl+D.

If you have trouble in this step, double-check that the security group is properly set up, that you’re using the correct SSH key, etc.

We will now create a simple configuration JSON file for the machine:

echo '
{
  "run_list": [ "recipe[elasticsearch]" ],
  "elasticsearch" : {
    "cluster_name" : "elasticsearch_test_with_chef",
    "mlockall"     : false
  }
}
' > ./node.json

As you can see, we’re starting with a really simplified configuration: in the run_list property, we’re saying we want the elasticsearch recipe applied, and that our cluster will be named elasticsearch_test_with_chef.

Let’s copy all the required files to the machine via secure copy:

scp $SSH_OPTIONS bootstrap.sh patches.sh node.json solo.rb $HOST:/tmp

We can bootstrap the machine now – install the necessary packages and Chef, download cookbooks from the internet, etc.:

time ssh -t $SSH_OPTIONS $HOST "sudo bash /tmp/bootstrap.sh"

You’ll see lots of lines flying by in your terminal. We’re running the bootstrap script remotely over SSH; it should take about a minute.

We’re left with running the patches.sh script, which will fix some problems in the community cookbooks (create necessary directories or users, etc.):

time ssh -t $SSH_OPTIONS $HOST "sudo bash /tmp/patches.sh"

Installing and Configuring Elasticsearch

OK – our server is now ready to be provisioned by Chef Solo. We will do it with the following command:

time ssh -t $SSH_OPTIONS $HOST "sudo chef-solo --node-name elasticsearch-test-1 -j /tmp/node.json"

This command will perform all the steps necessary for a bare bones Elasticsearch installation; it will create the directories at /usr/local/var/data/elasticsearch, create the elasticsearch user, download Elasticsearch and configure it, and finally, start the service.

Let’s have a look around on the server. Is Elasticsearch, in fact, running?

ssh -t $SSH_OPTIONS $HOST "curl localhost:9200"

We can also use the provided service script to check its status:

ssh -t $SSH_OPTIONS $HOST "sudo service elasticsearch status -v"

You can see that our cluster is named elasticsearch_test_with_chef, that our node is named elasticsearch-test-1, and that the open files limit is 64000. In fact, let’s have a look at the elasticsearch.yml configuration file:

ssh -t $SSH_OPTIONS $HOST "cat /usr/local/etc/elasticsearch/elasticsearch.yml"

That’s all well and good – we have automated the Elasticsearch installation process, downloading the package, extracting it, registering it as a service, and properly configuring it. Not bad for a couple of commands and minutes of work.

But our goals are much more ambitious than that! We want monitoring, and the Nginx proxy, and proper AWS setup with EC2 discovery, and EBS-based persistence!

It seems the right time to edit the node.json file has come.

The Full Installation

Let’s start by overwriting our current node.json with the provided example:

cp node-example.json node.json

We have to edit the file and replace the following properties:

  • elasticsearch.cloud.aws.access_key with your AWS Access Key
  • elasticsearch.cloud.aws.secret_key with your AWS Secret Key
  • monit.notify_email with your e-mail address

You’ll find the access and secret keys on the “Security Credentials” page, accessible from the drop-down menu under your name in the top right corner.

All right, let’s upload the updated file to the machine:

scp $SSH_OPTIONS node.json $HOST:/tmp

And let’s run the provisioning script again:

time ssh -t $SSH_OPTIONS $HOST "sudo chef-solo --node-name elasticsearch-test-1 -j /tmp/node.json"

You should see, once again, many lines in your terminal flying by, installing Monit and Nginx, downloading the “AWS Cloud plugin” for Elasticsearch, configuring the Nginx proxy, and finally, restarting Elasticsearch itself.

Let’s try the new configuration by accessing the Nginx proxy running on port 8080:

curl http://USERNAME:PASSWORD@$HOST:8080

Pretty nice, right? Notice how trying to shut down the cluster via the proxy returns 403 Forbidden, because Nginx is configured to forbid it:

curl -X POST http://USERNAME:PASSWORD@$HOST:8080/_shutdown

Anyway, we can index some documents through the proxy just fine:

curl -X POST "http://USERNAME:PASSWORD@$HOST:8080/test_chef_cookbook/document/1" -d '{"title" : "Test 1"}'
curl -X POST "http://USERNAME:PASSWORD@$HOST:8080/test_chef_cookbook/document/2" -d '{"title" : "Test 2"}'
curl -X POST "http://USERNAME:PASSWORD@$HOST:8080/test_chef_cookbook/document/3" -d '{"title" : "Test 3"}'
curl -X POST "http://USERNAME:PASSWORD@$HOST:8080/test_chef_cookbook/_refresh"

Let’s try to perform a search:

curl "http://USERNAME:PASSWORD@$HOST:8080/_search?pretty"

Perfect. We can also check that Elasticsearch is running smoothly via Monit:

ssh -t $SSH_OPTIONS $HOST "sudo monit reload && sudo monit status -v"

(If the Monit daemon is not running, start it with `sudo service monit start` first. Notice the daemon has a startup delay of 2 minutes by default.)

You can see that the Elasticsearch process is running and that the connection to port 9200 is online with all services. But what about the elasticsearch_cluster_health check? It says Connection failed. In fact, that’s expected:

ssh -t $SSH_OPTIONS $HOST "curl localhost:9200/_cluster/health?pretty"

Since we’re running with the default setting of one replica and only one Elasticsearch node, the cluster health is yellow: there’s no other server on which the cluster could place the replicas.

Time to create another node in our cluster!

Adding Another Node

We’ll launch another node on EC2, using the “Launch More Like This” feature, available under the “Instance Actions” menu:

Launch More Like This

Name the second node test-elasticsearch-chef-2 and make sure that it runs under the elasticsearch-test security group.

Once the new instance is running, copy its “Public DNS” value. We will again store this value as the HOST environment variable:

HOST=<REPLACE WITH THE PUBLIC DNS FOR THE NEW SERVER>
SSH_OPTIONS="-o User=ec2-user -o IdentityFile=./tmp/elasticsearch-test.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"

Now, let’s run all the provisioning steps on the new machine, making it the elasticsearch-test-2 node:

scp $SSH_OPTIONS bootstrap.sh patches.sh node.json solo.rb $HOST:/tmp
time ssh -t $SSH_OPTIONS $HOST "sudo bash /tmp/bootstrap.sh"
time ssh -t $SSH_OPTIONS $HOST "sudo bash /tmp/patches.sh"
time ssh -t $SSH_OPTIONS $HOST "sudo chef-solo --node-name elasticsearch-test-2 -j /tmp/node.json"

The whole process should be finished in under two minutes. Allow Elasticsearch a couple of seconds to boot, and check the cluster health again:

ssh -t $SSH_OPTIONS $HOST "curl localhost:9200/_cluster/health?pretty"

You may see the number of relocating_shards briefly increase, and then the cluster health should be green, and the number_of_nodes should be 2.

Because Chef also installed the Paramedic plugin, we can inspect the state of the cluster and our test index visually: just open the following URL in your browser:

open "http://USERNAME:PASSWORD@$HOST:8080/_plugin/paramedic/"

Not bad! We now have a fully operational, two-node Elasticsearch cluster, with convenient service scripts, an Nginx proxy for external access, a specific plugin installed and Monit-based supervision.

Going Further

Time to play some tricks with our setup. The first thing we’re going to try is to kill the Elasticsearch process on one of the nodes and see how it’s started again by Monit.

First, let’s kill the process:

ssh -t $SSH_OPTIONS $HOST "cat '/usr/local/var/run/elasticsearch/elasticsearch_test_2.pid' | xargs -0 sudo kill -9"

If we check the Elasticsearch service status, it should not be running:

ssh -t $SSH_OPTIONS $HOST "sudo service elasticsearch status"

If we check the status in Monit after a while, when the next Monit tick fires off, it should also report the process not running and complain about all sorts of other problems:

ssh -t $SSH_OPTIONS $HOST "sudo monit status"

If you configured the e-mail address for Monit properly, you’ll also receive an e-mail notification telling you about the incident (most probably in your Spam folder), provided you haven’t yet hit the limits EC2 imposes on sending e-mail from instances.

If we now repeatedly check the process status, it will go through the “Does Not Exist” and “Execution Failed” states, and after two or three minutes (based on the default Monit poll period), you should see the process in the running state again:

ssh -t $SSH_OPTIONS $HOST "sudo monit reload && sudo monit status"

So, our monitoring system seems to work quite well!

On EC2, we can try another trick. Since we’re using an EBS disk for persistence, we can create a snapshot and use it for recovery on a new, freshly built server.

So, let’s create the snapshot first: in the AWS console, we need to load the Volumes screen, and find the EBS volume named elasticsearch-test-1. Open the Actions drop-down menu, and choose “Create Snapshot”:

Create EBS Snapshot

Name the snapshot elasticsearch-1 and switch to the Snapshots screen via the left menu.

Once the snapshot is completed, copy its ID (something like snap-12ab34567).

Now you can terminate both instances in the AWS console and create a new, fresh instance again. Don’t forget to use the correct security group and SSH key.

Locate the elasticsearch.data.devices./dev/sda2.ebs part of the node.json file, and add the snapshot ID into it. The configuration should look like this:

"ebs" : {
  "size"                  : 25,
  "delete_on_termination" : true,
  "type"                  : "io1",
  "iops"                  : 100,
  "snapshot_id"           : "snap-12ab34567"
}

Once the fresh instance is running, copy the “Public DNS” setting of the new server, and repeat the whole provisioning process, which by now should be almost second nature:

HOST=<REPLACE WITH THE PUBLIC DNS VALUE>
scp $SSH_OPTIONS bootstrap.sh patches.sh node.json solo.rb $HOST:/tmp
time ssh -t $SSH_OPTIONS $HOST "sudo bash /tmp/bootstrap.sh"
time ssh -t $SSH_OPTIONS $HOST "sudo bash /tmp/patches.sh"
time ssh -t $SSH_OPTIONS $HOST "sudo chef-solo --node-name elasticsearch-test-fresh -j /tmp/node.json"

Allow a couple of seconds for Elasticsearch to boot and load the data, and perform the search on the freshly provisioned server:

curl "http://USERNAME:PASSWORD@$HOST:8080/_search?pretty"

You should now see the three documents we inserted long ago, on the now-destroyed servers, displayed in your terminal; the new EBS volume, mounted at /usr/local/var/data/elasticsearch/disk1, contains all the data from the previous servers, and Elasticsearch happily loaded it all. With the recently announced EBS copy snapshot API, this makes disaster recovery or migration during Amazon outages just a bit easier.

On a similar note, we can try another, potentially life-saving trick: adding disk space to the node. Usually, running out of disk space is hard to combat without noticeable service disruption. In Amazon, you can snapshot the volume, create a bigger EBS volume from the snapshot, mount it, etc. – but it takes time.

Because Elasticsearch can work with multiple data locations, the elasticsearch.data.devices and elasticsearch.data_path configurations can contain multiple values, which allows us to add a disk to the recently created node. Here’s what we need to do:

First, we need to update the elasticsearch.data_path configuration in the node.json file like this:

"data_path" : ["/usr/local/var/data/elasticsearch/disk1","/usr/local/var/data/elasticsearch/disk2"]

Then, we will add a new device to the elasticsearch.data.devices configuration – let’s simply copy over the configuration for /dev/sda2 (without the snapshot_id key):

"data" : {
  "devices" : {
    "/dev/sda2" : {
      // ...
    },
    "/dev/sda3" : {
      // ...
    }
  }
}

And, finally, let’s upload the updated configuration and re-run the provisioning code:

scp $SSH_OPTIONS node.json $HOST:/tmp
time ssh -t $SSH_OPTIONS $HOST "sudo chef-solo --node-name elasticsearch-test-fresh -j /tmp/node.json"

Allow some time for Elasticsearch to restart, and check the data locations and some statistics with the Nodes Stats API:

curl "http://USERNAME:PASSWORD@$HOST:8080/_cluster/nodes/stats?pretty&clear&fs"

We can also check the disk stats directly on the system:

time ssh -t $SSH_OPTIONS $HOST "df -h"

Conclusions

Congratulations! By following this tutorial, you were able to:

  • Establish a repeatable and reliable process for setting up an Elasticsearch server
  • Bootstrap, install and configure a production-ready Elasticsearch cluster without manual intervention
  • Summarize the whole server configuration in the node.json file

The first thing to take from this exercise is, of course, that automation beats manual labor every single time, and by a long shot. When you’re provisioning an Elasticsearch server for the third time, it’s so painless you don’t even notice it.

When working with a provisioning tool such as Chef, resist the urge to tinker with the system by hand, editing configuration files in vim and installing software manually — except in clearly defined cases when you’re trying something out.

It is of course faster to make a small change directly on the system itself, instead of performing all the provisioning steps. But the whole point of Chef is to make your system predictable, to summarize everything that’s needed for its operation in one place, and to eradicate manual intervention.

Notice how we added lots of configuration details in “The Full Installation” chapter, uploaded the updated node.json file to the system, and then just ran the same command as previously. Chef discovered it needed to update the elasticsearch.yml file, did so, and restarted the Elasticsearch process to pick up the new configuration. This pattern of “update, sync, run, repeat” is very powerful, because it takes manual fiddling and “hacking” out of the process; once you get it right, it will be right for every future system provisioned from the same code.

The same applies to changes in the cookbook: when the Elasticsearch cookbook is updated at GitHub, the bootstrap script will fetch the changes, and the next chef-solo run will reflect them on the system.

The second thing to notice is how powerful a tool like Chef really is. We didn’t pay too much attention to Chef specifics, but let’s have a look at a small illustration. You should notice the memory settings for the JVM in the elasticsearch-env.sh file:

ssh -t $SSH_OPTIONS $HOST "cat /usr/local/etc/elasticsearch/elasticsearch-env.sh"

Where does the value Xmx4982m, or nearly 5 GB, come from? How does Chef know this value? Well, the Ruby code in the Elasticsearch cookbook did the computation, based on the total available memory on the EC2 large instance type (7.5 GB):

allocated_memory = "#{(node.memory.total.to_i * 0.6 ).floor / 1024}m"
default.elasticsearch[:allocated_memory] = allocated_memory

Thanks to the Ohai tool, Chef knows many of these “automatic attributes” of the node, and can take them into consideration when provisioning the server. The Elasticsearch cookbook we have worked with makes use of these attributes in other places, for example when setting the node.name.
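
As a small illustration, here are a few of these automatic attributes as you might read them in a recipe; the values shown are, of course, just examples:

# A few Ohai "automatic attributes" available inside a recipe (example values)
node[:memory][:total]        # => "7700000kB" on an m1.large instance
node[:hostname]              # => "ip-10-0-0-1"
node[:platform]              # => "amazon"
node[:ec2][:public_hostname] # => "ec2-123-40-123-50.compute-1.amazonaws.com" (on EC2 only)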

The final conclusion of this experiment is how open the whole Chef ecosystem is.

We didn’t have to use the hosted Chef Server product to use the other parts of its architecture. The concepts and principles are the same between Chef Solo and Chef Server, and allow you to reuse the most important part: the cookbooks.

Most of the cookbooks are available on the Opscode community page under permissive licenses. For a complex infrastructure, you’re most likely to adapt cookbooks to your needs, adjusting their recipes or templates. As we have seen, it’s trivial to mix “vendor” cookbooks with our own cookbooks: we have downloaded the “stock” Monit and Nginx cookbooks from the internet to the /var/chef-solo/site-cookbooks/ directory, while cloning a custom Elasticsearch cookbook to the /var/chef-solo/cookbooks directory.

The Chef domain specific language uses Ruby, a popular and expressive programming language, which makes adjusting and customizing cookbooks very easy. We can fork most cookbooks at GitHub and participate in the growing common knowledge of efficient infrastructure provisioning.

Enjoy your cooking!


Oh, and one more thing. The downloaded gist also contains the code to create the server and provision it automatically by running a single command.

If you have a working Ruby and Rubygems installation on your machine, you can try it out. First, install the Bundler gem and then all required gems:

gem install bundler
bundle install

Then run this command:

time bundle exec rake create NAME=elasticsearch-from-cli FLAVOR=m1.large

It will load AWS credentials from your node.json file, create the instance in EC2, bootstrap and configure it, and perform the Elasticsearch installation and configuration we just did manually. Open the URL printed at the end!