elasticsearch on EC2


Overview

There are usually some differences between running an application locally on a single machine, on a cluster of servers on a private network, or in the cloud. Most of the differences have to do with network restrictions placed on virtual machines in the cloud, or the heterogeneous nature of nodes in the cloud. elasticsearch is specifically designed to work well on Amazon’s EC2 platform and this tutorial should get you started.

elasticsearch had originally been developed to work with multiple cloud platforms by leveraging the jclouds project. However after version 0.8.0, the development effort has been focused on the use of Amazon Web Services (AWS) and running elasticsearch on Amazon’s EC2 platform. To this end, elasticsearch will leverage the AWS API for two purposes, discovery and gateway persistence.

Discovery

Discovery is the process of locating other nodes in the cluster and elasticsearch will either perform dynamic discovery (Multicast), declarative discovery (Unicast), or a hybrid of the two (EC2) which uses specific EC2 apis instead of multicast to find other potential cluster members..

Declarative Discovery

If you have assigned static, known IP addresses to all of your virtual machines that comprise your cluster, you can configure elasticsearch with a list of these addresses in its configuration file. Of course, you don’t have to be using any EC2-specific services to accomplish this type of discovery. You are simply using EC2 in much the same way you might use a common VPS or
dedicated hosting solution, and you will configure elasticsearch using the unicast discovery method.

Dynamic Discovery

Because you are running your cluster on EC2, multicast discovery is not an option. Most hosting services do not allow the multicast protocol to be used, and Amazon is no exception. This is why the EC2 discovery option is important.

Amazon EC2 is a system built with failure in mind. At any moment Amazon may decide that your running node instance has to be shut down, and you have to plan for this failure condition. Depending on how you have EC2 configured, it may spin up a new instance to replace the one it terminated. Fortunately, elasticsearch has been developed to be resilient in these common use cases.

With the proper libraries installed and the appropriate configuration, your elasticsearch nodes will be able to discover each other in a fully dynamic manner.

Gateway

The other feature you may want to take advantage of using the EC2 platform is the S3 Gateway. The elasticsearch gateway is responsible for long term persistence of the cluster meta data and indexed data across all shards in the cluster. If something happens where an index loses its local copy of data because the nodes containing its primary shard and all of its replica shards were terminated, the data could still be restored from the gateway when the shards were reallocated.

This is a very important concept, especially in a cloud-based environment. Out of the box, elasticsearch uses a local gateway. This means that the local / mounted hard disk stores a copy of the cluster metadata, and the local index data is used during recovery of an index. This works very well for nodes that are tied to physical machines, or more importantly, physical hard disks. Things can be much different in the cloud.

Amazon EC2 has the concept of a permanent block storage mechanism called Elastic Block Storage (EBS). An EBS instance has a lifecycle independent of the node on which it is mounted. This means that if a node dies, you can instantiate a new one and mount the former node’s storage volume. This type of setup is no different than if you were maintaining your own server farm, so you can continue to use the local gateway as long as you manage your EBS volumes carefully.

Some people may have a need to run elasticsearch on EC2 instances without the benefit of an EBS volume For these cases, there is the S3 Gateway. This gateway configuration creates multiple (five by default) connections
between each node in the cluster and Amazon’s S3 service. S3 is an extremely simple key/value storage system which stores all values in a flat format into a storage location referred to as a bucket. elasticsearch asynchronously writes changes in cluster state and index data to the S3 service.

If you were using pure memory storage or local storage without an EBS volume, and you terminated all of your cluster nodes (or if Amazon decides to do that for you), a restart of your nodes would begin to recover cluster state from S3. Eventually, your index data would also be recovered. This process is much slower than using a local index. There are no formal benchmarks on how long it takes to recover a GB of data from an S3 gateway, but it would be a welcome addition to this document. Also, the S3 gateway does come with an overhead of having to persist the delta changes happening to the index to S3, which incurs network overhead.

So, which one to use? The local gateway with EBS provides better performance compared to the S3 gateway, both in terms of less network used while the cluster is up and faster full cluster restarts. The S3 gateway allows a more persistent backup mechanism, and is simpler to start with compared to setting up EBS. More users of elasticsearch use EBS.

Adding the EC2 Plugin

The libraries necessary to communicate with Amazon’s EC2 APIs are installed in different ways depending on your environment. If you run EC2 from the command line or from a shell script, you can install the cloud
libraries using the plugin script explained here.

If you are using Maven and embedding the elasticsearch libraries in a web application or a GUI client, you would add the cloud dependency to the elasticsearch dependency you already are using.

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>${elasticsearch-version}</version>
    </dependency>

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-cloud-aws</artifactId>
        <version>${elasticsearch-version}</version>
    </dependency>

Note: The elasticsearch Maven artifacts are stored on Sonatype’s repository , not Maven central. These include also the aws cloud plugin.

Creating an EC2 instance

This document will deal with only those config issues related to EC2. It also assumes you have some working knowledge of AWS including EC2 and S3, the concept of access keys, creating instances, and SSH. If you need any help on these things, consult the AWS documentation .

elasticsearch is pretty simple to install considering you just download and extract it into a directory. When running on an EC2 instance, there are a few additional configuration changes you may want to make. Navigate to
the AWS console , and select the Launch Instance option.

Create a new instance of the Basic 32-bit Amazon Linux AMI. You can choose the 64-bit version if you would rather play around with that. The next step is to select the instance type. I chose Micro, but you can choose whatever you would like to pay for. For purposes of this demo, set your region to US East (Virgina). The next page of instance details can be configured as you see fit. I chose the defaults for this demo. The third page of instance details deals with assigning key/value tags. I added a tag with the key ‘stage’ and the value ‘dev’ for use later.

The next section of the AWS instance creation requires the generation of a key pair. You cannot recover the public key once it has been downloaded, so put it in a safe place.

The next section involves the creation of a security group. This is where you control your AWS firewall, and it is important that we open some ports for the EC2 elasticsearch instances to communicate with each other. Create a new security group which opens ports 9300 (elasticsearch transport), 22 (SSH), and port 9200 (for HTTP testing). You can change your security group rules at any time.

The next screen will review your settings and allow you to launch your new instance. Make it so.

If all went well, your instance should be up and running in a minute or so. Now we will SSH into it to make sure all is well. In order to find the DNS name of your running instance, highlight the instance in the console. At the bottom of the screen are the properties for the instance. You want the address listed after Public DNS. The default user is ec2-user.

    $ ssh -i es-demo.pem ec2-user@ec2-50-19-128-95.compute-1.amazonaws.com
    The authenticity of host 'ec2-50-19-128-95.compute-1.amazonaws.com (50.19.128.95)'
    can't be established.
    RSA key fingerprint is ef:19:54:bf:0b:59:6f:15:29:d8:12:95:fd:8b:89:89.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'ec2-50-19-128-95.compute-1.amazonaws.com,50.19.128.95'
    (RSA) to the list of known hosts.

           __|  __|_)  Amazon Linux AMI
           _|  (     /    Beta
          ___|___|___|

    See /usr/share/doc/system-release-2011.02 for latest release notes. :-) 

    [ec2-user@domU-12-31-39-02-6D-65 ~]$

Installation of elasticsearch

At the time this was written, the latest release of elasticsearch was 0.17.6. Let’s get it onto our elasticsearch instance.

    $ wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.19.0.zip
    $ sudo unzip elasticsearch-0.19.0.zip -d /usr/local/elasticsearch
    $ cd /usr/local/elasticsearch/elasticsearch-0.19.0
    $ sudo bin/plugin -install elasticsearch/elasticsearch-cloud-aws/1.4.0

If all goes well, elasticsearch is now installed and ready to be
configured.

Note: There are a few other things you should be doing when installing elasticsearch to run locally on a server, especially increasing the number of file handles, disabling swap, creating a proper log directory, and creating an elasticsearch user account and the accompanying security chmods. This is not the focus of this document.

The Simplest Configuration

Like all settings for elasticsearch, there are multiple ways to configure the nodes. Since we are running the standard download of elasticsearch, we will be editing the elasticsearch.yml file in the config directory. By default, every setting is commented out. We will give our cluster a unique name, and add a few settings to enable EC2 discovery and the S3 Gateway.

    cluster.name: es-demo
    cloud:
        aws:
            access_key: <lookup in AWS>
            secret_key: <lookup in AWS>
    discovery:
        type: ec2
    gateway:
        type: s3
        s3:
            bucket: <create bucket in S3>

Note: Your access key id and secret access key are found in the Account section of the AWS website. You will also have to create an S3 bucket for the S3 gateway to function (use the console ). I used the name es-demo, so you will have to pick something else. An interesting side note is that bucket names are not assigned per account; they are global to the entire S3 platform. Note, this only applies if you plan to use the s3 gateway.

Depending on your instance type, it is also important to configure the memory used by the elasticsearch instance. This can be done by setting ES_MIN_MEM and ES_MAX_MEM environment variables. It is recommended to set them to the same value, with a good default value of about half of the instance memory. For example, set them to 8gb on an 16gb instance. Another recommended setting is to set bootstrap.mlockall to true in order to make sure the elasticsearch process will not swap. Also it might be possible that you need to specify the server jvm via the additional -server parameter for the jvm.

Also, the gateway.recover_after_... settings are important to set, especially when using the S3 gateway in order to try and reuse as much local data as possible on full cluster restart.

When you have problems with discovery of the instances specify a region in cloud.aws.region or use cloud.aws.ec2.endpoint if your region is not supported. E.g. us-east-1

Let’s go ahead and increase our log levels so we can see what is occurring at startup. Edit the config/logging.yml file and make sure these lines are present.

    gateway: DEBUG
    org.apache: WARN
    discovery: TRACE

Running the Server

Once configured, we can start the elasticsearch server from the root installation directory using this command.

    sudo bin/elasticsearch -f

On the screen you should see a bunch of log statements with many pertaining to the state of the discovery and gateway services. Somewhere near the end of this log you should see the successful start of the elasticsearch
node represented by the “started” message.

    [node] {elasticsearch/0.17.6}[20085]: started

Discovery

Let’s take a look at a few key statements that indicate things are working well.

Indicates that the cloud plugin was properly installed.

    [plugins] loaded [cloud-aws], sites []

The amount of time the discovery will wait to hear a ping response from each node in the cluster.

    [discovery.ec2] using ping.timeout [3s]

A regurgitation of our ec2 settings. This is a bare bones example.

    [discovery.ec2] using host_type [PRIVATE_IP],
                    tags [{}],
                    groups [[]] with any_group [true],
                    availability_zones [[]]

This is the value of the initial_state_timeout property. I couldn’t find much documentation regarding this setting, but it appears to be the total amount of time discovery will look for other nodes in the cluster before giving up and declaring the current node master.

    [discovery] waiting for 30s for the initial state to be set by the discovery

Now we are getting to the good stuff. elasticsearch is about to make an EC2 API call (DescribeInstances) to get a list of all running EC2 instances running under your EC2 account. Note: elasticsearch could choose to filter this list be tags and groups at the time of querying EC2. My guess is elasticsearch has chosen to get all instances, then filter the list manually because additional logging can be provided to let the developer know why things like filtering are occurring.

    [discovery.ec2] building dynamic unicast discovery nodes...

The private ip address of a particular EC2 instance has been discovered matching our group and tag restrictions. If this is the only EC2 instance running on your account, then we have discovered ourself.

    
    [discovery.ec2] adding i-bf072ede, address 10.248.114.147, 
                    transport_address inet[/10.248.114.147:9300]

elasticsearch now switches to unicast discovery mode in an attempt to make a connection to the discovered server on port 9300.

    [discovery.ec2] using dynamic discovery nodes [[#cloud-i-bf072ede-0][inet[/10.248.114.147:9300]]]
    [discovery.zen.ping.unicast] [1] connecting to [Donald Pierce]
                                 [vhhIkdAFQwKP7hWmojytAQ][inet[/10.248.114.147:9300]]
    [discovery.zen.ping.unicast] [1] received response from [Donald Pierce]
                                 [vhhIkdAFQwKP7hWmojytAQ][inet[/10.248.114.147:9300]]:
                                 [ ping_response{
                                     target [
                                        [Donald Pierce]
                                        [vhhIkdAFQwKP7hWmojytAQ]
                                        [inet[/10.248.114.147:9300]]
                                     ],
                                     master [null],
                                     cluster_name [es-demo]
                                   },
                                   ping_response{
                                     target [
                                        [Donald Pierce]
                                        [vhhIkdAFQwKP7hWmojytAQ]
                                        [inet[/10.248.114.147:9300]]
                                     ],
                                     master [null],
                                     cluster_name [es-demo]
                                   }
                                 ]

We have successfully connected to the node (ourself in this example), and received a response from the node named Donald Pierce. Note, that this node had not yet discovered any master nodes in the cluster.

    [discovery.ec2] ping responses: {none}

This doesn’t actually mean there were no ping responses. It should probably read
ping responses: [no masters]. This process repeats three times (discovery.zen.fd.ping_retries) until it is clear there is no master yet in the cluster.

Our new node is elected as the master.

    [cluster.service] new_master [Donald Pierce]
                      [vhhIkdAFQwKP7hWmojytAQ][inet[/10.248.114.147:9300]],
                      reason: zen-disco-join (elected_as_master)

S3 Gateway

Next, let’s look at the log statements that will give us some indication that the S3 gateway is working as expected. The following logs show what is happening the first time you start up your node with no elasticsearch data in your S3 bucket.

These are our barebones settings for controlling the S3 gateway data.

    [gateway.s3] using bucket [es-demo],
                 region [null],
                 chunk_size [100mb],
                 concurrent_streams [5]

elasticsearch has discovered that there is no content in the S3 bucket.

    [gateway.s3] Latest metadata found at index [-1]
    [gateway.s3] reading state from gateway [snip] ...
    [gateway.s3] read state from gateway [snip], took 24ms
    [gateway.s3] no state read from gateway

So, it writes the latest index information and cluster config into an S3 bucket.

    [gateway.s3] writing to gateway [snip]...

Curious as to what it stores? Well, it writes the following key and value:

        key: /es-demo/metadata/metadata-0
        value: ZV +{"meta-data":{"templates":{},"indices":{}}}

The first four bytes stored are 5A 56 00 00

    [gateway] recovered [0] indices into cluster_state

This indicates that number of indices that has been recovered from the metadata stored in the S3 gateway and applied to the currently
running cluster.

    [gateway.s3] wrote to gateway [snip], took 225ms

If you weren’t fortunate enough to see these successful log messages, please help us build out the Troubleshooting portion of the document by
adding your issue.

Test the Single Node

Before moving on to the cool stuff, let’s make sure we can index some data. We will go with the most simple example.

    curl -XPUT 'http://<public EC2 dns>:9200/twitter/tweet/1' -d '{
      "user": "kimchy",
      "post_date" : "2011-08-18T16:20:00",
      "message" : "trying out Elastic Search"
    }'

I would like you to try running this curl request from your local computer. You can use a Rest client if you don’t have curl, and there is a cool
Chrome extension to do this as well. The point to make is that we can directly query elasticsearch in this manner from anywhere on the Internet. This is because we opened port 9200 on the EC2 security group. You obviously would not want to do this if you don’t want others to mess with your data.

If all went well, you should see a success response.

    {"ok":true,"_index":"twitter","_type":"tweet","_id":"1","_version":1}

Even more interesting is to check your S3 bucket. You should now see there is a new “directory” in your bucket named indices, and also your metadata folder has new metadata. elasticsearch has updated the gateway with the necessary information to recreate your entire cluster. But still, it is a cluster of a single node. Not terribly exciting, so let’s do something about it.

Clustering

We really haven’t done anything interesting yet. We still need to spin up another instance on EC2 and make sure they can successfully cluster with each other.

Since we already have a Linux VM running, you can launch another instance of this AMI by going to the EC2 console, select the running AMI and choose Launch more like this from the action combobox.

Unfortunately, this is a version of the AMI before we installed and configured elasticsearch. You will have to repeat the installation and config steps on this new instance. I’ll wait while you make it so. Ready?

So now our second node should be configured exactly the same as our first node, and our first node should still be running. Now we are ready to start elasticsearch on our second node.

    $ sudo bin/elasticsearch -f

Let’s take a look at the log file for this new node.

Both nodes are pointed to the same S3 bucket. This is where state can be restored if the cluster is fubar.

    [gateway.s3] using bucket [es-demo], region [null],
                 chunk_size [100mb],concurrent_streams [5]

Just an acknowledgement that there is some index data stored in the gateway. Nothing is restored yet.

    [gateway.s3] Latest metadata found at index [2]

elasticsearch is going to use the AWS API to query for EC2
nodes under this account.

    [discovery.ec2] building dynamic unicast discovery nodes...

elasticsearch found two EC2 instances running (or in pending state). It will attempt to connect to these to see if they match this node’s group, tag and cluster name settings. Of course, one of these nodes is our current node.

    [discovery.ec2] adding i-bf072ede, address 10.248.114.147,
                    transport_address inet[/10.248.114.147:9300]
    [discovery.ec2] adding i-aba883ca, address 10.220.137.71,
                    transport_address inet[/10.220.137.71:9300]
    [discovery.ec2] using dynamic discovery nodes
                    [[#cloud-i-bf072ede-0][inet[/10.248.114.147:9300]],
                     [#cloud-i-aba883ca-0][inet[/10.220.137.71:9300]]]

elasticsearch asynchronously attempts to connect to each node. Note that 10.246.114.147 is the first node, and 10.220.137.71 is the second node (this node) which doesn’t know anything yet.

    [discovery.zen.ping.unicast  ] [1] connecting to
                                   [#cloud-i-bf072ede-0][inet[/10.248.114.147:9300]]
    [discovery.zen.ping.unicast  ] [1] connecting to [All-American]
                                   [VdpvmA8ARvqeBHm_yacQOw][inet[/10.220.137.71:9300]]

The first node responded with the important information. Note what I don’t show the response from the current node, because it is long and didn’t provide any information. It has no idea who might be the master.

    [discovery.zen.ping.unicast ] [1] received response from
              [#cloud-i-bf072ede-0][inet[/10.248.114.147:9300]]:
              [
               ping_response{
                 target[[All-American][VdpvmA8ARvqeBHm_yacQOw][inet[/10.220.137.71:9300]]],
                 master [null], cluster_name[es-demo]
               },
               ping_response{
                 target[[All-American][VdpvmA8ARvqeBHm_yacQOw][inet[/10.220.137.71:9300]]],  
                 master [null], cluster_name[es-demo]
               },
               ping_response{
                 target[[Ogre][N8Xbr35zS8CukC46-ZS2HQ][inet[/10.248.114.147:9300]]],
                 master [[Ogre][N8Xbr35zS8CukC46-ZS2HQ][inet[/10.248.114.147:9300]]],
                 cluster_name[es-demo]
               }]

elasticsearch has figured out who the master node is in the cluster. It can now use information the master contains to discover other nodes in the cluster (if there were others).

    [discovery.ec2] ping responses: --> target [[Ogre]
                   [N8Xbr35zS8CukC46-ZS2HQ][inet[/10.248.114.147:9300]]],
                   master [[Ogre][N8Xbr35zS8CukC46-ZS2HQ][inet[/10.248.114.147:9300]]]

A thread is created to monitor the health of the master node. The master node also will poll each node in the cluster to make sure they are healthy as well.

                   
    [discovery.zen.fd] [master] starting fault detection against master 
                       [[Ogre][N8Xbr35zS8CukC46-ZS2HQ]
                       [inet[/10.248.114.147:9300]]], 
                       reason [initial_join]

The second node has no knowledge of the cluster data, so it has two options; get its state from the master or the S3 gateway. As it turns out, it is usually much faster to get state from other nodes running in the cluster than going to S3.

    [discovery.ec2] got a new state from master node, though we are
                    already trying to rejoin the cluster
    [discovery    ] initial state set from discovery

The most authoritative technique to use to verify the cluster is up and working correctly is to perform a health check.

    $ curl -XGET 'http://<ec2 dns of a node>:9200/_cluster/health?pretty=true'
    {
      "cluster_name" : "es-demo",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 2,
      "number_of_data_nodes" : 2,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    }

As we can see, we are in a green status meaning that all shards and replicas are allocated appropriately across the cluster. By default, we are running with 5 shards and 1 replica. It is worth noting that a cluster health check before we added our second node would have resulted in a yellow status, because in a single-node cluster the replicas can’t live on the same node as its associated primary shard.

Creating a Custom AMI

Before we move on to tearing down nodes to see how the cluster responds, let’s take a quick diversion and create our own personal AWS virtual machine so we can quickly get a new node up and running. If one of your nodes is currently running, SSH into it and follow the next steps to prepare it to be a new AMI. If you killed your nodes already, create a new one and configure elasticsearch.

Install as Service

We need to make sure elasticsearch starts as a service when the virtual machine is started. We don’t want to have to keep shelling into each remote server and manually starting elasticsearch. There are obviously more than one way to skin this cat, but we will use the elasticsearch-servicewrapper technique .

    $ cd ~
    $ sudo wget https://github.com/elasticsearch/elasticsearch-servicewrapper/zipball/master
    $ sudo unzip master
    $ sudo mv elasticsearch-elasticsearch-servicewrapper-3e0b23d/service/ 
      /usr/local/elasticsearch/elasticsearch-0.19.0/bin/
    $ cd /usr/local/elasticsearch/elasticsearch-0.19.0/
    $ sudo nano bin/service/elasticsearch.conf

We have to add our path to the elasticsearch installation to the conf file. Find the property to set the set.default.ES_HOME parameter:

    set.default.ES_HOME=/usr/local/elasticsearch/elasticsearch-0.19.0

Once the conf file is modified, you can install the service.

Note that this step requires you to choose from two different script files depending in whether you are running on a 32-bit or 64-bit operating system. elasticsearch32 and elasticsearch64 were not executable when I extracted them from the archive. I had to execute sudo chmod a+x bin/service/elasticsearch32 bin/service/elasticsearch64.

    $ sudo bin/service/elasticsearch32 install
    Detected Linux:
    Installing the ElasticSearch daemon..

Even though elasticsearch should be installed as a service at this point, it is a good idea to make sure it can run.

    $ sudo bin/service/elasticsearch32 start
    Starting ElasticSearch...
    Waiting for ElasticSearch......
    running: PID:1243

Check the logs and make sure everything started as you would expect by comparing them to our earlier set of logs. Finally, stop the running elasticsearch server by using the service call to stop the node.

    $ sudo bin/service/elasticsearch32 stop
    Stopping ElasticSearch...
    Stopped ElasticSearch.

General cleanup

There are just a couple things we will want to do once we have elasticsearch installed and configured as a service. The first is to delete the data and logs directories that were created when we ran elasticsearch the first time. Make sure elasticsearch is not running and delete the $ES_HOME/data/ and $ES_HOME/logs/ directories.

    $ sudo rm -fr data/ logs/

Remove the Private Key

Whenever creating a new AMI (especially a public AMI) you will want to delete the private key installed on the instance.

    $ sudo rm ~/.ssh/authorized_keys

Create the New AMI

At this time, we are ready to create our new elasticsearch AMI. Go to the AWS EC2 console and view your running instances. If you are following along with this tutorial carefully, you will see two of them. Select the one that represents the instance we just modified.

Note: Sometimes it is hard to relate which node appears in the console, and which one with which you have an ssh session. You can always type hostname or ifconfig in your ssh session to see the private DNS name or IP address of the virtual machine. Then you can select the instance in the EC2 console and match it using the properties for the instance.

With the appropriate node selected in the EC2 console, from the instance actions combobox, select Create image (EBS AMI). This will open a dialog where you must give your image a name. I called mine ‘es-demo’. Select the Create Image button. It will take a few moments to create your image, and it will terminate the currently running instance. Please take note of the name of your pending AMI instance so you can find it later. You can also search by name.

Simulating a Total Cluster Failure

At this time, lets do the unspeakable and simulate what any system administrator fears. Select all of your running EC2 elasticsearch instances, and from the Instance Actions combobox, select terminate. While this isn’t exactly the same abrupt stop as cutting the power to a running instance, it will get our point across.

In under a minute, your instances should show that they have been terminated. This means that our cluster metadata and index data only exists in the S3 gateway.

Starting our New AMI

Launching a new instance of our AMI is very similar to how we launched the Amazon Linux AMI earlier. Except when we are done launching the instance this time there will be no installation or configuration os elasticsearch required.

Select Launch Instance from the EC2 console. When the dialog opens this time, select the My AMIs tab, and you should see an AMI that matches the ID you wrote down earlier. If you forgot the AMI id, you can filter the AMI list by the name you gave your instance. If you don’t see your AMI, try changing the Viewing drop down to Private Images and see if it shows up in the list. For the rest of the launch process, select the options as we did
earlier in the tutorial.

At the close of the dialog, your new instance should be listed as pending in the My Instances section of the console. Once it is running, let’s SSH into the new instance and start our elasticsearch node. The logs will look a little different this time.

Our newly started node is elected as the master because no other masters were detected.

    [discovery.ec2  ] ping responses: {none}
    [cluster.service] new_master [Magus]
                      [2s-VhyRrQoG81S8xlvKCSw][inet[/10.249.118.94:9300]],
                      reason: zen-disco-join (elected_as_master)

This time, we restored (recovered?) state from S3. Our cluster is back in business.

    [gateway.s3] reading state from gateway [snip] ...
    [gateway.s3] read state from gateway [snip], took 97ms
    [gateway]    recovered [1] indices into cluster_state

Let’s do our cluster health check.

    $ curl -XGET 'http://ec2-50-19-34-86.compute-1.amazonaws.com:9200/_cluster/health?pretty=true'
    {
      "cluster_name" : "es-demo",
      "status" : "yellow",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 5,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 5
    }

You can see that the cluster is in yellow state which means our primary shards are allocated, but replicas are not. Those five unassigned shards represent our replicas. And how about querying for that tweet we added so long ago?

    $ curl -XGET 'http://ec2-50-19-34-86.compute-1.amazonaws.com:9200/twitter/tweet/1'
    {"_index":"twitter","_type":"tweet","_id":"1","_version":1,"exists":true,
    "_source" : {
      "user": "kimchy",
      "post_date" : "2011-08-18T16:20:00",
      "message" : "trying out Elastic Search"
    }}

Isn’t life good?

Massive Clustering

Let’s push the envelope a bit and create nine more elasticsearch instances for a total of 10. Go to the EC2 console and put a check next to the instance we just launched created with our custom AMI. From the Instance Actions combobox, select Launch more like this. In the now familiar launch dialog, select the usual suspects, but increase the number of instance to 9. Once you are done, look with awe upon the EC2 console showing your pending, and soon to be running instances!

Because we are running an elasticsearch configuration using 5 shards with 1 replica, there are a maximum of 10 shards under management. With our ten node cluster we have maximized the utilization possible under elasticsearch using this default configuration. Let’s check the health of our cluster and make sure all nodes are participating in the cluster.

    $ curl -XGET 'http://<public ip>:9200/_cluster/health?pretty=true'
    {
      "cluster_name" : "es-demo",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 10,
      "number_of_data_nodes" : 10,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    }

If everything went as expected, you will see the number of data nodes equal to the number of instances started, which in our case is ten.

Advanced Cases

Working with elasticsearch on EC2, there are some other settings which can be tweaked to optimize the discovery and gateway procedures.

Filtering the Discovery

Many people who run EC2 installations don’t just have elasticsearch running in there enviroment. There is often a heterogeneous environment with additional node instances for perhaps a cluster of web servers and one or more database instances. All of these servers are under the same AWS account.

If you recall the discovery process we discussed early in the tutorial, elasticsearch uses the AWS DescribeInstances API call to get a list of potential virtual machines to test as elasticsearch cluster members. Because elasticsearch does not filter this list, these other server types may be included in the resulting list. It may be possible because of this situation, for elasticsearch to use all of its time interrogating MySQL servers and timeout before discovering a node in the elasticsearch cluster.

Tags and groups are two ways that elasticsearch uses to filter down the list of servers it will interrogate. When you create your EC2 instances, remember the security group name we assign our instances? That is the group setting. Because we open a very specific port, the likelihood that we will create a specific group for just our elasticsearch instances is very high. So, groups do a very good job of filtering down the list of nodes to just those that are running elasticsearch.

In the elasticsearch configuration file, you can specify a group filter
using this syntax:

    discovery:
        type: ec2
            groups: es-demo[,optional other groups]

This does not imply that every EC2 instance running elasticsearch and assigned the elasticsearch security group belongs to the same cluster. For example, it is not uncommon to run several clusters under a single AWS account. Perhaps, they are used for different customers, or maybe they reflect a staging (dev, qa, prod) level. In this case, the security group may be the same for each instance.

You can solve this problem by using a different cluster name for each group, but remember that elasticsearch has to interrogate each server instance to determine its cluster name. That is exactly what leads to the discovery process we are attempting to resolve. Instead, this is a good use case for the use of tags. Just like security groups, tags are also specified during the launch of an instance, and you can have multiple tags.

    discovery:
        type: ec2
        ec2:
            tag:
                stage: dev

Troubleshooting

Add your troubleshooting questions here