Tuesday, July 29, 2014

ElasticSearch Resiliency and Failover

ElasticSearch is designed with the assumption that things break. Hardware fails and software crashes. That’s a fact of life. ElasticSearch mainly deals with this through the use of clustering. That means that many machines can be combined together to work as a single unit.

Data can be replicated, or copied, across multiple servers so that the cluster can tolerate the loss of one or more nodes and still respond to requests in a timely manner. How much loss can be tolerated obviously depends on the exact configuration.

The primary mechanism for ElasticSearch resiliency is built on 3 concepts - nodes, shards, and replicas. Nodes are the individual servers, and as nodes are added to the cluster, data is spread across them in the form of shards and replicas. A shard is a slice or section of the data; the shard count is specified when an index is created and cannot be changed afterward. A replica is simply a copy of a shard, and the number of replicas can be changed on the fly.
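For example, both settings live in the index settings when an index is created (the index name and counts below are just illustrative). A request like `PUT /logs` with this body creates an index with 5 primary shards, each with one replica:

```json
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```

The shard count is now fixed for the life of that index, but the replica count can be raised or lowered later through the index settings API.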

If a cluster is formed from a single node, then all primary shards are on that node. If there is a failure of the node or of a shard, then data will likely be lost.

When you add a node to the cluster so you now have 2 nodes, the primary shards will be spread across both nodes. With no replicas, if one node is lost, some data will not be available until that node is restored. If that node cannot be restored, its data will be lost.

The next step would be to add a replica. Again, a replica is a copy of the data. In a 2-node scenario, the primary shards will be spread across both nodes, but a replica of each shard will be allocated to the alternate node. The effect is that both nodes hold a full copy of the data, though neither node holds all of the primary shards. If a node is lost in this case, the remaining node can handle all requests. When the failed node is restored, or a replacement node is added to the cluster, ElasticSearch will replicate shards to it and achieve equilibrium again.
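Continuing the illustrative two-node example, the replica could be added to an existing index with a live settings update - something like `PUT /logs/_settings` with this body:

```json
{
  "number_of_replicas": 1
}
```

ElasticSearch will then allocate the new replica shards to the second node in the background, with no downtime.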

Adding a third node spreads the 2 copies of the data across all 3 nodes. The cluster can now handle the failure of a single node, but not of 2 nodes.

If a second replica is added (making 3 copies of the data), then each node would effectively have a copy of the data. This configuration should be able to handle the loss of 2 of the 3 nodes, though you really don’t want that to happen.

ElasticSearch has some features that allow you to influence shard allocation. There are a couple of different algorithms that it can use to control shard placement. Recently, the size of a shard has been added to the allocation strategy, so that one node does not end up holding lots of large shards while another node holds primarily small ones.
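One concrete form of size-aware placement is the disk-threshold allocation settings. As a sketch (the values here are illustrative, and the exact setting names depend on the version), the behavior is controlled through cluster-level settings along these lines:

```yaml
# elasticsearch.yml -- illustrative disk-based allocation settings
cluster.routing.allocation.disk.threshold_enabled: true
# stop allocating new shards to a node once its disk passes this
cluster.routing.allocation.disk.watermark.low: 85%
# start relocating shards away from a node once its disk passes this
cluster.routing.allocation.disk.watermark.high: 90%
```

These can also be changed at runtime through the cluster settings API rather than the config file.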

ElasticSearch also has a rack awareness feature. This allows you to tell ElasticSearch that some nodes are physically close to each other while other nodes are physically far apart. For example, you can have some nodes in one data center and other nodes of the same cluster in another data center. ElasticSearch will try to keep requests localized as much as possible. Having a cluster split across data centers is not really recommended for performance reasons, but is an option.
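Rack awareness is driven by a custom node attribute. As a sketch (the attribute name and value are whatever you choose), each node declares where it lives, and the cluster is told to use that attribute when allocating shards:

```yaml
# elasticsearch.yml on each node -- illustrative values
node.rack_id: rack_one
cluster.routing.allocation.awareness.attributes: rack_id
```

With this in place, ElasticSearch tries to put a shard's primary and its replicas on nodes with different rack_id values, so losing an entire rack does not lose every copy of a shard.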

ElasticSearch has a federated search feature, implemented with tribe nodes. This allows 2 clusters to be in physically separate data centers while a tribe node joins both and arbitrates requests across them. This is a very welcome feature for ElasticSearch.
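A tribe node is configured with the names of the clusters it should join - something like this (cluster names here are illustrative):

```yaml
# elasticsearch.yml on the arbitrating (tribe) node
tribe:
  east:
    cluster.name: cluster_dc_east
  west:
    cluster.name: cluster_dc_west
```

The tribe node joins both clusters as a client and merges their cluster states, so a search sent to it can span indices from both data centers.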

ElasticSearch has also added a snapshot-restore feature as of the 1.0 release. This allows an entire cluster to be backed up and restored, or one or more indices can be specified for a particular snapshot. Snapshots can be taken on a fully running cluster and can be scripted and taken periodically.
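A snapshot first needs a registered repository. As a sketch, a shared-filesystem repository could be registered with `PUT /_snapshot/my_backup` (the repository name and path are illustrative):

```json
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup",
    "compress": true
  }
}
```

A snapshot of the whole cluster, or of selected indices, can then be taken with `PUT /_snapshot/my_backup/snapshot_1` - which is easy to script and run periodically from cron.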

Once the initial snapshot occurs, subsequent snapshots are incremental in nature. See the official announcement for details: http://www.elasticsearch.org/blog/introducing-snapshot-restore/

One of the great things about snapshot/restore is that it can be used to maintain a near-line cluster that mirrors the production cluster. If the production cluster goes down, the backup cluster can be brought online at roughly the same point in time, document-wise.

This works by shipping the snapshots to the near-line cluster, then applying them there. Since the snapshots are incremental, the process should be pretty quick, even for clusters with a good amount of volume.
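The restore side is a single call on the near-line cluster, e.g. `POST /_snapshot/my_backup/snapshot_1/_restore`, optionally limited to particular indices with a body like this (index name is illustrative):

```json
{
  "indices": "logs"
}
```

With no body, the snapshot's full contents are restored.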

There are a variety of ways that ElasticSearch can be made resilient depending on the exact architecture. ElasticSearch continues to evolve in this regard as more and more companies rely on it for mission critical applications.

  - Craig