
Operationally, we do the following for a healthy ZooKeeper installation:

• Redundancy in the physical/hardware/network layout: try not to put all the servers in the same rack,
use decent (but don't go nuts) hardware, try to keep redundant power and network paths, etc. A
typical ZooKeeper ensemble has 5 or 7 servers, which can tolerate 2 and 3 servers down,
respectively. If you have a small deployment, then using 3 servers is acceptable, but keep in
mind that you'll only be able to tolerate 1 server down in this case.
• I/O segregation: if you do a lot of write-type traffic, you'll almost certainly want the transaction
logs on a dedicated disk group. Writes to the transaction log are synchronous (but batched for
performance), so concurrent writes to the same disks can significantly affect performance.
ZooKeeper snapshots can be one such source of concurrent writes, and ideally should be
written to a disk group separate from the transaction log. Snapshots are written to disk
asynchronously, so it is typically ok for them to share a disk group with the operating system and message log files.
You can configure a server to put the transaction log on a separate disk group with the dataLogDir parameter (snapshots stay in dataDir).
• Application segregation: Unless you really understand the application patterns of other apps that
you want to install on the same box, it can be a good idea to run ZooKeeper in isolation (though
this can be a balancing act with the capabilities of the hardware).
• Use care with virtualization: It can work, depending on your cluster layout, read/write
patterns, and SLAs, but the tiny overheads introduced by the virtualization layer can add up and
throw off ZooKeeper, as it can be very time sensitive.
• ZooKeeper configuration: It's Java, so make sure you give it 'enough' heap space (we usually run
our servers with 3-5G, but that's mostly due to the data set size we have here). Unfortunately we don't
have a good formula for it, but keep in mind that allowing for more ZooKeeper state means that
snapshots can become large, and large snapshots affect recovery time. In fact, if the snapshot
becomes too large (a few gigabytes), then you may need to increase the initLimit parameter (measured in ticks of tickTime) to
give enough time for servers to recover and join the ensemble.
• Monitoring: Both JMX and the four letter word (4lw) commands are very useful. They overlap
in some cases, and in those cases we prefer the four letter commands: they seem more predictable,
or at the very least, they work better with the LI monitoring infrastructure.
• Don't overbuild the cluster: large clusters, especially under a write-heavy usage pattern, mean a lot
of intracluster communication (quorums on the writes and subsequent cluster member updates).
But don't underbuild it either (and risk swamping the cluster): having more servers adds to your read
capacity, since each server answers reads from its own copy of the data.
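The failure-tolerance numbers in the redundancy bullet, and the write-quorum cost mentioned under "Don't overbuild the cluster", both follow from majority quorums. A minimal Python sketch, assuming the default majority quorum (not hierarchical or weighted quorum configurations); the function names are illustrative only:

```python
def quorum_size(n: int) -> int:
    """Smallest majority of an ensemble of n voting servers."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can be down while a majority can still be formed."""
    return n - quorum_size(n)


# Even-sized ensembles need a bigger quorum without tolerating more failures.
for n in (3, 4, 5, 6, 7):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

This is also why odd ensemble sizes are the norm: going from 5 to 6 servers raises the number of acknowledgements every write needs (3 to 4) while still tolerating only 2 servers down.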
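initLimit is expressed in ticks, so the wall-clock time it allows followers to connect and sync is initLimit * tickTime. The sketch below is a hypothetical helper that converts an estimated sync time (which you would have to measure for your own snapshot sizes) into an initLimit value; the 2000 ms default tickTime is an assumption matching the common sample configuration, not a recommendation:

```python
import math


def init_limit_for(sync_seconds: float, tick_time_ms: int = 2000) -> int:
    """Hypothetical helper: smallest initLimit (in ticks) whose
    initLimit * tickTime covers the estimated sync time."""
    if sync_seconds <= 0 or tick_time_ms <= 0:
        raise ValueError("sync time and tickTime must be positive")
    return math.ceil(sync_seconds * 1000 / tick_time_ms)


# A follower estimated to need 60 s to load and sync a large snapshot.
print(f"initLimit={init_limit_for(60)}")  # prints initLimit=30
```

Rounding up keeps the limit from falling just short of the estimate; in practice you would likely add some headroom on top of a measured sync time.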
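For the four letter word commands, a client only needs to open a TCP connection to the client port, send the four characters, and read until the server closes the connection. A minimal Python sketch, assuming the default client port 2181 and a server that allows the commands (on ZooKeeper 3.5 and later they must be enabled with 4lw.commands.whitelist); the helper names are hypothetical:

```python
import socket


def four_letter_word(host: str, port: int, cmd: str, timeout: float = 5.0) -> str:
    """Send a four letter word command (e.g. 'ruok', 'mntr') and return the raw reply."""
    if len(cmd) != 4:
        raise ValueError("four letter word commands are exactly four characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(reply: str) -> dict:
    """Turn 'mntr' output (one tab-separated key/value pair per line) into a dict."""
    stats = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip blank or malformed lines
            stats[key] = value
    return stats
```

For example, four_letter_word("localhost", 2181, "ruok") should return "imok" from a healthy server, and parse_mntr(four_letter_word("localhost", 2181, "mntr")) yields keys such as zk_server_state that a monitoring system can poll.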