We use Elasticsearch 2.4 with Java 8 and Grafana 3.11 for dashboards used by ~500 users on average, with peaks of 1000+ users. As of February 2018 we have ~2.5 billion documents with a total size of ~460 GB. I do not know whether those are big numbers or not. (The old versions are used for policy/legacy reasons – please don't ask me to explain that.)

You have probably read in the Elasticsearch documentation that the heap should not be set above 32 GB because of how the JVM works internally (compressed object pointers). I read it too and it made sense, but recently our Elasticsearch instance ran into serious trouble with the error “Data too large, data for [@event] would be larger than limit of [23263589171/21.6gb]]]”. Our monitoring (done with Telegraf + InfluxDB + Grafana) also showed crazy values for “tripped breakers”.
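
If you want to watch the same counters outside of Grafana, here is a minimal sketch that reads the circuit-breaker statistics, including the “tripped” counts, straight from the node stats API. It assumes Python with the requests library and an Elasticsearch 2.x node on localhost:9200 – adjust to your setup:

```python
import requests

# Circuit-breaker statistics per node: limit, current estimate and how many
# times each breaker has tripped since the node started.
resp = requests.get("http://localhost:9200/_nodes/stats/breaker")
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    for name, breaker in node["breakers"].items():
        print("%s %-10s limit=%s estimated=%s tripped=%s" % (
            node.get("name", node_id),
            name,
            breaker["limit_size"],
            breaker["estimated_size"],
            breaker["tripped"],
        ))
```

A steadily growing “tripped” value for the fielddata or parent breaker is what corresponds to the “Data too large” errors above.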

I changed the “indices.breaker.fielddata.limit” setting several times, until the next change would have pushed it to 100%, which makes no sense. At one point the number of these errors got so high that customers were seeing them constantly and getting almost no data.
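
For reference, this is roughly how such a change can be made at runtime via the cluster settings API – again a minimal sketch assuming Python with requests and a node on localhost:9200; the 60% value is only an illustrative number, not a recommendation:

```python
import json
import requests

# The fielddata breaker limit is a dynamic setting in 2.x, so no restart is
# needed; "persistent" survives a full cluster restart, "transient" would not.
body = {"persistent": {"indices.breaker.fielddata.limit": "60%"}}
resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    data=json.dumps(body),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
print(resp.json())
```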

Of course it happened in the evening, so I had no choice but to try a “crazy emergency solution” from my home computer: I upgraded the GCE instance to give it much more memory and restarted Elasticsearch with the heap set to 50 GB.
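
(On Elasticsearch 2.x the heap is set through the ES_HEAP_SIZE environment variable, e.g. ES_HEAP_SIZE=50g, before starting the node.) Here is a minimal sketch, same assumptions as before (Python with requests, node on localhost:9200), to verify that the new heap size was really picked up after the restart:

```python
import requests

# Read JVM memory statistics per node: configured maximum heap and current usage.
resp = requests.get("http://localhost:9200/_nodes/stats/jvm")
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    mem = node["jvm"]["mem"]
    heap_max_gb = mem["heap_max_in_bytes"] / 1024.0 ** 3
    print("%s heap_max=%.1fGB heap_used=%d%%" % (
        node.get("name", node_id),
        heap_max_gb,
        mem["heap_used_percent"],
    ))
```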

So far so good:

  • monitoring is back to normal numbers
  • heap usage, which previously sat at the maximum no matter what, now changes based on load
  • I see only the usual single tripped breaker, no higher numbers
  • JVM garbage collection time is back to the usual milliseconds – during the emergency, monitoring showed minutes (a quick way to read these numbers straight from the node stats API is sketched after this list)
  • response times in Grafana look very good
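
The GC numbers mentioned above can be read directly from the node stats API – a minimal sketch with the same assumptions as before (Python with requests, node on localhost:9200); these are the same JVM counters that Telegraf ships to InfluxDB/Grafana for us:

```python
import requests

# Per-node GC collector statistics: number of collections and total time spent.
resp = requests.get("http://localhost:9200/_nodes/stats/jvm")
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    for gen, gc in node["jvm"]["gc"]["collectors"].items():
        print("%s gc[%s] collections=%d total_time_ms=%d" % (
            node.get("name", node_id),
            gen,
            gc["collection_count"],
            gc["collection_time_in_millis"],
        ))
```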

I am writing this just to let you know that Elasticsearch 2 seems to work very well with a heap bigger than 32 GB.

Links:

  1. Heap: Sizing and Swapping
  2. A Heap of Trouble
  3. FIELDDATA Data is too large