Investigate GC settings (#521)
Artur Hefczyc opened 9 years ago
Due Date
2016-06-28

It looks like the concurrent GC that we use by default is no longer as effective as it used to be. Even on our own public production XMPP service, memory collections are not very effective. We see memory usage grow steadily, followed by large collections from time to time.

This causes service delays during long collections and even OOM errors on some systems.

We need to find better GC settings which would really give us concurrent and steady collections while the server is running.

Artur Hefczyc commented 9 years ago

Testing and experimenting with different GC settings is extremely time-consuming and expensive. I think we should start by reviewing recent GC changes in JVM 8 and propose the settings that make the most sense for a typical high-load Tigase installation. Then, if we have time and resources, we can experiment with different settings.

wojciech.kapcia@tigase.net commented 9 years ago

Agreed.

I think it would also be prudent to include a short description of each GC setting that we put in tigase.conf, so that settings can be enabled selectively (by uncommenting lines) and only the intended ones are used.
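A minimal sketch of what such a layout might look like, assuming etc/tigase.conf is read as plain shell variable assignments (as the GC="..." lines quoted elsewhere in this ticket suggest). The flags are ones discussed in this ticket; the one-line descriptions and the choice of which lines are active are illustrative assumptions, not recommendations.

```shell
# Illustrative layout only: each option sits on its own line with a short
# description, so it can be enabled by uncommenting that line.

# Collect the old generation with the concurrent mark-sweep (CMS) collector.
GC="-XX:+UseConcMarkSweepGC"

# Collect the young generation with the parallel collector that pairs with CMS.
GC="${GC} -XX:+UseParNewGC"

# Keep the old generation twice the size of the young generation.
#GC="${GC} -XX:NewRatio=2"

# Start a CMS cycle once the old generation is about 70% full.
#GC="${GC} -XX:CMSInitiatingOccupancyFraction=70"

# Show the resulting option string.
echo "GC=${GC}"
```

With only the first two lines active this prints the two CMS flags; uncommenting a further line appends that option to the same variable.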

wojciech.kapcia@tigase.net commented 9 years ago

Could you provide me with details about the Tsung installation used for our daily load tests? I would like to use it for this task. I remember Tigase is on the cXX machines, correct? And the cron job runs sometime in (my) morning, between 1am and 6am CET?

Eric Dziewa commented 9 years ago

Cron jobs start at 2:50am and end at 7:10am your timezone. c400.xmpp-test.net is the controller.

You might prefer using the hardware machines which include backup.tigase.org, hw2.xmpp-test.net, and v33.tigase.org. They can create a bigger load, and faster than the VM machines.

Backup runs from 6am to 8am your time; this machine is idle the rest of the time. Log in to tigase@hw2.xmpp-test.net and use comparison6.xmpp-test.net-ssl-tsung.xml as a template for a two-machine setup.

If you want to use all 3, use tigase@v33.tigase.org as the controller with comparison-72K-2HW.xml. v33 hosts the Tsung cluster c40x, so don't use it during the cron jobs, 2:50am - 8:00am.

wojciech.kapcia@tigase.net commented 9 years ago

Currently running a couple of tests on our live installations as it was relatively difficult to simulate this problem using dedicated ones.

%kobit please check my question regarding YourKit

wojciech.kapcia@tigase.net commented 9 years ago

More tests

wojciech.kapcia@tigase.net commented 9 years ago

A couple of general, short remarks from the tests:

  • it's not only that the default YoungGen/TenuredSpace ratio is bad (1:2 by default) - it can also get adjusted automatically and, as a result, usually ends up far worse (for example, a 150M YoungGen when the whole heap was configured as 6G!);

  • CMS seems to be slightly better on our installations while G1GC results in more and longer pauses;

  • our statistics don't play well with GC, especially when history is enabled (I will create a separate ticket for this later on):

    • statistics labels are not interned, resulting in thousands of duplicated strings:

      String                                                          Count    Size [B]
      <All duplicate strings>                                         3259070  263459992
      "IN_QUEUE processed IQ http://jabber.org/protocol/disco#info"   28080    4492640
      "OUT_QUEUE processed IQ http://jabber.org/protocol/disco#info"  23760    3801440
      "OUT_QUEUE processed IQ http://jabber.org/protocol/disco#items" 21600    3628632

    • numeric values are held as strings (different data set):

      String                  Count   Size [B]
      <All duplicate strings> 412460  30871224
      "2"                     40025   1921152
      "1"                     36089   1732224
      "4"                     22008   1056336
      "3"                     16561   794880

To be done:

  • more tests with different scenarios;

  • process and share results;

  • update etc/tigase.conf to reflect conclusions;

  • finish the Admin guide section on recommended JVM settings.
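For reference, the first remark can be put into numbers. With -XX:NewRatio=N the old generation is N times the size of the young generation, so the young generation gets heap / (N + 1). The sketch below applies that to the 6G heap mentioned in that remark; the variable names are illustrative only.

```shell
# Young generation size implied by -XX:NewRatio for a given heap size.
heap_mb=6144      # 6G heap, as in the remark on automatic adjustment
new_ratio=2       # old generation : young generation = 2 : 1

# young = heap / (NewRatio + 1); the old generation takes the remainder.
young_mb=$(( heap_mb / (new_ratio + 1) ))
old_mb=$(( heap_mb - young_mb ))

echo "young=${young_mb}M old=${old_mb}M"
```

A fixed ratio of 2 would give a young generation of about 2G here, compared with the roughly 150M seen after automatic adjustment.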

Artur Hefczyc commented 9 years ago

Wojciech Kapcia wrote:

** statistics labels are not interned hence resulting in thousands of duplicated strings:

** numeric values are held as strings (different data set):

That's a good point and actually, relatively easy to fix.

wojciech.kapcia@tigase.net commented 9 years ago

Artur Hefczyc wrote:

Wojciech Kapcia wrote:

** statistics labels are not interned hence resulting in thousands of duplicated strings:

** numeric values are held as strings (different data set):

That's a good point and actually, relatively easy to fix.

I've created a task for that, #4256, and assigned it to version 7.1.0.

The result of this ticket is a report from the tests: https://projects.tigase.org/documents/60 as well as updated documentation page "JVM settings and recommendations" in the Admin guide.

Artur Hefczyc commented 9 years ago

One last bit is missing. We need concrete JVM settings that we recommend for our installations and for our customers.

wojciech.kapcia@tigase.net commented 9 years ago

Artur Hefczyc wrote:

One last bit is missing. We need concrete JVM settings that we recommend for our installations and for our customers.

As concluded (and described in "JVM settings and recommendations"), there is no single setting that would be ideal for all installations. For the majority of medium-to-big installations, CMS with an enforced NewRatio=2 (or a ratio adjusted to the particular usage pattern) should be the best choice. In addition, while describing it in the linked guide, I've also updated the settings in etc/tigase.conf with those recommendations. Should it be made even more explicit?

Artur Hefczyc commented 9 years ago

The top of the page contains old GC settings. So this should be updated.

The rest of the documentation is very good because it gives some default settings and suggestions on how to further tweak them.

However, what I would like to see is example GC settings for a few use-cases:

  1. Most typical deployment used by our customers (server class machine, at least 24GB RAM, 8 core CPU)

  2. Less typical deployment (VM with 16GB RAM, 4 core CPU - or whatever this is)

  3. A small, single installation with up to 10k users with let's say no more than 4GB RAM 4 core CPU

  4. Something else if you have any idea

The thing is that most customers/users do not have enough knowledge and understanding of the JVM and GC to tweak settings on their own, so we have to provide them with some ready-to-use defaults that are a good starting point.

wojciech.kapcia@tigase.net commented 9 years ago

Artur Hefczyc wrote:

The top of the page contains old GC settings. So this should be updated.

Done.

However, what I would like to see is example GC settings for a few use-cases:

Most typical deployment used by our customers (server class machine, at least 24GB RAM, 8 core CPU)

Less typical deployment (VM with 16GB RAM, 4 core CPU - or whatever this is)

A small, single installation with up to 10k users with let's say no more than 4GB RAM 4 core CPU

Something else if you have any idea

The thing is that most customers/users do not have enough knowledge and understanding of the JVM and GC to tweak settings on their own, so we have to provide them with some ready-to-use defaults that are a good starting point.

This may be related in a way to #3254.

At any rate - I've updated the documentation with settings for particular setups, which should be a good starting point.

Artur Hefczyc commented 8 years ago

I can still see the following at the beginning of the page:

GC="-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelCMSThreads=2 -XX:-ReduceInitialCardMarks"

Is this what we recommend?

wojciech.kapcia@tigase.net commented 8 years ago

Where exactly? I've checked http://docs.tigase.org/tigase-server/snapshot/Administration_Guide/html_chunk/jvm_settings.html and it says:

#GC="-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=2 -XX:+CMSIncrementalMode -XX:-ReduceInitialCardMarks -XX:CMSInitiatingOccupancyFraction=70"

And this was done in the latest commit 5 days ago: https://projects.tigase.org/projects/tigase-server/repository/revisions/53220581636a58b07ebfa300eb42a77e4ad01eb4/diff/modules/documentation/adminguide/asciidoc/text/Admin_Guide_13_-Configuration-E-_JVM_settings.asciidoc

Artur Hefczyc commented 8 years ago

Ok, I can now see updated settings. I was probably looking at the page before it was updated on our website.

Type
Task
Priority
Normal
Assignee
RedmineID
3248
Version
tigase-server-7.1.0
Spent time
585h
Reference
tigase/_server/server-core#521