Testing and experimenting with different GC settings is extremely time-consuming and expensive. I think we should start by reviewing the recent GC changes in JVM 8 and propose the settings that make the most sense for a typical high-load Tigase installation. Then, if we have time and resources, we can experiment with different settings.
---
Cron jobs start at 2:50 am and end at 7:10 am in your timezone; c400.xmpp-test.net is the controller.
You might prefer using the hardware machines, which include backup.tigase.org, hw2.xmpp-test.net, and v33.tigase.org. They can generate a bigger load, and generate it faster, than the VM machines.
Backup runs from 6 am to 8 am your time; the machine is idle the rest of the time. Log in to tigase@hw2.xmpp-test.net and use comparison6.xmpp-test.net-ssl-tsung.xml as a template for a two-machine setup.
If you want to use all three machines, use tigase@v33.tigase.org as the controller with comparison-72K-2HW.xml. v33 hosts the tsung cluster (c40x), so don't use it during the cron jobs (2:50 am - 8:00 am).
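Assuming a standard tsung invocation (tsung is the load generator used here, and the config file is the template mentioned above), a run could be started from the controller with:

    tsung -f comparison6.xmpp-test.net-ssl-tsung.xml start

By default tsung writes its logs under ~/.tsung/log/, where the bundled tsung_stats.pl script can generate an HTML report of the run.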
---
A couple of general, short remarks from the tests:
- It's not only that the default YoungGen/TenuredSpace ratio (2:1 by default) is bad; it can also get automatically adjusted at runtime and usually ends up far worse (for example, a young generation of 150M when the whole heap was configured as 6G!).
- CMS seems to be slightly better on our installations, while G1GC results in more and longer pauses.
- Our statistics don't play well with GC, especially when history is enabled (a separate ticket will be created for this later on):
  - statistics labels are not interned, resulting in thousands of duplicated strings (a possible fix is sketched after this list):
    String                                                            Count     Size [B]
    <All duplicate strings>                                         3259070    263459992
    "IN_QUEUE processed IQ http://jabber.org/protocol/disco#info"     28080      4492640
    "OUT_QUEUE processed IQ http://jabber.org/protocol/disco#info"    23760      3801440
    "OUT_QUEUE processed IQ http://jabber.org/protocol/disco#items"   21600      3628632
  - numeric values are held as strings (different data set):
    String                      Count     Size [B]
    <All duplicate strings>    412460     30871224
    "2"                         40025      1921152
    "1"                         36089      1732224
    "4"                         22008      1056336
    "3"                         16561       794880
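To illustrate the interning fix mentioned above, here is a minimal, hypothetical sketch; the class and field names are made up for illustration and are not Tigase's actual statistics API:

    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: deduplicate statistics labels and keep numeric
    // values as primitives instead of strings like "2" or "3".
    public class StatisticsEntry {

        // Canonical label cache: one shared String instance per distinct label.
        // String.intern() would also work, but an explicit map keeps the
        // canonical copies out of the JVM-wide string table.
        private static final ConcurrentHashMap<String, String> LABELS =
                new ConcurrentHashMap<>();

        private final String label; // deduplicated, shared instance
        private final long value;   // primitive, not a String

        public StatisticsEntry(String label, long value) {
            // Returns the existing canonical label, or registers this one.
            this.label = LABELS.computeIfAbsent(label, k -> k);
            this.value = value;
        }

        public String getLabel() { return label; }
        public long getValue()   { return value; }
    }

With labels deduplicated and values stored as primitives, the duplicate-string counts shown above should drop to roughly one instance per distinct label.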
To be done:
- more tests with different scenarios;
- process and share results;
- update etc/tigase.conf to reflect the conclusions;
- finish the Admin guide section on recommended JVM settings.
---
Artur Hefczyc wrote:
Wojciech Kapcia wrote:
- statistics labels are not interned, resulting in thousands of duplicated strings
- numeric values are held as strings (different data set)
That's a good point and actually relatively easy to fix.
I've created a task for that: #4256, and assigned it to version 7.1.0.
The result of this ticket is a report from the tests: https://projects.tigase.org/documents/60 as well as an updated documentation page, "JVM settings and recommendations", in the Admin guide.
---
Artur Hefczyc wrote:
One last bit is missing. We need concrete JVM settings that we recommend for our installations and for our customers.
As concluded (and described in "JVM settings and recommendations"), there is no single perfect setting that would be ideal for all installations. For the majority of medium-to-large installations, CMS with an enforced NewRatio=2 (or a ratio adjusted to the particular usage pattern) should be the best choice. In addition, while describing it in the linked guide, I've also updated the settings in etc/tigase.conf with those recommendations. Should it be made even more explicit?
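For reference, a CMS configuration along these lines would look roughly like the following in etc/tigase.conf (the exact flag values here are illustrative, not necessarily the committed defaults):

    GC="-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=2 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"

All of these are standard HotSpot flags on JDK 8; -XX:+UseCMSInitiatingOccupancyOnly keeps the JVM from second-guessing the configured occupancy threshold.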
---
The top of the page contains the old GC settings, so this should be updated.
The rest of the documentation is very good because it gives some default settings and suggestions on how to further tweak them.
However, what I would like to see is example GC settings for a few use-cases:
- the most typical deployment used by our customers (server-class machine, at least 24GB RAM, 8-core CPU);
- a less typical deployment (VM with 16GB RAM, 4-core CPU, or whatever this is);
- a small, single installation with up to 10k users and, let's say, no more than 4GB RAM and a 4-core CPU;
- something else, if you have any idea.
The thing is that most customers/users do not have enough knowledge and understanding of the JVM and GC to tweak settings on their own, so we have to provide them with some ready-to-use defaults which are a good starting point.
---
Artur Hefczyc wrote:
The top of the page contains the old GC settings, so this should be updated.

Done.

However, what I would like to see is example GC settings for a few use-cases:
- the most typical deployment used by our customers (server-class machine, at least 24GB RAM, 8-core CPU);
- a less typical deployment (VM with 16GB RAM, 4-core CPU, or whatever this is);
- a small, single installation with up to 10k users and, let's say, no more than 4GB RAM and a 4-core CPU;
- something else, if you have any idea.
The thing is that most customers/users do not have enough knowledge and understanding of the JVM and GC to tweak settings on their own, so we have to provide them with some ready-to-use defaults which are a good starting point.
This may be related in a way to #3254.
At any rate, I've updated the documentation with settings for the particular setups, which should be a good starting point.
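For illustration only, the per-setup entries could look something like the lines below; the heap sizes and thread counts are hypothetical placeholders, not the committed recommendations:

    # Typical customer deployment (24GB RAM, 8-core CPU) -- illustrative values:
    java ... -Xms4g -Xmx20g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=2 -XX:ParallelGCThreads=8

    # Small installation (up to 10k users, 4GB RAM, 4-core CPU) -- illustrative values:
    java ... -Xms1g -Xmx3g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=2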
---
Where exactly? I've checked http://docs.tigase.org/tigase-server/snapshot/Administration_Guide/html_chunk/jvm_settings.html and it says:
#GC="-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=2 -XX:+CMSIncrementalMode -XX:-ReduceInitialCardMarks -XX:CMSInitiatingOccupancyFraction=70"
And this was done in the latest commit 5 days ago: https://projects.tigase.org/projects/tigase-server/repository/revisions/53220581636a58b07ebfa300eb42a77e4ad01eb4/diff/modules/documentation/adminguide/asciidoc/text/Admin_Guide_13_-Configuration-E-_JVM_settings.asciidoc
Type       | Task
Priority   | Normal
Assignee   |
RedmineID  | 3248
Version    | tigase-server-7.1.0
Spent time | 0
It looks like the concurrent GC that we use by default is no longer as effective as it used to be. Even on our own production, public XMPP service the memory collections are not very effective: we see steady memory-usage growth and then large collections from time to time.
This causes service delays during long collections and even OOM errors on some systems.
We need to find better GC settings which would really give us concurrent and steady collections while the server is running.
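As a first step, the collection behaviour can be captured and compared across settings with JDK 8's standard GC logging flags (the log path below is just an example):

    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/tigase/gc.log

The resulting log shows pause lengths and promotion behaviour, which makes the "steady growth followed by long collections" pattern easy to confirm and to compare between CMS and G1 runs.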