wojciech.kapcia@tigase.net opened 4 years ago
|
|
@andrzej.wojcik you reworked PubSub and the handling of publication somewhat, but this part hasn't been implemented from what I saw (and you removed the delay and forced GC altogether: https://github.com/tigase/tigase-pubsub/blob/stable/src/main/java/tigase/pubsub/modules/PublishItemModule.java#L559) - correct? |
|
@wojtek Yes, the delay and forced GC were having a very negative impact on some installations, leading to performance issues and bottlenecks caused by constant calls to GC, so they were removed already. The new code (delayed publication of notifications) is not implemented at this point. |
|
@wojtek I've added an initial implementation of the rate limiter, but we would need to clarify a few things. Things to discuss:
Please review the code and answer the above questions. Then we will be able to move forward with this task. |
|
@wojtek Do we need to do anything more, or can we close it? |
|
Determining exact values would probably require running a load test. However, as a rule of thumb we have always said that Tigase can process 10k packets per second on a single core, so maybe we could calculate it based on that?
Maybe we should log an
It should be enough for now. It would still be an improvement over what we have now. In the future we could ponder utilising
Sounds reasonable.
Agreed, though... somewhat similar to Executors in this aspect?
I'd say that we should at least check heap usage (to avoid overloading the installation), but we should definitely avoid calls to GC. Rate limiting here will be more than enough. A side comment on the admin and development guides: document how to use it and how to configure it (priority per PubSub node), and on the dev side how to create your own (though extending the Javadoc here should be more than enough). Btw, for bigger changes and code reviews please use dedicated Pull Requests - it will make checking them easier. |
|
I've decided to set the default to half of this value per core, so 5k packets per second per core. We need to keep in mind that PubSub publications may be stored in the database or sent via S2S, so using half of 10k should work better than assuming the full 10k.
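A minimal sketch of how such a per-core default could be derived (the constant and the `defaultPublishRate` helper below are illustrative assumptions, not the actual Tigase code):

```java
public class PublishRateDefaults {

    // Assumed rule of thumb from the discussion: ~10k packets/s per core,
    // halved to leave headroom for repository writes and S2S delivery.
    private static final int PACKETS_PER_SECOND_PER_CORE = 5_000;

    /** Default publication rate limit for this machine. */
    public static int defaultPublishRate() {
        int cores = Runtime.getRuntime().availableProcessors();
        return cores * PACKETS_PER_SECOND_PER_CORE;
    }

    public static void main(String[] args) {
        System.out.println("Default publish rate: " + defaultPublishRate() + " packets/s");
    }
}
```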
I've added info about throttling starting (including publication rate and current queue size) and info about throttling ending.
We could, but I think that would be pointless. As I've stated, calling
Yes, in that it executes tasks, but it has a priority queue, which none of the standard executors have. Moreover, it has throttling, which is not available in any executor.
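For illustration only, a rough sketch (assumed names, not the Tigase implementation) of how a priority queue can be combined with per-second throttling in a single worker thread:

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ThrottledPriorityExecutor {

    // Task with an explicit priority; lower value = higher priority.
    record PrioritizedTask(int priority, Runnable task) implements Comparable<PrioritizedTask> {
        @Override
        public int compareTo(PrioritizedTask other) {
            return Integer.compare(this.priority, other.priority);
        }
    }

    private final PriorityBlockingQueue<PrioritizedTask> queue = new PriorityBlockingQueue<>();
    private final int maxTasksPerSecond;
    private final Thread worker;

    public ThrottledPriorityExecutor(int maxTasksPerSecond) {
        this.maxTasksPerSecond = maxTasksPerSecond;
        this.worker = new Thread(this::runLoop, "throttled-publisher");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    public void submit(int priority, Runnable task) {
        queue.offer(new PrioritizedTask(priority, task));
    }

    private void runLoop() {
        long windowStart = System.nanoTime();
        int executedInWindow = 0;
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Start a new one-second window when the previous one has elapsed.
                if (System.nanoTime() - windowStart >= TimeUnit.SECONDS.toNanos(1)) {
                    windowStart = System.nanoTime();
                    executedInWindow = 0;
                }
                if (executedInWindow >= maxTasksPerSecond) {
                    // Rate limit reached: wait for the current window to end.
                    TimeUnit.MILLISECONDS.sleep(10);
                    continue;
                }
                PrioritizedTask next = queue.poll(100, TimeUnit.MILLISECONDS);
                if (next != null) {
                    next.task().run();
                    executedInWindow++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```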
Sure, but what should be the correct value? As I remember, we had a lot of issues with checking percentage usage if the admin decided not to set the maximum memory, as the JDK thought it was using over 90% of memory (and started throttling) while it was still able to resize the heap.
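To illustrate the distinction being raised here - a small sketch (not project code) of measuring heap usage against the maximum the heap may grow to, rather than against the currently committed heap, so a heap that has not yet been resized does not look like 90%+ usage:

```java
public class HeapUsage {

    /** Fraction of the maximum heap (-Xmx or the JVM default) currently in use. */
    public static double usedFractionOfMax() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // bytes actually in use
        long max = rt.maxMemory();                      // the heap may still grow up to this
        return (double) used / max;
    }

    /** Fraction of the currently committed heap in use - this can look alarmingly
        high even though the JVM could still resize the heap upwards. */
    public static double usedFractionOfCommitted() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.totalMemory();
    }

    public static void main(String[] args) {
        System.out.printf("used/max = %.2f, used/committed = %.2f%n",
                usedFractionOfMax(), usedFractionOfCommitted());
    }
}
```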
We do not currently have a way to configure the priority of execution based on the node - that was just an idea for further development. But the executor has support for that to make it possible (as I think it could be useful). As for documentation, I've added a description of how to configure the publishing rate, but the Javadoc is in place - see |
|
AFAIR the issue was somewhat different - the mechanism with forced GC could hammer the system by almost constantly calling GC (which quite often couldn't free up enough memory, so on the subsequent check there would be another call to GC). With the throttled publishing mechanism, instead of forcing GC we would simply stop publishing more packets (hence avoiding running into OOM on a packet flood). I think that with the above addition this can be closed. I created issue #117 as a follow-up with the target in the next minor version. |
|
I've implemented memory throttling in a separate branch. I've added 2 memory usage marks:
With those entries, I've applied the following rules:
|
|
The PR awaits review and merge: https://github.com/tigase/tigase-pubsub/pull/3 |
|
Thank you for the change, it looks OK. I added documentation of the limits and merged it. |
Type | New Feature
Priority | Normal
Assignee |
Version | tigase-server-8.2.0
Spent time | 13h
|
Currently the only thing that somewhat limits the publication rate is a naive mechanism preventing GC, which pauses publication for a period of time if a certain memory limit is reached. At best this only prevents OOM, but it can still overwhelm the system with lots of notifications being generated in a short period of time. What's more, incorrect configuration of the current settings can lead to a completely dysfunctional installation (a low memory limit would effectively cause constant GC and a constant paused state of the JVM).
The idea is to have a dedicated Publishing Executor, which would operate in dedicated threads and would have its own rate limit (derived from system/machine parameters). There are a couple of benefits of this solution:
@andrzej.wojcik a couple of things to consider: