C2SIOService instances are not removed after resumption and StreamManagement OutQueue grows unreasonably (#1230)
Andrzej Wójcik (Tigase) opened 4 years ago
Affected versions
tigase-issue #8.2.0

On tigase.org we hit an OOM exception (heap memory space exhausted). Analysis of the OOM suggests it was mostly caused by C2S connection instances not being removed after they are resumed or expire.

Additionally, OutQueue in StreamManagement can in some cases grow without limit (in this case up to 40MB in 15 minutes without stanza confirmation).

Andrzej Wójcik (Tigase) commented 4 years ago

I've replaced remove with get (of TimerTask): before the changes in issue #1224, remove of TigaseTask was used together with a put of true (so contains() returned true, instead of false as it does now). This could lead to a "leak" of C2SIOService instances.

Additionally, I've added a constraint on OutQueue: once the queue holds more than 30 packets, it keeps packets only for max-resumption-timeout (about 90s). The threshold is 30 because some clients confirm delivery in bulk, after every 5-10 packets. This should stop OutQueue from growing without limit: if a new packet arrives while more than 30 earlier packets are still unconfirmed and the oldest of them is older than 90s, the connection is stopped.
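The rule described above can be sketched roughly as follows. This is only an illustration of the check as worded in this comment, not the actual Tigase StreamManagement code: the class name BoundedOutQueue, its methods, and the constants are hypothetical, and the 90s value stands in for whatever max-resumption-timeout is configured.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the OutQueue limit; not Tigase's real implementation. */
class BoundedOutQueue {
    // Threshold taken from the comment: the age check applies above 30 packets.
    static final int MAX_UNACKED = 30;
    // Assumed value of max-resumption-timeout (about 90 s), in milliseconds.
    static final long MAX_AGE_MS = 90_000L;

    // Enqueue time (ms) of each unconfirmed packet, oldest first.
    private final Deque<Long> queuedAt = new ArrayDeque<>();

    /**
     * Called when a new outgoing packet appears.
     * Returns true if the packet was queued, or false if the connection
     * should be stopped because more than MAX_UNACKED packets are waiting
     * and the oldest one is older than MAX_AGE_MS.
     */
    boolean offer(long nowMs) {
        Long oldest = queuedAt.peekFirst();
        if (queuedAt.size() > MAX_UNACKED && oldest != null
                && nowMs - oldest > MAX_AGE_MS) {
            return false;
        }
        queuedAt.addLast(nowMs);
        return true;
    }

    /** Drops up to count of the oldest packets once the client confirms them. */
    void ack(int count) {
        for (int i = 0; i < count && !queuedAt.isEmpty(); i++) {
            queuedAt.removeFirst();
        }
    }

    int size() {
        return queuedAt.size();
    }
}
```

A client that confirms every 5-10 packets never gets past the size threshold, so the age check does not affect it; only a queue that is both long and stale causes offer to return false.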

wojciech.kapcia@tigase.net commented 4 years ago

The changes look ok.

Andrzej Wójcik (Tigase) commented 4 years ago

Deployed on tigase.org

Andrzej Wójcik (Tigase) commented 4 years ago

Before disconnecting, we should send a <policy-violation/> stream error to notify the user and to prevent resumption from happening.
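For reference, a minimal sketch of what such a stream error looks like on the wire. The element name and namespace are the ones defined for stream errors in RFC 6120; the StreamErrors class and policyViolation method are hypothetical helpers for illustration and are not part of the Tigase API.

```java
/** Hypothetical helper; shows only the wire format of the error. */
final class StreamErrors {
    // Namespace for stream error conditions, as defined in RFC 6120.
    static final String STREAMS_NS = "urn:ietf:params:xml:ns:xmpp-streams";

    private StreamErrors() {
    }

    /** Builds a policy-violation stream error followed by the stream close tag. */
    static String policyViolation() {
        return "<stream:error>"
                + "<policy-violation xmlns='" + STREAMS_NS + "'/>"
                + "</stream:error>"
                + "</stream:stream>";
    }
}
```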

wojciech.kapcia@tigase.net commented 4 years ago

tigase.org was updated with the hotfix, AMI: tigase.org-xmpp-14c

Type: Bug
Priority: Normal
Assignee:
Version: tigase-server-8.2.0
Spent time: 3h 15m
Subsystem: c2s
Reference: tigase/_server/server-core#1230