Projects tigase _server server-core Issues #1277
HUGE out queue in StreamManagementIOProcessor (#1277)
Andrzej Wójcik (Tigase) opened 3 years ago

In some rare cases queues in SMIOP may grow to very large size even in small amount of time.

Andrzej Wójcik (Tigase) commented 3 years ago

I've added a hard limit on 10000 items in the queue. After that connection will be stopped.

Moreover, I've decided to forceStop() connection if the queue is full or timed out instead of calling close() as calling close() while having items in the socket write queue (like in case of broken connection) will wait to try to send those data and eventually stop connection if TCP connection will be detected as closed. By using forceStop() we will remove this delay, and while we may not send stream error (will just terminate the connection), this will have no impact on the client-side as the most likely queue was too large or not acked due to networking issues and error would not be delivered anyway.

Andrzej Wójcik (Tigase) commented 3 years ago

This limit is configurable and can be adjusted in the config file.

wojciech.kapcia@tigase.net commented 3 years ago

It looks like Monal is ignoring location attribute when we enable SM which results in resumption of the session on random nodes which could contribute to the issue (example below, digit after timestamp indicates the cluster node), reported to Monal: https://github.com/monal-im/Monal/issues/737

What's more, looking at the packets queue and ack values reported by the client it looks like the user is on rather slow connection which probably results in failure to receive and confirm everything.

[2021-09-03 06:56:08.240] 2 [DEBUG] [    pool-35-thread-2] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = c36b7861-7b2c-4571-a49d-90f7df4663fd with 28 packets waiting for ack, local h = 39 and remote h = 11 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.10.118_42822, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-20-109.us-west-2.compute.internal/172.20.0.2_5223_172.31.10.118_42822, connected Socket[addr=/172.31.10.118,port=42822,localport=5223]]
[2021-09-03 06:57:57.897] 3 [DEBUG] [    pool-35-thread-9] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = f4da00ae-f2b1-43a4-bb70-fad75125934f with 54 packets waiting for ack, local h = 65 and remote h = 11 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.42.200_47762, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-38-91.us-west-2.compute.internal/172.20.0.2_5223_172.31.42.200_47762, connected Socket[addr=/172.31.42.200,port=47762,localport=5223]]
[2021-09-03 06:58:29.998] 2 [DEBUG] [    pool-35-thread-4] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = ccf8a179-80e3-4a91-a9be-9514fdbec9ad with 142 packets waiting for ack, local h = 159 and remote h = 17 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.42.200_11022, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-20-109.us-west-2.compute.internal/172.20.0.2_5223_172.31.42.200_11022, connected Socket[addr=/172.31.42.200,port=11022,localport=5223]]
[2021-09-03 07:00:35.919] 1 [DEBUG] [    pool-35-thread-1] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 13760245-2bf5-4231-8962-72c9d851ad7c with 2,073 packets waiting for ack, local h = 2,093 and remote h = 20 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.42.200_11254, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.42.200_11254, connected Socket[addr=/172.31.42.200,port=11254,localport=5223]]
[2021-09-03 07:02:02.563] 1 [DEBUG] [    pool-35-thread-3] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = c5d13b0e-5084-4b5e-8687-fa8bbf7d533b with 54 packets waiting for ack, local h = 67 and remote h = 13 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_3064, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_3064, connected Socket[addr=/172.31.31.162,port=3064,localport=5223]]
[2021-09-03 07:03:46.890] 3 [DEBUG] [    pool-35-thread-4] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 7049199b-3494-4304-b08f-8d1cb1a2977d with 5,098 packets waiting for ack, local h = 5,111 and remote h = 13 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.42.200_48510, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-38-91.us-west-2.compute.internal/172.20.0.2_5223_172.31.42.200_48510, connected Socket[addr=/172.31.42.200,port=48510,localport=5223]]
[2021-09-03 07:06:04.747] 1 [DEBUG] [    pool-35-thread-8] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 022a5166-1fd0-442e-84b7-2c59d0ef5ef1 with 9,045 packets waiting for ack, local h = 9,202 and remote h = 157 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_3612, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_3612, connected Socket[addr=/172.31.31.162,port=3612,localport=5223]]
[2021-09-03 07:08:10.873] 1 [DEBUG] [    pool-35-thread-9] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 0d995a1f-a54f-45a6-bc0a-4c30b5577471 with 667 packets waiting for ack, local h = 758 and remote h = 91 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_3880, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_3880, connected Socket[addr=/172.31.31.162,port=3880,localport=5223]]
[2021-09-03 07:10:04.522] 1 [DEBUG] [    pool-35-thread-4] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = bf325477-8da3-4a6d-a25a-ff2b795c6a1a with 11,990 packets waiting for ack, local h = 12,017 and remote h = 28 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_4128, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_4128, connected Socket[addr=/172.31.31.162,port=4128,localport=5223]]
[2021-09-03 07:10:52.651] 2 [DEBUG] [    pool-35-thread-9] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 2c78e5bb-ecf7-447c-8686-ffe38f883488 with 384 packets waiting for ack, local h = 412 and remote h = 28 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_7016, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-20-109.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_7016, connected Socket[addr=/172.31.31.162,port=7016,localport=5223]]
[2021-09-03 07:11:08.586] 2 [DEBUG] [    pool-35-thread-5] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 2c78e5bb-ecf7-447c-8686-ffe38f883488 with 12,079 packets waiting for ack, local h = 12,238 and remote h = 169 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_7052, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-20-109.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_7052, connected Socket[addr=/172.31.31.162,port=7052,localport=5223]]
[2021-09-03 07:14:04.657] 1 [DEBUG] [    pool-35-thread-9] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = ec01772f-d91e-4a64-aa23-d237f4385011 with 4,599 packets waiting for ack, local h = 4,642 and remote h = 43 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.42.200_12992, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.42.200_12992, connected Socket[addr=/172.31.42.200,port=12992,localport=5223]]
[2021-09-03 07:17:17.901] 1 [DEBUG] [    pool-35-thread-7] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 15d58ac8-3de9-4f2f-96fe-2be415ebb5fb with 3,369 packets waiting for ack, local h = 3,418 and remote h = 50 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_5156, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_5156, connected Socket[addr=/172.31.31.162,port=5156,localport=5223]]
[2021-09-03 07:17:48.655] 1 [DEBUG] [    pool-35-thread-8] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 15d58ac8-3de9-4f2f-96fe-2be415ebb5fb with 24,256 packets waiting for ack, local h = 24,306 and remote h = 384 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_5226, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_5226, connected Socket[addr=/172.31.31.162,port=5226,localport=5223]]
[2021-09-03 07:18:16.407] 1 [DEBUG] [    pool-35-thread-6] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = 15d58ac8-3de9-4f2f-96fe-2be415ebb5fb with 12,214 packets waiting for ack, local h = 58,502 and remote h = 651 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_5294, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-0-99.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_5294, connected Socket[addr=/172.31.31.162,port=5294,localport=5223]]
[2021-09-03 07:18:53.027] 2 [DEBUG] [    pool-35-thread-7] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = d1c1ed8b-6565-4fc9-941a-469e5fd761cd with 23 packets waiting for ack, local h = 57 and remote h = 34 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_8204, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-20-109.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_8204, connected Socket[addr=/172.31.31.162,port=8204,localport=5223]]
[2021-09-03 07:20:04.702] 2 [DEBUG] [    pool-35-thread-9] t.s.x.StreamManagementIOProcessor.resumeStream(): Resuming stream with id = d1c1ed8b-6565-4fc9-941a-469e5fd761cd with 1,012 packets waiting for ack, local h = 1,046 and remote h = 35 [jid: mikaela@tigase.im, C2SIOService, UniqueId: 172.20.0.2_5223_172.31.31.162_8406, type: <- incoming (accept), TLS: SocketIO, ID: c2s@ip-172-31-20-109.us-west-2.compute.internal/172.20.0.2_5223_172.31.31.162_8406, connected Socket[addr=/172.31.31.162,port=8406,localport=5223]]
wojciech.kapcia@tigase.net commented 3 years ago

It seems that hard, lower SM queue limit fixed the issue.

issue 1 of 1
Type
Bug
Priority
Normal
Assignee
Version
tigase-server-8.2.0
Spent time
11h 30m
Issue Votes (0)
Watchers (0)
Reference
tigase/_server/server-core#1277
Please wait...
Page is in error, reload to recover