There is a race condition between WatchDog stale connection checks and setting value of the last package receive time of a connection.
Watchdog checks stale connections in tigase.server.ServiceChecker#check method.
When a new connection established tigase.xmpp.XMPPIOService instance is being created. tigase.xmpp.XMPPIOService#addReceivedPacket method calls tigase.xmpp.XMPPIOService#setLastXmppPacketReceiveTime
to set last received package time. There is a few milliseconds of time diff between tigase.xmpp.XMPPIOService instance creation and setLastXmppPacketReceiveTime().
In some cases WatchDog checks a connection before setLastXmppPacketReceiveTime() is being called. Since default value of the lastXmppPacketReceivedTime is 0, connection will be closed.
I assume default value of the tigase.xmpp.XMPPIOService#lastXmppPacketReceivedTime could be set as
private long lastXmppPacketReceivedTime = System.currentTimeMillis(); to resolve this issue.
To Reproduce
Continuously establish connections and disconnect.
Establishing multiple connections at the same time and decreasing the watchdog-delay will increase the chance of the race condition reoccur.
Impact
In my long running load tests I observed that around %0.5 of the 200 parallel connections could be closed by WatchDog.
Expected behaviorlastXmppPacketReceivedTime shouldn't be 0 for a newly established connection. So, connection should not be closed before watchdogmax inactivity time.
Screenshots
Details (please complete the following information):
Tigase version: 8.1.0
JVM flavour and version : openjdk 11.0.7 2020-04-14
Operating system/distribution/version : Ubuntu 18.04.1 LTS
INTEL(R) XEON(R) PLATINUM 8259CL CPU @ 2.50GHZ
GNU/Linux - 1 CPU - 2 virtual CPU - 4.06G RAM
Unknown commented 2 years ago
Thank you for the analysis and suggestion, commited.
Describe the bug
There is a race condition between WatchDog stale connection checks and setting value of the last package receive time of a connection.
Watchdog checks stale connections in
tigase.server.ServiceChecker#check
method. When a new connection establishedtigase.xmpp.XMPPIOService
instance is being created.tigase.xmpp.XMPPIOService#addReceivedPacket
method callstigase.xmpp.XMPPIOService#setLastXmppPacketReceiveTime
to set last received package time. There is a few milliseconds of time diff betweentigase.xmpp.XMPPIOService
instance creation andsetLastXmppPacketReceiveTime()
.In some cases WatchDog checks a connection before
setLastXmppPacketReceiveTime()
is being called. Since default value of thelastXmppPacketReceivedTime
is 0, connection will be closed.I assume default value of the
tigase.xmpp.XMPPIOService#lastXmppPacketReceivedTime
could be set asprivate long lastXmppPacketReceivedTime = System.currentTimeMillis();
to resolve this issue.To Reproduce Continuously establish connections and disconnect. Establishing multiple connections at the same time and decreasing the
watchdog-delay
will increase the chance of the race condition reoccur.Impact In my long running load tests I observed that around %0.5 of the 200 parallel connections could be closed by WatchDog.
Expected behavior
lastXmppPacketReceivedTime
shouldn't be 0 for a newly established connection. So, connection should not be closed before watchdogmax inactivity time
.Screenshots
Details (please complete the following information):