-
I am afraid this is not enough. My investigation of the cluster connection problem led to the conclusion that the data was received somewhere in IOService or XMPPIOService but still never reached the connection manager. I have no idea how this is possible, but apparently there is some hidden bug.
Your suggestion would probably be the most efficient and the easiest to implement, but I think it may not solve the problem, so we need some solution at the connection manager level.
Maybe the best approach would be to modify the Watchdog class inside the ConnectionManager class as follows:
-
Add configurable parameters for connection testing intervals. We would keep the existing default intervals for c2s and other connections, and could make testing more frequent (1 min or 10 sec) for cluster connections.
-
Add a configuration option to test all connections instead of idle connections only
-
Add a configuration option to send some sort of XMPP ping instead of a whitespace ping. This could actually be useful even for c2s connections, as some of our clients have asked for better testing of client connections. The whitespace ping seems not to be enough in some cases.
-
We would need some extra code in the ConnectionManager to intercept responses to the XMPP pings we send and mark the connections as OK.
-
Creating a classic Java timeout for each ping might be too resource-intensive and hard to manage. Instead, we can store some kind of time markers in IOService.getSessionData(); the watchdog could check the markers, know whether the timeout for an XMPP ping has been exceeded, and stop the connection in that case.
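To make the time-marker idea in the last item concrete, here is a minimal sketch. It assumes the session data behaves like a `ConcurrentMap<String, Object>`; the class `PingMarkers`, its method names, and the key `xmpp-ping-sent-time` are hypothetical and only illustrate the approach, they are not existing Tigase code.

```java
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: class, methods and key name are illustrative assumptions.
final class PingMarkers {
    // Assumed key under which the time marker is kept in the session data map.
    static final String PING_SENT_KEY = "xmpp-ping-sent-time";

    private PingMarkers() {}

    // Record when a ping was sent, unless an earlier ping is still unanswered.
    static void markPingSent(ConcurrentMap<String, Object> sessionData, long nowMillis) {
        sessionData.putIfAbsent(PING_SENT_KEY, nowMillis);
    }

    // Clear the marker once a response to the ping has been intercepted.
    static void markPingAnswered(ConcurrentMap<String, Object> sessionData) {
        sessionData.remove(PING_SENT_KEY);
    }

    // True when a ping has been outstanding for longer than timeoutMillis.
    static boolean isTimedOut(ConcurrentMap<String, Object> sessionData,
                              long nowMillis, long timeoutMillis) {
        Object sent = sessionData.get(PING_SENT_KEY);
        if (!(sent instanceof Long)) {
            return false; // no ping outstanding on this connection
        }
        long sentAt = (Long) sent;
        return nowMillis - sentAt > timeoutMillis;
    }
}
```

The watchdog would call `isTimedOut` on each pass over the connections, so no per-ping timer task is needed; the cost is that a timeout is only noticed on the next watchdog run.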
What do you think?
-
-
As per our discussion: can you re-check the logic and verify that the global watchdog settings don't override the cluster ones, which could trigger periodic disconnection of cluster connections?
e.g.:
--watchdog_timeout=320000 --watchdog_delay=300000 --watchdog_ping_type=xmpp c2s/max-inactivity-time[L]=360
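One way to express the intended precedence is sketched below: a component-specific value wins, and the global value is only a fallback. The class `WatchdogSettings` and the key `cl-comp/watchdog_delay` are hypothetical names used for illustration; they are not confirmed Tigase properties.

```java
import java.util.Map;

// Hypothetical resolver: not part of Tigase, property names passed in are assumptions.
final class WatchdogSettings {
    private WatchdogSettings() {}

    // Return the component-specific value when present, otherwise the global
    // value, otherwise the supplied default.
    static long resolve(Map<String, String> props, String componentKey,
                        String globalKey, long defaultValue) {
        String raw = props.get(componentKey);
        if (raw == null) {
            raw = props.get(globalKey);
        }
        return raw == null ? defaultValue : Long.parseLong(raw.trim());
    }
}
```

With a global `watchdog_delay` of 300000 and an assumed `cl-comp/watchdog_delay` of 10000, the cluster component would resolve 10000 while a component without its own entry would still get 300000.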
-
I agree, cluster connections should be independent from other connection types. If we need any configuration for cluster connections, it should be separate from the configuration for other connections. (I know this is not the case everywhere now, for example with throttling, but in the future it should be separate.)
However, I would prefer to avoid having any configuration here and instead set up connections automatically based on environment properties (number of CPUs, amount of memory, number of cluster nodes, traffic, and so on). But this is something to look at in future versions.
Type | New Feature
Priority | Major
Assignee |
RedmineID | 1436
Version | tigase-server-7.1.0
Spent time | 0
Tigase opens a few connections between each pair of cluster nodes. At the moment there is no way to detect that one of the connections is broken, except for the simple whitespace test performed by the ConnectionManager Watchdog.
It seems this is not enough. It happened once that a connection was working in one direction only and did not work in the other. Data was lost somewhere in the IOService area.
It looks like we need more comprehensive connection testing at a higher level than IOService (ClusterConnectionManager). Each connection should be tested in each direction.
We need to implement some kind of XMPP ping between ClusterConnectionManager instances, but we have to make sure the ping is sent on a specific connection. The response to the ping can actually arrive on any other connection, assuming that both sides periodically run the test from their side.
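The bookkeeping this implies could look roughly like the sketch below: each ping is bound to the connection it was written to, and a response is matched by ping id no matter which connection delivers it. `ClusterPingTracker`, its methods, and the `cl-ping-` id prefix are hypothetical names chosen for this sketch, not existing ClusterConnectionManager API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker: illustrates the idea only, not part of Tigase.
final class ClusterPingTracker {
    // One outstanding ping: the connection under test and the send time.
    private static final class Pending {
        final String connectionId;
        final long sentAtMillis;

        Pending(String connectionId, long sentAtMillis) {
            this.connectionId = connectionId;
            this.sentAtMillis = sentAtMillis;
        }
    }

    private final Map<String, Pending> pending = new ConcurrentHashMap<>();
    private final AtomicLong counter = new AtomicLong();

    // Register a ping about to be written to one specific connection and
    // return the id to use in the ping stanza.
    String registerPing(String connectionId, long nowMillis) {
        String id = "cl-ping-" + counter.incrementAndGet();
        pending.put(id, new Pending(connectionId, nowMillis));
        return id;
    }

    // Called when a response arrives on ANY connection. Returns the connection
    // the ping was originally sent on, or null if the id is unknown.
    String onResponse(String pingId) {
        Pending p = pending.remove(pingId);
        return p == null ? null : p.connectionId;
    }

    // Remove and return the connections whose pings stayed unanswered for
    // longer than timeoutMillis; these are candidates for being closed.
    List<String> expireTimedOut(long nowMillis, long timeoutMillis) {
        List<String> broken = new ArrayList<>();
        for (Map.Entry<String, Pending> e : pending.entrySet()) {
            Pending p = e.getValue();
            if (nowMillis - p.sentAtMillis > timeoutMillis
                    && pending.remove(e.getKey(), p)) {
                broken.add(p.connectionId);
            }
        }
        return broken;
    }
}
```

Because a successful response only proves the outbound direction of the tested connection, each node would need to run the same test from its side to cover both directions.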