Cluster connections testing (#212)
Artur Hefczyc opened 1 decade ago
Due Date: 2015-09-25

Tigase opens a few connections between each pair of the cluster nodes. At the moment there is no way to detect that one of the connections is broken, except the simple whitespace test performed by the ConnectionManager Watchdog.

This does not seem to be enough. On one occasion a connection worked in only one direction and not in the other. Data was lost somewhere in the IOService area.

It looks like we need more comprehensive connection testing at a higher level than IOService (in ClusterConnectionManager). Each connection should be tested in each direction.

We need to implement some kind of XMPP ping between ClusterConnectionManager instances, but we have to make sure the ping is sent on a specific connection. A response to the ping can actually arrive on any other connection, assuming that both sides periodically run the test from their side.
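A minimal sketch of the idea above, assuming each ping carries a unique id: the sender remembers which connection the ping was written to, and a response is matched by id no matter which connection delivers it. The class and method names are hypothetical and are not part of the Tigase API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical tracker: matches ping responses to the connection the ping was sent on. */
class DirectionalPingTracker {
    // ping id -> id of the connection the ping was written to
    private final Map<String, String> pendingByPingId = new HashMap<>();
    // connections whose outbound direction has been confirmed by a response
    private final Set<String> verifiedOutbound = new HashSet<>();

    /** Registers a ping that is about to be written to the given connection. */
    String sendPingOn(String connectionId) {
        String pingId = UUID.randomUUID().toString();
        pendingByPingId.put(pingId, connectionId);
        return pingId;
    }

    /**
     * Handles a ping response. The connection it arrived on is ignored for
     * matching, because the opposite direction is covered by the peer's own pings.
     */
    boolean onResponse(String pingId, String arrivedOnConnectionId) {
        String testedConnection = pendingByPingId.remove(pingId);
        if (testedConnection == null) {
            return false; // unknown or duplicate response
        }
        verifiedOutbound.add(testedConnection);
        return true;
    }

    boolean isOutboundVerified(String connectionId) {
        return verifiedOutbound.contains(connectionId);
    }
}
```

In this sketch a response only proves that the outbound direction of the tested connection works; the inbound direction would be confirmed when the peer's own pings arrive on it.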

Andrzej Wójcik (Tigase) commented 1 decade ago

Maybe we should consider a solution at the XMPPIOService level: test a connection by sending a packet to the other endpoint and waiting for a response, so that XMPPIOService can be tested in both directions?

Artur Hefczyc commented 1 decade ago

I am afraid this is not enough. My investigation of the cluster connection problem led to the conclusion that the data was received somewhere in IOService or XMPPIOService but still did not reach the connection manager. I have no idea how this is possible, but apparently there is some hidden bug.

Your suggestion would probably be the most efficient and the easiest to implement, but I think it may not solve the problem; therefore I think we need a solution at the connection manager level.

Maybe the best approach would be to modify the Watchdog class inside the ConnectionManager class to do the following:

  1. Add configurable parameters for connection testing intervals. We would keep the existing default intervals for c2s and other connections, and could make testing more frequent (1 min or 10 secs) for cluster connections.

  2. Add a configuration option to test all connections instead of idle connections only.

  3. Add a configuration option to send some sort of XMPP ping instead of a whitespace ping. This could actually be useful even for c2s connections, as I have received some votes from our clients for better testing of client connections. The whitespace ping seems insufficient in some cases.

  4. We would need some extra code in the ConnectionManager to intercept responses to the XMPP pings we send and mark the connections as OK.

  5. Creating a classic Java timeout for each ping might be too resource-consuming and hard to manage. Instead, we can set some kind of time markers in IOService.getSessionData(); the watchdog could check the markers, would know if the timeout for an XMPP ping has been exceeded, and could stop the connection in such a case.
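The time-marker idea from points 4 and 5 could be sketched roughly as follows. This is only an illustration under stated assumptions: `TrackedConnection` is a hypothetical stand-in for a connection with a session data map, and the key name and method names are invented for the example, not taken from Tigase.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a connection; not the Tigase IOService API. */
class TrackedConnection {
    // Plays the role of the per-connection session data map.
    final Map<String, Object> sessionData = new ConcurrentHashMap<>();
    boolean stopped = false;

    void stop() {
        stopped = true;
    }
}

/** Watchdog helper that uses a time marker instead of one timer per ping. */
class PingMarkerWatchdog {
    static final String PING_SENT_KEY = "xmpp-ping-sent-at"; // hypothetical key name
    private final long pingTimeoutMillis;

    PingMarkerWatchdog(long pingTimeoutMillis) {
        this.pingTimeoutMillis = pingTimeoutMillis;
    }

    /** Records the send time unless an earlier ping is still unanswered. */
    void pingSent(TrackedConnection conn, long nowMillis) {
        conn.sessionData.putIfAbsent(PING_SENT_KEY, nowMillis);
    }

    /** Called when the connection manager intercepts a ping response. */
    void pongReceived(TrackedConnection conn) {
        conn.sessionData.remove(PING_SENT_KEY);
    }

    /** One watchdog pass: stops the connection if the oldest unanswered ping is too old. */
    boolean check(TrackedConnection conn, long nowMillis) {
        Object sentAt = conn.sessionData.get(PING_SENT_KEY);
        if (sentAt != null && nowMillis - (Long) sentAt > pingTimeoutMillis) {
            conn.stop();
            return false;
        }
        return true;
    }
}
```

The watchdog already visits every connection periodically, so checking one marker per visit avoids scheduling and cancelling a separate timer task for each ping.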

What do you think?

Artur Hefczyc commented 10 years ago

Do you think we can have it by the end of Jan 2015? If not, please move it to the next version.

Andrzej Wójcik (Tigase) commented 10 years ago

I'm moving this to the next release, as I'm not sure it will be ready and tested by the end of Jan 2015.

Andrzej Wójcik (Tigase) commented 9 years ago

I created an implementation in which each cluster connection is tested in both directions using an XMPP ping packet.

By default, a ping packet is sent every 30 seconds, and we check whether a ping packet has been received within the last 3 minutes.
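As a rough illustration of those defaults (not the actual implementation), the bookkeeping could look like the sketch below. The class and method names are hypothetical; only the 30-second interval and the 3-minute window come from the description above, and what happens when the window lapses is left to the caller.

```java
import java.time.Duration;

/** Hypothetical per-connection bookkeeping for the cluster ping test. */
class ClusterPingMonitor {
    private final Duration pingInterval = Duration.ofSeconds(30);
    private final Duration receiveWindow = Duration.ofMinutes(3);
    private long lastPingSentMillis;
    private long lastPingReceivedMillis;

    ClusterPingMonitor(long nowMillis) {
        // Treat connection setup as the starting point for both clocks.
        this.lastPingSentMillis = nowMillis;
        this.lastPingReceivedMillis = nowMillis;
    }

    /** True when at least one ping interval has passed since the last ping we sent. */
    boolean shouldSendPing(long nowMillis) {
        return nowMillis - lastPingSentMillis >= pingInterval.toMillis();
    }

    void pingSent(long nowMillis) {
        lastPingSentMillis = nowMillis;
    }

    void pingReceived(long nowMillis) {
        lastPingReceivedMillis = nowMillis;
    }

    /** True when a ping from the peer arrived within the receive window. */
    boolean receivedRecently(long nowMillis) {
        return nowMillis - lastPingReceivedMillis <= receiveWindow.toMillis();
    }
}
```

With these values the peer has roughly six chances to get a ping through before the receive check fails, which tolerates a few lost or delayed packets.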

Artur Hefczyc commented 9 years ago

The solution and approach seem good to me.

wojciech.kapcia@tigase.net commented 9 years ago

As per our discussion: can you re-check the logic and confirm that the global watchdog settings do not override the cluster ones, which could trigger periodic disconnection of cluster connections?

e.g.:

--watchdog_timeout=320000
--watchdog_delay=300000
--watchdog_ping_type=xmpp
c2s/max-inactivity-time[L]=360

Andrzej Wójcik (Tigase) commented 9 years ago

From what I checked in the code, the ClusterConnectionManager default settings override the settings from the properties you provided.

I think this is good, as cluster connections are a specific type of connection to which the default settings should not apply.

Artur Hefczyc commented 9 years ago

I agree, cluster connections should be independent of other connection types. If we need any configuration for cluster connections, then it should be separate from the configuration for other connections. (I know it is not in some cases now, such as throttling, but in the future it should be separate.)

However, I would prefer to avoid having any configuration here and instead set up connections in an automated way based on the environment properties (number of CPUs, amount of memory, number of cluster nodes, traffic, ...). But this is something to look at in future versions.

wojciech.kapcia@tigase.net commented 9 years ago

Andrzej - thank you.

Type: New Feature
Priority: Major
Assignee:
RedmineID: 1436
Version: tigase-server-7.1.0
Spent time: 39h
Reference: tigase/_server/server-core#212