ops: graceful shutdown of external components (#160)
tom quas opened 1 decade ago
Due Date: 2016-12-15

scenario: gracefully shut down a particular MUC component in a cluster

environment: cluster of session managers, several external components (MUC, PubSub) running on several nodes

the admin should be able to modify the session manager's configuration at runtime, so that it

  • does not continue to create new rooms on this particular component

  • keeps track of whether the component is still in use

  • notifies the admin(s) once the component is ready for shutdown

should work the same for all component types, such as PubSub
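
The three requirements above can be sketched as a small drain tracker per external component. This is only an illustration of the requested behavior, not existing Tigase code: the class `ComponentDrainTracker`, its method names, and the callback used to notify admins are all hypothetical.

```java
/** Hypothetical drain tracker for one external component; not part of Tigase. */
final class ComponentDrainTracker {
    private final String componentJid;
    private final Runnable onReadyForShutdown; // assumed hook, e.g. message the admins
    private boolean draining;
    private boolean notified;
    private int activeRooms;

    ComponentDrainTracker(String componentJid, Runnable onReadyForShutdown) {
        this.componentJid = componentJid;
        this.onReadyForShutdown = onReadyForShutdown;
    }

    String getComponentJid() {
        return componentJid;
    }

    /** Returns false once draining has started, so the caller must pick another component. */
    synchronized boolean tryOpenRoom() {
        if (draining) {
            return false;
        }
        activeRooms++;
        return true;
    }

    /** Called when a room (or any other unit of use) on this component is closed. */
    synchronized void roomClosed() {
        if (activeRooms > 0) {
            activeRooms--;
        }
        notifyIfReady();
    }

    /** Admin action: stop placing new rooms on this component. */
    synchronized void startDraining() {
        draining = true;
        notifyIfReady();
    }

    synchronized boolean isReadyForShutdown() {
        return draining && activeRooms == 0;
    }

    // Fires the callback at most once, when the component is draining and unused.
    private void notifyIfReady() {
        if (draining && activeRooms == 0 && !notified) {
            notified = true;
            onReadyForShutdown.run();
        }
    }
}
```

The same tracker would apply to PubSub if "room" is read as "PubSub node in use"; how usage is counted per component type is left open here.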

Artur Hefczyc commented 8 years ago

We already have a graceful shutdown for cluster nodes. I am not sure if it covers MUC and PubSub components too, %andrzej.wojcik please comment on this.

Since the functionality is already implemented at the cluster node level, we will most likely not be working on it for external components.

Andrzej Wójcik (Tigase) commented 8 years ago

Graceful shutdown works at the cluster node level, so if a particular cluster node is going to be shut down, we reconnect its users to other nodes using see-other-host.

This will work fine with ACS versions of PubSub and MUC as well.

However, if MUC or PubSub is configured as an external clustered component and one of its nodes is stopped, then it will just be stopped.
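
For reference, see-other-host is the stream error condition defined in RFC 6120 that tells a client to reconnect to a different host. The sketch below only builds that XML; the class `SeeOtherHost` and its method are hypothetical helpers, not the Tigase implementation, and the host and port are placeholder values.

```java
/** Hypothetical helper that renders an RFC 6120 see-other-host stream error. */
final class SeeOtherHost {
    private static final String STREAMS_NS = "urn:ietf:params:xml:ns:xmpp-streams";

    private SeeOtherHost() {
    }

    /** Builds the stream error that redirects a client to host:port. */
    static String streamError(String host, int port) {
        return "<stream:error>"
                + "<see-other-host xmlns='" + STREAMS_NS + "'>"
                + escape(host) + ":" + port
                + "</see-other-host>"
                + "</stream:error>";
    }

    // Minimal XML text escaping so the host cannot break the element content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A server in shutdown mode would send this error and then close the stream, and the client is expected to reconnect to the advertised host.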

Artur Hefczyc commented 8 years ago

Even if MUC/PubSub are internal components, the main question is how we handle the shutdown of a cluster node for these components. I guess that right now no special shutdown is implemented for them. However, in the future we should handle it somehow:

  1. MUC - depending on the strategy there might be nothing special to do, but in some cases we might want to migrate open rooms onto other cluster nodes

  2. PubSub - not sure if this is relevant, but maybe we should make sure other cluster nodes take over PubSub channels from the node that is in shutdown mode

Andrzej Wójcik (Tigase) commented 8 years ago

Marked as private since this is part of ACS.

When a PubSub or MUC component is marked as internal, we already have code in the strategies to handle node shutdown/disconnection.

  • MUC:

    • ShardingStrategy - room information is located on a single node.

      In this case, when one node detects the disconnection of another node, it notifies local occupants that this room is gone.

    • ClusteredRoomStrategyV2 - room information is stored on every cluster node.

      If a remote cluster node is disconnected, we inform local occupants that occupants from the remote node are no longer participants of the chat.

  • PubSub:

    In this case we never do anything as there is nothing to be done.

    • PartitionedStrategy - PubSub node information is stored on a single cluster node.

      If one of the cluster nodes is gone, then our hashing algorithm adapts the routing so that another node takes over the PubSub nodes from the disconnected cluster node.

    • ClusteredStrategy - actions on a PubSub node are executed on the same cluster node the user is connected to; we only broadcast notifications to other nodes.

      As requests are processed locally, all is fine here.
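
The PartitionedStrategy behavior described above, where routing adapts once a cluster node disappears, can be illustrated with a simple hash-over-live-nodes router. This is a minimal sketch under assumptions: the class `PartitionedRouter` is hypothetical, and the real hashing algorithm used by the strategy is not shown in this thread, so a plain modulo over a sorted node list stands in for it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical router: maps each PubSub node name to one live cluster node. */
final class PartitionedRouter {
    // Kept sorted so every cluster node computes the same owner for a given name.
    private final List<String> liveNodes = new ArrayList<>();

    synchronized void nodeConnected(String clusterNode) {
        if (!liveNodes.contains(clusterNode)) {
            liveNodes.add(clusterNode);
            Collections.sort(liveNodes);
        }
    }

    synchronized void nodeDisconnected(String clusterNode) {
        liveNodes.remove(clusterNode);
    }

    /** Returns the cluster node currently responsible for the given PubSub node. */
    synchronized String ownerOf(String pubsubNode) {
        if (liveNodes.isEmpty()) {
            throw new IllegalStateException("no live cluster nodes");
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(pubsubNode.hashCode(), liveNodes.size());
        return liveNodes.get(index);
    }
}
```

With plain modulo hashing, removing a cluster node can also move PubSub nodes that were not hosted on it; a consistent-hashing scheme would reduce that, but whether the actual strategy uses one is not stated here.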

The only possible issue with MUC is users connected from remote hosts over S2S. However, I created a mechanism that selects a single cluster node to handle information about a particular remote user, and it should get notified as well.

Given that, I think everything is already in place.

Artur Hefczyc commented 8 years ago

Andrzej, OK, I think we can close the ticket about external components, but I have a few comments on your last comment.

You describe the behavior in the case of cluster node shutdown/disconnection, but the topic is graceful shutdown. So, as I understand it, we have a cluster node that we decide to take out of the cluster soon. We flag it as being in shutdown mode. All users disconnect and reconnect to other nodes while this cluster node is still technically working. I believe there is something more we could do in such a case for MUC, and maybe for PubSub too.

At least for MUC ShardingStrategy, we could move all rooms from the node being shut down to other nodes, so that for all users in rooms on this cluster node the shutdown would be invisible and would not affect their service.
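
As a rough illustration of the room migration idea above, the planning step could look like the sketch below. It only decides which remaining node each room would move to; the actual transfer of room state and occupants is not covered. The class `RoomMigrationPlanner` and the round-robin placement are assumptions made for this example, not an existing ShardingStrategy feature.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical planner: assigns rooms from a draining node to the remaining nodes. */
final class RoomMigrationPlanner {
    private RoomMigrationPlanner() {
    }

    /**
     * Returns a room -> target node mapping, spreading rooms round-robin
     * across the remaining nodes in the order given.
     */
    static Map<String, String> plan(List<String> roomsOnDrainingNode, List<String> remainingNodes) {
        if (remainingNodes.isEmpty()) {
            throw new IllegalArgumentException("no remaining cluster nodes to migrate to");
        }
        Map<String, String> targets = new LinkedHashMap<>();
        int next = 0;
        for (String room : roomsOnDrainingNode) {
            targets.put(room, remainingNodes.get(next % remainingNodes.size()));
            next++;
        }
        return targets;
    }
}
```

Round-robin is chosen only for simplicity; a real implementation would probably reuse whatever placement rule the strategy already applies to new rooms, so that other cluster nodes can locate the moved rooms.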

I guess for PubSub it does not matter much, as it does not affect the PubSub service for users subscribed to PubSub channels on the cluster node being shut down.

By the way, it looks to me like MUC's ShardingStrategy is more or less the same as, or very similar in concept to, PubSub's PartitionedStrategy. Am I correct? If yes, then maybe we could unify the naming convention for MUC and PubSub strategies?

Type: New Feature
Priority: Normal
Assignee:
RedmineID: 848
Version: tigase-server-8.0.0
Estimation: 40h
Reference: tigase/_server/server-core#160