Generate user redirection on node shutdown (#467)
Closed
wojciech.kapcia@tigase.net opened 9 years ago
Due Date: 2016-03-30

It may be prudent to generate see-other-host redirection upon shutting down the cluster node.

Artur Hefczyc commented 9 years ago

Looks like this might be a very important feature. It should consist of 3 elements:

  1. Send a redirect to all currently connected users/devices.

  2. Set some kind of internal field in Tigase so that the node does not accept any new connections and redirects all new incoming connections to a different cluster node.

  3. Notify other cluster nodes to not redirect clients to this cluster node, effectively taking the node out of the Tigase LB pool. This could be implemented through a database. Maybe we can have additional metadata in the cluster nodes table which marks a node as offline/standby or something like that.

I think it should be implemented as an ad-hoc command (available through the REST API and Admin UI) which redirects users to different cluster node(s). It should be possible to give the redirection parameter as either:

  1. Some very specific machine (or a list of machines)

  2. Or something like "all other cluster nodes"

This is important to implement a graceful machine shutdown for maintenance purposes.
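
The two ways of giving the redirection parameter proposed above (a specific list of machines, or "all other cluster nodes") could be resolved roughly as in the sketch below. This is only an illustration of the proposal: `RedirectTargets` and its parameters are hypothetical names, not part of the Tigase API, and an empty list is assumed to mean "all other cluster nodes".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not a Tigase class. */
final class RedirectTargets {

    private RedirectTargets() {}

    /**
     * Resolves the nodes that users may be redirected to.
     *
     * @param clusterNodes    hostnames of all known cluster nodes
     * @param drainingNode    hostname of the node that is being shut down
     * @param explicitTargets machines chosen by the administrator;
     *                        an empty list is assumed to mean "all other cluster nodes"
     */
    static List<String> resolve(List<String> clusterNodes,
                                String drainingNode,
                                List<String> explicitTargets) {
        Set<String> candidates = new LinkedHashSet<>(
                explicitTargets.isEmpty() ? clusterNodes : explicitTargets);
        candidates.remove(drainingNode);     // never redirect back to the draining node
        candidates.retainAll(clusterNodes);  // ignore machines that are not cluster members
        return new ArrayList<>(candidates);
    }
}
```

If the resulting list is empty, there is nowhere to redirect users to, which is the case a plain shutdown notification would have to cover.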

Artur Hefczyc commented 9 years ago

Looks like Wojciech will have no time for this task. Andrzej, please provide work estimation for this.

Andrzej Wójcik (Tigase) commented 9 years ago

I added an ad-hoc command for graceful shutdown which, when used in cluster mode, redirects users to other nodes using the see-other-host strategy that is in use. I decided not to allow the administrator to point all users to a particular node, because that node would apply its own see-other-host strategy and redirect the users once again. Instead, I use the see-other-host strategy to redirect each user directly to the node which should be used.

Additionally, this feature can notify all users connected to the cluster nodes being shut down that these nodes will be shut down. It might be a good idea to warn users, as not all XMPP clients support the see-other-host feature yet.

The shutdown is executed after a selected timeout. The timeout is used to notify users before the shutdown and to give other nodes time to react to the fact that this node will be shut down.

If the shutdown would result in shutting down the whole cluster (every node), or if it is executed on a single server instance, then the see-other-host error is not sent. Instead, this feature sends the shutdown stream error as in the RFC.
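
The choice between the two stream errors can be sketched as below. This is a minimal illustration, not the actual implementation: `ShutdownStreamError` is a hypothetical helper, and "shutdown stream error as in RFC" is assumed here to mean the `<system-shutdown/>` condition defined in RFC 6120. An empty `Optional` stands for "no other node is left to redirect to".

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not a Tigase class. */
final class ShutdownStreamError {

    static final String STREAMS_NS = "urn:ietf:params:xml:ns:xmpp-streams";

    private ShutdownStreamError() {}

    /**
     * Builds the stream error sent to a user just before the node shuts down.
     *
     * @param redirectHost node the user should reconnect to; empty when the whole
     *                     cluster or a single server instance is shut down
     */
    static String build(Optional<String> redirectHost) {
        String condition = redirectHost
                .map(host -> "<see-other-host xmlns=\"" + STREAMS_NS + "\">"
                        + escape(host) + "</see-other-host>")
                // Assumption: the "shutdown stream error" is RFC 6120 <system-shutdown/>.
                .orElse("<system-shutdown xmlns=\"" + STREAMS_NS + "\"/>");
        return "<stream:error>" + condition + "</stream:error>";
    }

    /** Escapes the characters that would break the XML text node. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```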

The node for the command is taken from XEP-0133: Service Administration, as it specifies an ad-hoc Shutdown command.
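
For reference, a request that starts this ad-hoc command could be built roughly as follows. The node `http://jabber.org/protocol/admin#shutdown` is the one XEP-0133 defines for shutting down the service, and the `<command/>` element comes from XEP-0050. The class name, the JIDs passed in, and the address the command has to be sent to are assumptions of this sketch; the thread does not say which Tigase component handles it.

```java
/** Hypothetical helper for illustration; not a Tigase class. */
final class ShutdownCommandRequest {

    static final String COMMANDS_NS = "http://jabber.org/protocol/commands";
    static final String SHUTDOWN_NODE = "http://jabber.org/protocol/admin#shutdown";

    private ShutdownCommandRequest() {}

    /**
     * Builds the first IQ of the ad-hoc command exchange.
     * All arguments are assumed to be free of quotes and markup.
     *
     * @param adminJid   full JID of the administrator issuing the command
     * @param serviceJid address handling the command (assumption: depends on deployment)
     * @param id         stanza id chosen by the caller
     */
    static String execute(String adminJid, String serviceJid, String id) {
        return "<iq type='set' from='" + adminJid + "' to='" + serviceJid + "' id='" + id + "'>"
                + "<command xmlns='" + COMMANDS_NS + "' node='" + SHUTDOWN_NODE
                + "' action='execute'/>"
                + "</iq>";
    }
}
```

The server would normally answer with a data form (for example for the timeout and the notification text), which the client fills in and submits in a second IQ.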

Artur Hefczyc commented 9 years ago

Ok, this looks good. I have a few questions though:

  1. I understand that this implementation redirects all new user connections that come to a node which is in shutdown state. Is this correct?

  2. What about users who are currently connected? Will they get a redirect as well? I can see that right now there is an option to send a notification to existing users. I think it would be a good idea to send a notification and then after some time (30 - 60 seconds) a redirect.

  3. What about other cluster nodes: are they aware that one of the cluster nodes is in shutdown state? Will they redirect users to the node which is being shut down?

Andrzej Wójcik (Tigase) commented 9 years ago

OK, so about your questions:

  1. Yes, this is correct.

  2. The current implementation can optionally notify users that the node will be shut down. After the selected time the node is shut down, but first it sends every user a see-other-host stream error telling them which node to reconnect to.

  3. Other nodes are aware that the node is in shutdown state. They are notified at the same time you confirm the shutdown, so during the timeout users will not be directed to this node.
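
The behaviour described in point 3 can be pictured as each node keeping a pool of redirect targets and dropping a peer from it as soon as that peer announces its shutdown. The sketch below only illustrates the idea: `RedirectPool` and its method names are hypothetical, and the way the notification travels between nodes is left out.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical per-node view of the cluster; not a Tigase class. */
final class RedirectPool {

    private final Set<String> nodes = ConcurrentHashMap.newKeySet();
    private final Set<String> shuttingDown = ConcurrentHashMap.newKeySet();

    /** Called when a peer joins (or rejoins) the cluster. */
    void nodeConnected(String node) {
        nodes.add(node);
        shuttingDown.remove(node);
    }

    /** Called when a peer announces that its shutdown was confirmed. */
    void nodeShuttingDown(String node) {
        shuttingDown.add(node);
    }

    /** Called when a peer has left the cluster. */
    void nodeDisconnected(String node) {
        nodes.remove(node);
        shuttingDown.remove(node);
    }

    /** Nodes that may still be offered in a see-other-host redirect, sorted by name. */
    List<String> eligibleTargets() {
        return nodes.stream()
                .filter(node -> !shuttingDown.contains(node))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Because the peer is removed from the eligible set when the shutdown is confirmed rather than when it actually disconnects, no user is sent to it during the timeout.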

Artur Hefczyc commented 9 years ago

Daniel, please add this information to our documentation.

Daniel Wisnewski commented 9 years ago

Basic documentation uploaded; I will add more detail to the clustering writeup. I am currently unable to test the functionality, so I will leave this in QA for the time being. %wojtek please test when you have time.

wojciech.kapcia@tigase.net commented 9 years ago

This feature has its Target version set to 7.2.0, hence it is included in the origin/master branch. I've therefore made a build from this branch (tigase-server-dist-7.2.0-20160121.134704-12; Jenkins is not building it automatically now). With this build I've checked the functionality and I confirm it works correctly:

<message from="xmpp-test.com" to="admin@xmpp-test.com/Psi+/2">
<body>Server will be restarted.
During restart you will be disconnected from XMPP server.</body>
</message>

<stream:error>
<see-other-host xmlns="urn:ietf:params:xml:ns:xmpp-streams">node1.xmpp-test.net</see-other-host>
</stream:error>

%daniel, given the above we need to revert the documentation change in the release branch (the commit can be reverted) and include similar information in the master branch, in the release notes that will be included in version 7.2.0.

Daniel Wisnewski commented 9 years ago

I've reverted the change from the v7.1.0 documentation. Closing the issue for now, but it will remain assigned to me until we start separating v7.1.0 and v7.2.0 changes.

Artur Hefczyc commented 9 years ago

My suggestion is to not close it, but instead adjust Due date to some future point. This way you will get notification when the ticket is due. Otherwise it will be forgotten.

Type: New Feature
Priority: Major
Assignee:
RedmineID: 3071
Version: tigase-server-8.0.0
Estimation: 30h
Spent time: 138h
Issue Votes: 0
Watchers: 0
Reference: tigase/_server/server-core#467