Cluster Node not correctly unregistered (#1046)
Closed
wojciech.kapcia@tigase.net opened 5 years ago
Affected versions
tigase-issue #8.1.0

TL;DR: There is an issue with registering/unregistering cluster nodes (wrong state, duplication).


Details:

After a flurry of monitor notifications, it turned out that, for some reason, some cluster nodes may not be correctly unregistered.

Current instances:

Instance             Tigase group      Name              State     Public DNS
i-070996b522632c5e2  tigase-main-xmpp  tigase-main-xmpp  running   ec2-52-35-187-124.us-west-2.compute.amazonaws.com
i-0ad9bc21f4b829342  tigase-main-xmpp  tigase-main-xmpp  stopping  ec2-18-237-84-154.us-west-2.compute.amazonaws.com
i-0af2254eaa6208fce  tigase-main-xmpp  tigase-main-xmpp  running   ec2-34-220-178-249.us-west-2.compute.amazonaws.com

(The instance in the stopping state was the faulty one: Public DNS (IPv4): ec2-18-237-84-154.us-west-2.compute.amazonaws.com, Private DNS: ip-10-0-14-118.us-west-2.compute.internal)

Current tig_cluster_nodes content:

mysql> select hostname,secondary,last_update from tig_cluster_nodes;
+-------------------------------------------+----------------------------------------------------+----------------------------+
| hostname                                  | secondary                                          | last_update                |
+-------------------------------------------+----------------------------------------------------+----------------------------+
| ip-10-0-13-118.us-west-2.compute.internal | ec2-34-220-178-249.us-west-2.compute.amazonaws.com | 2019-07-01 09:34:46.497000 |
| ip-10-0-25-189.us-west-2.compute.internal | ec2-52-35-187-124.us-west-2.compute.amazonaws.com  | 2019-07-01 09:34:52.355000 |
+-------------------------------------------+----------------------------------------------------+----------------------------+
2 rows in set (0.00 sec)

And the internal state says

<iq to="cl-comp@tigase.org" id="ad89a" type="get">
<query xmlns="http://jabber.org/protocol/disco#items" node="http://jabber.org/protocol/commands"/>
</iq>

<iq to="wojtek@tigase.org/atlantis/city/psi+" id="ad89a" from="cl-comp@tigase.org" type="result">
<query xmlns="http://jabber.org/protocol/disco#items">
<item name="tigase:cluster connected" node="ip-10-0-14-118.us-west-2.compute.internal" jid="cl-comp@tigase.org"/>
<item name="tigase:cluster disconnected" node="ip-10-0-35-169.us-west-2.compute.internal" jid="cl-comp@tigase.org"/>
<item name="tigase:cluster disconnected" node="ip-10-0-8-231.us-west-2.compute.internal" jid="cl-comp@tigase.org"/>
<item name="tigase:cluster connected" node="ip-10-0-13-118.us-west-2.compute.internal" jid="cl-comp@tigase.org"/>
</query>
</iq>

while being connected to ip-10-0-25-189.us-west-2.compute.internal:

<iq to="stats@tigase.org" id="adf2a" type="set">
<command xmlns="http://jabber.org/protocol/commands" node="stats"/>
</iq>

<iq to="wojtek@tigase.org/atlantis/city/psi+" id="adf2a" from="stats@tigase.org" type="result">
<command xmlns="http://jabber.org/protocol/commands" node="stats" status="executing">
<x xmlns="jabber:x:data" type="form">
…
<field var="message-router/Local hostname">
<value>ip-10-0-25-189.us-west-2.compute.internal</value>
</field>
…
</x>
<actions execute="complete">
<complete/>
</actions>
</command>
</iq>
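
To make the mismatch between the two views explicit, here is a minimal sketch that compares the hostnames from tig_cluster_nodes with the items returned by cl-comp. The hostnames and states are copied from the output above; the function name `find_inconsistencies` and the classification into "stale connected" and "leftover disconnected" are assumptions made for illustration and are not part of Tigase.

```python
# Hostnames registered in tig_cluster_nodes (copied from the query result above).
db_nodes = {
    "ip-10-0-13-118.us-west-2.compute.internal",
    "ip-10-0-25-189.us-west-2.compute.internal",
}

# Node we are connected to (from the "message-router/Local hostname" statistic).
local_node = "ip-10-0-25-189.us-west-2.compute.internal"

# Items reported by cl-comp: node -> state taken from the item name.
disco_items = {
    "ip-10-0-14-118.us-west-2.compute.internal": "connected",
    "ip-10-0-35-169.us-west-2.compute.internal": "disconnected",
    "ip-10-0-8-231.us-west-2.compute.internal": "disconnected",
    "ip-10-0-13-118.us-west-2.compute.internal": "connected",
}


def find_inconsistencies(db_nodes, local_node, disco_items):
    """Hypothetical helper: compare the repository view with the in-memory view."""
    # Peers the local node should know about, according to the database.
    expected_peers = set(db_nodes) - {local_node}
    connected = {n for n, s in disco_items.items() if s == "connected"}
    disconnected = {n for n, s in disco_items.items() if s == "disconnected"}
    return {
        # Reported as connected, but no longer registered in the database.
        "stale_connected": sorted(connected - expected_peers),
        # Disconnected entries that were never removed from the internal state.
        "leftover_disconnected": sorted(disconnected - expected_peers),
        # Registered in the database, but not reported as connected.
        "missing": sorted(expected_peers - connected),
    }


result = find_inconsistencies(db_nodes, local_node, disco_items)
for kind, nodes in result.items():
    print(kind, nodes)
```

With the data from this report, the only node listed as connected but absent from the database is ip-10-0-14-118.us-west-2.compute.internal, which is the private DNS name of the faulty instance.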

And Tigase tries to deliver packets to the stale connection:

2019-07-01 09:06:16.802 [in_6-cl-comp]     ClusterConnectionManager.writePacketToSocket()  WARNING: No cluster connection to send a packet: from=null, to=null, DATA=[cluster id="cl-135757" to="sess-man@ip-10-0-14-118.us-west-2.compute.internal" type="set" pr="HIGH" from="sess-man@ip-10-0-25-189.us-west-2.compute.internal" xmlns="tigase:cluster"][control][visited-nodes][node-id]sess-man@ip-10-0-25-189.us-west-2.compute.internal[/node-id][/visited-nodes][method-call name="sess-man-packet-forward-sm-cmd"][par name="packet-from"]sess-man@ip-10-0-25-189.us-west-2.compute.internal[/par][/method-call][first-node]sess-man@ip-10-0-25-189.us-west-2.compute.internal[/first-node][/control][data][message id="ac7fa" to="andrzej.wojcik@tigase.org/MacBook Pro (Andrzej).local" type="chat" xmlns="jabber:client" from="wojtek@tigase.org/atlantis/city/psi+"]
[active xmlns="http://jabber.org/protocol/chatstates"/]
[/message][/data][/cluster], SIZE=768, XMLNS=tigase:cluster, PRIORITY=HIGH, PERMISSION=NONE, TYPE=set
Type
Bug
Priority
Normal
Assignee
Spent time
2h 30m
Reference
tigase/_server/server-core#1046