-
I've implemented "ghostbuster" and adjusted PresenceCollectorRepository to keep additional values and have a better API (classes instead of Map<Map>). I've adjusted synchronization to properly remove UserEntry from a ServiceEntry map without race conditions (multiple concurrent locks based on hashCode() of bare JIDs).
I still need to improve the synchronization of lastSeen between cluster nodes, but the feature will work even without that (the ghostbuster could just work with a delay).
However, I'm not sure if we should sync lastSeen; having it differ between cluster nodes may have little to no impact. Only the initial "ping" after reconnection of cluster nodes would be delayed by an hour, but after that it would work just as if we had synced this value.
-
I think that there is a problem with addressing:
[2021-05-27 15:59:40:869] [WARNING ] [ in_1-s2s ] S2SConnectionManager.processPacket(): Packet processing exception tigase.xmpp.PacketInvalidTypeException: The packet has already 'error' type: from=…@jabb.im/…, to=tigase.pubsub.repository.PresenceCollectorRepository$ServiceEntry@39f703a, DATA=<iq xmlns="jabber:client" from="…@jabb.im/…" type="error" id="4a714a16-8180-41a7-892b-435de599b162" to="tigase.pubsub.repository.PresenceCollectorRepository$ServiceEntry@39f703a"><ping xmlns="urn:xmpp:ping"/><error code="406" type="modify"><not-acceptable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/><text xml:lang="en" xmlns="urn:ietf:params:xml:ns:xmpp-stanzas">S2S - Incorrect source address (39f703a) - none of any local virtual hosts or components.</text></error></iq>, SIZE=484, XMLNS=jabber:client, PRIORITY=NORMAL, PERMISSION=NONE, TYPE=error, STABLE_ID=null
    at tigase.xmpp.Authorization.getResponseMessage(Authorization.java:480)
    at tigase.server.xmppserver.S2SConnectionManager.processPacket(S2SConnectionManager.java:215)
    at tigase.server.AbstractMessageReceiver$QueueListener.run(AbstractMessageReceiver.java:1404)
-
@wojtek It should be fixed now in Tigase PubSub and in Tigase ACS PubSub. The main issue was with the serialization of entries used to sync items between cluster nodes, but PubSub was missing a method to retrieve the service JID from the item. I've added the missing method and fixed the issue in ACS.
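For context: the bogus address in the log above (…PresenceCollectorRepository$ServiceEntry@39f703a) has the shape of Java's default Object.toString() output, that is, the class name, "@", and the hash code in hex. The snippet below only illustrates how such a string can end up in an address field when an object is stringified instead of its JID being read through a getter. ServiceEntryDemo and AddressingDemo are hypothetical stand-ins, not Tigase classes, and pubsub.example.com is a made-up service JID.

```java
// Hypothetical stand-in for an entry class that does not override toString().
class ServiceEntryDemo {
    private final String serviceJid;

    ServiceEntryDemo(String serviceJid) {
        this.serviceJid = serviceJid;
    }

    // Accessor comparable in spirit to entry.getServiceJid() mentioned in this thread.
    String getServiceJid() {
        return serviceJid;
    }
}

class AddressingDemo {
    public static void main(String[] args) {
        ServiceEntryDemo entry = new ServiceEntryDemo("pubsub.example.com");

        // Stringifying the object falls back to Object.toString():
        // getClass().getName() + "@" + Integer.toHexString(hashCode())
        String wrongFrom = String.valueOf(entry);

        // Reading the JID through the accessor yields a routable address.
        String rightFrom = entry.getServiceJid();

        System.out.println("from=" + wrongFrom);
        System.out.println("from=" + rightFrom);
    }
}
```

A remote server cannot route a reply to the first form, which would be consistent with the "Incorrect source address" errors coming back over S2S.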
-
Currently we have this version deployed:
root@ad4950537f90:/home/tigase/tigase-server# for jar in `ls jars/*pubsub*jar` ; do unzip -qc ${jar} META-INF/MANIFEST.MF | grep "Implementation-Version" ; done
Implementation-Version: 3.0.0-SNAPSHOT-b153/9dad8da7
Implementation-Version: 5.0.0-SNAPSHOT-b790/9db75655
It should contain the fix (https://github.com/tigase/tigase-acs-pubsub/commit/9dad8da78429e57762ef32d844919f5440e53ca7) yet the errors continue:
[2021-05-28 09:40:14:490] [FINEST ] [ in_5-s2s ] MessageRouter.processPacket() : Processing packet: from=null, to=null, DATA=<iq xmlns="jabber:client" to="e…k@z….im/1716812579775850753812799012" type="get" id="8797a4d5-431c-4fab-b3b6-c77ca2595993" retryCount="15" from="tigase.pubsub.repository.PresenceCollectorRepository$ServiceEntry@358189b5" delay="1"><ping xmlns="urn:xmpp:ping"/></iq>, SIZE=271, XMLNS=jabber:client, PRIORITY=NORMAL, PERMISSION=NONE, TYPE=get, STABLE_ID=null
[2021-05-28 09:40:15:013] [WARNING ] [ in_7-s2s ] S2SConnectionManager.processPacket(): Packet processing exception tigase.xmpp.PacketInvalidTypeException: The packet has already 'error' type: from=e…k@z….im/1716812579775850753812799012, to=tigase.pubsub.repository.PresenceCollectorRepository$ServiceEntry@358189b5, DATA=<iq from="e…k@z….im/1716812579775850753812799012" xmlns="jabber:client" to="tigase.pubsub.repository.PresenceCollectorRepository$ServiceEntry@358189b5" type="error" id="8797a4d5-431c-4fab-b3b6-c77ca2595993"><ping xmlns="urn:xmpp:ping"/><error code="406" type="modify"><not-acceptable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/><text xmlns="urn:ietf:params:xml:ns:xmpp-stanzas" xml:lang="en">S2S - Incorrect source address (358189b5) - none of any local virtual hosts or components.</text></error></iq>, SIZE=509, XMLNS=jabber:client, PRIORITY=NORMAL, PERMISSION=NONE, TYPE=error, STABLE_ID=null
    at tigase.xmpp.Authorization.getResponseMessage(Authorization.java:480)
    at tigase.server.xmppserver.S2SConnectionManager.processPacket(S2SConnectionManager.java:215)
    at tigase.server.AbstractMessageReceiver$QueueListener.run(AbstractMessageReceiver.java:1404)
I checked the CloudWatch logs and it has entries like this:
021-05-28 08:50:13.711 TRACE [scheduler_pool-12-thread-2-pubsub] t.p.Ghostbuster.ping(): for tigase.pubsub.repository.PresenceCollectorRepository$ServiceEntry@7a434434 sending ping to 74@jwchat.org/169967308614608423461620839002768433
and looking at the code it would mean that the actual JID stored in entry.getServiceJid() is incorrect (?!)
It seems it should be OK and fixed. Looking at the number of errors, they seem to be declining. Should we assume that the incorrect values were added to the collections, that during the upgrade those faulty values were synchronised back, and that they will be removed with the passage of time so the issue resolves itself?
Captura de pantalla 2021-05-28 a las 13.58.43.png Captura de pantalla 2021-05-28 a las 14.12.21.png
-
Yes, you are correct. Invalid values were in the cache and were propagated to new cluster nodes when they joined the cluster. Now that we no longer have old/bad nodes, we need to wait for the new nodes to clean up those entries, and everything should be back to normal.
Type | Task
Priority | Normal
Assignee |
Version | Candidate for next minor release
Spent time | 0
As per our discussion in chat - it could be convenient to have a mechanism similar to MUC's ghostbuster for PEP.
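
A minimal sketch of what such a ghostbuster could look like, combining the ideas mentioned in this thread: a lastSeen timestamp per user entry, lock stripes selected by hashCode() of the bare JID, and a periodic scan for entries that have been idle too long and should be pinged. This is not the Tigase implementation. PepGhostbusterSketch, its methods, the stripe count of 16, and the one-hour idle limit used in main (chosen because the thread mentions a delay of an hour) are all assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Tigase.
class PepGhostbusterSketch {

    private static final int STRIPES = 16; // assumed number of lock stripes

    private final Object[] locks = new Object[STRIPES];
    // One plain map per stripe; each map is only accessed under its stripe's lock.
    private final List<Map<String, Instant>> lastSeenByStripe = new ArrayList<>();
    private final Duration maxIdle;

    PepGhostbusterSketch(Duration maxIdle) {
        this.maxIdle = maxIdle;
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
            lastSeenByStripe.add(new HashMap<>());
        }
    }

    // Strip the resource part so all resources of one account share a stripe.
    static String bareJid(String fullJid) {
        int slash = fullJid.indexOf('/');
        return slash < 0 ? fullJid : fullJid.substring(0, slash);
    }

    private int stripeOf(String fullJid) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(bareJid(fullJid).hashCode(), STRIPES);
    }

    // Record activity (for example presence or a ping result) for a full JID.
    void markSeen(String fullJid, Instant when) {
        int s = stripeOf(fullJid);
        synchronized (locks[s]) {
            lastSeenByStripe.get(s).put(fullJid, when);
        }
    }

    // Drop an entry; returns true if it was present.
    boolean removeUser(String fullJid) {
        int s = stripeOf(fullJid);
        synchronized (locks[s]) {
            return lastSeenByStripe.get(s).remove(fullJid) != null;
        }
    }

    // Full JIDs not seen for longer than maxIdle: candidates for a ping.
    List<String> findPingCandidates(Instant now) {
        List<String> stale = new ArrayList<>();
        for (int s = 0; s < STRIPES; s++) {
            synchronized (locks[s]) {
                for (Map.Entry<String, Instant> e : lastSeenByStripe.get(s).entrySet()) {
                    if (Duration.between(e.getValue(), now).compareTo(maxIdle) > 0) {
                        stale.add(e.getKey());
                    }
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        PepGhostbusterSketch gb = new PepGhostbusterSketch(Duration.ofHours(1));
        Instant now = Instant.now();
        gb.markSeen("alice@example.com/phone", now.minus(Duration.ofHours(2)));
        gb.markSeen("bob@example.com/laptop", now);

        for (String jid : gb.findPingCandidates(now)) {
            System.out.println("would ping " + jid);
            // Assumption: an error reply to the ping would mean the session is gone.
            gb.removeUser(jid);
        }
    }
}
```

Striping by bare JID means updates for different accounts rarely contend on the same lock, while all resources of one account are serialized, which is one way to avoid races when removing a user entry. If lastSeen is not synced between cluster nodes, each node would simply run this scan against its own timestamps.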