Nodes fail to join the cluster when there are large numbers of existing connections, due to DoS protections in the XML parser (#207)
Unknown opened 1 decade ago

We restarted our cluster to deal with the session-close performance problem, but then ran into another issue. When other nodes already hold a large number of users, a new node can't join the cluster: the initial cluster cache sync packet is too large for the XML parser, trips the DoS protections, and causes the cluster connection to be closed. In effect, if a node goes down (as this one did), it won't rejoin the cluster but will still accept c2s connections, leading to a "split brain" where anyone on that node is isolated from the rest of the cluster.

Here's the log message in question. I won't paste the full packet since it's far too large:

2013-06-02 17:50:33 ClusterConnectionManager.xmppStreamOpened() INFO: Stream opened: {id=c7c9b815-9957-4f74-8ff6-ef538b49c5e3, to=8.gs.ea.com, xmlns:stream=http://etherx.jabber.org/streams, from=2.gs.ea.com, xmlns=tigase:cluster}

2013-06-02 17:50:33 XMPPIOService.processSocketData() INFO: null, type: connect, Socket: nullSocket[addr=5.gs.ea.com/10.82.107.125,port=5277,localport=38639], Incorrect XML data: ="http://jabber.org/protocol/caps" node="http://ea.com/xmpp" hash="sha-1"/></presence {packet continues for several hundred lines} , stopping connection: null, exception:

tigase.xmpp.XMPPParserException: Too many elements for staza, possible DOS attack.
    at tigase.xmpp.XMPPDomBuilderHandler.newElement(XMPPDomBuilderHandler.java:379)
    at tigase.xmpp.XMPPDomBuilderHandler.startElement(XMPPDomBuilderHandler.java:347)
    at tigase.xml.SimpleParser.parse(SimpleParser.java:230)
    at tigase.xmpp.XMPPIOService.processSocketData(XMPPIOService.java:456)
    at tigase.net.IOService.call(IOService.java:235)
    at tigase.net.IOService.call(IOService.java:87)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Artur Hefczyc commented 1 decade ago

Which version of the code are you running on your installation? In particular, we need to know the versions of both the Tigase XMPP Server and XML Tools.

Unknown commented 1 decade ago

Server is 5.1.3 and xmltools is 3.4.2 (pulled in via the server package).

I've done some checking, and the issue is that the XML parser has a hardcoded limit of 1000 elements in total (in XMPPDomBuilderHandler), while the cluster sync is also broken up into chunks of 1000 records. Each cluster sync record, however, is 7+ elements, so a full chunk of 1000 records far exceeds the limit. Changing SYNC_MAX_BATCH_SIZE to 100 in SessionManagerClustered appears to keep it under the limit.
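
To make the arithmetic concrete, here is a rough sketch of the calculation, not code from Tigase. The class and method names are hypothetical, the figure of 7 elements per record is the lower bound quoted above, the 2 wrapper elements are an assumed overhead for the enclosing packet, and treating the limit as inclusive is also an assumption.

```java
public class SyncBatchSketch {
    // Element cap reported in this thread for XMPPDomBuilderHandler.
    static final int PARSER_ELEMENT_LIMIT = 1000;

    // Total elements in one sync packet: every record contributes
    // elementsPerRecord elements, plus the enclosing wrapper elements.
    static int elementsInBatch(int records, int elementsPerRecord, int wrapperElements) {
        return records * elementsPerRecord + wrapperElements;
    }

    // True when the packet stays at or below the cap (inclusive cap is an assumption).
    static boolean fitsParserLimit(int records, int elementsPerRecord, int wrapperElements) {
        return elementsInBatch(records, elementsPerRecord, wrapperElements) <= PARSER_ELEMENT_LIMIT;
    }

    // Largest record count whose packet still fits under the cap.
    static int maxSafeBatchSize(int elementsPerRecord, int wrapperElements) {
        return (PARSER_ELEMENT_LIMIT - wrapperElements) / elementsPerRecord;
    }

    public static void main(String[] args) {
        int perRecord = 7; // lower bound from the thread ("7+ elements")
        int wrapper = 2;   // hypothetical overhead for the enclosing packet

        System.out.println("1000 records -> " + elementsInBatch(1000, perRecord, wrapper)
                + " elements, fits: " + fitsParserLimit(1000, perRecord, wrapper));
        System.out.println("100 records  -> " + elementsInBatch(100, perRecord, wrapper)
                + " elements, fits: " + fitsParserLimit(100, perRecord, wrapper));
        System.out.println("max safe batch size: " + maxSafeBatchSize(perRecord, wrapper));
    }
}
```

Under these assumptions a batch of 1000 records needs roughly seven times the allowed element count, while a batch of 100 fits with room to spare. Since records can carry more than 7 elements, the real safe batch size is likely lower than the computed maximum.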

wojciech.kapcia@tigase.net commented 1 decade ago

Implemented configuration of the DoS protection in the XML parser on a per-ConnectionManager basis and increased the defaults for ClusterConnectionManager. More information: [--elements-number-limit](http://www.tigase.org/content/elements-number-limit)

Type: Bug
Priority: Major
Assignee: (none)
RedmineID: 1364
Version: tigase-server-5.2.0
Spent time: 109h 30m
Issue Votes: 0
Watchers: 0
Reference: tigase/_server/server-core#207