The XMPP client does not send the TLS handshake.
a. The client has a problem but is not malicious.
It sent: </stream:stream>
b. The client is malicious.
Clients attempt to connect to the server every 60 seconds.
~300 problem clients generate 5 connections per second.
Our 4-node cluster was completely overloaded (LA > 80 on each node).
All connections (300,000) were reset by ping timeout.
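As a quick sanity check on the numbers above, the reconnect rate follows directly from the client count and the retry interval. The class and method names below are only illustrative and are not part of Tigase:

```java
// Back-of-the-envelope check of the reconnect rate described above.
final class ReconnectRate {
    // Steady-state connection attempts per second for `clients` clients
    // that each reconnect once every `intervalSeconds` seconds.
    static double attemptsPerSecond(int clients, int intervalSeconds) {
        return (double) clients / intervalSeconds;
    }

    public static void main(String[] args) {
        System.out.println(attemptsPerSecond(300, 60)); // prints 5.0
    }
}
```

So a fairly small number of misbehaving clients is enough to produce a constant stream of half-open TLS sessions.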
Typical stack example:
"in_10-c2s" #107 prio=5 os_prio=0 tid=0x00007f8680501000 nid=0x32c8 runnable [0x00007f85693d2000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:326)
- locked <0x00000000e7fda428> (a java.lang.Object)
- locked <0x00000000e7fda488> (a java.lang.Object)
at tigase.io.SocketIO.read(SocketIO.java:235)
at tigase.io.TLSIO.read(TLSIO.java:199)
at tigase.io.TLSIO.writeBuff(TLSIO.java:487)
at tigase.io.TLSIO.write(TLSIO.java:343)
at tigase.net.IOService.writeData(IOService.java:1313)
at tigase.xmpp.XMPPIOService.writeRawData(XMPPIOService.java:370)
at tigase.server.xmppclient.ClientConnectionManager.processCommand(ClientConnectionManager.java:594)
at tigase.server.xmppclient.ClientConnectionManager.processPacket(ClientConnectionManager.java:664)
at tigase.server.AbstractMessageReceiver$QueueListener.run(AbstractMessageReceiver.java:1570)
An asynchronous implementation may be needed to fix case (b).
Our overnight hotfix for case (a), attached as fix-overload-cpu-when-connections-dropped.txt, may help explain the problem.
Could you tell us at least the version of Tigase XMPP Server which you are running? We need to have the exact same source code to check the stack trace and review possible issues.
Unknown commented 4 years ago
Tigase server 7.1.1
Unknown commented 4 years ago
@stefandt were you able to reproduce it with the current stable, i.e. 8.0.0?
Unknown commented 4 years ago
Sorry, I cannot. This is a production environment.
In 8.0.0 the limit is 1000 cycles (reduced from 100000 to 1000). Perhaps this will have less effect on the server.
In any case, we may have a vulnerability to attack (case b).
P.S. In our case, 20 cycles are barely noticeable.
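To illustrate why the cycle limit matters, here is a minimal sketch. It assumes that "cycles" means the number of read attempts a writing thread makes while waiting for handshake data from the peer. This is not the Tigase `TLSIO` code; `BoundedHandshakeWait`, `pollUntilDataOrLimit` and the simulated silent peer are hypothetical names used only for the example:

```java
import java.util.function.IntSupplier;

// Simulates a writer that polls for handshake data up to a fixed cycle limit.
final class BoundedHandshakeWait {
    // Polls `readBytes` until it returns data or `maxCycles` attempts are used up.
    // Returns the number of read attempts that were made.
    static int pollUntilDataOrLimit(IntSupplier readBytes, int maxCycles) {
        int attempts = 0;
        while (attempts < maxCycles) {
            attempts++;
            if (readBytes.getAsInt() > 0) {
                break; // the peer sent handshake data, stop polling
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        IntSupplier silentPeer = () -> 0; // a peer that never sends handshake bytes
        for (int limit : new int[] {100_000, 1_000, 20}) {
            int reads = pollUntilDataOrLimit(silentPeer, limit);
            System.out.println("limit " + limit + ": " + reads + " wasted reads");
        }
    }
}
```

With a peer that never answers, every write burns the full cycle budget on the worker thread, so the cost per stalled connection scales directly with the limit.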
Unknown commented 4 years ago
We've reviewed the code and your logs and improved the handling of this use case (when the TLS handshake is not ready while the server wants to send data).
The linked commit contains a possible fix, which we are going to check before merging it into the next version (8.2.0), as the current version 8.1.0 is already being prepared for release.
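For readers following the thread, one general way to handle "handshake not ready while the server wants to send data" is to queue outgoing data and flush it once the handshake completes, instead of polling from the writing thread. The sketch below only illustrates that idea; it is not the content of the linked commit, and `DeferredTlsWriter` and its methods are hypothetical names (the `sent` list stands in for the real socket):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Buffers writes until the TLS handshake is done, then flushes them in order.
final class DeferredTlsWriter {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> sent = new ArrayList<>(); // stands in for the socket
    private boolean handshakeDone = false;

    // Called by a worker thread that wants to send data; never blocks or spins.
    synchronized void write(String data) {
        if (handshakeDone) {
            sent.add(data);
        } else {
            pending.add(data); // park the data until the handshake finishes
        }
    }

    // Called from the I/O event handler once the handshake has completed.
    synchronized void onHandshakeComplete() {
        handshakeDone = true;
        while (!pending.isEmpty()) {
            sent.add(pending.poll());
        }
    }

    synchronized List<String> sent() {
        return new ArrayList<>(sent);
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

A real implementation would also need a bound on the queue and a timeout that drops connections whose handshake never completes; otherwise case (b) just trades CPU for memory.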
Unknown commented 4 years ago
I think it is important to release an 8.1.1 with this fix, no?
Unknown commented 4 years ago
Version 8.0.0 already included an improvement, so the impact on 8.1.0 is very limited. Because of that and the severity of the change, the additional improvement was made in 8.2.0-SNAPSHOT and may be released at some point in 8.1.1 once it has been verified to work properly.
So there is no need to release this as a bugfix in 8.1.1 at this point.
Steps to reproduce the behavior:
The client sends the following element and then does not send the TLS handshake:
<starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>
fix-overload-cpu-when-connections-dropped.txt