Server becomes overloaded when TLS connections end without a completed handshake (#1399)

Unknown opened 5 years ago

Steps to reproduce the behavior:

1. The XMPP client requests a TLS connection (<starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>).
2. The XMPP client does not send the TLS handshake. Either:
   a. the client has problems but is not malicious, and sends </stream:stream>; or
   b. the client is malicious.

Clients attempt to connect to the server every 60 seconds, so about 300 problem clients generate 5 connections per second. Our 4-node cluster was completely overloaded (load average > 80 on each node). All 300,000 connections were reset by ping timeout.

Typical stack example:

 "in_10-c2s" #107 prio=5 os_prio=0 tid=0x00007f8680501000 nid=0x32c8 runnable [0x00007f85693d2000]
	java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.NativeThread.current(Native Method)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:326)
	- locked <0x00000000e7fda428> (a java.lang.Object)
	- locked <0x00000000e7fda488> (a java.lang.Object)
	at tigase.io.SocketIO.read(SocketIO.java:235)
	at tigase.io.TLSIO.read(TLSIO.java:199)
	at tigase.io.TLSIO.writeBuff(TLSIO.java:487)
	at tigase.io.TLSIO.write(TLSIO.java:343)
	at tigase.net.IOService.writeData(IOService.java:1313)
	at tigase.xmpp.XMPPIOService.writeRawData(XMPPIOService.java:370)
	at tigase.server.xmppclient.ClientConnectionManager.processCommand(ClientConnectionManager.java:594)
	at tigase.server.xmppclient.ClientConnectionManager.processPacket(ClientConnectionManager.java:664)
	at tigase.server.AbstractMessageReceiver$QueueListener.run(AbstractMessageReceiver.java:1570)

An asynchronous implementation may be needed to fix case (b). Our overnight hotfix for case (a) may help explain the problem:

/src/main/java/tigase/io/TLSIO.java
diff --git a/src/main/java/tigase/io/TLSIO.java b/src/main/java/tigase/io/TLSIO.java

fix-overload-cpu-when-connections-dropped.txt

Activities

Unknown commented 5 years ago

Could you tell us at least the version of Tigase XMPP Server which you are running? We need to have the exact same source code to check the stack trace and review possible issues.
Unknown commented 5 years ago

tigase server 7.1.1
Unknown commented 5 years ago

@stefandt were you able to reproduce it with the current stable, i.e. 8.0.0?
Unknown commented 5 years ago

Sorry, I cannot. This is a production environment. In 8.0.0 there are 1000 cycles (100000 -> 1000), so the effect on the server may be smaller. In any case, we may have a vulnerability to attack (case b).

P.S. In our case, 20 cycles have almost no noticeable effect.
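
To illustrate why the cycle count matters, here is a minimal sketch of a capped retry loop around a non-blocking read. It assumes the "cycles" above are the number of times the write path retries a read while waiting for handshake data; that is an interpretation, not something confirmed from the Tigase source. The class `BoundedHandshakeWait`, the method `waitForHandshakeData` and the `readOnce` supplier are hypothetical names, not Tigase APIs.

```java
import java.util.function.IntSupplier;

public class BoundedHandshakeWait {

    /** Possible results of waiting for handshake bytes. */
    public enum Outcome { DATA_ARRIVED, PEER_CLOSED, GAVE_UP }

    /**
     * Polls a non-blocking read at most maxCycles times.
     * readOnce is a hypothetical stand-in for a socket read:
     * it returns > 0 for bytes read, 0 for "nothing available", -1 for "closed".
     */
    public static Outcome waitForHandshakeData(IntSupplier readOnce, int maxCycles) {
        for (int i = 0; i < maxCycles; i++) {
            int n = readOnce.getAsInt();
            if (n > 0) {
                return Outcome.DATA_ARRIVED;
            }
            if (n < 0) {
                return Outcome.PEER_CLOSED;
            }
            // n == 0: the client has sent nothing yet, so every pass is wasted CPU.
        }
        return Outcome.GAVE_UP;
    }

    public static void main(String[] args) {
        int[] reads = {0};
        // A silent client: every read reports that no data is available.
        Outcome outcome = waitForHandshakeData(() -> { reads[0]++; return 0; }, 1000);
        System.out.println(outcome + " after " + reads[0] + " reads");
    }
}
```

With a silent client, each write attempt costs the full cap in reads before giving up, so lowering the cap reduces the cost per connection but does not remove it.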
Unknown commented 5 years ago

We've reviewed the code and your logs and improved handling of this use case (when the TLS handshake is not ready while the server wants to send data).

The linked commit contains a possible fix, which we are going to check before merging it into the next version (8.2.0), as the current version, 8.1.0, is already being prepared for release.
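
One way to picture handling for "handshake not ready while the server wants to send data" is to queue outbound data instead of retrying reads inside the write call. The sketch below is only an illustration of that idea and is not the content of the linked commit. `DeferredTlsWriter` and its methods are hypothetical; the only real API used is the `javax.net.ssl.SSLEngineResult.HandshakeStatus` enum, and the `sent` list stands in for the actual socket.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;

public class DeferredTlsWriter {

    // Data waiting for the handshake to finish (hypothetical buffer).
    private final Queue<byte[]> pending = new ArrayDeque<>();
    // Stand-in for the real socket: records what would have been written.
    private final List<byte[]> sent = new ArrayList<>();

    /**
     * Assumes the caller has already started the handshake, so
     * NOT_HANDSHAKING means "completed" rather than "not begun".
     */
    static boolean handshakeDone(HandshakeStatus status) {
        return status == HandshakeStatus.NOT_HANDSHAKING
                || status == HandshakeStatus.FINISHED;
    }

    /** Sends the data if the handshake is done; otherwise queues it and returns at once. */
    public boolean write(byte[] data, HandshakeStatus status) {
        if (!handshakeDone(status)) {
            pending.add(data); // no read loop here, so no CPU spin
            return false;
        }
        flushPending();
        sent.add(data);
        return true;
    }

    // Sends queued data in the order it was originally written.
    private void flushPending() {
        byte[] next;
        while ((next = pending.poll()) != null) {
            sent.add(next);
        }
    }

    public int pendingCount() { return pending.size(); }

    public int sentCount() { return sent.size(); }
}
```

A real implementation would also need to bound the queue and drop the connection after a handshake timeout; otherwise a malicious client (case b) could hold memory instead of CPU.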
Unknown commented 5 years ago

I think it is important to release an 8.1.1 with this fix, no?
Unknown commented 5 years ago

Version 8.0.0 already included an improvement, so the impact on 8.1.0 is very limited. Because of that, and because of the severity of the change, the additional improvement was made in 8.2.0-SNAPSHOT and may be released in 8.1.1 at some point, once it has been verified to work properly.

So there is no need to release this as a bugfix in 8.1.1 at this point.

Reference

tigase/_server/server-core#1399