Projects tigase _server server-core Issues #514
NPE on StreamManagementIOProcessor (#514)
Daniele Ricci opened 9 years ago
Due Date
2015-06-04

I'm having multiple NPE in StreamManagementIOProcessor.processIncoming, all of them happening during the "stopping connection" phase.

From what I've seen from the code, looks like e.g. the server gave up the connection (or a new one came through, invalidating the old one), but the old one sent ack anyway. However, StreamManagementIOProcessor doesn't find the OutQueue instance because it has already been deleted by StreamManagementIOProcessor.sendErrorsForQueuedPackets. Am I getting this right?

Here is the stacktrace:

2015-06-03 10:56:52.551 [pool-11-thread-2]  XMPPIOService.processSocketData()  INFO:    c2s@beta.kontalk.net/178.62.44.19_5999_162.218.20.42_48058, type: accept, Socket: ZLIB: TLS: c2s@beta.kontalk.net/178.62.44.19_5999_162.218.20.42_48058 Socket[unconnected], jid: USER_JID_UNDISCLOSED@beta.kontalk.net/7d357507eefb4483, Incorrect XML data: <presence id='k6pWS-21' type='unavailable'></presence><r xmlns='urn:xmpp:sm:3'/><a xmlns='urn:xmpp:sm:3' h='8'/></stream:stream>, stopping connection: c2s@beta.kontalk.net/178.62.44.19_5999_162.218.20.42_48058, exception:
java.lang.NullPointerException
        at tigase.server.xmppclient.StreamManagementIOProcessor.processIncoming(StreamManagementIOProcessor.java:197)
        at tigase.xmpp.XMPPIOService.addReceivedPacket(XMPPIOService.java:618)
        at tigase.xmpp.XMPPIOService.processSocketData(XMPPIOService.java:720)
        at tigase.net.IOService.call(IOService.java:253)
        at tigase.net.IOService.call(IOService.java:94)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I'm using Tigase 7.0.2, but I can see the code of that class didn't change.

Andrzej Wójcik (Tigase) commented 9 years ago

Yes, you are getting this right. But it looks like really rare case delivery for packet and reconnection would need to be very fast which is rather uncommon.

Looking at stacktrace you provided and your analysis, we can only ignore incoming packet as this OutQueue was already processed and we cannot do anything else. For sure we should not keep OutQueue after processing by connection close as it would create bigger issues as queue could be already empty or taken by other connection which resumed stream.

I would like to look deeper into this issue and find out what really happened with this connection and how it was closed before processing packet, could you provide bigger part of logs which would show why this happened? ie. that other connection appeared and forced this one to close or anything similar?

Daniele Ricci commented 9 years ago

Thanks for the feedback. I will provide more logs as soon as I can, probably this evening.

Daniele Ricci commented 9 years ago

I reproduced it in my local environment by starting 100 connections with a random interval from each other (250ms +- 1000ms) and after a while I got this:

http://beta.kontalk.net/tigase-debug.txt

My debug options are:

--debug=server,xmpp.impl

I'm sorry I can't provide more information but inspecting Tigase logs can be really exhausting (believe me, I tried). The original log file was much larger (a few MBs), I only uploaded logs around the exception (especially before the exception).

Andrzej Wójcik (Tigase) commented 9 years ago

Thank you for logs. Part you attached helped me find root cause of this issue.

I would like to point it here that this occured only in specific case when client enabled Stream Management without resumption, sent:

<r xmlns='urn:xmpp:sm:3'/><a xmlns='urn:xmpp:sm:3' h='3'/></stream:stream>

and closed TCP connection just after sending stream termination.

This while it is done by many XMPP clients and client libraries, it is not proper stream termination according to RFC 6120: Section 4.4. Closing a Stream in which you can read that:

If the parties are using either two streams over a single TCP

connection or two streams over two TCP connections, the entity that

sends the closing stream tag MUST behave as follows:

  1. Wait for the other party to also close its outbound stream before
terminating the underlying TCP connection(s); this gives the
other party an opportunity to finish transmitting any outbound
data to the closing entity before the termination of the TCP
connection(s).
  1. Refrain from sending any further data over its outbound stream to
the other entity, but continue to process data received from the
other entity (and, if necessary, process such data).
  1. Consider both streams to be void if the other party does not send
its closing stream tag within a reasonable amount of time (where
the definition of "reasonable" is a matter of implementation or
deployment).
  1. After receiving a reciprocal closing stream tag from the other
party or waiting a reasonable amount of time with no response,
terminate the underlying TCP connection(s).

which forces XMPP client to wait reasonable amount of time before closing TCP connection after sending </stream:stream>. As in your case Tigase XMPP Server was processing incoming after stream termination I suppose that your XMPP client closed TCP connection just after sending </stream:stream>

I would suggest to fix this if possible (or report to client library project you are using) as it may cause lost of data as XMPP server may send you data which your client will never receive due to closing TCP connection before receiving </stream:stream> from server.

However I changed logic in Stream Management to:

  • deal with not existing OutQueue instance and only reporting this cased

  • do not start connection failure logic when it was not possible to send reponse on (as this I suppose caused OutQueue to be processed and removed)

This should eliminate issue you are expiriencing right now and allow proper close of stream without sending errors back to senders of non acked stanzas as stream was successfully closed.

Daniele Ricci commented 9 years ago

Thanks! I will notify Smack developers to address this.

Daniele Ricci commented 9 years ago
issue 1 of 1
Type
Bug
Priority
Normal
Assignee
RedmineID
3164
Version
tigase-server-7.1.0
Spent time
24h
Issue Votes (0)
Watchers (0)
Reference
tigase/_server/server-core#514
Please wait...
Page is in error, reload to recover