Tigase not closing WebSocket connections (#424)
Sam Wright opened 10 years ago
Due Date
2015-02-06

We're currently having an issue with Tigase not closing open TCP connections. From my research, this is the kernel waiting on the process (Tigase) to fully close the connection before cleaning up the socket. Are you aware of any conditions that would allow this to happen, or any web browser updates that could cause it?

[root@chat1 ~]# netstat -anl | grep 5290 | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'
CLOSE_WAIT 2286
ESTABLISHED 117
LISTEN 1

chat-websockets.png

Artur Hefczyc commented 10 years ago

First of all, why is it a problem for you that connections are not being closed or, more likely, are being closed with some delay?

As far as I know from my research, this is not a bug in Tigase. The socket closing delay results from a combination of factors:

  1. Various TCP/IP level timeouts

  2. OS level settings

  3. Also, Tigase almost never closes a connection by itself. Connection closure is always initiated by the client side, unless the connection idle timeout is reached on the server side. Even if closure of the XMPP stream is initiated by the client, Tigase still does not close the socket, because it expects (the RFC specifies that the server should expect) that the client may send some last data or that some packets may still be in transit.

Usually the TCP/IP connection stays in the CLOSE_WAIT state for the duration of a timeout specified at the OS level. I do not remember the default value right now, but it can be anything between 10 and 300 seconds.
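As general TCP background (not specific to Tigase): a socket enters CLOSE_WAIT once the peer's FIN has been received, and it leaves that state when the local application closes its own end. The Python sketch below is only an illustration of that "read until EOF, then close" pattern; it is not Tigase code. The helper name `drain_and_close` is hypothetical, and the usage example uses `socket.socketpair()` as a stand-in for a real client/server TCP connection.

```python
import socket


def drain_and_close(conn: socket.socket, bufsize: int = 4096) -> bytes:
    """Read whatever the peer still sends, then close our end of the socket.

    Hypothetical helper for illustration only.
    """
    received = bytearray()
    try:
        while True:
            chunk = conn.recv(bufsize)
            if not chunk:
                # An empty read means the peer has shut down its sending side.
                break
            received.extend(chunk)
    finally:
        # On a TCP socket, this close is what moves it out of CLOSE_WAIT.
        conn.close()
    return bytes(received)


if __name__ == "__main__":
    # socketpair() stands in for an accepted client connection.
    client, server = socket.socketpair()
    client.sendall(b"</stream:stream>")  # last data sent before the client leaves
    client.close()
    print(drain_and_close(server))
```

If the `close()` call is never reached, for example because the handler is still waiting for something else, the descriptor stays open and the socket remains in CLOSE_WAIT.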

Sam Wright commented 10 years ago

This appears to be more of a leak. Over time the number of CLOSE_WAIT connections grows. Eventually this will eat up file descriptors, and these consumed ports seem to be inactive and extraneous. This is definitely not a trend I want to continue.

I've been able to track a source ip:port combination that has been on a server for multiple days. So this doesn't seem to be a delay in closing; the connections are simply not being closed at all.

I've also found that the close_wait timeout is set to 60 seconds.
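One way to do this kind of tracking is to compare two `netstat -an` snapshots taken some time apart and keep only the remote endpoints that are in CLOSE_WAIT in both. The following is a rough Python sketch under some assumptions: the function names are made up for illustration, the input is Linux-style `netstat -an` text (columns: proto, recv-q, send-q, local address, foreign address, state), and the addresses in the example are placeholder documentation addresses, not data from this server.

```python
def close_wait_peers(netstat_output: str, port: int = 5290) -> set:
    """Return the remote ip:port endpoints in CLOSE_WAIT on the given local port."""
    peers = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, unix sockets, and anything without the six TCP columns.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        # The local port is whatever follows the last ':' (also covers tcp6 lines).
        if state == "CLOSE_WAIT" and local.rsplit(":", 1)[-1] == str(port):
            peers.add(foreign)
    return peers


def persistent_peers(earlier: str, later: str, port: int = 5290) -> set:
    """Endpoints that were in CLOSE_WAIT in both snapshots."""
    return close_wait_peers(earlier, port) & close_wait_peers(later, port)


if __name__ == "__main__":
    # Made-up sample snapshots; real input would come from `netstat -an`.
    monday = (
        "tcp 0 0 10.0.0.5:5290 203.0.113.7:51234 CLOSE_WAIT\n"
        "tcp 0 0 10.0.0.5:5290 198.51.100.2:40000 ESTABLISHED\n"
    )
    wednesday = (
        "tcp 0 0 10.0.0.5:5290 203.0.113.7:51234 CLOSE_WAIT\n"
        "tcp 0 0 10.0.0.5:5290 198.51.100.9:40100 CLOSE_WAIT\n"
    )
    print(persistent_peers(monday, wednesday))
```

An endpoint that shows up in snapshots taken days apart is a candidate for a leaked socket rather than a delayed close, although a client that reconnects from the same ip:port could in principle produce the same result.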

net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60

Artur Hefczyc commented 10 years ago

You are correct. Multiple days is certainly not a delay in closing. Tigase normally does not close a socket if it "thinks" a user is connected to the server and is using the socket. For the XMPP protocol it is quite common to have users connected to the server for several days. But this is typical of standard XMPP connections, not web connections such as BOSH or WebSockets.

We are involved in several projects using Tigase at large scale, with millions of online users, but WebSockets are quite new and probably none of them use WebSockets extensively. We will investigate the problem to see what we can do about it.

Andrzej, please take a look at it. What would be the best way to handle the issue?

Sam Wright commented 10 years ago

Andrzej, feel free to add to this ticket if you need anything from my end.

Andrzej Wójcik (Tigase) commented 10 years ago

I tried to replicate this issue with WebSocket connections not being properly closed (CLOSE_WAIT).

Using an older version, 5.3.0-SNAPSHOT from the end of June 2014, I was able to get TIME_WAIT after the connection was closed on the client side. Since I expect this might be similar to your issue, I tried to replicate it on the newest Tigase XMPP Server, 7.0.0-SNAPSHOT, but I was not able to. As we improved the WebSocket support implementation in 7.0.0-SNAPSHOT, I suppose this was fixed by one of those changes.

For further analysis, I would suggest checking the Tigase XMPP Server logs for exceptions that may point to the cause of this issue in Tigase XMPP Server 5.2.1.

Sam Wright commented 9 years ago

I haven't verified with Tigase 7.0 yet and I'm still having the issue. However, I no longer think it is a performance issue as I once did. This ticket can be closed. (Or at least remove me as the assignee; I'm getting tons of emails about a due date.)

Type
Bug
Priority
Normal
Assignee
RedmineID
2645
Spent time
6h
Reference
tigase/_server/server-core#424