MUC occupant list over BOSH (#393)
Closed
Robert Larsen opened 10 years ago
Due Date: 2014-11-14

I am seeing problems with busy MUCs over BOSH. This is what I see:

First test:

  1. I join an empty room with a BOSH based client (I have used Strophe.js and a gloox based one).

  2. I then connect 300 users directly on port 5222 in quick succession (using a node.js based client with node-xmpp) and have them join the same room.

  3. The BOSH based client now thinks there are around 100-160 occupants in the room.

Second test:

  1. I connect 300 users directly on port 5222 and have them join the room.

  2. I connect using a BOSH based client (Strophe.js and gloox) and join the room.

  3. The occupant list is sent 15 items at a time with a delay between batches. Sometimes I get my own presence after about 30 seconds, but before the entire occupant list has been received; other times it never shows up.

Third test:

  1. I connect 300 users directly on port 5222 and have them join the room.

  2. I connect using a WebSocket based client (Strophe.js) and join the room.

  3. In less than a second I have an occupant list with 301 entries as expected.

Fourth test:

  1. I join an empty room with a WebSocket based client.

  2. I then connect 300 users directly on port 5222 in quick succession.

  3. The WebSocket based client now has an occupant list with 301 entries as expected.

BOSH keeps failing in this regard. I am running Tigase from the dev branch: Tigase ver. 5.3.0-SNAPSHOT-b3662/c99b076f (2014-09-16/09:42:49)

I could not choose this in the "Target version" or "Applicable version" box so I chose the git/master one.

I have attached a tcpdump of a Strophe.js based BOSH session in which 115 occupant presences were received, followed by my own, although there were 300 users in the room.

300.pcap
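
For anyone reproducing the tests above, the following is a minimal sketch of how a client-side check could count occupants from the BOSH response bodies and notice when the self-presence arrives early. It relies on XEP-0045 marking the joining user's own presence with status code 110 and sending it after the presences of the existing occupants. The function name `summarize_join`, the room JID used in the usage note, and the assumption that each stanza carries an explicit `jabber:client` namespace are illustrative and not taken from the capture.

```python
import xml.etree.ElementTree as ET

BOSH_NS = "http://jabber.org/protocol/httpbind"
MUC_USER_NS = "http://jabber.org/protocol/muc#user"
CLIENT_NS = "jabber:client"  # assumption: stanzas declare this namespace explicitly


def summarize_join(bodies, room_jid):
    """Return (occupant_count, self_presence_seen, new_occupants_after_self).

    `bodies` is an iterable of BOSH <body/> wrappers as strings, in the
    order they were received. `room_jid` is the bare JID of the room.
    """
    occupants = set()
    self_seen = False
    after_self = 0
    status_path = f"{{{MUC_USER_NS}}}x/{{{MUC_USER_NS}}}status"

    for body in bodies:
        root = ET.fromstring(body)
        if root.tag != f"{{{BOSH_NS}}}body":
            continue  # not a BOSH wrapper, ignore it
        for presence in root.findall(f"{{{CLIENT_NS}}}presence"):
            sender = presence.get("from", "")
            if not sender.startswith(room_jid + "/"):
                continue  # presence from somewhere other than this room
            if presence.get("type") == "unavailable":
                occupants.discard(sender)
                continue
            if self_seen and sender not in occupants:
                # A previously unseen occupant showed up after our own presence.
                after_self += 1
            occupants.add(sender)
            codes = {status.get("code") for status in presence.findall(status_path)}
            if "110" in codes:
                self_seen = True

    return len(occupants), self_seen, after_self
```

Feeding the function every body from one session gives the number of distinct occupants seen and whether the self-presence ever arrived. A non-zero third value only means that new occupant presences followed the self-presence; in the first and fourth tests that is expected because users keep joining, while in the second test it would suggest the initial list was still incomplete when the self-presence arrived.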

Artur Hefczyc commented 10 years ago

Robert, BOSH is unreliable "by design", especially under high traffic. This is because of the way it works: a new, separate TCP/IP connection for each request. With 300 stanzas sent to a web client, you have 300 TCP/IP connections opened and closed in a very short period of time. This leads to many problems. Of course, each web browser and each JS client behaves differently and presents different problems. We try to optimize this by pushing multiple stanzas within one connection (batching mode), but that is not always possible and does not fully solve the problem.
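
As a rough illustration of the batching idea, the sketch below groups stanzas into BOSH `<body/>` wrappers so that one HTTP response carries several stanzas. This is not Tigase's implementation; the function name `batch_into_bodies` is hypothetical, and the default batch size of 15 is taken only from the behaviour reported in the second test. A real wrapper would also carry session attributes, which are omitted here.

```python
BOSH_NS = "http://jabber.org/protocol/httpbind"


def batch_into_bodies(stanzas, batch_size=15):
    """Group serialized stanzas into BOSH <body/> wrappers.

    Each returned string stands for the payload of one HTTP response.
    `batch_size` is an assumed limit, not a documented server setting.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    bodies = []
    for start in range(0, len(stanzas), batch_size):
        chunk = stanzas[start:start + batch_size]
        bodies.append(f"<body xmlns='{BOSH_NS}'>" + "".join(chunk) + "</body>")
    return bodies
```

With 300 occupant presences and 15 stanzas per wrapper this produces 20 responses rather than 300, so the number of HTTP round trips depends heavily on how many stanzas fit into each response.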

In most cases we have investigated in the past, the problem was on the client side, either in the web browser or in the JS client. Of course, in the case you present, I am not certain whether the problem is on the client or the server side; we would need to investigate it and run some tests. Unfortunately, this is a very time-consuming task and I cannot promise to look at it very soon.

Robert Larsen commented 10 years ago

No, BOSH is stable as a rock. We've been using it for years with ejabberd and it just works, even though ejabberd only sends ONE stanza per request. However, ejabberd has other issues which force us to move to something else. The clients I tested with have also been tested against ejabberd, where they work perfectly.

With the "Connection: keep-alive" header you can send multiple HTTP requests on the same TCP connection, so you do not need 300 connections, and Tigase even supports this. If you take a look at the dump file I attached, you will see seven TCP streams.
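
The keep-alive point can be checked with a small, self-contained sketch that uses only the Python standard library. It starts a throwaway local HTTP/1.1 server, sends a number of POST requests over a single `http.client.HTTPConnection`, and counts how many distinct client ports the server saw. The handler, the `/http-bind` path, and the empty `<body/>` payloads are placeholders for illustration and do not reproduce a real BOSH session or Tigase's behaviour.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # required for persistent connections
    seen_ports = set()             # client source ports observed by the server

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        EchoHandler.seen_ports.add(self.client_address[1])
        payload = b"<body xmlns='http://jabber.org/protocol/httpbind'/>"
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(request_count):
    """Send `request_count` POSTs on one client connection and return the
    number of distinct TCP connections the server observed."""
    EchoHandler.seen_ports = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        for _ in range(request_count):
            conn.request("POST", "/http-bind", body=b"<body/>",
                         headers={"Connection": "keep-alive"})
            conn.getresponse().read()  # drain the response before reusing the socket
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(EchoHandler.seen_ports)
```

Calling `count_connections(300)` exercises 300 request/response pairs; as long as neither side closes the socket in between, the server should report a single client port, which is the behaviour the keep-alive argument relies on.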

But if you don't think you can fix this soon I will try PunJab or some other connection manager.

Artur Hefczyc commented 10 years ago

Ok, in such a case we will look at it by the end of this week.

Wojciech, please take a look at the issue.

Robert Larsen commented 10 years ago

Awesome! Thanks.

wojciech.kapcia@tigase.net commented 10 years ago

Robert,

I've tried to replicate the issue but wasn't able to. I've looked over the code and did not notice any potential problems. Could you share your machine details, your configuration, and the server statistics after the tests (I assume you perform the tests in an isolated environment, so it should be possible to collect statistics without external influence)?

Robert Larsen commented 10 years ago

This was likely just a problem with our setup then. We have abandoned it in favor of MongooseIM.

Type: Bug
Priority: Major
Assignee: (none)
RedmineID: 2457
Version: tigase-server-8.0.0
Spent time: 27h
Reference: tigase/_server/server-core#393