Issue with XEP-0198: Stream Management (#674)
Vijay Kaimal opened 9 years ago
Due Date: 2016-05-12

Hi,

I am using the nightly build from Jan 20, 2016 and have enabled XEP-0198: Stream Management on the server. Everything seems to work fine except in one scenario, described below:

  1. User A sends a message to User B who is offline.

  2. Then User A goes offline before User B comes on-line.

  3. User B comes on-line and receives the message. A message received acknowledgement is sent to User A.

  4. But as User A is offline at this stage, the message received acknowledgement from User B is never received by User A.

  5. Because of this, the message delivery status is not updated for User A.

Is there any solution or workaround for this issue?

Tigase.zip Tigase_May10_Build.zip

wojciech.kapcia@tigase.net commented 9 years ago

Vijay Kaimal wrote:

Hi,

I am using nightly build from Jan 20,2016 […]

You are testing a nightly that is more than 3 months old. Can you check the scenario with the latest nightly?

Vijay Kaimal commented 9 years ago

I tried installing the nightly build from May 1, but I am not able to start the XMPP server (I tried both console mode and Windows service mode). Details:

OS : Windows Server 2008 R2, 64 bit

Database : MS SQL

Java : JDK 8, 64 bit

The log files and the init.properties file are attached. I am not able to proceed because of this issue.

Andrzej Wójcik (Tigase) commented 9 years ago

In the wrapper.log file, there is the following entry:

INFO   | jvm 1    | 2002/01/17 07:34:05 | WrapperSimpleApp Error: Encountered an error running main:
INFO   | jvm 1    | 2002/01/17 07:34:05 | WrapperSimpleApp Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

which means that too little memory is assigned to the JVM running Tigase XMPP Server.

Vijay Kaimal commented 9 years ago

I have deployed Tigase on a server that has 4 GB of RAM. From the time I started Tigase until it stopped (within 1-2 minutes), memory usage never went above 900 MB, and CPU usage was not excessive either.

I have successfully installed older versions of Tigase Nightly builds in this environment, so I am confused.

Andrzej Wójcik (Tigase) commented 9 years ago

The maximum heap size used by the JVM is configured in wrapper.conf. We ship Tigase XMPP Server with a default configuration that allocates a small amount of memory (500MB by default) in the following line of the wrapper.conf file:

wrapper.java.maxmemory=500

In your case you may need to configure more memory because of the components you have enabled, so you may need to change this line.
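As an illustration of the change suggested above, the edited line might look like the fragment below. The value 1024 is only a placeholder assumption, not a figure recommended anywhere in this thread; the property takes a size in MB.

```
# wrapper.conf: maximum JVM heap size, in MB
# 1024 is an illustrative value only; size it to your deployment
wrapper.java.maxmemory=1024
```

The service has to be restarted for the new limit to take effect.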

As far as I can remember, not so long ago we changed the default number of database connections from 10 to 4 * number_of_cpus (in your case this is 32). Because of this change, it is possible that the memory usage of recent versions has increased - #3856.

Wojtek - is this an expected result of this change?

wojciech.kapcia@tigase.net commented 9 years ago

That is quite possibly the cause of the issue. In the meantime we've also changed the defaults in etc/tigase.conf to actually use the JVM defaults, as they should be better in most cases (and to use explicit configuration only for production use) - see #3567 and commit:91393c99bf3787c0224f01768e404a4a8d339382. Keeping that in mind, I've adjusted the defaults in wrapper.conf to match the settings used in tigase.conf.

Vijay Kaimal commented 9 years ago

I have taken the nightly from May 10, 2016 (tigase-issue #7.1.0-SNAPSHOT-b4218) and the server seems to be working properly.

I assumed that the scenario I described at the start of this report could be fixed by enabling offline storage of message receipts (messages without a body) using this configuration:

sess-man/plugins-conf/amp/msg-store-offline-paths[s]=/message/received[urn:xmpp:receipts],/message/store-offline

But this does not seem to work. Message receipts intended for an offline user are not getting stored in the Offline table. Could you please take a look?

Please note that I have also enabled Stream Management. Will that cause any issue?

The init.properties and log files are attached.

Andrzej Wójcik (Tigase) commented 9 years ago

The following configuration:

sess-man/plugins-conf/amp/msg-store-offline-paths[s]=/message/received[urn:xmpp:receipts],/message/store-offline

enables storage of messages without a body element, but only if the message contains a <received xmlns="urn:xmpp:receipts"/> or <store-offline/> element.
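The rule described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Tigase's implementation: the helper names (matches_path, would_store_offline) are hypothetical, each path segment is read as name[xmlns], and a message with a body is assumed to be stored regardless of the configured paths.

```python
import re
import xml.etree.ElementTree as ET

# One path segment: an element name, optionally followed by [xmlns].
# Assumption: namespaces containing "/" are not handled by this sketch.
SEGMENT = re.compile(r"^([^\[\]]+)(?:\[([^\]]*)\])?$")

def name_and_ns(elem):
    # ElementTree encodes namespaced tags as "{ns}name".
    if elem.tag.startswith("{"):
        ns, name = elem.tag[1:].split("}", 1)
        return name, ns
    return elem.tag, None

def matches_path(root, path):
    """Return True if the stanza contains the element chain named by path."""
    segments = [s for s in path.split("/") if s]
    current = [root]
    for depth, seg in enumerate(segments):
        name, ns = SEGMENT.match(seg).groups()
        # The first segment is tested against the root, later ones against children.
        candidates = current if depth == 0 else [c for e in current for c in e]
        current = [e for e in candidates
                   if name_and_ns(e)[0] == name
                   and (ns is None or name_and_ns(e)[1] == ns)]
        if not current:
            return False
    return True

def would_store_offline(root, paths):
    """Assumed rule: store if there is a body, or if any configured path matches."""
    has_body = any(name_and_ns(c)[0] == "body" for c in root)
    return has_body or any(matches_path(root, p) for p in paths)

PATHS = "/message/received[urn:xmpp:receipts],/message/store-offline".split(",")

receipt = ET.fromstring(
    '<message to="usera@example.com">'
    '<received xmlns="urn:xmpp:receipts" id="m1"/></message>')
chat_state = ET.fromstring(
    '<message to="usera@example.com">'
    '<composing xmlns="http://jabber.org/protocol/chatstates"/></message>')
with_body = ET.fromstring('<message><body>hi</body></message>')

print(would_store_offline(receipt, PATHS))     # receipt matches the first path
print(would_store_offline(chat_state, PATHS))  # no body, no matching path
print(would_store_offline(with_body, PATHS))   # has a body
```

Under these assumptions, a receipt is only eligible for offline storage when the stanza that reaches the server really carries the received element in the urn:xmpp:receipts namespace, which is what the log check is trying to confirm.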

I reviewed your logs and was not able to find any message that matches this condition. Mostly I found messages with a <body/> element.

It would help if you could point me to the line in the log containing a message that was not stored but that you expected to be stored (or quote that part of the log you attached). If the message was delivered to the server, it should be in this log.

Daniele Ricci commented 9 years ago

I've been having delivery issues of message receipts too (release branch, commit f17d12a5), but I'm using the OfflineMessages module instead of AMP. I'll post some logs with my analysis soon.

Daniele Ricci commented 9 years ago

I noticed a considerable improvement in message reliability by fixing this code:

https://github.com/kontalk/tigase-server/commit/ed3750ec3994136c59e02d8c02bdfb04010138b1

I thought it would make more sense to actually set packets as processed only when clients are actually available for delivery.

Andrzej Wójcik (Tigase) commented 9 years ago

Daniele, first of all, please do not hijack threads. As already stated, the issue reported in this thread is most likely not related to StreamManagement!

The check you changed, and its result, were designed properly. It is there to handle messages sent to a bare JID and to deliver these messages only to resources connected after the message was originally processed, to rule out redelivery of the message to resources to which it should already have been delivered.

Now, in your version, a message which may have been properly delivered to another resource (as sessionsForMessageDelivery is not empty; the check is a few lines above your change) will be redelivered to all connected resources. This is something which should not happen.

Daniele Ricci commented 8 years ago

Andrzej Wójcik wrote:

Daniele, first of all, please do not hijack threads. As already stated, the issue reported in this thread is most likely not related to StreamManagement!

I apologize. It was not my intention.

The check you changed, and its result, were designed properly. It is there to handle messages sent to a bare JID and to deliver these messages only to resources connected after the message was originally processed, to rule out redelivery of the message to resources to which it should already have been delivered.

Now, in your version, a message which may have been properly delivered to another resource (as sessionsForMessageDelivery is not empty; the check is a few lines above your change) will be redelivered to all connected resources. This is something which should not happen.

I'm sorry but I still don't understand how it would work. Consider this scenario:

  1. Alice sends a message to Bob's full JID

  2. But Bob has a broken connection and he never receives the message (or he does, but the server doesn't get the ack, which amounts to the same thing)

  3. Bob notices the broken connection (the server still has not) and reconnects

  4. Server disconnects the old client for conflict (because Bob is using the same resource)

  5. At this point, the message is marked with a delivery-error, stamped with the time of point 5, and begins its redelivery process

  6. Message is stored to offline storage (including the delivery-error element) because no connection is available to handle it yet

  7. Bob sends the new available presence for the connection started at point 4, and the server redelivers messages from offline storage

  8. C2SDeliveryErrorProcessor sees that the timestamp of the delivery error is 5 but the only available connection was started at 4, and therefore discards the message

At point 9, C2SDeliveryErrorProcessor has returned true anyway, even though no message was delivered. The message is now lost.
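The timeline above can be modelled with a short Python sketch. This is not the actual C2SDeliveryErrorProcessor code: the Session class and sessions_for_redelivery function are hypothetical names, the step numbers stand in for timestamps, and the comparison (keep only sessions created after the delivery-error stamp) is an assumption based on the description of the check earlier in this thread.

```python
from dataclasses import dataclass

@dataclass
class Session:
    resource: str
    created_at: int  # logical time at which the connection was established

def sessions_for_redelivery(sessions, error_stamp):
    """Keep only sessions created after the delivery-error stamp (assumed rule)."""
    return [s for s in sessions if s.created_at > error_stamp]

# Bob's replacement connection starts at step 4;
# the delivery-error is stamped at step 5.
new_session = Session(resource="phone", created_at=4)
error_stamp = 5

eligible = sessions_for_redelivery([new_session], error_stamp)
print(eligible)  # prints [] : the only live session predates the stamp
```

With these values no session qualifies, so if the processor still reports the packet as handled, nothing is delivered and nothing is left in offline storage, which is the loss described in the scenario.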

Is my reasoning right?

P.S. Since we are going off-topic, we can move this discussion to another thread if you want it to continue.

Thanks

EDIT

Another note on this. If what I said in my comment is right, my modification is not a complete fix anyway. My patch allows the lost message to be redelivered at the next connection, which I don't like either, by the way; it's a dirty and temporary workaround until I can find a better solution.

Andrzej Wójcik (Tigase) commented 8 years ago

Daniele, please report this case in a separate bug report. I think I have a solution for this and a few ideas; however, I would like to keep you informed about progress, and that may be difficult if you do not report it as a separate bug.

Daniele Ricci commented 8 years ago

I've opened #4380 before I found out about #4262, sorry. Since the issue seems to be fixed, please close #4380 and forgive my rush. I'll do further tests and provide more feedback if needed. Thanks!

Andrzej Wójcik (Tigase) commented 8 years ago

Closing due to inactivity/lack of response.

Type: Bug
Priority: Major
Assignee:
RedmineID: 4142
Version: tigase-server-7.1.0
Spent time: 3h 45m
Reference: tigase/_server/server-core#674