Received send messages as incomming (#69)
Closed
wojciech.kapcia@tigase.net opened 7 years ago

While trying to figure out how to connect to tigase.im/sure.im I've run into weird problem:

  • I was visible as online in Psi from different account (subscription: both)

  • I was "offline" in iOS messanger (reported by app)

  • I could receive messages from other client

  • when sending messages they... got stuck

  • above messages got delivered back to me after a couple of minutes.

--- EDITED ---

  • Accounts reported account as connected...

ios-weird-chat.jpg ios1.jpg ios2.jpg

Andrzej Wójcik (Tigase) commented 7 years ago

Do you have any way to tell me how I can replicate this issue?

Is it still happening? Or it happened when we were having issues with tigase.im service being slow and it was happening on one of account hosted there?

I think that there was some issue on a server side as the app was not knowing it was "online" while it received messages. At least this is what I think from your description.

In my opinion, this is what happened here. You sent messages which were slowly processed on the server side. Then finally server decided to bounce them back with an error type, but without modified body element. Then client, when it received this message, stored it as new ignoring its type as it was a plain incoming message. Normally server appends some content to body element to notify user that there was an error, but it looks like it did not in this case.

I wonder what we should do in this case? As I may be not able to identify which message was not delivered as it may have body changed and no id.

%bmalkow What Android Messenger does in case it receives message with type @error@?

Bartosz Małkowski commented 7 years ago

Android Messenger will display received error message on red background and with description of error.

wojciech.kapcia@tigase.net commented 7 years ago

Andrzej Wójcik wrote:

Do you have any way to tell me how I can replicate this issue?

Is it still happening? Or it happened when we were having issues with tigase.im service being slow and it was happening on one of account hosted there?

I think that there was some issue on a server side as the app was not knowing it was "online" while it received messages. At least this is what I think from your description.

I was trying to reproduce it without success.

It's quite possible, however this was addressed to other contact (not on iOS) so it should simply deliver the message. It may have happened there was some problem with the slow delivery and it resulted in the bounced message (with a hidden error - is there a log?) and the type of the message was ignored.

I think that we could, if the message type is error either try to match it to the send messages (based on id if present?) and if not display it in a manner similar to Android Messenger.

Another possibility - does the push component send full message in the push payload or only count of the elements? Maybe it was somehow received from push (given that I wasn't connected)? and displayed?

Andrzej Wójcik (Tigase) commented 7 years ago

%wojtek Push notifications contain message up to 1000 chars, so it is possible that you have seen "messages" which were delivered by notifications and your account was connected but not authorized yet. Messenger, in this case, will still report it as Not connected and if it somehow tried to send a message then the server could bounce it back with not-authorized error.

Andrzej Wójcik (Tigase) commented 7 years ago

This took me a while as it occurred that I needed to implement a new way to handle error messages (direct responses and errors from the archive). During a discussion with Wojtek, we decided to mark messages which delivery failed if possible and display a plain error message if it is not possible to find a message which caused this error. Also, it was suggested to add an option to allow the user to easily resend message which delivery failed which I implemented as well.

Wojtek, this is a whole new mechanism now to handle this error which needs to be tested. I've tested and in my opinion, it works fine, but we should check more extensively handling of the received error message for 1-1 chats and MUC. Also, we need to verify that error messages stored in message archive are displayed properly when retrieved from the archive (synchronization of messages).

It will be part of a next build.

wojciech.kapcia@tigase.net commented 7 years ago

The issue is... weird... I've updated messanger to the latest build (8) and tried to test it with the same contact/chat. After opening the chat contact was marked as online (my second account on tigase.org even though I was online). I've send one message and it was displayed as received(!) without error with old date (29/8 21:36, actual time is 2017-09-07 19:30), next messages were displayed as mine (in blue), but still with old date (29th), after three messages I started getting error messages (in red) with "S2s desitnation not faound" and those were the 3 new messages I've send (still time in the past), after those three next messages were the ones which I typed for which I received error (so basically what I've typed was displayed with 3 message shift) but now with correct time.

There are a couple factors to consider (which I've realised only now):

  • my tigase.org account has privacy list (but the error should be different);

  • I had enabled push on my account in iOS (but the account was inactive);

  • messages were send from new tigase.im account.

Description of screenshots:

  • in the ios1:

    • sbhgnm - is the last message from the couple of days ago;

    • xxx - this is the first message from today;

    • ggg, ffv, sbhgnm - these are 3 messages send the other day which appeard after I tried sending new massages

    • next 3 errors (with the same message) appeared when I was trying to send new messages (ghj, fjjgh, fgb -- viva originality ;) )

  • ios2;

    • after sending yet another 3 messages the new ones from today (last listed above) finally showed with correct time
Andrzej Wójcik (Tigase) commented 7 years ago

%wojtek Quick question, have you tried killing an app and entering to the app once again to check if this could help?

I'm asking as it looks like a race condition within a caching mechanism used for data waiting to be displayed.

Andrzej Wójcik (Tigase) commented 7 years ago

I've found a few issues on your screenshots and fixed them. I think that all should work now.

wojciech.kapcia@tigase.net commented 7 years ago

I've been testing the application and couldn't reproduce the issue. But I think, that the problem may have been related to me using Privacy List on my tigase.org account (with allowing communication only from authenticated contacts) - this would explain the error, but this should be different error (not the s2s destination not found).

What's more - I was testing this more today and:

  • my account in psi correctly received service unavailable error when trying to send message to my tigase.org account (after removing subscription)

  • same account in iOS messenger didn't receive any response (just an assumption after not seeing any duplicate nor error message).

Andrzej Wójcik (Tigase) commented 7 years ago

I think that we have a different issue here. I suppose that there are some connectivity issues between some tigase.im and some tigase.org cluster nodes. Depending on the node from which you send a message over s2s from and to it works or returns s2s errors. S2S errors are usually delayed by a few minutes as Tigase tries to send the same message few times over S2S in case of a failure.

I suppose so, because, Psi+ does not support see-other-host so the server will be chosen at random or is pinpointed by you in configuration. This server s2s connectivity works fine.

Another cluster node to which see-other-host redirects you fails with s2s and so this is reproducible on a client which supports @see-other-host@.

Andrzej Wójcik (Tigase) commented 7 years ago

Please check my previous comment.

I've reviewed connectivity of sure.im cluster servers with tigase.org domain and found that orange.sure.im is damn slow. Establishing a connection to tigase.org took over 2 minutes and there was almost no load on the server (around 12:00 CEST).

I've looked into this issue and found that it takes a long time to get a response from DNS server. I've modified /etc/resolve.conf at orange.sure.im and it looks now like that:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 198.27.120.208
nameserver 8.8.8.8
#nameserver 192.99.15.31
#nameserver 213.186.33.99
search tigase.org

With this change, DNS resolution is very fast and establishing connection to tigase.org from tigase.im took less than 7 seconds instead of over 2 minutes before this change!

I've already made a similar fix on orange.sure.im a few months back but it got lost somehow. We need to make sure it is permanent this time.

This DNS issue could lead exactly to behavior which you see. As when a lot of requests for DNS are sent they are limited to a number of threads used and DNS resolution is a blocking task which leads to packet queuing. S2S connection has timeout of 15 minutes in which it needs to be established and in another case it will return error @s2s destination not found@. In case of slow DNS resolution and increasing queue, it may happen to be very slow, ie. if single resolution takes 5 second then 180 resolutions on a single thread will deny any possibility to establish S2S connectivy!

I suppose that this is our issue which you found during this tests.

Please try to execute some tests once again with my fixes for DNS at orange.sure.im applied.

wojciech.kapcia@tigase.net commented 7 years ago

Andrzej Wójcik wrote:

I've looked into this issue and found that it takes a long time to get a response from DNS server. I've modified /etc/resolve.conf at orange.sure.im and it looks now like that:

[...]

With this change, DNS resolution is very fast and establishing connection to tigase.org from tigase.im took less than 7 seconds instead of over 2 minutes before this change!

I've already made a similar fix on orange.sure.im a few months back but it got lost somehow. We need to make sure it is permanent this time.

%Eric - please make sure that Andrzej's changes to resolv.conf are permanent and won't change accidentally it happened this time.

Eric Dziewa commented 7 years ago

I've updated resolv.conf on blue, green, and orange with the changes.

The file cannot be modified unless you remove the immutable flag with chattr -i.

wojciech.kapcia@tigase.net commented 7 years ago

Thank you Andrzej and Eric.

It seems to be working now and I can't reproduce it anymore so I consider it resolved.

issue 1 of 1
Type
Bug
Priority
Normal
Assignee
RedmineID
6035
Version
Version 2.0
Issue Votes (0)
Watchers (0)
Reference
tigase/_clients/siskin-im#69
Please wait...
Page is in error, reload to recover