Messages never received (#55)
Closed
Artur Hefczyc opened 7 years ago

Two clients are running on two devices, each connected to tigase.org with push enabled. The accounts are subscribed to each other, and I can see status changes.

When I kill the client on one device, the status of that account changes to away on the other device. When I then send messages to the account with the killed client, they are never received. There is no push notification either.

Even if I reopen the client that was killed, messages sent in the meantime are never received.

Andrzej Wójcik (Tigase) commented 7 years ago

%kobit Could you clarify how you "killed" the application? When you kill the application, the status should change to "unavailable", not "away".

How long did you wait for the notification, or for the message to be redelivered once you reconnected?

Was any other client connected to the same account?

Have you enabled push on the client side? This needs to be done manually. I suppose you did, but I just want to be sure.

Artur Hefczyc commented 7 years ago

Andrzej Wójcik wrote:

%kobit Could you clarify how you "killed" the application? When you kill the application, the status should change to "unavailable", not "away".

The iOS way to kill: two taps on the home button, then slide the client off. That kills it, doesn't it?

How long did you wait for the notification, or for the message to be redelivered once you reconnected?

I sent messages within 15-30 seconds of killing the client. I waited 1-2 minutes for a notification, nothing happened, so I opened the client. The opened client initially showed all accounts as offline and connected all accounts within a few seconds. I waited for the messages and am still waiting; they never arrived. It's been 30 minutes or so. However, new messages I sent in the meantime were delivered correctly.

Was any other client connected to the same account?

No, only iOS client and only on this one device. I just registered a new account on tigase.org from this device.

Have you enabled push on the client side? This needs to be done manually. I suppose you did, but I just want to be sure.

Yes, I enabled push on the client for this account.

Andrzej Wójcik (Tigase) commented 7 years ago

%kobit

Artur Hefczyc wrote:

Andrzej Wójcik wrote:

%kobit Could you clarify how you "killed" the application? When you kill the application, the status should change to "unavailable", not "away".

The iOS way to kill: two taps on the home button, then slide the client off. That kills it, doesn't it?

Yes, this will kill the application, but it also starts the session resumption mechanism for 90 seconds. So messages sent within this window could be delayed by about 90 seconds.

If you leave the application running in the background, it will disconnect within 3 minutes as well. I mention this because I read somewhere that notifications are not delivered to applications which were killed manually on iOS. I'm not sure if this is true.

How long did you wait for the notification, or for the message to be redelivered once you reconnected?

I sent messages within 15-30 seconds of killing the client. I waited 1-2 minutes for a notification, nothing happened, so I opened the client. The opened client initially showed all accounts as offline and connected all accounts within a few seconds. I waited for the messages and am still waiting; they never arrived. It's been 30 minutes or so. However, new messages I sent in the meantime were delivered correctly.

This looks like an issue with the delivery of offline messages, or of messages which should be delivered to the offline user after stream resumption times out.

Was any other client connected to the same account?

No, only iOS client and only on this one device. I just registered a new account on tigase.org from this device.

The weird thing here is that you mentioned that you've seen this account as "away".

I will need to check the behavior when the client is killed on an iOS device.

Artur Hefczyc commented 7 years ago

I just looked at the device and the messages have been delivered as push notifications. I do not know exactly how long it took for them to be delivered. I will run more tests on Monday.

It looks, however, as if the messages were not lost, just delayed, waiting somewhere.

Andrzej Wójcik (Tigase) commented 7 years ago

%kobit Artur, I was doing some research on t2.tigase.org and I've seen that there was an issue with DNS: the gateway address for APNs was not resolvable. I also got delayed delivery of push notifications, but only when I manually killed the application. There was no delay if the application was left in the background.

Artur Hefczyc commented 7 years ago

Ok, I will do more testing on Monday. I guess killing the app is not the most common case. I did it because I wanted to quickly test push :-)

Artur Hefczyc commented 7 years ago

Is this issue with DNS fixed already? How do you suggest I test the push, considering I have 2 iOS devices to use?

Artur Hefczyc commented 7 years ago

One more thing. It seems like stream resumption may interfere with push notification delivery. I remember we talked about this, as you had already foreseen this problem, and the conclusion was to disable stream resumption when push notifications are enabled. Is this what the client is doing now?

Andrzej Wójcik (Tigase) commented 7 years ago

%kobit

Artur Hefczyc wrote:

Is this issue with DNS fixed already?

The issue there is related to the responses of the DNS servers which t2 and t6 are using. I'm not sure if we are using "our" DNS servers or servers provided by DNS. It is hard for me to tell right now. I remember that there was an issue with the DNS servers provided by our hosting at sure.im/tigase.im, and we switched to different servers.

How do you suggest I test the push, considering I have 2 iOS devices to use?

As you have 2 devices, you can try using them both with a single account and see how notifications work: whether they are delivered to both devices, whether they are properly discarded, etc.

Andrzej Wójcik (Tigase) commented 7 years ago

%kobit

Artur Hefczyc wrote:

One more thing. It seems like stream resumption may interfere with push notification delivery. I remember we talked about this, as you had already foreseen this problem, and the conclusion was to disable stream resumption when push notifications are enabled. Is this what the client is doing now?

There would be interference if stream resumption were used with a very long timeout. In that case, there would be no push notifications for messages not yet stored in the offline store, i.e. messages waiting in the stream queue for the stream to be resumed. They would be delivered only after stream resumption timed out.

To deal with that, I decided to use a different approach. We are still using stream resumption, but when the client is sent to the background I'm able to keep the app running for some time, usually up to 3 minutes. During this time messages are delivered over the XMPP stream. After that, I disconnect from the server (closing the TCP stream) and wait for push notifications to be delivered.

If the stream is broken while connected, stream resumption works as usual. However, if the client is not able to reconnect or is in the background, then push notifications will be delivered after stream resumption times out on the server side. Currently, this value is set to 90 seconds.

There is one downside to this mechanism. Every time the application is in the background for more than about 180 seconds, the XMPP stream is closed and, as a result, the user leaves MUC rooms. For now, I do not have a solution for this issue. It would be best to use a presence-less protocol for that, i.e. MIX.
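A minimal sketch of the timing described in this comment, to make the expected delays concrete. It assumes a simplified model that is not taken from the server code: while the stream is up, messages go over XMPP; once the connection is lost, anything sent during the resumption window is held until the window ends and then goes out as a push; anything sent later is pushed immediately. The function name and the tuple return shape are hypothetical; only the 90-second default comes from the thread.

```python
from typing import Tuple


def expected_delivery(
    sent_at: float,
    connection_lost_at: float,
    resumption_timeout: float = 90.0,  # server-side value quoted in this thread
) -> Tuple[str, float]:
    """Return (channel, earliest delivery time) for a message.

    All times are in seconds on the same clock. This is a simplified
    model of the behaviour described above, not the server's logic.
    """
    if sent_at < connection_lost_at:
        # Stream still alive: delivered directly over XMPP.
        return ("xmpp", sent_at)
    # Stream is gone: the server holds queued messages until stream
    # resumption times out, then falls back to a push notification.
    resumption_ends = connection_lost_at + resumption_timeout
    return ("push", max(sent_at, resumption_ends))
```

For example, under this model a message sent 20 seconds after the app is killed (connection lost at 0) would not be pushed before the 90-second mark, while a message sent 60 seconds into a 180-second background period would still arrive over XMPP.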

Artur Hefczyc commented 7 years ago

Andrzej Wójcik wrote:

%kobit

Artur Hefczyc wrote:

Is this issue with DNS fixed already?

The issue there is related to the responses of the DNS servers which t2 and t6 are using. I'm not sure if we are using "our" DNS servers or servers provided by DNS. It is hard for me to tell right now. I remember that there was an issue with the DNS servers provided by our hosting at sure.im/tigase.im, and we switched to different servers.

How do you suggest I test the push, considering I have 2 iOS devices to use?

As you have 2 devices, you can try using them both with a single account and see how notifications work: whether they are delivered to both devices, whether they are properly discarded, etc.

Well, I was asking more about how to create a situation when push notifications are generated. Right now, I have problems with this, as it is not straightforward to go offline on the mobile client and get the push notification right away.

Artur Hefczyc commented 7 years ago

%bmalkow Bartosz, I have added you to watchers to this ticket as we are discussing notifications delivery logic. We need the same behavior for both mobile clients, Android and iOS, so, while you are working on this, please follow the discussion and make sure your implementation in the Android client is the same as in the iOS client.

Of course, if you have some suggestions or ideas to share, please do not hesitate to do so.

Artur Hefczyc commented 7 years ago

I tried to test the push notifications and, to be honest, it is not that easy to actually get them. Additionally, there are app notifications which show up while the application is running (in the foreground or background). This is confusing, because it is difficult to tell whether a given notification is an app notification from the app running in the background or a push notification.

So here are my notes from testing the notifications/push notifications on the iOS client:

  1. Messages are delivered quickly if the app is still running in the foreground or in the background. This is good.

  2. When the client is put into the background, the account status changes to away; the client still receives messages and displays notifications.

  3. Some time after the client is sent to the background, the account shows as an offline contact on the other account, and this is when things stop working. I send messages to the account and nothing happens. There is no notification from the app and no push notification shows up. I waited for at least 12 minutes and nothing happened.

  4. I opened the client on a different device for the same account and still nothing happened.

  5. After 15 minutes or so, I opened the client on the first device, and once the client connected, all messages were received.

It looks to me like the stream resumption has quite a long timeout, during which, nothing is delivered, either through the XMPP connection or push notifications.

Here is what I would like changed from the current behavior:

  1. The most important thing is to deliver messages instantly, regardless of whether the client is running or not. That's the whole point of push, right? So we either need to disable stream resumption when push is ON or change the implementation so that a message is pushed right away when there is no connection with the client. This is the most important part of the implementation.

  2. When the application is in the foreground and in use, there is no need for notifications to show. If possible, please disable iOS-level notifications while the app is in the foreground. This is how Skype works, for example. (I tested more, and this is actually a problem only on the iPad. On an iPhone, notifications do not show while the client is in use.)

  3. If the iOS client runs on more than one device, a chat with a contact is kept in sync only if the client is active on both devices. If the client is not active on one device, then the chat is not kept in sync across devices. I know this goes beyond the message carbons specification. However, please think of a way (in future versions) we could keep chats in sync across devices. Maybe using UA?

Andrzej Wójcik wrote:

%kobit

Artur Hefczyc wrote:

One more thing. It seems like stream resumption may interfere with push notification delivery. I remember we talked about this, as you had already foreseen this problem, and the conclusion was to disable stream resumption when push notifications are enabled. Is this what the client is doing now?

There would be interference if stream resumption were used with a very long timeout. In that case, there would be no push notifications for messages not yet stored in the offline store, i.e. messages waiting in the stream queue for the stream to be resumed. They would be delivered only after stream resumption timed out.

To deal with that, I decided to use a different approach. We are still using stream resumption, but when the client is sent to the background I'm able to keep the app running for some time, usually up to 3 minutes. During this time messages are delivered over the XMPP stream. After that, I disconnect from the server (closing the TCP stream) and wait for push notifications to be delivered.

Ok, this is OK, as long as it works. I tested and can confirm that when the app is in background the account shows as away on the contact list of another account and messages are being delivered. However, once the contact status changes to offline, there are no messages delivered either through the XMPP or via Push.

If the stream is broken while connected, stream resumption works as usual. However, if the client is not able to reconnect or is in the background, then push notifications will be delivered after stream resumption times out on the server side. Currently, this value is set to 90 seconds.

I waited 15 minutes and nothing was received. Once the client reconnected (on the exact same device it was used on previously), messages were delivered right away. Before opening the client on this device, I tried opening the client for the same account on another device and nothing was delivered, so I guess stream resumption is set to a very long time on tigase.org.

Regardless, even if this is 90 seconds, it is still not acceptable. Messages must be delivered right away. Remember, this is our demo application/system which we will use for presentation purposes, so it must give a good impression and deliver messages instantly and flawlessly.

Of course, the preferred way is to use stream resumption AND have push notifications at the same time, as this would be the optimal solution for performance and resource usage. However, the first version may switch stream resumption off if necessary.

There is one downside to this mechanism. Every time the application is in the background for more than about 180 seconds, the XMPP stream is closed and, as a result, the user leaves MUC rooms. For now, I do not have a solution for this issue. It would be best to use a presence-less protocol for that, i.e. MIX.

Ok, MUC is not a big problem right now, in this case. And, sure, we can work on MIX implementation for our server and clients and switch to use MIX for our internal communication when this is ready.

Artur Hefczyc commented 7 years ago

I did more tests today. I sent a few messages to an account for which the iOS client was offline. After several hours there were no push notifications. I sent the messages at about 5PM and checked both mobile devices with a client for this account at 9PM. There was no notification on either of them.

However, once I opened the client on one of the devices, the messages were received just after the client connected.

Push notifications are activated on both devices. I wonder if this is a problem with our push implementation or with the DNS issue mentioned above. Andrzej, could you please let me know the domain names of the APN service? I can check DNS resolution for these domains on our tigase.org servers.

Andrzej Wójcik (Tigase) commented 7 years ago

%kobit I still see issues with DNS resolution on t2 and t6 for gateway.sandbox.push.apple.com. I think that we need to resolve these DNS issues before we will be able to test this feature, as currently it gives a really bad impression. I have it running on a different server, and without the DNS issue it works very well.

From what you describe, it looks like the client and the push component are working fine, but DNS resolution is an issue here.

I think that this would be a task for %Eric when he is back. For now, I modified the /etc/hosts files on t2 and t6 to point the APNs domain name at the APNs server IP addresses.

Artur Hefczyc commented 7 years ago

My first suspicion was that DNS issues were responsible.

I looked into this, and the entries you put into the /etc/hosts file are different from what I get from a DNS query.

tigase@t2:~$ cat /etc/hosts | grep gateway
17.188.136.189 gateway.sandbox.push.apple.com
17.188.137.58 gateway.sandbox.push.apple.com
17.188.137.190 gateway.sandbox.push.apple.com
17.188.132.189 gateway.sandbox.push.apple.com

But, when I query the server name I get a different set of IPs:

tigase@t2:~$ host gateway.sandbox.push.apple.com
gateway.sandbox.push.apple.com is an alias for gateway.sandbox.push-apple.com.akadns.net.
gateway.sandbox.push-apple.com.akadns.net has address 17.188.166.23
gateway.sandbox.push-apple.com.akadns.net has address 17.188.166.24
gateway.sandbox.push-apple.com.akadns.net has address 17.188.166.26
gateway.sandbox.push-apple.com.akadns.net has address 17.188.165.212
gateway.sandbox.push-apple.com.akadns.net has address 17.188.165.214
gateway.sandbox.push-apple.com.akadns.net has address 17.188.165.215
gateway.sandbox.push-apple.com.akadns.net has address 17.188.165.216
gateway.sandbox.push-apple.com.akadns.net has address 17.188.166.22

tigase@t2:~$ dig gateway.sandbox.push.apple.com

; <<>> DiG 9.8.1-P1 <<>> gateway.sandbox.push.apple.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17427
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 10, ADDITIONAL: 1

;; QUESTION SECTION:
;gateway.sandbox.push.apple.com.	IN	A

;; ANSWER SECTION:
gateway.sandbox.push.apple.com.	275 IN	CNAME	gateway.sandbox.push-apple.com.akadns.net.
gateway.sandbox.push-apple.com.akadns.net. 575 IN A 17.188.166.22
gateway.sandbox.push-apple.com.akadns.net. 575 IN A 17.188.166.23
gateway.sandbox.push-apple.com.akadns.net. 575 IN A 17.188.166.24
gateway.sandbox.push-apple.com.akadns.net. 575 IN A 17.188.166.26
gateway.sandbox.push-apple.com.akadns.net. 575 IN A 17.188.165.212
gateway.sandbox.push-apple.com.akadns.net. 575 IN A 17.188.165.214
gateway.sandbox.push-apple.com.akadns.net. 575 IN A 17.188.165.215
gateway.sandbox.push-apple.com.akadns.net. 575 IN A 17.188.165.216

;; AUTHORITY SECTION:
akadns.net.		164099	IN	NS	a28-129.akadns.org.
akadns.net.		164099	IN	NS	a7-131.akadns.net.
akadns.net.		164099	IN	NS	a18-128.akadns.org.
akadns.net.		164099	IN	NS	a3-129.akadns.net.
akadns.net.		164099	IN	NS	a1-128.akadns.net.
akadns.net.		164099	IN	NS	a13-130.akadns.org.
akadns.net.		164099	IN	NS	a11-129.akadns.net.
akadns.net.		164099	IN	NS	a12-131.akadns.org.
akadns.net.		164099	IN	NS	a5-130.akadns.org.
akadns.net.		164099	IN	NS	a9-128.akadns.net.

;; ADDITIONAL SECTION:
a5-130.akadns.org.	89363	IN	A	95.100.168.130

;; Query time: 1 msec
;; SERVER: 192.95.36.80#53(192.95.36.80)
;; WHEN: Wed Jul  5 18:38:55 2017
;; MSG SIZE  rcvd: 472

It seems that, depending on where you query from, you may get a different set of IPs.

Are there any command-line tools we can run to better investigate the problem?

In any case, the DNS query I ran returned results within 1ms, which seems OK to me.
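One way to check the mismatch noted above between the pinned /etc/hosts entries and live DNS answers is to diff the two sets. The following is a rough sketch only; the function names are hypothetical, and the hosts-file text and the list of resolved addresses are passed in as plain data (for instance copied from the `cat /etc/hosts` and `host` output shown in this comment).

```python
from typing import Iterable, Set


def pinned_addresses(hosts_text: str, hostname: str) -> Set[str]:
    """Collect the IP addresses that a hosts-file text maps to `hostname`."""
    pinned = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into "<ip> <name> [<alias>...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            pinned.add(fields[0])
    return pinned


def stale_pins(hosts_text: str, hostname: str, resolved: Iterable[str]) -> Set[str]:
    """Return pinned addresses that do not appear in the resolved set."""
    return pinned_addresses(hosts_text, hostname) - set(resolved)
```

With the four pinned entries and the eight addresses returned by `host` above, every pinned address would be reported as stale, since the two sets do not overlap. That alone does not prove the pinned addresses are unreachable, only that they differ from what this resolver currently returns.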

Andrzej Wójcik (Tigase) commented 7 years ago

%kobit

I executed queries on t2 and t6 at a time when resolution worked and added the results to the /etc/hosts file. From time to time DNS did not respond, or responded after 150ms with an empty response, and this was causing an issue. I modified the list of DNS servers on t2 to improve the performance of DNS resolution, so now it may work fine.

I'm not sure how to fix this DNS issue permanently. I think we should wait for %Eric and let him investigate it. However, I would like him to check the DNS configuration of all our servers, as I already made fixes to orange.sure.im a few months back when it had a similar issue.
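Because the failures described here are intermittent, a single `dig` run can easily look healthy. A small probe that resolves the name repeatedly and counts failed, empty, and slow answers may help. This is a sketch under assumptions: `probe` and `resolve_ipv4` are hypothetical names, the 150 ms threshold is taken from the comment above, and the default resolver goes through the system resolver via `socket.getaddrinfo`, which on a typical Linux setup also consults /etc/hosts, so the pinned entries would have to be removed temporarily to exercise DNS itself.

```python
import socket
import time
from typing import Callable, Dict, List


def resolve_ipv4(host: str) -> List[str]:
    """Resolve `host` to a sorted list of unique IPv4 addresses."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def probe(
    host: str,
    attempts: int = 20,
    slow_ms: float = 150.0,
    resolver: Callable[[str], List[str]] = resolve_ipv4,
    pause_s: float = 0.0,
) -> Dict[str, int]:
    """Resolve `host` repeatedly and count each attempt as exactly one outcome."""
    stats = {"ok": 0, "slow": 0, "empty": 0, "failed": 0}
    for _ in range(attempts):
        started = time.perf_counter()
        try:
            addresses = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            outcome = "failed"
        else:
            elapsed_ms = (time.perf_counter() - started) * 1000.0
            if not addresses:
                outcome = "empty"
            elif elapsed_ms >= slow_ms:
                outcome = "slow"
            else:
                outcome = "ok"
        stats[outcome] += 1
        time.sleep(pause_s)
    return stats
```

Calling `probe("gateway.sandbox.push.apple.com", attempts=100, pause_s=1.0)` on t2 and t6 would then give a rough failure rate to compare before and after changing the DNS server list.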

Artur Hefczyc commented 7 years ago

I have done more testing today and my impression is that Push notifications are not working for me at all.

I have my wife's devices (iPad and iPhone) with our client installed and both configured for the same account on tigase.org. Both have push enabled.

Client on both devices was not opened for about 30h, definitely over a day. I sent a few messages from my account on tigase.org to my wife's account and nothing happened. Waited 11 minutes and no push messages showed up.

Once I opened the client on the iPad, the app connected to our server and both messages were delivered.

This makes me wonder if I use/test the same client version as you do? I use the publicly released version 1.0 of our client. Maybe you use your development version with some changes which affect push functionality?

Andrzej Wójcik (Tigase) commented 7 years ago

Thank you for pointing me in the right direction. As you mentioned that you use the version from the AppStore and I'm using a locally deployed build, I replicated the issue you described on another device using the version from the AppStore. After analysis, I found the root cause of this issue on the server side.

APNs has two separate endpoints for pushing notifications: production and sandbox. Our push provider was pushing notifications to the sandbox environment, so notifications were delivered only to devices running a manually built application and not to those running the version installed from the AppStore. I checked that, with the production environment, notifications are sent only to applications installed from the AppStore and not to development devices, so I ended up implementing a fallback mechanism. When enabled, it lets the server try to deliver the notification to the production environment first; if that fails with an unknown device ID error, it tries the sandbox environment. I've added caching as well, so on subsequent deliveries the push component knows which environment to use.

This fallback, in fact, is only for testing devices like mine, but it is essential for development.
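The fallback with caching described above could look roughly like the following. This is an illustrative sketch, not the push component's code: `FallbackPusher`, `UnknownDeviceError`, and the `send(environment, token, payload)` callable are all hypothetical stand-ins for whatever actually talks to APNs.

```python
from typing import Callable, Dict

PRODUCTION = "production"
SANDBOX = "sandbox"


class UnknownDeviceError(Exception):
    """Raised by the (hypothetical) sender when an environment rejects the token."""


class FallbackPusher:
    """Try production first, fall back to sandbox, remember what worked."""

    def __init__(self, send: Callable[[str, str, dict], None]) -> None:
        self._send = send
        self._env_by_token: Dict[str, str] = {}  # cache: device token -> environment

    def push(self, token: str, payload: dict) -> str:
        # Start with the cached environment if there is one.
        order = [PRODUCTION, SANDBOX]
        if self._env_by_token.get(token) == SANDBOX:
            order = [SANDBOX, PRODUCTION]
        for env in order:
            try:
                self._send(env, token, payload)
            except UnknownDeviceError:
                continue  # this environment does not know the device
            self._env_by_token[token] = env
            return env
        # Neither environment accepted the token: forget any cached choice.
        self._env_by_token.pop(token, None)
        raise UnknownDeviceError(token)
```

With this shape, a development device costs one failed production attempt on its first notification only; later notifications go straight to the sandbox.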

Additionally, there was an issue with the FCM connector which led to the push component's main thread being locked, stopping delivery of notifications. It was solved by %bmalkow with a modification in Jaxmpp and by my change in the provider, which now waits only a very short time for the connection pool to return a connection.
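The general idea behind the second fix, waiting on a connection pool only for a bounded time so that the caller's thread cannot hang indefinitely, can be sketched as below. The actual change lives in the Java provider and is not shown in this thread; `ConnectionPool`, `PoolTimeout`, and the 50 ms default are hypothetical.

```python
import queue
from typing import Iterable


class PoolTimeout(Exception):
    """Raised when no connection became available within the allowed time."""


class ConnectionPool:
    """A tiny pool whose acquire() never blocks longer than `timeout_s`."""

    def __init__(self, connections: Iterable[object]) -> None:
        self._idle: "queue.Queue[object]" = queue.Queue()
        for connection in connections:
            self._idle.put(connection)

    def acquire(self, timeout_s: float = 0.05) -> object:
        try:
            # Bounded wait: give up instead of blocking the calling thread forever.
            return self._idle.get(timeout=timeout_s)
        except queue.Empty:
            raise PoolTimeout(f"no connection available within {timeout_s}s") from None

    def release(self, connection: object) -> None:
        self._idle.put(connection)
```

A caller that catches `PoolTimeout` can then report the failure or retry later, while the thread that dispatches other notifications keeps running.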

Fixes are already deployed at tigase.org installation.

Artur Hefczyc commented 7 years ago

Ticket is private so there is no need to mark individual comments as private.

I ran a few tests and we have success! A push notification was received! The first attempt was not a full success, as only one device received the push notification, but subsequent tests went well.

An interesting thing I noticed: push notifications are sent to all devices, but once the client is opened on one device and the messages are received, the push notifications disappear on the other device. Does this need any special coding, or does it work like this out of the box?

Andrzej Wójcik (Tigase) commented 7 years ago

Artur Hefczyc wrote:

Ticket is private so there is no need to mark individual comments as private.

I know but sometimes, later on, this is changed and I thought it would be better to make some of the comments private as well.

I ran a few tests and we have success! A push notification was received! The first attempt was not a full success, as only one device received the push notification, but subsequent tests went well.

An interesting thing I noticed: push notifications are sent to all devices, but once the client is opened on one device and the messages are received, the push notifications disappear on the other device. Does this need any special coding, or does it work like this out of the box?

This is not something that works out of the box, but I thought this is how it should work, and it was rather simple to implement. Moreover, even if you connect to the same account with a client that does not support push notifications (e.g. Psi, Messages.app, Adium), notifications on your devices will be dismissed as well. I assumed that once you are connected you have read the incoming messages, so there is no point in having to mark notifications as read on each device one by one.

However, for this to work, the XMPP server has to send a notification with the unread message count set to 0 when offline messages are retrieved from the server. Tigase XMPP Server does so.
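To illustrate the "count of unread messages set to 0" idea, here is a sketch of how a push component might turn an unread count into an APNs-style JSON payload. The `aps`, `badge`, and `alert` keys are standard APNs payload keys; the function name and the rule for when an alert is included are assumptions, and how the client reacts to a zero count (dismissing delivered notifications) is not described in this thread.

```python
import json
from typing import Optional


def build_payload(
    unread_count: int,
    sender: Optional[str] = None,
    body: Optional[str] = None,
) -> str:
    """Build an APNs-style JSON payload carrying the unread message count."""
    if unread_count < 0:
        raise ValueError("unread_count must not be negative")
    aps = {"badge": unread_count}
    if unread_count > 0 and body is not None:
        # Only show an alert when there is something new to read.
        aps["alert"] = {"title": sender or "", "body": body}
    return json.dumps({"aps": aps}, sort_keys=True)
```

A payload built with `build_payload(0)` carries only a zero badge and no alert, which is the kind of "nothing left to read" signal described above.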

Artur Hefczyc commented 7 years ago

Thank you for the explanation. This is really cool. There is one missing feature, though, but that's for a different time. I am thinking of a feature we called chat continuation, which synchronizes chats across devices. Right now, if I receive messages on the iPhone and then open the client on the iPad, I do not have the latest messages. This would be very useful.

Andrzej Wójcik (Tigase) commented 7 years ago

This can easily be done with MAM, which we have already implemented on the server side in 7.2.0. I was counting on implementing this feature but assumed it is not required for the initial client release. Artur, please create a task for the implementation and add it to the proper version so I will know when to implement it.

Referenced from commit 11 months ago
Type: Bug
Priority: Normal
Assignee:
RedmineID: 5754
Version: 1.1
Reference: tigase/_clients/siskin-im#55