Long presence processing time (#322)
Closed
Artur Hefczyc opened 1 decade ago
Due Date: 2014-09-03

After the server restart (all nodes) I noticed that there were no online buddies on the tigase.im nodes, while they showed up OK on the sure.im nodes. I thought this was a problem with the skip-offline feature, but after a long time online buddies showed up on all cluster nodes. A screenshot from Tigase Monitors shows a very long processing time in the presence plugin.

This is something to investigate soon, as a processing time of about 12 seconds for presence will kill our service.

The most likely cause of the long presence processing time is that the client either sends presence before loading the roster, or sends the roster get and the initial presence at about the same time. As a result the roster is loaded within a presence plugin pool thread, slowing down processing of all presences. We need some way to avoid loading the roster within a presence thread, or if this is not possible, we need to create a separate thread pool for initial presences which may trigger roster loading from the DB.

I assign this to Wojciech but anybody is welcome to throw ideas here.

Andrzej: you worked on presence processing some time ago, so maybe you have some suggestions.

Andrzej Wójcik (Tigase) commented 1 decade ago

Maybe it is not a good idea, but I would suggest loading the roster when we are binding the user's resource - the result stanza of bind could trigger that loading in a separate thread, so loading should be finished before the user asks for the roster or sends the initial presence.

Also, this may be related to the fact that we moved the database server from Europe to North America, so now our servers in Europe (*.tigase.im) are connecting to a database server in North America, which may slow them down a bit. Also, I have seen rather high CPU usage on our database server (db1.sure.im) - in top I have seen that the CPU was used mainly by mysql (up to 45-58%) and by sshd (spikes to over 60%).

Artur Hefczyc commented 1 decade ago

Andrzej Wójcik wrote:

Maybe it is not a good idea, but I would suggest loading the roster when we are binding the user's resource - the result stanza of bind could trigger that loading in a separate thread, so loading should be finished before the user asks for the roster or sends the initial presence.

This is not a bad idea, however, I am afraid that it may make things even worse.

The current code loads the roster in a more or less transparent way. If any plugin tries to access the roster, the code checks whether the roster is loaded into memory and serves it from memory if it is there; if not, it loads the roster from the DB into RAM and then serves it to the plugin.

So, what most likely happens right now is that a client sends an initial presence at the same time as the roster get request. Both packets are processed concurrently by Tigase and both try to access the roster at the same time, so we end up with 2 roster loads from the DB for each user - we generate twice the load ourselves. I am afraid that if we add logic to load the roster on resource bind we may end up with 3 roster loads from the DB for each user, as many clients (Psi) do not wait for a response from the server before sending the next request.

I think a better solution would be to modify the Presence plugin in such a way that it never loads the roster from the DB. The roster would be loaded from the DB only when a roster get IQ is received. This complicates the logic for the initial presence (and presence probe) broadcast in the case when an initial presence is received but the roster is not loaded from the DB yet: the broadcast would need to be postponed until the roster is loaded. So when we receive a roster get IQ stanza and the roster is loaded, the roster logic would need to check whether an initial presence was received from the user and whether it was broadcast. If not, it should trigger broadcasting the initial presence and probes.

Andrzej Wójcik (Tigase) commented 1 decade ago

Is there any point in the RFC saying that a client needs to retrieve the roster before sending the initial presence? I can think of some applications for which this might not be a good thing to do. But we could create some kind of cache for roster requests, so that other plugins would know about the initial request made, e.g. by a roster get request, and would wait for that original request to finish. This way we would have only one request to the database (based on a Map with Future instances as values).

But I also still think that this issue is currently caused by the slowly responding database, due to the distance between the servers.

Artur Hefczyc commented 1 decade ago

This was discussed on the Jabber mailing list a while ago. There are use cases where a client is generally not interested in loading the roster at all. In such cases it is possible that a client will never send a roster get request but may send an initial presence instead. So in those cases my proposed logic would not work.

Your suggestion is much better as it handles both use cases, but it is also more complex to implement. I was thinking of a relatively simple solution.

Maybe the solution is even simpler. I do not remember the code well enough, but maybe some kind of synchronization on the roster object (or the XMPPSession or XMPPResourceConnection object) would be good enough to prevent 2 or more plugins loading from the DB at the same time for the same user? If this is coded smartly, we could prevent loading the roster concurrently.

Andrzej Wójcik (Tigase) commented 1 decade ago

We can synchronize on XMPPSession or something like that, but it is easier to use Futures to execute the request and wait for the result. I can prepare an example solution or a cache-like implementation for these roster requests if you like.
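
For the record, a minimal sketch of what such a Future-based, coalescing roster-load cache could look like (the class name, the placeholder roster type and the loadFromDb supplier are made up for illustration; this is not the actual Tigase API):

	import java.util.Map;
	import java.util.concurrent.CompletableFuture;
	import java.util.concurrent.ConcurrentHashMap;
	import java.util.concurrent.ExecutorService;
	import java.util.concurrent.Executors;
	import java.util.function.Supplier;

	// Sketch only: coalesces concurrent roster load requests for the same user,
	// so the database is queried at most once per bare JID at a time.
	public class RosterLoadCache {

		private final Map<String, CompletableFuture<Map<String, String>>> pending =
				new ConcurrentHashMap<>();
		private final ExecutorService dbExecutor = Executors.newFixedThreadPool(4);

		// Every plugin asking for the roster of the same user gets the same Future;
		// only the first caller actually triggers the (hypothetical) database load.
		public CompletableFuture<Map<String, String>> getRoster(String bareJid,
				Supplier<Map<String, String>> loadFromDb) {
			return pending.computeIfAbsent(bareJid, jid ->
					CompletableFuture.supplyAsync(loadFromDb, dbExecutor)
							.whenComplete((roster, error) -> pending.remove(jid)));
		}
	}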

Artur Hefczyc commented 1 decade ago

Andrzej Wójcik wrote:

I can prepare an example solution or a cache-like implementation for these roster requests if you like.

Please do, I am curious to see it.

Andrzej Wójcik (Tigase) commented 1 decade ago

I think we were wrong about the same roster being loaded by more than one processor at the same time for the same user, as synchronization on XMPPResourceConnection is already in place:

	protected Map<BareJID, RosterElement> getUserRoster(XMPPResourceConnection session)
					throws NotAuthorizedException, TigaseDBException {
		Map<BareJID, RosterElement> roster = null;

		// The method can be called from different plugins concurrently.
		// If the roster is not yet loaded from the DB this causes concurrent
		// access problems
		synchronized (session) {
			roster = (Map<BareJID, RosterElement>) session.getCommonSessionData(ROSTER);
			if (roster == null) {
				roster = loadUserRoster(session);
			}
		}

		return roster;
	}

The only case which could still lead to the same roster being loaded by more than one thread is when 2 or more connections (XMPPResourceConnection) of the same user are being processed at the same time. To eliminate that we could move the synchronization from the session to session.getParentSession(), but I think this is less likely to happen.
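
If we ever wanted to cover that case too, the change would roughly be the following (a sketch only; in real code session.getParentSession() would have to be checked for null):

		// Sketch: synchronize on the shared parent session (XMPPSession) instead of
		// the per-connection XMPPResourceConnection, so two resources of the same
		// user cannot both trigger a roster load at the same time.
		synchronized (session.getParentSession()) {
			roster = (Map<BareJID, RosterElement>) session.getCommonSessionData(ROSTER);
			if (roster == null) {
				roster = loadUserRoster(session);
			}
		}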

Artur Hefczyc commented 1 decade ago

Ok, I am glad this is kind of resolved. I guess I do not remember the code well enough.

However, we can still see long processing times for presences, which causes queues to build up for the presence thread pool (plugin). Hence my assumption of double roster loading.

Therefore we need a solution: no DB access in the presence processing plugin, assuming that it is DB access which causes the long processing time. I am still assuming this is roster loading, but maybe it is something else.

Andrzej Wójcik (Tigase) commented 1 decade ago

I agree that this is most probably due to roster loading time, but I think it is related to slow responses from the database due to the distance between the servers.

Artur Hefczyc commented 1 decade ago

Yes, I agree. However, I think this is still an issue even if the DB is local. The only difference is that the problem is less apparent or will show up at higher load. We would just have a processing time of, let's say, 100ms instead of 10 seconds.

We did experience the problem even before, with the DB server on the local LAN. After a service restart, when all users reconnected, we had huge queues which were processed quite slowly. I thought that this was normal and expected, but now I think that if we removed DB processing from the presence plugin altogether it would be much better.

I think it is not a high priority problem, but it is something we should work on sooner rather than later.

Andrzej Wójcik (Tigase) commented 1 decade ago

But how do we deal with this issue? If we move roster loading from the DB to a separate thread, then we may have a situation in which we receive an initial presence and start processing it while the roster still needs to be loaded (a queue may build up on the roster loading threads), which may lead to other issues.

Artur Hefczyc commented 1 decade ago

I am not sure if this is a good idea but the implementation seems quite simple:

If Tigase receives an initial presence packet and the roster is still not loaded (we need to check this quickly with no blocking, so that the presence plugin cannot get stuck in a sync statement), then the presence plugin could generate a dummy IQ roster get request which would be processed by the roster plugin. Of course, when Tigase receives this dummy roster get request and the roster is already loaded, it would do nothing. Then the roster plugin would need to check whether the initial presence and probe broadcast was sent; if not, it should trigger sending it. The trigger could be, again, a dummy initial presence packet. I know that this additional traffic sounds silly, but this way we can still keep the code relatively clean, with a good separation of logic for each processor, and there would be no blocking in the presence plugin.
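
Very roughly, and leaving the real Tigase plugin API aside (the class and method names below are all made up for illustration), the flow I have in mind looks like this:

	import java.util.Map;
	import java.util.Queue;
	import java.util.Set;
	import java.util.concurrent.ConcurrentHashMap;
	import java.util.concurrent.ConcurrentLinkedQueue;

	// Sketch of the proposed handoff between the presence and roster processors.
	// None of these names correspond to the actual Tigase plugin API.
	public class PresenceRosterHandoffSketch {

		private final Map<String, Boolean> rosterLoaded = new ConcurrentHashMap<>();
		private final Set<String> pendingBroadcast = ConcurrentHashMap.newKeySet();
		// Stands in for the dummy "roster get" packets routed to the roster plugin.
		private final Queue<String> rosterPluginQueue = new ConcurrentLinkedQueue<>();

		// Called from the "presence plugin": must never block on the database.
		public void onInitialPresence(String bareJid) {
			if (!rosterLoaded.getOrDefault(bareJid, Boolean.FALSE)) {
				pendingBroadcast.add(bareJid);
				rosterPluginQueue.add(bareJid);   // trigger for the roster plugin
				return;
			}
			broadcastInitialPresenceAndProbes(bareJid);
		}

		// Called from the "roster plugin" after it has loaded the roster from the DB.
		public void onRosterLoaded(String bareJid) {
			rosterLoaded.put(bareJid, Boolean.TRUE);
			if (pendingBroadcast.remove(bareJid)) {
				broadcastInitialPresenceAndProbes(bareJid);
			}
		}

		private void broadcastInitialPresenceAndProbes(String bareJid) {
			System.out.println("broadcast initial presence and probes for " + bareJid);
		}
	}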

Any other idea is very welcome.

Andrzej Wójcik (Tigase) commented 1 decade ago

Yes, you are right that there will be no blocking in the presence plugin, but what about the roster plugin? The whole traffic will be moved to this plugin, as most of the time we will not have the roster loaded (in the case of heavy traffic). So queues will build up on the roster plugin (probably smaller than on the presence plugin).

But:

  1. Using this method we would lose the content of the original presence (not good). What about priority, status, type, and so on?

  2. What if the presence plugin receives a presence from another contact and we do not have the roster loaded? (Do we still need to load it, or what?)

I think that we may see queues building up on the presence plugin, but the real queue is building up on access to the database layer (I'm thinking of JDBCRepository). To access data we need to:

  1. Get user id (uid)

  2. Get node id (nid)

  3. Retrieve data

So we have 3 database requests, where each needs to wait for the previous one. So if the ping to the database server is 40ms, then to get the data we need (3 × 2 × 40ms =) 240ms! And when we get over 160ms ping (server in the US, database in Europe) we get (3 × 2 × 160ms =) 960ms (almost a second!). And all this time is only the time needed to transfer data over the network, so it excludes data processing on the database server side. If we could reduce the number of requests by removing one of them (i.e. cache the uid for the bare JID!), we could speed up retrieving data (not only for the roster) by about 30%.
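
As a rough illustration of the uid caching idea (the class and method names below are invented; the real JDBCRepository API looks different):

	import java.util.Map;
	import java.util.concurrent.ConcurrentHashMap;

	// Sketch only: cache the bare JID -> uid lookup so that one of the three
	// sequential round trips to the database disappears for repeat requests.
	public class UidCacheSketch {

		private final Map<String, Long> uidByBareJid = new ConcurrentHashMap<>();

		public long getUid(String bareJid) {
			// computeIfAbsent keeps concurrent callers from querying the DB twice
			// for the same bare JID.
			return uidByBareJid.computeIfAbsent(bareJid, this::queryUidFromDatabase);
		}

		// Placeholder for the real "get user id" query (the first of the three requests).
		private long queryUidFromDatabase(String bareJid) {
			return (long) Math.abs(bareJid.hashCode());   // stand-in value for the sketch
		}
	}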

Also, right now we are using virtual machines which have 2 logical CPUs assigned (the previous hardware setup had 4 CPUs, if I remember correctly). This reduced the number of queues for the presence plugin (from 16 to 8), which increased the number of packets per queue. Knowing that each database access is blocking, this increased the time needed to process them, e.g. for 1600 packets:

16 threads, so 1600/16 = 100 packets per thread; 100 packets × processing time (let's say 240ms for a 40ms ping) gives 24,000ms to process 1600 packets (~24s).

8 threads, so 1600/8 = 200 packets per thread; 200 packets × processing time (let's say 240ms for a 40ms ping) gives 48,000ms to process 1600 packets (~48s).

As you can see, even if the number of processing threads is increased, due to network latency I think the CPU will be rather idle.

So my suggestions are:

  1. Increase concurrency for Presence plugin (from 2 to 4)

  2. Investigate the possibility of reducing the number of queries needed to get data from the database (e.g. create a procedure on the database server side and call it to get the data in one go? cache the user UID?)

  3. On bigger installations with only one database: think about increasing the database connection pool from 10 up to the number of CPU cores × 2

Artur Hefczyc commented 1 decade ago

Andrzej Wójcik wrote:

Yes, you are right that there will be no blocking in the presence plugin, but what about the roster plugin? The whole traffic will be moved to this plugin, as most of the time we will not have the roster loaded (in the case of heavy traffic). So queues will build up on the roster plugin (probably smaller than on the presence plugin).

That's the whole point. This is what we want. Note, there are installations where Tigase processes 100-200 logins per second (100-200 roster loading requests), but at the same time it processes 500k presence packets per second. This is of course on multiple machines in cluster mode.

If the DB is correctly configured, then we can easily process 100-200 logins (roster loads) per second, so there will be no queuing. Queuing will happen at peak time or after a restart, but the queue size will be in the hundreds, maybe a few thousand after a restart, and there will be no packet drops (if everything is configured correctly). However, if we have any DB blocking in the presence plugin, there is no way we can handle the load without packet loss.

But:

Using this method we would lose the content of the original presence (not good). What about priority, status, type, and so on?

No. When a real presence from a user arrives, we store it in the user's session object (XMPPResourceConnection) the same way as we do now. The dummy presence packet would not be stored in the user's session; it is just a trigger to start some action. Well, we do not even need to send a dummy presence packet or a dummy roster get packet. We can have a custom IQ or something like it to communicate between plugins, similar to what we have for communication between the connection manager and the session manager.

What if the presence plugin receives a presence from another contact and we do not have the roster loaded? (Do we still need to load it, or what?)

Let's assume that we never access the DB (to load the roster, for example) in the presence plugin. Then we will look for a solution which does not require DB access. We could store such a presence in the user's session data, for example. But I think it is not even needed. We can, and actually should, ignore such a presence packet: "our" user has not sent his initial presence out and has not even sent presence probes. Once the roster is loaded we will broadcast his initial presence and his presence probes. The probes will retrieve the presence we have just dropped.

Of course, we could optimize the traffic to "remember" such presences and not even send probes for users for whom we already have presences.

I think that we may see queues building up on the presence plugin, but the real queue is building up on access to the database layer (I'm thinking of JDBCRepository). To access data we need to:

Get user id (uid)

Get node id (nid)

Retrieve data

So we have 3 database requests, where each needs to wait for the previous one. So if the ping to the database server is 40ms, then to get the data we need (3 × 2 × 40ms =) 240ms! And when we get over 160ms ping (server in the US, database in Europe) we get (3 × 2 × 160ms =) 960ms (almost a second!). And all this time is only the time needed to transfer data over the network, so it excludes data processing on the database server side. If we could reduce the number of requests by removing one of them (i.e. cache the uid for the bare JID!), we could speed up retrieving data (not only for the roster) by about 30%.

Also, right now we are using virtual machines which have 2 logical CPUs assigned (the previous hardware setup had 4 CPUs, if I remember correctly). This reduced the number of queues for the presence plugin (from 16 to 8), which increased the number of packets per queue. Knowing that each database access is blocking, this increased the time needed to process them, e.g. for 1600 packets:

16 threads, so 1600/16 = 100 packets per thread; 100 packets × processing time (let's say 240ms for a 40ms ping) gives 24,000ms to process 1600 packets (~24s).

8 threads, so 1600/8 = 200 packets per thread; 200 packets × processing time (let's say 240ms for a 40ms ping) gives 48,000ms to process 1600 packets (~48s).

As you can see, even if the number of processing threads is increased, due to network latency I think the CPU will be rather idle.

Yes, this is all true. We can certainly optimize DB access, and this is on the TODO list. But it would not solve the problem. I mean, it would certainly improve things a lot, but not solve them. Here is why:

The thing is that we have something I call "fast packets", which do not require IO and can be processed in under 1ms, and "slow packets", which have to wait for IO. Let's say we have 1,100 packets, 100 of which are "slow packets" that have to wait 10ms for IO, and the rest are "fast packets" which require 1ms each to process. If all of them are in a single queue (presence, let's say), then the queue requires 2,000ms to be processed. So we need 2 seconds to process all the packets, even though the CPU waits for about 1 second during this time.

However, if we split the queue into 2 separate queues, one for "fast packets" and one for "slow packets", each on a separate thread, then the "fast packets" can be processed within 1,000ms and the "slow packets" can be processed within 1,000ms, which gives us a total of 1 second to process all of them.

So, if we have such traffic of 1,100 packets per second, then with a single queue we cannot handle it on such an installation, but with split queues we can.

Please also note that simply increasing the number of queues (with a separate thread for each queue) to 2 is not sufficient if "slow packets" are mixed with "fast packets". For 2 queues with mixed packets we need about 1,050ms to process all of them (the exact figure depends on how the slow packets happen to be distributed between the queues).

This is, of course, a very simplified calculation and picture, but I hope it explains why we need to separate IO-bound packets from fast packets.
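
To make the simplified picture a bit more concrete, the separation could look roughly like this (a sketch only, not the session manager code; the needsDatabase flag just stands in for whatever test we use to classify a packet as IO-bound):

	import java.util.concurrent.ExecutorService;
	import java.util.concurrent.Executors;

	// Sketch: route IO-bound ("slow") packets and CPU-only ("fast") packets to
	// separate single-thread queues, so fast packets never wait behind DB access.
	public class SplitQueueSketch {

		private final ExecutorService fastQueue = Executors.newSingleThreadExecutor();
		private final ExecutorService slowQueue = Executors.newSingleThreadExecutor();

		public void submit(Runnable work, boolean needsDatabase) {
			// "needsDatabase" stands in for "initial presence that may trigger a
			// roster load" vs. a plain presence update.
			(needsDatabase ? slowQueue : fastQueue).execute(work);
		}
	}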

So my suggestions are:

Increase concurrency for Presence plugin (from 2 to 4)

This can be done easily in the configuration and it may be a good temporary solution.

Investigate the possibility of reducing the number of queries needed to get data from the database (e.g. create a procedure on the database server side and call it to get the data in one go? cache the user UID?)

This is on Wojtek's TODO list - to redesign and improve the overall DB access API.

On bigger installations with only one database: think about increasing the database connection pool from 10 up to the number of CPU cores × 2

Database processing time is bound by network IO and HDD IO, so this helps only a little. The main problem is really the packets waiting in a queue which do not have to wait.

Andrzej Wójcik (Tigase) commented 1 decade ago

Ok, I agree that this will speed up processing time and reduce queues... but it will create the possibility that the next presence packet after the initial presence will be processed before the initial presence is sent, which may cause the loss of the initial presence, as the new presence would override it. And it is common that IM clients send elements such as priority or show only in the initial stanza and after that only when they change, so our change can cause the loss of this information.

Artur Hefczyc commented 1 decade ago

Andrzej Wójcik wrote:

Ok, I agree that this will speed up processing time and reduce queues... but it will create the possibility that the next presence packet after the initial presence will be processed before the initial presence is sent, which may cause the loss of the initial presence, as the new presence would override it.

In most cases it does not matter, and it would even reduce unnecessary traffic. We only care about the user's last presence status, so we can skip previous presences which have not been broadcast yet.

And it is common that IM clients send elements such as priority or show only in the initial stanza and after that only when they change, so our change can cause the loss of this information.

Indeed. This is something we may need to look at. Priority is already recorded in the user's session, so this information is not lost. We could add some extra logic to add the priority to subsequent presences in certain cases, and we could do the same for the other sub-elements. However, I think the time window in which this can happen is very narrow, so it should not happen very often. Therefore this is the kind of improvement we could do later.

Andrzej Wójcik (Tigase) commented 1 decade ago

I added code which checks whether the roster is loaded on the initial outgoing presence broadcast. If the roster is not loaded yet, the Presence processor generates a packet which is handled by the Roster processor; the Roster processor loads the roster and then notifies the Presence processor with a result packet, triggering the initial outgoing presence broadcast once the roster is loaded.

Artur Hefczyc commented 1 decade ago

Looks good to me. We need to run some tests on this once we get closer to the final release of 5.3.0.

I guess this behavior could be easily switched off - could you make it configurable? Just in case the performance gains are not satisfactory, or it causes unexpected and undesired side effects on some installations.

Andrzej Wójcik (Tigase) commented 1 decade ago

Set a new due date due to the new things to be added.

Andrzej Wójcik (Tigase) commented 1 decade ago

Added a disable-roster-lazy-loading parameter to the Presence plugin, to make it possible to disable the delegation of roster loading to the Roster plugin thread.

Artur Hefczyc commented 1 decade ago

Perfect, thank you.

Type: Bug
Priority: Normal
Assignee:
RedmineID: 1969
Version: tigase-server-7.0.0
Spent time: 33h
Issue Votes: 0
Watchers: 0
Reference: tigase/_server/server-core#322