Artur Hefczyc opened 1 decade ago
|
|
We now have two different implementations for storing nodes, subscriptions and affiliations in the database.
Unfortunately, PubSubDAO and PubSubDAOJDBC share common search code, which forces us to load all subscriptions of a user first and only then search in them. Because of that, I think it may be hard to search 30 million nodes for the nodes a user is subscribed to. To improve performance we would need either to improve the database API and use only PubSubDAOJDBC, or to change the schema used by PubSubDAO and also improve the database API. I think anyone from our team can work on this issue, but we should also consider who will work on the database API improvements we discussed some time ago. |
|
Ok, please start working on this. |
|
I redesigned the database schema for the PubSub component. The new schema allows us to query for affiliations and subscriptions directly at the database level, which will improve the performance of the features described above. This schema change will be part of Tigase PubSub Component 3.0.0. |
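As a rough illustration of why querying by subscriber directly helps, here is a minimal in-memory sketch in Java. It is not the Tigase schema or the PubSubDAO API: the class `SubscriptionIndexSketch` and its methods are hypothetical, and the map keyed by subscriber JID only stands in for a database index on a subscriber column. The point is that the lookup cost depends on the number of nodes a subscriber has (assumed to be around 100 at most), not on the total number of nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Tigase.
class SubscriptionIndexSketch {
    // subscriber JID -> names of the nodes that subscriber is subscribed to
    private final Map<String, Set<String>> nodesBySubscriber = new HashMap<>();

    void subscribe(String nodeName, String subscriberJid) {
        nodesBySubscriber
                .computeIfAbsent(subscriberJid, k -> new TreeSet<>())
                .add(nodeName);
    }

    void unsubscribe(String nodeName, String subscriberJid) {
        Set<String> nodes = nodesBySubscriber.get(subscriberJid);
        if (nodes != null) {
            nodes.remove(nodeName);
            if (nodes.isEmpty()) {
                nodesBySubscriber.remove(subscriberJid);
            }
        }
    }

    // Returns the subscriber's nodes in sorted order without touching
    // the entries of any other subscriber.
    List<String> nodesFor(String subscriberJid) {
        Set<String> nodes = nodesBySubscriber.get(subscriberJid);
        return nodes == null ? Collections.emptyList() : new ArrayList<>(nodes);
    }

    public static void main(String[] args) {
        SubscriptionIndexSketch index = new SubscriptionIndexSketch();
        index.subscribe("news", "alice@example.com");
        index.subscribe("alerts", "alice@example.com");
        index.subscribe("news", "bob@example.com");
        System.out.println(index.nodesFor("alice@example.com"));
    }
}
```

In a relational store the equivalent would presumably be an index on the subscriber column of the subscriptions table, so that a per-subscriber query does not have to scan all node rows.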
|
We need some tests to see the performance improvement. Could you please run some tests for the most typical PubSub use cases to see the performance difference? Ideally we would like to have a comparison between all 3 PubSub data storages. If we populate the DB with test data over XMPP, we could use exactly the same tool and test for all 3 DBs. Even better, we could compare Tigase PubSub to other PubSub systems out there. The time necessary to load the DB over XMPP could also be measured and should be part of the test. Eric can help you with setting up systems running Tigase in all 3 configurations (and maybe other XMPP servers as well: ejabberd, Openfire, Prosody). Then we could run the tests.
Tests: We are interested here in DB access performance, so please plan a few tests which focus on DB performance rather than on generating high traffic.
Andrzej - I also have one additional question, which always worries me when I think of high PubSub usage. What would happen if you post a new publication to a node with 10M subscribers? Will this overload Tigase? I remember trying to add some logic to PubSub which would monitor memory usage and send notifications to all subscribers slowly. I do not remember whether this code is in our current implementation, but please think about this. Do we have anything which prevents OOM in such a case? |
|
I will start with the last question. I haven't seen any code which could prevent OOM in the case of 10M subscribers or anything like that, so it is possible that this could overload Tigase. As for the tests, do we have a mechanism which should or could be used for them? TTS? We should test more than database performance alone, because the slowness of the PubSub component was caused both by a non-optimal database schema and by a lot of synchronization inside the DAO mechanisms (by default a single connection to the DB and a single writing thread for storing node configuration). Both things were improved in the same task, as they were part of the database access layer, which had to be improved in order to create a better database schema. So I think we should test concurrent access - high traffic? |
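One possible way to bound memory when publishing to a node with a very large number of subscribers is to walk the subscriber list in fixed-size batches instead of building all notifications at once. The Java sketch below only illustrates that idea; it is not code from the current PubSub implementation, and the class `BatchedNotifierSketch`, the method `notifyInBatches` and the `sender` callback are all hypothetical names. A real implementation would presumably also wait for outgoing queues to drain, or check memory usage, between batches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not part of Tigase.
class BatchedNotifierSketch {

    // Reads subscribers lazily from the iterator and hands them to the
    // sender in batches of at most batchSize, so only one batch is held
    // in memory at a time. Returns the number of batches sent.
    static int notifyInBatches(Iterator<String> subscribers, int batchSize,
                               Consumer<List<String>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        while (subscribers.hasNext()) {
            batch.add(subscribers.next());
            if (batch.size() == batchSize) {
                sender.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Send the final, partially filled batch.
            sender.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> subscribers = Arrays.asList(
                "a@example.com", "b@example.com", "c@example.com");
        int sent = notifyInBatches(subscribers.iterator(), 2,
                batch -> System.out.println("notify " + batch));
        System.out.println("batches: " + sent);
    }
}
```

The subscriber iterator is assumed to be backed by a paged database query, so that the full list of 10M subscribers is never loaded into memory.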
|
Andrzej Wójcik wrote:
Ok, this is something to work on as well. I have created a new task for this: #1694.
TTS is unfortunately not suitable for performance tests. We could either use Tsung or write a simple command-line client in Java or Groovy using JaXMPP2. With our own code we have more control over what we do and how we do it. Tsung makes it easier to run large, distributed load tests, but that is probably not what we want to do here. There is also code built into Tigase for testing PubSub, so you do not actually need any external tool or user connections to put some load on PubSub: PubSubTestsTask. If you know how to use it, it might be good enough. In order to activate the task you have to configure the StanzaReceiver component, then connect to the server with Psi, browse service discovery for the component and create a new task. Pick the PubSubTest task from the list and that's it.
I agree, this makes sense. But we are probably not interested in hundreds of thousands of user connections; 100 or 1,000 users would be good enough. Actually, we do not need many users at all. It could be a single user accessing multiple nodes at the same time. What I mean is that the user simulator software could be a very simple thing. |
|
Referenced from commit 1 year ago
|
Type |
New Feature
|
Priority |
Normal
|
Assignee | |
RedmineID |
1667
|
Let's say we have 30 million PubSub nodes and we want to get the list of PubSub nodes for a given subscriber. We assume that there are just a few (max 100) PubSub nodes for a single subscriber, so we do not worry about loading 30 million; the problem is just searching the DB in an efficient way.
Can we do this efficiently? Given the way Tigase stores subscribers right now it might be problematic.
Andrzej, could you please review this problem and decide whether we need to do something and who should work on this if needed?