Repository statistics (#55)

Wojciech Kapcia (Tigase) opened 9 years ago

Due Date
2016-12-22

We already have more detailed statistics (#2908), but could we add also statistics related to reading data? Like

pubsub/Total writing time       1m, 2sec, 239ms
pubsub/Average DB write time [ms]       18

As well as execution time of TigPubSubGetRootNodes?

Activities

Andrzej Wójcik (Tigase) commented 9 years ago
I've set target to 7.2.0 as I do not think we will be adding this feature to 7.1.0, as it should be already released and this may be (and more likely should be) a complex task.

We can add any time of statistics, but we should consider impact of statistics gathering on performance. Additionally adding/replicating code measuring execution time of SQL queries is not a good idea.

It would be better to consider introduction of new method to DataRepository interface:

void executeStatement(String stmtId, Handler handler);

where handler would have following interface:

interface Handler { void handle(PreparedStatement stmt) { // query execution and logic here } }

This way we could measure time of execution of query and data retrieval in one place - in executeStatement() method by measuring execution time of Handle::handle() method.

While this would be easy to use and would improve code readability, it would introduce cost of creation of anonymous class or lambda statement.

As an alternative solution we could create wrapper for each repository implementation ie. wrapper for PubSubDAO class and put code measuring execution time in this wrapper. This would give us possibility to measure performance of other types of storages, ie. MongoDB, etc. Using Tigase Kernel Framework and beans it should be easy to create feature to enable/disable performance monitoring for particular repository implementation.

%wojtek %kobit %bmalkow What do you think?
Artur Hefczyc commented 9 years ago

Andrzej Wójcik wrote:

I've set target to 7.2.0 as I do not think we will be adding this feature to 7.1.0, as it should be already released and this may be (and more likely should be) a complex task.

We can add any time of statistics, but we should consider impact of statistics gathering on performance. Additionally adding/replicating code measuring execution time of SQL queries is not a good idea.

It would be better to consider introduction of new method to DataRepository interface:

[...]

where handler would have following interface:

[...]

This way we could measure time of execution of query and data retrieval in one place - in executeStatement() method by measuring execution time of Handle::handle() method.

While this would be easy to use and would improve code readability, it would introduce cost of creation of anonymous class or lambda statement.

I like this approach, especially because it allows also for switching Handler at runtime. So when we experience problems we could activate handler taking measurements. So the performance impact would be minimal during normal production run.

Also, to mitigate impact of creating of an anonymous class or lambda statement, we could pre-create Handlers and reuse them when the query is called.

As an alternative solution we could create wrapper for each repository implementation ie. wrapper for PubSubDAO class and put code measuring execution time in this wrapper. This would give us possibility to measure performance of other types of storages, ie. MongoDB, etc. Using Tigase Kernel Framework and beans it should be easy to create feature to enable/disable performance monitoring for particular repository implementation.

%wojtek %kobit %bmalkow What do you think?

Well, I understand that the above solution is more for SQL databases so any other DB would not be covered. Or maybe it would? Ideally, I think we should have a generic * DataRepository* abstract class or interface and then there could be DataRepositorySQL, DataRepositoryMongoDB, etc... This way we could have a generic way to take measurements for all DBs.
Andrzej Wójcik (Tigase) commented 9 years ago

Artur Hefczyc wrote:

Andrzej Wójcik wrote:

I've set target to 7.2.0 as I do not think we will be adding this feature to 7.1.0, as it should be already released and this may be (and more likely should be) a complex task.

We can add any time of statistics, but we should consider impact of statistics gathering on performance. Additionally adding/replicating code measuring execution time of SQL queries is not a good idea.

It would be better to consider introduction of new method to DataRepository interface:

[...]

where handler would have following interface:

[...]

This way we could measure time of execution of query and data retrieval in one place - in executeStatement() method by measuring execution time of Handle::handle() method.

While this would be easy to use and would improve code readability, it would introduce cost of creation of anonymous class or lambda statement.

I like this approach, especially because it allows also for switching Handler at runtime. So when we experience problems we could activate handler taking measurements. So the performance impact would be minimal during normal production run.

Also, to mitigate impact of creating of an anonymous class or lambda statement, we could pre-create Handlers and reuse them when the query is called.

Not really as handler needs to have query parameters passed so there is no way to pre-create Handlers. However creation of lambda is rather cheap from what I know, but there is still a penalty.

As an alternative solution we could create wrapper for each repository implementation ie. wrapper for PubSubDAO class and put code measuring execution time in this wrapper. This would give us possibility to measure performance of other types of storages, ie. MongoDB, etc. Using Tigase Kernel Framework and beans it should be easy to create feature to enable/disable performance monitoring for particular repository implementation.

%wojtek %kobit %bmalkow What do you think?

Well, I understand that the above solution is more for SQL databases so any other DB would not be covered. Or maybe it would? Ideally, I think we should have a generic * DataRepository* abstract class or interface and then there could be DataRepositorySQL, DataRepositoryMongoDB, etc... This way we could have a generic way to take measurements for all DBs.

No there is no way to this in this way. I already created interface called DataSource in 7.2.0 which is data store agnostic. However it is not possible to measure it at this level (maybe with use of @Handlers@). Best would be to wrap implementation of repositories or repository pools. It would be rather easy thing to do and should be possible to enable it at runtime in 7.2.0
Andrzej Wójcik (Tigase) commented 9 years ago

%kobit Any additional comments?

%wojtek Do you have any comments or opinions about solutions?
Wojciech Kapcia (Tigase) commented 9 years ago

Andrzej Wójcik wrote:

wojtek Do you have any comments or opinions about solutions?

I think that wrapping implementations of repositories would give us most flexibility (runtime on/off, cover more databases), while providing less overhead?
Andrzej Wójcik (Tigase) commented 9 years ago

Yes, it should give us better flexibility and single implementation of wrapper would cover every data store. In such wrapper we could also count exceptions being thrown, etc.
Artur Hefczyc commented 9 years ago

No there is no way to this in this way. I already created interface called DataSource in 7.2.0 which is data store agnostic. However it is not possible to measure it at this level (maybe with use of @Handlers@). Best would be to wrap implementation of repositories or repository pools. It would be rather easy thing to do and should be possible to enable it at runtime in 7.2.0

+1, I have no more comments. Who would be working on this?
Andrzej Wójcik (Tigase) commented 9 years ago

I think I will work on this as I worked on statistics of DB usage for PubSub, so I have some idea how to deal with it.
Wojciech Kapcia (Tigase) commented 9 years ago

Given above discussion this in a way duplicates #1477. General modifications should be carried on in #1477 and anything pubsub specific should be done in this task.
Andrzej Wójcik (Tigase) commented 9 years ago

All work was done in #1477 as I used more generic approach to this problem with same outcome.
Wojciech Kapcia (Tigase) commented 9 years ago

Closing and following up in #1477.
Login to comment

Type	Task
Priority	Normal
Assignee	Wojciech Kapcia (Tigase)
RedmineID	4491
Spent time	0

Issue Votes (0)

Watchers (0)

Reference

tigase/_server/tigase-pubsub#55