Optimize search of subnode in UserRepository implementation (#18)

Andrzej Wójcik (Tigase) opened 9 years ago

Due Date
2016-11-17

The current UserRepository implementation for MongoDB uses a regular expression to find the subnodes of a particular node. This works, but it is not optimal for MongoDB from a performance point of view.

It would be better to add a new field named parent_node to the tig_nodes collection, containing the name of the parent node. MongoDB could then use an index for queries on the parent_node field, which would speed up this query.

Activities

Andrzej Wójcik (Tigase) commented 9 years ago
I reviewed the current implementation, and from what I see the situation has changed in newer versions of MongoDB: Mongo now uses indexes even when a regex match is used in a query. In our case, when looking for subnodes of test-node we look for entries matching ^test-node/[^/]* and Mongo uses the index to narrow the compared documents to those where the node field is greater than test-node/ and smaller than test-node0 (as 0 is the next ASCII character after /).
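
To illustrate the range described above, here is a minimal Python sketch. It is not Tigase or MongoDB code: a sorted list stands in for the index, the node names are made up, and the lower bound is treated as inclusive, which is an assumption of this sketch.

```python
import bisect
import re


def prefix_bounds(node):
    """Return (lower, upper) bounds covering every string that starts with node + '/'."""
    lower = node + "/"
    # '/' is 0x2F and '0' is 0x30, so swapping the trailing '/' for '0'
    # gives an exclusive upper bound for every 'node/...' value.
    upper = node + chr(ord("/") + 1)
    return lower, upper


def index_candidates(sorted_nodes, node):
    """Simulate an index range scan: lower <= value < upper on a sorted list."""
    lower, upper = prefix_bounds(node)
    start = bisect.bisect_left(sorted_nodes, lower)
    end = bisect.bisect_left(sorted_nodes, upper)
    return sorted_nodes[start:end]


def regex_matches(sorted_nodes, node):
    """Apply the anchored regex only to the candidates from the range scan."""
    pattern = re.compile("^" + re.escape(node) + "/[^/]*")
    return [n for n in index_candidates(sorted_nodes, node) if pattern.match(n)]


# Hypothetical node names, sorted the way a plain string index would order them.
nodes = sorted([
    "other", "test-node", "test-node-extra/x",
    "test-node/a", "test-node/a/b", "test-node0", "test-nodes/y",
])
print(index_candidates(nodes, "test-node"))
```

In this sample only test-node/a and test-node/a/b fall inside the range, so the regex is evaluated against two entries instead of all seven.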

This behaviour limits regex matching to actual subnodes and subnodes of subnodes. Processing subnodes of subnodes is required because the MongoDB implementation does not create entries for nodes themselves, only for nodes that contain keys. It is designed this way to avoid creating separate documents for the node structure and to avoid the additional operations needed to ensure that a parent node exists when an entry is set.

This new behaviour is good for nodes with few subnodes, and in my opinion it will behave well except for nodes with a large number of subnodes, as in the MucRepository structure. However, now that we have a separate implementation for MucRepository, I think there is no need to change this regex-based tree search, as the getSubnodes() method of UserRepository is almost never called from our code.
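
A rough Python sketch of why deeper entries have to be processed: when documents exist only for nodes that hold keys, a direct subnode may be visible only through the path of one of its descendants. The function name and sample paths are hypothetical, and this is not the actual Tigase code.

```python
def direct_subnodes(stored_nodes, parent):
    """Derive the direct subnode names of parent from the stored node paths."""
    prefix = parent + "/"
    names = set()
    for path in stored_nodes:
        if path.startswith(prefix):
            # Keep only the first path segment below the parent.
            child = path[len(prefix):].split("/", 1)[0]
            if child:
                names.add(child)
    return sorted(names)


# 'test-node/c' has no entry of its own; it is only implied by 'test-node/c/d'.
stored = ["test-node/a", "test-node/a/b", "test-node/c/d", "other/x"]
print(direct_subnodes(stored, "test-node"))
```

Here subnode c is found only because the entry test-node/c/d is inspected.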

%wojtek What do you think?

If we decide to improve the performance of the getSubnodes() method, then I would suggest changing the layout of the UserRepository implementation for MongoDB. Currently we have a separate document for every user-node-key combination; I would suggest having one document per user-node pair, where the value for each key is stored in a document field named after the key. This would reduce the number of entries and should improve performance, but it would increase the time needed to retrieve the list of keys available for a particular node: we would need to retrieve the document with the values of every field just to get the list of keys. However, the getKeys() method is not used very often (almost not at all). Additionally, MongoDB requires every document to be smaller than 16MB, so with the new layout we would have a limit of 16MB per node, whereas now it is 16MB per key. With this new layout we would have a single entry per node, which would make it easy to create this tree structure.
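
The two layouts can be compared with a small Python sketch that uses plain dictionaries in place of MongoDB documents. The field names uid, node, key and value are assumptions made for illustration and are not taken from the actual schema.

```python
# Assumed bookkeeping fields; a real key with one of these names would collide.
META_FIELDS = {"uid", "node"}


def to_per_node_layout(per_key_docs):
    """Merge one-document-per-key entries into one document per (user, node)."""
    merged = {}
    for doc in per_key_docs:
        ident = (doc["uid"], doc["node"])
        node_doc = merged.setdefault(ident, {"uid": doc["uid"], "node": doc["node"]})
        # The value is stored in a field named after the key.
        node_doc[doc["key"]] = doc["value"]
    return list(merged.values())


def get_keys(node_doc):
    """List the keys of a node; this needs the whole document, values included."""
    return sorted(k for k in node_doc if k not in META_FIELDS)


# Current layout: one entry per user-node-key combination (sample data).
per_key = [
    {"uid": "user@example.com", "node": "test-node", "key": "k1", "value": "v1"},
    {"uid": "user@example.com", "node": "test-node", "key": "k2", "value": "v2"},
    {"uid": "user@example.com", "node": "test-node/a", "key": "k1", "value": "v3"},
]
per_node = to_per_node_layout(per_key)
print(len(per_key), len(per_node))
print(get_keys(per_node[0]))
```

Three entries collapse into two documents, while listing the keys of a node means loading that node's full document.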

%wojtek I would like to have second opinion on this

We need to decide this before I work on #4693, as the solution there will depend on our decision:

If we stay with the current layout, then I will create a separate key for each VHost entry and a method to retrieve the values for every key of a node

If we move to the new layout, then I will create a separate subnode for each VHost entry and a method to retrieve the values of the same key for all subnodes of a particular node

Note: The current layout is similar to the layout used for RDBMS, and for RDBMS storing values under keys may be better than using a subnode for every VHost.

Wojciech Kapcia (Tigase) commented 9 years ago
Given that:

MongoDB now performs better with regexp;

the proposed change may impose additional limitations;

usage of UserRepository was limited recently;

I would say that we should stay with the current implementation (i.e. the first option).
Andrzej Wójcik (Tigase) commented 9 years ago

I agree. Closing issue.

Type	Task
Priority	Normal
Assignee	Andrzej Wójcik (Tigase)
RedmineID	3996

Reference

tigase/_server/tigase-mongodb#18