wojciech.kapcia@tigase.net opened 3 years ago
|
|||||||
|
|||||||
|
|||||||
Possibly use |
|||||||
~~I do not think we should use Usage of empty |
|||||||
I think we should provide 2 forms of methods for getting a child element:
I would also leave existing versions of methods accepting as a parameter |
|||||||
Here are benchmark results for the tests which I've implemented:
Following things we should consider:
Looking at those results, I think that we may consider replacing In case of In case of In case of While I would like to "simplify" our API and remove "StaticString" methods, I think we would have to keep them as removing them would slow down child elements lookup by 18% (however, it would reduce element creation penalty "String.intern()", but that is one time operation). Optionally, we could create a similar optimization to proposed "memory optimization" for known attribute names, or even check if manually created pool of queried element names would not be faster. All tests are now in a separate branch @wojtek What do you think? |
|||||||
+1 I'm somewhat split regarding Regarding Optionals and usage . The problem with "using them in non-critical paths" would be non-enforceable thus it could happen that someone would use it without knowing the penalty (This applies to |
|||||||
True, if we would "cache" all element names and attribute names, those tables would be huge. In recent Java hash tables from My optimized approach was to actually cache only "attribute names" known and commonly used (I know, this may have some drawbacks). Please note, that we do not call
We can add Usage of |
|||||||
I think we (once again) have same stance on Regarding performance penalties: I would avoid exposing |
|||||||
Maybe, just maybe, we could introduce functions:
This would allow us to implement "Optional-list" processing without the overhead of creation of the I could prepare some implementations, so we could evaluate the performance of such methods to compare with the manual retrieval of element and check for |
|||||||
I've rerun tests once again as those very high results of
As we can see, Also |
|||||||
I've fixed following tests and rerun tests (static string was passed instead of new instance, which caused equlas() to return true a lot faster than it should):
|
|||||||
As commented yesterday - I'm somewhat sceptical towards such API in Element class.
This is interesting point - maybe the GC wasn't run in the test (with increased heap size) but there was additional allocation which can be problematic during normal operation (i.e. triggering GC more often). AFAIR it should be possible to measure allocations in JMH tests - could we compare it?
Wouldn't |
|||||||
Some more current results.
NOTES:
During this iteration of tests, I've added Back to your questions:
I think that there was no real issue with GC (small short lived objects should be removed fast), but there was no As for allocations, most likely small objects may increase GC, but they are short lived so that should be fast. If you would look closer on my tests na implementation of To be honest, looking at latest results, stream with On the other hand if you would look at
Basing steam on the collection which needs to be created (or filtered) was slower in my tests. Manual creation of a To sum up.
|
|||||||
Thank you for running those tests. I made a couple of changes (from commit):
Could you please do check those changes and comment on them? I added case with more sub-children as this can happen as well, though I think that majority of the request (so when we do perform lookup would be mostly IQ queries so rather limited number of children and most of the time checking first of them; with forms being the exception here). On the other hand results can have bigger amount of children but in that case we would most likely simply produce Element like that and there would be very limited querying/using "find*" methods IMHO thus I'm not sure how focused we should be on results with bigger stanzas / lookup of random child (sub-)element. With the above I run two tests ( comparing
So, for this small subset of tests:
I'm going to run tests for all methods later on (don't want to use the machine while it runs to avoid interruptions).
Yes, those should be removed fast but still this is additional allocation and in the current JVM version (without Value Classes) I think we should keep it in mind considering Element is the most fundamental piece of Tigase. (Yes, I'm slightly leaning towards returning
To be honest I checked results from the table you provided and could confirm your observations on that data… though, while looking at the code I wasn't sure we are making exactly the same comparison in all cases (it would be better to group cases that should give same outcome but do it differently somehow for more clarity - maybe different classes?) Regarding "finding child" use case, here is the extract for the case with Element with 20 children sub-elements backed by ArrayList and returning first child (this is throughput so higher=better):
What surprised me was poor result for
I assume you mean methods that favours identity (references) comparison instead of equality?
Agreed completely from what I saw.
Will do more testing (I think that with Elements with more children Stream methods could be slower but as I said - more testing needed). Though, I think we should avoid giving to many choices and simply provide user with "single best method" to avoid someone using something "convenient" that could later on become a bottleneck. Of course more convenience methods are OK while they don't degrade performance significantly.
This seems ok.
Same conclusion. Complete result from the current set of tests:
|
|||||||
I've rerun tests on my device (just for confirmation of the results):
|
|||||||
Additionally, I've gathered String intern table metrics with
which confirms that even with small no. of entries and increased bucket size by quite a lot there was no increase of performance of From that, it is clear that the comparison of Strings using I would assume that element name is compared 10-20 times on average in a single pass of a packet, but I may be wrong. We mostly check/compare element names of "outer" elements of the stanza (close to the root), while internal elements may newer be compared. We would need to measure this somehow... any ideas? I'm not sure, if this call to |
|||||||
I wonder if we shouldn't add |
|||||||
Interesting observation regarding intern() performance. If indeed it usage is not that beneficial performance wise we could drop it in favour of something else (our own local "intern" table with common names?). Regarding usage / comparison count - while we won't compare it normally Element could do it while performing lookup. I think if we want to measure it we could add basic statistics to Element (only temporarily for the testing purpose) and run... TTS-NG against server with such custom build of tigase-xmltools. Adding StringTable statistics to our tigase.im/org deployment could also give us more information. |
|||||||
I think that knowing how many times elements (and xmlns) were interned would be good to know, but I suppose there is no way to count how many times we are comparing element names as those are instance comparisons so there is no way to measure that, right? We could check executions of |
|||||||
Yes, I was thinking about counting usage of |
|||||||
I've tested a direct comparison of performance between comparing String with
From that we can see, that comparing I will try to make some measurements to estimate how many times we are calling Alternatively, we could use the "mixed" mode, by internalizing in a global map all element names passed as parameters to Another approach could be internalizing only "known" element names (such as |
|||||||
A point in discussion https://shipilev.net/jvm/anatomy-quarks/10-string-intern/
|
|||||||
As discussed, I've modified
It looks like |
|||||||
This seems to confirm that we should drop |
|||||||
This change could improve the performance of |
|||||||
Yup, that what's I was thinking about. Though, even in worst-cast scenario it's performance is on par. |
|||||||
Here are results including case when "known" attribute name was used ("id") - "Static". |
|||||||
I've added one more test. Attributes holder class named previously Here are results:
|
|||||||
Hmm, it doesn't seem all that faster in case of dynamic attributes and seems slower in static ones (which I think would be majority of cases)? |
|||||||
Here are some more tests to measure if usage of deduplication of element name would speed up element lookup.
To make it better visible I've created a chart From here, it looks like usage of plain My guess is that this is caused by
To sum this up, each method which has deduplication will slow down as it forces Java to calculate |
|||||||
To add some more performance data, here are the results of performance testing of
"stringNope" does nothing - just accesses string instance. From that, it is really visible that calls to It looks like, In my opinion, we could keep "deduplication" just to reduce memory usage, but as deduplication will not improve performance (see the previous comment), we will sacrifice some of the performance gains to get lower memory usage. Looking at those numbers, I wonder if we should not make "deduplication" optional. However, adding a simply "if" to decide if deduplication should be done or not, would slow down accessing string instance by ~35% (from my quick test), compared to using string without any conditions. |
|||||||
Looking at the results above, it looks like a single call to |
|||||||
@kobit Looking at the results above, we are considering dropping the usage of Also, deduplication of strings (using HashMap with a configurable list of deduplicated names) may be used only for deduplication of strings during the creation of an element or setting an attribute as usage of it during comparisons and querying would slow us down. Could you look at the comments since https://projects.tigase.net/issue/issue #14#focus=Comments-4-33480.0-0 up to this comment and let us know what you think about those changes (those changes would be part of Tigase XMPP Server 9.0.0) |
|||||||
Of course, if after tests the equals() gives better overall performance than intern() with ==, then yes, we should rework this code. Go ahead and do this. I am only worried about all these places where == is used which could be missed after removing intern(). This may result in weird errors. Please be careful. To be honest I have never tested a real performance of intern(). I just assumed a single intern() call would be about the same as a single equals(). For most small installations, performance is not a problem but memory usage is. Tigase from the start requires a lot of RAM, too much in my opinion. But I do not think this is the place to optimize for memory usage. Especially that even with zero traffic, Tigase still requires a lot of RAM. So, to sum it up. Yes, please go ahead and remove usage of intern() + == and replace it with equals(). |
|||||||
I think that TTS-NG/TTS will help here and would catch if we miss something.
Unfortunately JVM itself is responsible as well (especially on small installations) but it's changing (jlink for example, which with removal of dependencies/groovy) should help quite a lot with memory usage here. |
|||||||
As @wojtek mentioned, TTS-NG tests should help us find issues. Maybe we could try the lint plugin with some rules to proactively check that?
On the other hand, I've suggested that we could add
If I recall, we already removed usage of |
|||||||
As for the threads number. Yes, the number might be scaled down somewhat. Specifically for small installations. This part was extensively tested by me years back and the number of threads was adjusted to provide maximum throughput for large/huge installations. In short, I got much higher overall throughput with higher number of threads. Of course this translates into higher resource usage. But in such cases it is justified. For small installations we deal with a few messages per minute sometimes per second, rather than thousands per second. So, we do not need a very high throughput and we could scale threads down without impact on the end-user. |
|||||||
Regarding threads and it's impact on memory usage - project loom (Lightweight Virtual Threads) is starting to take form so in the future the whole concept how to handle this could be though in rather different light. |
|||||||
Literally different light-weight. ;-) |
|||||||
I've applied discussed changes to the Tigase XML Tools package. @wojtek please review if all discussed changes were applied. |
|||||||
I think it includes everything we talked about. I made a couple of minor comments. One thing that stood out was lack of defaultXMLNS - I think it should be included in the end. Other than that, we definitely need unit tests included as well. |
|||||||
I think we should drop defXMLNS as it is just a boilerplate code not actually needed which actually makes people wonder what it is and why XMLNS is sometimes set and sometimes (when constructed manually) it is not set. We should remove this confusion. |
|||||||
I've answered commends also with a few suggestions. |
|||||||
I'm not sure what's the conclusion on default XMLNS is - as said before - I do see certain sense and logic in it (and it does seem follow XML specs). The only remaining changes are simplifications of constructors. |
|||||||
As we discussed removing "constructors" as we have now builder-like API for adding attributes and children, I've run some test to see if there will be any performance penalty. Here are results
I would expect "Builder" to be slower than "StaticArray" and I would assume it is slower or at least has similar performance as I've had "builder" slower during a few runs when I was running less tests at once. For sure, we should not use methods accepting Taking those results into account, I would say we should have following constructors:
I do not see a point in having other constructors as other cases can easily be taken care of by using builder pattern. The only issue may be during "transition" as we had a following constructor
which could be easily mistaken with the second one which I've proposed. Due to that I could lean on using just 2 constructors:
The second one is responsible for creation of a shallow copy. |
|||||||
I can restore this defXMLNS, but I'm still not convinced that we need this and as I've pointed out, current implementation has flaws and I do not know how to fix them without dropping support for defXMLNS. |
|||||||
+1 for 2 constructors (with name and As for default XMLNS - I vaguely remember that there were weird issues with missing xmlns (especially for |
|||||||
I would say, that let's drop defXMLNS and if needed we will restore it as we are just at the begining of changes for 9.0.0 and we have time. I'll modify the constructors list tomorrow. |
|||||||
@wojtek Could you review the changes once again? I've dropped unnecessary constructors and The last remaining thing will be adding test cases. |
|||||||
I've added test cases and implemented a builder pattern for |
|||||||
It looks good IMHO. A couple of further comments:
Pushed some of the changes to the branch. |
|||||||
I would prefer to use
See performance comparison of using
+1
The checks for null were there, but I think there is no need for it as other methods guard against adding null. Due to that I've commented them out and left FIXME to verify that.
I've left them there to make it easy to find and verify that this is what we want. Generally, most of
Maybe.. I would even consider using ArrayDeque.. would need to test performance.
That was my assumption as well, that is why |
|||||||
But semantics for add and set in case of null are somewhat (or can be) somewhat similar - we can simply reject
Good point, though in case of
Understood, +1 for removing null checks if not needed.
+1
+1 |
|||||||
I've applied discussed changes and:
|
|||||||
wojciech.kapcia@tigase.net added to iteration "tigase-server-9.0.0" 4 months ago
|
|||||||
wojciech.kapcia@tigase.net added to iteration "tigase-server-8.5.0" 4 months ago
|
|||||||
It seems this was never merged (but got status "closed" and got sidetracked) after migration… For now I think we should mark APIs that will be removed as deprecated. (tentatively worth investigating: org.apache.commons.collections4.list.TreeList - could be faster but probably would use more memory than ArrayList, though probably still less than LinkedList; there is also brownies-collections with performance comparison and for small collections ) |
|||||||
wojciech.kapcia@tigase.net changed fields 4 months ago
|
Type |
Task
|
Priority |
Normal
|
Assignee | |
Version |
tigase-server-9.0.0
|
-
tigase-server-8.5.0 Open
-
tigase-server-9.0.0 Open
Current
Element
API can be sometimes confusing and quite often, without looking at the sources it's difficult to tell what to expect. Improvements:null
s - use eitherOptional<>
or emptyList