Cleanup Element API (#14)

Wojciech Kapcia (Tigase) opened 3 years ago

Current Element API can be sometimes confusing and quite often, without looking at the sources it's difficult to tell what to expect. Improvements:

better naming
add javadocs
remove nulls - use either Optional<> or empty List

Activities

Andrzej Wójcik (Tigase) commented 3 years ago
replace List<T> mapChildren(Matcher<Element>, Function<Element,T>) with Stream<Element> streamChildren()

review and improve/extend builder pattern API (withXXX() methods)
Andrzej Wójcik (Tigase) commented 3 years ago
consider adding metods like getCDataUnescaped() / setCDataUnescaped(String) to make it easier to set and escape content of the element.
Wojciech Kapcia (Tigase) commented 3 years ago

Possibly use ArrayList instead of LinkedList (considering that Element shouldn't be mutable using ArrayList could be somewhat better)
Andrzej Wójcik (Tigase) commented 3 years ago

~~I do not think we should use Optional<> in getters due to performance reasons, see https://pkolaczk.github.io/overhead-of-optional/~~ I've double-checked their results, and if my tests are correct (with use of JMH) usage of Optional<Element> is around 400-500 us slower per 1000000 operations. (It looks like mentioned article had worse values not due to the usage of Optional but due to conversion of long to Long and back).

Usage of empty List would, performance-wise, be equal to returning null for methods returning List, but it would allow us to "drop" checks if we received an instance of a List or null.
Andrzej Wójcik (Tigase) commented 3 years ago
I think we should provide 2 forms of methods for getting a child element:

Element getChild(String name, String xmlns); Optional<Element> findChild(String name, String xmlns);

I would also leave existing versions of methods accepting as a parameter Matcher/Predicate as it makes querying more flexible (ie. we can query using different attributes or even by testing if children contains particular element).

Andrzej Wójcik (Tigase) commented 3 years ago

Here are benchmark results for the tests which I've implemented:

Benchmark                                                  Mode  Cnt   Score    Error  Units
ElementPerformanceTest.benchmarkFindChildAndClassic        avgt    5  14,577 ±  1,681  us/op
ElementPerformanceTest.benchmarkFindChildAndOptional       avgt    5  13,954 ±  1,230  us/op
ElementPerformanceTest.benchmarkFindChildArrayList         avgt    5  18,353 ±  0,433  us/op
ElementPerformanceTest.benchmarkFindChildLinkedList        avgt    5  18,815 ±  0,662  us/op
ElementPerformanceTest.benchmarkFindChildNew               avgt    5   1,685 ±  0,016  us/op
ElementPerformanceTest.benchmarkFindChildNewStatic         avgt    5   1,423 ±  0,028  us/op
ElementPerformanceTest.benchmarkFindChildNewX              avgt    5   1,733 ±  0,037  us/op
ElementPerformanceTest.benchmarkFindChildOldEmpty          avgt    5   0,019 ±  0,001  us/op
ElementPerformanceTest.benchmarkFindChildOldOptional       avgt    5   3,294 ±  0,456  us/op
ElementPerformanceTest.benchmarkFindChildOldStatic         avgt    5   1,456 ±  0,069  us/op
ElementPerformanceTest.benchmarkFlatMap                    avgt    5   0,095 ±  0,004  us/op
ElementPerformanceTest.benchmarkFlatMapStreamToList        avgt    5   0,278 ±  0,038  us/op
ElementPerformanceTest.benchmarkFlatMapStreamToList2       avgt    5   0,193 ±  0,017  us/op
ElementPerformanceTest.benchmarkHashMapGet                 avgt    5   0,057 ±  0,001  us/op
ElementPerformanceTest.benchmarkHashMapGetIdentity         avgt    5   0,477 ±  0,007  us/op
ElementPerformanceTest.benchmarkHashMapGetMemoryOptimized  avgt    5   0,067 ±  0,002  us/op
ElementPerformanceTest.benchmarkHashMapSet                 avgt    5   0,072 ±  0,003  us/op
ElementPerformanceTest.benchmarkHashMapSetIdentity         avgt    5   0,517 ±  0,019  us/op
ElementPerformanceTest.benchmarkHashMapSetMemoryOptimized  avgt    5   0,093 ±  0,005  us/op
ElementPerformanceTest.benchmarkIndentityMapGet            avgt    5   0,507 ±  0,013  us/op
ElementPerformanceTest.benchmarkIndentityMapGetStatic      avgt    5   0,047 ±  0,001  us/op
ElementPerformanceTest.benchmarkIndentityMapSet            avgt    5   0,524 ±  0,017  us/op

Following things we should consider:

Use ArrayList instead of LinkedList as it should be smaller and faster
String.intern() is slow.
Usage of HashMap is faster for lookup of arbitrary strings, as IdentityHashMap requires usage of String.intern(). When using static strings HashMap is 21% slower on get but is still faster on put about 86% (as we do not have StaticString equivalent).
Usage of HashMap will increase memory usage as attribute keys will not be deduplicated (unless we would use "optimization", to replace most commonly used attribute names with static Strings).
Element parsing will be faster when String.intern() is not used.
Comparing element names using identity instead of equality is about 18% faster.

Looking at those results, I think that we may consider replacing LinkedList with ArrayList and IdentityHashMap with HashMap.

In case of Map replacement, while we may slightly slow down "get" requests, we would for sure increase performance of "set" (which is done during parsing of elements, element instantiation, etc.). We may also consider usage of "optimized" version, to reduce memory overhead of this decision.

In case of Stream, we should use them in "non-critical" path as it is slower, but if we are creating many collections before we get our result, then usage of Stream may be faster..

In case of Optional, we should use them in "non-critical" path as it is slower.

While I would like to "simplify" our API and remove "StaticString" methods, I think we would have to keep them as removing them would slow down child elements lookup by 18% (however, it would reduce element creation penalty "String.intern()", but that is one time operation). Optionally, we could create a similar optimization to proposed "memory optimization" for known attribute names, or even check if manually created pool of queried element names would not be faster.

All tests are now in a separate branch issue #14.

@wojtek What do you think?

Wojciech Kapcia (Tigase) commented 3 years ago

Use ArrayList instead of LinkedList as it should be smaller and faster

+1

I'm somewhat split regarding String.intern() - it indeed is quite slow (https://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern#10628759), quite possibly "optimised" HashMap would be better option here. Same goes with comparison (identity vs equality). Yes, pure comparison with identity is faster but we should consider lookups in the pool. And I think that those tables could be relatively big (element names and attribute names could be small but their values not so much, considering high variation in JIDs for example; and I memory serves me right I saw that those tables used quite some space after a while). Again - we would have to review the code and check all the places where the comparison is made.

Regarding Optionals and usage . The problem with "using them in non-critical paths" would be non-enforceable thus it could happen that someone would use it without knowing the penalty (This applies to Stream methods as well). And even though with Value Classes this should be somewhat mitigated, it's still future. Alternatively we could use @Nullable annotation (could be from jetbrains) - it could add hints in IDE and we could add checks during build (we already have spotbugs-maven-plugin but it's not enabled by default).
Andrzej Wójcik (Tigase) commented 3 years ago

Same goes with comparison (identity vs equality). Yes, pure comparison with identity is faster but we should consider lookups in the pool. And I think that those tables could be relatively big (element names and attribute names could be small but their values not so much, considering high variation in JIDs for example; and I memory serves me right I saw that those tables used quite some space after a while). Again - we would have to review the code and check all the places where the comparison is made.

True, if we would "cache" all element names and attribute names, those tables would be huge. In recent Java hash tables from String.intern() were increased and in theory, it should speed up processing, however, looking at my results, suggests that this change was not so beneficial for us. Moreover, if we call String.intern() on many strings, this will slow down instance lookup internally in String.intern() implementation (see http://java-performance.info/string-intern-in-java-6-7-8/)

My optimized approach was to actually cache only "attribute names" known and commonly used (I know, this may have some drawbacks).

Please note, that we do not call String.intern() for attribute values.

Regarding Optionals and usage . The problem with "using them in non-critical paths" would be non-enforceable thus it could happen that someone would use it without knowing the penalty (This applies to Stream methods as well). And even though with Value Classes this should be somewhat mitigated, it's still future. Alternatively we could use @Nullable annotation (could be from jetbrains) - it could add hints in IDE and we could add checks during build (we already have spotbugs-maven-plugin but it's not enabled by default).

We can add @Nullable annotation to the method to get attribute value.

Usage of Optional and Stream is clean and convenient, but may have pently, so we need to decided if we accept it, or not.
Wojciech Kapcia (Tigase) commented 3 years ago

I think we (once again) have same stance on intern() and it's drawbacks, i.e. I was agreeing with you that our optimisation (for names) should be better for us.

Regarding performance penalties: I would avoid exposing Stream. Optional has some penalty but as you stated earlier - it's not that huge and in future JVM version should be even lower (though, at that time handling of nulls themself could be different. IMHO from the performance point of view anoting the methods with @Nullable would be best here - we can have compile-time checks (SpotBugs or PMD) to make sure that we do check for null, which would be virtually the same as the check enforced by Optional but with no overhead.
Andrzej Wójcik (Tigase) commented 3 years ago
Regarding performance penalties: I would avoid exposing Stream. Optional has some penalty but as you stated earlier - it's not that huge and in future JVM version should be even lower (though, at that time handling of nulls themself could be different. IMHO from the performance point of view anoting the methods with @Nullable would be best here - we can have compile-time checks (SpotBugs or PMD) to make sure that we do check for null, which would be virtually the same as the check enforced by Optional but with no overhead.

Maybe, just maybe, we could introduce functions:

T mapChild(String name, String xmlns, Function<Element,T>) This method would return null but the mapping function would be called only if the element was found.

void ifChildPresence(String name, String xmlns, Consumer<Element>)

This would allow us to implement "Optional-list" processing without the overhead of creation of the Optional instance.

I could prepare some implementations, so we could evaluate the performance of such methods to compare with the manual retrieval of element and check for null.

Andrzej Wójcik (Tigase) commented 3 years ago

I've rerun tests once again as those very high results of Optional were suspecious. The same tests with -Xms1g -Xmx1g generated in following results:

Benchmark                                                  Mode  Cnt   Score    Error  Units
ElementPerformanceTest.benchmarkFindChildAndClassic        avgt    5  11,334 ±  0,642  us/op
ElementPerformanceTest.benchmarkFindChildAndOptional       avgt    5  11,712 ±  0,445  us/op
ElementPerformanceTest.benchmarkFindChildArrayList         avgt    5  17,932 ±  0,809  us/op
ElementPerformanceTest.benchmarkFindChildLinkedList        avgt    5  18,308 ±  0,472  us/op
ElementPerformanceTest.benchmarkFindChildNew               avgt    5   1,663 ±  0,021  us/op
ElementPerformanceTest.benchmarkFindChildNewStatic         avgt    5   1,407 ±  0,025  us/op
ElementPerformanceTest.benchmarkFindChildNewX              avgt    5   1,709 ±  0,087  us/op
ElementPerformanceTest.benchmarkFindChildOldEmpty          avgt    5   0,019 ±  0,001  us/op
ElementPerformanceTest.benchmarkFindChildOldOptional       avgt    5   2,471 ±  0,075  us/op
ElementPerformanceTest.benchmarkFindChildOldStatic         avgt    5   1,408 ±  0,013  us/op
ElementPerformanceTest.benchmarkFlatMap                    avgt    5   0,079 ±  0,008  us/op
ElementPerformanceTest.benchmarkFlatMapStreamToList        avgt    5   0,201 ±  0,011  us/op
ElementPerformanceTest.benchmarkFlatMapStreamToList2       avgt    5   0,143 ±  0,003  us/op
ElementPerformanceTest.benchmarkHashMapGet                 avgt    5   0,055 ±  0,003  us/op
ElementPerformanceTest.benchmarkHashMapGetIdentity         avgt    5   0,470 ±  0,047  us/op
ElementPerformanceTest.benchmarkHashMapGetMemoryOptimized  avgt    5   0,066 ±  0,004  us/op
ElementPerformanceTest.benchmarkHashMapSet                 avgt    5   0,087 ±  0,002  us/op
ElementPerformanceTest.benchmarkHashMapSetIdentity         avgt    5   0,549 ±  0,005  us/op
ElementPerformanceTest.benchmarkHashMapSetMemoryOptimized  avgt    5   0,111 ±  0,005  us/op
ElementPerformanceTest.benchmarkIndentityMapGet            avgt    5   0,482 ±  0,024  us/op
ElementPerformanceTest.benchmarkIndentityMapGetStatic      avgt    5   0,047 ±  0,001  us/op
ElementPerformanceTest.benchmarkIndentityMapSet            avgt    5   0,523 ±  0,017  us/op

As we can see, Optional has still very high score, but almost 800us smaller. I suppose that this was due to the fact that new object is being created and JMH caused GC during the test and with increased memory heap, it was not run or had less executions.

Also Stream is a lot faster, still slower than iteration over List but almost 28% faster.

Andrzej Wójcik (Tigase) commented 3 years ago
I've fixed following tests and rerun tests (static string was passed instead of new instance, which caused equlas() to return true a lot faster than it should):

ElementPerformanceTest.benchmarkFindChildNew avgt 5 4,166 ± 0,259 us/op ElementPerformanceTest.benchmarkFindChildNewX avgt 5 4,476 ± 0,094 us/op
Wojciech Kapcia (Tigase) commented 3 years ago

Maybe, just maybe, we could introduce functions:

As commented yesterday - I'm somewhat sceptical towards such API in Element class.

I suppose that this was due to the fact that new object is being created and JMH caused GC during the test and with increased memory heap, it was not run or had less executions.

This is interesting point - maybe the GC wasn't run in the test (with increased heap size) but there was additional allocation which can be problematic during normal operation (i.e. triggering GC more often). AFAIR it should be possible to measure allocations in JMH tests - could we compare it?

Also Stream is a lot faster, still slower than iteration over List but almost 28% faster.

Wouldn't Stream simply be based on existing collection? Thus we could return simple List and if someone would need fancier processing of elements then it could call .stream() on that collection? Moreover, collections could be re-used, Streams not so much.
Andrzej Wójcik (Tigase) commented 3 years ago
Some more current results.

Benchmark Mode Cnt Score Error Units ElementPerformanceTest.benchmarkFindAndMapChild avgt 5 1,742 ± 0,103 us/op ElementPerformanceTest.benchmarkFindChildAndClassic avgt 5 11,337 ± 0,743 us/op ElementPerformanceTest.benchmarkFindChildAndOptional avgt 5 11,694 ± 0,358 us/op ElementPerformanceTest.benchmarkFindChildArrayList avgt 5 17,862 ± 0,316 us/op ElementPerformanceTest.benchmarkFindChildLinkedList avgt 5 18,486 ± 0,930 us/op ElementPerformanceTest.benchmarkFindChildNew avgt 5 4,078 ± 0,074 us/op ElementPerformanceTest.benchmarkFindChildNewStatic avgt 5 1,411 ± 0,012 us/op ElementPerformanceTest.benchmarkFindChildNewX avgt 5 4,432 ± 0,213 us/op ElementPerformanceTest.benchmarkFindChildOldEmpty avgt 5 2,485 ± 0,090 us/op ElementPerformanceTest.benchmarkFindChildOldOptional avgt 5 2,504 ± 0,176 us/op ElementPerformanceTest.benchmarkFindChildOldStatic avgt 5 1,418 ± 0,014 us/op ElementPerformanceTest.benchmarkFlatMap avgt 5 0,081 ± 0,002 us/op ElementPerformanceTest.benchmarkFlatMapManual avgt 5 0,072 ± 0,004 us/op ElementPerformanceTest.benchmarkFlatMapStreamToArray2 avgt 5 0,148 ± 0,006 us/op ElementPerformanceTest.benchmarkFlatMapStreamToList2 avgt 5 0,153 ± 0,004 us/op ElementPerformanceTest.benchmarkHashMapGet avgt 5 0,054 ± 0,001 us/op ElementPerformanceTest.benchmarkHashMapGetIdentity avgt 5 0,500 ± 0,008 us/op ElementPerformanceTest.benchmarkHashMapGetMemoryOptimized avgt 5 0,065 ± 0,002 us/op ElementPerformanceTest.benchmarkHashMapSet avgt 5 0,087 ± 0,002 us/op ElementPerformanceTest.benchmarkHashMapSetIdentity avgt 5 0,551 ± 0,024 us/op ElementPerformanceTest.benchmarkHashMapSetMemoryOptimized avgt 5 0,112 ± 0,003 us/op ElementPerformanceTest.benchmarkIndentityMapGet avgt 5 0,480 ± 0,010 us/op ElementPerformanceTest.benchmarkIndentityMapGetStatic avgt 5 0,049 ± 0,002 us/op ElementPerformanceTest.benchmarkIndentityMapSet avgt 5 0,524 ± 0,016 us/op ElementPerformanceTest.benchmarkMapStreamToList avgt 5 0,097 ± 0,003 us/op

NOTES:

Results for methods named benchmarkHashMap*, benchmarkIdentityMap* are times of 6 executions of those methods for different attribute names on Element class.

Results for methods named benchmarkFlatMap and benchmarkMapStreamToList are times of a single execution of those methods on Element class.

Results for methods named benchmarkFindChild* are times of 1000 executions of those methods on Element class.

During this iteration of tests, I've added benchmarkFindAndMapChild test (based on equality and not on instance equality, which uses "manually implemented" Optional-like return type allowing mapping of elements, attribute values, etc. The difference is, that it does not have isPresent() method as it doesn't catch state of the expression but evaluates it. Its get method returns null normally (in the oposition to Optional class from Java) and have (for now) less methods. This return type would be beneficial if we would use it for findChild or getAttribute as in some cases it would allow us to "map" returned value and it would provide "null checks" internally. The only "last" check on the result of get method would be required. That could simplify some code without overhead introduced by Optional.

Back to your questions:

This is interesting point - maybe the GC wasn't run in the test (with increased heap size) but there was additional allocation which can be problematic during normal operation (i.e. triggering GC more often). AFAIR it should be possible to measure allocations in JMH tests - could we compare it?

I think that there was no real issue with GC (small short lived objects should be removed fast), but there was no -Xms so I suppose that small amount of the memory was allocated and that forced JVM to resize HEAP - allocate more memory from the OS).

As for allocations, most likely small objects may increase GC, but they are short lived so that should be fast. If you would look closer on my tests na implementation of getChildren() and streamChildren() you will find out that getChildren() creates LinkedList every time it is called. On the other hand streamChildren() builds stream without collection (most likely there is something used internally).

To be honest, looking at latest results, stream with flatMap() (benchmarkFlatMapStreamToList2) is 2 times slower that benchmarkFlatMap. Both are using collections flattening.

On the other hand if you would look at benchmarkMapStreamToList, where we have a stream of children mapped with getName() to list of Strings, you can see that Stream performance without flatMap is not so bad (I would say that it is rather good).

Wouldn't Stream simply be based on existing collection? Thus we could return simple List and if someone would need fancier processing of elements then it could call .stream() on that collection? Moreover, collections could be re-used, Streams not so much.

Basing steam on the collection which needs to be created (or filtered) was slower in my tests. Manual creation of a Stream like in streamChildren() method was faster.

To sum up.

We need to keep "StaticString" methods as those are around 3 times faster.

We should replace LinkedList with ArrayList for performance and memory usage improvements.

I'm in favour of keeping Element:streamChildren() as it is a single method and it's impact is small to none and it is still faster that Element.getChildren()::stream().

With IdentityHashMap would suggest replacing it with HashMap and add this "simple" method with a switch replacing commonly used attribute names with their "static" references (not all of them, just some listed, like "to", "from", "id", "name", "version", "xmlns", "var", and maybe few others). That should improve performance (due to dropped calls to String.intern()) with small penalty for against getAttributeStaticString() but being far faster than getAttribute().

I have mixed fillings about using Optional. On the one hand, having methods returning Optional could be good, but could increase complexity of Element class and overuse of it could cause performance issues and people can "easily" wrap results of our methods with Optional.ofNullable(). On the other hand this performance may be an issue. It is about 80% slower, but execution times are quite fast (just 2.5us for a single execution) comparing to ie. getChildren().

Wojciech Kapcia (Tigase) commented 3 years ago

Thank you for running those tests. I made a couple of changes (from commit):

Add possibility to configure Element's List implementation

Remove test mode from configuration options (to allow overriding via annotation)

Add parameters to test execution: number of children elements of Element; whether to use first children from list or random (0…max children)

Rename certain test methods to make them easier to discern/understand

Removed for-loops in favour of running more iterations with Throughput benchmark mode

Could you please do check those changes and comment on them?

I added case with more sub-children as this can happen as well, though I think that majority of the request (so when we do perform lookup would be mostly IQ queries so rather limited number of children and most of the time checking first of them; with forms being the exception here). On the other hand results can have bigger amount of children but in that case we would most likely simply produce Element like that and there would be very limited querying/using "find*" methods IMHO thus I'm not sure how focused we should be on results with bigger stanzas / lookup of random child (sub-)element.

With the above I run two tests ( comparing .getChild() and checking for null versus using Optional.ofNullable) and the results are as follows:

Benchmark                                                     (clazz)  (maxChild)  (randomChild)   Mode  Cnt    Score    Error   Units
ElementPerformanceTest.benchmarkGetChildAndNullComparison   ArrayList           1          false  thrpt    5  236,333 ±  3,763  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison   ArrayList           1           true  thrpt    5  234,281 ±  2,581  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison   ArrayList           5          false  thrpt    5  234,640 ±  4,701  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison   ArrayList           5           true  thrpt    5  116,522 ± 65,283  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison   ArrayList          10          false  thrpt    5  235,529 ±  5,840  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison   ArrayList          10           true  thrpt    5   95,278 ± 68,433  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison   ArrayList          20          false  thrpt    5  236,338 ±  5,741  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison   ArrayList          20           true  thrpt    5   79,458 ± 28,718  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison  LinkedList           1          false  thrpt    5  165,172 ±  4,443  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison  LinkedList           1           true  thrpt    5  167,101 ±  3,130  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison  LinkedList           5          false  thrpt    5  169,708 ±  2,190  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison  LinkedList           5           true  thrpt    5   83,583 ± 41,249  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison  LinkedList          10          false  thrpt    5  165,741 ±  4,012  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison  LinkedList          10           true  thrpt    5   81,997 ± 72,196  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison  LinkedList          20          false  thrpt    5  167,691 ±  1,806  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison  LinkedList          20           true  thrpt    5   63,790 ± 20,075  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional         ArrayList           1          false  thrpt    5  224,966 ± 14,476  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional         ArrayList           1           true  thrpt    5  225,953 ±  7,391  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional         ArrayList           5          false  thrpt    5  224,014 ±  5,323  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional         ArrayList           5           true  thrpt    5  107,219 ± 37,451  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional         ArrayList          10          false  thrpt    5  225,062 ±  3,534  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional         ArrayList          10           true  thrpt    5   83,064 ± 41,024  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional         ArrayList          20          false  thrpt    5  224,379 ±  6,866  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional         ArrayList          20           true  thrpt    5   78,561 ± 63,378  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional        LinkedList           1          false  thrpt    5  165,315 ±  4,145  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional        LinkedList           1           true  thrpt    5  166,405 ±  2,049  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional        LinkedList           5          false  thrpt    5  164,328 ±  3,828  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional        LinkedList           5           true  thrpt    5  103,478 ± 72,707  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional        LinkedList          10          false  thrpt    5  165,087 ±  1,678  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional        LinkedList          10           true  thrpt    5   87,306 ± 53,350  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional        LinkedList          20          false  thrpt    5  163,638 ±  4,182  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional        LinkedList          20           true  thrpt    5   50,018 ± 43,976  ops/us

So, for this small subset of tests:

ArrayList is faster LinkedList quite significantly (~45%), even for random elements and not only the first one
checking for nullness vs using Optional gives the former slight advantage (~5%)

I'm going to run tests for all methods later on (don't want to use the machine while it runs to avoid interruptions).

I think that there was no real issue with GC (small short lived objects should be removed fast), but there was no -Xms so I suppose that small amount of the memory was allocated and that forced JVM to resize HEAP - allocate more memory from the OS). As for allocations, most likely small objects may increase GC, but they are short lived so that should be fast. If you would look closer on my tests na implementation of getChildren() and streamChildren() you will find out that getChildren() creates LinkedList every time it is called. On the other hand streamChildren() builds stream without collection (most likely there is something used internally).

Yes, those should be removed fast but still this is additional allocation and in the current JVM version (without Value Classes) I think we should keep it in mind considering Element is the most fundamental piece of Tigase. (Yes, I'm slightly leaning towards returning null and using @Nullable annotation)

To be honest, looking at latest results, stream with flatMap() (benchmarkFlatMapStreamToList2) is 2 times slower that benchmarkFlatMap. Both are using collections flattening. On the other hand if you would look at benchmarkMapStreamToList , where we have a stream of children mapped with getName() to list of Strings, you can see that Stream performance without flatMap is not so bad (I would say that it is rather good).

To be honest I checked results from the table you provided and could confirm your observations on that data… though, while looking at the code I wasn't sure we are making exactly the same comparison in all cases (it would be better to group cases that should give same outcome but do it differently somehow for more clarity - maybe different classes?)

Regarding "finding child" use case, here is the extract for the case with Element with 20 children sub-elements backed by ArrayList and returning first child (this is throughput so higher=better):

ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          20          false  thrpt    5  536,225 ±  13,491  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          20          false  thrpt    5  133,136 ±   1,619  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          20          false  thrpt    5  132,562 ±   2,474  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          20          false  thrpt    5  137,221 ±   3,510  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          20          false  thrpt    5  563,661 ±   9,735  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          20          false  thrpt    5  562,216 ±  18,203  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          20          false  thrpt    5  553,154 ±   7,123  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          20          false  thrpt    5    5,723 ±   0,883  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          20          false  thrpt    5   10,629 ±   0,091  ops/us

What surprised me was poor result for StaticStr and findChild with predicate based on identity check (which is counterintuitive)

Wouldn't Stream simply be based on existing collection? Thus we could return simple List and if someone would need fancier processing of elements then it could call .stream() on that collection? Moreover, collections could be re-used, Streams not so much.

Basing steam on the collection which needs to be created (or filtered) was slower in my tests. Manual creation of a Stream like in streamChildren() method was faster.

To sum up.

We need to keep "StaticString" methods as those are around 3 times faster.

I assume you mean methods that favours identity (references) comparison instead of equality?

We should replace LinkedList with ArrayList for performance and memory usage improvements.

Agreed completely from what I saw.

I'm in favour of keeping Element:streamChildren() as it is a single method and it's impact is small to none and it is still faster that Element.getChildren()::stream().

Will do more testing (I think that with Elements with more children Stream methods could be slower but as I said - more testing needed).

Though, I think we should avoid giving to many choices and simply provide user with "single best method" to avoid someone using something "convenient" that could later on become a bottleneck. Of course more convenience methods are OK while they don't degrade performance significantly.

With IdentityHashMap would suggest replacing it with HashMap and add this "simple" method with a switch replacing commonly used attribute names with their "static" references (not all of them, just some listed, like "to", "from", "id", "name", "version", "xmlns", "var", and maybe few others). That should improve performance (due to dropped calls to String.intern()) with small penalty for against getAttributeStaticString() but being far faster than getAttribute().

This seems ok.

I have mixed fillings about using Optional. On the one hand, having methods returning Optional could be good, but could increase complexity of Element class and overuse of it could cause performance issues and people can "easily" wrap results of our methods with Optional.ofNullable(). On the other hand this performance may be an issue. It is about 80% slower, but execution times are quite fast (just 2.5us for a single execution) comparing to ie. getChildren().

Same conclusion.

Complete result from the current set of tests:

Benchmark                                                                         (clazz)  (maxChild)  (randomChild)   Mode  Cnt    Score     Error   Units
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList           1          false  thrpt    5  539,085 ±  12,827  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList           1           true  thrpt    5  540,144 ±   3,835  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList           5          false  thrpt    5  535,902 ±  17,290  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList           5           true  thrpt    5  325,764 ± 432,496  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          10          false  thrpt    5  539,969 ±  10,823  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          10           true  thrpt    5  229,710 ± 573,576  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          20          false  thrpt    5  536,225 ±  13,491  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          20           true  thrpt    5  125,397 ± 174,318  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList           1          false  thrpt    5  523,468 ±   5,786  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList           1           true  thrpt    5  518,944 ±  21,567  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList           5          false  thrpt    5  534,196 ±   5,688  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList           5           true  thrpt    5  256,962 ± 274,198  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList          10          false  thrpt    5  537,580 ±  14,030  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList          10           true  thrpt    5  154,408 ± 348,009  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList          20          false  thrpt    5  534,719 ±   4,177  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList          20           true  thrpt    5   78,444 ±  97,766  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList           1          false  thrpt    5  747,264 ±  21,063  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList           1           true  thrpt    5  749,625 ±  13,905  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList           5          false  thrpt    5  328,784 ±   2,578  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList           5           true  thrpt    5  330,592 ±   5,394  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          10          false  thrpt    5  218,086 ±   2,793  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          10           true  thrpt    5  218,747 ±   4,519  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          20          false  thrpt    5  133,136 ±   1,619  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          20           true  thrpt    5  132,662 ±   0,402  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList           1          false  thrpt    5  871,265 ±  11,243  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList           1           true  thrpt    5  871,231 ±   7,048  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList           5          false  thrpt    5  324,462 ±   2,666  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList           5           true  thrpt    5  324,440 ±   3,570  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList          10          false  thrpt    5  162,820 ±   2,243  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList          10           true  thrpt    5  162,260 ±   2,747  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList          20          false  thrpt    5   76,111 ±   1,141  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList          20           true  thrpt    5   75,369 ±   0,667  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList           1          false  thrpt    5  680,177 ±   6,690  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList           1           true  thrpt    5  724,642 ±  20,068  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList           5          false  thrpt    5  332,793 ±   8,871  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList           5           true  thrpt    5  332,448 ±  11,752  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          10          false  thrpt    5  216,507 ±   3,911  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          10           true  thrpt    5  216,765 ±   2,027  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          20          false  thrpt    5  132,562 ±   2,474  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          20           true  thrpt    5  132,102 ±   0,899  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList           1          false  thrpt    5  762,051 ±  13,793  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList           1           true  thrpt    5  814,414 ±  10,604  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList           5          false  thrpt    5  340,997 ±   5,704  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList           5           true  thrpt    5  322,356 ±  10,800  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList          10          false  thrpt    5  163,174 ±   4,264  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList          10           true  thrpt    5  164,148 ±   4,126  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList          20          false  thrpt    5   75,072 ±   0,951  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList          20           true  thrpt    5   75,202 ±   0,373  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList           1          false  thrpt    5  829,788 ±   2,980  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList           1           true  thrpt    5  839,544 ±  19,739  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList           5          false  thrpt    5  329,963 ±   9,489  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList           5           true  thrpt    5  329,263 ±   1,186  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          10          false  thrpt    5  215,261 ±   0,931  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          10           true  thrpt    5  216,895 ±   4,538  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          20          false  thrpt    5  137,221 ±   3,510  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          20           true  thrpt    5  137,187 ±   2,423  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList           1          false  thrpt    5  862,107 ±  26,565  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList           1           true  thrpt    5  861,248 ±  12,594  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList           5          false  thrpt    5  347,114 ±   7,143  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList           5           true  thrpt    5  346,358 ±  10,022  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList          10          false  thrpt    5  169,617 ±   1,709  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList          10           true  thrpt    5  169,996 ±   4,278  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList          20          false  thrpt    5   79,374 ±   2,345  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList          20           true  thrpt    5   76,332 ±   0,584  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList           1          false  thrpt    5  549,403 ±  17,548  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList           1           true  thrpt    5  550,733 ±  12,833  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList           5          false  thrpt    5  558,461 ±  10,447  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList           5           true  thrpt    5  251,083 ± 325,713  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          10          false  thrpt    5  557,759 ±  16,369  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          10           true  thrpt    5  165,897 ± 190,680  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          20          false  thrpt    5  563,661 ±   9,735  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          20           true  thrpt    5  175,458 ± 270,762  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList           1          false  thrpt    5  543,895 ±  11,502  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList           1           true  thrpt    5  546,446 ±   5,638  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList           5          false  thrpt    5  557,672 ±   8,784  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList           5           true  thrpt    5  271,937 ± 373,001  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList          10          false  thrpt    5  561,501 ±  18,048  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList          10           true  thrpt    5  233,095 ± 412,933  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList          20          false  thrpt    5  561,397 ±  14,286  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList          20           true  thrpt    5  108,231 ± 188,377  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList           1          false  thrpt    5  558,711 ±  11,616  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList           1           true  thrpt    5  554,554 ±   7,128  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList           5          false  thrpt    5  562,505 ±  10,916  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList           5           true  thrpt    5  220,350 ± 203,325  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          10          false  thrpt    5  560,544 ±   7,321  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          10           true  thrpt    5  158,713 ± 134,166  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          20          false  thrpt    5  562,216 ±  18,203  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          20           true  thrpt    5  128,479 ± 178,694  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList           1          false  thrpt    5  540,242 ±   9,092  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList           1           true  thrpt    5  540,163 ±  15,128  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList           5          false  thrpt    5  559,354 ±   7,952  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList           5           true  thrpt    5  240,311 ± 294,342  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList          10          false  thrpt    5  551,215 ±   7,223  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList          10           true  thrpt    5  162,254 ±  91,732  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList          20          false  thrpt    5  565,501 ±   5,429  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList          20           true  thrpt    5   88,613 ±  90,993  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList           1          false  thrpt    5  559,473 ±  10,822  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList           1           true  thrpt    5  559,713 ±  34,239  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList           5          false  thrpt    5  554,958 ±  16,232  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList           5           true  thrpt    5  247,863 ± 236,474  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          10          false  thrpt    5  554,524 ±  31,806  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          10           true  thrpt    5  206,893 ± 268,016  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          20          false  thrpt    5  553,154 ±   7,123  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          20           true  thrpt    5  103,946 ±  73,610  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList           1          false  thrpt    5  544,949 ±   9,985  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList           1           true  thrpt    5  537,616 ±  19,101  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList           5          false  thrpt    5  555,715 ±  30,640  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList           5           true  thrpt    5  146,804 ± 121,396  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList          10          false  thrpt    5  552,609 ±  18,380  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList          10           true  thrpt    5   88,238 ±  99,413  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList          20          false  thrpt    5  563,031 ±  22,835  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList          20           true  thrpt    5   92,659 ± 150,276  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList           1          false  thrpt    5   26,496 ±   5,228  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList           1           true  thrpt    5   28,164 ±  15,615  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList           5          false  thrpt    5   13,133 ±   6,236  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList           5           true  thrpt    5   14,722 ±   1,615  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          10          false  thrpt    5    9,755 ±   1,544  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          10           true  thrpt    5   10,068 ±   2,511  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          20          false  thrpt    5    5,723 ±   0,883  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          20           true  thrpt    5    4,611 ±   1,429  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList           1          false  thrpt    5   25,445 ±   9,211  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList           1           true  thrpt    5   27,543 ±  10,269  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList           5          false  thrpt    5   14,769 ±   4,796  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList           5           true  thrpt    5   13,981 ±   2,563  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList          10          false  thrpt    5    9,699 ±   6,985  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList          10           true  thrpt    5    9,998 ±   5,641  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList          20          false  thrpt    5    6,045 ±   1,903  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList          20           true  thrpt    5    6,180 ±   2,457  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList           1          false  thrpt    5   95,110 ±   1,808  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList           1           true  thrpt    5   96,115 ±   1,684  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList           5          false  thrpt    5   40,001 ±   0,567  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList           5           true  thrpt    5   42,873 ±   0,697  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          10          false  thrpt    5   31,145 ±   0,345  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          10           true  thrpt    5   31,216 ±   0,493  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          20          false  thrpt    5   10,629 ±   0,091  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          20           true  thrpt    5   10,617 ±   0,226  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList           1          false  thrpt    5   93,644 ±   1,316  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList           1           true  thrpt    5   94,876 ±   1,754  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList           5          false  thrpt    5   40,840 ±   0,641  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList           5           true  thrpt    5   41,068 ±   1,384  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList          10          false  thrpt    5   28,466 ±   0,274  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList          10           true  thrpt    5   28,724 ±   0,299  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList          20          false  thrpt    5   10,396 ±   0,202  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList          20           true  thrpt    5   11,179 ±   0,176  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList           1          false  thrpt    5  234,901 ±   8,647  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList           1           true  thrpt    5  235,774 ±   5,994  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList           5          false  thrpt    5  236,102 ±   5,545  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList           5           true  thrpt    5  108,319 ±  40,833  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList          10          false  thrpt    5  235,172 ±   6,003  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList          10           true  thrpt    5  104,831 ±  75,528  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList          20          false  thrpt    5  236,018 ±   4,367  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList          20           true  thrpt    5   89,975 ±  65,232  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList           1          false  thrpt    5  165,081 ±   4,258  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList           1           true  thrpt    5  165,097 ±  10,391  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList           5          false  thrpt    5  166,439 ±   3,500  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList           5           true  thrpt    5   85,197 ±  87,759  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList          10          false  thrpt    5  169,921 ±   3,563  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList          10           true  thrpt    5   88,654 ± 101,693  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList          20          false  thrpt    5  169,330 ±   1,605  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList          20           true  thrpt    5   55,874 ±  49,351  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList           1          false  thrpt    5  225,389 ±   6,782  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList           1           true  thrpt    5  224,127 ±   5,849  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList           5          false  thrpt    5  225,726 ±   4,396  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList           5           true  thrpt    5  118,939 ±  35,815  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList          10          false  thrpt    5  219,142 ±  11,629  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList          10           true  thrpt    5  104,793 ±  30,632  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList          20          false  thrpt    5  224,347 ±  10,909  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList          20           true  thrpt    5   86,632 ±  78,354  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList           1          false  thrpt    5  166,167 ±   3,320  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList           1           true  thrpt    5  169,203 ±   2,698  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList           5          false  thrpt    5  165,869 ±   3,918  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList           5           true  thrpt    5   97,427 ±  66,403  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList          10          false  thrpt    5  165,894 ±   2,915  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList          10           true  thrpt    5   68,796 ±  15,427  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList          20          false  thrpt    5  164,723 ±   4,912  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList          20           true  thrpt    5   61,574 ±  99,175  ops/us
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList           1          false   avgt    5    0,024 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList           1           true   avgt    5    0,024 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList           5          false   avgt    5    0,101 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList           5           true   avgt    5    0,100 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList          10          false   avgt    5    0,216 ±   0,012   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList          10           true   avgt    5    0,234 ±   0,019   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList          20          false   avgt    5    0,416 ±   0,011   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList          20           true   avgt    5    0,409 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList           1          false   avgt    5    0,025 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList           1           true   avgt    5    0,025 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList           5          false   avgt    5    0,141 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList           5           true   avgt    5    0,171 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList          10          false   avgt    5    0,266 ±   0,009   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList          10           true   avgt    5    0,272 ±   0,012   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList          20          false   avgt    5    0,468 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList          20           true   avgt    5    0,479 ±   0,009   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList           1          false   avgt    5    0,030 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList           1           true   avgt    5    0,031 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList           5          false   avgt    5    0,126 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList           5           true   avgt    5    0,125 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList          10          false   avgt    5    0,247 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList          10           true   avgt    5    0,247 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList          20          false   avgt    5    0,493 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList          20           true   avgt    5    0,494 ±   0,013   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList           1          false   avgt    5    0,035 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList           1           true   avgt    5    0,035 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList           5          false   avgt    5    0,152 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList           5           true   avgt    5    0,148 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList          10          false   avgt    5    0,281 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList          10           true   avgt    5    0,285 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList          20          false   avgt    5    0,570 ±   0,009   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList          20           true   avgt    5    0,568 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList           1          false   avgt    5    0,140 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList           1           true   avgt    5    0,144 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList           5          false   avgt    5    0,397 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList           5           true   avgt    5    0,404 ±   0,016   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList          10          false   avgt    5    0,715 ±   0,013   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList          10           true   avgt    5    0,714 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList          20          false   avgt    5    1,520 ±   0,041   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList          20           true   avgt    5    1,454 ±   0,030   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList           1          false   avgt    5    0,142 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList           1           true   avgt    5    0,138 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList           5          false   avgt    5    0,375 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList           5           true   avgt    5    0,399 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList          10          false   avgt    5    0,654 ±   0,027   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList          10           true   avgt    5    0,698 ±   0,010   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList          20          false   avgt    5    1,323 ±   0,014   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList          20           true   avgt    5    1,255 ±   0,035   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList           1          false   avgt    5    0,059 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList           1           true   avgt    5    0,052 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList           5          false   avgt    5    0,207 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList           5           true   avgt    5    0,202 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList          10          false   avgt    5    0,226 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList          10           true   avgt    5    0,226 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList          20          false   avgt    5    0,589 ±   0,011   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList          20           true   avgt    5    0,594 ±   0,009   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList           1          false   avgt    5    0,058 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList           1           true   avgt    5    0,053 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList           5          false   avgt    5    0,136 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList           5           true   avgt    5    0,136 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList          10          false   avgt    5    0,227 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList          10           true   avgt    5    0,407 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList          20          false   avgt    5    0,626 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList          20           true   avgt    5    0,625 ±   0,019   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList           1          false   avgt    5    0,086 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList           1           true   avgt    5    0,086 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList           5          false   avgt    5    0,259 ±   0,013   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList           5           true   avgt    5    0,258 ±   0,010   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList          10          false   avgt    5    0,440 ±   0,012   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList          10           true   avgt    5    0,433 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList          20          false   avgt    5    0,639 ±   0,026   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList          20           true   avgt    5    0,646 ±   0,016   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList           1          false   avgt    5    0,089 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList           1           true   avgt    5    0,090 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList           5          false   avgt    5    0,235 ±   0,008   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList           5           true   avgt    5    0,236 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList          10          false   avgt    5    0,402 ±   0,012   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList          10           true   avgt    5    0,400 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList          20          false   avgt    5    0,637 ±   0,016   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList          20           true   avgt    5    0,638 ±   0,023   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList           1          false   avgt    5    0,060 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList           1           true   avgt    5    0,058 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList           5          false   avgt    5    0,134 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList           5           true   avgt    5    0,135 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList          10          false   avgt    5    0,384 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList          10           true   avgt    5    0,220 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList          20          false   avgt    5    0,565 ±   0,008   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList          20           true   avgt    5    0,548 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList           1          false   avgt    5    0,060 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList           1           true   avgt    5    0,062 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList           5          false   avgt    5    0,135 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList           5           true   avgt    5    0,233 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList          10          false   avgt    5    0,224 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList          10           true   avgt    5    0,222 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList          20          false   avgt    5    0,609 ±   0,014   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList          20           true   avgt    5    0,512 ±   0,008   us/op
ElementPerformanceTest.benchmarkHashMapGet                                            N/A         N/A            N/A   avgt    5    0,031 ±   0,001   us/op
ElementPerformanceTest.benchmarkHashMapGetIdentity                                    N/A         N/A            N/A   avgt    5    0,666 ±   0,007   us/op
ElementPerformanceTest.benchmarkHashMapGetMemoryOptimized                             N/A         N/A            N/A   avgt    5    0,054 ±   0,001   us/op
ElementPerformanceTest.benchmarkHashMapSet                                            N/A         N/A            N/A   avgt    5    0,019 ±   0,001   us/op
ElementPerformanceTest.benchmarkHashMapSetIdentity                                    N/A         N/A            N/A   avgt    5    0,662 ±   0,004   us/op
ElementPerformanceTest.benchmarkHashMapSetMemoryOptimized                             N/A         N/A            N/A   avgt    5    0,044 ±   0,001   us/op
ElementPerformanceTest.benchmarkIndentityMapGet                                       N/A         N/A            N/A   avgt    5    0,647 ±   0,014   us/op
ElementPerformanceTest.benchmarkIndentityMapGetStatic                                 N/A         N/A            N/A   avgt    5    0,032 ±   0,001   us/op
ElementPerformanceTest.benchmarkIndentityMapSet                                       N/A         N/A            N/A   avgt    5    0,658 ±   0,007   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList           1          false   avgt    5    0,048 ±   0,002   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList           1           true   avgt    5    0,047 ±   0,001   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList           5          false   avgt    5    0,088 ±   0,002   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList           5           true   avgt    5    0,087 ±   0,003   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList          10          false   avgt    5    0,117 ±   0,005   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList          10           true   avgt    5    0,114 ±   0,012   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList          20          false   avgt    5    0,213 ±   0,003   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList          20           true   avgt    5    0,212 ±   0,012   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList           1          false   avgt    5    0,048 ±   0,001   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList           1           true   avgt    5    0,048 ±   0,001   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList           5          false   avgt    5    0,084 ±   0,003   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList           5           true   avgt    5    0,083 ±   0,002   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList          10          false   avgt    5    0,123 ±   0,001   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList          10           true   avgt    5    0,123 ±   0,001   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList          20          false   avgt    5    0,266 ±   0,004   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList          20           true   avgt    5    0,267 ±   0,013   us/op

Andrzej Wójcik (Tigase) commented 3 years ago

I've rerun tests on my device (just for confirmation of the results):

Benchmark                                                                         (clazz)  (maxChild)  (randomChild)   Mode  Cnt     Score     Error   Units
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList           1          false  thrpt    5   423,188 ±  10,806  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList           1           true  thrpt    5   423,261 ±   3,969  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList           5          false  thrpt    5   417,632 ±  35,821  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList           5           true  thrpt    5   154,873 ±  80,264  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          10          false  thrpt    5   420,095 ±  13,134  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          10           true  thrpt    5    86,880 ±  57,611  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          20          false  thrpt    5   417,735 ±   7,310  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                               ArrayList          20           true  thrpt    5    91,805 ±  45,995  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList           1          false  thrpt    5   370,248 ±   3,082  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList           1           true  thrpt    5   369,220 ±  14,114  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList           5          false  thrpt    5   375,413 ±  18,409  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList           5           true  thrpt    5   231,619 ± 251,507  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList          10          false  thrpt    5   373,015 ±  42,821  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList          10           true  thrpt    5    87,188 ±  68,971  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList          20          false  thrpt    5   380,034 ±   7,764  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                              LinkedList          20           true  thrpt    5   120,767 ± 442,771  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList           1          false  thrpt    5  1120,717 ±  10,730  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList           1           true  thrpt    5  1121,619 ±   4,722  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList           5          false  thrpt    5  1122,085 ±   9,664  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList           5           true  thrpt    5   441,373 ± 356,780  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          10          false  thrpt    5  1118,401 ±  10,219  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          10           true  thrpt    5   387,174 ± 452,062  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          20          false  thrpt    5  1113,951 ±  20,243  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                      ArrayList          20           true  thrpt    5   262,479 ± 233,681  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList           1          false  thrpt    5  1059,517 ±  29,829  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList           1           true  thrpt    5  1054,566 ±  18,752  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList           5          false  thrpt    5  1008,539 ±  13,921  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList           5           true  thrpt    5   537,933 ± 751,115  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList          10          false  thrpt    5  1009,320 ±  11,129  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList          10           true  thrpt    5   457,063 ± 875,516  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList          20          false  thrpt    5   995,362 ±  44,941  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStr                     LinkedList          20           true  thrpt    5   168,266 ± 275,345  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList           1          false  thrpt    5  1113,747 ±  18,743  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList           1           true  thrpt    5  1100,001 ±  26,638  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList           5          false  thrpt    5  1104,330 ± 101,606  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList           5           true  thrpt    5   454,811 ± 367,371  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          10          false  thrpt    5  1117,008 ±   4,354  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          10           true  thrpt    5   423,192 ± 336,249  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          20          false  thrpt    5  1100,559 ±  91,876  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional              ArrayList          20           true  thrpt    5   216,928 ± 252,530  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList           1          false  thrpt    5  1057,063 ±  13,297  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList           1           true  thrpt    5  1058,902 ±  15,367  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList           5          false  thrpt    5  1006,604 ±  10,338  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList           5           true  thrpt    5   522,616 ± 551,944  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList          10          false  thrpt    5   999,878 ±  21,332  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList          10           true  thrpt    5   286,018 ± 153,200  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList          20          false  thrpt    5  1002,240 ±  34,059  ops/us
ElementPerformanceTest.benchmarkFindChildGetChildStaticStrOptional             LinkedList          20           true  thrpt    5   203,783 ± 318,287  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList           1          false  thrpt    5   478,378 ±   7,768  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList           1           true  thrpt    5   476,888 ±   9,098  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList           5          false  thrpt    5   282,199 ±   5,693  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList           5           true  thrpt    5   256,032 ±  27,086  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          10          false  thrpt    5   182,493 ±  10,158  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          10           true  thrpt    5   184,106 ±   4,837  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          20          false  thrpt    5   104,716 ±   7,860  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                               ArrayList          20           true  thrpt    5   104,527 ±   1,885  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList           1          false  thrpt    5   529,952 ±  38,298  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList           1           true  thrpt    5   535,282 ±   8,899  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList           5          false  thrpt    5   292,701 ±   1,319  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList           5           true  thrpt    5   291,346 ±  16,194  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList          10          false  thrpt    5   191,617 ±  13,623  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList          10           true  thrpt    5   191,870 ±   4,746  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList          20          false  thrpt    5    91,443 ±   5,410  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                              LinkedList          20           true  thrpt    5    91,468 ±   3,164  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList           1          false  thrpt    5   441,789 ±  34,051  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList           1           true  thrpt    5   449,293 ±   6,070  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList           5          false  thrpt    5   445,836 ±  34,837  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList           5           true  thrpt    5   202,783 ± 190,396  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          10          false  thrpt    5   448,042 ±   9,499  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          10           true  thrpt    5   123,082 ± 161,207  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          20          false  thrpt    5   446,923 ±  12,915  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                             ArrayList          20           true  thrpt    5    65,456 ±  44,718  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList           1          false  thrpt    5   375,530 ±  23,980  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList           1           true  thrpt    5   379,721 ±   4,949  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList           5          false  thrpt    5   396,859 ±   5,893  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList           5           true  thrpt    5   170,482 ± 236,733  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList          10          false  thrpt    5   387,271 ±  44,871  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList          10           true  thrpt    5   142,712 ± 264,626  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList          20          false  thrpt    5   392,170 ±  29,970  ops/us
ElementPerformanceTest.benchmarkFindChildNewMatcher                            LinkedList          20           true  thrpt    5    86,316 ± 118,652  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList           1          false  thrpt    5   367,263 ±  25,725  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList           1           true  thrpt    5   368,246 ±   7,986  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList           5          false  thrpt    5   367,446 ±  29,559  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList           5           true  thrpt    5   136,297 ± 134,230  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          10          false  thrpt    5   369,060 ±  25,805  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          10           true  thrpt    5    95,418 ±  43,454  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          20          false  thrpt    5   365,791 ±  42,911  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                     ArrayList          20           true  thrpt    5    83,119 ±  35,294  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList           1          false  thrpt    5   345,709 ±  12,167  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList           1           true  thrpt    5   343,125 ±   6,945  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList           5          false  thrpt    5   365,058 ±   8,170  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList           5           true  thrpt    5   153,322 ± 205,752  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList          10          false  thrpt    5   366,896 ±   2,651  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList          10           true  thrpt    5    69,944 ±  13,082  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList          20          false  thrpt    5   366,292 ±   2,650  ops/us
ElementPerformanceTest.benchmarkFindChildNullableAndGetName                    LinkedList          20           true  thrpt    5    69,931 ± 107,890  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList           1          false  thrpt    5   365,005 ±   4,115  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList           1           true  thrpt    5   365,087 ±   6,733  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList           5          false  thrpt    5   366,408 ±   3,523  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList           5           true  thrpt    5   107,241 ±  87,631  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          10          false  thrpt    5   365,261 ±   7,046  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          10           true  thrpt    5   117,375 ±  71,044  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          20          false  thrpt    5   365,732 ±   5,985  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName               ArrayList          20           true  thrpt    5    87,111 ± 118,842  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList           1          false  thrpt    5   344,761 ±   8,656  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList           1           true  thrpt    5   346,667 ±   5,627  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList           5          false  thrpt    5   365,892 ±   4,214  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList           5           true  thrpt    5   152,747 ± 147,831  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList          10          false  thrpt    5   365,291 ±   5,595  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList          10           true  thrpt    5    85,660 ±  66,327  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList          20          false  thrpt    5   366,091 ±   7,036  ops/us
ElementPerformanceTest.benchmarkFindChildOptionalMapperAndGetName              LinkedList          20           true  thrpt    5    69,969 ±  75,417  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList           1          false  thrpt    5    23,497 ±   7,405  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList           1           true  thrpt    5    22,890 ±  10,142  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList           5          false  thrpt    5    13,440 ±   3,780  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList           5           true  thrpt    5    14,397 ±   5,174  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          10          false  thrpt    5     8,761 ±   3,489  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          10           true  thrpt    5     8,735 ±   3,663  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          20          false  thrpt    5     5,324 ±   2,596  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                   ArrayList          20           true  thrpt    5     5,194 ±   2,759  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList           1          false  thrpt    5    22,971 ±   3,217  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList           1           true  thrpt    5    22,484 ±   3,902  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList           5          false  thrpt    5    13,323 ±   3,847  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList           5           true  thrpt    5    13,171 ±   2,877  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList          10          false  thrpt    5     9,126 ±   3,925  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList          10           true  thrpt    5     8,644 ±   3,996  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList          20          false  thrpt    5     5,115 ±   2,124  ops/us
ElementPerformanceTest.benchmarkFindChildStreamDirectIdentity                  LinkedList          20           true  thrpt    5     5,131 ±   1,933  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList           1          false  thrpt    5    73,515 ±   3,224  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList           1           true  thrpt    5    71,614 ±  13,624  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList           5          false  thrpt    5    32,084 ±   0,641  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList           5           true  thrpt    5    30,543 ±   1,601  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          10          false  thrpt    5    22,185 ±   0,507  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          10           true  thrpt    5    21,383 ±   0,890  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          20          false  thrpt    5     9,211 ±   0,421  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                         ArrayList          20           true  thrpt    5     8,646 ±   0,081  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList           1          false  thrpt    5    71,451 ±   2,204  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList           1           true  thrpt    5    69,768 ±  13,318  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList           5          false  thrpt    5    30,896 ±   1,636  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList           5           true  thrpt    5    30,616 ±   0,720  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList          10          false  thrpt    5    21,131 ±   0,658  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList          10           true  thrpt    5    21,290 ±   0,552  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList          20          false  thrpt    5     9,016 ±   1,209  ops/us
ElementPerformanceTest.benchmarkFindChildStreamIdentity                        LinkedList          20           true  thrpt    5     8,820 ±   1,062  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList           1          false  thrpt    5   151,586 ±   3,986  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList           1           true  thrpt    5   150,688 ±   1,427  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList           5          false  thrpt    5   148,141 ±  26,464  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList           5           true  thrpt    5   104,481 ±  66,781  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList          10          false  thrpt    5   150,837 ±   3,175  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList          10           true  thrpt    5    79,177 ±  50,146  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList          20          false  thrpt    5   154,036 ±   3,832  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                       ArrayList          20           true  thrpt    5    56,121 ±  44,353  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList           1          false  thrpt    5   130,458 ±  23,240  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList           1           true  thrpt    5   129,622 ±   6,216  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList           5          false  thrpt    5   119,111 ±  11,161  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList           5           true  thrpt    5    89,627 ±  53,418  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList          10          false  thrpt    5   133,007 ±   5,347  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList          10           true  thrpt    5    75,890 ±  39,071  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList          20          false  thrpt    5   132,784 ±   1,293  ops/us
ElementPerformanceTest.benchmarkGetChildAndNullComparison                      LinkedList          20           true  thrpt    5    44,435 ±  21,761  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList           1          false  thrpt    5   168,183 ±   4,521  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList           1           true  thrpt    5   171,855 ±   4,203  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList           5          false  thrpt    5   167,970 ±  19,891  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList           5           true  thrpt    5    89,849 ±  28,360  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList          10          false  thrpt    5   172,189 ±  14,855  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList          10           true  thrpt    5    77,608 ±  51,003  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList          20          false  thrpt    5   174,644 ±  15,135  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                             ArrayList          20           true  thrpt    5    69,575 ±  90,217  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList           1          false  thrpt    5   143,939 ±   1,594  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList           1           true  thrpt    5   141,766 ±   3,514  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList           5          false  thrpt    5   140,080 ±   3,225  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList           5           true  thrpt    5   114,170 ±  63,605  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList          10          false  thrpt    5   142,609 ±   3,737  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList          10           true  thrpt    5    78,885 ±  64,933  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList          20          false  thrpt    5   139,749 ±   5,379  ops/us
ElementPerformanceTest.benchmarkGetChildAndOptional                            LinkedList          20           true  thrpt    5    64,690 ±  51,087  ops/us
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList           1          false   avgt    5     0,033 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList           1           true   avgt    5     0,033 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList           5          false   avgt    5     0,122 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList           5           true   avgt    5     0,124 ±   0,024   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList          10          false   avgt    5     0,236 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList          10           true   avgt    5     0,238 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList          20          false   avgt    5     0,480 ±   0,008   us/op
ElementPerformanceTest.benchmarkFlatMap                                         ArrayList          20           true   avgt    5     0,493 ±   0,020   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList           1          false   avgt    5     0,033 ±   0,010   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList           1           true   avgt    5     0,031 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList           5          false   avgt    5     0,133 ±   0,019   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList           5           true   avgt    5     0,130 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList          10          false   avgt    5     0,256 ±   0,025   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList          10           true   avgt    5     0,237 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList          20          false   avgt    5     0,497 ±   0,052   us/op
ElementPerformanceTest.benchmarkFlatMap                                        LinkedList          20           true   avgt    5     0,503 ±   0,079   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList           1          false   avgt    5     0,028 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList           1           true   avgt    5     0,029 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList           5          false   avgt    5     0,114 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList           5           true   avgt    5     0,117 ±   0,009   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList          10          false   avgt    5     0,224 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList          10           true   avgt    5     0,223 ±   0,022   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList          20          false   avgt    5     0,447 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                   ArrayList          20           true   avgt    5     0,448 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList           1          false   avgt    5     0,029 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList           1           true   avgt    5     0,030 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList           5          false   avgt    5     0,115 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList           5           true   avgt    5     0,115 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList          10          false   avgt    5     0,220 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList          10           true   avgt    5     0,220 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList          20          false   avgt    5     0,435 ±   0,008   us/op
ElementPerformanceTest.benchmarkFlatMapManual                                  LinkedList          20           true   avgt    5     0,448 ±   0,085   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList           1          false   avgt    5     0,186 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList           1           true   avgt    5     0,187 ±   0,033   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList           5          false   avgt    5     0,529 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList           5           true   avgt    5     0,532 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList          10          false   avgt    5     0,933 ±   0,020   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList          10           true   avgt    5     0,972 ±   0,211   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList          20          false   avgt    5     1,859 ±   0,040   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect   ArrayList          20           true   avgt    5     1,852 ±   0,111   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList           1          false   avgt    5     0,186 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList           1           true   avgt    5     0,186 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList           5          false   avgt    5     0,521 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList           5           true   avgt    5     0,541 ±   0,009   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList          10          false   avgt    5     0,935 ±   0,018   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList          10           true   avgt    5     0,959 ±   0,018   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList          20          false   avgt    5     1,903 ±   0,131   us/op
ElementPerformanceTest.benchmarkFlatMapStreamDirectToListStreamChildrenDirect  LinkedList          20           true   avgt    5     1,827 ±   0,017   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList           1          false   avgt    5     0,072 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList           1           true   avgt    5     0,073 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList           5          false   avgt    5     0,191 ±   0,013   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList           5           true   avgt    5     0,191 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList          10          false   avgt    5     0,336 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList          10           true   avgt    5     0,336 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList          20          false   avgt    5     0,680 ±   0,128   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                            ArrayList          20           true   avgt    5     0,688 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList           1          false   avgt    5     0,073 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList           1           true   avgt    5     0,073 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList           5          false   avgt    5     0,193 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList           5           true   avgt    5     0,194 ±   0,032   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList          10          false   avgt    5     0,322 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList          10           true   avgt    5     0,344 ±   0,055   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList          20          false   avgt    5     0,694 ±   0,097   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToArray                           LinkedList          20           true   avgt    5     0,684 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList           1          false   avgt    5     0,117 ±   0,008   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList           1           true   avgt    5     0,118 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList           5          false   avgt    5     0,260 ±   0,026   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList           5           true   avgt    5     0,257 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList          10          false   avgt    5     0,436 ±   0,087   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList          10           true   avgt    5     0,428 ±   0,005   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList          20          false   avgt    5     0,913 ±   0,038   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream            ArrayList          20           true   avgt    5     0,906 ±   0,010   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList           1          false   avgt    5     0,119 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList           1           true   avgt    5     0,119 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList           5          false   avgt    5     0,273 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList           5           true   avgt    5     0,270 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList          10          false   avgt    5     0,451 ±   0,010   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList          10           true   avgt    5     0,442 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList          20          false   avgt    5     0,758 ±   0,018   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListGetChildrenStream           LinkedList          20           true   avgt    5     0,756 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList           1          false   avgt    5     0,072 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList           1           true   avgt    5     0,077 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList           5          false   avgt    5     0,180 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList           5           true   avgt    5     0,196 ±   0,034   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList          10          false   avgt    5     0,330 ±   0,006   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList          10           true   avgt    5     0,341 ±   0,053   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList          20          false   avgt    5     0,663 ±   0,015   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren               ArrayList          20           true   avgt    5     0,688 ±   0,004   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList           1          false   avgt    5     0,075 ±   0,001   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList           1           true   avgt    5     0,081 ±   0,053   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList           5          false   avgt    5     0,179 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList           5           true   avgt    5     0,183 ±   0,035   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList          10          false   avgt    5     0,313 ±   0,003   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList          10           true   avgt    5     0,285 ±   0,007   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList          20          false   avgt    5     0,665 ±   0,002   us/op
ElementPerformanceTest.benchmarkFlatMapStreamToListStreamChildren              LinkedList          20           true   avgt    5     0,693 ±   0,103   us/op
ElementPerformanceTest.benchmarkHashMapGet                                            N/A         N/A            N/A   avgt    5     0,055 ±   0,001   us/op
ElementPerformanceTest.benchmarkHashMapGetIdentity                                    N/A         N/A            N/A   avgt    5     0,468 ±   0,005   us/op
ElementPerformanceTest.benchmarkHashMapGetMemoryOptimized                             N/A         N/A            N/A   avgt    5     0,064 ±   0,001   us/op
ElementPerformanceTest.benchmarkHashMapSet                                            N/A         N/A            N/A   avgt    5     0,086 ±   0,002   us/op
ElementPerformanceTest.benchmarkHashMapSetIdentity                                    N/A         N/A            N/A   avgt    5     0,563 ±   0,006   us/op
ElementPerformanceTest.benchmarkHashMapSetMemoryOptimized                             N/A         N/A            N/A   avgt    5     0,112 ±   0,010   us/op
ElementPerformanceTest.benchmarkIndentityMapGet                                       N/A         N/A            N/A   avgt    5     0,467 ±   0,010   us/op
ElementPerformanceTest.benchmarkIndentityMapGetStatic                                 N/A         N/A            N/A   avgt    5     0,046 ±   0,001   us/op
ElementPerformanceTest.benchmarkIndentityMapSet                                       N/A         N/A            N/A   avgt    5     0,508 ±   0,033   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList           1          false   avgt    5     0,064 ±   0,003   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList           1           true   avgt    5     0,062 ±   0,001   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList           5          false   avgt    5     0,117 ±   0,001   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList           5           true   avgt    5     0,119 ±   0,003   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList          10          false   avgt    5     0,157 ±   0,024   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList          10           true   avgt    5     0,165 ±   0,003   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList          20          false   avgt    5     0,274 ±   0,012   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                 ArrayList          20           true   avgt    5     0,259 ±   0,004   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList           1          false   avgt    5     0,063 ±   0,002   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList           1           true   avgt    5     0,065 ±   0,017   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList           5          false   avgt    5     0,111 ±   0,001   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList           5           true   avgt    5     0,130 ±   0,002   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList          10          false   avgt    5     0,153 ±   0,006   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList          10           true   avgt    5     0,163 ±   0,014   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList          20          false   avgt    5     0,272 ±   0,005   us/op
ElementPerformanceTest.benchmarkMapStreamToList                                LinkedList          20           true   avgt    5     0,273 ±   0,016   us/op

Andrzej Wójcik (Tigase) commented 3 years ago
Additionally, I've gathered String intern table metrics with -XX:StringTableSize=1000003 -XX:+PrintStringTableStatistics:

SymbolTable statistics: Number of buckets : 32768 = 262144 bytes, each 8 Number of entries : 16885 = 270160 bytes, each 16 Number of literals : 16885 = 674664 bytes, avg 39.000 Total footprint : = 1206968 bytes Average bucket size : 0.515 Variance of bucket size : 0.511 Std. dev. of bucket size: 0.715 Maximum bucket size : 6 StringTable statistics: Number of buckets : 1048576 = 8388608 bytes, each 8 Number of entries : 4935 = 78960 bytes, each 16 Number of literals : 4935 = 318208 bytes, avg 64.000 Total footprint : = 8785776 bytes Average bucket size : 0.005 Variance of bucket size : 0.005 Std. dev. of bucket size: 0.069 Maximum bucket size : 2

which confirms that even with small no. of entries and increased bucket size by quite a lot there was no increase of performance of String.intern() method.

From that, it is clear that the comparison of Strings using == instead of equals() is around 35% faster. On the other hand, each call String.intern() takes around 0,6us (which is called on each creation of an element, or setting of xmlns. However, looking at around 325 operations of equals() per each us I wonder if it is ok to call String.intern() for each Element creation. So to overcome the overhead of String.intern() call, each String would need to be compared by == at least 195 times. (assuming that element was in a list with at least 5 elements and in a random position - quite possible).

I would assume that element name is compared 10-20 times on average in a single pass of a packet, but I may be wrong. We mostly check/compare element names of "outer" elements of the stanza (close to the root), while internal elements may newer be compared. We would need to measure this somehow... any ideas?

I'm not sure, if this call to String.intern() actually gives us a performance boost, or rather just "slightly" reduces memory usage due to common element names deduplication.
Andrzej Wójcik (Tigase) commented 3 years ago

I wonder if we shouldn't add -XX:StringTableSize=1000003 -XX:+PrintStringTableStatistics to our deployments to check actual usage of String.intern(), ie. to know how many strings were processed, know used memory and check if StringTableSize is big enough (too small will produce performance issues).
Wojciech Kapcia (Tigase) commented 3 years ago

Interesting observation regarding intern() performance. If indeed it usage is not that beneficial performance wise we could drop it in favour of something else (our own local "intern" table with common names?).

Regarding usage / comparison count - while we won't compare it normally Element could do it while performing lookup. I think if we want to measure it we could add basic statistics to Element (only temporarily for the testing purpose) and run... TTS-NG against server with such custom build of tigase-xmltools.

Adding StringTable statistics to our tigase.im/org deployment could also give us more information.
Andrzej Wójcik (Tigase) commented 3 years ago

I think that knowing how many times elements (and xmlns) were interned would be good to know, but I suppose there is no way to count how many times we are comparing element names as those are instance comparisons so there is no way to measure that, right? We could check executions of getChild(), etc.
Wojciech Kapcia (Tigase) commented 3 years ago

Yes, I was thinking about counting usage of .getChild() and such however you rise a valid point that comparing instances is usually how it's done. Nevertheless, we usually don't compare complete Element but rather it's features (name, xmlns) and in that case we do something along the lines of Element.getName() == "message" (interestingly, in tigase-server I found 70 usage of .getName() with instance comparison and ~110 usages of calls with .getName().equals())
Andrzej Wójcik (Tigase) commented 3 years ago
I've tested a direct comparison of performance between comparing String with String::equals(), instance comparison, instance comparison with call to intern(). Here are results:

ElementPerformanceTest2.benchmarkStringEquals N/A N/A thrpt 5 560,065 ± 60,731 ops/us ElementPerformanceTest2.benchmarkStringIdentity N/A N/A thrpt 5 11,180 ± 0,572 ops/us ElementPerformanceTest2.benchmarkStringIdentityInternalized N/A N/A thrpt 5 3496,496 ± 169,305 ops/us ElementPerformanceTest2.benchmarkStringIdentityInternalizedOverkill N/A N/A thrpt 5 11,197 ± 1,314 ops/us ElementPerformanceTest2.benchmarkStringIdentityOneInternalized N/A N/A thrpt 5 21,179 ± 2,375 ops/us

From that we can see, that comparing String with equals() is 7 times slower than instance comparison. On the other hand, it is 50 times faster than comparing 2 strings with an instance comparison if both of them need to be internalized and 27 times faster if at least one of the strings is static and the second one needs to be internalized.

I will try to make some measurements to estimate how many times we are calling String::intern() and how many times we are comparing internalized versions of String. That should give us an average ratio and will help us decide what would be better for us.

Alternatively, we could use the "mixed" mode, by internalizing in a global map all element names passed as parameters to findChild() methods and actually call String::intern() only on elements which names or XMLNSs are compared. That could reduce no. of calls to String::intern() for large stanzas (with a lot of internal elements which are never compared).

Another approach could be internalizing only "known" element names (such as message, presence, iq, query, body, etc.) but that would require manually writing switch with cases, so we would need to measure times for those as well.
Wojciech Kapcia (Tigase) commented 3 years ago

A point in discussion https://shipilev.net/jvm/anatomy-quarks/10-string-intern/

In almost every project we were taking care of, removing String.intern() from the hotpaths, or optionally replacing it with a handrolled deduplicator, was the very profitable performance optimization. Do not use String.intern() without thinking very hard about it, okay?

Andrzej Wójcik (Tigase) commented 3 years ago

As discussed, I've modified Element to support identity-based and nonidentity-based attributes holder, and here are results:

Benchmark                                                       (attributesClass)   Mode  Cnt     Score     Error   Units
ElementAttributesPerformanceTest.benchmarkAttributeGet                IdentityMap  thrpt    5    22,342 ±   0,420  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                    HashMap  thrpt    5   268,068 ±   2,820  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet               DedupHashMap  thrpt    5   276,086 ±   6,135  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity        IdentityMap  thrpt    5   905,443 ±  50,518  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity            HashMap  thrpt    5   259,715 ±  50,871  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity       DedupHashMap  thrpt    5   280,487 ±   6,015  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                IdentityMap  thrpt    5    22,347 ±   0,182  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                    HashMap  thrpt    5   149,649 ±  30,114  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut               DedupHashMap  thrpt    5   149,762 ±  11,182  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove             IdentityMap  thrpt    5    22,650 ±   0,922  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove                 HashMap  thrpt    5  1030,129 ±  26,221  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove            DedupHashMap  thrpt    5  1004,137 ± 187,431  ops/us

It looks like put is 7 times slower when intern() is used. However, if we compare getStatic vs get then it is 3.4 times faster. But if we "forget" about using "static", then identity-based solution is over 10x slower. Please note, that put is used for each attribute (even those which are never asked for!). Similarly, remove is 45x slower when we are using identity-based solution.

Wojciech Kapcia (Tigase) commented 3 years ago

This seems to confirm that we should drop String.intern() use. Though, I think we should modify test cast and instead of using completely random attribute names also try with limited set of most popular attributes (this should especially show differences in DedupHashMap implementation).
Andrzej Wójcik (Tigase) commented 3 years ago

This change could improve the performance of DedupHashMap as it would use instance comparison instead of checking for equality when retrieving attribute value. I've created the current test this way to measure "worst-case" scenario.
Wojciech Kapcia (Tigase) commented 3 years ago

Yup, that what's I was thinking about. Though, even in worst-cast scenario it's performance is on par.

Andrzej Wójcik (Tigase) commented 3 years ago

Benchmark                                                       (attributeType)  (attributesClass)   Mode  Cnt     Score     Error   Units
ElementAttributesPerformanceTest.benchmarkAttributeGet                  Dynamic        IdentityMap  thrpt    5    23,314 ±   1,812  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                  Dynamic            HashMap  thrpt    5   271,550 ±  11,128  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                  Dynamic       DedupHashMap  thrpt    5   266,812 ±  48,502  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                   Static        IdentityMap  thrpt    5    41,831 ±   3,953  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                   Static            HashMap  thrpt    5   351,414 ±  24,967  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                   Static       DedupHashMap  thrpt    5   602,491 ±  24,917  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity          Dynamic        IdentityMap  thrpt    5   924,518 ±  19,176  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity          Dynamic            HashMap  thrpt    5   278,129 ±   7,343  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity          Dynamic       DedupHashMap  thrpt    5   279,252 ±   2,265  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity           Static        IdentityMap  thrpt    5   927,816 ±  13,294  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity           Static            HashMap  thrpt    5   361,070 ±   4,145  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity           Static       DedupHashMap  thrpt    5   604,214 ±  66,511  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                  Dynamic        IdentityMap  thrpt    5    16,831 ±   0,423  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                  Dynamic            HashMap  thrpt    5   158,600 ±  11,464  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                  Dynamic       DedupHashMap  thrpt    5   155,214 ±   2,629  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                   Static        IdentityMap  thrpt    5    35,416 ±   0,401  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                   Static            HashMap  thrpt    5   130,465 ±   0,986  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                   Static       DedupHashMap  thrpt    5   185,464 ±   3,369  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove               Dynamic        IdentityMap  thrpt    5    22,139 ±   2,529  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove               Dynamic            HashMap  thrpt    5  1058,439 ±  22,766  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove               Dynamic       DedupHashMap  thrpt    5  1057,848 ±  14,043  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove                Static        IdentityMap  thrpt    5    40,887 ±   0,807  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove                Static            HashMap  thrpt    5  1037,860 ± 168,149  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove                Static       DedupHashMap  thrpt    5  1059,319 ±   2,549  ops/us

Here are results including case when "known" attribute name was used ("id") - "Static".

Andrzej Wójcik (Tigase) commented 3 years ago

I've added one more test.

Attributes holder class named previously DedupHashMap was doing deduplication using "static" switch-case is now named DedupStaticHashMap. The new version, using HashMap for deduplication (allowing us to modify the list of deduplicated attribute names in runtime) is now named DedupHashHashMap.

Here are results:

Benchmark                                                       (attributeType)   (attributesClass)   Mode  Cnt     Score     Error   Units
ElementAttributesPerformanceTest.benchmarkAttributeGet                  Dynamic         IdentityMap  thrpt    5    15,457 ±   1,387  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                  Dynamic             HashMap  thrpt    5   271,397 ±   4,562  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                  Dynamic  DedupStaticHashMap  thrpt    5   275,557 ±   4,993  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                  Dynamic    DedupHashHashMap  thrpt    5   287,798 ±   5,461  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                   Static         IdentityMap  thrpt    5    40,866 ±   2,709  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                   Static             HashMap  thrpt    5   352,945 ±   4,543  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                   Static  DedupStaticHashMap  thrpt    5   605,102 ±   5,953  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGet                   Static    DedupHashHashMap  thrpt    5   598,744 ±  23,502  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity          Dynamic         IdentityMap  thrpt    5   922,564 ±  10,873  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity          Dynamic             HashMap  thrpt    5   284,818 ±   5,797  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity          Dynamic  DedupStaticHashMap  thrpt    5   266,303 ±   8,952  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity          Dynamic    DedupHashHashMap  thrpt    5   275,538 ±   6,721  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity           Static         IdentityMap  thrpt    5   913,211 ±  38,650  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity           Static             HashMap  thrpt    5   354,059 ±   4,145  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity           Static  DedupStaticHashMap  thrpt    5   603,541 ±   7,332  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeGetIdentity           Static    DedupHashHashMap  thrpt    5   595,235 ±  29,891  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                  Dynamic         IdentityMap  thrpt    5    17,004 ±   0,538  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                  Dynamic             HashMap  thrpt    5   156,010 ±  11,896  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                  Dynamic  DedupStaticHashMap  thrpt    5   154,720 ±   6,253  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                  Dynamic    DedupHashHashMap  thrpt    5   142,995 ±   7,413  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                   Static         IdentityMap  thrpt    5    32,695 ±   3,046  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                   Static             HashMap  thrpt    5   128,166 ±   8,409  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                   Static  DedupStaticHashMap  thrpt    5   142,257 ±   2,664  ops/us
ElementAttributesPerformanceTest.benchmarkAttributePut                   Static    DedupHashHashMap  thrpt    5   162,970 ±   0,597  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove               Dynamic         IdentityMap  thrpt    5    22,533 ±   2,082  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove               Dynamic             HashMap  thrpt    5  1019,817 ± 175,476  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove               Dynamic  DedupStaticHashMap  thrpt    5  1049,247 ±  11,586  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove               Dynamic    DedupHashHashMap  thrpt    5  1042,285 ±  78,440  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove                Static         IdentityMap  thrpt    5    39,838 ±   0,699  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove                Static             HashMap  thrpt    5  1054,993 ±  12,879  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove                Static  DedupStaticHashMap  thrpt    5  1018,364 ±  45,795  ops/us
ElementAttributesPerformanceTest.benchmarkAttributeRemove                Static    DedupHashHashMap  thrpt    5  1049,882 ±  25,535  ops/us

Wojciech Kapcia (Tigase) commented 3 years ago

Hmm, it doesn't seem all that faster in case of dynamic attributes and seems slower in static ones (which I think would be majority of cases)?

Andrzej Wójcik (Tigase) commented 3 years ago

Here are some more tests to measure if usage of deduplication of element name would speed up element lookup.

Benchmark                                                                   (clazz)  (maxChild)  (willDedup)   Mode  Cnt     Score    Error   Units
ElementPerformanceTest.benchmarkFindChildDedupEquality                    ArrayList           1        false  thrpt    5   324,337 ±  5,819  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                    ArrayList           1         true  thrpt    5   276,836 ±  5,511  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                    ArrayList           5        false  thrpt    5   150,787 ± 17,099  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                    ArrayList           5         true  thrpt    5   166,530 ±  5,483  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                    ArrayList          10        false  thrpt    5   102,848 ±  1,524  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                    ArrayList          10         true  thrpt    5   125,910 ± 12,166  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                   LinkedList           1        false  thrpt    5   323,914 ±  3,066  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                   LinkedList           1         true  thrpt    5   262,859 ±  6,712  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                   LinkedList           5        false  thrpt    5    64,743 ± 15,393  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                   LinkedList           5         true  thrpt    5    66,343 ± 17,814  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                   LinkedList          10        false  thrpt    5    40,147 ±  0,712  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEquality                   LinkedList          10         true  thrpt    5    39,947 ±  7,333  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized   ArrayList           1        false  thrpt    5   430,021 ±  6,353  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized   ArrayList           1         true  thrpt    5   424,701 ± 25,306  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized   ArrayList           5        false  thrpt    5   141,278 ± 31,980  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized   ArrayList           5         true  thrpt    5   161,832 ±  1,610  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized   ArrayList          10        false  thrpt    5    98,278 ±  2,041  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized   ArrayList          10         true  thrpt    5   117,119 ±  1,040  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized  LinkedList           1        false  thrpt    5   382,869 ± 50,581  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized  LinkedList           1         true  thrpt    5   385,171 ±  6,204  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized  LinkedList           5        false  thrpt    5    63,648 ± 13,582  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized  LinkedList           5         true  thrpt    5    61,753 ±  0,838  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized  LinkedList          10        false  thrpt    5    38,915 ±  0,639  ops/us
ElementPerformanceTest.benchmarkFindChildDedupEqualityPossibleOptimized  LinkedList          10         true  thrpt    5    41,068 ±  0,739  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                         ArrayList           1          N/A  thrpt    5   430,022 ± 47,343  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                         ArrayList           5          N/A  thrpt    5   174,254 ± 15,473  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                         ArrayList          10          N/A  thrpt    5   115,625 ±  1,765  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                        LinkedList           1          N/A  thrpt    5   387,752 ±  8,182  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                        LinkedList           5          N/A  thrpt    5   163,705 ±  2,547  ops/us
ElementPerformanceTest.benchmarkFindChildEquality                        LinkedList          10          N/A  thrpt    5   104,628 ±  2,562  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                         ArrayList           1          N/A  thrpt    5  1136,214 ± 24,443  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                         ArrayList           5          N/A  thrpt    5   335,138 ± 11,112  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                         ArrayList          10          N/A  thrpt    5   217,671 ±  4,249  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                        LinkedList           1          N/A  thrpt    5  1038,479 ±  4,364  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                        LinkedList           5          N/A  thrpt    5   333,967 ±  4,592  ops/us
ElementPerformanceTest.benchmarkFindChildIdentity                        LinkedList          10          N/A  thrpt    5   184,304 ±  3,872  ops/us

To make it better visible I've created a chart

From here, it looks like usage of plain String.equals() is the second best solution for performance of string comparison after using String.intern(). Deduplicated version of comparison (even optimized one) is slower than direct usage of String.equals(). (Even if comparison using identity is used for lookup after using deduplication).

My guess is that this is caused by HashMap backing our deduplication mechanism, which forces Java to calculate hashCode() for each String passed as name. This method iterates over each char of the String (unless hashCode value was already cached for this instance of a String). Due to that, comparison has to iterate over each char:

for dedup:
- if string has deduplicated value - once (to find deduplicated version)
- if string has not deduplication value - twice (to look for deduplicated value and then during call to equals())
for optimized deduplication:
- if less than 5 children are in the collection - once (equality comparision)
- if string has deduplicated value - once (to find deduplicated version)
- if string has not deduplication value - twice (to look for deduplicated value and then during call to equals())
for equality - just once (call to equals())

To sum this up, each method which has deduplication will slow down as it forces Java to calculate hashCode which is expensive (at the same level as call to equality). The only improvements could be seen if instance of String would be reused and deduplication will happen, but that is rather "unlikely" and would force use to deduplicate all element names in ConcurrentHashMap (which would lock and slow down processing and names could not be released from there).

Andrzej Wójcik (Tigase) commented 3 years ago
To add some more performance data, here are the results of performance testing of String.intern() vs "deduplication".

Benchmark Mode Cnt Score Error Units StringPerformanceTest.stringDeduplication thrpt 5 724,057 ± 31,020 ops/us StringPerformanceTest.stringIntern thrpt 5 26,079 ± 0,125 ops/us StringPerformanceTest.stringNope thrpt 5 4076,612 ± 271,274 ops/us

"stringNope" does nothing - just accesses string instance.

From that, it is really visible that calls to String.intern() while improving memory usage may be slowing us down (as each attribute name, element name, xmlns) is "internalized" (for each element - not only those used for comparison).

It looks like, String.intern() is 27.8x times slower than "deduplication" which is only 5.6x times slower than using an original instance of a String.

In my opinion, we could keep "deduplication" just to reduce memory usage, but as deduplication will not improve performance (see the previous comment), we will sacrifice some of the performance gains to get lower memory usage.

Looking at those numbers, I wonder if we should not make "deduplication" optional. However, adding a simply "if" to decide if deduplication should be done or not, would slow down accessing string instance by ~35% (from my quick test), compared to using string without any conditions.
Andrzej Wójcik (Tigase) commented 3 years ago

Looking at the results above, it looks like a single call to String.intern() requires as much time as 16 calls of String.equal() between 2 strings... not to mention that most likely we have a lot of attributes interned now and we are not using them in the comparison anywere.
Andrzej Wójcik (Tigase) commented 3 years ago

@kobit Looking at the results above, we are considering dropping the usage of String.intern() and switching from comparing element names and attribute names using String.equals() instead of == of instances (at this is no longer an option with dropped usage of String.intern(). From our tests, it looks like we could improve the performance of parsing XML, and element processing should not be harmed as a single call to String.intern() is expensive (even when it is later not used for comparison).

Also, deduplication of strings (using HashMap with a configurable list of deduplicated names) may be used only for deduplication of strings during the creation of an element or setting an attribute as usage of it during comparisons and querying would slow us down.

Could you look at the comments since https://projects.tigase.net/issue/issue #14#focus=Comments-4-33480.0-0 up to this comment and let us know what you think about those changes (those changes would be part of Tigase XMPP Server 9.0.0)
Artur Hefczyc commented 3 years ago

Of course, if after tests the equals() gives better overall performance than intern() with ==, then yes, we should rework this code. Go ahead and do this.

I am only worried about all these places where == is used which could be missed after removing intern(). This may result in weird errors. Please be careful.

To be honest I have never tested a real performance of intern(). I just assumed a single intern() call would be about the same as a single equals().

For most small installations, performance is not a problem but memory usage is. Tigase from the start requires a lot of RAM, too much in my opinion. But I do not think this is the place to optimize for memory usage. Especially that even with zero traffic, Tigase still requires a lot of RAM.

So, to sum it up. Yes, please go ahead and remove usage of intern() + == and replace it with equals().
Wojciech Kapcia (Tigase) commented 3 years ago

I am only worried about all these places where == is used which could be missed after removing intern(). This may result in weird errors. Please be careful.

I think that TTS-NG/TTS will help here and would catch if we miss something.

For most small installations, performance is not a problem but memory usage is. Tigase from the start requires a lot of RAM, too much in my opinion. But I do not think this is the place to optimize for memory usage. Especially that even with zero traffic, Tigase still requires a lot of RAM. So, to sum it up. Yes, please go ahead and remove usage of intern() + == and replace it with equals().

Unfortunately JVM itself is responsible as well (especially on small installations) but it's changing (jlink for example, which with removal of dependencies/groovy) should help quite a lot with memory usage here.
Andrzej Wójcik (Tigase) commented 3 years ago

I am only worried about all these places where == is used which could be missed after removing intern(). This may result in weird errors. Please be careful.

As @wojtek mentioned, TTS-NG tests should help us find issues. Maybe we could try the lint plugin with some rules to proactively check that? On the other hand, I've suggested that we could add boolean matches(String name, String xmlns) method to Element API, so that we could call this instead of calling getName() == "" which could ensure that checks would be done properly.

For most small installations, performance is not a problem but memory usage is. Tigase from the start requires a lot of RAM, too much in my opinion. But I do not think this is the place to optimize for memory usage. Especially that even with zero traffic, Tigase still requires a lot of RAM.

If I recall, we already removed usage of DirectByteBuffer pools to reduce memory usage. I think that we could make a few more optimizations related to the memory, but in my opinion, the issue is also caused by the very high number of threads (each thread requires memory, and no. of threads depends on no. of CPUs) and task queues used by Tigase (each component has a task/packet queues (in & out), this no. depends on no. of components and no. of CPU cores).
Artur Hefczyc commented 3 years ago

As for the threads number. Yes, the number might be scaled down somewhat. Specifically for small installations.

This part was extensively tested by me years back and the number of threads was adjusted to provide maximum throughput for large/huge installations. In short, I got much higher overall throughput with higher number of threads. Of course this translates into higher resource usage. But in such cases it is justified.

For small installations we deal with a few messages per minute sometimes per second, rather than thousands per second. So, we do not need a very high throughput and we could scale threads down without impact on the end-user.
Wojciech Kapcia (Tigase) commented 3 years ago

Regarding threads and it's impact on memory usage - project loom (Lightweight Virtual Threads) is starting to take form so in the future the whole concept how to handle this could be though in rather different light.
Artur Hefczyc commented 3 years ago

Literally different light-weight. ;-)
Andrzej Wójcik (Tigase) commented 3 years ago

I've applied discussed changes to the Tigase XML Tools package. @wojtek please review if all discussed changes were applied.
Wojciech Kapcia (Tigase) commented 3 years ago

I think it includes everything we talked about. I made a couple of minor comments. One thing that stood out was lack of defaultXMLNS - I think it should be included in the end.

Other than that, we definitely need unit tests included as well.
Andrzej Wójcik (Tigase) commented 3 years ago

I think we should drop defXMLNS as it is just a boilerplate code not actually needed which actually makes people wonder what it is and why XMLNS is sometimes set and sometimes (when constructed manually) it is not set. We should remove this confusion.
Andrzej Wójcik (Tigase) commented 3 years ago

I've answered commends also with a few suggestions.
Wojciech Kapcia (Tigase) commented 3 years ago

I'm not sure what's the conclusion on default XMLNS is - as said before - I do see certain sense and logic in it (and it does seem follow XML specs). The only remaining changes are simplifications of constructors.

Andrzej Wójcik (Tigase) commented 3 years ago

As we discussed removing "constructors" as we have now builder-like API for adding attributes and children, I've run some test to see if there will be any performance penalty. Here are results

Benchmark                                                                          Mode  Cnt   Score   Error   Units
ElementPerformanceTest.benchmarkElementWithAttributesCreationBuilder              thrpt    5  21,436 ± 0,417  ops/us
ElementPerformanceTest.benchmarkElementWithAttributesCreationDynamicArray         thrpt    5  17,622 ± 4,194  ops/us
ElementPerformanceTest.benchmarkElementWithAttributesCreationDynamicMap           thrpt    5  11,659 ± 0,961  ops/us
ElementPerformanceTest.benchmarkElementWithAttributesCreationDynamicMapOfEntries  thrpt    5   9,028 ± 0,405  ops/us
ElementPerformanceTest.benchmarkElementWithAttributesCreationMixedStatic          thrpt    5  19,578 ± 0,294  ops/us
ElementPerformanceTest.benchmarkElementWithAttributesCreationStaticArray          thrpt    5  19,272 ± 0,097  ops/us
ElementPerformanceTest.benchmarkElementWithAttributesCreationStaticMap            thrpt    5  19,966 ± 0,611  ops/us

I would expect "Builder" to be slower than "StaticArray" and I would assume it is slower or at least has similar performance as I've had "builder" slower during a few runs when I was running less tests at once.

For sure, we should not use methods accepting Map for settings attributes unless we already have an instance of a map.

Taking those results into account, I would say we should have following constructors:

Element(String name)
Element(String name, String xmlns)
Element(Element src)

I do not see a point in having other constructors as other cases can easily be taken care of by using builder pattern. The only issue may be during "transition" as we had a following constructor

Element(String name, String cdata)

which could be easily mistaken with the second one which I've proposed. Due to that I could lean on using just 2 constructors:

Element(String name)
Element(Element src)

The second one is responsible for creation of a shallow copy.

Andrzej Wójcik (Tigase) commented 3 years ago

I can restore this defXMLNS, but I'm still not convinced that we need this and as I've pointed out, current implementation has flaws and I do not know how to fix them without dropping support for defXMLNS.
Wojciech Kapcia (Tigase) commented 3 years ago

+1 for 2 constructors (with name and Element) -- seems enough and with "withers" there is no need for others.

As for default XMLNS - I vaguely remember that there were weird issues with missing xmlns (especially for jabber:client or jabber:server and I think this default could be useful here). Though, during 9.0 development we could drop it and test if anything breaks.
Andrzej Wójcik (Tigase) commented 3 years ago

I would say, that let's drop defXMLNS and if needed we will restore it as we are just at the begining of changes for 9.0.0 and we have time.

I'll modify the constructors list tomorrow.
Andrzej Wójcik (Tigase) commented 3 years ago

@wojtek Could you review the changes once again? I've dropped unnecessary constructors and getChild() method (which should be replaced by now with findChild()). Additionally, I've added JavaDoc for Element methods.

The last remaining thing will be adding test cases.
Andrzej Wójcik (Tigase) commented 3 years ago

I've added test cases and implemented a builder pattern for ElementMatcher to make it easier to use.
Wojciech Kapcia (Tigase) commented 3 years ago
It looks good IMHO.

A couple of further comments:

do we need addAttribute*() methods when they simply call setAttribute*(...) internally?

regarding setAttributes(@NotNull String[] names, @NotNull String[] values) - we already have one with Map and you can use Map.of()

I think we can utilize Objects.requireNonNull() - it does the same thing but would simplify a bit the code

do we need comments in toString*() methods regarding nulls? with FIXME? It seems it's a non-issue now?

Should we leave FIXME comments in setXMLNS()? seems a good approach with removeAttribute/setAttribute

Shouldn't we use Deque<Element> in DomBuilderHandler?

In Path in // FIXME: I'm not sure about this.. maybe assert would be a better option.. - If we annote parameter with NotNull then I think it would be sensible to use Objects.requireNonNull(element); instead of returning empty collection.

Pushed some of the changes to the branch.
Andrzej Wójcik (Tigase) commented 3 years ago

do we need addAttribute*() methods when they simply call setAttribute*(...) internally?

I would prefer to use addAttribute() and removeAttribute() as we do not accept null as a value, so set will not remove an attribute if the value would be set to null. (as I would expect).

regarding setAttributes(@NotNull String[] names, @NotNull String[] values) - we already have one with Map and you can use Map.of()

See performance comparison of using Map.of() -> ElementPerformanceTest.benchmarkElementWithAttributesCreationDynamicMap vs benchmarkElementWithAttributesCreationDynamicArray. We should move to use (add/set)Attribute() only.

I think we can utilize Objects.requireNonNull() - it does the same thing but would simplify a bit the code

+1

do we need comments in toString*() methods regarding nulls? with FIXME? It seems it's a non-issue now?

The checks for null were there, but I think there is no need for it as other methods guard against adding null. Due to that I've commented them out and left FIXME to verify that.

Should we leave FIXME comments in setXMLNS()? seems a good approach with removeAttribute/setAttribute

I've left them there to make it easy to find and verify that this is what we want. Generally, most of FIXME comments are to review the code there and then remove them.

Shouldn't we use Deque in DomBuilderHandler?

Maybe.. I would even consider using ArrayDeque.. would need to test performance.

In Path in // FIXME: I'm not sure about this.. maybe assert would be a better option.. - If we annote parameter with NotNull then I think it would be sensible to use Objects.requireNonNull(element); instead of returning empty collection.

That was my assumption as well, that is why FIXME was there to ensure this will be discussed and reviewed.
Wojciech Kapcia (Tigase) commented 3 years ago

I would prefer to use addAttribute() and removeAttribute() as we do not accept null as a value, so set will not remove an attribute if the value would be set to null. (as I would expect).

But semantics for add and set in case of null are somewhat (or can be) somewhat similar - we can simply reject null's here. I'm pondering the case where you would call add twice with the same attribute name - you wouldn't add second one but simply set value of (whether exists or not).

See performance comparison of using Map.of() -> ElementPerformanceTest.benchmarkElementWithAttributesCreationDynamicMap vs benchmarkElementWithAttributesCreationDynamicArray. We should move to use (add/set)Attribute() only.

Good point, though in case of benchmarkElementWithAttributesCreationDynamicArray there is ~4k error margin (somewhat weird)

The checks for null were there, but I think there is no need for it as other methods guard against adding null. Due to that I've commented them out and left FIXME to verify that.

Understood, +1 for removing null checks if not needed.

Maybe.. I would even consider using ArrayDeque.. would need to test performance.

+1

That was my assumption as well, that is why FIXME was there to ensure this will be discussed and reviewed.

+1
Andrzej Wójcik (Tigase) commented 3 years ago
I've applied discussed changes and:

renamed children collection to nodes to reflect its actual content

moved filtering of Elements from nodes collection to a single place private void forEachChild(Consumer<Element>) method and using it in getChildren() and findChildren() methods
Wojciech Kapcia (Tigase) added to iteration "tigase-server-9.0.0" 11 months ago
Wojciech Kapcia (Tigase) added to iteration "tigase-server-8.5.0" 11 months ago
Wojciech Kapcia (Tigase) commented 11 months ago

It seems this was never merged (but got status "closed" and got sidetracked) after migration…

For now I think we should mark APIs that will be removed as deprecated.

(tentatively worth investigating: org.apache.commons.collections4.list.TreeList - could be faster but probably would use more memory than ArrayList, though probably still less than LinkedList; there is also brownies-collections with performance comparison and for small collections )
Wojciech Kapcia (Tigase) changed fields 11 months ago
Name Previous Value Current Value
Assignee

andrzej.wojcik

andrzej.wojcik, wojtek
Wojciech Kapcia (Tigase) changed state to 'In Progress' 3 months ago
Previous Value Current Value
Closed

In Progress
Wojciech Kapcia (Tigase) commented 3 months ago

Possibly relevant: https://inside.java/2025/05/01/strings-just-got-faster/

(moving it to "in progress" as it's still to be done)
Login to comment

Name	Previous Value	Current Value
Assignee	andrzej.wojcik	andrzej.wojcik, wojtek

Previous Value	Current Value
Closed	In Progress

Type	Task
Priority	Normal
Assignee	Andrzej Wójcik (Tigase)
Version	tigase-server-9.0.0

Iterations

tigase-server-8.5.0 Open
tigase-server-9.0.0 Open

Issue Votes (0)

Watchers (2)

Reference

tigase/_server/tigase-xmltools#14

NOTES: