Tigase Memory Leak: SSLSessionImpl Object is not getting Garbage collected (#209)
Closed
Gaurav Arora opened 1 decade ago
Due Date
2015-01-30

Hi Artur,

Recently we were investigating the memory usage of Tigase and found that SSLSessionImpl objects are not cleared immediately after a client disconnects. Please find the detailed analysis below.

Regards,

Gaurav

Analysis

Problem Description

The suspected memory leak is caused by objects of the class com.sun.net.ssl.internal.ssl.SSLSessionImpl not being garbage collected.

When a printer establishes an XMPP connection to Tigase, one instance of the tigase.io.TLSWrapper class is created to handle SSL. During the instantiation of each TLSWrapper object, one instance of com.sun.net.ssl.internal.ssl.SSLSessionImpl is created. When a connection disconnects, these objects are released and should become candidates for GC.

Analysis of the heap dumps with the MAT tool showed that the number of SSLSessionImpl objects is far higher than the number of TLSWrapper objects. In the dump taken on May 17th, there are 64k TLSWrapper objects but 338k SSLSessionImpl objects. In the dump taken on May 18th, the corresponding numbers are 100k and 1500k. The histogram below shows the number of objects in the dump taken on May 17th.

After analysis we found that Tigase does not clear the SSL session when stopping the TLSWrapper. Clearing or invalidating the session removes it from the cache maintained internally by the JVM.
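One way to observe the JVM-side cache described above is to count the session IDs held by the server session context. This is a minimal diagnostic sketch, not part of Tigase: the class and method names are hypothetical, and it probes the JVM's default SSLContext only for illustration. In a real server the SSLContext that Tigase actually uses for its connections would have to be passed in. Only standard javax.net.ssl calls are used.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

// Hypothetical diagnostic helper; not part of Tigase.
final class SessionCacheProbe {

    // Returns the JVM's default SSLContext, wrapping the checked exception.
    // Assumption: used here for illustration only; a server would pass its own context.
    static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // Counts the session IDs currently held in the server-side session cache.
    static int countCachedServerSessions(SSLContext context) {
        SSLSessionContext sessions = context.getServerSessionContext();
        if (sessions == null) {
            return 0; // the API allows this to be unavailable in some environments
        }
        return Collections.list(sessions.getIds()).size();
    }

    public static void main(String[] args) {
        SSLContext context = defaultContext();
        SSLSessionContext sessions = context.getServerSessionContext();
        System.out.println("cached server sessions: " + countCachedServerSessions(context));
        if (sessions != null) {
            // Timeout is reported in seconds; the cache size is a maximum entry count.
            System.out.println("session timeout (s): " + sessions.getSessionTimeout());
            System.out.println("cache size limit: " + sessions.getSessionCacheSize());
        }
    }
}
```

Logging this count periodically next to the number of open connections would show whether cached sessions keep growing after clients disconnect.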

Impact

These SSL sessions were not cleared. They accumulated in the heap and were only collected once memory usage rose above 90% of the heap size, which in turn degraded Tigase's performance.

Solution Description

Overview of Management of SSLSessionImpl Object (which is leaking)

An SSLSession is stored in a cache with a timeout, and the timeout is checked only when the session is accessed. If the timeout has already elapsed, the session is invalidated. The default timeout period is 24 hours. Refer to the JVM code in CacheEntry.java (sun.misc.CacheEntry).

Reason for Leak:

Because the number of connects and disconnects on the server is very high, most sessions are never accessed again after their timeout, so they are never removed from the cache table. Moreover, each cache entry is a SoftReference, so it is not cleared until the server is on the verge of throwing an OutOfMemoryError.

As a result, the number of SSLSession instances on the server keeps growing, leading to the memory leak.
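The mechanism described above can be modelled with a small cache that checks expiry only on access. This is an illustrative sketch of the behaviour as reported here, not the JDK's actual cache implementation: the class name, the explicit `nowMillis` parameters, and the `remove` method (standing in for session invalidation) are all assumptions made for the example.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a cache whose timeout is enforced only when an entry is read.
final class LazyExpiryCache<K, V> {

    private static final class Entry<V> {
        final SoftReference<V> ref;   // value is only softly reachable from the cache
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.ref = new SoftReference<>(value);
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> table = new HashMap<>();
    private final long lifetimeMillis;

    LazyExpiryCache(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    synchronized void put(K key, V value, long nowMillis) {
        table.put(key, new Entry<>(value, nowMillis + lifetimeMillis));
    }

    // Expiry is checked here and nowhere else: an entry that is never
    // read again stays in the table even after its lifetime has passed.
    synchronized V get(K key, long nowMillis) {
        Entry<V> entry = table.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            table.remove(key);
            return null;
        }
        V value = entry.ref.get();
        if (value == null) {
            table.remove(key); // referent was reclaimed under memory pressure
        }
        return value;
    }

    // Explicit removal, analogous to invalidating a session on disconnect.
    synchronized void remove(K key) {
        table.remove(key);
    }

    synchronized int size() {
        return table.size();
    }
}
```

With a lifetime of 1000 ms and three entries inserted at time 0, the table still holds three entries at time 5000 if nothing reads them. Only a `get` on a specific key, or an explicit `remove`, shrinks it, which mirrors why short-lived connections leave stale sessions behind.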

Solution Approach:

When ConnectionManager receives the disconnect for an XMPP connection, we can invalidate the session, which internally removes it from the cache table in the JVM.

In Tigase the session is held by the TLSWrapper, whose tlsEngine maintains the session.

When stopping the TLSWrapper and its tlsEngine, we get the session from the tlsEngine and invalidate it, which removes it from the table. The following line was added to the TLSWrapper's stop function:

 tlsEngine.getSession().invalidate();
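For context, the line above could sit in a stop routine roughly like the following. This is a hedged sketch, not the actual tigase.io.TLSWrapper code: the class `TlsWrapperSketch`, its fields, the `createDefault` factory, and the `closeOutbound()` call are assumptions for illustration. Only the `tlsEngine.getSession().invalidate()` call comes from the proposed fix.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLSession;

// Hypothetical stand-in for a TLS wrapper; the real TLSWrapper differs.
final class TlsWrapperSketch {

    private final SSLEngine tlsEngine;
    private boolean stopped;

    TlsWrapperSketch(SSLContext context) {
        tlsEngine = context.createSSLEngine();
        tlsEngine.setUseClientMode(false); // server side of the connection
    }

    // Convenience factory using the JVM default context (assumption for the demo).
    static TlsWrapperSketch createDefault() {
        try {
            return new TlsWrapperSketch(SSLContext.getDefault());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // Called when the connection is torn down.
    void stop() {
        SSLSession session = tlsEngine.getSession();
        if (session != null) { // defensive check only
            // Invalidate so the session can no longer be resumed and is
            // dropped from the session cache, as proposed in this report.
            session.invalidate();
        }
        tlsEngine.closeOutbound(); // signal that no more outbound data will be sent
        stopped = true;
    }

    boolean isStopped() {
        return stopped;
    }

    public static void main(String[] args) {
        TlsWrapperSketch wrapper = createDefault();
        wrapper.stop();
        System.out.println("stopped: " + wrapper.isStopped());
    }
}
```

One trade-off worth noting: an invalidated session cannot be resumed, so a client that reconnects shortly afterwards has to perform a full handshake.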

Summary of Fix:

SSLSessionImpl objects were accumulating in memory because we were not invalidating the session and removing it from the cache, which resulted in the memory leak.

Now, while ConnectionManager is cleaning up the connection and related objects, the added code invalidates the session in TLSWrapper.

Artur Hefczyc commented 1 decade ago

To the Tigase team:

This code area went through some changes recently. And I think almost everybody touched it somehow. Therefore, please read through this and let Bartosz know if you have any ideas.

Bartosz:

You are our expert at this stuff, therefore you are the one to resolve this. Please have a look at the code, the problem, and the proposed resolution. Make modifications as necessary. It looks serious enough to also backport the fix to the 5.1.x line. Please apply the fix to both branches: 5.2.x and 5.1.x.

Gaurav Arora:

Thank you for the report and the very detailed and extensive diagnosis and resolution. This is a huge help to us. Please let us know if you need a patch for the version of Tigase you use. However, I doubt you need it, as you apparently solved the problem yourself.

Artur Hefczyc commented 10 years ago

Bartosz, what is the status of this? Please plan for completing the task and set a due date.

Type
Patch
Priority
Major
Assignee
RedmineID
1395
Reference
tigase/_server/server-core#209