Subir Jolly opened 10 years ago
Could you check Tigase Statistics from the session manager component (either via XMPP or JMX)? Check:
sess-man/Average processing time on last 100 runs [ms]:--0
sess-man/Processor: session-open:--, Queue: 0, AvTime: 1, Runs: 467704, Lost: 0
sess-man/Processor: http://jabber.org/protocol/stats:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: http://jabber.org/protocol/commands:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: jabber:iq:version:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: jabber:iq:roster:--, Queue: 0, AvTime: 4, Runs: 434876, Lost: 0
sess-man/Processor: starttls:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: presence:--, Queue: 0, AvTime: 2, Runs: 2232561, Lost: 0
sess-man/Processor: default-handler:--, Queue: 0, AvTime: 1, Runs: 30721573, Lost: 0
sess-man/Processor: urn:ietf:params:xml:ns:xmpp-sasl:--, Queue: 10949, AvTime: 16775, Runs: 575521, Lost: 0
sess-man/Processor: urn:xmpp:ping:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: urn:ietf:params:xml:ns:xmpp-session:--, Queue: 0, AvTime: 1, Runs: 407769, Lost: 0
sess-man/Processor: session-close:--, Queue: 0, AvTime: 1, Runs: 460723, Lost: 0
sess-man/Processor: amp:--, Queue: 0, AvTime: 1, Runs: 2272762, Lost: 0
sess-man/Processor: disco:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: zlib:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: jabber:iq:privacy:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: urn:ietf:params:xml:ns:xmpp-bind:--, Queue: 0, AvTime: 1, Runs: 408281, Lost: 0
sess-man/Processor: message-carbons:--, Queue: 0, AvTime: 1, Runs: 280698, Lost: 0
sess-man/Processor: jabber:iq:private:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
sess-man/Processor: jabber:iq:auth:--, Queue: 0, AvTime: 0, Runs: 0, Lost: 0
total/Total queues wait:--10949

"sess-man/Processor: urn:ietf:params:xml:ns:xmpp-sasl" seems to be the one with queues. I've attached the entire dump of JMX statistics.
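To pick the backlogged processor out of a long dump like this one, a small parser can help. The sketch below is in Java (Tigase's own language) and assumes only the `sess-man/Processor: <name>:--, Queue: N, AvTime: T, Runs: R, Lost: L` format shown above; the class and method names (`ProcessorStats`, `parse`, `backlogged`) are hypothetical and not part of Tigase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcessorStats {
    // One parsed "sess-man/Processor: ..." entry (hypothetical helper type).
    public static final class Entry {
        public final String name;
        public final long queue;
        public final long avTimeMs;
        public final long runs;
        public final long lost;

        Entry(String name, long queue, long avTimeMs, long runs, long lost) {
            this.name = name;
            this.queue = queue;
            this.avTimeMs = avTimeMs;
            this.runs = runs;
            this.lost = lost;
        }
    }

    // Assumes the dump format shown above; processor names may contain ':' (e.g. URNs),
    // so the name group backtracks until it is followed by ":--, Queue".
    private static final Pattern LINE = Pattern.compile(
        "sess-man/Processor: (\\S+):--, Queue: (\\d+), AvTime: (\\d+), Runs: (\\d+), Lost: (\\d+)");

    // Extracts every processor entry, whether the dump is one line or many.
    public static List<Entry> parse(String dump) {
        List<Entry> out = new ArrayList<>();
        Matcher m = LINE.matcher(dump);
        while (m.find()) {
            out.add(new Entry(m.group(1),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)), Long.parseLong(m.group(5))));
        }
        return out;
    }

    // Processors with a non-empty queue are the first place to look for a bottleneck.
    public static List<Entry> backlogged(List<Entry> entries) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            if (e.queue > 0) {
                out.add(e);
            }
        }
        return out;
    }
}
```

Run against the dump above, only the `urn:ietf:params:xml:ns:xmpp-sasl` entry would be flagged, with an AvTime several orders of magnitude above every other processor.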
Subir Jolly wrote:
Sam Wright wrote:
The SASL plugin uses AuthRepository to perform authentication. What changes have you made exactly? Can you verify that your implementation of AuthRepository/processing doesn't cause a slowdown in authentication that impacts the SASL plugin?
Not sure if it matters, but in our implementation of AuthRepository we use the "synchronized" keyword for auth. Do you think that making auth synchronized could cause this issue?
This is very likely. Unresponsive means the server got stuck on something, usually threads waiting on a lock for some resource. This may be database access. It often happens if you use synchronized.
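To illustrate why a `synchronized` authentication method can stall the SASL processor: a single monitor lets only one login run at a time, so when the backend is slow every other login waits in line behind it. The Java sketch below is a hypothetical model, not Tigase's AuthRepository API; `Authenticator`, `SynchronizedAuth`, `ConcurrentAuth` and the 50 ms simulated backend latency are all assumptions made for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AuthContention {
    // Hypothetical stand-in for a custom credential check against an external user system.
    interface Authenticator {
        boolean authenticate(String user, String password) throws InterruptedException;
    }

    // Simulates a backend call that takes latencyMs to answer.
    static boolean slowBackendCheck(String user, String password, long latencyMs) throws InterruptedException {
        Thread.sleep(latencyMs);
        return password.equals("secret-" + user);
    }

    // One shared monitor: only one login can be in progress at a time,
    // and the lock stays held while the thread sleeps/waits on the backend.
    static final class SynchronizedAuth implements Authenticator {
        private final long latencyMs;
        SynchronizedAuth(long latencyMs) { this.latencyMs = latencyMs; }
        @Override
        public synchronized boolean authenticate(String user, String password) throws InterruptedException {
            return slowBackendCheck(user, password, latencyMs);
        }
    }

    // No shared lock: logins proceed in parallel (the backend client must be thread-safe).
    static final class ConcurrentAuth implements Authenticator {
        private final long latencyMs;
        ConcurrentAuth(long latencyMs) { this.latencyMs = latencyMs; }
        @Override
        public boolean authenticate(String user, String password) throws InterruptedException {
            return slowBackendCheck(user, password, latencyMs);
        }
    }

    // Runs `logins` authentications on `threads` worker threads; returns wall-clock time in ms.
    public static long timeLogins(Authenticator auth, int logins, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (int i = 0; i < logins; i++) {
                final String user = "user" + i;
                tasks.add(() -> auth.authenticate(user, "secret-" + user));
            }
            long start = System.nanoTime();
            for (Future<Boolean> f : pool.invokeAll(tasks)) {
                if (!f.get()) {
                    throw new IllegalStateException("unexpected auth failure");
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("synchronized: " + timeLogins(new SynchronizedAuth(50), 8, 8) + " ms");
        System.out.println("concurrent:   " + timeLogins(new ConcurrentAuth(50), 8, 8) + " ms");
    }
}
```

With 8 logins on 8 threads, the synchronized version needs roughly 8 × 50 ms, while the unsynchronized one needs roughly a single backend round trip. Dropping the lock is only safe if whatever it protected (for example, a single shared DB connection) is replaced by something thread-safe, such as a connection pool.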
Thanks, I am now removing the synchronized login. I would also like to know the exact meaning of these stats. Would it be possible to give me a brief description of each of these?
sess-man/Authentication timouts
sess-man/Closed user connections
sess-man/Open user connections
sess-man/Total user connections
sess-man/Open user sessions
sess-man/Total user sessions
Thanks,
Subir
Subir Jolly wrote:
sess-man/Authentication timouts: the number of user logins that failed due to an authentication timeout, that is, the user opened a connection but did not complete authentication within the required time.
sess-man/Closed user connections: the total number of user connections closed. During normal operation of the server users connect and disconnect all the time, so this number shows how many connections have been closed since server startup.
sess-man/Open user connections: the number of currently open user connections. Note that a user may have more than one connection open, for example one from a mobile device, another from a desktop client and another from a web client. Each of these is counted here.
sess-man/Total user connections: the total number of user connections opened since server startup. Some of these connections may already be closed.
sess-man/Open user sessions: the number of currently open user sessions. If a user is connected/logged in from multiple devices, this still counts as a single user session. In other words, this is the number of distinct users online at the moment.
sess-man/Total user sessions: the total number of user sessions opened since server startup, some of which may since have been closed. If a user has many connections active at the same time, they still count as a single session. However, if a user opens multiple connections, closes all of them and then opens connections again, this counts as 2 separate sessions.
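The difference between connections and sessions can be modelled with a per-user connection count: a session starts when a user's first connection opens and ends when their last connection closes. The Java sketch below is a hypothetical model of the definitions above (the class `SessionCounters` and its methods are illustrative names), not how Tigase's session manager is actually implemented; it leaves out authentication timeouts.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionCounters {
    // Open connections per user; a user is "in session" while their count is > 0.
    private final Map<String, Integer> openConnectionsPerUser = new HashMap<>();
    private long openConnections;
    private long totalConnections;
    private long closedConnections;
    private long totalSessions;

    public void connect(String user) {
        openConnections++;
        totalConnections++;
        int before = openConnectionsPerUser.getOrDefault(user, 0);
        if (before == 0) {
            totalSessions++; // first live connection for this user starts a new session
        }
        openConnectionsPerUser.put(user, before + 1);
    }

    public void disconnect(String user) {
        int before = openConnectionsPerUser.getOrDefault(user, 0);
        if (before == 0) {
            throw new IllegalStateException("no open connection for " + user);
        }
        openConnections--;
        closedConnections++;
        if (before == 1) {
            openConnectionsPerUser.remove(user); // last connection closed: the session ends
        } else {
            openConnectionsPerUser.put(user, before - 1);
        }
    }

    public long openUserConnections()   { return openConnections; }
    public long totalUserConnections()  { return totalConnections; }
    public long closedUserConnections() { return closedConnections; }
    public long openUserSessions()      { return openConnectionsPerUser.size(); }
    public long totalUserSessions()     { return totalSessions; }
}
```

For example, if alice connects from a phone and a desktop and bob connects once, that gives 3 open connections but 2 open sessions. If alice then closes both connections and reconnects, she has had 2 sessions in total, so the total-sessions counter reaches 3.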
Thank you so much for all this information. It cleared up a lot of things for us. I have made the changes to auth. After that we tested it in our dev environment and it seemed to hold up fine. We have deployed it to live as well and are now waiting to see how it behaves. Thanks again for all your help. I really appreciate your support and efforts.
Subir
Type: Bug
Priority: Major
Assignee:
RedmineID: 2624
At random times, Tigase has started becoming unresponsive and all requests start queuing up. We have a clustered setup of 4 servers. When Tigase becomes unresponsive, the Session Manager queues start rising, and to bring the queues down we have to take our Tigase servers out of rotation. The other option is to restart the Tigase servers. We have just over 10K users on this system. We have overridden the Auth component to do custom authentication against our own system.
We have noticed that when Tigase becomes unresponsive and the Session Manager queues start growing, there aren't any DB reads. We are not sure whether this is expected.
What do you suggest we do in this scenario?
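Growing queues with no DB reads may indicate worker threads parked on a lock rather than doing I/O. One way to check this while the queues are growing is to look for threads in the BLOCKED state. From outside the server process, the JDK's `jstack <pid>` prints this in a thread dump; the Java sketch below does the same from inside a JVM using the standard `java.lang.management` API. The class name `BlockedThreads` is hypothetical, and the check only sees threads of the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreads {
    // Lists every live thread in this JVM that is BLOCKED waiting to enter a monitor,
    // together with the monitor it wants and the thread currently holding it.
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        // false, false: we only need thread state and the lock being waited on,
        // not the full list of monitors/synchronizers each thread holds.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                out.add(info.getThreadName() + " -> blocked on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

If many session-manager worker threads show up as blocked on the same monitor, and that monitor belongs to the custom auth code, that points at the lock rather than at the database.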
Server configuration:
Linux CentOS
80 GB RAM
24-core processor
Packages:
java-1.7.0-oracle-1.7.0.55
mysql-connector-java-5.1.12
tigase-extras-1.1.0
tigase-issue #5.2.1
tigase-xmltools-3.4.4
tigase-utils-3.4.2
1-20-15-stats.txt