Projects tigase _server server-core Issues #675
Database Connectivity (#675)
Open
Eric Dziewa opened 9 years ago

It seems that Tigase never closes the socket when the db connection is cut. Then it doesn't start enough connections to recover.

Start Tigase

java    15559 tigase   79u  IPv6 143063619      0t0  TCP hw1.xmpp-test.net:15502->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   80u  IPv6 143063623      0t0  TCP hw1.xmpp-test.net:15503->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   81u  IPv6 143063626      0t0  TCP hw1.xmpp-test.net:15504->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   82u  IPv6 143063630      0t0  TCP hw1.xmpp-test.net:15505->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   83u  IPv6 143057240      0t0  TCP hw1.xmpp-test.net:15506->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   84u  IPv6 143063637      0t0  TCP hw1.xmpp-test.net:15507->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   85u  IPv6 143063662      0t0  TCP hw1.xmpp-test.net:15508->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   86u  IPv6 143062977      0t0  TCP hw1.xmpp-test.net:15509->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   87u  IPv6 143063668      0t0  TCP hw1.xmpp-test.net:15510->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   88u  IPv6 143063672      0t0  TCP hw1.xmpp-test.net:15511->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase  237u  IPv6 143061738      0t0  TCP hw1.xmpp-test.net:15500->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase  238u  IPv6 143063614      0t0  TCP hw1.xmpp-test.net:15501->hw2.xmpp-test.net:3306 (ESTABLISHED)

Restart MySQL

java    15559 tigase   65u  IPv6 143063012      0t0  TCP hw1.xmpp-test.net:15518->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   79u  IPv6 143063619      0t0  TCP hw1.xmpp-test.net:15502->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   80u  IPv6 143063623      0t0  TCP hw1.xmpp-test.net:15503->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   81u  IPv6 143063626      0t0  TCP hw1.xmpp-test.net:15504->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   82u  IPv6 143063630      0t0  TCP hw1.xmpp-test.net:15505->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   83u  IPv6 143057240      0t0  TCP hw1.xmpp-test.net:15506->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   84u  IPv6 143063637      0t0  TCP hw1.xmpp-test.net:15507->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   85u  IPv6 143063662      0t0  TCP hw1.xmpp-test.net:15508->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   86u  IPv6 143062977      0t0  TCP hw1.xmpp-test.net:15509->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   87u  IPv6 143063668      0t0  TCP hw1.xmpp-test.net:15510->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   88u  IPv6 143063672      0t0  TCP hw1.xmpp-test.net:15511->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase  238u  IPv6 143063614      0t0  TCP hw1.xmpp-test.net:15501->hw2.xmpp-test.net:3306 (CLOSE_WAIT)

Start Load

java    15559 tigase   65u  IPv6 143063012      0t0  TCP hw1.xmpp-test.net:15518->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   79u  IPv6 143063619      0t0  TCP hw1.xmpp-test.net:15502->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   80u  IPv6 143063623      0t0  TCP hw1.xmpp-test.net:15503->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   81u  IPv6 143063626      0t0  TCP hw1.xmpp-test.net:15504->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   82u  IPv6 143063630      0t0  TCP hw1.xmpp-test.net:15505->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   83u  IPv6 143057240      0t0  TCP hw1.xmpp-test.net:15506->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   84u  IPv6 143063637      0t0  TCP hw1.xmpp-test.net:15507->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   85u  IPv6 143063662      0t0  TCP hw1.xmpp-test.net:15508->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   86u  IPv6 143062977      0t0  TCP hw1.xmpp-test.net:15509->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   87u  IPv6 143063668      0t0  TCP hw1.xmpp-test.net:15510->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   88u  IPv6 143063672      0t0  TCP hw1.xmpp-test.net:15511->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase  936u  IPv6 143067172      0t0  TCP hw1.xmpp-test.net:15526->hw2.xmpp-test.net:3306 (ESTABLISHED)

CLOSE_WAIT means your program is still running, and hasn't closed the socket (and the kernel is waiting for it to do so).

There are now 10,000 connections on port 5222. I try to login to tigase but can't Connection reset by peer, Server closed the connection.

java    15559 tigase   65u  IPv6 143063012      0t0  TCP hw1.xmpp-test.net:15518->hw2.xmpp-test.net:3306 (ESTABLISHED)
java    15559 tigase   79u  IPv6 143063619      0t0  TCP hw1.xmpp-test.net:15502->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   80u  IPv6 143063623      0t0  TCP hw1.xmpp-test.net:15503->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   81u  IPv6 143063626      0t0  TCP hw1.xmpp-test.net:15504->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   82u  IPv6 143063630      0t0  TCP hw1.xmpp-test.net:15505->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   83u  IPv6 143057240      0t0  TCP hw1.xmpp-test.net:15506->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   84u  IPv6 143063637      0t0  TCP hw1.xmpp-test.net:15507->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   85u  IPv6 143063662      0t0  TCP hw1.xmpp-test.net:15508->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   86u  IPv6 143062977      0t0  TCP hw1.xmpp-test.net:15509->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   87u  IPv6 143063668      0t0  TCP hw1.xmpp-test.net:15510->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase   88u  IPv6 143063672      0t0  TCP hw1.xmpp-test.net:15511->hw2.xmpp-test.net:3306 (CLOSE_WAIT)
java    15559 tigase  936u  IPv6 143067172      0t0  TCP hw1.xmpp-test.net:15526->hw2.xmpp-test.net:3306 (ESTABLISHED)

The CLOSE_WAIT state never goes away.

Eric Dziewa commented 9 years ago

Just tried the same thing using SQLServer. The CLOSE_WAIT did not happen but Tigase only started 2 connections to the DB again.

Start Tigase

java    28110 tigase   80u  IPv6 143328377      0t0  TCP hw1.xmpp-test.net:10428->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   81u  IPv6 143325497      0t0  TCP hw1.xmpp-test.net:10429->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   82u  IPv6 143325498      0t0  TCP hw1.xmpp-test.net:10430->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   83u  IPv6 143325499      0t0  TCP hw1.xmpp-test.net:10431->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   84u  IPv6 143325500      0t0  TCP hw1.xmpp-test.net:10432->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   85u  IPv6 143325501      0t0  TCP hw1.xmpp-test.net:10433->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   86u  IPv6 143325502      0t0  TCP hw1.xmpp-test.net:10434->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   87u  IPv6 143325503      0t0  TCP hw1.xmpp-test.net:10435->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   88u  IPv6 143325504      0t0  TCP hw1.xmpp-test.net:10436->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase   89u  IPv6 143325505      0t0  TCP hw1.xmpp-test.net:10437->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase  238u  IPv6 143319354      0t0  TCP hw1.xmpp-test.net:10426->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase  240u  IPv6 143325494      0t0  TCP hw1.xmpp-test.net:10427->sqlserverhost:1433 (ESTABLISHED)

Restart SQLServer

java    28110 tigase   66u  IPv6 143325753      0t0  TCP hw1.xmpp-test.net:10450->sqlserverhost:1433 (ESTABLISHED)

Start Load

java    28110 tigase   66u  IPv6 143325753      0t0  TCP hw1.xmpp-test.net:10450->sqlserverhost:1433 (ESTABLISHED)
java    28110 tigase  240u  IPv6 143326422      0t0  TCP hw1.xmpp-test.net:10459->sqlserverhost:1433 (ESTABLISHED)
Andrzej Wójcik (Tigase) commented 9 years ago

Tigase reconnects only connections which tries to use, so if 10 connections are broken, but needs only 2 for current operation then only this 2 will reconnect at this point. Next will reconnect when needed.

Eric Dziewa commented 9 years ago

I logged in 75,000 users to the server within 5 minutes. I don't think 2 connections are enough for that.

Here is using Postgres.

Start Tigase

java    31891 tigase   65u  IPv6 44714091      0t0  TCP localhost:50363->localhost:5432 (ESTABLISHED)
java    31891 tigase   66u  IPv6 44714093      0t0  TCP localhost:50364->localhost:5432 (ESTABLISHED)
java    31891 tigase   67u  IPv6 44712401      0t0  TCP localhost:50365->localhost:5432 (ESTABLISHED)
java    31891 tigase   68u  IPv6 44712405      0t0  TCP localhost:50366->localhost:5432 (ESTABLISHED)
java    31891 tigase   69u  IPv6 44712407      0t0  TCP localhost:50367->localhost:5432 (ESTABLISHED)
java    31891 tigase   70u  IPv6 44712409      0t0  TCP localhost:50368->localhost:5432 (ESTABLISHED)
java    31891 tigase   71u  IPv6 44712411      0t0  TCP localhost:50369->localhost:5432 (ESTABLISHED)
java    31891 tigase   72u  IPv6 44714108      0t0  TCP localhost:50370->localhost:5432 (ESTABLISHED)
java    31891 tigase   73u  IPv6 44714110      0t0  TCP localhost:50371->localhost:5432 (ESTABLISHED)
java    31891 tigase   74u  IPv6 44712419      0t0  TCP localhost:50372->localhost:5432 (ESTABLISHED)
java    31891 tigase   83u  IPv6 44714116      0t0  TCP localhost:50373->localhost:5432 (ESTABLISHED)
java    31891 tigase   84u  IPv6 44714118      0t0  TCP localhost:50374->localhost:5432 (ESTABLISHED)
java    31891 tigase   85u  IPv6 44714120      0t0  TCP localhost:50375->localhost:5432 (ESTABLISHED)
java    31891 tigase   86u  IPv6 44712432      0t0  TCP localhost:50376->localhost:5432 (ESTABLISHED)
java    31891 tigase   87u  IPv6 44714123      0t0  TCP localhost:50377->localhost:5432 (ESTABLISHED)
java    31891 tigase   88u  IPv6 44712437      0t0  TCP localhost:50378->localhost:5432 (ESTABLISHED)
java    31891 tigase   89u  IPv6 44714126      0t0  TCP localhost:50379->localhost:5432 (ESTABLISHED)
java    31891 tigase   90u  IPv6 44714128      0t0  TCP localhost:50380->localhost:5432 (ESTABLISHED)
java    31891 tigase   91u  IPv6 44714130      0t0  TCP localhost:50381->localhost:5432 (ESTABLISHED)
java    31891 tigase   92u  IPv6 44714132      0t0  TCP localhost:50382->localhost:5432 (ESTABLISHED)

Restart Postgres

java    31891 tigase   66u  IPv6 44714093      0t0  TCP localhost:50364->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   67u  IPv6 44712401      0t0  TCP localhost:50365->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   68u  IPv6 44712405      0t0  TCP localhost:50366->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   69u  IPv6 44712407      0t0  TCP localhost:50367->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   70u  IPv6 44712409      0t0  TCP localhost:50368->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   71u  IPv6 44712411      0t0  TCP localhost:50369->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   72u  IPv6 44714108      0t0  TCP localhost:50370->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   74u  IPv6 44712419      0t0  TCP localhost:50372->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   83u  IPv6 44714116      0t0  TCP localhost:50373->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   84u  IPv6 44714118      0t0  TCP localhost:50374->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   85u  IPv6 44714120      0t0  TCP localhost:50375->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   86u  IPv6 44712432      0t0  TCP localhost:50376->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   87u  IPv6 44714123      0t0  TCP localhost:50377->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   88u  IPv6 44712437      0t0  TCP localhost:50378->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   89u  IPv6 44714126      0t0  TCP localhost:50379->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   90u  IPv6 44714128      0t0  TCP localhost:50380->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   91u  IPv6 44714130      0t0  TCP localhost:50381->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   92u  IPv6 44714132      0t0  TCP localhost:50382->localhost:5432 (CLOSE_WAIT)
java    31891 tigase   99u  IPv6 44712838      0t0  TCP localhost:50402->localhost:5432 (ESTABLISHED)
java    31891 tigase  100u  IPv6 44714506      0t0  TCP localhost:50403->localhost:5432 (ESTABLISHED)

Start load and nothing changes.

Eric Dziewa commented 9 years ago

Correction 90,000 users attempted. I think about 20,000 were able to login.

Artur Hefczyc commented 9 years ago

The CLOSE_WAIT issue might be related to the DB driver we use. So it can be different for different DBs and even different between driver versions. I guess, for some drivers the CLOSE_WAIT may be cleared out only after GC run.

In any case we have to investigate it.

If the Tigase reopens only 2 DB connections, even under a high load it may mean that DB code does not make a proper use of concurrency. Or maybe DB response is fast enough and the load is not high enough.

Please investigate further.

issue 1 of 1
Type
Bug
Priority
Blocker
Assignee
RedmineID
4145
Issue Votes (0)
Watchers (0)
Reference
tigase/_server/server-core#675
Please wait...
Page is in error, reload to recover