SEVERE: Uncaught thread: "cluster-nodes" exception (#464)
Artur Hefczyc opened 10 years ago
Due Date
2015-05-12

bq. We saw a few instances of the following severe exception:

2015-04-30 07:00:55.738 [cluster-nodes] ThreadExceptionHandler.uncaughtException() SEVERE: Uncaught thread: "cluster-nodes" exception java.lang.NullPointerException
at tigase.cluster.repo.ClConSQLRepository.storeItem(ClConSQLRepository.java:204)
at tigase.cluster.repo.ClConConfigRepository.reload(ClConConfigRepository.java:124)
at tigase.cluster.repo.ClConSQLRepository.reload(ClConSQLRepository.java:235)
at tigase.db.comp.ConfigRepository$1.run(ConfigRepository.java:78)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)

bq. Can you please advise on the root cause of this null pointer exception? We also observed that the cluster-nodes thread is terminated and the cluster_nodes table is no longer updated by the node.

Andrzej, please fix this as soon as possible. This looks like a serious issue which may impact the reliability of the whole cluster. Once this is done, please reassign it back to me, as we need to release a new bug fix version with this.

Andrzej Wójcik (Tigase) commented 10 years ago

This issue is caused by the data_repo field being @null@. This can happen only if creation of the repository failed, probably because of a problem establishing the connection to the database server during server instance startup.

I think there is no way to recover from this properly, and the best approach would be to find the root cause of this issue, i.e. by searching the logs for entries containing:

and then solving the root cause of this issue.

Artur Hefczyc commented 10 years ago

Thank you for looking at this. In that case it is not such a big problem. However, I think that if there is no way Tigase can handle or recover from this properly, and if this happens at startup time, then the best action would be to print a clear log message and stop the server.
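
The "log and stop" behaviour suggested above could look roughly like the following. This is only a minimal sketch: the class @StartupCheckDemo@, the method @requireRepository@ and the message text are hypothetical and are not taken from the Tigase code base.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration of failing fast at startup; not actual Tigase code.
class StartupCheckDemo {

    private static final Logger log = Logger.getLogger(StartupCheckDemo.class.getName());

    /** Logs a SEVERE message and throws if the repository was not created. */
    static void requireRepository(Object dataRepo) {
        if (dataRepo == null) {
            log.log(Level.SEVERE,
                    "Cluster repository could not be initialized, stopping the server");
            throw new IllegalStateException("data_repo is null");
        }
    }

    public static void main(String[] args) {
        Object dataRepo = null; // simulates a failed database connection at startup
        try {
            requireRepository(dataRepo);
        } catch (IllegalStateException e) {
            // Stop with a non-zero exit code instead of running half-initialized.
            System.exit(1);
        }
    }
}
```

The check throws instead of calling @System.exit()@ directly, so the caller decides how the process is shut down.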

Unknown commented 10 years ago

This issue happened during server runtime, not during server instance startup. The data_repo was not null, but the statement was null, and the code synchronized(stmt) threw the NullPointerException.

I would suggest catching exceptions in the method ConfigRepository.setAutoloadTimer() to prevent the timer thread from being terminated when an unexpected exception happens.
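
For readers unfamiliar with this failure mode: in Java, entering a synchronized block on a @null@ reference throws a NullPointerException before the body runs. A minimal sketch, in which @stmt@ is a plain Object standing in for the statement and @NullMonitorDemo@ is a hypothetical class name:

```java
// Minimal sketch: synchronized on a null reference throws NullPointerException.
// "stmt" is a plain Object here, standing in for the prepared statement.
class NullMonitorDemo {

    /** Tries to lock on the given reference and reports what happened. */
    static String lockOn(Object stmt) {
        try {
            synchronized (stmt) {
                return "locked";
            }
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(lockOn(new Object())); // prints "locked"
        System.out.println(lockOn(null));         // prints "NullPointerException"
    }
}
```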

autoLoadTimer.schedule(new TimerTask() {
	@Override
	public void run() {
		try {
			reload();
		} catch (Exception e) {
			// print some log messages here.
		}
	}
}, interval, interval);
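
Why the guard matters: a java.util.Timer runs all of its tasks on a single thread, and an uncaught exception thrown from a task ends that thread, so the task never runs again. The following self-contained sketch compares a guarded and an unguarded task. The class @TimerGuardDemo@, the simulated @reload()@ and the timing values are assumptions made for illustration only, not Tigase code.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: counts how often a task that always fails gets to run.
class TimerGuardDemo {

    /** Stand-in for the real reload(); always fails. */
    static void reload() {
        throw new NullPointerException("simulated failure in reload()");
    }

    /** Schedules the failing task every 20 ms and returns the number of runs after 300 ms. */
    static int runFailingTask(boolean guarded) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        Timer timer = new Timer("cluster-nodes-demo", true); // daemon timer thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                runs.incrementAndGet();
                if (guarded) {
                    try {
                        reload();
                    } catch (Exception e) {
                        // A real implementation would log the exception here.
                    }
                } else {
                    reload(); // uncaught: terminates the timer thread
                }
            }
        }, 0, 20);
        Thread.sleep(300);
        timer.cancel();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded runs: " + runFailingTask(false));
        System.out.println("guarded runs:   " + runFailingTask(true));
    }
}
```

The unguarded variant runs exactly once and prints the stack trace of the uncaught exception to stderr, while the guarded variant keeps firing until the timer is cancelled.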

Andrzej Wójcik (Tigase) commented 10 years ago

I checked the repository for a version in which line 204 of ClConSQLRepository.java contains synchronized(stmt), but I could not find the exact version. Still, assuming this might be the issue, and since a try {} catch {} in the run() method of the TimerTask will help protect the timer from being killed by other unexpected exceptions, I added the suggested try {} catch {}.

This change was added to the master and stable branches.

Artur Hefczyc commented 10 years ago

Thank you for providing additional details and helping us with fixing the problem.

Type
Bug
Priority
Major
Assignee
RedmineID
3058
Version
tigase-server-7.0.2
Spent time
3h
Issue Votes (0)
Watchers (0)
Reference
tigase/_server/server-core#464