SEVERE: Uncaught thread: "cluster-nodes" exception (#464)

Artur Hefczyc opened 1 decade ago

Due Date
2015-05-12

bq. We saw a few instances of the following severe exception:

2015-04-30 07:00:55.738 [cluster-nodes] ThreadExceptionHandler.uncaughtException() SEVERE: Uncaught thread: "cluster-nodes" exception java.lang.NullPointerException
at tigase.cluster.repo.ClConSQLRepository.storeItem(ClConSQLRepository.java:204)
at tigase.cluster.repo.ClConConfigRepository.reload(ClConConfigRepository.java:124)
at tigase.cluster.repo.ClConSQLRepository.reload(ClConSQLRepository.java:235)
at tigase.db.comp.ConfigRepository$1.run(ConfigRepository.java:78)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)

bq. Can you please advise on the root cause of this NullPointerException? We also observed that the cluster-nodes thread is terminated and the cluster_nodes table is no longer updated by the node.

Andrzej, please fix this as soon as possible. This looks like a serious issue which may affect the reliability of the whole cluster. Once this is done, please assign it back to me, as we need to release a new bug fix version with this fix.

Related
- Version 7.0.2 release (#466) Closed

Activities

Andrzej Wójcik (Tigase) commented 1 decade ago
This issue is caused by the data_repo field being @null@. This can happen only if there was a problem creating the repository, probably because a connection to the database server could not be established during server instance startup.

I think there is no way to recover from this properly. The best approach would be to find the root cause of this issue, for example by searching the logs for entries containing:

and then fixing that root cause.
Artur Hefczyc commented 1 decade ago

Thank you for looking at this. In that case it is not such a big problem. However, if Tigase cannot handle or recover from this properly, and this happens at startup time, then the best action would be to log a clear error message and stop the server.
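The suggestion above amounts to failing fast: if the repository cannot be created at startup, report it once with a clear message and stop, instead of leaving data_repo null for a background task to trip over later. Below is a minimal sketch under that assumption; FailFastStartupSketch, the Supplier-based repository factory and the exit codes are hypothetical and not part of Tigase's API.

```java
import java.util.Objects;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: never keep a null repository around after startup.
public class FailFastStartupSketch {
    private static final Logger log = Logger.getLogger(FailFastStartupSketch.class.getName());

    private final Object dataRepo; // stand-in for the data_repo field

    // Either the repository is created, or construction fails with a descriptive exception.
    FailFastStartupSketch(Supplier<Object> repoFactory) {
        Object repo;
        try {
            repo = repoFactory.get();
        } catch (RuntimeException e) {
            throw new IllegalStateException(
                    "Could not create cluster repository (database connection failed?)", e);
        }
        this.dataRepo = Objects.requireNonNull(repo, "Repository factory returned null");
    }

    // Hypothetical startup hook: returns an exit code instead of calling System.exit,
    // so the decision logic can be tested on its own.
    static int start(Supplier<Object> repoFactory) {
        try {
            new FailFastStartupSketch(repoFactory);
            return 0;
        } catch (RuntimeException e) {
            log.log(Level.SEVERE, "Cannot initialize cluster repository, stopping server", e);
            return 1;
        }
    }

    public static void main(String[] args) {
        int code = start(() -> { throw new RuntimeException("Connection refused"); });
        System.exit(code); // non-zero status signals a failed startup to the service manager
    }
}
```

The point of the design is that the failure surfaces once, at startup, with a SEVERE log entry, rather than later as an unexplained NullPointerException on a timer thread.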
Unknown commented 1 decade ago
This issue happened during server runtime, not during server instance startup. The data_repo field was not null, but the statement (stmt) was, so synchronized(stmt) threw the NullPointerException.
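For context, Java throws a NullPointerException when a synchronized statement is given a null reference, so the lock itself fails before any database work starts. The sketch below reproduces this in isolation; StoreSketch, its stmt field and both store methods are hypothetical stand-ins rather than the actual ClConSQLRepository code, and the guarded variant assumes that skipping one store and logging a warning is acceptable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a repository whose prepared statement may be null.
public class StoreSketch {
    private static final Logger log = Logger.getLogger(StoreSketch.class.getName());

    // Stand-in for a JDBC PreparedStatement that was never (re)initialized.
    private Object stmt = null;

    // Unguarded version: synchronized on a null reference throws NullPointerException.
    public void storeItemUnguarded(String item) {
        synchronized (stmt) {
            // the real code would bind parameters and execute the statement here
        }
    }

    // Guarded version: log the problem and report failure instead of throwing.
    public boolean storeItemGuarded(String item) {
        Object s = stmt; // read the field once so the check and the lock use the same reference
        if (s == null) {
            log.log(Level.WARNING, "Statement not initialized, skipping store of {0}", item);
            return false;
        }
        synchronized (s) {
            // execute the statement here
        }
        return true;
    }

    public static void main(String[] args) {
        StoreSketch repo = new StoreSketch();
        try {
            repo.storeItemUnguarded("node1");
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        System.out.println("guarded stored: " + repo.storeItemGuarded("node1"));
    }
}
```

Copying the field into a local variable before checking it means the null check and the lock operate on the same reference, even if another thread reassigns stmt in between.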

I would suggest catching exceptions in ConfigRepository.setAutoloadTimer() to prevent the timer thread from being terminated when an unexpected exception happens:

autoLoadTimer.schedule(new TimerTask() {
    @Override
    public void run() {
        try {
            reload();
        } catch (Exception e) {
            // print some log messages here
        }
    }
}, interval, interval);
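Why a single NullPointerException stopped all further updates follows from how java.util.Timer works: all tasks share one background thread, and an exception escaping a TimerTask's run() method terminates that thread, after which no scheduled task runs again. The self-contained sketch below shows the difference the suggested try/catch makes; TimerSurvivalSketch, reload() and countRuns() are hypothetical names, and reload() merely simulates the failure.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TimerSurvivalSketch {
    private static final Logger log = Logger.getLogger(TimerSurvivalSketch.class.getName());

    // Hypothetical reload() that fails every time, simulating the NPE from the stack trace.
    static void reload() {
        throw new NullPointerException("simulated failure in reload()");
    }

    // Schedules reload() periodically, as setAutoloadTimer() does, and counts the runs.
    static int countRuns(boolean catchExceptions, long interval, long observeMillis)
            throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        Timer autoLoadTimer = new Timer("autoload-sketch", true); // daemon timer thread
        autoLoadTimer.schedule(new TimerTask() {
            @Override
            public void run() {
                runs.incrementAndGet();
                if (catchExceptions) {
                    try {
                        reload();
                    } catch (Exception e) {
                        // Log and return normally so the timer thread survives.
                        log.log(Level.WARNING, "reload() failed, will retry at next interval", e);
                    }
                } else {
                    reload(); // exception escapes run(): the timer thread terminates
                }
            }
        }, interval, interval);
        Thread.sleep(observeMillis);
        autoLoadTimer.cancel(); // harmless if the thread has already died
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs without try/catch: " + countRuns(false, 100, 550));
        System.out.println("runs with try/catch:    " + countRuns(true, 100, 550));
    }
}
```

Run as-is, the first count should stay at 1 because the timer thread dies after the first failed run, while the second should grow with the observation window. Switching to a ScheduledExecutorService would not remove the need for the try/catch: a periodic task that throws there has its subsequent executions suppressed.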
Andrzej Wójcik (Tigase) commented 1 decade ago

I checked the repository but could not find the exact version in which line 204 of ClConSQLRepository.java contains synchronized(stmt). Still, assuming this might be the cause, and since a try {} catch {} in the run() method of the TimerTask will also protect the timer from being killed by other unexpected exceptions, I added the suggested try {} catch {}.

This change was added to the master and stable branches.
Artur Hefczyc commented 1 decade ago

Thank you for providing additional details and helping us with fixing the problem.

Type	Bug
Priority	Major
Assignee	Artur Hefczyc
RedmineID	3058
Version	tigase-server-7.0.2
Spent time	0

Reference

tigase/_server/server-core#464