SEVERE: Uncaught thread: "cluster-nodes" exception (#464)
Artur Hefczyc opened 1 decade ago
Due Date: 2015-05-12

bq. We saw a few instances of the following severe exception:

2015-04-30 07:00:55.738 [cluster-nodes] ThreadExceptionHandler.uncaughtException() SEVERE: Uncaught thread: "cluster-nodes" exception java.lang.NullPointerException
at tigase.cluster.repo.ClConSQLRepository.storeItem(ClConSQLRepository.java:204)
at tigase.cluster.repo.ClConConfigRepository.reload(ClConConfigRepository.java:124)
at tigase.cluster.repo.ClConSQLRepository.reload(ClConSQLRepository.java:235)
at tigase.db.comp.ConfigRepository$1.run(ConfigRepository.java:78)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)

bq. Can you please advise on the root cause of this null pointer exception? We also observed that the cluster-nodes thread is terminated and the cluster_nodes table is no longer updated by the node.

Andrzej, please fix this as soon as possible. This looks like a serious issue that may affect the reliability of the whole cluster. Once this is done, please reassign it back to me, as we need to release a new bug-fix version with this change.

  • Andrzej Wójcik (Tigase) commented 1 decade ago

    This issue is caused by the data_repo field being @null@. This can happen only if creation of the repository failed, probably because of a problem establishing the connection to the database server during server instance startup.

    I think there is no way to recover from this properly. The best approach would be to find the root cause of the issue, i.e. by searching the logs for entries containing:

    and then resolving that root cause.
    
  • Artur Hefczyc commented 1 decade ago

    Thank you for looking at this. In that case it is not such a big problem. However, I think that if there is no way for Tigase to handle or recover from this properly, and if this happens at startup time, then the best action would be to print a proper log message and stop the server.
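
A minimal sketch of the fail-fast behaviour suggested above, assuming a hypothetical startup check. The class `StartupCheckDemo`, the method `requireRepository`, and the `resourceUri` parameter are illustrative names and are not part of the Tigase code base; only the `java.util.logging` calls are real APIs.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class StartupCheckDemo {

    private static final Logger log = Logger.getLogger(StartupCheckDemo.class.getName());

    /**
     * Hypothetical startup check: dataRepo stands in for the data_repo field.
     * Logs a SEVERE message and throws if the repository was never created.
     */
    static void requireRepository(Object dataRepo, String resourceUri) {
        if (dataRepo == null) {
            String msg = "Cluster repository could not be initialized for "
                    + resourceUri + "; refusing to start";
            log.log(Level.SEVERE, msg);
            throw new IllegalStateException(msg);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulate a failed database connection at startup.
            requireRepository(null, "jdbc:example");
        } catch (IllegalStateException e) {
            // A real server would stop here, for example with System.exit(1).
            System.err.println("Stopping: " + e.getMessage());
        }
    }
}
```

Throwing from the check, instead of calling `System.exit` directly, leaves the decision about how to shut down to the caller; that is a design choice of this sketch, not something stated in the thread.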

  • Unknown commented 1 decade ago

    This issue happened during server runtime, not during server instance startup. The data_repo was not null, but the statement was null, and the code synchronized(stmt) threw the NullPointerException.

    I would suggest catching exceptions in the method ConfigRepository.setAutoloadTimer() to prevent the timer thread from being terminated when an unexpected exception happens.

    autoLoadTimer.schedule(new TimerTask() {
        @Override
        public void run() {
            try {
                reload();
            } catch (Exception e) {
                // Print some log messages here.
            }
        }
    }, interval, interval);
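
The effect described in this comment can be reproduced in isolation. The sketch below is not Tigase code: `TimerGuardDemo`, `countRuns`, and the local `reload()` are hypothetical stand-ins, and `stmt` is simply a null `Object` that mimics the null statement. It relies on two documented Java behaviours: `synchronized` on a null reference throws `NullPointerException`, and an uncaught exception in a `TimerTask` terminates the `java.util.Timer` thread.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class TimerGuardDemo {

    // Stand-in for the statement that was unexpectedly null at runtime.
    private static Object stmt;

    // Hypothetical reload(): locking on a null reference throws NullPointerException.
    static void reload() {
        synchronized (stmt) {
            // Never reached while stmt is null.
        }
    }

    /** Schedules a failing task every 20 ms and returns how many times it ran. */
    static int countRuns(boolean guarded) {
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch threeRuns = new CountDownLatch(3);
        Timer timer = new Timer("cluster-nodes-demo", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                runs.incrementAndGet();
                threeRuns.countDown();
                if (guarded) {
                    try {
                        reload();
                    } catch (Exception e) {
                        // Log and continue; the timer thread stays alive.
                    }
                } else {
                    reload(); // Uncaught exception terminates the timer thread.
                }
            }
        }, 0, 20);
        try {
            // Wait for three runs, or give up after one second.
            threeRuns.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        timer.cancel();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + countRuns(false));
        System.out.println("guarded runs:   " + countRuns(true));
    }
}
```

In the unguarded case the task runs once and the timer thread dies, which matches the reported symptom of the cluster_nodes table no longer being updated. In the guarded case the task keeps being scheduled.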

  • Andrzej Wójcik (Tigase) commented 1 decade ago

    I checked the repository for a version in which line 204 of ClConSQLRepository.java contains synchronized(stmt), but I could not find the exact version. Still, assuming this might be the issue, and since a try {} catch {} in the run() method of the TimerTask will help protect the timer from being killed by other unexpected exceptions, I added the suggested try {} catch {}.

    This change was added to the master and stable branches.

  • Artur Hefczyc commented 1 decade ago

    Thank you for providing additional details and helping us with fixing the problem.

Type: Bug
Priority: Major
Assignee:
RedmineID: 3058
Version: tigase-server-7.0.2
Spent time: 0
Reference: tigase/_server/server-core#464