How to test whether our cluster is working correctly after enabling clustering (#1009)
Livia Yu opened 6 years ago

Dear sir,

We have set up a two-machine cluster running Tigase 7.1.4. From the logs, the cluster seems to work fine. How can we check whether the cluster is actually working? Are there any tools that can help with this?

The DNS SRV configuration is below:

ZZ-DMZ-0501.foxconn.com IN A 10.207.249.193
ZZ-DMZ-0401.foxconn.com IN A 10.207.249.194
_xmpp-client._tcp.foxconn.com 86400 IN SRV 0 5 5222 im5.foxconn.com
_xmpp-client._tcp.foxconn.com 86400 IN SRV 0 5 5222 im6.foxconn.com
_xmpp-server._tcp.foxconn.com 86400 IN SRV 0 5 5269 im5.foxconn.com
_xmpp-server._tcp.foxconn.com 86400 IN SRV 0 5 5269 im6.foxconn.com

Our domain is foxconn.com.

The attachments contain the DNS SRV records and the server configuration.

Thanks very much.

194.init.properties 193.init.properties rsv.png zz-dmz-0501.png zz-dmz-0401.png

Artur Hefczyc commented 6 years ago

Wojciech please provide assistance as soon as you can.

wojciech.kapcia@tigase.net commented 6 years ago

Your DNS configuration looks correct. As for checking the cluster connection, please refer to the documentation: Checking Cluster Connections.

I checked your init.properties files and have a couple of comments:

  • those files should be virtually identical on all cluster machines
  • the --admins setting can include http@{clusterNode} instead of specifying an entry for each node
  • you are using non-clustered MUC and PubSub components - please refer to the Advanced Clustering Strategy (ACS) guide, and specifically to:
    • Tigase ACS SM Configuration
    • Tigase ACS MUC Configuration
    • Tigase ACS PubSub Configuration
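
One way to check the first point (near-identical init.properties files) is to compare the two files key by key. The following is a minimal sketch, not a Tigase tool: `parse_properties` and `diff_properties` are hypothetical helpers, the sample values are made up for illustration, and the parser assumes simple `key = value` lines with `#` comments.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines; skip blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def diff_properties(text_a, text_b):
    """Return {key: (value_a, value_b)} for keys that differ or exist on one side only."""
    a, b = parse_properties(text_a), parse_properties(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Illustrative contents only; real files would be read from disk.
node_a = "--cluster-mode = true\n--admins = http@node-a\n"
node_b = "--cluster-mode = true\n--admins = http@node-b\n"
for key, (va, vb) in diff_properties(node_a, node_b).items():
    print(f"{key}: {va!r} != {vb!r}")
```

Any key this reports is a candidate for being unified across the nodes, for example by using a placeholder such as http@{clusterNode}.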

Please also note that newer versions of Tigase XMPP Server are available: 7.1.5 (your current release line) and the brand-new 8.0.0. Both can be downloaded from the download section.

Livia Yu commented 6 years ago

Hi Wojciech Kapcia,

Thanks for your timely reply.

I have checked the S2S connection as you suggested, and port 5277 seems to be working on both servers.

As for MUC and PubSub clustering: the servers cannot access the internet at the moment, so if I enable it the service shuts down immediately (because of the ACS licence check). So I have not enabled it for now.

Here is our current situation:

When I use a desktop XMPP client (Spark) and set the domain to foxconn.com and the port to 5222, I cannot even connect to the service. Do you have any debugging suggestions for this issue? Any advice would be appreciated.

Thanks.

wojciech.kapcia@tigase.net commented 6 years ago

Livia Yu wrote:

I have checked the S2S connection as you suggested, and port 5277 seems to be working on both servers.

This looks ok.

As for MUC and PubSub clustering: the servers cannot access the internet at the moment, so if I enable it the service shuts down immediately (because of the ACS licence check). So I have not enabled it for now.

Understood. As per the Licensing documentation, you can register your installation and then retrieve a temporary licence. With a working internet connection, this licence is renewed periodically.

Here is our current situation:

When I use a desktop XMPP client (Spark) and set the domain to foxconn.com and the port to 5222, I cannot even connect to the service. Do you have any debugging suggestions for this issue? Any advice would be appreciated.

Is the client reporting any issue?

Can you establish a basic socket connection and open the stream? You can try using telnet for example:

telnet im5.foxconn.com 5222

or

telnet im6.foxconn.com 5222

And then send the following:

<stream:stream xmlns:stream="http://etherx.jabber.org/streams" xml:lang="en" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="jabber:client" to="foxconn.com" version="1.0">
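
The same telnet check can be scripted. This is a rough sketch under stated assumptions: `build_stream_header` and `probe` are hypothetical helper names, the header is the one shown above, and the script only reads the first chunk of whatever the server sends back without parsing it.

```python
import socket


def build_stream_header(domain):
    """Build the opening <stream:stream> element for a client-to-server stream."""
    return (
        '<stream:stream xmlns:stream="http://etherx.jabber.org/streams" '
        'xml:lang="en" xmlns:xml="http://www.w3.org/XML/1998/namespace" '
        f'xmlns="jabber:client" to="{domain}" version="1.0">'
    )


def probe(host, domain, port=5222, timeout=5.0):
    """Open a TCP connection, send the stream header, return the first reply chunk."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_stream_header(domain).encode("utf-8"))
        return sock.recv(4096).decode("utf-8", errors="replace")
```

Calling `probe("im5.foxconn.com", "foxconn.com")` and then `probe("im6.foxconn.com", "foxconn.com")` would exercise each node in turn. A refused connection raises `ConnectionRefusedError`, which corresponds to telnet's "Connection refused".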

wojciech.kapcia@tigase.net commented 6 years ago

Livia Yu wrote:

As for MUC and PubSub clustering: the servers cannot access the internet at the moment, so if I enable it the service shuts down immediately (because of the ACS licence check). So I have not enabled it for now.

Could you confirm that you are running the server with the following configuration:

--cluster-mode = true
--sm-cluster-strategy-class=tigase.cluster.strategy.OnlineUsersCachingStrategy

?

And can you confirm that you can see files named statistics.log* under the logs/ directory?

Livia Yu commented 6 years ago

Can you establish a basic socket connection and open the stream? You can try using telnet for example:

I did as you suggested. Here are the results:

telnet foxconn.com 5222 --> Connection refused
telnet im5.foxconn.com 5222 --> OK
telnet im6.foxconn.com 5222 --> OK

Is there a configuration error in DNS? Clients need to connect to our service using foxconn.com:5222, not a cluster node address such as im5.foxconn.com:5222.

Could you confirm that you are running the server with the following configuration:

--cluster-mode = true
--sm-cluster-strategy-class=tigase.cluster.strategy.OnlineUsersCachingStrategy

?

Yes, I have added those settings to etc/init.properties.

And can you confirm that you can see files named statistics.log* under the logs/ directory?

--> Yes, there is one file named statistics.log.0. Below are some of the log entries.

2019-03-15 19:43:40.771 [pool-20-thread-1]  LicenceChecker.b()                INFO:     Missing licence file (etc/acs.licence), retrieving from the server!
2019-03-15 19:43:40.771 [pool-20-thread-1]  LicenceChecker.a()                WARNING:  Missing licence file (etc/acs.licence)!
2019-03-15 19:43:40.779 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   [2]Checking licence for component: acs, invalidLicencePresent: true, forceSendingStatistics: false, bannedShutdown: false, displayLicenceNotice: false, licenceShown: true
2019-03-15 19:43:40.779 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   Checking result: invalidLicencePresent: true, LicenceChecker.installationId: null, count: 8, maxCheckRetryCount: 10, delay: 5,342, bannedShutdown: false, forceSendingStatistics: false
2019-03-15 19:43:40.779 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   Invalid licence file detected! Remaining checks: 2 (next attempt in: 10,149s)
2019-03-15 19:43:40.779 [pool-20-thread-1]  LicenceCheckDailyTask.a()         FINE:     ==================================================================
2019-03-15 19:43:40.780 [pool-20-thread-1]  LicenceCheckDailyTask.a()         FINE:     Daily License verification runtime (run: 9 out of: 10; force reloading: true, delay: 10,149).
2019-03-15 22:32:49.831 [pool-20-thread-1]  InstallationIdRetriever.c()       FINEST:   Retrieved installationID from database: null
2019-03-15 22:32:49.846 [pool-20-thread-1]  InstallationIdRetriever.b()       WARNING:  Server returned error: license.tigase.net; next retry in 2s (retries left: 1)
2019-03-15 22:32:49.846 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINE:     statisticsUploadFailCount: 0, STATISTICS_FAIL_COUNT_LIMIT: 10
2019-03-15 22:32:49.846 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   [1]Checking licence for component: acs, licChecker: LicenceChecker{LICENCE_FILE=etc/acs.licence, componentName='acs', initialCheckDelay=748, subsequentCheckDelay=86400, updateCall=tigase.licence.callbacks.LicenceCheckerUpdateCallbackImplACS@6be7a271, lic=null}, licence: null
2019-03-15 22:32:49.846 [pool-20-thread-1]  LicenceChecker.b()                FINEST:   Trying to load licence file from file: etc/acs.licence (exists: false, empty: true), forcing load from server: true
2019-03-15 22:32:49.846 [pool-20-thread-1]  LicenceChecker.a()                FINEST:   Loading licence for component: acs from server, licenceData: null
2019-03-15 22:32:49.846 [pool-20-thread-1]  LicenceChecker.a()                INFO:     Cannot load licence from Server.
2019-03-15 22:32:49.846 [pool-20-thread-1]  LicenceChecker.b()                INFO:     Missing licence file (etc/acs.licence), retrieving from the server!
2019-03-15 22:32:49.846 [pool-20-thread-1]  LicenceChecker.a()                WARNING:  Missing licence file (etc/acs.licence)!
2019-03-15 22:32:49.854 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   [2]Checking licence for component: acs, invalidLicencePresent: true, forceSendingStatistics: false, bannedShutdown: false, displayLicenceNotice: false, licenceShown: true
2019-03-15 22:32:49.855 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   Checking result: invalidLicencePresent: true, LicenceChecker.installationId: null, count: 9, maxCheckRetryCount: 10, delay: 10,149, bannedShutdown: false, forceSendingStatistics: false
2019-03-15 22:32:49.855 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   Invalid licence file detected! Remaining checks: 1 (next attempt in: 19,283s)
2019-03-15 22:32:49.855 [pool-20-thread-1]  LicenceCheckDailyTask.a()         FINE:     ==================================================================
2019-03-15 22:32:49.855 [pool-20-thread-1]  LicenceCheckDailyTask.a()         FINE:     Daily License verification runtime (run: 10 out of: 10; force reloading: true, delay: 19,283).
2019-03-16 03:54:12.924 [pool-20-thread-1]  InstallationIdRetriever.c()       FINEST:   Retrieved installationID from database: null
2019-03-16 03:54:12.941 [pool-20-thread-1]  InstallationIdRetriever.b()       WARNING:  Server returned error: license.tigase.net; next retry in 2s (retries left: 1)
2019-03-16 03:54:12.941 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINE:     statisticsUploadFailCount: 0, STATISTICS_FAIL_COUNT_LIMIT: 10
2019-03-16 03:54:12.941 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   [1]Checking licence for component: acs, licChecker: LicenceChecker{LICENCE_FILE=etc/acs.licence, componentName='acs', initialCheckDelay=748, subsequentCheckDelay=86400, updateCall=tigase.licence.callbacks.LicenceCheckerUpdateCallbackImplACS@6be7a271, lic=null}, licence: null
2019-03-16 03:54:12.941 [pool-20-thread-1]  LicenceChecker.b()                FINEST:   Trying to load licence file from file: etc/acs.licence (exists: false, empty: true), forcing load from server: true
2019-03-16 03:54:12.942 [pool-20-thread-1]  LicenceChecker.a()                FINEST:   Loading licence for component: acs from server, licenceData: null
2019-03-16 03:54:12.942 [pool-20-thread-1]  LicenceChecker.a()                INFO:     Cannot load licence from Server.
2019-03-16 03:54:12.942 [pool-20-thread-1]  LicenceChecker.b()                INFO:     Missing licence file (etc/acs.licence), retrieving from the server!
2019-03-16 03:54:12.942 [pool-20-thread-1]  LicenceChecker.a()                WARNING:  Missing licence file (etc/acs.licence)!
2019-03-16 03:54:12.950 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   [2]Checking licence for component: acs, invalidLicencePresent: true, forceSendingStatistics: false, bannedShutdown: false, displayLicenceNotice: false, licenceShown: true
2019-03-16 03:54:12.950 [pool-20-thread-1]  LicenceCheckDailyTask$1.run()     FINEST:   Checking result: invalidLicencePresent: true, LicenceChecker.installationId: null, count: 10, maxCheckRetryCount: 10, delay: 19,283, bannedShutdown: false, forceSendingStatistics: false
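
The "Checking result" entries above carry a `count` and a `maxCheckRetryCount` field, and the count climbs from 8 to 10 across the excerpt. The following sketch pulls those pairs out of a log so the progression is easy to see. The helper names are hypothetical, and treating `count >= maxCheckRetryCount` as "retries exhausted" is an interpretation of the field names, not something taken from Tigase documentation.

```python
import re

# Matches the two counters as they appear in the "Checking result" lines.
CHECK_RE = re.compile(r"count: (\d+), maxCheckRetryCount: (\d+)")


def licence_check_progress(log_text):
    """Return (count, max_retry_count) pairs in the order they appear in the log."""
    return [(int(count), int(limit)) for count, limit in CHECK_RE.findall(log_text)]


def retries_exhausted(log_text):
    """True if the last recorded check has reached the retry limit (assumed meaning)."""
    progress = licence_check_progress(log_text)
    return bool(progress) and progress[-1][0] >= progress[-1][1]
```

Run against the contents of logs/statistics.log.0, `licence_check_progress` on the excerpt above would yield the pairs for counts 8, 9 and 10, each against a limit of 10.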

Please help to check.

Thanks.

Livia Yu commented 6 years ago

Livia Yu wrote:

Can you establish a basic socket connection and open the stream? You can try using telnet for example:

I did as you suggested. Here are the results:

telnet foxconn.com 5222 --> Connection refused
telnet im5.foxconn.com 5222 --> OK
telnet im6.foxconn.com 5222 --> OK

Is there a configuration error in DNS? Clients need to connect to our service using foxconn.com:5222, not a cluster node address such as im5.foxconn.com:5222. Do I need to deploy the Tigase service on foxconn.com itself as well? Currently the service is deployed on im5.foxconn.com and im6.foxconn.com.

Could you confirm that you are running the server with the following configuration:

--cluster-mode = true
--sm-cluster-strategy-class=tigase.cluster.strategy.OnlineUsersCachingStrategy

?

Yes, I have added those settings to etc/init.properties.

And can you confirm that you can see files named statistics.log* under the logs/ directory?

--> Yes, there is one file named statistics.log.0. Below are some of the log entries.

2019-03-15 19:43:40.771 [pool-20-thread-1] LicenceChecker.b() INFO: Missing licence file (etc/acs.licence), retrieving from the server!
2019-03-15 19:43:40.771 [pool-20-thread-1] LicenceChecker.a() WARNING: Missing licence file (etc/acs.licence)!
2019-03-15 19:43:40.779 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: [2]Checking licence for component: acs, invalidLicencePresent: true, forceSendingStatistics: false, bannedShutdown: false, displayLicenceNotice: false, licenceShown: true
2019-03-15 19:43:40.779 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: Checking result: invalidLicencePresent: true, LicenceChecker.installationId: null, count: 8, maxCheckRetryCount: 10, delay: 5,342, bannedShutdown: false, forceSendingStatistics: false
2019-03-15 19:43:40.779 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: Invalid licence file detected! Remaining checks: 2 (next attempt in: 10,149s)
2019-03-15 19:43:40.779 [pool-20-thread-1] LicenceCheckDailyTask.a() FINE: ==================================================================
2019-03-15 19:43:40.780 [pool-20-thread-1] LicenceCheckDailyTask.a() FINE: Daily License verification runtime (run: 9 out of: 10; force reloading: true, delay: 10,149).
2019-03-15 22:32:49.831 [pool-20-thread-1] InstallationIdRetriever.c() FINEST: Retrieved installationID from database: null
2019-03-15 22:32:49.846 [pool-20-thread-1] InstallationIdRetriever.b() WARNING: Server returned error: license.tigase.net; next retry in 2s (retries left: 1)
2019-03-15 22:32:49.846 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINE: statisticsUploadFailCount: 0, STATISTICS_FAIL_COUNT_LIMIT: 10
2019-03-15 22:32:49.846 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: [1]Checking licence for component: acs, licChecker: LicenceChecker{LICENCE_FILE=etc/acs.licence, componentName='acs', initialCheckDelay=748, subsequentCheckDelay=86400, updateCall=tigase.licence.callbacks.LicenceCheckerUpdateCallbackImplACS@6be7a271, lic=null}, licence: null
2019-03-15 22:32:49.846 [pool-20-thread-1] LicenceChecker.b() FINEST: Trying to load licence file from file: etc/acs.licence (exists: false, empty: true), forcing load from server: true
2019-03-15 22:32:49.846 [pool-20-thread-1] LicenceChecker.a() FINEST: Loading licence for component: acs from server, licenceData: null
2019-03-15 22:32:49.846 [pool-20-thread-1] LicenceChecker.a() INFO: Cannot load licence from Server.
2019-03-15 22:32:49.846 [pool-20-thread-1] LicenceChecker.b() INFO: Missing licence file (etc/acs.licence), retrieving from the server!
2019-03-15 22:32:49.846 [pool-20-thread-1] LicenceChecker.a() WARNING: Missing licence file (etc/acs.licence)!
2019-03-15 22:32:49.854 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: [2]Checking licence for component: acs, invalidLicencePresent: true, forceSendingStatistics: false, bannedShutdown: false, displayLicenceNotice: false, licenceShown: true
2019-03-15 22:32:49.855 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: Checking result: invalidLicencePresent: true, LicenceChecker.installationId: null, count: 9, maxCheckRetryCount: 10, delay: 10,149, bannedShutdown: false, forceSendingStatistics: false
2019-03-15 22:32:49.855 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: Invalid licence file detected! Remaining checks: 1 (next attempt in: 19,283s)
2019-03-15 22:32:49.855 [pool-20-thread-1] LicenceCheckDailyTask.a() FINE: ==================================================================
2019-03-15 22:32:49.855 [pool-20-thread-1] LicenceCheckDailyTask.a() FINE: Daily License verification runtime (run: 10 out of: 10; force reloading: true, delay: 19,283).
2019-03-16 03:54:12.924 [pool-20-thread-1] InstallationIdRetriever.c() FINEST: Retrieved installationID from database: null
2019-03-16 03:54:12.941 [pool-20-thread-1] InstallationIdRetriever.b() WARNING: Server returned error: license.tigase.net; next retry in 2s (retries left: 1)
2019-03-16 03:54:12.941 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINE: statisticsUploadFailCount: 0, STATISTICS_FAIL_COUNT_LIMIT: 10
2019-03-16 03:54:12.941 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: [1]Checking licence for component: acs, licChecker: LicenceChecker{LICENCE_FILE=etc/acs.licence, componentName='acs', initialCheckDelay=748, subsequentCheckDelay=86400, updateCall=tigase.licence.callbacks.LicenceCheckerUpdateCallbackImplACS@6be7a271, lic=null}, licence: null
2019-03-16 03:54:12.941 [pool-20-thread-1] LicenceChecker.b() FINEST: Trying to load licence file from file: etc/acs.licence (exists: false, empty: true), forcing load from server: true
2019-03-16 03:54:12.942 [pool-20-thread-1] LicenceChecker.a() FINEST: Loading licence for component: acs from server, licenceData: null
2019-03-16 03:54:12.942 [pool-20-thread-1] LicenceChecker.a() INFO: Cannot load licence from Server.
2019-03-16 03:54:12.942 [pool-20-thread-1] LicenceChecker.b() INFO: Missing licence file (etc/acs.licence), retrieving from the server!
2019-03-16 03:54:12.942 [pool-20-thread-1] LicenceChecker.a() WARNING: Missing licence file (etc/acs.licence)!
2019-03-16 03:54:12.950 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: [2]Checking licence for component: acs, invalidLicencePresent: true, forceSendingStatistics: false, bannedShutdown: false, displayLicenceNotice: false, licenceShown: true
2019-03-16 03:54:12.950 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: Checking result: invalidLicencePresent: true, LicenceChecker.installationId: null, count: 10, maxCheckRetryCount: 10, delay: 19,283, bannedShutdown: false, forceSendingStatistics: false

Please help to check.

Thanks.

wojciech.kapcia@tigase.net commented 6 years ago

Livia Yu wrote:

Can you establish a basic socket connection and open the stream? You can try using telnet for example:

I did as you suggested. Here are the results:

telnet foxconn.com 5222 --> Connection refused
telnet im5.foxconn.com 5222 --> OK
telnet im6.foxconn.com 5222 --> OK

Is there a configuration error in DNS? Clients need to connect to our service using foxconn.com:5222, not a cluster node address such as im5.foxconn.com:5222.

This is correct, because the SRV records instruct the client to connect to either im5.foxconn.com or im6.foxconn.com. It's possible that Spark ignores those records and tries to connect to foxconn.com directly, which may not work (depending on how foxconn.com is configured).

Can you check the Spark logs for connection errors?

Could you try a different XMPP client? You could try https://psi-im.org/, which has an XML console.
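
To make the SRV behaviour concrete, here is a small sketch that reads zone-file style lines like the ones in the original post and lists the hosts a client would be directed to. `SrvRecord`, `parse_srv_line` and `candidates` are hypothetical names. It orders records by priority only (lowest first, per RFC 2782) and does not implement the weighted random choice among records of equal priority.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line):
    """Parse '<name> <ttl> IN SRV <priority> <weight> <port> <target>'."""
    fields = line.split()
    idx = fields.index("SRV")
    priority, weight, port, target = fields[idx + 1: idx + 5]
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def candidates(lines, service="_xmpp-client._tcp."):
    """Return the records for one service, lowest priority value first."""
    records = [parse_srv_line(line) for line in lines if line.startswith(service)]
    return sorted(records, key=lambda record: record.priority)


zone = [
    "_xmpp-client._tcp.foxconn.com 86400 IN SRV 0 5 5222 im5.foxconn.com",
    "_xmpp-client._tcp.foxconn.com 86400 IN SRV 0 5 5222 im6.foxconn.com",
    "_xmpp-server._tcp.foxconn.com 86400 IN SRV 0 5 5269 im5.foxconn.com",
]
for record in candidates(zone):
    print(f"{record.target}:{record.port}")
```

With the records from this issue, a client that honours SRV ends up at im5.foxconn.com or im6.foxconn.com on port 5222 and never opens a socket to foxconn.com itself.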

Could you confirm that you are running the server with the following configuration:

--cluster-mode = true
--sm-cluster-strategy-class=tigase.cluster.strategy.OnlineUsersCachingStrategy

?

Yes, I have added those settings to etc/init.properties.

Thank you.

And can you confirm that you can see files named statistics.log* under the logs/ directory?

--> Yes, there is one file named statistics.log.0. Below are some of the log entries.

2019-03-16 03:54:12.950 [pool-20-thread-1] LicenceCheckDailyTask$1.run() FINEST: Checking result: invalidLicencePresent: true, LicenceChecker.installationId: null, count: 10, maxCheckRetryCount: 10, delay: 19,283, bannedShutdown: false, forceSendingStatistics: false


What happens after those log entries?

Livia Yu commented 6 years ago

Hi Wojciech Kapcia,

After days of checking, I finally found the issue. The problem was not in the server configuration but in the Spark client configuration. When I tried to connect, I used the manual connection settings with foxconn.com as the host and 5222 as the port. What we actually need is the automatic settings: when I register or log in with a JID (such as livia@foxconn.com), the DNS SRV lookup is triggered and returns the real hosts, im5.foxconn.com or im6.foxconn.com, together with their port numbers.
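
As an illustration of the automatic path, the sketch below shows how the SRV query name follows from the JID alone. `srv_query_name` is a hypothetical helper, and the actual DNS lookup is left out because it needs a resolver library outside the Python standard library.

```python
def srv_query_name(jid):
    """Return the client SRV name for a JID of the form user@domain or user@domain/resource."""
    bare = jid.split("/", 1)[0]         # drop the resource part, if any
    domain = bare.rsplit("@", 1)[-1]    # keep what follows the localpart
    return f"_xmpp-client._tcp.{domain}"


print(srv_query_name("livia@foxconn.com"))
```

A client in automatic mode queries this name and connects to the host and port it gets back. Manual settings bypass the lookup and go straight to whatever host is typed in.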

And can you confirm that you can see files named statistics.log* under the logs/ directory? What happens after those log entries? --> The Tigase service shuts down automatically after several hours.

Thanks for all your support.

wojciech.kapcia@tigase.net commented 6 years ago

Livia Yu wrote:

Hi Wojciech Kapcia,

After days of checking, I finally found the issue. The problem was not in the server configuration but in the Spark client configuration. When I tried to connect, I used the manual connection settings with foxconn.com as the host and 5222 as the port. What we actually need is the automatic settings: when I register or log in with a JID (such as livia@foxconn.com), the DNS SRV lookup is triggered and returns the real hosts, im5.foxconn.com or im6.foxconn.com, together with their port numbers.

I'm glad it worked. If you have further questions don't hesitate to contact us.

And can you confirm that you can see files named statistics.log* under the logs/ directory? What happens after those log entries? --> The Tigase service shuts down automatically after several hours.

Thanks for all your support.

Thank you for the confirmation. I'll close the issue.

Type: New Feature
Priority: Blocker
Assignee:
RedmineID: 8746
Version: tigase-server-7.1.4
Spent time: 5h
Issue Votes: 0
Watchers: 0
Reference: tigase/_server/server-core#1009