Graphs are not displayed in Statistics tab (#62)
Closed
wojciech.kapcia@tigase.net opened 9 years ago
Due Date
2016-01-10

In the latest version, graphs are not displayed under the Statistics tab.

Screenshot 2015-12-30 12.53.34.png

Andrzej Wójcik (Tigase) commented 9 years ago

I checked the version deployed on sure.im and a newly compiled version from the repository, and both worked fine.

wojciech.kapcia@tigase.net commented 9 years ago

It doesn't work in the version deployed locally via the http-api component (i.e. tigase-web-ui.war available under http://localhost:8080/ui/).

Andrzej Wójcik (Tigase) commented 9 years ago

I checked Sure.IM deployed on localhost:8080/ui/ and it worked fine when I connected to sure.im. However, it failed (graphs were not rendered) when I connected to the local server. I checked the logs and found the issue: my MBP is named mbp-andrzej.local, but I use zeus as the primary domain name for Tigase, and I found the following line in the logs:

2015-12-24 18:42:56.355 [pool-15-thread-8]  CIDConnections.openOutgoingConnections()  FINEST: Checking DNS for host: mbp-andrzej.local for: zeus@mbp-andrzej.local

which suggests an issue with DNS names in my case. Could you check whether you have the same issue? (Look for openOutgoingConnections() in the logs.)

wojciech.kapcia@tigase.net commented 9 years ago

In my case there are only connections to sure.im for pubsub:

2015-12-28 11:58:07.980 [in_3-s2s]         S2SConnectionManager.processPacket()  FINEST: Processing packet: from=sess-man@pc12.home, to=null, DATA=<iq from="admin@atlantiscity/2111717575-tigase-3" type="get" xmlns="jabber:client" id="BsbxkG" to="pubsub@sure.im"><pubsub xmlns="http://jabber.org/protocol/pubsub"><items node="news"/></pubsub></iq>, SIZE=199, XMLNS=jabber:client, PRIORITY=NORMAL, PERMISSION=ADMIN, TYPE=get
2015-12-28 11:58:07.980 [in_3-s2s]         S2SConnectionManager.processPacket()  FINEST: Connection ID is: atlantiscity@sure.im

The disco query for the items also works OK:

2015-12-28 11:58:10.507 [in_28-ws2s]       ConnectionManager.writePacketToSocket()  FINEST: ws2s@pc12.home/127.0.0.1_5290_127.0.0.1_50469, type: accept, Socket: ws2s@pc12.home/127.0.0.1_5290_127.0.0.1_50469 Socket[addr=/127.0.0.1,port=50469,localport=5290], jid: admin@atlantiscity/2111717575-tigase-3, Writing packet: from=sess-man@pc12.home, to=ws2s@pc12.home/127.0.0.1_5290_127.0.0.1_50469, DATA=<iq from="stats@atlantiscity" type="result" xmlns="jabber:client" id="lsZkTZ" to="admin@atlantiscity/2111717575-tigase-3"><query xmlns="http://jabber.org/protocol/disco#items" node="stats"><item node="stats/vhost-man" jid="stats@atlantiscity" name="Component: vhost-man"/><item node="stats/message-router" jid="stats@atlantiscity" name="Component: message-router"/><item node="stats/amp" jid="stats@atlantiscity" name="Component: amp"/><item node="stats/bosh" jid="stats@atlantiscity" name="Component: bosh"/><item node="stats/c2s" jid="stats@atlantiscity" name="Component: c2s"/><item node="stats/eventbus" jid="stats@atlantiscity" name="Component: eventbus"/><item node="stats/http" jid="stats@atlantiscity" name="Component: http"/><item node="stats/monitor" jid="stats@atlantiscity" name="Component: monitor"/><item node="stats/s2s" jid="stats@atlantiscity" name="Component: s2s"/><item node="stats/sess-man" jid="stats@atlantiscity" name="Component: sess-man"/><item node="stats/ws2s" jid="stats@atlantiscity" name="Comp ... , SIZE=1051, XMLNS=jabber:client, PRIORITY=NORMAL, PERMISSION=ADMIN, TYPE=result

I've checked the logs and now I don't even see a request being made to retrieve data for a particular component (not even in the websocket stream shown in the WebUI debug output).

Andrzej Wójcik (Tigase) commented 9 years ago

It looks like graphs are not properly initialized in this configuration for some reason, and this seems to be related to the detection of cluster node names. I will keep looking for the root cause and a solution.

Andrzej Wójcik (Tigase) commented 9 years ago

I found 2 issues here:

  • the server was not returning service-unavailable for an ad-hoc command request to the cl-comp component, which is not available in single mode. This caused the issue in the client, and I decided to fix the server, as this error should be returned

  • an off-by-one error in the number of colors generated for drawing
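
The second item can be illustrated with a minimal Python sketch. This is not the actual tigase-web-ui code (which is GWT/Java); the function name `series_colors` and the HSV spacing are assumptions made for illustration. The point is only that the palette must contain exactly one color per plotted series.

```python
import colorsys


def series_colors(count):
    """Return `count` hex colors, one per plotted series (hypothetical helper)."""
    colors = []
    # range(count) yields exactly `count` hues; iterating over
    # range(count - 1) instead would be the off-by-one that leaves
    # the last series without a color.
    for i in range(count):
        r, g, b = colorsys.hsv_to_rgb(i / count, 0.65, 0.9)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors
```

Calling `series_colors(11)` returns a list of eleven colors, one for each of eleven components.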

The current version works fine in my environment; please check whether this fixes the issue for you. The next snapshot build should contain all fixes.

wojciech.kapcia@tigase.net commented 9 years ago

With today's nightly it still doesn't work (see attached Screenshot 2015-12-30 12.53.34.png).

When cluster mode is enabled but we only have a single node:

2015-12-30 12:56:34.014 [in_17-cl-comp]    ClusterConnectionManager.writePacketToSocket()  WARNING: No cluster connection to send a packet: from=null, to=null, DATA=<route from="sess-man@pc12.home" pr="NORMAL" perm="ADMIN" to="cl-comp@atlantiscity"><iq type="set" xmlns="jabber:client" to="cl-comp@atlantiscity" from="admin@atlantiscity/2111717575-tigase-1" id="FXylNK"><command node="cluster-nodes-list" xmlns="http://jabber.org/protocol/commands" action="execute"/></iq></route>, SIZE=315, XMLNS=null, PRIORITY=NORMAL, PERMISSION=NONE, TYPE=null

(atlantiscity is the vhost, pc12.home is the hostname)
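
For reference, the reply a client would expect when cl-comp cannot serve the cluster-nodes-list request is a standard XMPP error stanza carrying the service-unavailable condition. Below is a minimal Python sketch of building such a reply from the request. It is not Tigase code; the helper name `service_unavailable_reply` is an assumption, and only the stanza layout follows the XMPP core specification.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard XMPP stanza error conditions.
STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def service_unavailable_reply(request_xml):
    """Build an <iq type="error"> reply for the given <iq> request (hypothetical helper)."""
    request = ET.fromstring(request_xml)
    reply = ET.Element("iq", {"type": "error", "id": request.get("id", "")})
    # Swap the addresses so the error goes back to the sender.
    if request.get("to"):
        reply.set("from", request.get("to"))
    if request.get("from"):
        reply.set("to", request.get("from"))
    error = ET.SubElement(reply, "error", {"type": "cancel"})
    ET.SubElement(error, "{%s}service-unavailable" % STANZAS_NS)
    return reply
```

Feeding it the inner `<iq>` element from the log line above yields an error reply addressed to admin@atlantiscity/2111717575-tigase-1 with the same id, which the client can handle immediately instead of waiting for a timeout.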

Andrzej Wójcik (Tigase) commented 9 years ago

OK, I found the cause of the issue with build b4123 of Tigase XMPP Server, but it is not related to this issue. The error is in fact in the Jenkins configuration. sure-im (aka tigase-web-ui) is built every day at 03:00 and jaxmpp is scheduled to be built at 01:00, which is OK, as tigase-web-ui should be compiled after Jaxmpp. However, the tigase-server build runs at 00:00 and triggers a multijob that does not build tigase-web-ui but imports the existing version during the tigase-server-distribution step. As a result, the previous day's build of tigase-web-ui is bundled into the distribution build.

I suppose we should fix this by adding jaxmpp and tigase-web-ui to the multijob just before tigase-server-distribution. What do you think, will this be OK? We should also leave the scheduled build time for jaxmpp as it is, to make sure it is built even if the tigase-server build fails. Building jaxmpp as part of the multijob would also make sure that the distribution is always created from current versions of the projects.
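
The ordering constraint behind this proposal can be sketched in Python as a small dependency graph. This is only an illustration, not Jenkins configuration. The job names come from this thread, while the dependency of tigase-server-distribution on tigase-server is an assumption based on the multijob description.

```python
from graphlib import TopologicalSorter

# job -> jobs that must finish before it starts
DEPENDS_ON = {
    "jaxmpp": set(),
    "tigase-server": set(),
    "tigase-web-ui": {"jaxmpp"},
    # Assumed: the distribution bundles both the server and the web UI.
    "tigase-server-distribution": {"tigase-server", "tigase-web-ui"},
}


def build_order(depends_on):
    """Return one valid build order that respects all dependencies."""
    return list(TopologicalSorter(depends_on).static_order())
```

Any order returned by `build_order(DEPENDS_ON)` places jaxmpp before tigase-web-ui and both before tigase-server-distribution, which is what running them inside the multijob would guarantee.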

wojciech.kapcia@tigase.net commented 9 years ago

OK, I've re-tested the issue with the latest nightly (which should contain the fix: Version: 2.1-SNAPSHOT-74/67954a6a (Jan 3, 2016 3:12:21 AM)) and it renders only a white page with --cluster-nodes=true and a single node connected. The console log prints only the following exceptions (Firefox, Opera):

uncaught exception: com.google.gwt.event.shared.UmbrellaException: Exception caught: (TypeError) : this.n is undefined
Uncaught com.google.gwt.event.shared.UmbrellaException: Exception caught: (TypeError) : Cannot read property 'wpb' of undefined

And the error mentioned above (No cluster connection to send a packet) is still present.

I think it may be a good idea to enable the debug options in GWT while we are in the nightly phase, so it would be easier to track down the problem.

When cluster mode is disabled, the graphs are displayed.

Andrzej Wójcik (Tigase) commented 9 years ago

I found that this (a blank page) can happen if the cluster-nodes-list ad-hoc command is not available, and only in that case did I get the No cluster connection to send a packet error.

In my case I have zeus as the vhost name and mbp-andrzej.local as the hostname, and with this command available it works fine.

Could you check that the vhost and hostname are OK in your case?

I ask because in the latest OS X (a recent update changed this) the hostname is replaced by the router's name, which is broadcast when you connect and use DHCP.

I was able to fix an issue in Jaxmpp (the request timeout was failing in GWT), so it now works after the 60s timeout if the request fails. So this is almost OK.

I suppose we could add some info saying that we are retrieving the names of cluster nodes, and reduce this timeout, which should fix this issue.
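
The idea of a shorter timeout with a fallback can be sketched as follows. This is a minimal Python illustration, not the Jaxmpp implementation. The names `cluster_nodes_or_local`, `fetch_nodes` and `local_node`, and the 5 second default, are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def cluster_nodes_or_local(fetch_nodes, local_node, timeout_s=5.0):
    """Ask for the cluster node list, falling back to the local node only.

    `fetch_nodes` is a hypothetical callable that performs the
    cluster-nodes-list request and returns a list of node names.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch_nodes)
    try:
        nodes = future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and an error reply such as service-unavailable.
        nodes = []
    finally:
        # Do not block on a request that may never be answered.
        pool.shutdown(wait=False)
    return list(nodes or []) or [local_node]
```

With this shape the statistics view can always draw graphs for at least the local node, and a missing reply costs a few seconds rather than a minute.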

wojciech.kapcia@tigase.net commented 9 years ago

OK, this seems to be the case (an odd one: seemingly the same router, but it looks like it overrides the hostname in some weird way in only one location):

wojtek@atlantiscity.local ~/dev/tmps/tigase-server-dists/tigase-server-7.1.0-SNAPSHOT-b4127 $ hostname
atlantiscity.local

2016-01-11 13:06:44.368 [main] DNSResolver.() WARNING: Resolving default host name: pc12.home took: 419

With a normal connection I get matching hostnames:

wojtek@atlantiscity.local ~/dev/tmps/tigase-server-dists/tigase-server-7.1.0-SNAPSHOT-b4127 $ hostname
atlantiscity.local

2016-01-11 13:16:40.532 [main] DNSResolver.() WARNING: Resolving default host name: atlantiscity.local took: 3,845

and it works.
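
A quick way to spot this kind of mismatch is to compare the name the server logged with what the machine itself reports. The Python sketch below is an illustration under assumptions: `hostname_report` is a hypothetical helper, and `server_hostname` is meant to be copied by hand from the "Resolving default host name" log line. It does not reproduce how Tigase resolves the name.

```python
import socket


def hostname_report(server_hostname):
    """Compare a server-reported hostname with the local system's names."""
    system_name = socket.gethostname()
    fqdn = socket.getfqdn()
    candidates = {system_name.lower(), fqdn.lower()}
    return {
        "server": server_hostname,
        "system": system_name,
        "fqdn": fqdn,
        # False means the server picked up a name the system does not report.
        "matches": server_hostname.lower() in candidates,
    }
```

In the failing case above, `hostname_report("pc12.home")` run on atlantiscity.local would be expected to report a mismatch.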

Artur Hefczyc commented 9 years ago

Wojciech, what do you think about creating some kind of documentation for this for other users who may experience similar problems?

wojciech.kapcia@tigase.net commented 9 years ago

I think the problem is quite peculiar. We already stress that it's essential to have a correct network configuration (and recommend using $ hostname -f to check it; in this case it returned the correct value, but Tigase used a different one) and to stick with what we distribute (i.e. scripts being in the correct location), so stumbling on the same problem should be quite rare.

However, it may be a good idea to display a few pointers on the statistics tab (in case graphs can't be rendered, leaving it blank), such as: "check that your networking configuration and the hostname picked up by Tigase are correct and that you have all admin ad-hoc scripts in the correct location".

Artur Hefczyc commented 9 years ago

Wojciech Kapcia wrote:

I think the problem is quite peculiar. We already stress that it's essential to have a correct network configuration (and recommend using $ hostname -f to check it; in this case it returned the correct value, but Tigase used a different one) and to stick with what we distribute (i.e. scripts being in the correct location), so stumbling on the same problem should be quite rare.

Yes, I understand. The problem is especially with Mac OS X, which has a different concept and tools for setting the hostname, and distinguishes between hostname, machine name and, I think, some others. This is very confusing, and I usually spend quite a bit of time setting it up the way I want and making sure it is correctly recognized by all software.

However, it may be a good idea to display a few pointers on the statistics tab (in case graphs can't be rendered, leaving it blank), such as: "check that your networking configuration and the hostname picked up by Tigase are correct and that you have all admin ad-hoc scripts in the correct location".

Yes, this is what the statistics tab (or the server report page) is for. It should display all the main information about the server, which can then be verified and checked by the service admin.

Type
Bug
Priority
Normal
Assignee
RedmineID
3401
Reference
tigase/_clients/sureim#62