Andrzej Wójcik (Tigase) opened 1 decade ago
|
|
I added support for presenting statistics collected from a single instance on charts. I was thinking about collecting statistics from each cluster node, but do we have any way for an XMPP client to retrieve the list of cluster nodes, so that I could send ad-hoc commands to the stats component on each node and collect the data? |
|
We will have. Providing the admin with a list of cluster nodes will be one of the Monitor component's tasks. |
|
Actually, I was thinking of an admin ad-hoc command in the Monitor component to provide the admin with a list of cluster nodes. However, we do have a way right now to retrieve a list of connected cluster nodes. If you run service discovery from an admin account, you get a list of all server components. One of them is "Cluster connection manager", which shows a list of subitems. Each subitem corresponds to one cluster node, and the item's "Node" field is the cluster node hostname. |
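The lookup described above could be sketched roughly as follows. This is only an illustration in Python, assuming the component answers with a standard XEP-0030 `disco#items` result; the sample stanza, the JIDs, the hostnames, and the helper name `cluster_nodes_from_disco` are made up for the example, and the exact attribute layout returned by the server is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace of a service discovery items query (XEP-0030).
DISCO_ITEMS_NS = "http://jabber.org/protocol/disco#items"

# Hypothetical result for the "Cluster connection manager" component.
# JIDs and hostnames are invented for illustration only.
SAMPLE_RESULT = """
<iq type='result' from='cl-comp@node1.example.com' id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#items'>
    <item jid='cl-comp@node1.example.com' node='node2.example.com' name='node2'/>
    <item jid='cl-comp@node1.example.com' node='node3.example.com' name='node3'/>
  </query>
</iq>
"""


def cluster_nodes_from_disco(iq_xml):
    """Return the 'node' attribute of every disco#items subitem."""
    root = ET.fromstring(iq_xml.strip())
    items = root.findall(
        "{%s}query/{%s}item" % (DISCO_ITEMS_NS, DISCO_ITEMS_NS)
    )
    # Assumption: the 'node' attribute carries the cluster node hostname.
    return [item.get("node") for item in items if item.get("node")]


remote_nodes = cluster_nodes_from_disco(SAMPLE_RESULT)
```

Here `remote_nodes` holds only the hostnames that appear as subitems, so anything the component does not list will be missing from the result.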
|
I already considered using discovery and the "Cluster connection manager" subitems, but it lists only the "other" nodes (the local node is not listed) and I need the full list of nodes, so it is not usable. I also considered an additional ad-hoc command for this, but I would like to make this command embedded (i.e. part of the Monitor component), so it would be part of the Tigase XMPP Server jar and would not require additional scripts containing this command. |
|
Andrzej Wójcik wrote:
Hm, OK. I thought you could get the local node information somehow or guess it from the general service-discovery information. I think we could actually add the local node to the list of cluster nodes provided by "Cluster connection manager", as this is only informative data which does not affect anything else (for now). But I am hesitant to do this, because the "Connection manager" name suggests that the component provides information related to "connections" only.
Yes, the new Monitoring framework Bartek is working on will allow for creating commands either in Java or as scripts. However, maybe it makes sense for Monitor as a component to provide some useful information through service discovery as well? Maybe a list of all cluster nodes (remote and local) would be it? |
|
Yes, I could use a list of nodes from the Monitor component, whether I retrieve it using an ad-hoc command or through service discovery. If it makes sense to add it to service discovery then it would be great, but I suppose for service discovery we should create a proper "tree" of information, so that subnodes or nodes on a similar level of the tree present similar data. |
|
To be honest, it does not feel right to put a list of cluster nodes into the service discovery information for Monitor. It just does not fit there and it would not be natural. I think a much better place would be an admin ad-hoc command in either ClusterController or Monitor. You can actually add such a command to either of these components if you like, at least as a temporary solution, so we would have a fully working interface for monitoring Tigase installations. |
|
I added gathering and presentation of statistics for each component of a running Tigase XMPP Server. It also retrieves statistics from every cluster node and presents them on a graph, so it is possible to compare traffic on each node. To make it work, the newly added script that retrieves the list of cluster nodes must be deployed on each cluster node (I deployed this script on every node of the sure.im installation). I've just committed the changes, so a working version should be deployed on http://beta.sure.im/ tomorrow. |
|
It looks very good. There are 2 things I noticed:
It would also be extremely useful to have one more item on the list, a kind of summary with the main statistics for the server. I am thinking of the statistics which are provided by default if you double click on the statistics top-level component with the lowest stats level. It loads a set of basic and most important metrics of the server. If we had a screen with this summary and a few charts for the metrics for which charts make sense, this would be a very useful tool (with auto-refresh, of course). |
|
I will move the buttons. Auto refresh is already working. The web client retrieves statistics once every minute and keeps the last 20 requests, so after 20 minutes we get a full chart with 20 points, which I think makes a nice chart. %kobit could you draw how these basic statistics should look, and list which metrics you would like to have there? |
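The polling scheme described here (one request per minute, last 20 results kept) behaves like a fixed-size rolling window. A minimal sketch in Python, assuming each poll yields a single numeric value; the class name `StatsHistory` and the sample values are hypothetical and not taken from the web client's code:

```python
from collections import deque

POLL_INTERVAL_SECONDS = 60  # from the comment: one request every minute
MAX_POINTS = 20             # from the comment: the last 20 requests are kept


class StatsHistory:
    """Keeps only the most recent MAX_POINTS samples for one chart."""

    def __init__(self, max_points=MAX_POINTS):
        # A deque with maxlen drops the oldest sample automatically
        # once the window is full.
        self._points = deque(maxlen=max_points)

    def add(self, timestamp, value):
        self._points.append((timestamp, value))

    def series(self):
        """Return the samples oldest-first, ready to be plotted."""
        return list(self._points)


# Simulate 25 polls with made-up values; only the last 20 survive.
history = StatsHistory()
for minute in range(25):
    history.add(minute * POLL_INTERVAL_SECONDS, minute)
```

With this structure the chart fills up during the first 20 minutes and then scrolls, always showing the latest 20 minutes of data.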
|
I attached a PDF file with a general idea of the main metrics page. This is what I would like to see when clicking on the Statistics tab without any component selected. This is of course a general idea; if you think of any other metrics useful from an admin point of view, feel free to add them. %eric can you think of anything else useful from an admin point of view? |
|
I added graphs for the main metrics as presented in the PDF and moved the buttons a little bit up. Tomorrow a version with this feature and fix should be deployed by Jenkins at http://beta.sure.im/ |
|
Everything looks very good. One issue I can see is that CPU usage is shown incorrectly. Current CPU usage is close to 0% on all machines, but the web tool shows between 30% and 50% depending on the server. So there must be a mistake somewhere. Another point worth noting is that we lost a significant number of users on the service.... |
|
Fixed the issue with CPU usage. As for the loss of users, I suppose it is related to the change of servers and IP addresses, so some vhost owners probably moved elsewhere or forgot to update their DNS entries. |
|
Yes, it looks good now, thank you. |
Type | New Feature
Priority | Normal
Assignee |
RedmineID | 2097
|
Add a new feature (maybe as a tab), visible or not depending on the permissions of the authenticated user, which would present server statistics. This should be similar to the data presented in #2095 but should be able to aggregate data over time, so it would be possible to show how values change over time.
Currently it should be implemented using ad-hoc commands and polling, but once the simplified PubSub API for components (#478) is ready and data can be retrieved through it, we could switch to push for retrieving updates of statistics and other monitored data.
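As a rough illustration of the polling approach, the sketch below builds one ad-hoc command request (XEP-0050) per cluster node. It is written in Python only for brevity. The `stats@<hostname>` address, the `stats` command node, the request ids, and both function names are assumptions made for the example; the actual names used by the server may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of ad-hoc commands (XEP-0050).
COMMANDS_NS = "http://jabber.org/protocol/commands"


def build_adhoc_request(to_jid, command_node, request_id):
    """Build an <iq type='set'> that executes one ad-hoc command."""
    iq = ET.Element("iq", {"type": "set", "to": to_jid, "id": request_id})
    ET.SubElement(
        iq,
        "{%s}command" % COMMANDS_NS,
        {"node": command_node, "action": "execute"},
    )
    return iq


def poll_requests(node_hostnames, command_node="stats"):
    """One request per cluster node; sent again on every polling tick."""
    return [
        # Assumption: the stats component is addressed as stats@<hostname>.
        build_adhoc_request("stats@%s" % host, command_node, "stats-%d" % i)
        for i, host in enumerate(node_hostnames)
    ]


requests = poll_requests(["node1.example.com", "node2.example.com"])
serialized = [ET.tostring(r, encoding="unicode") for r in requests]
```

With push, this request loop would go away and the client would only handle incoming notifications.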
We should take into account that a Tigase XMPP Server installation might not be a single instance but a cluster, and we should present aggregated data from all nodes if possible.
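One simple way to aggregate per-node data is to sum each metric across nodes. The Python sketch below assumes the statistics have already been collected into a dictionary keyed by node hostname; the metric names, the values, and the function name `aggregate` are hypothetical.

```python
def aggregate(per_node_stats):
    """Sum every metric across all cluster nodes."""
    totals = {}
    for stats in per_node_stats.values():
        for metric, value in stats.items():
            totals[metric] = totals.get(metric, 0) + value
    return totals


# Made-up sample data for two nodes.
sample = {
    "node1.example.com": {"connections": 10, "packets": 100},
    "node2.example.com": {"connections": 5, "packets": 50},
}
cluster_totals = aggregate(sample)
```

Summing suits counters such as connections or packets. Ratio-style metrics such as CPU usage would more likely need an average or a per-node display instead.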
Web_Application_Starter_Project_and_Support__2180__Artur_Hefczyc_2014_-public_forums_time_tracker-Tigase_Private-_Tigase_Projects_and_int_muc_tigase_org.png cluster general metrics.pdf