tigase/_server/server-core

Clustering autodiscovery - removing node information from DB (#319)

Wojciech Kapcia (Tigase) opened 1 decade ago

Due Date
2014-09-04

Information about nodes that are no longer available should be removed from the database:

during node shutdown (the node can re-add it on subsequent startup);
periodically by other nodes if obsolete information is present (optional).
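
A minimal sketch of how the optional periodic check could pick out obsolete entries, assuming each node record carries a last-update timestamp. `StaleNodeFinder`, `findStale`, and the age threshold are hypothetical names made up for illustration, not Tigase API; database access and coordination between nodes are left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; names are illustrative, not Tigase API. */
final class StaleNodeFinder {

    /**
     * Returns the hostnames whose last update is strictly older than
     * maxAge, measured back from now.
     */
    static List<String> findStale(Map<String, Instant> lastUpdateByHost,
                                  Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastUpdateByHost.entrySet()) {
            if (entry.getValue().isBefore(cutoff)) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

A node running the periodic job would then delete only the returned hostnames, for example with `findStale(records, Instant.now(), Duration.ofMinutes(5))`; the five-minute value is an arbitrary example.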

Activities

Artur Hefczyc commented 1 decade ago

Why is this important? I know it was discussed before, but I cannot find that discussion. Removing the data might be tricky because there could be a resource access conflict, so it needs to be done carefully.
Wojciech Kapcia (Tigase) commented 1 decade ago

Not that it's super important, more like 'nice to have' just to keep things organized and avoid any unnecessary connection attempts after a node has been shut down (even when connections to inactive nodes are avoided, there may be a small time window during which a connection may be attempted - correct?).

And while I agree that the second point may cause some resource access problems, I see no reason why the first item should be problematic in that context. I also thought of the second point as optional (now included in the description).
Artur Hefczyc commented 1 decade ago

Wojciech Kapcia wrote:

> Not that it's super important, more like 'nice to have' just to keep things organized.

Understood. However, an advantage of keeping this data is that we have a track record of which cluster nodes connected, when each node was last alive, what the resource usage on that node was, etc. Of course, we can keep such a track record in a separate statistics table.

> and avoid any unnecessary connection attempts after a node has been shut down (even when connections to inactive nodes are avoided, there may be a small time window during which a connection may be attempted - correct?).

This is true, but still, most likely if the node died or even was shut down gracefully, the record about the node would not be removed instantly from the table anyway. So we would not avoid unnecessary connection attempts.

> And while I agree that the second point may cause some resource access problems, I see no reason why the first item should be problematic in that context. I also thought of the second point as optional (now included in the description).

Yes, on a graceful shutdown we could do it. There is even a handler API available for shutdown methods and functions. However, we would need to put a timeout on the SQL query to avoid hanging the shutdown process on a DB query.
Wojciech Kapcia (Tigase) commented 1 decade ago

Artur Hefczyc wrote:

> Wojciech Kapcia wrote:
>
> > Not that it's super important, more like 'nice to have' just to keep things organized.
>
> Understood. However, an advantage of keeping this data is that we have a track record of which cluster nodes connected, when each node was last alive, what the resource usage on that node was, etc. Of course, we can keep such a track record in a separate statistics table.

There are already tigase.stats.CounterDataArchivizer and tigase.stats.CounterDataLogger. If we need more detailed statistics, I'd say extending those would be better.

> > and avoid any unnecessary connection attempts after a node has been shut down (even when connections to inactive nodes are avoided, there may be a small time window during which a connection may be attempted - correct?).
>
> This is true, but still, most likely if the node died or even was shut down gracefully, the record about the node would not be removed instantly from the table anyway. So we would not avoid unnecessary connection attempts.

> > And while I agree that the second point may cause some resource access problems, I see no reason why the first item should be problematic in that context. I also thought of the second point as optional (now included in the description).
>
> Yes, on a graceful shutdown we could do it. There is even a handler API available for shutdown methods and functions. However, we would need to put a timeout on the SQL query to avoid hanging the shutdown process on a DB query.

Bottom line: in case of an unintentional shutdown (or a faulty node altogether), it may be more prudent to leave those expired stats as they are, since they tell us that something is wrong. However, given that there is already an API we can use, removing nodes upon graceful shutdown is a good idea and we should proceed with it?
Artur Hefczyc commented 1 decade ago

Wojciech Kapcia wrote:

> Artur Hefczyc wrote:
>
> > Wojciech Kapcia wrote:
> >
> > > Not that it's super important, more like 'nice to have' just to keep things organized.
> >
> > Understood. However, an advantage of keeping this data is that we have a track record of which cluster nodes connected, when each node was last alive, what the resource usage on that node was, etc. Of course, we can keep such a track record in a separate statistics table.
>
> There are already tigase.stats.CounterDataArchivizer and tigase.stats.CounterDataLogger. If we need more detailed statistics, I'd say extending those would be better.

Unfortunately, these do not work in cluster mode.

> > > and avoid any unnecessary connection attempts after a node has been shut down (even when connections to inactive nodes are avoided, there may be a small time window during which a connection may be attempted - correct?).
> >
> > This is true, but still, most likely if the node died or even was shut down gracefully, the record about the node would not be removed instantly from the table anyway. So we would not avoid unnecessary connection attempts.

> > > And while I agree that the second point may cause some resource access problems, I see no reason why the first item should be problematic in that context. I also thought of the second point as optional (now included in the description).
> >
> > Yes, on a graceful shutdown we could do it. There is even a handler API available for shutdown methods and functions. However, we would need to put a timeout on the SQL query to avoid hanging the shutdown process on a DB query.
>
> Bottom line: in case of an unintentional shutdown (or a faulty node altogether), it may be more prudent to leave those expired stats as they are, since they tell us that something is wrong. However, given that there is already an API we can use, removing nodes upon graceful shutdown is a good idea and we should proceed with it?

I am OK with the graceful shutdown approach and with this data being removed by the node itself.
Wojciech Kapcia (Tigase) commented 1 decade ago

Added removal of the cluster_node item in a ShutdownHook; all prepared statements have a 10 s (configurable) timeout.
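
A minimal sketch of the approach described above, not the actual Tigase code. It uses the plain JVM `Runtime.addShutdownHook` as a stand-in for Tigase's own shutdown handler API, and the `ClusterNodeCleanup` class as well as the table and column names (`cluster_nodes`, `hostname`) are assumptions made for illustration. The timeout is set with the standard JDBC `PreparedStatement.setQueryTimeout`, which takes a value in seconds.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper; not the actual Tigase implementation. */
final class ClusterNodeCleanup {

    // Assumed table and column names, for illustration only.
    static final String DELETE_NODE_SQL =
            "DELETE FROM cluster_nodes WHERE hostname = ?";

    /**
     * Deletes this node's row and returns the number of rows removed.
     * The query timeout keeps a slow database from blocking shutdown.
     */
    static int removeNode(Connection conn, String hostname, int timeoutSeconds)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_NODE_SQL)) {
            ps.setQueryTimeout(timeoutSeconds);
            ps.setString(1, hostname);
            return ps.executeUpdate();
        }
    }

    /** Registers a JVM shutdown hook that removes the node entry on graceful shutdown. */
    static Thread registerShutdownHook(Connection conn, String hostname,
                                       int timeoutSeconds) {
        Thread hook = new Thread(() -> {
            try {
                removeNode(conn, hostname, timeoutSeconds);
            } catch (SQLException e) {
                // Best effort: a stale row is acceptable, a blocked shutdown is not.
                System.err.println("Could not remove cluster node entry: "
                        + e.getMessage());
            }
        }, "cluster-node-cleanup");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

With the 10-second default this would be called as `ClusterNodeCleanup.registerShutdownHook(conn, hostname, 10)`. On drivers that honor the query timeout, an unresponsive database leads to an `SQLException` and shutdown continues with the row left in place; the node overwrites or re-adds it on the next startup.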

Type	New Feature
Priority	Normal
Assignee	Artur Hefczyc
RedmineID	1946
Version	tigase-server-7.0.0
Spent time	0

Reference

tigase/_server/server-core#319