Add high load linux settings (especially network config) in helm-chart configuration (#3)

Wojciech Kapcia (Tigase) opened 1 year ago

As @andrzej.wojcik pointed out (https://tigase.dev/tigase/_server/tigase-server/~issues/25#IssueComment-125166) we should make sure that Linux Settings for High Load Systems outlined in the documentation are correctly set in helm-chart.

Related
- Add high load linux settings (especially network config) in docker image and docker compose examples (tigase/_server/tigase-server#25) In QA
- Add option to configure Socket TCP KeepAlive settings (tigase/_server/server-core#1542) In QA

Activities

Wojciech Kapcia (Tigase) added "Related" tigase/_server/tigase-server#25 1 year ago
Wojciech Kapcia (Tigase) referenced from other issue 1 year ago

Add high load linux settings (especially network config) in docker image and docker compose examples (tigase/_server/tigase-server#25)

In QA
Wojciech Kapcia (Tigase) moved 1 year ago
Previous Value: Attic/helm-charts
Current Value: tigase/helm-charts
Andrzej Wójcik (Tigase) commented 1 year ago
I've looked into this issue and there are a few things to consider.

Not all sysctl settings are marked as safe in Kubernetes, and the set changes between versions: e.g. net.ipv4.ip_local_port_range is already supported, but net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_probes, and net.ipv4.tcp_keepalive_intvl are only supported on Kubernetes 1.29 or later.

We have a few possible solutions:

1. Allow customization of those settings and apply our default values, if no other value is set, on Kubernetes versions that support the particular sysctl setting (that would require a lot of options to be added).

2. In the securityContext block in values.yaml (the defaults shipped with the Helm chart), we could have the following code:

    securityContext:
      sysctl:
        - name: net.ipv4.ip_local_port_range
          value: "32768 61000"
        # supported since Kubernetes 1.29
        # - name: net.ipv4.tcp_keepalive_time
        #   value: "60"
        # - name: net.ipv4.tcp_keepalive_probes
        #   value: "3"
        # - name: net.ipv4.tcp_keepalive_intvl
        #   value: "90"

   This block would set defaults for net.ipv4.ip_local_port_range, which is already available, and would list the defaults of the other settings (with a note about the supported k8s version), allowing people to copy this part and uncomment it in their own configuration.

The second option seems the best, as it would allow users to adjust their sysctl settings and would present the defaults without the overhead of customizing the Helm chart for each new variable to be supported (we would just add it to the list of supported settings with a default value).
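For comparison, the first option could look roughly like the template fragment below. This is only a sketch: the `sysctlOverrides` map and its keys are hypothetical names that do not exist in the chart, and the file path is assumed. It relies on Helm's built-in `.Capabilities.KubeVersion.Version` and the `semverCompare` template function to emit the keepalive settings only on Kubernetes 1.29 or later.

```yaml
# templates/deployment.yaml (fragment, hypothetical).
# Assumes values.yaml declares an empty map so lookups do not fail:
#   sysctlOverrides: {}
spec:
  template:
    spec:
      securityContext:
        sysctls:
          # Safe on all currently relevant Kubernetes versions.
          - name: net.ipv4.ip_local_port_range
            value: {{ .Values.sysctlOverrides.ipLocalPortRange | default "32768 61000" | quote }}
          # Emitted only where the cluster treats these sysctls as safe.
          # The "-0" suffix lets vendor builds such as v1.29.1-gke.1 match.
          {{- if semverCompare ">=1.29-0" .Capabilities.KubeVersion.Version }}
          - name: net.ipv4.tcp_keepalive_time
            value: {{ .Values.sysctlOverrides.tcpKeepaliveTime | default "60" | quote }}
          - name: net.ipv4.tcp_keepalive_probes
            value: {{ .Values.sysctlOverrides.tcpKeepaliveProbes | default "3" | quote }}
          - name: net.ipv4.tcp_keepalive_intvl
            value: {{ .Values.sysctlOverrides.tcpKeepaliveIntvl | default "90" | quote }}
          {{- end }}
```

Every additional sysctl needs its own value key and template line here, which is the overhead the second option avoids.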

@wojtek What do you think about that?
Andrzej Wójcik (Tigase) commented 1 year ago

I've added the suggested sysctl block to values.yaml, but left it commented out to let the user decide.
Andrzej Wójcik (Tigase) changed state to 'In Progress' 1 year ago
Previous Value: Open
Current Value: In Progress
Andrzej Wójcik (Tigase) changed state to 'In QA' 1 year ago
Previous Value: In Progress
Current Value: In QA
Wojciech Kapcia (Tigase) commented 1 year ago

I assume that setting those options on older versions of k8s would result in an error, hence it's not possible to just set them as-is currently?

Looking at https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#safe-and-unsafe-sysctls I'm not sure what the outcome would be. It also seems to indicate that it should still be possible to set those at the pod level, even for unsafe options, but would that not be possible with helm/deployment?
Andrzej Wójcik (Tigase) commented 1 year ago

In the mentioned document they state that "unsafe" options may have an impact on the whole node (all pods). I'm not sure if setting them would end up with a crash (I've seen reports on StackOverflow mentioning pods erroring out with unsafe options), but I would prefer to stay on the safe side, let the user decide, and not apply the defaults as-is.
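For context, the Kubernetes page linked above says that unsafe sysctls are disabled by default and have to be enabled per node by the cluster administrator; a pod that requests a disabled unsafe sysctl is scheduled but fails to launch. A minimal sketch of a kubelet configuration that allows the keepalive sysctls on a pre-1.29 node might look like this (the pattern is an illustrative choice, not something the chart ships):

```yaml
# Kubelet configuration file (node-level, managed by the cluster admin).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Unsafe sysctls (or patterns ending in *) that pods on this node may set.
allowedUnsafeSysctls:
  - "net.ipv4.tcp_keepalive_*"
```

The same list can be passed through the kubelet's `--allowed-unsafe-sysctls` flag. Since this is a per-node setting outside the chart's control, it supports leaving the keepalive entries commented out by default.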

As for pod vs deployment, anything that can be set at the pod level can be applied to a deployment, as a deployment contains a template for its pods.
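As an illustration, pod-level sysctls sit under `spec.template.spec.securityContext.sysctls` of a Deployment. The manifest below is a minimal sketch; the name, labels, and image are placeholders rather than what the Tigase chart actually renders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tigase-example            # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tigase-example
  template:                       # pod template: everything below is pod-level
    metadata:
      labels:
        app: tigase-example
    spec:
      securityContext:            # pod securityContext, not the container one
        sysctls:
          - name: net.ipv4.ip_local_port_range
            value: "32768 61000"
      containers:
        - name: server
          image: example/tigase-server:latest   # placeholder image
```

Note that `sysctls` belongs to the pod-level securityContext; the container-level securityContext has no such field.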
Wojciech Kapcia (Tigase) commented 1 year ago

> In the mentioned document they state that "unsafe" options may have an impact on the whole node (all pods). I'm not sure if setting them would end up with a crash (I've seen reports on StackOverflow mentioning pods erroring out with unsafe options), but I would prefer to stay on the safe side, let the user decide, and not apply the defaults as-is.

Pondering it a bit more - wouldn't such configuration be overwritten when updating the helm-chart?

> As for pod vs deployment, anything that can be set at the pod level can be applied to a deployment, as a deployment contains a template for its pods.

Maybe set it at the deployment level then? In that case it should be easier to make it configurable?

On the other hand, most config options relevant to us are "safe" for 1.29+ versions. Considering the very high release cadence of k8s (1.28 is marked "End of Life: 2024-10-28"), in 4 months our qualm will be virtually insignificant?
Andrzej Wójcik (Tigase) commented 1 year ago

> Pondering it a bit more - wouldn't such configuration be overwritten when updating the helm-chart?

Overwritten by the new config the user specifies? I think it should be; however, the user would need to change each entry, one by one, to get the correct values.
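A sketch of such a user-side override, kept in a separate file so that it survives chart updates, is shown below. The file name is arbitrary, and the `securityContext.sysctl` key simply mirrors the block proposed earlier in this thread; it should match whatever key the chart's values.yaml actually uses. Helm deep-merges maps from values files but replaces lists as a whole, so the override has to repeat every entry that should remain in effect.

```yaml
# my-values.yaml (user-maintained, outside the chart)
securityContext:
  sysctl:
    # Repeated from the chart defaults: lists are replaced, not merged.
    - name: net.ipv4.ip_local_port_range
      value: "32768 61000"
    # Enabled here because this cluster runs Kubernetes 1.29 or later.
    - name: net.ipv4.tcp_keepalive_time
      value: "60"
    - name: net.ipv4.tcp_keepalive_probes
      value: "3"
    - name: net.ipv4.tcp_keepalive_intvl
      value: "90"
```

The file would be passed on each install or upgrade with Helm's `-f` (`--values`) option, for example `helm upgrade --install <release> <chart> -f my-values.yaml`.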

> Maybe set it at the deployment level then? In that case it should be easier to make it configurable?

Our current Helm chart sets it on the pod template that is part of a deployment.

> On the other hand, most config options relevant to us are "safe" for 1.29+ versions. Considering the very high release cadence of k8s (1.28 is marked "End of Life: 2024-10-28"), in 4 months our qualm will be virtually insignificant?

Yes, and no. Our older Tygrys clusters are running 1.26 and some 1.27.
Wojciech Kapcia (Tigase) commented 1 year ago

>> On the other hand, most config options relevant to us are "safe" for 1.29+ versions. Considering the very high release cadence of k8s (1.28 is marked "End of Life: 2024-10-28"), in 4 months our qualm will be virtually insignificant?
>
> Yes, and no. Our older Tygrys clusters are running 1.26 and some 1.27.

Shouldn't those be updated to be "correct" by the k8s rulebook?

All in all - let's leave it as is, i.e. with the settings commented out.
Wojciech Kapcia (Tigase) added "Related" tigase/_server/server-core#1542 11 months ago
Type	Task
Priority	Normal
Assignee	Andrzej Wójcik (Tigase)
Version	none
Server Version	tigase-server-8.5.0
Sprints	n/a
Customer	n/a

Reference

tigase/helm-charts#3