Add high load linux settings (especially network config) in helm-chart configuration (#3)
wojciech.kapcia@tigase.net opened 6 months ago

As @andrzej.wojcik pointed out (https://tigase.dev/tigase/_server/tigase-server/~issues/25#IssueComment-125166) we should make sure that Linux Settings for High Load Systems outlined in the documentation are correctly set in helm-chart.

wojciech.kapcia@tigase.net added "Related" tigase/_server/tigase-server#25 6 months ago
wojciech.kapcia@tigase.net referenced from other issue 6 months ago
wojciech.kapcia@tigase.net moved 6 months ago
Previous Value: Attic/helm-charts
Current Value: tigase/helm-charts
Andrzej Wójcik (Tigase) commented 5 months ago

I've looked into this issue and there are a few things to consider.

Not all sysctl settings are marked as safe in Kubernetes, and the safe set changes between versions, e.g. net.ipv4.ip_local_port_range is already supported, but net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_probes, and net.ipv4.tcp_keepalive_intvl are only supported on Kubernetes 1.29 or later.

We have a few possible solutions:

  1. Allow customization of those settings and apply our default values, if no other value is set, on Kubernetes versions that support the particular sysctl setting (that would require a lot of options to be added).
  2. In the securityContext block of values.yaml (the defaults shipped with the Helm chart), we could have the following code:
securityContext:
  sysctls:
    - name: net.ipv4.ip_local_port_range
      value: "32768 61000"
    # supported since Kubernetes 1.29
    # - name: net.ipv4.tcp_keepalive_time
    #   value: "60"
    # - name: net.ipv4.tcp_keepalive_probes
    #   value: "3"
    # - name: net.ipv4.tcp_keepalive_intvl
    #   value: "90"

This block would set the default for net.ipv4.ip_local_port_range, which is already available, and would list the defaults for the other settings (with a note about the supported k8s version), allowing people to copy this part and uncomment it in their own configuration.

The second option seems the best, as it would allow users to adjust their sysctl settings and would present the defaults without the overhead of customizing the Helm chart for each new variable to be supported (we would just add it to the list of supported settings with a default value).
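For comparison, a minimal sketch of what option 1 might involve in a chart template. The `sysctlDefaults` values key and its fields are hypothetical names, not taken from the chart; `semverCompare` and `.Capabilities.KubeVersion.Version` are standard Helm template features. Each newly supported sysctl would need its own value and its own version guard, which is the overhead option 2 avoids.

```yaml
# templates/deployment.yaml (fragment; hypothetical, for illustration only)
spec:
  template:
    spec:
      securityContext:
        sysctls:
          # safe on all currently relevant Kubernetes versions
          - name: net.ipv4.ip_local_port_range
            value: {{ .Values.sysctlDefaults.ipLocalPortRange | quote }}
          # only rendered when the target cluster reports 1.29 or later;
          # the "-0" suffix lets pre-release/vendor versions match as well
          {{- if semverCompare ">=1.29-0" .Capabilities.KubeVersion.Version }}
          - name: net.ipv4.tcp_keepalive_time
            value: {{ .Values.sysctlDefaults.tcpKeepaliveTime | quote }}
          {{- end }}
```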

@wojtek What do you think about that?

Referenced from commit 5 months ago
Andrzej Wójcik (Tigase) commented 5 months ago

I've added the suggested sysctl block to values.yaml, but left it commented out to let the user decide.

Andrzej Wójcik (Tigase) changed state to 'In Progress' 5 months ago
Previous Value: Open
Current Value: In Progress
Andrzej Wójcik (Tigase) changed state to 'In QA' 5 months ago
Previous Value: In Progress
Current Value: In QA
wojciech.kapcia@tigase.net commented 5 months ago

I assume that setting those options on an older version of k8s would result in an error, hence it's not possible to just set them as-is currently?

Looking at https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#safe-and-unsafe-sysctls I'm not sure what the outcome would be. It also seems to suggest that it should still be possible to set those at the pod level, even for unsafe options, but that would not be possible with helm/deployment?

Andrzej Wójcik (Tigase) commented 5 months ago

In the mentioned document they state that "unsafe" options may have an impact on the whole node (all pods). I'm not sure if setting them would end up with a crash (I've seen reports on StackOverflow mentioning pods erroring out with unsafe options), but I would prefer to stay on the safe side, let the user decide, and not apply the defaults as-is.
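As far as the linked Kubernetes documentation describes it, unsafe sysctls are disabled by default and have to be allowed per node by the cluster administrator; a pod requesting one that is not allowed is scheduled but fails to start. A minimal sketch of such a node-level opt-in, assuming the cluster uses a kubelet configuration file (the sysctl pattern is only an example):

```yaml
# Kubelet configuration fragment (node level, outside the Helm chart's control)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
allowedUnsafeSysctls:
  # example pattern only; on clusters older than 1.29 these are still "unsafe"
  - "net.ipv4.tcp_keepalive_*"
```

Since this has to be done on the nodes themselves, it is a decision for whoever operates the cluster rather than something a chart default can settle.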

As for pod vs deployment, everything that can be set at the pod level can be applied to a deployment, as a deployment contains a template for its pods.
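To illustrate the point, a minimal sketch of a Deployment whose pod template carries the sysctl setting. The names, labels, and image are placeholders and are not taken from the Tigase chart; `sysctls` is a field of the pod-level `securityContext` (it is not available in the container-level one).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tigase-example            # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tigase-example
  template:                       # pod template: anything valid in a Pod spec goes here
    metadata:
      labels:
        app: tigase-example
    spec:
      securityContext:            # pod-level security context
        sysctls:
          - name: net.ipv4.ip_local_port_range
            value: "32768 61000"
      containers:
        - name: server
          image: example.invalid/tigase:placeholder   # placeholder image
```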

wojciech.kapcia@tigase.net commented 5 months ago

> In the mentioned document they state that "unsafe" options may have an impact on the whole node (all pods). I'm not sure if setting them would end up with a crash (I've seen reports on StackOverflow mentioning pods erroring out with unsafe options), but I would prefer to stay on the safe side, let the user decide, and not apply the defaults as-is.

Pondering it a bit more - wouldn't such a configuration be overwritten when updating the helm-chart?

> As for pod vs deployment, everything that can be set at the pod level can be applied to a deployment, as a deployment contains a template for its pods.

Maybe setting it at the deployment level then? In such a case it should be easier to make it configurable?

On the other hand, most config options relevant to us are "safe" for 1.29+ versions. Considering the very high release cadence of k8s, 1.28 is marked "End of Life: 2024-10-28", so in 4 months our qualm will be virtually insignificant?

Andrzej Wójcik (Tigase) commented 5 months ago

> Pondering it a bit more - wouldn't such a configuration be overwritten when updating the helm-chart?

Overwritten by the new config that the user specifies? I think it should be; however, the user would need to change each entry, one by one, to get the correct values.
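For context: Helm deep-merges maps from user-supplied values into the chart defaults, but a list such as the sysctl entries is replaced as a whole, so an override file has to restate every entry the user wants. A minimal sketch, assuming the chart passes `securityContext` through to the pod template unchanged; the file name `my-values.yaml` is hypothetical.

```yaml
# my-values.yaml (hypothetical user override file)
securityContext:
  sysctls:
    # the whole list replaces the chart default, so repeat entries to keep
    - name: net.ipv4.ip_local_port_range
      value: "32768 61000"
    # requires Kubernetes 1.29+ (or a node-level opt-in on older clusters)
    - name: net.ipv4.tcp_keepalive_time
      value: "60"
    - name: net.ipv4.tcp_keepalive_probes
      value: "3"
    - name: net.ipv4.tcp_keepalive_intvl
      value: "90"
```

A file like this would be passed with `-f my-values.yaml` on `helm install` or `helm upgrade`, and it keeps applying across chart updates as long as it is supplied each time.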

> Maybe setting it at the deployment level then? In such a case it should be easier to make it configurable?

Our current helm chart sets it for the pod template that is part of a deployment.

> On the other hand, most config options relevant to us are "safe" for 1.29+ versions. Considering the very high release cadence of k8s, 1.28 is marked "End of Life: 2024-10-28", so in 4 months our qualm will be virtually insignificant?

Yes, and no. Our older Tygrys clusters are running 1.26 and some 1.27.

wojciech.kapcia@tigase.net commented 5 months ago

> > On the other hand, most config options relevant to us are "safe" for 1.29+ versions. Considering the very high release cadence of k8s, 1.28 is marked "End of Life: 2024-10-28", so in 4 months our qualm will be virtually insignificant?

> Yes, and no. Our older Tygrys clusters are running 1.26 and some 1.27.

Shouldn't those be updated to be "correct" by the k8s rulebook?

All in all - let's leave it as is, i.e. with commented out settings.

wojciech.kapcia@tigase.net added "Related" tigase/_server/server-core#1542 3 months ago