-
I've looked into this issue and there are a few things to consider. Not all sysctl settings are marked as safe in Kubernetes, and that changes with the version, i.e. `net.ipv4.ip_local_port_range` is already supported, but `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_probes`, and `net.ipv4.tcp_keepalive_intvl` are only supported on Kubernetes 1.29 or later.

We have a few possible solutions:
- Allow customization of those settings and apply our default values if no other value is set, on Kubernetes versions that support the particular sysctl setting (that would require a lot of options to be added).
- In the `securityContext` block in `values.yaml` (the Helm defaults), we could have the following code:
```yaml
securityContext:
  sysctls:
    - name: net.ipv4.ip_local_port_range
      value: "32768 61000"
    # supported since Kubernetes 1.29:
    # - name: net.ipv4.tcp_keepalive_time
    #   value: "60"
    # - name: net.ipv4.tcp_keepalive_probes
    #   value: "3"
    # - name: net.ipv4.tcp_keepalive_intvl
    #   value: "90"
```
The above block would set a default for `net.ipv4.ip_local_port_range`, which is already available, and would list the defaults of the other settings (with a note about the supported Kubernetes version), allowing people to copy this part and uncomment it in their own configuration.

The second option seems the best, as it would allow users to adjust their sysctl settings and would present the defaults without the overhead of customizing the Helm chart for each new variable to be supported (we would just add it to the list of supported settings with a default value).
@wojtek What do you think about that?
-
I assume that setting those options on an older version of k8s would result in an error, hence it's not possible to just set them as-is currently?
Looking at https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#safe-and-unsafe-sysctls I'm not sure what the outcome would be. It also seems to suggest that it should still be possible to set those at the pod level, even for unsafe options, but would that not be possible with Helm/deployment?
-
The mentioned document states that "unsafe" options may have an impact on the whole node (all pods). I'm not sure whether setting them would end in a crash (I've seen reports on StackOverflow mentioning pods that errored with unsafe options), but I would prefer to stay on the safe side, let the user decide, and not apply the defaults as-is.
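For reference, the same document says unsafe sysctls have to be allowed per node before any pod may request them; a minimal sketch of that node-level opt-in (the values listed are just our documented defaults, not a recommendation):

```yaml
# KubeletConfiguration (kubelet.config.k8s.io/v1beta1): allow-list of
# unsafe sysctls that pods scheduled on this node may then request
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
allowedUnsafeSysctls:
  - "net.ipv4.tcp_keepalive_time"
  - "net.ipv4.tcp_keepalive_probes"
  - "net.ipv4.tcp_keepalive_intvl"
```

That is a per-node, cluster-admin decision, which is another reason not to bake those defaults into the chart.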
As for pod vs. deployment, everything that can be set at the pod level can be applied to a deployment, as a deployment contains a template for its pods.
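To illustrate (a minimal, hypothetical sketch, not our chart's actual output), the pod-level `securityContext` simply sits under the deployment's pod template:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example            # hypothetical name
spec:
  selector:
    matchLabels:
      app: example
  template:                # pod template: any pod-level field is valid here
    metadata:
      labels:
        app: example
    spec:
      securityContext:
        sysctls:
          - name: net.ipv4.ip_local_port_range
            value: "32768 61000"
      containers:
        - name: example
          image: example:latest   # placeholder image
```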
-
> The mentioned document states that "unsafe" options may have an impact on the whole node (all pods). I'm not sure whether setting them would end in a crash (I've seen reports on StackOverflow mentioning pods that errored with unsafe options), but I would prefer to stay on the safe side, let the user decide, and not apply the defaults as-is.
Pondering it a bit more: wouldn't such a configuration be overwritten when updating the Helm chart?
> As for pod vs. deployment, everything that can be set at the pod level can be applied to a deployment, as a deployment contains a template for its pods.
Maybe set it at the deployment level then? In that case it should be easier to make it configurable?
On the other hand, most config options relevant to us are "safe" on 1.29+ versions. Considering the very high release cadence of k8s, 1.28 is marked "End of Life: 2024-10-28", so in 4 months our qualm will be virtually insignificant?
-
> Pondering it a bit more: wouldn't such a configuration be overwritten when updating the Helm chart?
Overwritten by the new config the user would specify? I think it would be; however, the user would need to change each entry one by one to get the correct ones.
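A sketch of what that looks like from the user's side (the file, release, and chart names are hypothetical): the full list from the chart's `values.yaml` has to be restated in the user's own values file, which then survives upgrades:

```yaml
# my-values.yaml (hypothetical): restates the whole sysctls list so that
# `helm upgrade my-release tigase/tigase-server -f my-values.yaml` keeps it
securityContext:
  sysctls:
    - name: net.ipv4.ip_local_port_range
      value: "32768 61000"
    - name: net.ipv4.tcp_keepalive_time
      value: "60"
    - name: net.ipv4.tcp_keepalive_probes
      value: "3"
    - name: net.ipv4.tcp_keepalive_intvl
      value: "90"
```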
> Maybe set it at the deployment level then? In that case it should be easier to make it configurable?
Our current Helm chart sets it on the pod template that is part of a deployment.
> On the other hand, most config options relevant to us are "safe" on 1.29+ versions. Considering the very high release cadence of k8s, 1.28 is marked "End of Life: 2024-10-28", so in 4 months our qualm will be virtually insignificant?
Yes and no. Our older Tygrys clusters are running 1.26, and some 1.27.
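If we ever wanted the chart to handle the version split itself, a hedged sketch (not something the chart does today) would be gating the 1.29-only entries on the detected cluster version via Helm's built-in `.Capabilities`:

```yaml
# hypothetical template fragment: emit the keepalive sysctls only when the
# target cluster reports Kubernetes 1.29 or newer ("-0" also matches
# pre-release build suffixes such as those used by managed clusters)
{{- if semverCompare ">=1.29-0" .Capabilities.KubeVersion.Version }}
sysctls:
  - name: net.ipv4.tcp_keepalive_time
    value: "60"
  - name: net.ipv4.tcp_keepalive_probes
    value: "3"
  - name: net.ipv4.tcp_keepalive_intvl
    value: "90"
{{- end }}
```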
-
> On the other hand, most config options relevant to us are "safe" on 1.29+ versions. Considering the very high release cadence of k8s, 1.28 is marked "End of Life: 2024-10-28", so in 4 months our qualm will be virtually insignificant?
>
> Yes and no. Our older Tygrys clusters are running 1.26, and some 1.27.
Shouldn't those be updated to be "correct" by the k8s rulebook?

All in all, let's leave it as-is, i.e. with the commented-out settings.
Type | Task
Priority | Normal
Assignee |
Version | none
Server Version | tigase-server-8.5.0
Sprints | n/a
Customer | n/a
As @andrzej.wojcik pointed out (https://tigase.dev/tigase/_server/tigase-server/~issues/25#IssueComment-125166), we should make sure that the Linux Settings for High Load Systems outlined in the documentation are correctly set in the Helm chart.