Instrument Sztabina for performance monitoring (SZ-144)

rk@tigase.net opened 1 month ago

Problem: The handoff from Sztabina into Sztab is a critical junction and can break when dealing with large repos. Sztabina currently exposes no metrics. To characterize the Sztabina => Sztab boundary under large-repo load, and to justify streaming, we need to measure the notify attempt rate, notify success/failure, and git HTTP operation duration.

Activities

rk@tigase.net commented 1 month ago

Approach: I plan to add Prometheus to instrument and display metrics from various parts of Sztab.

As part of this issue I am therefore adding the following:

Add prometheus/client_golang to Sztabina

Expose /metrics endpoint on :8085

Instrument forwardWithRetry (notify pipeline)

Instrument GitHTTPHandler (clone/push/fetch duration)

Add Sztabina as second scrape target in prometheus.yml

Complication: notify runs as an ephemeral subprocess, so counters held in subprocess memory die with the process and never reach /metrics. Fix: the notify subprocess POSTs a NotifyEvent to the main process at /internal/metrics/notify before exiting.
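
The relay described above can be sketched roughly as follows. This is a minimal illustration, not the actual Sztabina code: the fields of NotifyEvent, the notifyCounters type, and the reportNotify helper are all assumptions, and plain atomic counters stand in for the prometheus/client_golang counters so the example runs with the standard library alone.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// NotifyEvent is a hypothetical payload; the issue names the type but not its fields.
type NotifyEvent struct {
	Attempts int64 `json:"attempts"` // delivery attempts made by this subprocess run
	Success  bool  `json:"success"`  // whether the notify ultimately succeeded
}

// notifyCounters stands in for the Prometheus counters owned by the main process.
type notifyCounters struct {
	attempts, success, failure int64
}

// handler folds one relayed event into the long-lived counters.
func (c *notifyCounters) handler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	var ev NotifyEvent
	if err := json.NewDecoder(r.Body).Decode(&ev); err != nil || ev.Attempts < 0 {
		http.Error(w, "bad event", http.StatusBadRequest)
		return
	}
	atomic.AddInt64(&c.attempts, ev.Attempts)
	if ev.Success {
		atomic.AddInt64(&c.success, 1)
	} else {
		atomic.AddInt64(&c.failure, 1)
	}
	w.WriteHeader(http.StatusNoContent)
}

// reportNotify is what the short-lived subprocess would call just before exiting.
func reportNotify(baseURL string, ev NotifyEvent) error {
	body, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	resp, err := http.Post(baseURL+"/internal/metrics/notify", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

// relayDemo plays the main process (a test server) and two subprocess runs.
func relayDemo() (attempts, success, failure int64, err error) {
	c := &notifyCounters{}
	mux := http.NewServeMux()
	mux.HandleFunc("/internal/metrics/notify", c.handler)
	srv := httptest.NewServer(mux)
	defer srv.Close()

	events := []NotifyEvent{{Attempts: 3, Success: true}, {Attempts: 5, Success: false}}
	for _, ev := range events {
		if err := reportNotify(srv.URL, ev); err != nil {
			return 0, 0, 0, err
		}
	}
	return atomic.LoadInt64(&c.attempts), atomic.LoadInt64(&c.success), atomic.LoadInt64(&c.failure), nil
}

func main() {
	a, s, f, err := relayDemo()
	if err != nil {
		fmt.Println("relay failed:", err)
		return
	}
	fmt.Printf("attempts=%d success=%d failure=%d\n", a, s, f)
}
```

In the real service the handler would presumably increment the registered Prometheus counters instead, and the endpoint would be reachable only from the local subprocess.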
rk@tigase.net commented 1 month ago
Tasks

Add prometheus/client_golang dependency

metrics/metrics.go — define counters and histograms

git_http_handler.go — instrument git HTTP requests

main.go — register metrics, expose /metrics, add /internal/metrics/notify handler

notify.go — POST NotifyEvent to main process instead of direct counter increment

Verify sztabina_notify_attempts_total appears in Prometheus after a push
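
For the git_http_handler.go task, one possible shape is a thin wrapper that times each request and labels it by git service. The sketch below is an assumption about how this could look, not the code that was merged: gitService, instrument, and the observe callback are hypothetical names, and the callback stands in for a Prometheus histogram and counter. Note that over git smart HTTP both clone and fetch go through git-upload-pack, so they cannot be told apart by URL alone, while push uses git-receive-pack.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// gitService classifies a smart-HTTP request by the git service it targets.
// Clone and fetch both map to "upload-pack"; push maps to "receive-pack".
func gitService(r *http.Request) string {
	service := r.URL.Query().Get("service") // set on the /info/refs discovery request
	path := r.URL.Path
	switch {
	case strings.HasSuffix(path, "/git-upload-pack") || service == "git-upload-pack":
		return "upload-pack"
	case strings.HasSuffix(path, "/git-receive-pack") || service == "git-receive-pack":
		return "receive-pack"
	default:
		return "other"
	}
}

// instrument wraps a handler and reports each request's service label and duration.
// observe is a hypothetical hook; it must be safe for concurrent use in a real server.
func instrument(next http.Handler, observe func(service string, d time.Duration)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		observe(gitService(r), time.Since(start))
	})
}

// instrumentDemo sends a few synthetic requests through the wrapper, one at a time,
// and returns how many were seen per service label.
func instrumentDemo() map[string]int {
	counts := map[string]int{}
	git := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // placeholder for the real git HTTP handler
	})
	h := instrument(git, func(service string, d time.Duration) {
		counts[service]++
	})
	requests := []struct{ method, target string }{
		{http.MethodGet, "/repo.git/info/refs?service=git-upload-pack"},
		{http.MethodPost, "/repo.git/git-upload-pack"},
		{http.MethodPost, "/repo.git/git-receive-pack"},
		{http.MethodGet, "/healthz"},
	}
	for _, rq := range requests {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(rq.method, rq.target, nil))
	}
	return counts
}

func main() {
	fmt.Println(instrumentDemo())
}
```

The wrapper deliberately does not wrap the ResponseWriter, so streaming behaviour of the underlying handler is left untouched.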
rk@tigase.net commented 1 month ago

Develop directly on wolsonsc
rk@tigase.net changed state to 'In Progress' 1 month ago
Previous Value	Current Value
Open	In Progress
rk@tigase.net referenced from other issue 1 month ago

Monitor Sztab JVM vital signs (SZ-143)

Closed
rk@tigase.net commented 1 month ago

Sztabina instrumented with prometheus/client_golang. notify pipeline metrics (attempts/success/failure) working via subprocess=>main process POST relay. Git HTTP handler instrumented for request count and duration. Both targets UP in Prometheus. Closing with 1.10.1.
rk@tigase.net changed state to 'Closed' 1 month ago
Previous Value	Current Value
In Progress	Closed

Type	New Feature
Priority	Normal
Assignee	rk@tigase.net
Version	1.10.1
Sprints	n/a
Customer	n/a

Reference

SZ-144