Instrument Sztabina for performance monitoring (SZ-144)
rk@tigase.net opened 2 days ago

Problem: The boundary between Sztabina and Sztab is a critical junction and can break when dealing with large repos. Sztabina currently exposes no metrics. To characterize the Sztabina => Sztab boundary under large-repo load, and to justify streaming, we need to measure notify attempt rate, notify success/failure, and git HTTP operation duration.

  • rk@tigase.net commented 2 days ago

    Approach: I plan to add Prometheus to instrument and display metrics from various parts of Sztab.

    As part of this issue I am therefore adding the following:

    • Add prometheus/client_golang to Sztabina
    • Expose /metrics endpoint on :8085
    • Instrument forwardWithRetry (notify pipeline)
    • Instrument GitHTTPHandler (clone/push/fetch duration)
    • Add Sztabina as second scrape target in prometheus.yml

    Complication: notify runs as an ephemeral subprocess, so counters held in subprocess memory die with the process and never reach /metrics. Fix: the notify subprocess POSTs a NotifyEvent to the main process at /internal/metrics/notify before exiting.
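
The relay described above could look roughly like the following Go sketch. It is not the actual Sztabina code: the NotifyEvent field names, the notifyCounters type (a plain stand-in for the Prometheus counters), and the relayDemo helper are all assumptions made for illustration. Only the /internal/metrics/notify path and the NotifyEvent name come from this issue.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// NotifyEvent is what the short-lived notify subprocess reports before it
// exits. The fields are assumptions for this sketch.
type NotifyEvent struct {
	Attempts int  `json:"attempts"`
	Success  bool `json:"success"`
}

// notifyCounters stands in for the Prometheus counters that live in the
// long-running main process (hypothetical type, not client_golang).
type notifyCounters struct {
	mu        sync.Mutex
	attempts  int
	successes int
	failures  int
}

// handler is the main-process side of the relay: it decodes one NotifyEvent
// per POST and folds it into the in-memory counters.
func (c *notifyCounters) handler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		var ev NotifyEvent
		if err := json.NewDecoder(r.Body).Decode(&ev); err != nil {
			http.Error(w, "bad NotifyEvent", http.StatusBadRequest)
			return
		}
		c.mu.Lock()
		c.attempts += ev.Attempts
		if ev.Success {
			c.successes++
		} else {
			c.failures++
		}
		c.mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	})
}

// postNotifyEvent is the subprocess side: one POST just before exiting.
func postNotifyEvent(client *http.Client, url string, ev NotifyEvent) error {
	body, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("metrics relay returned %s", resp.Status)
	}
	return nil
}

// relayDemo plays both roles in one process: an httptest server stands in for
// the main process, and each postNotifyEvent call stands in for one
// subprocess run.
func relayDemo() (attempts, successes, failures int, err error) {
	c := &notifyCounters{}
	mux := http.NewServeMux()
	mux.Handle("/internal/metrics/notify", c.handler())
	srv := httptest.NewServer(mux)
	defer srv.Close()

	url := srv.URL + "/internal/metrics/notify"
	events := []NotifyEvent{{Attempts: 1, Success: true}, {Attempts: 3, Success: false}}
	for _, ev := range events {
		if err = postNotifyEvent(srv.Client(), url, ev); err != nil {
			return 0, 0, 0, err
		}
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.attempts, c.successes, c.failures, nil
}
```

In the demo, one successful event with a single attempt and one failed event with three attempts leave the main-process counters at 4 attempts, 1 success, and 1 failure. In a real setup the counter updates in the handler would presumably be increments on client_golang counters instead.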

  • rk@tigase.net commented 2 days ago

    Tasks

    • Add prometheus/client_golang dependency
    • metrics/metrics.go — define counters and histograms
    • git_http_handler.go — instrument git HTTP requests
    • main.go — register metrics, expose /metrics, add /internal/metrics/notify handler
    • notify.go — POST NotifyEvent to main process instead of direct counter increment
    • Verify sztabina_notify_attempts_total appears in Prometheus after a push
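
For the git_http_handler.go task, request count and duration can be captured with a small wrapper around the handler. The sketch below uses only the Go standard library. The observeFunc callback, the statusRecorder type, fakeGitHandler, and observeOnce are hypothetical names introduced here, and the callback marks the spot where a client_golang counter or histogram would be updated. It is an illustration of the general pattern, not the code merged for this issue.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// observeFunc receives one finished request. A real implementation would
// update a counter and a duration histogram here (assumption).
type observeFunc func(method string, status int, elapsed time.Duration)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// Flush passes flushes through, since wrapping a ResponseWriter would
// otherwise hide http.Flusher from handlers that stream their output.
func (r *statusRecorder) Flush() {
	if f, ok := r.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

// instrument wraps next so that every request reports method, status, and
// wall-clock duration to observe.
func instrument(next http.Handler, observe observeFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		observe(r.Method, rec.status, time.Since(start))
	})
}

// fakeGitHandler is a stand-in for GitHTTPHandler: it answers ref discovery
// requests and rejects everything else.
func fakeGitHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasSuffix(r.URL.Path, "/info/refs") {
			w.Write([]byte("ok"))
			return
		}
		http.NotFound(w, r)
	})
}

// observeOnce sends one in-memory request through an instrumented handler
// and returns what the observer saw.
func observeOnce(h http.Handler, method, target string) (status int, elapsed time.Duration) {
	wrapped := instrument(h, func(_ string, s int, d time.Duration) {
		status, elapsed = s, d
	})
	wrapped.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(method, target, nil))
	return status, elapsed
}
```

Calling observeOnce(fakeGitHandler(), "GET", "/repo.git/info/refs?service=git-upload-pack") reports status 200, while an unknown path reports 404. Measuring around ServeHTTP means the duration covers the whole response, which matters for long clone and fetch transfers.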
  • rk@tigase.net commented 2 days ago

    Develop directly on wolsonsc

  • rk@tigase.net changed state to 'In Progress' 2 days ago
    Previous Value: Open
    Current Value: In Progress
  • rk@tigase.net referenced from other issue 2 days ago
  • rk@tigase.net commented 2 days ago

    Sztabina is instrumented with prometheus/client_golang. Notify pipeline metrics (attempts/success/failure) work via a subprocess => main process POST relay. The Git HTTP handler is instrumented for request count and duration. Both targets are UP in Prometheus. Closing with 1.10.1.

  • rk@tigase.net changed state to 'Closed' 2 days ago
    Previous Value: In Progress
    Current Value: Closed
Type: New Feature
Priority: Normal
Assignee:
Version: 1.10.1
Sprints: n/a
Customer: n/a
Issue Votes (0)
Watchers (2)
Reference: SZ-144