-
Bot Attack Surface Area

Description
1) Test Approach:
- Choose tools to measure impact of Bots
- Choose tools to induce Bot like stress
- Establish a baseline of resource usage under bot attack before applying bot guardrails
- After each layer is added, verify the layer holds under the same load. We will watch the sztab-backend and sztabina pods specifically during bot stress tests.
2) Identify tools to measure impact of Bot (CPU usage or I/O usage)
- Grafana + Prometheus — we already have this or it's easy to add to the cluster via Helm.
- Gives us CPU, memory, and network I/O per pod.
3) Identify tools to induce Bot-like stress
A) k6 — open source load testing tool
We can write scripts in TypeScript and simulate concurrent anonymous/bot traffic against specific endpoints.
Example:
```typescript
import http, { RefinedResponse, ResponseType } from 'k6/http';
import { check } from 'k6';

export default function (): void {
  const res: RefinedResponse<ResponseType> = http.get(
    'https://staging.sztab.com/api/projects/1/pulls/5/diff',
    {
      headers: { 'User-Agent': 'GPTBot/1.0' },
    }
  );
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
```

B) Java with Gatling
This is essentially the Java equivalent of k6. Since the broader Tigase team is Java-first, Gatling scripts would feel more natural to them and fit into Maven builds. Shall I use this option? This way the bot simulation scripts can be reused for other Tigase projects.
Kotlin developers can use Gatling in Kotlin; Java developers can use Gatling in Java
C) JMeter
JMeter test plans (JMX scripts) can serve a dual purpose:
- Bot simulation
- Stress test
However, k6 is frictionless and will work "out of the box".
4) Layered approach to Bot mitigation
a) Layer 1: Spring Security — anonymous request blocking (lowest effort, highest impact)
b) Layer 2: Caddy — rate limiting + bot filtering at the edge (before Spring even sees the request)
c) Layer 3: robots.txt (soft signal, respected by well-behaved bots)
d) Layer 4: Permission-based access (Artur's suggestion — most flexible)
4.1 Layer 1
The simplest approach is to identify the most expensive APIs and mandate authentication for the shortlisted endpoints.
With Spring this is easy: in the Spring Security policy, add `.authenticated()` for such endpoints. APIs that trigger git clone and git merge are candidates.
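A minimal sketch of what such a policy entry could look like, assuming Spring Security 6 style configuration; the endpoint paths and class name are illustrative, not the project's actual config:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

// Hypothetical config class: only the shortlisted expensive endpoints are
// forced to authenticate; everything else keeps the existing policy.
@Configuration
public class ExpensiveEndpointSecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http.authorizeHttpRequests(auth -> auth
                // git-backed, expensive: diffs and file trees
                .requestMatchers("/api/projects/*/pulls/*/diff",
                                 "/api/projects/*/files/**").authenticated()
                // the rest of the chain is left to the existing rules
                .anyRequest().permitAll());
        return http.build();
    }
}
```

Anonymous bots hitting the shortlisted endpoints then fail fast in the filter chain instead of triggering git work.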
4.2 Layer 2
Since Caddy is already our reverse proxy with `forward_auth`, we can add:

```
# Rate limiting for anonymous traffic
@anonymous not header Authorization *
@anonymous not header Cookie *
rate_limit @anonymous 10r/m

# Block known bot user agents
@bots header_regexp User-Agent `(?i)(GPTBot|ClaudeBot|CCBot|Bytespider|SemrushBot|AhrefsBot)`
respond @bots 403
```

This stops bots before they consume Spring Boot or Sztabina resources at all.
4.3 Layer 3 — robots.txt
Serve a `robots.txt` from Caddy directly, blocking AI crawlers:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /api/
Allow: /
```

This is a soft signal, respected only by well-behaved crawlers.
4.4 Layer 4 — Permission-based access
This is the existing ExternalUserPolicy / role system extended with a new dimension.
Instead of just authenticated vs anonymous, we gate by role.
Example:

```java
@PreAuthorize("hasPermission(#projectId, 'Project', 'READ_DIFFS')")
public DiffResponse getPullRequestDiff(...) { ... }
```

Roles like GUEST / COMMUNITY could be explicitly excluded from diff/search endpoints even if authenticated.
This is useful if we ever allow public read-only accounts but still want to protect expensive resources.
4.5 Layer 5 (Host Layer) — Using Host IDS (such as OSSEC)
OSSEC / Wazuh (OSSEC's modern fork) can help — it does log analysis, anomaly detection, and can trigger active responses (e.g. auto-banning an IP via iptables). But I think for now this may be overkill in Sztab's context.
-
I have assumed that the Bots/crawlers can cause performance issues alone by exhausting resources.
But bots can also attempt privilege escalation. Hence this issue is in part about security posture as well.
Data harvesting is another risk: A crawler indexing all the issues, PRs, comments, and code — even if read-only, this is a confidentiality problem for private projects and can be used for competitor intelligence gathering.
Please let me know if we should treat this as a performance issue alone in this rev.
-
Monitoring tool
Phase 1 (immediate) — kubectl top for CPU/memory across the three pods during stress tests. Free, zero setup, good enough to establish baseline.
Phase 2 (proper) — add node_exporter to the EC2 node for disk I/O, feed into Grafana alongside Caddy metrics. Full picture.
-
SZ-73 Bot Protection — Baseline Measurements
Purpose
Establish pre-mitigation resource usage baseline on staging, before any bot protection layers are applied. These numbers will be used to validate the effectiveness of each mitigation layer as it is implemented.
Environment
- Cluster: k3s on AWS EC2 (us-west-2)
- Host: ec2-35-87-145-56.us-west-2.compute.amazonaws.com
- Namespace: sztab-staging
- Image tag: sz73-bot-protection (rebased on wolnosc, no SZ-73 changes applied yet)
- Date: 2026-03-12
Idle Baseline (no load)
Captured via `kubectl top pods -n sztab-staging` with no active traffic.

| Pod | CPU (cores) | Memory |
| --- | --- | --- |
| sztab-backend | 5m | 369Mi |
| sztab-db | 4m | 46Mi |
| sztabina | 1m | 1Mi |
| caddy | 1m | 10Mi |
| sztab-ui | 1m | 2Mi |

Notes:
- `sztab-backend` memory at 369Mi reflects the normal Spring Boot JVM baseline (expected)
- `sztabina` and `caddy` are effectively idle
- `sztab-db` at 4m CPU reflects background PostgreSQL activity only
Bot Stress Baseline (under simulated load)
TODO: Run k6 stress test simulating anonymous bot traffic against expensive endpoints. Capture CPU and memory spike for sztab-backend, sztabina, and sztab-db.
Target Endpoints
| Endpoint | Why expensive |
| --- | --- |
| GET /api/projects/{id}/pulls/{id}/diff | Triggers git diff via Sztabina |
| GET /api/projects/{id}/issues?q=... | DSL query, DB-heavy |
| GET /api/projects/{id}/files/{branch} | Git tree traversal via Sztabina |

k6 Test Parameters
- Virtual users: TBD
- Duration: TBD
- User-Agent: `GPTBot/1.0` (simulates an AI crawler)
- Auth: none (anonymous)
Results
TODO: Fill in after k6 run.
| Pod | CPU (cores) | Memory | Delta vs idle |
| --- | --- | --- | --- |
| sztab-backend | - | - | - |
| sztab-db | - | - | - |
| sztabina | - | - | - |

Post-Mitigation Measurements
TODO: Re-run same k6 test after each layer is applied and record results here.
| Layer | Description | Backend CPU | Sztabina CPU | Notes |
| --- | --- | --- | --- | --- |
| Layer 1 | Spring Security `.authenticated()` | - | - | - |
| Layer 2 | Caddy rate limiting + bot UA blocking | - | - | - |
| Layer 3 | robots.txt | - | - | soft signal only |
| Layer 4 | Permission-based access (role gating) | - | - | - |

-
Next step: install k6 on my laptop:

```shell
rksuma@Ramakrishnans-MacBook-Pro sztab % brew install k6
//...
rksuma@Ramakrishnans-MacBook-Pro sztab % k6 version
k6 v1.6.1 (commit/devel, go1.26.0, darwin/arm64)
```

Now I'll write a TypeScript script targeting the three expensive endpoints with a GPTBot user agent, no auth, and enough virtual users to actually stress the backend.
-
-
Results of Layer 1 testing after locking down all expensive methods with `.authenticated()` (please disregard the spurious error at the end, in deleting the test project).
Essentially, since the bot does not authenticate itself, every hit returns HTTP 403 and hence makes no real difference to the resource usage of Sztab.
```
rksuma@Ramakrishnans-MacBook-Pro sztab % ADMIN_USER=admin ADMIN_PASSWORD=SztabStagingAdmin! ./scripts/stress-test/k6/run-stress-test.sh
[INFO] === SZ-73 Bot Stress Test ===
[INFO] Base URL: http://ec2-35-87-145-56.us-west-2.compute.amazonaws.com
[INFO] Namespace: sztab-staging
[INFO] VUs: 50
[INFO] Duration: 60s
[INFO] --- Step 1: Login ---
[INFO] Login successful.
[INFO] Logged in as user id=1
[INFO] --- Step 2: Create Sztab project ---
[INFO] Project 'SZ73 Stress Test' already exists — looking up existing project...
[INFO] Found existing project: id=16
[INFO] --- Step 3: Create issue ---
[INFO] Issue created: id=3
[INFO] --- Step 4: Create pull request ---
[INFO] Pull request created: id=3
[INFO] --- Step 5: Baseline pod metrics (idle) ---
NAME                            CPU(cores)   MEMORY(bytes)
caddy-847774bbf9-xzvnv          1m           12Mi
sztab-backend-644c77d58-r46xd   2m           432Mi
sztab-db-fb967c9d5-fs84w        2m           44Mi
sztab-ui-57764ffc4f-r9hlg       1m           3Mi
sztabina-65b5cff756-kzl4f       1m           3Mi
[INFO] --- Step 6: Run k6 stress test ---
[INFO] Watch pod metrics in another terminal: kubectl top pods -n sztab-staging --watch

  execution: local
     script: /Users/rksuma/tigase/sztab/scripts/stress-test/k6/bot-stress-test.ts
     output: -
  scenarios: (100.00%) 1 scenario, 50 max VUs, 1m30s max duration (incl. graceful stop):
           * default: 50 looping VUs for 1m0s (gracefulStop: 30s)

  █ THRESHOLDS
    http_req_duration
    ✓ 'p(95)<5000' p(95)=134.53ms

  █ TOTAL RESULTS
    checks_total.......: 69856  1161.339279/s
    checks_succeeded...: 25.00% 17464 out of 69856
    checks_failed......: 75.00% 52392 out of 69856
    ✗ status is 200 (unprotected)   ↳ 0% — ✓ 0 / ✗ 17464
    ✗ status is 401 (auth required) ↳ 0% — ✓ 0 / ✗ 17464
    ✓ status is 403 (bot blocked)
    ✗ status is 429 (rate limited)  ↳ 0% — ✓ 0 / ✗ 17464

  HTTP
    http_req_duration....: avg=71.19ms min=29.65ms med=55.02ms max=422.05ms p(90)=124.52ms p(95)=134.53ms
    http_req_failed......: 100.00% 17464 out of 17464
    http_reqs............: 17464 290.33482/s

  EXECUTION
    iteration_duration...: avg=172.15ms min=130.26ms med=155.68ms max=522.51ms p(90)=225.17ms p(95)=235.3ms
    iterations...........: 17464 290.33482/s
    vus..................: 50 min=50 max=50
    vus_max..............: 50 min=50 max=50

  NETWORK
    data_received........: 7.8 MB 129 kB/s
    data_sent............: 2.3 MB 38 kB/s

running (1m00.2s), 00/50 VUs, 17464 complete and 0 interrupted iterations
default ✓ [======================================] 50 VUs  1m0s
[INFO] --- Step 7: Pod metrics (post-stress) ---
NAME                            CPU(cores)   MEMORY(bytes)
caddy-847774bbf9-xzvnv          99m          20Mi
sztab-backend-644c77d58-r46xd   252m         440Mi
sztab-db-fb967c9d5-fs84w        2m           45Mi
sztab-ui-57764ffc4f-r9hlg       1m           3Mi
sztabina-65b5cff756-kzl4f       1m           4Mi
[INFO] === Stress test complete. Teardown will run now. ===
[INFO] --- Teardown ---
[INFO] Deleting Sztab project 16...
[ERROR] Failed to delete project 16
[INFO] Teardown complete.
```

-
Baseline stress test results (pre-protection, 2026-03-14)
Ran the k6 stress test against staging (`ec2-35-87-145-56.us-west-2.compute.amazonaws.com`) with 50 VUs for 60s: 30 unauthenticated (anonymous bot simulation) and 20 authenticated (bot with DEVELOPER role, hitting issues/PR/branch endpoints).

Throughput: 279 req/s
Pod metrics (idle → under load)
| Pod | CPU idle | CPU load | Memory idle | Memory load |
| --- | --- | --- | --- | --- |
| sztab-backend | 2m | 370m | 443Mi | 544Mi |
| sztab-db | 4m | 137m | 46Mi | 77Mi |
| caddy | 1m | 117m | 23Mi | 23Mi |
| sztabina | 1m | 1m | 2Mi | 2Mi |

Observations
- Unauthenticated requests: 100% returning 403 -- Layer 1 (Spring Security) blocking all anonymous traffic correctly.
- Authenticated requests: 100% returning 200 -- DEVELOPER role has correct read access.
- Backend CPU peaks at 370m under load -- this is the baseline to beat after Caddy rate limiting is applied.
- DB CPU peaks at 137m -- issue/PR list queries are the likely driver.
- Sztabina unaffected -- git ops not triggered by read-only REST traffic.
Known limitations
- Authenticated scenario uses a single shared session cookie across all 20 VUs. Real bot farms distribute load across multiple accounts/sessions. A more realistic simulation would create 5-10 bot accounts and distribute cookies among VUs -- deferred to a later iteration.
Next steps
Implement Layer 2 (Caddy rate limiting) and re-run to measure impact.
-
Layer 2: Caddy-level rate limiting and bot blocking
Rejection is now pushed upstream to the reverse proxy, before requests ever reach the JVM. I added two defenses to the Caddyfile:
-
UA blocklist -- known self-identifying AI crawlers (GPTBot, ClaudeBot, CCBot, Bytespider, SemrushBot, AhrefsBot) are rejected with 403 at the proxy edge. Note that this check is easily sidestepped: adversarial scrapers that spoof their user agent will bypass it, which is why rate limiting is the primary defense.
-
Anonymous rate limiting -- unauthenticated traffic is capped at 30 requests/min per IP. Authenticated users (identified by session cookie or API token) are exempt. At 30 r/m, a human browsing casually has ample headroom; a bot hammering endpoints hits the ceiling immediately.
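To make the 30 r/m rule concrete, here is a minimal fixed-window counter in plain Java. This is a concept sketch only, not the caddy-ratelimit plugin's actual (sliding-window) implementation, and the IP address is a placeholder:

```java
import java.util.HashMap;
import java.util.Map;

// Concept sketch of per-IP anonymous rate limiting (fixed window).
class FixedWindowRateLimiter {
    private final int limitPerWindow;
    private final long windowMillis;
    // ip -> { windowStartMillis, countInWindow }
    private final Map<String, long[]> state = new HashMap<>();

    FixedWindowRateLimiter(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request is allowed, false if it should get a 429.
    synchronized boolean allow(String ip, long nowMillis) {
        long[] s = state.computeIfAbsent(ip, k -> new long[] { nowMillis, 0 });
        if (nowMillis - s[0] >= windowMillis) {
            s[0] = nowMillis; // start a new window
            s[1] = 0;
        }
        if (s[1] < limitPerWindow) {
            s[1]++;
            return true;
        }
        return false;
    }
}

public class Main {
    public static void main(String[] args) {
        // 30 requests/min, mirroring the anonymous cap at the Caddy edge
        FixedWindowRateLimiter limiter = new FixedWindowRateLimiter(30, 60_000);
        int allowed = 0;
        for (int i = 0; i < 100; i++) {
            if (limiter.allow("203.0.113.7", 0)) { // 100 requests in the same instant
                allowed++;
            }
        }
        System.out.println(allowed); // 30: everything past the cap is rejected
    }
}
```

Authenticated traffic would simply skip the limiter, matching the session-cookie/token exemption above.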
To support this, I built a custom Caddy image with the rate limiting plugin baked in, pinned to `v2.8.4` for reproducibility. The next stress test run will measure how much backend CPU drops as a result.

-
-
Layer 2 stress test results (Caddy rate limiting, 2026-03-14)
Setup
Same test as baseline: 50 VUs for 60s, 30 unauthenticated and 20 authenticated (DEVELOPER role). Rate limiting applied to anonymous traffic only (30 r/min per IP).
Pod metrics (idle → under load)

| Pod | CPU idle | CPU load | Memory idle | Memory load |
| --- | --- | --- | --- | --- |
| sztab-backend | 2m | 174m | 443Mi | 542Mi |
| sztab-db | 4m | 147m | 46Mi | 77Mi |
| caddy | 1m | 102m | 12Mi | 17Mi |
| sztabina | 1m | 1m | 2Mi | 2Mi |

Comparison vs baseline (Layer 1 only)
| Pod | Layer 1 | Layer 2 | Change |
| --- | --- | --- | --- |
| sztab-backend | 370m | 174m | -53% |
| sztab-db | 137m | 147m | ~flat (noise) |
| caddy | 117m | 102m | -13% |

Observations
- Backend CPU dropped by 53% -- anonymous bot traffic is now absorbed by Caddy before requests reach the JVM. The JVM no longer wakes up, allocates objects, or runs the filter chain for unauthenticated requests that exceed the rate limit.
- DB CPU is flat -- authenticated queries still run as expected. The reduction in backend CPU is entirely from eliminating the unauthenticated filter chain overhead.
- Caddy CPU is slightly lower too -- the rate limit decision short-circuits before the upstream proxy step, so Caddy does less work per rejected request than it did forwarding 403s from the backend.
- Memory is stable across both scenarios -- no sign of heap pressure or GC storms under load.
Next steps
Layer 3 (robots.txt) and Layer 4 (permission-based access gating) to follow.
-
Layer 3 Bot mitigation using robots.txt.
Test Results:
```
rksuma@Ramakrishnans-MacBook-Pro sztab % helm upgrade sztab deploy/helm/sztab -f deploy/helm/sztab/values-staging.yaml -n sztab-staging
Release "sztab" has been upgraded. Happy Helming!
NAME: sztab
LAST DEPLOYED: Sun Mar 15 10:39:36 2026
NAMESPACE: sztab-staging
STATUS: deployed
REVISION: 25
TEST SUITE: None
rksuma@Ramakrishnans-MacBook-Pro sztab % kubectl rollout restart deployment/caddy -n sztab-staging
deployment.apps/caddy restarted
rksuma@Ramakrishnans-MacBook-Pro sztab % kubectl rollout status deployment/caddy -n sztab-staging
Waiting for deployment "caddy" rollout to finish: 1 old replicas are pending termination...
deployment "caddy" successfully rolled out
rksuma@Ramakrishnans-MacBook-Pro sztab % kubectl get pods -n sztab-staging -w
NAME                            READY   STATUS    RESTARTS   AGE
caddy-6fbc5697cd-ll92p          1/1     Running   0          13s
sztab-backend-644c77d58-r46xd   1/1     Running   0          41h
sztab-db-fb967c9d5-fs84w        1/1     Running   0          18d
sztab-ui-57764ffc4f-r9hlg       1/1     Running   0          3d12h
sztabina-65b5cff756-kzl4f       1/1     Running   0          42h
^C
```

### Verify Caddy serves the robots.txt and the sitemap.xml

```
rksuma@Ramakrishnans-MacBook-Pro sztab % curl -s http://ec2-35-87-145-56.us-west-2.compute.amazonaws.com/robots.txt
User-agent: *
Disallow: /api/
Disallow: /git/

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: PetalBot
Disallow: /

Sitemap: https://ec2-35-87-145-56.us-west-2.compute.amazonaws.com/sitemap.xml
rksuma@Ramakrishnans-MacBook-Pro sztab % curl -s http://ec2-35-87-145-56.us-west-2.compute.amazonaws.com/sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://ec2-35-87-145-56.us-west-2.compute.amazonaws.com/</loc>
    <lastmod>2026-03-14</lastmod>
  </url>
  <url>
    <loc>https://ec2-35-87-145-56.us-west-2.compute.amazonaws.com/docs</loc>
    <lastmod>2026-03-14</lastmod>
  </url>
</urlset>
```

-
Layer 4: Permission-based access gating — design rationale
Rate limiting (Layer 2) stops anonymous bots. A determined attacker creates an account and bypasses it. Blocking by INTERNAL/EXTERNAL user type is also insufficient — attacks can come from compromised or low-privilege internal accounts.
Layer 4 gates expensive endpoints (PR detail, diffs, branch list) by role:
| Tier | Roles | Access |
| --- | --- | --- |
| Light read | OBSERVER, CUSTOMER_SUPPORT | Issues list, basic project info |
| Full read | DEVELOPER, QA_ENGINEER, DOCUMENT_WRITER, UX_DESIGNER, SCRUM_MASTER | + PR detail, branch list, diffs |
| Write | PROJECT_MANAGER, RELEASE_MANAGER | + create/update |
| Admin | ADMIN | Everything |

Implementation: extend `ExternalUserPolicy` with `requireRole(auth, RoleName...)` and apply it at the controller layer. Boundaries are a starting point — raise concerns on this ticket if adjustments are needed.

-
SZ-73 – Layer 4 Bot Mitigation Performance Experiment
Objective
Evaluate the performance impact of removing per-request database lookups for user and role resolution in the authorization policy layer.
Previously, the security policy resolved the user type by querying the database via `UserService.getUserByUsername()` for each incoming request. Under bot load, this caused the backend to issue frequent database lookups even though the authenticated user's authorities are already present in the Spring Security `Authentication` object.

The change introduced in this experiment eliminates the database lookup from the request hot path and instead relies solely on authorities stored in the `SecurityContext`.

The goal is to verify the impact of this change under synthetic bot traffic.
Change Introduced
Previous behavior:

```
request
  |
  v
ExternalUserPolicy.resolveType()
  |
  v
UserService.getUserByUsername()
  |
  v
database lookup
```

New behavior:

```
request
  |
  v
SecurityContextHolder
  |
  v
Authentication.getAuthorities()
  |
  v
policy enforcement (no DB access)
```

The authorization aspects (`RequireRoleAspect`, `RequireInternalAspect`) now operate purely on the `Authentication` authorities.

Image tested: `tigase.dev/sztab/sztab-backend:sz73-bot-protection-v8`
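As a plain-Java illustration of the new behavior (no Spring dependency; the `ROLE_` prefix and authority names are assumptions for the sketch), the role check now runs against authority strings already held in memory:

```java
import java.util.List;
import java.util.Set;

public class Main {

    // Mirrors the in-memory check: does the caller hold any required role?
    // The authorities set stands in for Authentication.getAuthorities().
    static boolean hasAnyRole(Set<String> grantedAuthorities, List<String> requiredRoles) {
        for (String role : requiredRoles) {
            if (grantedAuthorities.contains("ROLE_" + role)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Authorities as they would already sit in the SecurityContext
        Set<String> authorities = Set.of("ROLE_DEVELOPER", "USERTYPE_INTERNAL");

        System.out.println(hasAnyRole(authorities, List.of("DEVELOPER", "ADMIN"))); // true
        System.out.println(hasAnyRole(authorities, List.of("ADMIN")));              // false
    }
}
```

The point of the change is that this check is a set lookup, not a database round trip.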
Test Environment
- Cluster: k3s staging cluster
- Namespace: `sztab-staging`
- Deployment topology: caddy, sztab-backend (1 replica), sztab-db, sztab-ui, sztabina
Load Generation
Load was generated using the project k6 stress test script: `scripts/stress-test/k6/run-stress-test.sh`

Configuration:
- 50 virtual users (20 authenticated, 30 unauthenticated)
- duration: 60 seconds

The traffic profile targets endpoints typically accessed by bot crawlers.
k6 Results
- http_req_duration: avg = 71.7 ms, min = 27.68 ms, median = 53.23 ms, max = 655.92 ms, p90 = 126.02 ms, p95 = 142 ms
- http_req_failed: 63.73% (11093 / 17405)
- http_reqs: 17405 requests, ≈ 289 requests/sec
- iteration_duration: avg = 172.51 ms, p95 = 243.16 ms

The failure rate is expected because bot mitigation intentionally rejects a large fraction of unauthenticated traffic.
Pod Metrics (Post-Stress)
```
NAME            CPU    MEMORY
caddy           105m   19Mi
sztab-backend   503m   452Mi
sztab-db        126m   74Mi
sztab-ui        1m     3Mi
sztabina        1m     2Mi
```
Comparison with Previous Run
Previous experiment (before removing DB lookups):
- sztab-backend CPU ≈ 808m
- sztab-db CPU ≈ 165m

After optimization:
- sztab-backend CPU ≈ 503m
- sztab-db CPU ≈ 126m

Observed improvements:
- backend CPU reduction ≈ 37–40%
- database CPU reduction ≈ 24%

Latency also improved:
- previous avg latency ≈ 175 ms
- new avg latency ≈ 71 ms

Note: the latency comparison should be considered indicative rather than strictly controlled. The earlier measurement and this run were conducted under slightly different runtime conditions, and the earlier measurement included the overhead of the per-request database lookup. While the improvement is directionally consistent with the removal of that lookup, the latency values should not be interpreted as a controlled A/B benchmark.
Interpretation
The previous design performed a database lookup on each request to determine user type. Under bot traffic (~290 req/s), this resulted in frequent database access for information that was already present in the authenticated security context.
By removing the database dependency from the hot path and relying on Authentication.getAuthorities() instead, the system now performs role checks purely in memory.
This change produced measurable improvements in:
- backend CPU utilization
- database load
- request latency
Importantly, bot mitigation behavior remained unchanged.
Conclusion
Removing per-request user lookups from the authorization policy significantly improved system efficiency under bot traffic.
At approximately 290 requests/sec:
- backend CPU dropped from ~808m to ~503m
- database CPU dropped from ~165m to ~126m
- average request latency dropped from ~175 ms to ~71 ms

This confirms that the security policy layer should operate exclusively on data already present in the SecurityContext and avoid database access in the request hot path.
This optimization improves the resilience of the system when subjected to high volumes of bot or crawler traffic.
-
Next: Git-level bot mitigation
The REST API and proxy layers are now protected. The remaining attack surface is the Git endpoint (`/git/*`), which is proxied directly to Sztabina.

A determined bot that obtains a valid PAT (or leverages anonymous access on a public project) can repeatedly issue clone, fetch, and diff operations. These are significantly more expensive than REST requests, as they trigger disk I/O and traversal of the git object graph.
This makes the Git surface a high-cost amplification vector compared to the API layer.
Planned mitigations:

1. Rate limiting on `/git/*` at the Caddy layer, independent from REST limits. Git operations are more expensive, so thresholds should be lower (e.g. 5–10 requests/min per IP).
2. PAT-scoped rate limiting to track usage per token rather than per IP. This helps mitigate bots that rotate source IPs while reusing credentials.
3. Sztabina-level request budgeting to enforce limits within the service itself, ensuring protection even if edge-layer controls are bypassed or misconfigured.
Before implementing these controls, SZ-78 will establish a baseline by stress testing git diff and related endpoints. This will measure CPU and I/O impact on Sztabina under load, using the same methodology previously applied to the REST API layer.
-
-
```
rksuma@Ramakrishnans-MacBook-Pro sztab % git checkout wolnosc
Already on 'wolnosc'
Your branch is up to date with 'origin/wolnosc'.
rksuma@Ramakrishnans-MacBook-Pro sztab % git pull origin wolnosc
From https://tigase.dev/sztab
 * branch            wolnosc -> FETCH_HEAD
Already up to date.
rksuma@Ramakrishnans-MacBook-Pro sztab % git checkout -b feature/SZ-73-git-bot-mitigation
Switched to a new branch 'feature/SZ-73-git-bot-mitigation'
```

-
SZ-73 Work Log
Summary
Implemented a four-layer bot mitigation strategy across the HTTP stack (Spring Security, Caddy edge controls, crawler directives, and AOP-based authorization). Established load-testing infrastructure (k6) and baseline measurements to quantify impact. Identified Git endpoints as the remaining high-cost attack surface, to be addressed next.
Total effort: ~27h
SZ-77 (blocker, fixed first)
Bug: ProjectService infers repo type from gitUrl presence instead of repoType field
- Identified root cause: `isExternalRepo = gitUrl != null` ignored the `repoType` field entirely
- Added a `RepoType` parameter to the `ProjectService.createProject()` interface and impl
- Updated `ProjectController` to pass `dto.effectiveRepoType()`
- Removed the deprecated `createProject(Project)` overload
- Updated tests — 297 passing
- Branch: `bugfix/SZ-77-repoType-inference` → merged to `wolnosc`
- Estimate: 2h
SZ-73 Layer 1: Spring Security audit
- Confirmed `.anyRequest().authenticated()` is already in place
- Identified actuator and Swagger exposure as follow-up items
- Estimate: 0.5h
SZ-73 Layer 2: Caddy rate limiting and UA blocklist
Custom Caddy image
- Wrote `deploy/helm/sztab/caddy/Dockerfile` with `xcaddy` + the `caddy-ratelimit` plugin
- Pinned to `caddy:2.8.4`, added build-time module verification
- Built a multi-platform image (`linux/amd64`, `linux/arm64`)
- Updated `values.yaml`, `values-staging.yaml`, and the Helm template for the new image
- Fixed `imagePullPolicy: Always` in the Helm template to avoid a stale image cache
Caddyfile
- Added the `@ai_bots` UA blocklist (GPTBot, ClaudeBot, CCBot, Bytespider, SemrushBot, AhrefsBot, Amazonbot, PetalBot)
- Added the `@anonymous` rate limit zone: 30 r/min per `{remote_ip}`
- Used `header_regexp` for JSESSIONID matching (handles multiple cookies correctly)
- Moved the Caddyfile to `deploy/helm/sztab/caddy/Caddyfile`
- Updated the Helm ConfigMap template path
- Updated the `build-release.sh` Caddyfile source path
Staging deployment issues resolved
- k3s image cache — added `imagePullPolicy: Always`
- ConfigMap empty — fixed the `.Files.Get` path in the Helm template
- Multiple Caddy CrashLoopBackOff cycles debugged
Estimate: 6h
SZ-73 Layer 3: robots.txt and sitemap.xml
- Added `handle /robots.txt` directly in the Caddyfile with per-agent rules
- Added `handle /sitemap.xml` with the `{env.SZTAB_DOMAIN}` placeholder
- Used the `SZTAB_DOMAIN` env var (already wired from `sztab.domain` in Helm values)
- Verified both endpoints return correct domain substitution on staging
- Updated `docker-compose.yml` with the Caddyfile path and custom image
- Estimate: 1.5h
Load testing infrastructure
k6 script (`bot-stress-test.ts`)
- Two scenarios: `unauthenticated_bots` (30 VUs), `authenticated_bots` (20 VUs)
- Unauthenticated: hits public project/issue/PR list endpoints
- Authenticated: hits issues, PR detail, branch list with a bot session cookie
- Named scenario exports with the `exec` field
- Fixed endpoint bugs: `api/projects/{id}/issues` → `api/issues?projectName=`, branches path
Runner script (`run-stress-test.sh`)
- Mac-compatible `curl_api` helper (no `head -n -1`)
- Admin login → fetch user ID → create bot user → assign DEVELOPER role → bot login
- Project creation (`repoType: LOCAL`) → issue → PR
- `kubectl top` before and after the k6 run
- Teardown: delete all PRs by project → delete feature branch → delete project → delete bot user
- Multiple teardown fixes: PR FK constraint, branch FK constraint, bulk PR deletion
Debugging cycles
- Mac shell incompatibilities (`head -n -1`, `local -n` nameref)
- Sztabina 409 handling in `SztabinaClient.createRepository()`
- Sztabina returning `text/plain` → fixed in the Go handler with `util.EncodeJSON`
- `RepositoryResponse` NPE on null return from the 409 handler
- k6 named scenario `exec` field missing
- Wrong issues endpoint, branch endpoint literal not interpolated
Estimate: 8h
Baseline measurements
| Layer | Backend CPU | DB CPU |
| --- | --- | --- |
| Idle | 2m | 4m |
| Layer 1 only | 370m | 137m |
| Layer 2 (Caddy RL) | 174m | 137m |
| Layer 4 v1 (DB lookup) | 808m | 165m |
| Layer 4 v2 (auth cache) | 503m | 126m |

Estimate: 1.5h
SZ-73 Layer 4: Permission-based access gating
Design
- Defined two access tiers: `LIGHT_READ` (all roles) and `FULL_READ` (DEVELOPER+)
- Decided against `UserType` as the primary boundary — insider threat applies equally
Implementation
- `AccessTier` enum in `com.sztab.policy.security.enums`
- `@RequireRole(AccessTier)` annotation in `com.sztab.annotations.security`
- `@RequireInternal` annotation in `com.sztab.annotations.security`
- `RequireRoleAspect` and `RequireInternalAspect` in `com.sztab.policy.security.aspect`
- Pre-allocated `RoleName[]` arrays in the aspects (no per-request allocation)
- Defensive auth null check in both aspects
- Applied annotations across `IssueController`, `BranchController`, `PullRequestController`
- Fixed the `User.hasRole()` bug: enum vs String comparison always returned false
- Added `UserTest` with a guard comment explaining the trap
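The `hasRole()` enum-vs-String trap is easy to reproduce in isolation. A hedged sketch with illustrative names (not the actual Sztab classes):

```java
import java.util.List;

enum RoleName { DEVELOPER, ADMIN }

class User {
    private final List<RoleName> roles = List.of(RoleName.DEVELOPER);

    // Buggy version: RoleName.equals(String) is always false,
    // so this never matches no matter what roles the user has.
    boolean hasRoleBuggy(String role) {
        return roles.stream().anyMatch(r -> r.equals(role));
    }

    // Fixed version: compare name to name (or accept a RoleName directly).
    boolean hasRoleFixed(String role) {
        return roles.stream().anyMatch(r -> r.name().equals(role));
    }
}

public class Main {
    public static void main(String[] args) {
        User u = new User();
        System.out.println(u.hasRoleBuggy("DEVELOPER")); // false: the trap
        System.out.println(u.hasRoleFixed("DEVELOPER")); // true
    }
}
```

Accepting the enum type in the signature removes the trap entirely, which is what the guard comment in `UserTest` warns about.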
Performance optimization
- The initial implementation caused an extra `getUserByUsername()` DB call per request → 808m CPU
- Fixed: resolved roles from `Authentication.getAuthorities()` — no DB access in the hot path
- Updated `CustomUserDetailsService` to include `USERTYPE_INTERNAL`/`EXTERNAL` as an authority
- Updated `ExternalUserPolicy` to use authorities — removed the `UserService` dependency
Estimate: 6h
Documentation
- Ticket comments: baseline results, Layer 2 results, Layer 4 results, design rationale
- Team update sent to Artur
- Work log
Estimate: 1.5h
Total
| Area | Hours |
| --- | --- |
| SZ-77 blocker fix | 2h |
| Layer 1 audit | 0.5h |
| Layer 2 (Caddy image + Caddyfile) | 6h |
| Layer 3 (robots.txt + sitemap.xml) | 1.5h |
| Load testing infrastructure | 8h |
| Baseline measurements + analysis | 1.5h |
| Layer 4 (AOP role gating + perf optimization) | 6h |
| Documentation | 1.5h |
| Total | 27h |
-
A new problem surfaced when subjecting Sztab to large repos.
Sztab and Sztabina are interfaced using Spring WebFlux:

```
{ Sztabina } ==> { Sztab }
```

When computing a diff, Sztabina pipes the computed diff to Sztab through a WebFlux buffer. By default the buffer size is 256KB. This was insufficient for the test I ran (I created a large repo).
This caused the default WebFlux buffer to overflow. When this happens, Sztabina is unaware of the issue and shows no error in its logs. It's the consumer (Sztab) that fails, but even there the exception happens inside Spring Flux, leading to a cryptic exception in the logs:

```
rksuma@Ramakrishnans-MacBook-Pro sztab % kubectl logs -n sztab-staging deployment/sztab-backend --tail=200 | grep -B2 -A10 "diff-by-url" | head -40
Defaulted container "sztab-backend" out of: sztab-backend, wait-for-db (init)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
    *__checkpoint ⇢ Body from POST http://sztabina:8085/repos/compare/diff-by-url [DefaultClientResponse]
Original Stack Trace:
    at org.springframework.core.io.buffer.LimitedDataBufferList.raiseLimitException(LimitedDataBufferList.java:99) ~[spring-core-6.1.13.jar!/:6.1.13]
    at org.springframework.core.io.buffer.LimitedDataBufferList.updateCount(LimitedDataBufferList.java:92) ~[spring-core-6.1.13.jar!/:6.1.13]
    at org.springframework.core.io.buffer.LimitedDataBufferList.add(LimitedDataBufferList.java:58) ~[spring-core-6.1.13.jar!/:6.1.13]
    at reactor.core.publisher.MonoCollect$CollectSubscriber.onNext(MonoCollect.java:103) ~[reactor-core-3.6.10.jar!/:3.6.10]
    at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122) ~[reactor-core-3.6.10.jar!/:3.6.10]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.onNext(FluxPeek.java:200) ~[reactor-core-3.6.10.jar!/:3.6.10]
    at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122) ~[reactor-core-3.6.10.jar!/:3.6.10]
    at reactor.netty.channel.FluxReceive.onInboundNext(FluxReceive.java:379) ~[reactor-netty-core-1.1.22.jar!/:1.1.22]
    at reactor.netty.channel.ChannelOperations.onInboundNext(ChannelOperations.java:425) ~[reactor-netty-core-1.1.22.jar!/:1.1.22]
```

Fix:
Increase the WebFlux buffer size to 16MB to support large diffs.
We can't fully know the required size in advance because diff size depends on user content. The right approach is a generous configurable default, a specifically caught exception with a clear error message, and monitoring.
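A sketch of where that limit would be raised, assuming the Sztabina call goes through a WebFlux `WebClient` (the bean and class names here are illustrative, not the actual Sztab code):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

// Hypothetical config: raise the in-memory codec limit for the client
// that fetches diffs from Sztabina. The WebFlux default is 256KB.
@Configuration
public class SztabinaWebClientConfig {

    @Bean
    public WebClient sztabinaWebClient() {
        return WebClient.builder()
                .baseUrl("http://sztabina:8085")
                // 16MB: a generous, configurable default for large diffs
                .codecs(c -> c.defaultCodecs().maxInMemorySize(16 * 1024 * 1024))
                .build();
    }
}
```

Keeping the value in configuration (rather than hard-coding it) makes it tunable per environment, which fits the "generous configurable default" approach above.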
-
Good catch. Yes, this is exactly why I am in favor of running tests against real world data. Testing on our largest repos is a good start but maybe this is not enough. How about testing the system against really big repos available out there. Like Linux repo?
I mean, how do you know that the cache size 16MB is enough? Is there a limit to what we can handle?
-
-
> Good catch. Yes, this is exactly why I am in favor of running tests against real world data. Testing on our largest repos is a good start but maybe this is not enough. How about testing the system against really big repos available out there. Like Linux repo?
> I mean, how do you know that the cache size 16MB is enough? Is there a limit to what we can handle?
16MB covers the overwhelming majority of real-world engineering team PRs based on first-principles sizing (500 files × 200 lines × 2 sides × 100 bytes ≈ 20MB worst case). It's a pragmatic default for the target audience.
But for Linux-scale repos the answer is not a bigger buffer — it's streaming.
bodyToMono() forces full in-memory buffering by design. The correct fix at that scale is to stream the diff response directly from Sztabina to the client using bodyToFlux() or server-sent events, bypassing the buffer entirely. That's a more significant architectural change and I am tracking that as a future improvement.
Testing against real large repos like Linux is a good idea for stress testing the git engine (SZ-78), but it will surface the streaming limitation as much as it tests the buffer size. I have opened https://tigase.dev/sztab/~issues/120 to track this.
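A minimal sketch of the streaming alternative described above, assuming a WebClient pointed at the Sztabina compare endpoint (the class name, request shape, and URL are illustrative, not the final design):

```java
import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

public class StreamingDiffClient {

    private final WebClient client = WebClient.create("http://sztabina:8085");

    // Streams the diff body as a Flux of DataBuffers instead of
    // aggregating it with bodyToMono(): buffers flow through to the
    // caller as they arrive, so no single in-memory buffer ever has
    // to hold the whole diff. A WebFlux controller can return this
    // Flux directly and the framework writes it out incrementally.
    public Flux<DataBuffer> streamDiff(String compareUrl) {
        return client.post()
                .uri("/repos/compare/diff-by-url")
                .bodyValue(compareUrl)
                .retrieve()
                .bodyToFlux(DataBuffer.class);
    }
}
```

With this shape the 16MB codec limit no longer applies to the diff path, which is why it is the right fix for Linux-scale repos rather than a larger buffer.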
-
-
SZ-78 Baseline: Sztabina diff endpoint under bot load (2026-03-17)
Setup
50 VUs for 60s — 30 unauthenticated (anonymous bot simulation) and 20 authenticated (DEVELOPER role). The authenticated scenario includes `GET /api/pullrequests/29/detail`, which triggers a real git diff computation in Sztabina against the TESTSZTAB repo (20 files, ~276KB unified diff).

Pod metrics (idle → under load)

| Pod | CPU idle | CPU load | Memory idle | Memory load |
| --- | --- | --- | --- | --- |
| sztab-backend | 2m | 294m | 443Mi | 462Mi |
| sztab-db | 4m | 36m | 46Mi | 75Mi |
| caddy | 1m | 66m | 12Mi | 19Mi |
| sztabina | 1m | 497m | 2Mi | 35Mi |

Observations
-
Sztabina is now the bottleneck — 497m CPU under load, exceeding the backend (294m). Previous runs showed Sztabina at 1m because no real diff work was being done. This confirms git diff computation is CPU-intensive.
-
DB CPU dropped to 36m — down from 126m in the Layer 4 baseline. Role lookups are now resolved from Spring Security authorities (SZ-79 fix), eliminating per-request DB hits.
-
106MB total data received during test run — confirms that diff payloads are flowing end-to-end and the 16MB buffer fix is not prematurely truncating responses.
-
p95 authenticated latency: 2.3s. This reflects CPU-bound git diff computation under 20 concurrent requests. Latency is expected to scale with diff size and concurrency; mitigation should focus on limiting concurrent diff execution rather than optimizing JVM paths.
-
86.5% request failure rate — expected and desired. The majority of unauthenticated bot traffic is intentionally rejected (429 rate limiting at Caddy, 403 at Spring Security). This indicates mitigation layers are actively protecting backend resources.
Comparison with previous baselines
| Scenario | Backend CPU | DB CPU | Sztabina CPU | Notes |
| --- | --- | --- | --- | --- |
| Layer 1 only | 370m | 137m | 1m | No real diff work |
| Layer 2 (Caddy RL) | 174m | 137m | 1m | No real diff work |
| Layer 4 (auth cache) | 503m | 126m | 1m | No real diff work |
| SZ-78 (real diffs) | 294m | 36m | 497m | Real git diff load |

Key finding
Git diff computation is CPU-bound and shifts the system bottleneck from the JVM to Sztabina. Under concurrent load, Sztabina saturates (~500m CPU) before backend or DB resources become constrained.
This establishes git diff execution as the dominant cost center in the system and justifies prioritizing rate limiting and concurrency control for diff endpoints.
🔴 Important: The system is not I/O-bound or DB-bound under load; it is compute-bound on git operations. All further scaling and mitigation decisions should be evaluated against this constraint.
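The concurrency control called for above could be sketched with a plain semaphore in front of diff execution; the class and method names are hypothetical, not from the codebase:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical guard: caps how many git diff computations run at once
// so concurrent requests cannot saturate Sztabina's CPU. Callers over
// the cap are shed immediately rather than queued.
public class DiffConcurrencyLimiter {

    private final Semaphore permits;

    public DiffConcurrencyLimiter(int maxConcurrentDiffs) {
        this.permits = new Semaphore(maxConcurrentDiffs);
    }

    // Runs the diff if a permit is available; returns null when the
    // limiter is saturated (the caller maps null to HTTP 429/503).
    public <T> T runIfCapacity(Callable<T> diffTask) throws Exception {
        if (!permits.tryAcquire()) {
            return null; // saturated: shed load instead of queueing
        }
        try {
            return diffTask.call();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) throws Exception {
        DiffConcurrencyLimiter limiter = new DiffConcurrencyLimiter(2);
        System.out.println(limiter.runIfCapacity(() -> "diff computed"));
        // prints "diff computed"
    }
}
```

Load-shedding at this layer complements the Caddy rate limits: the proxy caps request volume per client, while the semaphore caps total in-flight diff work regardless of how many distinct clients are active.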
Next steps
- Implement git-level rate limiting at Caddy (`/git/*` endpoints)
- Test against larger diffs (Linux kernel scale) per the SZ-79 task
- Monitor Sztabina CPU in production — consider horizontal scaling if diff load grows beyond single-pod capacity
-
-
Git endpoint rate limiting policy
Git operations (clone, fetch, diff) are significantly more expensive than REST requests — each one triggers disk I/O and git object graph traversal in Sztabina. Unlike REST endpoints, git operations are stateless from the client's perspective, meaning a bot with a valid PAT can hammer the same repo repeatedly without any server-side memory of prior requests.
Two rate limit zones are applied at the Caddy layer:
-
Anonymous git (no Authorization header): 5 requests/min per IP. Public repo access is permitted but tightly throttled. A legitimate user cloning a repo once is unaffected; a crawler hitting the endpoint repeatedly is blocked immediately.
-
Authenticated git (valid PAT): 30 requests/min per IP. Generous enough for CI pipelines and active developer workflows, tight enough to prevent a compromised or bot-controlled PAT from saturating Sztabina under repeated clone/fetch load.
PAT authentication proves identity but does not limit volume. Rate limiting at the proxy layer is the correct control for volume — it applies regardless of whether the requestor is human or automated, internal or external.
-
-
Git rate limiting directives in Caddy:
```
# ------------------------------
# Rate limiting — anonymous REST traffic only (Layer 2b)
# Authenticated requests (JSESSIONID cookie or Authorization header)
# bypass this — they are governed by Layer 1 (Spring Security) and
# Layer 4 (permission-based access).
# 30 events/min per IP gives legitimate anonymous browsers ample
# headroom while decisively blocking crawlers.
# {remote_ip} is used as the rate limit key — cheaper than {remote_host}
# which would trigger a reverse DNS lookup on every request.
#
# NOTE: /git/* is explicitly excluded here to ensure git traffic is
# governed solely by the git-specific rate limit zones below.
# This makes the zones mutually exclusive and order-independent.
# ------------------------------
@anonymous {
	not path /git/*
	not header_regexp Cookie JSESSIONID
	not header Authorization *
}
rate_limit @anonymous {
	zone anonymous_zone {
		key {remote_ip}
		events 30
		window 1m
	}
}

# ------------------------------
# Rate limiting — anonymous git traffic (Layer 2c)
# Git operations (clone, fetch, diff) are significantly more expensive
# than REST requests — each triggers disk I/O and git object graph
# traversal in Sztabina. Anonymous access to public repos is permitted
# but tightly throttled.
#
# 10 r/min accounts for git clone burst behavior — a single clone
# generates multiple HTTP requests (info/refs, pack negotiation, object
# fetch). 5 r/min was too tight; 10 r/min blocks sustained crawling
# while allowing legitimate one-time clones.
#
# Rate limit key is {remote_ip}{path} — scoped per IP per repository.
# Git cost is repo-specific: cloning repo A is independent of cloning
# repo B. A CI pipeline cloning multiple repos is not penalized the
# same as a bot hammering a single repo repeatedly.
#
# IP-only key for anonymous traffic — no token available.
# NAT/corporate network tradeoff accepted: anonymous git from a shared
# IP is already a suspicious pattern.
# ------------------------------
@git_anonymous {
	path /git/*
	not header Authorization *
}
rate_limit @git_anonymous {
	zone git_anonymous_zone {
		key {remote_ip}{path}
		events 10
		window 1m
	}
}

# ------------------------------
# Rate limiting — authenticated git traffic (Layer 2d)
# PAT authentication proves identity but does not limit volume.
# A compromised or bot-controlled PAT can saturate Sztabina with
# repeated clone/fetch operations. 30 r/min per IP per repo is
# generous enough for CI pipelines and active developer workflows
# while blocking bots.
#
# Rate limit key is {remote_ip}{path} — scoped per IP per repository.
# A developer or CI pipeline working across multiple repos is not
# penalized; a bot hammering a single repo is throttled.
#
# NOTE: Authorization header value (Base64-encoded Basic credentials)
# is intentionally NOT used as part of the key. The same credentials
# can produce different encodings across clients (whitespace, padding
# variations), making the raw header an unreliable key. IP + path
# is simpler, stable, and matches the actual cost model.
#
# Git HTTP protocol uses Authorization headers (Basic/PAT).
# Session cookies (JSESSIONID) are not used by git clients and
# are intentionally ignored here to avoid incorrect classification.
# ------------------------------
@git_authenticated {
	path /git/*
	header Authorization *
}
rate_limit @git_authenticated {
	zone git_authenticated_zone {
		key {remote_ip}{path}
		events 30
		window 1m
	}
}
```

Validation
Rate limiting can be verified manually by issuing repeated requests to the git info/refs endpoint and observing HTTP 429 responses once the configured thresholds are exceeded.
In addition, the existing k6 stress test will be extended to include git endpoints. Validation criteria:
- Anonymous git traffic is throttled at ~10 r/min per IP per path
- Authenticated git traffic is throttled at ~30 r/min per IP per path
- Legitimate single clone/fetch operations complete without 429
- Sztabina CPU usage decreases under bot load compared to baseline
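Since the broader Tigase team is Java-first (see the Gatling note above), the k6 extension could equally be sketched in Gatling's Java DSL. The host and thresholds below mirror the zones above but are assumptions, not a finished test plan:

```java
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

// Sketch: one virtual user hits the anonymous git info/refs endpoint
// 15 times in a row. With the 10 r/min anonymous git zone in place,
// the first ~10 requests should return 200 and the rest 429; the
// check accepts only those two statuses so any other outcome fails.
public class GitRateLimitSimulation extends Simulation {

    HttpProtocolBuilder httpProtocol =
            http.baseUrl("https://staging.sztab.com"); // assumed host

    ScenarioBuilder anonymousGit = scenario("anonymous git crawl")
            .repeat(15).on(
                    exec(http("info/refs")
                            .get("/git/TESTSZTAB.git/info/refs?service=git-upload-pack")
                            .check(status().in(200, 429))));

    {
        setUp(anonymousGit.injectOpen(atOnceUsers(1)))
                .protocols(httpProtocol);
    }
}
```

An authenticated scenario would add a Basic `Authorization` header and repeat 35 times against the 30 r/min zone, mirroring the manual curl tests below.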
-
Test git Rate Limiting: Unauthenticated Git
```
rksuma@Ramakrishnans-MacBook-Pro sztab % for i in $(seq 1 15); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    "http://ec2-35-87-145-56.us-west-2.compute.amazonaws.com/git/TESTSZTAB.git/info/refs?service=git-upload-pack")
  echo "Request $i: $STATUS"
done
Request 1: 200
Request 2: 200
Request 3: 200
Request 4: 200
Request 5: 200
Request 6: 200
Request 7: 200
Request 8: 200
Request 9: 200
Request 10: 200
Request 11: 429
Request 12: 429
Request 13: 429
Request 14: 429
Request 15: 429
```

Test git Rate Limiting: Authenticated Git

```
rksuma@Ramakrishnans-MacBook-Pro sztab % for i in $(seq 1 35); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Basic $(echo -n 'admin:szt_6G9__eAmsumQr2C79F9c5ScAl5NkgwIySshIPE7v' | base64)" \
    "http://ec2-35-87-145-56.us-west-2.compute.amazonaws.com/git/TESTSZTAB.git/info/refs?service=git-upload-pack")
  echo "Request $i: $STATUS"
done
Request 1: 200
Request 2: 200
Request 3: 200
Request 4: 200
Request 5: 200
Request 6: 200
Request 7: 200
Request 8: 200
Request 9: 200
Request 10: 200
Request 11: 200
Request 12: 200
Request 13: 200
Request 14: 200
Request 15: 200
Request 16: 200
Request 17: 200
Request 18: 200
Request 19: 200
Request 20: 200
Request 21: 200
Request 22: 200
Request 23: 200
Request 24: 200
Request 25: 200
Request 26: 200
Request 27: 200
Request 28: 200
Request 29: 200
Request 30: 200
Request 31: 429
Request 32: 429
Request 33: 429
Request 34: 429
Request 35: 429
```

Summary:

- 30 consecutive requests → HTTP 200
- 31st request onward → HTTP 429

Confirms:

- authenticated git limit enforced at configured threshold
- Authorization-based classification working correctly
- no overlap with anonymous rate limit zone

-
-
In Progress
| Field | Value |
| --- | --- |
| Type | New Feature |
| Priority | Normal |
| Assignee | |
| Version | none |
| Sprints | n/a |
| Customer | n/a |
The main problem we faced was our servers being overloaded by AI bots and crawlers. The most sensible solution seems to be hiding resource-heavy operations from anonymous or guest access. I suggested making these operations accessible based on user permissions. This would give us the most flexibility.
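A sketch of what permission-gated access to a resource-heavy endpoint could look like with Spring Security method security; the endpoint path matches the diff URL used in the stress tests, but the SpEL expression and the `PermissionEvaluator` it relies on are assumptions, not the final design:

```java
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Sketch: the expensive diff endpoint is only reachable by users who
// hold read permission on the project. Anonymous and guest requests
// are rejected before any git work starts in Sztabina, which is the
// point of this layer: the cost is avoided, not just rate limited.
@RestController
public class DiffController {

    @PreAuthorize("hasPermission(#projectId, 'PROJECT', 'READ')")
    @GetMapping("/api/projects/{projectId}/pulls/{pullId}/diff")
    public String diff(@PathVariable long projectId, @PathVariable long pullId) {
        return computeDiff(projectId, pullId);
    }

    private String computeDiff(long projectId, long pullId) {
        // placeholder for the real call into Sztabina
        return "";
    }
}
```

Requires `@EnableMethodSecurity` and a configured `PermissionEvaluator` that maps the `'PROJECT'`/`'READ'` pair onto the repository permission model; both are left out here.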