Task: Validate diff handling against large public repositories (SZ-120)
rk@tigase.net opened 2 weeks ago

As a follow-up to the 16MB buffer fix, validate that the system handles diffs from large real-world repositories without hitting the buffer limit or degrading performance unacceptably.

Test candidates

  • Linux kernel (https://github.com/torvalds/linux): one of the largest widely available public repos. Major release merges produce diffs of 100MB+. Use a PR between two release tags (e.g. v6.7..v6.8) to simulate a realistic large diff.
  • Chromium (https://chromium.googlesource.com/chromium/src): large C++ codebase with frequent large refactors.
  • LLVM (https://github.com/llvm/llvm-project): moderate scale, good for mid-range testing.
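
Before creating PRs, it may help to know roughly how large each candidate diff is. The following is a minimal sketch, assuming a local clone and the git CLI on PATH. The function names and the example path are hypothetical, the 16 MiB constant is an assumption about how the 16MB limit is counted, and the raw git output size will likely differ from the payload Sztabina actually buffers.

```python
import subprocess
import sys

# Assumption: the "16MB" buffer limit is 16 MiB; adjust if it is configured differently.
BUFFER_LIMIT_BYTES = 16 * 1024 * 1024


def count_stream_bytes(cmd, cwd=None, chunk_size=1 << 20):
    """Run cmd and count the bytes written to stdout without buffering them all."""
    total = 0
    with subprocess.Popen(cmd, cwd=cwd, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return total


def diff_size_bytes(repo_path, old_ref, new_ref):
    """Size in bytes of `git diff old_ref..new_ref` in the given clone."""
    return count_stream_bytes(["git", "diff", f"{old_ref}..{new_ref}"], cwd=repo_path)


# Example usage (hypothetical path):
#   size = diff_size_bytes("/data/linux", "v6.7", "v6.8")
#   print(size, "exceeds limit" if size > BUFFER_LIMIT_BYTES else "fits")
```

Reading stdout in chunks keeps the measuring script itself from needing 100MB+ of memory for the largest diffs.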

What to measure

  • Does the diff endpoint return a result or hit DataBufferLimitException?
  • At what diff size does the 16MB buffer become insufficient?
  • What is Sztabina CPU and memory usage when computing a 50MB+ diff?
  • What is end-to-end latency for large diff requests?
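
The latency and success/failure questions above could be captured with a small harness like the one below. This is a sketch only: the endpoint URL is a placeholder, and how a DataBufferLimitException surfaces to the client (status code, error body) is an assumption to verify against the SZ-73 fix. Sztabina CPU and memory would need to be sampled separately on the host.

```python
import time
import urllib.error
import urllib.request


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fetch_size(url, chunk_size=1 << 16, timeout=600):
    """Stream a response body and return (http_status, bytes_read).

    Error responses are reported rather than raised, so a failed large-diff
    request still yields a data point.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            total = 0
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
            return resp.status, total
    except urllib.error.HTTPError as err:
        return err.code, len(err.read())


# Example usage (placeholder URL, not a real Sztabina route):
#   (status, size), seconds = timed(fetch_size, "http://localhost:8080/<diff-endpoint>")
#   print(status, size, round(seconds, 2))
```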

Expected outcome

16MB will be insufficient for Linux-scale diffs. The expected failure mode is a DataBufferLimitException with a clear error message (SZ-73 fix). This will confirm that streaming is required for that scale — see SZ-73 trade-offs section.

🔴 Important: The goal is not to support worst-case diffs (100MB+), but to understand the distribution of real-world diffs and choose a limit that covers typical usage while failing safely for outliers.
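
Once diff sizes have been collected, choosing a limit comes down to a coverage calculation. A minimal sketch, assuming sizes are recorded in bytes; the function names are hypothetical and the nearest-rank percentile is just one reasonable choice:

```python
import math

# Assumption: the "16MB" buffer limit is 16 MiB.
BUFFER_LIMIT_BYTES = 16 * 1024 * 1024


def coverage(sizes, limit=BUFFER_LIMIT_BYTES):
    """Fraction of observed diffs whose size fits within limit."""
    if not sizes:
        raise ValueError("no diff sizes collected")
    return sum(1 for s in sizes if s <= limit) / len(sizes)


def limit_for_coverage(sizes, target):
    """Smallest observed size covering at least `target` of the diffs.

    Uses the nearest-rank percentile, so the result is always one of the
    measured values rather than an interpolated one.
    """
    if not sizes:
        raise ValueError("no diff sizes collected")
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    ordered = sorted(sizes)
    rank = math.ceil(target * len(ordered))
    return ordered[rank - 1]
```

For example, coverage(sizes) reports how much of the sample the current 16MB limit already handles, and limit_for_coverage(sizes, 0.99) suggests what a limit covering 99% of the sample would need to be.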

Notes

  • This test requires cloning a large external repo into Sztabina — network and disk I/O will be significant on the staging EC2 instance.
  • Consider running against a local Sztabina instance first to avoid impacting staging.
  • Results feed directly into the streaming architecture decision.

rk@tigase.net commented 2 weeks ago

    Task                                                              Hours
    Clone Linux kernel into staging Sztabina and create test project  1h
    Create PRs between release tags (v6.7..v6.8)                      0.5h
    Run diff/detail endpoints, capture results                        0.5h
    Analyze failure point: at what diff size does 16MB break          1h
    Document findings, update SZ-79 with results                      0.5h
    Evaluate streaming approach if buffer fails                       1.5h
    Total                                                             5h

Type: Task
Priority: Normal
Assignee: (none listed)
Version: 1.10.0
Sprints: n/a
Customer: n/a
Issue Votes: 0
Watchers: 3
Reference: SZ-120