WebClient in-memory buffer limit causes diff failures for large PRs (SZ-119)
rk@tigase.net opened 2 weeks ago

Problem

SztabinaClient uses Spring WebFlux WebClient to call Sztabina's diff endpoint.

{ Sztabina } ==> {  (Sztabina REST Client) -- (Sztab) }

The client currently uses bodyToMono(...), which causes WebFlux to buffer the entire response in memory before deserialization. By default, the maximum in-memory buffer size is 256 KB.

For pull requests with large diffs (many files / large patches), the response size exceeds this limit. When that happens, Spring throws a DataBufferLimitException internally.

This exception is not handled explicitly. Instead, it is caught by the generic error handler in SztabinaClient.diffByUrl() and surfaced as:

"Failed to diff refs via Sztabina"

As a result, the root cause is obscured and difficult to diagnose from logs.


Reproduction

Observed during SZ-78 stress test seeding:

  • Synthetic PR with ~20 files and ~3600 lines of diff
  • Diff response exceeded WebFlux default buffer (256 KB)
  • Triggered DataBufferLimitException

Impact

  • Any PR whose unified diff exceeds ~256 KB fails to render:

    • PR detail view
    • diff preview
    • diff comparison endpoints
  • Failure is asymmetric:

    • Sztabina successfully computes and returns the diff
    • Sztab fails during HTTP client processing
  • Error is misleading and non-actionable:

    • Root cause (buffer overflow) is hidden
    • Appears as generic Sztabina failure

  • Discovered during: SZ-78 (Sztabina diff endpoint stress test)
  • Context: SZ-73 bot mitigation + load testing work
  • rk@tigase.net commented 2 weeks ago

    Root Cause and Fix

    Mismatch between:

    • Unbounded response size (diffs can grow arbitrarily)
    • Fixed WebFlux in-memory buffer limit (256 KB)

    Combined with:

    • Use of bodyToMono(...), which forces full buffering

    Fix

    Increase the WebFlux in-memory buffer size used by WebClient in SztabinaClient.

    this.webClient = builder
            .baseUrl(baseUrl)
            .codecs(configurer -> configurer
                    .defaultCodecs()
                    .maxInMemorySize(maxInMemorySize)). // configuration knob
            .build();
    

    Make the limit configurable via application properties:

    @Value("${sztab.webclient.max-in-memory-size:16777216}")
    private int maxInMemorySize;
    

    Detection

    Add specific handling for DataBufferLimitException in SztabinaClient so the error message is actionable:

    } catch (DataBufferLimitException e) {
        log.error("Diff response exceeded buffer limit — consider increasing " +
                  "sztab.webclient.max-in-memory-size: {}", e.getMessage());
        throw new RuntimeException("Diff too large to process: " + e.getMessage(), e);
    }
    

    Sizing rationale

    ScenarioEstimated diff size
    Typical PR (50 files, 100 lines each)~1MB
    Large PR (200 files, 200 lines each)~8MB
    Very large PR (500 files, 200 lines each)~20MB

    16MB covers the vast majority of real-world PRs for an engineering team tool. Extreme outliers (monorepo-scale diffs) should be handled separately via pagination or streaming — out of scope for this fix.

  • rk@tigase.net commented 2 weeks ago

    Time Log:

    30min — fix SztabinaClient (add codec config, make it @Value configurable)
    30min — add specific DataBufferLimitException catch with clear error message
    30min — add application property to application.yml with sensible default and comment
    30min — test with the TESTSZTAB repo to confirm large diffs now work
    
    

    Straightforward contained fix, no schema changes, no API changes.

  • rk@tigase.net commented 2 weeks ago

    Trade-offs

    Increasing the buffer size increases per-request memory usage under load. If many large diffs are processed concurrently, this can lead to elevated heap pressure.

    This is acceptable for now given expected usage patterns, but reinforces the need for a streaming-based approach for large diffs in the future.

  • rk@tigase.net commented 2 weeks ago
    rksuma@Ramakrishnans-MacBook-Pro sztab % git checkout wolnosc
    git pull origin wolnosc
    git checkout -b bugfix/SZ-79-webclient-buffer-limit
    Switched to branch 'wolnosc'
    Your branch is up to date with 'origin/wolnosc'.
    From https://tigase.dev/sztab
     * branch            wolnosc    -> FETCH_HEAD
    Already up to date.
    Switched to a new branch 'bugfix/SZ-79-webclient-buffer-limit'
    rksuma@Ramakrishnans-MacBook-Pro sztab % 
    
    
  • rk@tigase.net changed state to 'In Progress' 2 weeks ago
    Previous Value Current Value
    Open
    In Progress
  • rk@tigase.net commented 2 weeks ago

    rksuma@Ramakrishnans-MacBook-Pro sztab % git commit -m "SZ-79: enforce WebClient buffer limit and add stress tests for large diff payloads

    • Configure maxInMemorySize for WebClient
    • Add stress tests covering oversized responses
    • Improve error message to include buffer limit
    • Add valid large payload test for realism" [bugfix/SZ-79-webclient-buffer-limit 746cc2c] SZ-79: enforce WebClient buffer limit and add stress tests for large diff payloads 4 files changed, 171 insertions(+), 2 deletions(-) create mode 100644 backend/src/test/java/com/sztab/sztabina/client/SztabinaClientStressTest.java rksuma@Ramakrishnans-MacBook-Pro sztab % git push --set-upstream origin bugfix/SZ-79-webclient-buffer-limit --force-with-lease Enumerating objects: 37, done. Counting objects: 100% (37/37), done. Delta compression using up to 12 threads Compressing objects: 100% (15/15), done. Writing objects: 100% (21/21), 4.11 KiB | 4.11 MiB/s, done. Total 21 (delta 6), reused 0 (delta 0), pack-reused 0 (from 0) remote:
      remote: Create a pull request for 'bugfix/SZ-79-webclient-buffer-limit' by visiting: remote: https://tigase.dev/sztab/~pulls/new?target=1325:wolnosc&source=1325:bugfix/SZ-79-webclient-buffer-limit remote:
      To https://tigase.dev/sztab.git
    • [new branch] bugfix/SZ-79-webclient-buffer-limit -> bugfix/SZ-79-webclient-buffer-limit branch 'bugfix/SZ-79-webclient-buffer-limit' set up to track 'origin/bugfix/SZ-79-webclient-buffer-limit'.
    rksuma@Ramakrishnans-MacBook-Pro sztab % git checkout wolnosc
    git pull origin wolnosc
    git merge bugfix/SZ-79-webclient-buffer-limit
    git push origin wolnosc
    Switched to branch 'wolnosc'
    Your branch is up to date with 'origin/wolnosc'.
    From https://tigase.dev/sztab
     * branch            wolnosc    -> FETCH_HEAD
    Already up to date.
    Updating affba53..746cc2c
    Fast-forward
     backend/pom.xml                                                                     |   7 ++++
     backend/src/main/java/com/sztab/sztabina/client/impl/SztabinaClient.java            |  22 ++++++++++-
     backend/src/test/java/com/sztab/sztabina/client/SztabinaClientListBranchesTest.java |   1 +
     backend/src/test/java/com/sztab/sztabina/client/SztabinaClientStressTest.java       | 143 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     4 files changed, 171 insertions(+), 2 deletions(-)
     create mode 100644 backend/src/test/java/com/sztab/sztabina/client/SztabinaClientStressTest.java
    Total 0 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
    To https://tigase.dev/sztab.git
       affba53..746cc2c  wolnosc -> wolnosc
    rksuma@Ramakrishnans-MacBook-Pro sztab % 
    
    
  • rk@tigase.net changed state to 'In QA' 2 weeks ago
    Previous Value Current Value
    In Progress
    In QA
  • rk@tigase.net commented 2 weeks ago
  • rk@tigase.net changed state to 'Closed' 2 weeks ago
    Previous Value Current Value
    In QA
    Closed
issue 1 of 1
Type
Performance
Priority
Normal
Assignee
Version
1.10.0
Sprints
n/a
Customer
n/a
Issue Votes (0)
Watchers (3)
Reference
SZ-119
Please wait...
Page is in error, reload to recover