Phrase TMS returns HTTP 429 Too Many Requests if too many requests run in parallel.
In this article, we’ll extend our previous rate-limiting example with Bulkheads and two retry strategies:
- A short exponential backoff for global quota overshoots and transient failures.
- A long 1-minute backoff for concurrency overshoots when Phrase signals via headers.
Why Bulkheads?
Without concurrency control, bursts of parallel requests can:
- Trigger 429s even if the RPM (requests per minute) budget isn’t exhausted.
- Cause retry storms if every client retries in lockstep.
- Waste resources since failed concurrent requests still consume capacity.
A Bulkhead caps the number of calls in flight at any moment. Combined with the RateLimiter, it ensures you respect both Phrase’s RPM and concurrency caps.
Setup: RateLimiter + Bulkhead
We configure both quota- and concurrency-based protection:
- RateLimiter: caps requests per minute.
- Bulkhead: enforces the maximum number of parallel requests (configurable via Spring; the example assumes you have a variable for that).
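To make the two kinds of protection concrete, here is a minimal plain-JDK sketch of what the pair enforces. It is not the Resilience4j implementation: the class `QuotaAndConcurrencyGuard`, the fixed one-minute window, and the choice to check concurrency before quota are all assumptions made for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for the RateLimiter + Bulkhead pair:
 * a fixed one-minute window counter plus a semaphore for in-flight calls.
 */
final class QuotaAndConcurrencyGuard {
    private static final long WINDOW_MILLIS = 60_000L;

    private final int limitPerMinute;
    private final Semaphore inFlight;
    private final LongSupplier clockMillis;
    private long windowStart;
    private int usedInWindow;

    QuotaAndConcurrencyGuard(int limitPerMinute, int maxConcurrent, LongSupplier clockMillis) {
        this.limitPerMinute = limitPerMinute;
        this.inFlight = new Semaphore(maxConcurrent);
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    /** Counts one request against the current window; opens a new window after a minute. */
    private synchronized boolean tryConsumeQuota() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= WINDOW_MILLIS) {
            windowStart = now;
            usedInWindow = 0;
        }
        if (usedInWindow >= limitPerMinute) {
            return false;
        }
        usedInWindow++;
        return true;
    }

    /** Rejects immediately (never blocks) when either limit is exhausted. */
    <T> T execute(Supplier<T> call) {
        // Concurrency check first, so a rejected call does not consume quota.
        if (!inFlight.tryAcquire()) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            if (!tryConsumeQuota()) {
                throw new IllegalStateException("rate limit exceeded");
            }
            return call.get();
        } finally {
            inFlight.release();
        }
    }

    public static void main(String[] args) {
        QuotaAndConcurrencyGuard guard =
                new QuotaAndConcurrencyGuard(2, 1, System::currentTimeMillis);
        System.out.println(guard.execute(() -> "first call accepted"));
    }
}
```

The clock is injected so the window logic can be exercised without waiting a real minute. In the actual client, the library's RateLimiter and Bulkhead take over both jobs.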
Retry strategies
We define two RetryConfigs with different backoff functions:
- Global retry: exponential backoff, short delays, jittered to avoid retry storms.
- Concurrent retry: fixed 1-minute delay with jitter, conservative and respectful.
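The arithmetic behind the two backoff functions can be sketched in plain Java. The concrete numbers (500 ms initial delay, multiplier 2, jitter factors 0.5 and 0.1) and the class `BackoffSketch` are assumptions for illustration. Only the 1-minute base for the concurrent retry comes from the text above.

```java
import java.util.concurrent.ThreadLocalRandom;

final class BackoffSketch {
    // Assumed values for illustration only, except the 1-minute concurrent base.
    static final long GLOBAL_INITIAL_MILLIS = 500L;
    static final double GLOBAL_MULTIPLIER = 2.0;
    static final double GLOBAL_JITTER = 0.5;
    static final long CONCURRENT_BASE_MILLIS = 60_000L;
    static final double CONCURRENT_JITTER = 0.1;

    /** Spreads base over roughly base * (1 - factor) .. base * (1 + factor); rnd01 is in [0, 1). */
    static long withJitter(long baseMillis, double factor, double rnd01) {
        return Math.round(baseMillis * (1.0 - factor + 2.0 * factor * rnd01));
    }

    /** Global retry: the base delay doubles with each attempt (attempt starts at 1), then gets jitter. */
    static long globalDelayMillis(int attempt, double rnd01) {
        double base = GLOBAL_INITIAL_MILLIS * Math.pow(GLOBAL_MULTIPLIER, attempt - 1);
        return withJitter(Math.round(base), GLOBAL_JITTER, rnd01);
    }

    /** Concurrent retry: fixed 1-minute base, lightly jittered so clients do not retry in lockstep. */
    static long concurrentDelayMillis(double rnd01) {
        return withJitter(CONCURRENT_BASE_MILLIS, CONCURRENT_JITTER, rnd01);
    }

    public static void main(String[] args) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.printf("global retry %d: %d ms%n",
                    attempt, globalDelayMillis(attempt, rnd.nextDouble()));
        }
        System.out.printf("concurrent retry: %d ms%n", concurrentDelayMillis(rnd.nextDouble()));
    }
}
```

Passing the random value in as a parameter keeps the functions deterministic and easy to test. In a Resilience4j setup, an equivalent function would be supplied to each RetryConfig as its interval function.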
The API call
- Both BulkheadOperator and RateLimiterOperator decorate the call.
- On 429 Too Many Requests, the presence or absence of specific headers tells you which type of limit has been hit. This behavior is particular to Phrase TMS:
  - If the response includes Ratelimit-Limit and Ratelimit-Remaining headers, it means the concurrent request limit was exceeded.
  - If those headers are absent, the failure was caused by the global per-minute quota.
- Global limits are applied at the proxy layer, so no application-specific headers are there.
- Concurrent limits are enforced within the application itself, and that’s why the extra headers are present when they trigger.
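The header rule described above can be expressed as a small classifier. This is a sketch under assumptions: `LimitClassifier` and `LimitType` are hypothetical names, headers are passed as a plain map, and a 429 that carries only one of the two headers is treated as a global overshoot, which the text above does not specify.

```java
import java.util.Map;
import java.util.TreeMap;

final class LimitClassifier {
    enum LimitType { CONCURRENT, GLOBAL, NOT_RATE_LIMITED }

    /** Decides which limit a response hit, based on status code and header presence. */
    static LimitType classify(int status, Map<String, String> headers) {
        if (status != 429) {
            return LimitType.NOT_RATE_LIMITED;
        }
        // HTTP header names are case-insensitive, so look them up that way.
        Map<String, String> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byName.putAll(headers);
        boolean hasLimitHeaders = byName.containsKey("Ratelimit-Limit")
                && byName.containsKey("Ratelimit-Remaining");
        // Assumption: both headers must be present to count as a concurrency overshoot.
        return hasLimitHeaders ? LimitType.CONCURRENT : LimitType.GLOBAL;
    }

    public static void main(String[] args) {
        Map<String, String> fromApp = Map.of("Ratelimit-Limit", "10", "Ratelimit-Remaining", "0");
        System.out.println(classify(429, fromApp));   // headers present
        System.out.println(classify(429, Map.of()));  // headers absent
    }
}
```

The result of `classify` is what would select between the long concurrent retry and the short global retry.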
Polling with logging
The error handling in poll() logs a distinct message depending on whether:
- The RateLimiter was tripped,
- The Bulkhead was full, or
- Some other error occurred.
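The three branches can be sketched as a small mapping from error type to log message. The two exception classes below are hypothetical stand-ins so the example compiles without dependencies. In Resilience4j the corresponding rejections are `RequestNotPermitted` (rate limiter) and `BulkheadFullException` (bulkhead). The message wording is likewise an assumption.

```java
import java.util.logging.Logger;

final class PollErrorLogging {
    private static final Logger LOG = Logger.getLogger(PollErrorLogging.class.getName());

    // Hypothetical stand-ins for the library's rejection exceptions.
    static final class RateLimiterRejected extends RuntimeException { }
    static final class BulkheadRejected extends RuntimeException { }

    /** Maps an error to a message that names the limit that was hit. */
    static String describe(Throwable error) {
        if (error instanceof RateLimiterRejected) {
            return "RateLimiter tripped: local per-minute budget exhausted";
        }
        if (error instanceof BulkheadRejected) {
            return "Bulkhead full: too many concurrent calls in flight";
        }
        return "Unexpected error: " + error;
    }

    /** Logs one warning per failed poll, with the branch-specific message. */
    static void logFailure(Throwable error) {
        LOG.warning(describe(error));
    }

    public static void main(String[] args) {
        logFailure(new RateLimiterRejected());
        logFailure(new BulkheadRejected());
        logFailure(new IllegalStateException("connection reset"));
    }
}
```

In a reactive pipeline, a method like `logFailure` would typically be called from an error callback rather than from a try/catch block.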
Summary
| Concern | Solution |
|---|---|
| Staying within the per-minute quota | RateLimiter |
| Avoiding concurrency bursts | Bulkhead |
| Global overshoot / 5xx | globalRetry (exponential + jitter) |
| Concurrency overshoot | concurrentRetry (1m + jitter) |
| Logging & observability | Explicit branches in poll() error handling |
| Non-blocking design | Reactive Mono, no threads are blocked |
Final Result
With this setup you get a fully reactive, header-aware, concurrency-safe client:
- Bulkhead prevents overload before requests leave your service.
- RateLimiter enforces RPM quotas.
- Two retry configs ensure retries are smart, respectful, and jittered.
- Logs clearly differentiate what kind of limit was exceeded.