Phrase’s API enforces not only per-minute quotas but also concurrent request caps. You can stay under your 6000 RPM limit and still hit HTTP 429 Too Many Requests if too many requests run in parallel. In this article, we’ll extend our previous rate-limiting example with Bulkheads and two retry strategies:
  • A short exponential backoff for global quota overshoots and transient failures.
  • A long 1-minute backoff for concurrency overshoots, which Phrase signals via response headers.

Why Bulkheads?

Without concurrency control, bursts of parallel requests can:
  • Trigger 429s even if the RPM (requests per minute) budget isn’t exhausted.
  • Cause retry storms if every client retries in lockstep.
  • Waste resources since failed concurrent requests still consume capacity.
A Bulkhead protects your service by limiting simultaneous in-flight calls. Combined with a RateLimiter, it ensures you respect both Phrase’s RPM and concurrent caps.
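
To see the fail-fast behavior in isolation, here is a minimal, self-contained sketch (the names and limits are illustrative, not taken from the Phrase client). With maxWaitDuration(Duration.ZERO), a caller that arrives while all permits are taken is rejected immediately with a BulkheadFullException instead of queueing:

import io.github.resilience4j.bulkhead.Bulkhead;
import io.github.resilience4j.bulkhead.BulkheadConfig;
import io.github.resilience4j.bulkhead.BulkheadFullException;
import java.time.Duration;

public class BulkheadDemo {
    public static void main(String[] args) {
        Bulkhead bulkhead = Bulkhead.of("demo", BulkheadConfig.custom()
                .maxConcurrentCalls(3)          // at most 3 in-flight calls
                .maxWaitDuration(Duration.ZERO) // reject immediately when full
                .build());

        // Take all 3 permits without releasing them
        for (int i = 0; i < 3; i++) {
            bulkhead.acquirePermission();
        }

        try {
            bulkhead.acquirePermission(); // 4th caller: bulkhead is full
        } catch (BulkheadFullException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}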

Setup: RateLimiter + Bulkhead

We configure both quota- and concurrency-based protection:
// Per-minute rate limiter
RateLimiterConfig rlConfig = RateLimiterConfig.custom()
        .timeoutDuration(Duration.ZERO)
        .limitRefreshPeriod(Duration.ofMinutes(1))
        .limitForPeriod(requestsPerMinute)
        .build();
this.rateLimiter = RateLimiter.of("apiLimiter", rlConfig);

// Bulkhead for concurrent calls
BulkheadConfig bhConfig = BulkheadConfig.custom()
        .maxConcurrentCalls(concurrentRequests) // e.g. 50
        .maxWaitDuration(Duration.ZERO) // fail fast
        .build();
this.bulkhead = Bulkhead.of("apiBulkhead", bhConfig);

  • RateLimiter: caps requests per minute.
  • Bulkhead: enforces the maximum number of parallel requests (configurable via Spring; the example assumes you already have a concurrentRequests variable, and one possible wiring is sketched below).
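
For completeness, here is one way those two values could be injected; a minimal sketch with hypothetical class and property names (phrase.api.requests-per-minute, phrase.api.concurrent-requests); adjust them to your project:

import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.client.WebClient;

// Hypothetical class and property names, shown only to illustrate the wiring
@Component
public class PhraseApiClient {
    private final WebClient webClient;
    // rateLimiter, bulkhead, and retry fields omitted for brevity

    public PhraseApiClient(
            WebClient webClient,
            @Value("${phrase.api.requests-per-minute:6000}") int requestsPerMinute,
            @Value("${phrase.api.concurrent-requests:50}") int concurrentRequests) {
        this.webClient = webClient;
        // ... build rateLimiter, bulkhead, and the two retries as shown in this article ...
    }
}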

Retry strategies

We define two RetryConfigs with different backoff functions:
// Global quota or transient errors: exponential backoff with jitter
IntervalFunction globalBackoff =
        IntervalFunction.ofExponentialRandomBackoff(
                1000,  // base 1s
                2.0,   // multiplier
                0.5    // ±50% jitter
        );
RetryConfig globalRetryConfig = RetryConfig.custom()
        .maxAttempts(5)
        .intervalFunction(globalBackoff)
        .retryOnException(ex -> ex instanceof WebClientResponseException.TooManyRequests
                || ex instanceof WebClientResponseException.ServiceUnavailable)
        .build();
this.globalRetry = Retry.of("globalRetry", globalRetryConfig);

// Concurrency overshoot: long 1 min backoff with jitter
IntervalFunction concurrentBackoff =
        IntervalFunction.ofRandomized(
                Duration.ofMinutes(1).toMillis(), 0.1); // ±10%
RetryConfig concurrentRetryConfig = RetryConfig.custom()
        .maxAttempts(3)
        .intervalFunction(concurrentBackoff)
        .retryOnException(ex -> ex instanceof WebClientResponseException.TooManyRequests)
        .build();
this.concurrentRetry = Retry.of("concurrentRetry", concurrentRetryConfig);

  • Global retry: exponential backoff with short, jittered delays to avoid retry storms.
  • Concurrent retry: a fixed 1-minute delay with jitter, conservative and respectful of the API (a quick way to inspect the actual delays follows below).
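
If you want to sanity-check those delays, an IntervalFunction is just a function from the attempt number to a wait in milliseconds, so you can print a few values (a throwaway sketch; output varies per run because of the jitter):

import io.github.resilience4j.core.IntervalFunction;

public class BackoffDemo {
    public static void main(String[] args) {
        IntervalFunction globalBackoff =
                IntervalFunction.ofExponentialRandomBackoff(1000, 2.0, 0.5);
        for (int attempt = 1; attempt <= 4; attempt++) {
            // attempt 1 lands in ~500-1500 ms, attempt 2 in ~1000-3000 ms, ...
            System.out.printf("attempt %d -> wait %d ms%n",
                    attempt, globalBackoff.apply(attempt));
        }
    }
}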

The API call

public Mono<List<String>> listProjects() {
    // the raw API call, wrapped in a Supplier so each retry re-subscribes with a fresh request
    Supplier<Mono<List<String>>> call = () -> webClient.get()
            .uri(uriBuilder -> uriBuilder
                    .path("projects")
                    .queryParam("pageSize", 50)
                    .queryParam("pageNumber", 0)
                    .queryParam("includeArchived", false)
                    .build())
            .retrieve()
            .bodyToMono(JsonNode.class)
            .map(this::extractProjectNames);

    // prevention (bulkhead + rate limiter) and recovery (retries) are composed here
    return Mono.defer(call)
            .transformDeferred(BulkheadOperator.of(bulkhead))
            .transformDeferred(RateLimiterOperator.of(rateLimiter))
            .onErrorResume(WebClientResponseException.TooManyRequests.class, ex -> {
                HttpHeaders headers = ex.getHeaders();
                if (headers != null &&
                        headers.containsKey("Ratelimit-Limit") &&
                        headers.containsKey("Ratelimit-Remaining")) {
                    // concurrent limit → long backoff
                    return Mono.defer(call)
                            .transformDeferred(RetryOperator.of(concurrentRetry));
                } else {
                    // global limit → shorter backoff
                    return Mono.defer(call)
                            .transformDeferred(RetryOperator.of(globalRetry));
                }
            });
}
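
extractProjectNames is not shown above; a minimal sketch, assuming Phrase's paged list response wraps the projects in a content array whose items carry a name field (verify the shape against the API docs for your endpoint version):

// Assumed response shape: { "content": [ { "name": "..." }, ... ], ... }
private List<String> extractProjectNames(JsonNode root) {
    List<String> names = new ArrayList<>();
    root.path("content").forEach(project -> names.add(project.path("name").asText()));
    return names;
}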

  • Both BulkheadOperator and RateLimiterOperator decorate the call.
  • On 429 Too Many Requests, the presence or absence of specific headers tells you which type of limit was hit. This behavior is particular to Phrase TMS:
      • If the response includes the Ratelimit-Limit and Ratelimit-Remaining headers, the concurrent request limit was exceeded.
      • If those headers are absent, the failure was caused by the global per-minute quota.
The distinction comes from how Phrase enforces limits internally:
  • Global limits are applied at the proxy layer, so no application-specific headers are attached.
  • Concurrent limits are enforced within the application itself, which is why the extra headers are present when they trigger (the header check can be pulled into a small helper, as sketched below).
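
If you prefer to keep the branching in listProjects() readable, the condition can be extracted; a refactoring sketch, not part of the original listing:

// True when Phrase signals a concurrency overshoot via its rate-limit headers
private boolean isConcurrencyLimit(WebClientResponseException ex) {
    HttpHeaders headers = ex.getHeaders();
    return headers != null
            && headers.containsKey("Ratelimit-Limit")
            && headers.containsKey("Ratelimit-Remaining");
}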

Polling with logging

@Scheduled(fixedDelayString = "${phrase.poll.delay-ms:200}")
public void poll() {
    listProjects()
            .subscribe(
                    names -> log.info("Projects: {}", names),
                    error -> {
                        if (error instanceof RequestNotPermitted) {
                            log.warn("Rate limiter triggered: too many requests per minute");
                        } else if (error instanceof BulkheadFullException) {
                            log.warn("Bulkhead full: too many concurrent requests");
                        } else {
                            log.error("Unexpected error during project poll", error);
                        }
                    }
            );
}

This makes it clear in logs whether:
  • The RateLimiter was tripped,
  • The Bulkhead was full, or
  • Some other error occurred.
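
The polling delay, plus the two limits if you inject them as sketched earlier, then lives in plain configuration. Only phrase.poll.delay-ms appears in the code above; the other two property names are assumptions:

# application.properties
phrase.poll.delay-ms=200
phrase.api.requests-per-minute=6000
phrase.api.concurrent-requests=50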

Summary

Concern                               Solution
Avoiding per-minute quota             RateLimiter
Avoiding concurrency bursts           Bulkhead
Global overshoot / transient errors   globalRetry (exponential + jitter)
Concurrency overshoot                 concurrentRetry (1 min + jitter)
Logging & observability               Explicit branches in poll() error handling
Non-blocking design                   Reactive Mono, no threads are blocked

Final Result

With this setup you get a fully reactive, header-aware, concurrency-safe client:
  • Bulkhead prevents overload before requests leave your service.
  • RateLimiter enforces RPM quotas.
  • Two retry configs ensure retries are smart, respectful, and jittered.
  • Logs clearly differentiate what kind of limit was exceeded.