
# Handling Concurrent API Rate Limits

Phrase’s API enforces not only **per-minute quotas** but also **concurrent request caps**. You can stay under your 6000 RPM limit and still hit `HTTP 429 Too Many Requests` if too many requests run in parallel.
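The gap between the two limits shows up with a little arithmetic. The dependency-free sketch below takes a list of requests (start and end times in milliseconds), counts how many start within the first minute, and sweeps the start and end events to find how many are in flight at once. The `BurstProfile` class and both traffic patterns (60 requests of 2 seconds each, fired together or one per second) are hypothetical examples, not Phrase data.

```java theme={null}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper comparing requests per minute with peak concurrency. */
final class BurstProfile {
    /** A request in flight from startMs (inclusive) to endMs (exclusive). */
    record Request(long startMs, long endMs) {}

    /** Counts requests that start within the first minute. */
    static long requestsInFirstMinute(List<Request> requests) {
        return requests.stream().filter(r -> r.startMs() < 60_000).count();
    }

    /** Sweeps start (+1) and end (-1) events in time order to find the maximum overlap. */
    static int peakInFlight(List<Request> requests) {
        List<long[]> events = new ArrayList<>();
        for (Request r : requests) {
            events.add(new long[] {r.startMs(), +1});
            events.add(new long[] {r.endMs(), -1});
        }
        // At equal timestamps, process ends before starts because endMs is exclusive
        events.sort(Comparator.<long[]>comparingLong(e -> e[0]).thenComparingLong(e -> e[1]));
        int current = 0;
        int peak = 0;
        for (long[] e : events) {
            current += (int) e[1];
            peak = Math.max(peak, current);
        }
        return peak;
    }

    public static void main(String[] args) {
        List<Request> burst = new ArrayList<>();  // 60 requests fired at t=0
        List<Request> spread = new ArrayList<>(); // 60 requests, one per second
        for (int i = 0; i < 60; i++) {
            burst.add(new Request(0, 2_000));
            spread.add(new Request(i * 1_000L, i * 1_000L + 2_000));
        }
        System.out.println("burst:  " + requestsInFirstMinute(burst) + " requests/min, peak " + peakInFlight(burst));   // burst:  60 requests/min, peak 60
        System.out.println("spread: " + requestsInFirstMinute(spread) + " requests/min, peak " + peakInFlight(spread)); // spread: 60 requests/min, peak 2
    }
}
```

Both patterns amount to 60 requests per minute, about 1% of a 6000 RPM quota, yet the burst puts all 60 in flight at the same moment while the spread pattern never exceeds 2.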

In this article, we’ll extend our previous rate-limiting example with **Bulkheads** and **two retry strategies**:

* A **short exponential backoff** for global quota overshoots and transient failures.
* A **long 1-minute backoff** for concurrency overshoots, which Phrase signals through response headers.

## Why Bulkheads?

Without concurrency control, bursts of parallel requests can:

* Trigger 429s even if the RPM (requests per minute) budget isn’t exhausted.
* Cause retry storms if every client retries in lockstep.
* Waste resources, since rejected concurrent requests still consume network and server capacity.

A **Bulkhead** protects your service by limiting simultaneous in-flight calls. Combined with a `RateLimiter`, it ensures you respect both Phrase’s RPM quota and its concurrency cap.
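Conceptually, a semaphore-style bulkhead that never waits for a free slot (the `maxWaitDuration(Duration.ZERO)` setting in the configuration below) behaves like a counting semaphore acquired with `tryAcquire()`. The sketch below illustrates that idea using only the JDK's `java.util.concurrent.Semaphore`; `SimpleBulkhead` is a hypothetical class, not Resilience4j's implementation.

```java theme={null}
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical fail-fast bulkhead: at most maxConcurrentCalls calls run at the same time. */
final class SimpleBulkhead {
    private final Semaphore permits;

    SimpleBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit is free; otherwise rejects immediately instead of waiting. */
    <T> T execute(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("Bulkhead full: too many concurrent calls");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // return the permit even if the call throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SimpleBulkhead bulkhead = new SimpleBulkhead(1);
        // The outer call holds the only permit while it tries to start a second call
        String result = bulkhead.execute(() -> {
            try {
                return bulkhead.execute(() -> "second call accepted");
            } catch (IllegalStateException e) {
                return "second call rejected: " + e.getMessage();
            }
        });
        System.out.println(result);
        System.out.println("permits after both calls: " + bulkhead.availablePermits());
    }
}
```

With a limit of 1, the call started while another is still running finds no free permit and is rejected rather than queued, and the permit is returned once the outer call finishes.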

## Setup: RateLimiter + Bulkhead

We configure both quota- and concurrency-based protection:

```java theme={null}
// Per-minute rate limiter
RateLimiterConfig rlConfig = RateLimiterConfig.custom()
        .timeoutDuration(Duration.ZERO) // fail fast: don't wait for a free permit
        .limitRefreshPeriod(Duration.ofMinutes(1))
        .limitForPeriod(requestsPerMinute)
        .build();
this.rateLimiter = RateLimiter.of("apiLimiter", rlConfig);

// Bulkhead for concurrent calls
BulkheadConfig bhConfig = BulkheadConfig.custom()
        .maxConcurrentCalls(concurrentRequests) // e.g. 50
        .maxWaitDuration(Duration.ZERO) // fail fast
        .build();
this.bulkhead = Bulkhead.of("apiBulkhead", bhConfig);

```

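* **RateLimiter**: caps requests per minute at `requestsPerMinute` and rejects a call immediately once the budget for the current minute is used up.
* **Bulkhead**: caps the number of requests in flight at the same time at `concurrentRequests`. The example assumes this value comes from a variable, which you can make configurable through Spring.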
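Roughly speaking, `limitRefreshPeriod(Duration.ofMinutes(1))` plus `limitForPeriod(requestsPerMinute)` hands out a fixed number of permits per one-minute cycle, and `timeoutDuration(Duration.ZERO)` turns "no permit left" into an immediate rejection. The sketch below models that as a simple fixed-window counter with an injectable clock so it can be tested deterministically. `FixedWindowLimiter` is a hypothetical name; Resilience4j's own limiter differs in detail.

```java theme={null}
import java.util.function.LongSupplier;

/** Hypothetical fixed-window limiter: limitForPeriod permits per refresh period, no waiting. */
final class FixedWindowLimiter {
    private final int limitForPeriod;
    private final long refreshPeriodMs;
    private final LongSupplier clockMs; // injectable clock, e.g. System::currentTimeMillis
    private long currentWindow = Long.MIN_VALUE;
    private int used = 0;

    FixedWindowLimiter(int limitForPeriod, long refreshPeriodMs, LongSupplier clockMs) {
        this.limitForPeriod = limitForPeriod;
        this.refreshPeriodMs = refreshPeriodMs;
        this.clockMs = clockMs;
    }

    /** Returns true if a permit was granted; false means "reject now" (zero timeout). */
    synchronized boolean tryAcquire() {
        long window = clockMs.getAsLong() / refreshPeriodMs;
        if (window != currentWindow) { // a new period has started: refill the permits
            currentWindow = window;
            used = 0;
        }
        if (used >= limitForPeriod) {
            return false;
        }
        used++;
        return true;
    }

    public static void main(String[] args) {
        long[] now = {0};
        FixedWindowLimiter limiter = new FixedWindowLimiter(3, 60_000, () -> now[0]);
        for (int i = 0; i < 4; i++) {
            System.out.println("t=0s  -> " + limiter.tryAcquire()); // the 4th call is rejected
        }
        now[0] = 60_000; // the next one-minute window begins
        System.out.println("t=60s -> " + limiter.tryAcquire());
    }
}
```

A fixed window permits a full budget at the very end of one minute and another at the start of the next, which is one more reason to pair the limiter with a Bulkhead.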

## Retry strategies

We define two `RetryConfig`s with different backoff functions:

```java theme={null}
// Global quota or transient errors: exponential backoff with jitter
IntervalFunction globalBackoff =
        IntervalFunction.ofExponentialRandomBackoff(
                1000,  // base 1s
                2.0,   // multiplier
                0.5    // ±50% jitter
        );
RetryConfig globalRetryConfig = RetryConfig.custom()
        .maxAttempts(5)
        .intervalFunction(globalBackoff)
        .retryOnException(ex -> ex instanceof WebClientResponseException.TooManyRequests
                || ex instanceof WebClientResponseException.ServiceUnavailable)
        .build();
this.globalRetry = Retry.of("globalRetry", globalRetryConfig);

// Concurrency overshoot: long 1 min backoff with jitter
IntervalFunction concurrentBackoff =
        IntervalFunction.ofRandomized(
                Duration.ofMinutes(1).toMillis(), 0.1); // ±10%
RetryConfig concurrentRetryConfig = RetryConfig.custom()
        .maxAttempts(3)
        .intervalFunction(concurrentBackoff)
        .retryOnException(ex -> ex instanceof WebClientResponseException.TooManyRequests)
        .build();
this.concurrentRetry = Retry.of("concurrentRetry", concurrentRetryConfig);

```

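* **Global retry**: exponential backoff starting at 1 s and doubling on each attempt, with ±50% jitter so clients don't retry in lockstep. It retries on `429` and `503`, for up to 5 attempts.
* **Concurrent retry**: a constant 1-minute delay with ±10% jitter, for up to 3 attempts. This is deliberately conservative and gives in-flight requests time to finish.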
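To make the two schedules concrete, the sketch below computes the nominal delay before each retry and the range jitter can move it into. It follows the common definition of randomized exponential backoff (nominal delay `base * multiplier^(attempt - 1)`, then a uniform pick within ±`randomizationFactor` of it), so treat it as an approximation of Resilience4j's `IntervalFunction`, whose rounding may differ. `BackoffSchedule` and its methods are hypothetical. Since `maxAttempts` counts the initial call, `maxAttempts(5)` allows at most 4 waits and `maxAttempts(3)` at most 2.

```java theme={null}
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: nominal and jittered retry delays, in milliseconds. */
final class BackoffSchedule {
    /** Nominal delay before retry number `attempt` (1-based): base * multiplier^(attempt - 1). */
    static long nominalDelay(long baseMs, double multiplier, int attempt) {
        return Math.round(baseMs * Math.pow(multiplier, attempt - 1));
    }

    /** Smallest and largest delay after applying +/- randomizationFactor jitter. */
    static long[] jitterRange(long nominalMs, double randomizationFactor) {
        long delta = Math.round(nominalMs * randomizationFactor);
        return new long[] {nominalMs - delta, nominalMs + delta};
    }

    /** Uniformly random delay within the jitter range (inclusive). */
    static long jitteredDelay(long nominalMs, double randomizationFactor) {
        long[] range = jitterRange(nominalMs, randomizationFactor);
        return ThreadLocalRandom.current().nextLong(range[0], range[1] + 1);
    }

    public static void main(String[] args) {
        // globalRetry: base 1 s, multiplier 2.0, +/-50% jitter, up to 4 waits
        for (int attempt = 1; attempt <= 4; attempt++) {
            long nominal = nominalDelay(1_000, 2.0, attempt);
            long[] r = jitterRange(nominal, 0.5);
            System.out.printf("global wait %d: %d ms (range %d-%d)%n", attempt, nominal, r[0], r[1]);
        }
        // concurrentRetry: constant 1 min, +/-10% jitter, up to 2 waits
        long[] r = jitterRange(60_000, 0.1);
        System.out.printf("concurrent wait: 60000 ms (range %d-%d)%n", r[0], r[1]);
    }
}
```

The global schedule grows from about 1 s to about 8 s, while every concurrency wait stays between 54 s and 66 s, so concurrency retries spread out without ever coming back quickly.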

## The API call

```java theme={null}
public Mono<List<String>> listProjects() {
    // The raw API call: a single GET /projects request
    Supplier<Mono<List<String>>> call = () -> webClient.get()
            .uri(uriBuilder -> uriBuilder
                    .path("projects")
                    .queryParam("pageSize", 50)
                    .queryParam("pageNumber", 0)
                    .queryParam("includeArchived", false)
                    .build())
            .retrieve()
            .bodyToMono(JsonNode.class)
            .map(this::extractProjectNames);

    // Prevention: every attempt, including retries, passes through the Bulkhead and RateLimiter
    Mono<List<String>> guarded = Mono.defer(call)
            .transformDeferred(BulkheadOperator.of(bulkhead))
            .transformDeferred(RateLimiterOperator.of(rateLimiter));

    // Recovery: pick the retry strategy from the 429 response headers
    return guarded.onErrorResume(WebClientResponseException.TooManyRequests.class, ex -> {
        boolean concurrentLimit = isConcurrentLimit(ex);
        Retry retry = concurrentLimit ? concurrentRetry : globalRetry;
        // A Retry only waits between its own attempts, so wait once before re-sending
        // (same base delays as concurrentBackoff and globalBackoff)
        Duration firstWait = concurrentLimit ? Duration.ofMinutes(1) : Duration.ofSeconds(1);
        return Mono.delay(firstWait)
                .then(guarded.transformDeferred(RetryOperator.of(retry)));
    });
}

// Phrase TMS sends both headers only when the concurrent request limit is hit
private static boolean isConcurrentLimit(WebClientResponseException ex) {
    HttpHeaders headers = ex.getHeaders();
    return headers.containsKey("Ratelimit-Limit")
            && headers.containsKey("Ratelimit-Remaining");
}
```

* Both `BulkheadOperator` and `RateLimiterOperator` decorate the call.

* On `429 Too Many Requests`, the presence or absence of two headers tells you which limit was hit. This behavior is specific to **Phrase TMS**:

  * If the response **includes** both the `Ratelimit-Limit` **and** `Ratelimit-Remaining` **headers**, the **concurrent request limit** was exceeded.
  * If those headers are **absent**, the failure was caused by the **global per-minute quota**.

The distinction comes from how Phrase enforces limits internally:

* **Global limits** are applied at the proxy layer, so the response carries no application-specific headers.
* **Concurrent limits** are enforced within the application itself, and that’s why the extra headers are present when they trigger.
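The header rule fits in a few lines. The sketch below uses the JDK's `java.net.http.HttpHeaders` instead of Spring's `HttpHeaders` so that it runs without dependencies; both look up names case-insensitively, which matters because HTTP header names are case-insensitive. `TooManyRequestsClassifier` and `LimitType` are hypothetical names, and the header values are placeholders.

```java theme={null}
import java.net.http.HttpHeaders;
import java.util.List;
import java.util.Map;

/** Hypothetical classifier for Phrase TMS 429 responses, following the header rule above. */
final class TooManyRequestsClassifier {
    enum LimitType { CONCURRENT, GLOBAL }

    /** Both headers present: concurrent limit (application layer); otherwise: global quota (proxy). */
    static LimitType classify(HttpHeaders headers) {
        boolean hasLimit = headers.firstValue("Ratelimit-Limit").isPresent();
        boolean hasRemaining = headers.firstValue("Ratelimit-Remaining").isPresent();
        return (hasLimit && hasRemaining) ? LimitType.CONCURRENT : LimitType.GLOBAL;
    }

    public static void main(String[] args) {
        // Lower-case names on purpose: java.net.http.HttpHeaders lookups ignore case
        HttpHeaders fromApplication = HttpHeaders.of(
                Map.of("ratelimit-limit", List.of("placeholder"),
                       "ratelimit-remaining", List.of("placeholder")),
                (name, value) -> true);
        HttpHeaders fromProxy = HttpHeaders.of(Map.of(), (name, value) -> true);

        System.out.println(classify(fromApplication)); // CONCURRENT
        System.out.println(classify(fromProxy));       // GLOBAL
    }
}
```

Requiring both headers, rather than either one, keeps a response that carries only one of them on the shorter global backoff.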

## Polling with logging

```java theme={null}
@Scheduled(fixedDelayString = "${phrase.poll.delay-ms:200}")
public void poll() {
    listProjects()
            .subscribe(
                    names -> log.info("Projects: {}", names),
                    error -> {
                        if (error instanceof RequestNotPermitted) {
                            log.warn("Rate limiter triggered: too many requests per minute");
                        } else if (error instanceof BulkheadFullException) {
                            log.warn("Bulkhead full: too many concurrent requests");
                        } else {
                            log.error("Unexpected error during project poll", error);
                        }
                    }
            );
}

```

This makes it clear in logs whether:

* The **RateLimiter** was tripped,
* The **Bulkhead** was full, or
* Some other error occurred.

## Summary

| Concern                     | Solution                                     |
| :-------------------------- | :------------------------------------------- |
| Staying within the per-minute quota | `RateLimiter`                        |
| Avoiding concurrency bursts | `Bulkhead`                                   |
| Global quota overshoot / 503 | `globalRetry` (exponential + jitter)        |
| Concurrency overshoot       | `concurrentRetry` (1 min + jitter)           |
| Logging & observability     | Explicit branches in `poll()` error handling |
| Non-blocking design         | Reactive `Mono`, no threads are blocked      |

## Final Result

With this setup you get a **fully reactive, header-aware, concurrency-safe client**:

* **Bulkhead** prevents overload before requests leave your service.
* **RateLimiter** enforces RPM quotas.
* **Two retry configs** match the backoff to the limit that was hit, and jitter keeps clients from retrying in lockstep.
* **Logs** distinguish client-side RateLimiter and Bulkhead rejections from other errors.