Phrase’s API enforces not only per-minute quotas but also concurrent request caps. You can stay under your 6000 RPM limit and still hit HTTP 429 Too Many Requests if too many requests run in parallel.
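Little's law ($L = \lambda W$: the average number of requests in flight equals the arrival rate times the average response time) shows why. As a purely illustrative assumption, take 3000 requests per minute (half the RPM budget) and an average response time of 2 s:

$$
L = \lambda W = \frac{3000}{60\,\text{s}} \times 2\,\text{s} = 50\,\text{s}^{-1} \times 2\,\text{s} = 100
$$

On average, 100 requests would be in flight at once. Any concurrency cap below 100, such as the example value of 50 used for the bulkhead in this article, would be exceeded while only half the per-minute quota is in use.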
In this article, we’ll extend our previous rate-limiting example with Bulkheads and two retry strategies:
- A short exponential backoff for global quota overshoots and transient failures.
- A long 1-minute backoff for concurrency overshoots, which Phrase signals through response headers.
Why Bulkheads?
Without concurrency control, bursts of parallel requests can:
- Trigger 429s even if the RPM (requests per minute) budget isn’t exhausted.
- Cause retry storms if every client retries in lockstep.
- Waste resources since failed concurrent requests still consume capacity.
A Bulkhead protects your service by limiting the number of simultaneous in-flight calls. Combined with a RateLimiter, it ensures you respect both Phrase’s RPM quota and its concurrent-request cap.
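Here is a minimal plain-Java sketch of a fail-fast bulkhead built on `java.util.concurrent.Semaphore`. The class name `SimpleBulkhead` is hypothetical, and this is not Resilience4j's code, although Resilience4j's default `Bulkhead` is also semaphore-based. The sketch mirrors a zero wait time: when no slot is free, the call is rejected instead of queued.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical fail-fast bulkhead: at most maxConcurrentCalls callers run at once. */
final class SimpleBulkhead {
    private final Semaphore permits;

    SimpleBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a slot is free; otherwise rejects immediately instead of waiting. */
    <T> T execute(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // free the slot even if the call throws
        }
    }

    /** Number of slots that are free right now. */
    int availableSlots() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SimpleBulkhead bulkhead = new SimpleBulkhead(1);
        String result = bulkhead.execute(() -> {
            // While this call holds the only slot, a second call is rejected at once
            try {
                return bulkhead.execute(() -> "second call ran");
            } catch (IllegalStateException e) {
                return "second call rejected: " + e.getMessage();
            }
        });
        System.out.println(result); // prints "second call rejected: bulkhead full"
    }
}
```

With a capacity of 1, the nested call stands in for a second concurrent request. It fails immediately rather than piling up, which is the behavior you want in front of an API that penalizes excess parallelism.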
Setup: RateLimiter + Bulkhead
We configure both quota- and concurrency-based protection:
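```java
import io.github.resilience4j.bulkhead.Bulkhead;
import io.github.resilience4j.bulkhead.BulkheadConfig;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import java.time.Duration;

// Per-minute rate limiter: requestsPerMinute permits, refreshed every minute
RateLimiterConfig rlConfig = RateLimiterConfig.custom()
        .timeoutDuration(Duration.ZERO)            // no free permit → reject immediately
        .limitRefreshPeriod(Duration.ofMinutes(1))
        .limitForPeriod(requestsPerMinute)
        .build();
this.rateLimiter = RateLimiter.of("apiLimiter", rlConfig);

// Bulkhead for concurrent calls
BulkheadConfig bhConfig = BulkheadConfig.custom()
        .maxConcurrentCalls(concurrentRequests)    // e.g. 50
        .maxWaitDuration(Duration.ZERO)            // fail fast
        .build();
this.bulkhead = Bulkhead.of("apiBulkhead", bhConfig);
```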
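- RateLimiter: caps requests per minute. With `timeoutDuration(Duration.ZERO)`, a call that finds no free permit fails immediately with `RequestNotPermitted`.
- Bulkhead: caps the number of parallel in-flight requests. With `maxWaitDuration(Duration.ZERO)`, a call that finds no free slot fails immediately with `BulkheadFullException`. Both limits are configurable via Spring; the example assumes `requestsPerMinute` and `concurrentRequests` variables hold them.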
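The RateLimiter configuration roughly describes a fixed-window scheme: `limitForPeriod` permits per `limitRefreshPeriod`, refilled when a new period starts, with immediate rejection when the timeout is zero. The plain-Java sketch below illustrates that idea. `FixedWindowLimiter` is a hypothetical class, not Resilience4j's implementation, and it takes an injectable clock so the window logic can be exercised without waiting a minute.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical fixed-window limiter: at most limitForPeriod permits per window of periodNanos. */
final class FixedWindowLimiter {
    private final int limitForPeriod;
    private final long periodNanos;
    private final LongSupplier clockNanos;
    private long currentWindow = Long.MIN_VALUE;
    private int remaining;

    FixedWindowLimiter(int limitForPeriod, long periodNanos, LongSupplier clockNanos) {
        this.limitForPeriod = limitForPeriod;
        this.periodNanos = periodNanos;
        this.clockNanos = clockNanos;
    }

    /** Grants a permit if one is left in the current window; never waits (timeout of zero). */
    synchronized boolean tryAcquire() {
        long window = Math.floorDiv(clockNanos.getAsLong(), periodNanos);
        if (window != currentWindow) { // a new period started: refill the permits
            currentWindow = window;
            remaining = limitForPeriod;
        }
        if (remaining == 0) {
            return false; // quota for this window is used up: reject immediately
        }
        remaining--;
        return true;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock in nanoseconds
        long minute = TimeUnit.MINUTES.toNanos(1);
        FixedWindowLimiter limiter = new FixedWindowLimiter(3, minute, () -> now[0]);
        for (int i = 1; i <= 4; i++) {
            System.out.println("call " + i + " permitted: " + limiter.tryAcquire());
        }
        now[0] = minute; // the next one-minute window begins
        System.out.println("after refresh permitted: " + limiter.tryAcquire());
    }
}
```

Running `main` prints `true` for the first three calls, `false` for the fourth, and `true` again once the clock enters the next window. In a real service the clock would be `System::nanoTime`. A fixed window can admit up to twice `limitForPeriod` requests within a short span that straddles a window boundary.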
Retry strategies
We define two RetryConfigs with different backoff functions:
```java
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import java.time.Duration;
import org.springframework.web.reactive.function.client.WebClientResponseException;

// Global quota or transient errors: exponential backoff with jitter
IntervalFunction globalBackoff =
        IntervalFunction.ofExponentialRandomBackoff(
                1000, // base 1 s
                2.0,  // multiplier
                0.5   // ±50% jitter
        );
RetryConfig globalRetryConfig = RetryConfig.custom()
        .maxAttempts(5)
        .intervalFunction(globalBackoff)
        .retryOnException(ex -> ex instanceof WebClientResponseException.TooManyRequests
                || ex instanceof WebClientResponseException.ServiceUnavailable)
        .build();
this.globalRetry = Retry.of("globalRetry", globalRetryConfig);

// Concurrency overshoot: long 1-minute backoff with jitter
IntervalFunction concurrentBackoff =
        IntervalFunction.ofRandomized(
                Duration.ofMinutes(1).toMillis(), 0.1); // ±10%
RetryConfig concurrentRetryConfig = RetryConfig.custom()
        .maxAttempts(3)
        .intervalFunction(concurrentBackoff)
        .retryOnException(ex -> ex instanceof WebClientResponseException.TooManyRequests)
        .build();
this.concurrentRetry = Retry.of("concurrentRetry", concurrentRetryConfig);
```
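- Global retry: up to 5 attempts in total (Resilience4j's `maxAttempts` includes the initial call), with exponential delays starting at 1 s and jittered by ±50% to avoid retry storms.
- Concurrent retry: up to 3 attempts with a fixed 1-minute delay jittered by ±10%, which is conservative and respectful of the concurrency cap.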
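The following plain-Java sketch shows the kind of delays `globalBackoff` produces. It assumes the nominal delay before retry n is base × multiplier^(n−1), spread uniformly by ±randomizationFactor. `BackoffSketch` is a hypothetical stand-in for Resilience4j's `IntervalFunction`, not a copy of it.

```java
import java.util.Random;

/** Hypothetical exponential-backoff-with-jitter calculator. */
final class BackoffSketch {
    /**
     * Delay before retry number {@code retry} (1-based): base * multiplier^(retry - 1),
     * spread uniformly over [nominal * (1 - f), nominal * (1 + f)).
     */
    static long jitteredDelayMillis(long baseMillis, double multiplier,
                                    double randomizationFactor, int retry, Random random) {
        double nominal = baseMillis * Math.pow(multiplier, retry - 1);
        double min = nominal * (1 - randomizationFactor);
        double max = nominal * (1 + randomizationFactor);
        return (long) (min + random.nextDouble() * (max - min));
    }

    public static void main(String[] args) {
        Random random = new Random();
        // maxAttempts(5) = 1 initial call + up to 4 retries, so at most 4 waits
        for (int retry = 1; retry <= 4; retry++) {
            long nominal = (long) (1000 * Math.pow(2.0, retry - 1));
            long delay = jitteredDelayMillis(1000, 2.0, 0.5, retry, random);
            System.out.printf("retry %d: nominal %d ms, jittered %d ms%n", retry, nominal, delay);
        }
    }
}
```

The nominal waits are 1 s, 2 s, 4 s and 8 s, and jitter spreads each one over ±50%, so clients that failed together do not retry together. With base 60000, multiplier 1.0 and factor 0.1, the same method yields the 54 to 66 s window of `concurrentBackoff`.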
The API call
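```java
import com.fasterxml.jackson.databind.JsonNode;
import io.github.resilience4j.reactor.bulkhead.operator.BulkheadOperator;
import io.github.resilience4j.reactor.ratelimiter.operator.RateLimiterOperator;
import io.github.resilience4j.reactor.retry.RetryOperator;
import io.github.resilience4j.retry.Retry;
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;
import org.springframework.http.HttpHeaders;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;

public Mono<List<String>> listProjects() {
    // The raw API call: first page of non-archived projects
    Supplier<Mono<List<String>>> call = () -> webClient.get()
            .uri(uriBuilder -> uriBuilder
                    .path("projects")
                    .queryParam("pageSize", 50)
                    .queryParam("pageNumber", 0)
                    .queryParam("includeArchived", false)
                    .build())
            .retrieve()
            .bodyToMono(JsonNode.class)
            .map(this::extractProjectNames);

    // Prevention: every attempt, including retries, passes the bulkhead and rate limiter
    Mono<List<String>> guarded = Mono.defer(call)
            .transformDeferred(BulkheadOperator.of(bulkhead))
            .transformDeferred(RateLimiterOperator.of(rateLimiter));

    // Recovery: choose the retry strategy from the status code and headers
    return guarded.onErrorResume(WebClientResponseException.class, ex -> {
        boolean tooMany = ex instanceof WebClientResponseException.TooManyRequests;
        if (!tooMany && !(ex instanceof WebClientResponseException.ServiceUnavailable)) {
            return Mono.error(ex); // not a limit or transient error: don't retry
        }
        HttpHeaders headers = ex.getHeaders();
        boolean concurrentLimit = tooMany && headers != null
                && headers.containsKey("Ratelimit-Limit")
                && headers.containsKey("Ratelimit-Remaining");
        // concurrent limit → long backoff; global limit or 503 → shorter backoff
        Retry retry = concurrentLimit ? concurrentRetry : globalRetry;
        // RetryOperator makes its first attempt immediately, so wait one interval first
        Duration firstWait = concurrentLimit ? Duration.ofMinutes(1) : Duration.ofSeconds(1);
        return Mono.delay(firstWait)
                .then(guarded.transformDeferred(RetryOperator.of(retry)));
    });
}
```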
- Both `BulkheadOperator` and `RateLimiterOperator` decorate the call.
- On `429 Too Many Requests`, the presence or absence of specific headers tells you which type of limit has been hit. This behavior is particular to Phrase TMS:
  - If the response includes `Ratelimit-Limit` and `Ratelimit-Remaining` headers, the concurrent request limit was exceeded.
  - If those headers are absent, the failure was caused by the global per-minute quota.
The distinction comes from how Phrase enforces limits internally:
- Global limits are applied at the proxy layer, so the response carries no application-specific headers.
- Concurrent limits are enforced within the application itself, which is why the extra headers are present when they trigger.
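The header rule can be isolated in a small, testable function. The plain-Java sketch below assumes the 429 response headers are available as a `Map<String, List<String>>`. `LimitClassifier` and `classify429` are hypothetical names, and header names are compared case-insensitively, as HTTP requires.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that tells which Phrase limit a 429 response points to. */
final class LimitClassifier {
    enum Limit { CONCURRENT, GLOBAL }

    /** Both concurrency headers present → CONCURRENT; otherwise the proxy-level GLOBAL quota. */
    static Limit classify429(Map<String, List<String>> headers) {
        // HTTP header names are case-insensitive, so copy them into a case-insensitive map
        Map<String, List<String>> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        if (headers != null) {
            for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
                // e.g. HttpURLConnection maps the status line to a null key; skip it
                if (entry.getKey() != null) {
                    byName.put(entry.getKey(), entry.getValue());
                }
            }
        }
        boolean concurrencyHeaders = byName.containsKey("Ratelimit-Limit")
                && byName.containsKey("Ratelimit-Remaining");
        return concurrencyHeaders ? Limit.CONCURRENT : Limit.GLOBAL;
    }

    public static void main(String[] args) {
        // Illustrative header values only
        Map<String, List<String>> concurrent = Map.of(
                "RateLimit-Limit", List.of("10"),
                "RateLimit-Remaining", List.of("0"));
        Map<String, List<String>> global = Map.of("Content-Type", List.of("application/json"));
        System.out.println(classify429(concurrent)); // prints CONCURRENT
        System.out.println(classify429(global));     // prints GLOBAL
    }
}
```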
Polling with logging
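```java
import io.github.resilience4j.bulkhead.BulkheadFullException;
import io.github.resilience4j.ratelimiter.RequestNotPermitted;
import org.springframework.scheduling.annotation.Scheduled;

// subscribe() returns immediately, so the next poll is scheduled without waiting
// for this request to finish; requests can overlap
@Scheduled(fixedDelayString = "${phrase.poll.delay-ms:200}")
public void poll() {
    listProjects()
            .subscribe(
                    names -> log.info("Projects: {}", names),
                    error -> {
                        if (error instanceof RequestNotPermitted) {
                            log.warn("Rate limiter triggered: too many requests per minute");
                        } else if (error instanceof BulkheadFullException) {
                            log.warn("Bulkhead full: too many concurrent requests");
                        } else {
                            log.error("Unexpected error during project poll", error);
                        }
                    }
            );
}
```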
This makes it clear in logs whether:
- The RateLimiter was tripped,
- The Bulkhead was full, or
- Some other error occurred.
Summary
| Concern | Solution |
|---|---|
| Avoiding per-minute quota overshoots | RateLimiter |
| Avoiding concurrency bursts | Bulkhead |
| Global overshoot / 503 Service Unavailable | globalRetry (exponential + jitter) |
| Concurrency overshoot | concurrentRetry (1 min + jitter) |
| Logging & observability | Explicit branches in poll() error handling |
| Non-blocking design | Reactive Mono; no threads are blocked |
Final Result
With this setup you get a fully reactive, header-aware, concurrency-safe client:
- Bulkhead prevents overload before requests leave your service.
- RateLimiter enforces RPM quotas.
- Two retry configs back off according to which limit was hit, with jitter so clients don't retry in lockstep.
- Logs clearly differentiate what kind of limit was exceeded.