Handling API Rate Limits

This guide explains how to implement rate limit prevention and recovery in a fully reactive Java service, using:

Spring Boot @Scheduled polling (for illustration only)
WebClient for non-blocking HTTP
Resilience4j RateLimiter for token-bucket control
Reactor Mono for reactive chaining

Why avoid hitting rate limits?

Rate-limited APIs (like Phrase’s) reject requests once you exceed a quota. For logged-in Phrase users, the quota is 6000 requests per minute. Hitting the limit can cause:

HTTP 429 Too Many Requests responses
Retries that worsen load (retry storms)
Degraded service or, in rare cases, a temporary API ban

Best practice: stay within the quota and handle overshoots gracefully.

How to recover from hitting the limit

Even with controls in place, you might overshoot occasionally. You should:

Detect 429 responses
Retry once with jitter (random delay)
Suppress stack traces for expected rate-limit errors
Never block threads
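
To make the jitter step concrete, here is a minimal sketch of one common way to compute a jittered delay in plain Java. The names JitterSketch and jitteredDelayMillis are hypothetical and not part of the example in this guide, and the formula (base delay plus a random extra of up to factor × base) is an assumption for illustration; a retry library may compute its jitter differently.

```java
import java.util.Random;

/** Illustrative helper (hypothetical, not part of the guide's poller class). */
final class JitterSketch {

    private JitterSketch() {}

    /**
     * Returns baseMillis plus a random extra of at most baseMillis * factor.
     * With baseMillis = 1000 and factor = 0.5 the result lies in [1000, 1500].
     */
    static long jitteredDelayMillis(long baseMillis, double factor, Random random) {
        if (baseMillis < 0 || factor < 0.0) {
            throw new IllegalArgumentException("baseMillis and factor must be non-negative");
        }
        long maxExtra = Math.round(baseMillis * factor);
        // nextDouble() is in [0.0, 1.0), so after rounding the extra is in [0, maxExtra]
        long extra = Math.round(random.nextDouble() * maxExtra);
        return baseMillis + extra;
    }
}
```

Because each client draws its own random extra, clients that were rejected at the same moment are unlikely to retry at the same moment.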

Our Example

This component polls the Phrase API periodically, respects the rate limit, and logs project names reactively. Listing project names is only an example; it was used while writing this article to verify that the code works as intended.

⚙️ WebClient + Resilience4j Setup

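// Imports used by the snippets in this guide
// (`log` in the later snippets is assumed to be an SLF4J logger)
import com.fasterxml.jackson.databind.JsonNode;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.ratelimiter.RequestNotPermitted;
import io.github.resilience4j.reactor.ratelimiter.operator.RateLimiterOperator;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

private final WebClient webClient;
private final RateLimiter rateLimiter;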

public ScheduledPoller(
        @Value("${phrase.base-url:https://cloud.memsource.com/web/api2/v1/}") String baseUrl,
        @Value("${phrase.api-token}") String apiToken,
        @Value("${phrase.rate-limit.rpm:6000}") int requestsPerMinute,
        WebClient.Builder builder
) {
    this.webClient = builder
            .baseUrl(baseUrl)
            .defaultHeader("Authorization", "ApiToken " + apiToken)
            .defaultHeader("Accept", "application/json")
            .build();

    RateLimiterConfig config = RateLimiterConfig.custom()
            .timeoutDuration(Duration.ofMillis(0))            // fail fast
            .limitRefreshPeriod(Duration.ofMinutes(1))        // 1-minute refill window
            .limitForPeriod(requestsPerMinute)                // e.g. 6000 RPM
            .build();

    this.rateLimiter = RateLimiter.of("apiLimiter", config);
}

🔍 Key points:

The limiter allows up to requestsPerMinute API calls per one-minute refresh period (configurable via phrase.rate-limit.rpm, default 6000).
It fails immediately if the quota is exhausted (no queueing or waiting).
No new threads are created — everything stays non-blocking.
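
The fail-fast behaviour described in these key points can be illustrated without any library. The following is a simplified, hypothetical stand-in: FailFastWindowLimiter is not a Resilience4j class and does not reproduce its internals. It only assumes the semantics listed above, namely a fixed number of permits per refresh period and an immediate rejection once they are used up.

```java
import java.util.function.LongSupplier;

/**
 * Simplified stand-in for a fail-fast limiter (hypothetical; NOT Resilience4j's implementation).
 * Grants up to limitForPeriod permits per refresh period and never waits for a permit.
 */
final class FailFastWindowLimiter {

    private final int limitForPeriod;
    private final long periodNanos;
    private final LongSupplier nanoClock; // injected so the behaviour can be checked without waiting
    private long windowStart;
    private int used;

    FailFastWindowLimiter(int limitForPeriod, long periodNanos, LongSupplier nanoClock) {
        this.limitForPeriod = limitForPeriod;
        this.periodNanos = periodNanos;
        this.nanoClock = nanoClock;
        this.windowStart = nanoClock.getAsLong();
    }

    /** Returns true if a permit was granted, false immediately if the period's quota is spent. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        if (now - windowStart >= periodNanos) {
            // A new refresh period has started: restore the full allowance.
            windowStart = now;
            used = 0;
        }
        if (used < limitForPeriod) {
            used++;
            return true;
        }
        return false; // fail fast: no queueing, no sleeping
    }
}
```

In the setup shown in this guide, a rejected call surfaces as a RequestNotPermitted error in the reactive chain rather than as a false return value.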

The API call method

public Mono<List<String>> listProjects() {
    return webClient.get()
            .uri(u -> u.path("projects")
                    .queryParam("pageSize", 50)
                    .queryParam("pageNumber", 0)
                    .queryParam("includeArchived", false)
                    .build())
            .retrieve()
            .bodyToMono(JsonNode.class)
            .transformDeferred(RateLimiterOperator.of(rateLimiter))
            .map(this::extractProjectNames)
            .retryWhen(
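                    // backoff(...) instead of fixedDelay(...): fixedDelay sets min and max
                    // backoff to the same value, which leaves no room for jitter
                    Retry.backoff(1, Duration.ofSeconds(1))      // retry once after about 1 s
                         .jitter(0.5)                            // up to 50% jitter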
                         .filter(ex -> ex instanceof WebClientResponseException.TooManyRequests)
            );
}

🔍 Highlights:

Applies the rate limiter reactively using transformDeferred(...).
Uses retryWhen(...) to retry only on 429 errors.
Adds jitter to avoid retry storms.

JSON → Project name extraction

private List<String> extractProjectNames(JsonNode json) {
    return StreamSupport.stream(json.path("content").spliterator(), false)
            .map(p -> p.path("name").asText())
            .collect(Collectors.toList());
}

🔍 Extracts each "name" value from the "content" array in the JSON response. It is kept as a separate method purely for readability.

Polling logic with logging & error handling

@Scheduled(fixedDelayString = "${phrase.poll.delay-ms:100}")
public void poll() {
    listProjects()
        .subscribe(
            names -> log.info("Projects: {}", names),
            error -> {
                if (error instanceof RequestNotPermitted) {
                    log.warn("Rate limit exceeded - are you too fast for Phrase?");
                } else {
                    log.error("Unexpected error during project poll", error);
                }
            }
        );
}

🔍 Explanation:

The poller runs every 100 milliseconds by default (configurable via phrase.poll.delay-ms). That amounts to roughly 600 requests per minute, well within the 6000-per-minute limit. The tight loop is for demonstration only: real-world applications typically trigger API requests from actual events, workflows, or user actions rather than by polling a static endpoint.
Subscribes to the Mono<List<String>> returned by listProjects().
Handles:
RequestNotPermitted — client-side rate limit exceeded (token bucket empty)
Other exceptions (e.g. HTTP errors)
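
The relationship between the poll delay and the quota is simple arithmetic, sketched below. PollRateSketch and its method names are hypothetical helpers, not part of the poller, and the figures ignore retries (each poll may add at most one retry request in this example).

```java
/** Illustrative arithmetic only (hypothetical helper, not part of the guide's poller class). */
final class PollRateSketch {

    private static final long MILLIS_PER_MINUTE = 60_000L;

    private PollRateSketch() {}

    /** Upper bound on polls per minute for a given fixed delay between polls. */
    static long maxPollsPerMinute(long delayMillis) {
        if (delayMillis <= 0) {
            throw new IllegalArgumentException("delayMillis must be positive");
        }
        return MILLIS_PER_MINUTE / delayMillis;
    }

    /** Smallest whole-millisecond delay that keeps the poll rate at or below the quota. */
    static long minDelayMillis(long requestsPerMinute) {
        if (requestsPerMinute <= 0) {
            throw new IllegalArgumentException("requestsPerMinute must be positive");
        }
        // Ceiling division, so the resulting rate never exceeds the quota.
        return (MILLIS_PER_MINUTE + requestsPerMinute - 1) / requestsPerMinute;
    }
}
```

With the defaults from this guide, maxPollsPerMinute(100) is 600 and minDelayMillis(6000) is 10, so the default delay leaves a tenfold margin.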

What happens when the limit is hit?

There are two possible failure scenarios:

1. Client-side limit exceeded

The RateLimiter detects that no tokens are left.
It immediately fails with RequestNotPermitted.
The subscribe() block catches it and logs:

Rate limit exceeded - are you too fast for Phrase?

✔️ No thread is blocked
✔️ No request is sent
✔️ No stack trace is logged

2. Server returns HTTP 429

The server says “too many requests” via a 429 Too Many Requests response.
The .retryWhen(...) block triggers a single retry after a jittered delay.
If the retry also returns 429, the error reaches the subscribe() error handler and is logged by the log.error branch, stack trace included.
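
The retry-once flow can also be sketched without Reactor. The version below deliberately swaps in java.util.concurrent.CompletableFuture (Java 9 or later) for Mono and retryWhen, so it is not the code used by this guide. RetryOnceSketch, retryOnce, and TooManyRequestsException are hypothetical names, and the sketch assumes the supplied call returns a future instead of throwing synchronously.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical stand-in for an HTTP 429 error. */
final class TooManyRequestsException extends RuntimeException {
    TooManyRequestsException() {
        super("429 Too Many Requests");
    }
}

final class RetryOnceSketch {

    private RetryOnceSketch() {}

    /** Runs the call; if it fails with a retryable error, runs it once more after delayMillis. */
    static <T> CompletableFuture<T> retryOnce(Supplier<CompletableFuture<T>> call,
                                              Predicate<Throwable> retryable,
                                              long delayMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        call.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
                return;
            }
            Throwable cause = unwrap(error);
            if (!retryable.test(cause)) {
                result.completeExceptionally(cause); // not a rate-limit error: no retry
                return;
            }
            // Schedule the single retry on a delayed executor instead of sleeping on this thread.
            Executor delayed = CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS);
            delayed.execute(() -> call.get().whenComplete((retryValue, retryError) -> {
                if (retryError == null) {
                    result.complete(retryValue);
                } else {
                    result.completeExceptionally(unwrap(retryError)); // second failure is final
                }
            }));
        });
        return result;
    }

    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }
}
```

A caller would pass a predicate such as e -> e instanceof TooManyRequestsException, which mirrors the filter(...) step in the reactive version.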

Summary

Concern	This Example Handles It With
Avoiding rate limit	`RateLimiterOperator` with RPM config
Failing fast on quota hit	`.timeoutDuration(Duration.ofMillis(0))`
Retrying 429s	`.retryWhen(...).jitter(...).filter(...)`
Logging gracefully	`subscribe(…, error -> log.warn(…))`
Staying non-blocking	Fully reactive: WebClient + Mono + Operator

Final Result

A minimal but robust setup for:

Scheduled polling
Reactive rate limiting
Retry and error handling
Clean logs and no blocking

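You can drop this class into any Spring Boot app that uses WebClient, provided scheduling is enabled with @EnableScheduling, the resilience4j-reactor module is on the classpath, and the phrase.api-token property is set.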
