This guide explains how to implement rate limit prevention and recovery in a fully reactive Java service, using:
  • Spring Boot @Scheduled polling (for illustration only)
  • WebClient for non-blocking HTTP
  • Resilience4j RateLimiter for client-side request throttling (a fixed budget of permits per refresh period)
  • Reactor Mono for reactive chaining

Why avoid hitting rate limits?

Rate-limited APIs (like Phrase’s) reject requests once you exceed a quota. For logged-in users at Phrase, that quota is 6000 requests per minute. Hitting the limit can cause:
  • HTTP 429 Too Many Requests responses
  • Retries that worsen load (retry storms)
  • Degraded service or, very rarely, temporary API bans
Best practice: stay within the quota and handle overshoots gracefully.

How to recover from hitting the limit

Even with controls in place, you might overshoot occasionally. You should:
  • Detect 429 responses
  • Retry once with jitter (random delay)
  • Suppress stack traces for expected rate-limit errors
  • Never block threads
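
The recovery steps above can be sketched without any framework. The following is a minimal, JDK-only illustration, not the Reactor code used later in this guide: RetryOnce, callWithOneRetry, jitteredDelay and TooManyRequestsException are hypothetical names introduced here, and the 429 is modelled as a plain exception. It retries exactly once, only on that exception, and schedules the second attempt on a timer instead of sleeping.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for an HTTP 429 failure (assumption, not a library type).
class TooManyRequestsException extends RuntimeException {
    TooManyRequestsException(String message) { super(message); }
}

class RetryOnce {

    // Picks a delay between base and base * (1 + jitterFactor), inclusive.
    static Duration jitteredDelay(Duration base, double jitterFactor) {
        long maxExtraMs = (long) (base.toMillis() * jitterFactor);
        long extraMs = maxExtraMs <= 0 ? 0 : ThreadLocalRandom.current().nextLong(maxExtraMs + 1);
        return base.plusMillis(extraMs);
    }

    // Runs the call; on a 429 it schedules exactly one more attempt on a timer
    // instead of sleeping, so the calling thread is never blocked.
    static <T> CompletableFuture<T> callWithOneRetry(
            Supplier<CompletableFuture<T>> call, Duration baseDelay, double jitterFactor) {
        return call.get()
                .<CompletableFuture<T>>handle((value, error) -> {
                    if (error == null) {
                        return CompletableFuture.completedFuture(value);
                    }
                    // Dependent stages wrap failures in CompletionException; unwrap once.
                    Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                            ? error.getCause() : error;
                    if (!(cause instanceof TooManyRequestsException)) {
                        return CompletableFuture.failedFuture(cause);   // not a 429: do not retry
                    }
                    long delayMs = jitteredDelay(baseDelay, jitterFactor).toMillis();
                    Executor timer = CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS);
                    return CompletableFuture.supplyAsync(call, timer).thenCompose(f -> f);
                })
                .thenCompose(f -> f);
    }
}
```

If the second attempt also fails, the returned future completes exceptionally and the caller decides how to log it.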

Our Example

This component polls the Phrase API periodically, respecting the rate limit and logging project names reactively. Listing project names is only an example; it was used while writing this article to verify that the code works as intended.

⚙️ WebClient + Resilience4j Setup

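// Imports used by the snippets in this guide
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
import com.fasterxml.jackson.databind.JsonNode;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.ratelimiter.RequestNotPermitted;
import io.github.resilience4j.reactor.ratelimiter.operator.RateLimiterOperator;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

// Logger used by poll(); the original snippets do not show its declaration, so SLF4J is an assumption
private static final Logger log = LoggerFactory.getLogger(ScheduledPoller.class);

private final WebClient webClient;
private final RateLimiter rateLimiter;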

public ScheduledPoller(
        @Value("${phrase.base-url:https://cloud.memsource.com/web/api2/v1/}") String baseUrl,
        @Value("${phrase.api-token}") String apiToken,
        @Value("${phrase.rate-limit.rpm:6000}") int requestsPerMinute,
        WebClient.Builder builder
) {
    this.webClient = builder
            .baseUrl(baseUrl)
            .defaultHeader("Authorization", "ApiToken " + apiToken)
            .defaultHeader("Accept", "application/json")
            .build();

    RateLimiterConfig config = RateLimiterConfig.custom()
            .timeoutDuration(Duration.ofMillis(0))            // fail fast
            .limitRefreshPeriod(Duration.ofMinutes(1))        // 1-minute refill window
            .limitForPeriod(requestsPerMinute)                // e.g. 6000 RPM
            .build();

    rateLimiter = RateLimiter.of("apiLimiter", config);
}

🔍 Key points:
  • The limiter allows requestsPerMinute API calls per minute (configurable).
  • It fails immediately if the quota is exhausted (no queueing or waiting).
  • No new threads are created — everything stays non-blocking.
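
To make the fail-fast behaviour in the key points above concrete, here is a deliberately simplified, JDK-only sketch of a limiter that hands out a fixed budget of permits per refresh period and never waits. FixedWindowLimiter is a hypothetical class written for this illustration; it is not Resilience4j's implementation, which is lock-free and more sophisticated. The injectable clock is also an assumption, made so the behaviour can be checked deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified limiter: a fixed budget of permits per window, no waiting.
class FixedWindowLimiter {
    private final int limitForPeriod;
    private final long periodNanos;
    private final LongSupplier nanoClock;   // injectable so the behaviour can be tested
    private long windowStart;
    private int used;

    FixedWindowLimiter(int limitForPeriod, long periodNanos, LongSupplier nanoClock) {
        this.limitForPeriod = limitForPeriod;
        this.periodNanos = periodNanos;
        this.nanoClock = nanoClock;
        this.windowStart = nanoClock.getAsLong();
    }

    // Returns true if a permit was available; never blocks or queues.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        long elapsed = now - windowStart;
        if (elapsed >= periodNanos) {
            // Move to the start of the window that contains 'now' and reset the budget.
            windowStart += (elapsed / periodNanos) * periodNanos;
            used = 0;
        }
        if (used >= limitForPeriod) {
            return false;                   // budget spent: fail fast
        }
        used++;
        return true;
    }
}
```

One consequence of a per-period budget is worth keeping in mind: with a one-minute period, the whole budget can be spent in a burst at the start of the minute. If the API also enforces shorter-term limits, a shorter refresh period with a proportionally smaller budget may be a better fit.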

The API call method

public Mono<List<String>> listProjects() {
    return webClient.get()
            .uri(u -> u.path("projects")
                    .queryParam("pageSize", 50)
                    .queryParam("pageNumber", 0)
                    .queryParam("includeArchived", false)
                    .build())
            .retrieve()
            .bodyToMono(JsonNode.class)
            .transformDeferred(RateLimiterOperator.of(rateLimiter))
            .map(this::extractProjectNames)
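            .retryWhen(
                    // Retry.backoff instead of Retry.fixedDelay: Reactor keeps the jittered delay
                    // between the min and max backoff, which are equal for fixedDelay, so
                    // .jitter(0.5) would have no effect there.
                    Retry.backoff(1, Duration.ofSeconds(1))      // retry once, after about 1 s
                         .jitter(0.5)                            // plus up to 50% random extra delay
                         .filter(ex -> ex instanceof WebClientResponseException.TooManyRequests)
            );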
}

🔍 Highlights:
  • Applies the rate limiter reactively using transformDeferred(...).
  • Uses retryWhen(...) to retry only on 429 errors.
  • Adds jitter to avoid retry storms.

JSON → Project name extraction

private List<String> extractProjectNames(JsonNode json) {
    return StreamSupport.stream(json.path("content").spliterator(), false)
            .map(p -> p.path("name").asText())
            .collect(Collectors.toList());
}

🔍 Extracts each project's "name" from the "content" array in the JSON response. It lives in its own method purely for readability.

Polling logic with logging & error handling

@Scheduled(fixedDelayString = "${phrase.poll.delay-ms:100}")
public void poll() {
    listProjects()
        .subscribe(
            names -> log.info("Projects: {}", names),
            error -> {
                if (error instanceof RequestNotPermitted) {
                    log.warn("Rate limit exceeded - are you too fast for Phrase?");
                } else {
                    log.error("Unexpected error during project poll", error);
                }
            }
        );
}

🔍 Explanation:
  • The poller runs every 100 milliseconds by default (configurable via phrase.poll.delay-ms). That is roughly 600 requests per minute, safely within the 6000 RPM limit, but it is primarily for demonstration. Real-world applications will typically trigger API requests based on actual events, workflows, or user actions, not by polling a static endpoint in a tight loop.
  • Subscribes to the Mono<List<String>> returned by listProjects().
  • Handles:
      • RequestNotPermitted: client-side rate limit exceeded (no permits left)
      • Other exceptions (e.g. HTTP errors)
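
The 100 ms default can be checked against the quota with simple arithmetic. The helper below is a hypothetical sketch (PollBudget and its method names are not part of the article's class). It treats the delay as the time between consecutive polls, which is a reasonable approximation here because poll() returns right after subscribing, and it ignores the extra request a retry may add.

```java
class PollBudget {

    // Upper bound on polls per minute for a given fixed delay between polls.
    static long maxPollsPerMinute(long delayMs) {
        if (delayMs <= 0) {
            throw new IllegalArgumentException("delayMs must be positive");
        }
        return 60_000L / delayMs;
    }

    // Smallest whole-millisecond delay that keeps a single poller within the quota.
    static long minDelayMs(long requestsPerMinute) {
        if (requestsPerMinute <= 0) {
            throw new IllegalArgumentException("requestsPerMinute must be positive");
        }
        return (60_000L + requestsPerMinute - 1) / requestsPerMinute;   // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(maxPollsPerMinute(100));   // prints 600
        System.out.println(minDelayMs(6000));         // prints 10
    }
}
```

So a single poller at the default delay uses about a tenth of the 6000 RPM quota; the budget only becomes tight once several instances or other callers share the same token.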

What happens when the limit is hit?

There are two possible failure scenarios:

1. Client-side limit exceeded

  • The RateLimiter detects that no permits are left in the current refresh period.
  • It immediately fails with RequestNotPermitted.
  • The subscribe() block catches it and logs:
Rate limit exceeded - are you too fast for Phrase?

✔️ No thread is blocked
✔️ No request is sent
✔️ No stack trace is logged

2. Server returns HTTP 429

  • The server says “too many requests” via a 429 Too Many Requests response.
  • The .retryWhen(...) block triggers a single retry after a jittered delay.
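  • If the retry fails as well, Reactor reports a retry-exhausted error whose cause is the 429, and poll() logs it through the generic error branch, stack trace included.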
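
Because an exhausted retry surfaces as a wrapper exception with the original failure as its cause, a handler that wants to treat both scenarios as expected rate-limit events has to look through the cause chain. The following JDK-only sketch shows one way to do that; RateLimitErrors, Kind and classify are hypothetical names, and the exception types are passed in as parameters so the example stays free of library dependencies.

```java
class RateLimitErrors {

    enum Kind { CLIENT_LIMIT, SERVER_LIMIT, OTHER }

    // Walks the cause chain and reports the first rate-limit related exception it finds.
    static Kind classify(Throwable error,
                         Class<? extends Throwable> clientLimitType,
                         Class<? extends Throwable> serverLimitType) {
        int depth = 0;
        // The depth cap guards against unusually long or cyclic cause chains.
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (clientLimitType.isInstance(t)) {
                return Kind.CLIENT_LIMIT;
            }
            if (serverLimitType.isInstance(t)) {
                return Kind.SERVER_LIMIT;
            }
        }
        return Kind.OTHER;
    }
}
```

In the poller from this guide, the call might look like RateLimitErrors.classify(error, RequestNotPermitted.class, WebClientResponseException.TooManyRequests.class), logging a one-line warning for the two rate-limit kinds and keeping the full stack trace for everything else.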

Summary

Concern                    | This example handles it with
Avoiding the rate limit    | RateLimiterOperator with RPM config
Failing fast on quota hit  | .timeoutDuration(Duration.ofMillis(0))
Retrying 429s              | .retryWhen(...).jitter(...).filter(...)
Logging gracefully         | subscribe(…, error -> log.warn(…))
Staying non-blocking       | Fully reactive: WebClient + Mono + Operator

Final Result

A minimal but robust setup for:
  • Scheduled polling
  • Reactive rate limiting
  • Retry and error handling
  • Clean logs and no blocking
You can drop this class into any Spring Boot app that uses WebClient, provided scheduling is enabled (@EnableScheduling) and the Resilience4j Reactor module is on the classpath.