Rate Limit Annotation for Spring Boot (Token Bucket)

Custom @RateLimit annotation that throttles Spring Boot endpoints per-IP using a thread-safe token bucket. No external dependencies.

Tested on Spring Boot 3.4, Java 21
Published Dec 1, 2025. Updated May 25, 2026.

@RateLimit is a method-level annotation that throttles Spring Boot controller methods using a thread-safe token bucket. The aspect intercepts annotated methods, keys each bucket by client IP plus method signature, and throws RateLimitExceededException when the bucket is empty. There are no dependencies beyond Spring AOP (spring-boot-starter-aop) and standard java.util.concurrent primitives.

When to Use This

  • Public endpoints that need basic abuse protection without adding Bucket4j or Redis
  • Internal admin tools where you want one annotation, one parameter, done
  • Per-endpoint quotas like "max 5 OTP requests per minute"
  • Single-instance services or dev environments where in-memory counters are fine

Don't use this when you run multiple replicas behind a load balancer. Each pod keeps its own bucket, so a 10-rps limit becomes 10 * pod_count. For distributed limiting, swap the ConcurrentHashMap for a Redis-backed Bucket4j proxy bucket.

Code

package com.example.demo.aspect;
 
import jakarta.servlet.http.HttpServletRequest;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;
 
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
 
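// RateLimit.java: each public type below needs its own file in the same package.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimit {
    int limit() default 10;                    // bucket capacity, i.e. the largest allowed burst
    long duration() default 60;                // time over which `limit` tokens are refilled
    TimeUnit unit() default TimeUnit.SECONDS;  // unit of `duration`
}
 
// RateLimitAspect.java
@Aspect
@Component
public class RateLimitAspect {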
 
    private final Map<String, TokenBucket> buckets = new ConcurrentHashMap<>();
 
    @Around("@annotation(rateLimit)")
    public Object enforceRateLimit(ProceedingJoinPoint joinPoint, RateLimit rateLimit) throws Throwable {
        HttpServletRequest request =
            ((ServletRequestAttributes) RequestContextHolder.currentRequestAttributes()).getRequest();
        String ip = clientIp(request);
        String key = ip + ":" + joinPoint.getSignature().toShortString();
 
        TokenBucket bucket = buckets.computeIfAbsent(key, k ->
            new TokenBucket(rateLimit.limit(), rateLimit.duration(), rateLimit.unit()));
 
        if (!bucket.tryConsume()) {
            throw new RateLimitExceededException(
                "Too many requests. Please try again later.");
        }
        return joinPoint.proceed();
    }
 
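    // Caution: X-Forwarded-For is supplied by the caller, so rely on it only behind a proxy you trust.
    private static String clientIp(HttpServletRequest request) {
        String forwarded = request.getHeader("X-Forwarded-For");
        if (forwarded != null && !forwarded.isBlank()) {
            // The leftmost entry is the claimed original client.
            return forwarded.split(",")[0].trim();
        }
        return request.getRemoteAddr();
    }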
 
    private static class TokenBucket {
        private final long capacity;
        private final long refillPeriodNanos;
        private double tokens;
        private long lastRefillTime;
 
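        TokenBucket(long capacity, long duration, TimeUnit unit) {
            if (capacity <= 0 || duration <= 0) {
                throw new IllegalArgumentException("limit and duration must be positive");
            }
            this.capacity = capacity;
            // Nanoseconds needed to earn one token; at least 1 so refill() never divides by zero.
            this.refillPeriodNanos = Math.max(1, unit.toNanos(duration) / capacity);
            this.tokens = capacity;
            this.lastRefillTime = System.nanoTime();
        }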
 
        synchronized boolean tryConsume() {
            refill();
            if (tokens >= 1) {
                tokens--;
                return true;
            }
            return false;
        }
 
        private void refill() {
            long now = System.nanoTime();
            long elapsed = now - lastRefillTime;
            if (elapsed > 0) {
                tokens = Math.min(capacity, tokens + (double) elapsed / refillPeriodNanos);
                lastRefillTime = now;
            }
        }
    }
 
    public static class RateLimitExceededException extends RuntimeException {
        public RateLimitExceededException(String message) { super(message); }
    }
}

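The token bucket refills continuously: on each call, the time elapsed since the previous call is converted into fractional tokens, capped at capacity. A client that has been quiet can therefore send a burst of up to limit requests at once; after that, requests pass at a sustained rate of one per duration / limit. refillPeriodNanos is derived from those two values, so lower limit (the burst size) or raise duration if you want stricter spacing.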
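To make the burst-then-throttle behaviour concrete, here is a small stand-alone sketch of the same refill arithmetic. It is not part of the aspect: TokenBucketDemo and countAllowed are illustrative names, and the clock is passed in as a parameter (instead of calling System.nanoTime()) so the numbers are reproducible.

```java
import java.util.concurrent.TimeUnit;

/** Stand-alone copy of the refill arithmetic, driven by a caller-supplied clock. */
class TokenBucketDemo {

    private final long capacity;
    private final long refillPeriodNanos; // nanoseconds needed to earn one token
    private double tokens;
    private long lastRefillTime;

    TokenBucketDemo(long capacity, long duration, TimeUnit unit, long startNanos) {
        this.capacity = capacity;
        this.refillPeriodNanos = unit.toNanos(duration) / capacity;
        this.tokens = capacity;
        this.lastRefillTime = startNanos;
    }

    /** Same logic as tryConsume() in the aspect, but the caller supplies the current time. */
    boolean tryConsume(long nowNanos) {
        long elapsed = nowNanos - lastRefillTime;
        if (elapsed > 0) {
            tokens = Math.min(capacity, tokens + (double) elapsed / refillPeriodNanos);
            lastRefillTime = nowNanos;
        }
        if (tokens >= 1) {
            tokens--;
            return true;
        }
        return false;
    }

    /** Fires `attempts` requests at the same instant and counts how many pass. */
    static int countAllowed(TokenBucketDemo bucket, int attempts, long nowNanos) {
        int allowed = 0;
        for (int i = 0; i < attempts; i++) {
            if (bucket.tryConsume(nowNanos)) {
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        // limit = 5 per minute, so one token is earned every 12 seconds.
        TokenBucketDemo bucket = new TokenBucketDemo(5, 1, TimeUnit.MINUTES, 0L);
        System.out.println(countAllowed(bucket, 8, 0L));                             // 5: the initial burst
        System.out.println(countAllowed(bucket, 3, TimeUnit.SECONDS.toNanos(12)));   // 1: one token refilled
        System.out.println(countAllowed(bucket, 3, TimeUnit.SECONDS.toNanos(18)));   // 0: only half a token so far
        System.out.println(countAllowed(bucket, 9, TimeUnit.SECONDS.toNanos(600)));  // 5: refill is capped at capacity
    }
}
```

With limit = 5 and duration = 1 minute, the first five requests pass immediately, and from then on roughly one request every 12 seconds gets through, no matter how long the client waits beyond a full minute.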

Usage

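import com.example.demo.aspect.RateLimit;
import com.example.demo.aspect.RateLimitAspect;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
 
import java.util.List;
import java.util.concurrent.TimeUnit;
 
// Product, Order, productService and orderService stand in for your own application types.
@RestController
@RequestMapping("/api")
public class ProductController {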
 
    @GetMapping("/products")
    @RateLimit(limit = 5, duration = 1, unit = TimeUnit.MINUTES)
    public List<Product> getProducts() {
        return productService.findAll();
    }
 
    @PostMapping("/orders")
    @RateLimit(limit = 1, duration = 10, unit = TimeUnit.SECONDS)
    public Order createOrder(@RequestBody Order order) {
        return orderService.create(order);
    }
}
 
// Map the exception to 429 Too Many Requests
@ControllerAdvice
public class RateLimitAdvice {
    @ExceptionHandler(RateLimitAspect.RateLimitExceededException.class)
    public ResponseEntity<String> handle(RateLimitAspect.RateLimitExceededException ex) {
        return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS).body(ex.getMessage());
    }
}

Without the @ControllerAdvice mapping, the exception bubbles as a 500. Wire it to 429 so well-behaved clients back off automatically.

Pitfalls

  • In-memory counters do not scale across pods. A 10-rps limit across 3 pods is effectively 30 rps. Move to Redis-backed Bucket4j as soon as you scale horizontally.
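  • getRemoteAddr lies behind a proxy, and X-Forwarded-For can be forged. Honor X-Forwarded-For only when the request arrives from a proxy you trust, because any client can set the header and pick its own bucket. Ignoring the header entirely has the opposite problem: every request looks like it comes from the load balancer.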
  • The buckets map never evicts. A long-running service that sees many unique IPs leaks memory. For production, replace the map with a Caffeine cache configured with expireAfterAccess(1, TimeUnit.HOURS), which adds one dependency.
  • Burst behaviour is intentional. Token buckets allow short bursts up to capacity. If your endpoint cannot tolerate that, use a fixed-window or sliding-window algorithm instead.
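  • synchronized is fine for low contention. Each bucket has its own lock, so contention arises only when one client hammers one endpoint. If profiling shows that lock is hot, keep the token count and the refill timestamp together in one immutable state object and update it with AtomicReference.compareAndSet. LongAdder is a poor fit here because it cannot atomically check and decrement.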
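If you would rather stay dependency-free, the eviction pitfall above can also be handled with a periodic sweep over the map. The following is a minimal sketch under that assumption, not a drop-in replacement for Caffeine: IdleEvictingBuckets, get, and sweep are hypothetical names, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical helper: a keyed store whose entries are dropped after being idle too long. */
class IdleEvictingBuckets<V> {

    private static final class Entry<V> {
        final V value;
        volatile long lastAccessNanos;

        Entry(V value, long nowNanos) {
            this.value = value;
            this.lastAccessNanos = nowNanos;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long maxIdleNanos;

    IdleEvictingBuckets(long maxIdleNanos) {
        this.maxIdleNanos = maxIdleNanos;
    }

    /** Returns the value for the key, creating it if absent, and records the access time. */
    V get(String key, Supplier<V> factory, long nowNanos) {
        Entry<V> entry = entries.computeIfAbsent(key, k -> new Entry<>(factory.get(), nowNanos));
        entry.lastAccessNanos = nowNanos;
        return entry.value;
    }

    /** Removes every entry that has not been accessed for longer than maxIdleNanos. */
    void sweep(long nowNanos) {
        entries.values().removeIf(e -> nowNanos - e.lastAccessNanos > maxIdleNanos);
    }

    int size() {
        return entries.size();
    }
}
```

In the aspect, get would replace buckets.computeIfAbsent with System.nanoTime() as the clock, and sweep would run from a scheduled task such as a ScheduledExecutorService. A request that races with a sweep may keep using a bucket that was just removed, so the next request from that client starts with a fresh, full bucket; for entries that have been idle far longer than the refill window, that makes no practical difference.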

Frequently Asked Questions

When should you use this annotation instead of Bucket4j or Resilience4j?

Use this when you want a single-instance, per-IP throttle with zero new dependencies, typically for internal tools or low-traffic public endpoints. For multi-instance services that need a shared counter, switch to Bucket4j with a Redis backend. Resilience4j RateLimiter is a maintained in-process alternative, but like this snippet it keeps its state in memory, so each pod still gets its own counter.

How do I get the real client IP behind a load balancer?

Behind a proxy, `HttpServletRequest#getRemoteAddr()` returns the proxy's address, not the user's. Set `server.forward-headers-strategy=native` in application.properties so the servlet container processes the forwarded headers, or set it to `framework` to register Spring's `ForwardedHeaderFilter`. Without one of these, every request from your ALB or CloudFront edge appears to come from the same IP, and all users end up sharing a single bucket.
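If you parse the header yourself, as the clientIp helper in the Code section does, it is safer to accept it only from known proxies and to read it from right to left, because the leftmost entries can be written by the client. The sketch below illustrates that idea with plain strings; ClientIpResolver, resolve, and the trustedProxies set are hypothetical names, and exact-match address comparison is a simplifying assumption (real deployments usually match CIDR ranges).

```java
import java.util.Set;

/** Hypothetical helper: picks the client address from X-Forwarded-For, trusting only known proxies. */
class ClientIpResolver {

    static String resolve(String remoteAddr, String forwardedFor, Set<String> trustedProxies) {
        // The header is only meaningful when the direct peer is one of our own proxies.
        if (forwardedFor == null || forwardedFor.isBlank() || !trustedProxies.contains(remoteAddr)) {
            return remoteAddr;
        }
        String[] hops = forwardedFor.split(",");
        // Walk right to left: the rightmost entries were appended by our own proxies.
        for (int i = hops.length - 1; i >= 0; i--) {
            String hop = hops[i].trim();
            if (!hop.isEmpty() && !trustedProxies.contains(hop)) {
                return hop; // first address that is not one of our proxies
            }
        }
        return remoteAddr; // every hop was a trusted proxy
    }

    public static void main(String[] args) {
        Set<String> proxies = Set.of("10.0.0.5", "10.0.0.9");
        // Normal case: client -> 10.0.0.9 -> 10.0.0.5 -> app
        System.out.println(resolve("10.0.0.5", "203.0.113.7, 10.0.0.9", proxies));    // 203.0.113.7
        // Direct connection with a forged header: the header is ignored
        System.out.println(resolve("198.51.100.20", "1.2.3.4", proxies));             // 198.51.100.20
        // Forged leftmost entry arriving through a trusted proxy: the forged value is skipped
        System.out.println(resolve("10.0.0.5", "1.2.3.4, 198.51.100.20", proxies));   // 198.51.100.20
    }
}
```

Taking the first header value, by contrast, would have returned 1.2.3.4 in the last two cases and let the caller choose its own rate-limit bucket.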
