Rate Limit Annotation for Spring Boot (Token Bucket)
Custom @RateLimit annotation that throttles Spring Boot endpoints per-IP using a thread-safe token bucket. No external dependencies.
@RateLimit is a method-level annotation that throttles Spring Boot controller methods using a thread-safe token bucket. An aspect intercepts annotated methods, keys a bucket by client IP + method signature, and throws RateLimitExceededException when that bucket is empty. No third-party rate-limiting library is needed: just spring-boot-starter-aop (Spring AOP plus the AspectJ annotations) and standard java.util.concurrent primitives.
Tested on Spring Boot 3.4, Java 21.
When to Use This
- Public endpoints that need basic abuse protection without adding Bucket4j or Redis
- Internal admin tools where you want one annotation, one parameter, done
- Per-endpoint quotas like "max 5 OTP requests per minute"
- Single-instance services or dev environments where in-memory counters are fine
Don't use this when you run multiple replicas behind a load balancer. Each pod keeps its own bucket, so a 10-rps limit becomes 10 * pod_count. For distributed limiting, swap the ConcurrentHashMap for a Redis-backed Bucket4j proxy bucket.
Code
// RateLimit.java
package com.example.demo.aspect;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.TimeUnit;

// Allows at most limit() calls per duration() for each client IP + method pair.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimit {
    int limit() default 10;
    long duration() default 60;
    TimeUnit unit() default TimeUnit.SECONDS;
}

// RateLimitAspect.java (same package; requires spring-boot-starter-aop)
package com.example.demo.aspect;

import jakarta.servlet.http.HttpServletRequest;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

@Aspect
@Component
public class RateLimitAspect {

    // One bucket per "client IP + method" key. Entries are never evicted (see Pitfalls).
    private final Map<String, TokenBucket> buckets = new ConcurrentHashMap<>();

    @Around("@annotation(rateLimit)")
    public Object enforceRateLimit(ProceedingJoinPoint joinPoint, RateLimit rateLimit) throws Throwable {
        HttpServletRequest request =
                ((ServletRequestAttributes) RequestContextHolder.currentRequestAttributes()).getRequest();
        // toLongString() includes package and parameter types, so overloaded methods
        // and same-named classes in different packages do not share a bucket.
        String key = clientIp(request) + ":" + joinPoint.getSignature().toLongString();
        TokenBucket bucket = buckets.computeIfAbsent(key, k ->
                new TokenBucket(rateLimit.limit(), rateLimit.duration(), rateLimit.unit()));
        if (!bucket.tryConsume()) {
            throw new RateLimitExceededException("Too many requests. Please try again later.");
        }
        return joinPoint.proceed();
    }

    // Uses getRemoteAddr() only. Behind a proxy, set server.forward-headers-strategy
    // (see the FAQ) so the container or Spring resolves the client IP from
    // X-Forwarded-For. Reading the raw header here would let any client spoof it
    // and get a fresh bucket on every request.
    private static String clientIp(HttpServletRequest request) {
        return request.getRemoteAddr();
    }

    private static class TokenBucket {
        private final long capacity;          // max burst size (= limit)
        private final long refillPeriodNanos; // time to earn one token (= duration / limit)
        private double tokens;
        private long lastRefillTime;

        TokenBucket(long capacity, long duration, TimeUnit unit) {
            if (capacity <= 0) {
                throw new IllegalArgumentException("@RateLimit limit must be > 0");
            }
            this.capacity = capacity;
            this.refillPeriodNanos = unit.toNanos(duration) / capacity;
            this.tokens = capacity;           // start full
            this.lastRefillTime = System.nanoTime();
        }

        synchronized boolean tryConsume() {
            refill();
            if (tokens >= 1) {
                tokens--;
                return true;
            }
            return false;
        }

        // Adds elapsed / refillPeriodNanos (possibly fractional) tokens, capped at capacity.
        private void refill() {
            long now = System.nanoTime();
            long elapsed = now - lastRefillTime;
            if (elapsed > 0) {
                tokens = Math.min(capacity, tokens + (double) elapsed / refillPeriodNanos);
                lastRefillTime = now;
            }
        }
    }

    public static class RateLimitExceededException extends RuntimeException {
        public RateLimitExceededException(String message) {
            super(message);
        }
    }
}

The token bucket refills continuously: it earns one token every refillPeriodNanos (duration / limit), accrued fractionally on each call and capped at capacity (= limit). After a quiet period, a client can therefore fire a burst of up to limit requests at once; after that, requests pass only as fast as tokens come back. refillPeriodNanos sets the average rate, not the burst size: for stricter spacing, change the constructor so the capacity is smaller (for example 1) while the refill period stays at duration / limit.
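To check the burst-then-throttle behaviour without starting Spring, here is a minimal standalone sketch. SimTokenBucket is a hypothetical class: it repeats TokenBucket's refill arithmetic, but takes capacity and refill period as separate arguments and reads time from an injected LongSupplier (an assumption made only so tests can use a fake clock).

```java
import java.util.function.LongSupplier;

// Hypothetical test double: same refill math as RateLimitAspect.TokenBucket,
// but burst size (capacity) and refill period are independent, and the clock
// is injected so time can be controlled.
final class SimTokenBucket {
    private final long capacity;
    private final long refillPeriodNanos;
    private final LongSupplier clock;
    private double tokens;
    private long lastRefillTime;

    SimTokenBucket(long capacity, long refillPeriodNanos, LongSupplier clock) {
        if (capacity <= 0 || refillPeriodNanos <= 0) {
            throw new IllegalArgumentException("capacity and refillPeriodNanos must be > 0");
        }
        this.capacity = capacity;
        this.refillPeriodNanos = refillPeriodNanos;
        this.clock = clock;
        this.tokens = capacity; // start full, like TokenBucket
        this.lastRefillTime = clock.getAsLong();
    }

    synchronized boolean tryConsume() {
        long now = clock.getAsLong();
        long elapsed = now - lastRefillTime;
        if (elapsed > 0) {
            // Fractional refill, capped at capacity.
            tokens = Math.min(capacity, tokens + (double) elapsed / refillPeriodNanos);
            lastRefillTime = now;
        }
        if (tokens >= 1) {
            tokens--;
            return true;
        }
        return false;
    }
}
```

With capacity 5 and a 12-second period (5 per minute), the first five calls pass and the sixth fails until 12 seconds have elapsed. With capacity 1 and the same period, admitted calls end up roughly 12 seconds apart, which is the "stricter spacing" option.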
Usage
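// ProductController.java
import com.example.demo.aspect.RateLimit;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api")
public class ProductController {
    // Product, Order, ProductService and OrderService are your own application types.
    private final ProductService productService;
    private final OrderService orderService;

    public ProductController(ProductService productService, OrderService orderService) {
        this.productService = productService;
        this.orderService = orderService;
    }

    @GetMapping("/products")
    @RateLimit(limit = 5, duration = 1, unit = TimeUnit.MINUTES)
    public List<Product> getProducts() {
        return productService.findAll();
    }

    @PostMapping("/orders")
    @RateLimit(limit = 1, duration = 10, unit = TimeUnit.SECONDS)
    public Order createOrder(@RequestBody Order order) {
        return orderService.create(order);
    }
}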
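// RateLimitAdvice.java: maps the exception to 429 Too Many Requests
import com.example.demo.aspect.RateLimitAspect;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;

@ControllerAdvice
public class RateLimitAdvice {
    @ExceptionHandler(RateLimitAspect.RateLimitExceededException.class)
    public ResponseEntity<String> handle(RateLimitAspect.RateLimitExceededException ex) {
        return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS).body(ex.getMessage());
    }
}

Without the @ControllerAdvice mapping, the exception surfaces as a 500 Internal Server Error. Mapping it to 429 tells well-behaved clients to back off.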
Pitfalls
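- In-memory counters do not scale across pods. A 10-rps limit across 3 pods is effectively 30 rps. Move to Redis-backed Bucket4j as soon as you scale horizontally.
- `getRemoteAddr()` returns the proxy's address behind a load balancer, so unless the client IP is resolved from `X-Forwarded-For`, every request looks like it comes from the load balancer. Trust that header only from proxies you control (for example via `server.forward-headers-strategy=native`); otherwise a client can send a different spoofed IP on each request and get a fresh bucket every time.
- The `buckets` map never evicts. A long-running service that sees many unique IPs leaks memory. For production, replace the map with a Caffeine cache built with `expireAfterAccess(1, TimeUnit.HOURS)`.
- Burst behaviour is intentional. A full bucket admits up to `capacity` requests back to back, and because tokens keep refilling, a client can get nearly 2 × limit through in a single duration-long window. If you need a hard "at most N in any rolling window" guarantee, use a sliding-window log; for even spacing, use a capacity of 1. A fixed window is not stricter: it can also admit up to 2 × limit across a window boundary.
- `synchronized` is fine here: each lock guards one bucket (one IP + method), so threads contend only when the same client hits the same endpoint concurrently. If profiling shows otherwise, hold the bucket state in an immutable object behind an `AtomicReference` and update it in a CAS loop. `LongAdder` is not a good fit, because it cannot atomically check a count and decrement it.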
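If you would rather not add Caffeine for the eviction pitfall, a rough stdlib-only alternative is a map that records the last access time per key and is swept periodically. IdleEvictingMap, get and evictIdle are hypothetical names, and the injected clock is an assumption for testability; this is a sketch, not a drop-in cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical stdlib-only stand-in for expireAfterAccess: entries not touched
// for longer than maxIdleNanos are dropped whenever evictIdle() runs.
final class IdleEvictingMap<V> {
    private record Entry<T>(T value, long lastAccess) {}

    private final ConcurrentHashMap<String, Entry<V>> map = new ConcurrentHashMap<>();
    private final long maxIdleNanos;
    private final LongSupplier clock;

    IdleEvictingMap(long maxIdleNanos, LongSupplier clock) {
        this.maxIdleNanos = maxIdleNanos;
        this.clock = clock;
    }

    // Atomic per key: creates the value on first use and refreshes lastAccess
    // on every call. The factory runs inside compute(), so keep it cheap.
    V get(String key, Function<String, V> factory) {
        long now = clock.getAsLong();
        return map.compute(key, (k, e) ->
                new Entry<>(e == null ? factory.apply(k) : e.value(), now)).value();
    }

    // Removes every entry idle for longer than maxIdleNanos.
    void evictIdle() {
        long now = clock.getAsLong();
        map.values().removeIf(e -> now - e.lastAccess() > maxIdleNanos);
    }

    int size() {
        return map.size();
    }
}
```

In the aspect you would call `buckets.get(key, k -> new TokenBucket(...))` and run `evictIdle()` from a scheduler, for example a `@Scheduled` method with `@EnableScheduling` turned on (an assumption, not part of the original snippet).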
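For the "at most N in any rolling window" guarantee mentioned in the burst pitfall, a sliding-window log is one option. The sketch below is a minimal, hypothetical SlidingWindowLog with an injected clock. It stores one timestamp per admitted request, so memory per key grows with limit, which is the trade-off against the token bucket's two fields. Note that it still allows limit requests back to back; it only caps the total inside any window.

```java
import java.util.ArrayDeque;
import java.util.function.LongSupplier;

// Hypothetical sliding-window log: a request is admitted only if fewer than
// `limit` requests were admitted in the window (now - windowNanos, now].
final class SlidingWindowLog {
    private final int limit;
    private final long windowNanos;
    private final LongSupplier clock;
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLog(int limit, long windowNanos, LongSupplier clock) {
        if (limit <= 0 || windowNanos <= 0) {
            throw new IllegalArgumentException("limit and windowNanos must be > 0");
        }
        this.limit = limit;
        this.windowNanos = windowNanos;
        this.clock = clock;
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Drop timestamps that have slid out of the window.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowNanos) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```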
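The CAS approach from the last pitfall can be sketched as follows: an immutable State record lives in an AtomicReference, and every consume is a compareAndSet retry loop. CasTokenBucket is a hypothetical name; it keeps TokenBucket's fractional-refill rules and is an unbenchmarked sketch, not a measured replacement for the synchronized version.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;

// Hypothetical lock-free token bucket using a CAS loop over immutable state.
final class CasTokenBucket {
    private record State(double tokens, long lastRefill) {}

    private final long capacity;
    private final long refillPeriodNanos;
    private final LongSupplier clock;
    private final AtomicReference<State> state;

    CasTokenBucket(long capacity, long refillPeriodNanos, LongSupplier clock) {
        if (capacity <= 0 || refillPeriodNanos <= 0) {
            throw new IllegalArgumentException("capacity and refillPeriodNanos must be > 0");
        }
        this.capacity = capacity;
        this.refillPeriodNanos = refillPeriodNanos;
        this.clock = clock;
        this.state = new AtomicReference<>(new State(capacity, clock.getAsLong()));
    }

    boolean tryConsume() {
        while (true) {
            State current = state.get();
            long now = clock.getAsLong();
            // Another thread may already have stored a later timestamp.
            long elapsed = Math.max(0, now - current.lastRefill());
            double refilled = Math.min(capacity,
                    current.tokens() + (double) elapsed / refillPeriodNanos);
            if (refilled < 1) {
                return false; // nothing to take; leave state untouched
            }
            State next = new State(refilled - 1, Math.max(now, current.lastRefill()));
            // compareAndSet compares references, so any concurrent update forces a retry.
            if (state.compareAndSet(current, next)) {
                return true;
            }
        }
    }
}
```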
Related Snippets & Reading
- Next.js Route Handler Rate Limit — same pattern on the Node side, backed by Upstash Redis
- Retry Executor Utility — pairs with rate-limited downstream calls
- Spring Security Component Revolution — broader security primitives for Spring Boot APIs
Frequently Asked Questions
When should you use this annotation instead of Bucket4j or Resilience4j?
Use this when you want a single-instance, per-IP throttle with zero new dependencies, typically for internal tools or low-traffic public endpoints. For multi-instance services, distributed counters, or sliding-window semantics, switch to Bucket4j (Redis backend) or Resilience4j RateLimiter. This snippet's bucket is in-memory, so each pod gets its own counter.
How do I get the real client IP behind a load balancer?
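`HttpServletRequest#getRemoteAddr()` returns the address of the nearest proxy, not the user. Set `server.forward-headers-strategy=native` (on Tomcat this enables `RemoteIpValve`, which honours `X-Forwarded-For` only from trusted internal proxies) or `server.forward-headers-strategy=framework` (registers Spring's `ForwardedHeaderFilter`) in application.properties, so that `getRemoteAddr()` reports the client. Use `framework` only if your proxy overwrites client-supplied forwarding headers, because the filter trusts them as-is. Without either, every request from your ALB or CloudFront edge appears as the same IP and the limit triggers globally.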
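If you cannot use `server.forward-headers-strategy` and must parse `X-Forwarded-For` yourself, the safe rule is to walk the header from right to left and skip only hops you trust. The sketch below is a hypothetical ClientIpResolver, similar in spirit to what Tomcat's `RemoteIpValve` does; the trusted-proxy set is an assumption you would fill with your own load balancer addresses.

```java
import java.util.Set;

// Hypothetical resolver: believe X-Forwarded-For only as far as it was
// appended by proxies in trustedProxies.
final class ClientIpResolver {
    private ClientIpResolver() {}

    static String resolve(String remoteAddr, String xForwardedFor, Set<String> trustedProxies) {
        // The direct peer is not one of our proxies, or there is no header:
        // ignore X-Forwarded-For entirely.
        if (!trustedProxies.contains(remoteAddr) || xForwardedFor == null || xForwardedFor.isBlank()) {
            return remoteAddr;
        }
        String[] hops = xForwardedFor.split(",");
        // Right-most entries were appended by our own proxies; the first
        // untrusted entry from the right is the real client.
        for (int i = hops.length - 1; i >= 0; i--) {
            String hop = hops[i].trim();
            if (!trustedProxies.contains(hop)) {
                return hop;
            }
        }
        // Every hop is a trusted proxy; fall back to the left-most one.
        return hops[0].trim();
    }
}
```

Taking the left-most entry instead, as a naive parser would, returns whatever the client chose to send.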