Have you ever noticed that some APIs don't receive traffic evenly?
Instead of steady, predictable calls, certain endpoints get sudden bursts - a few calls at once, followed by quiet periods, and then another burst.
When multiple users trigger these bursts at the same time, the API can become overloaded. Response times increase, CPU usage spikes, and users start experiencing delays. This is where a more advanced rate limiting technique comes into play - the Sliding Window Rate Limiter.
This blog focuses on how the Sliding Window limiter works in ASP.NET Core and how it controls unpredictable traffic spikes.
Rate Limiting: What Is It?
Rate limiting is a technique that controls how many requests a client can send to an API in a given time period so that the system stays stable, secure, and fair for all users. It becomes essential when:
Traffic suddenly increases
Scripts or scheduled jobs repeatedly make calls
Brute-force attacks or accidental request floods occur
With rate limiting in place, the API stays fair and consistent for all clients.
For a general introduction to ASP.NET Core rate limiting, refer to the first blog in this series:
What Is a Sliding Window Rate Limiter?
The Sliding Window Rate Limiter is designed to handle bursty, uneven traffic patterns.
Unlike the Fixed Window limiter (which resets after every full window), the Sliding Window takes a more dynamic approach. It divides the total time window into smaller segments and tracks the number of requests within each segment. As time moves forward, the window slides, and the limiter continuously recalculates the request count based on the recent activity.
This results in a much smoother, fairer distribution of requests over time.

Think of the Sliding Window as a rolling time period.
If the limit is 5 requests per minute, it always counts requests in the last 60 seconds, not only between minute boundaries.
The window is divided into smaller segments (e.g., 3 segments of 20 seconds each).
When a segment expires, its count is dropped from the rolling total.
This prevents a user from making 5 requests at the end of one window and another 5 at the beginning of the next - a common issue in the Fixed Window algorithm.
The Sliding Window limiter gives a more realistic limit based on actual traffic patterns.
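The segment bookkeeping described above can be sketched in a few lines. This is an illustrative simplification, not the framework's internal implementation: a 60-second window split into 3 segments of 20 seconds, where the oldest segment's count falls away as the window slides.

```csharp
using System;
using System.Linq;

// Simplified sketch of sliding-window counting (single client, no locking).
class SlidingWindowSketch
{
    const int PermitLimit = 5;
    const int Segments = 3;                       // 3 x 20s = 60s window
    readonly int[] counts = new int[Segments];
    int currentSegment = 0;

    // Called every 20 seconds: the oldest segment expires,
    // so its count no longer contributes to the rolling total.
    public void AdvanceSegment()
    {
        currentSegment = (currentSegment + 1) % Segments;
        counts[currentSegment] = 0;               // recycle the expired slot
    }

    // Returns true if the request is allowed under the rolling limit.
    public bool TryAcquire()
    {
        if (counts.Sum() >= PermitLimit) return false;
        counts[currentSegment]++;
        return true;
    }
}
```

Because earlier segments keep contributing until they expire, 5 requests made just before a minute boundary still count against requests made just after it, which is exactly the boundary-burst gap a Fixed Window limiter leaves open.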
Example Scenario
Consider a real-world production scenario where an API is responsible for generating an Inventory Report.
This Inventory report is heavy, involves multiple database calls, and takes time to process.
Under normal conditions, the API received 1–2 requests per minute, which the system handled easily. During busy periods, or when partner systems triggered reports at slightly different times, requests arrived in short bursts: for example, 5 requests within 30 seconds, followed by a brief pause, and then another burst of 5.
Even though the total number of requests within 1 minute was not very high, the uneven timing of these requests created sudden pressure on the system.
This bursty traffic overloaded the database, slowed down report generation, and affected other APIs running on the same resources.
What happened without a rate limiter?
CPU usage spiked
Database queries slowed down
Report generation time increased
Other APIs sharing the same database became slower
Occasional timeouts occurred during peak usage
The issue wasn’t the overall volume - it was the uneven timing of the requests.
The system needed a way to smooth out these bursts and prevent too many report requests from arriving too close to each other.
What happened after adding a Sliding Window limiter?
A Sliding Window limiter solved this problem effectively.
Instead of looking at requests only inside a strict 1-minute window, the Sliding Window algorithm:
Divided the 1-minute window into 3 smaller segments of 20 seconds each
Counted requests across the actual last 1 minute
Smoothed out inconsistent spikes
Prevented bursts from piling up too quickly
Ensured fair request distribution over time
By smoothing out the bursty traffic, the Sliding Window limiter kept the Inventory Report API reliable and prevented sudden overload.
Below is a sample implementation:

// No additional package installation is required. Rate limiting is built into ASP.NET Core starting from .NET 7.
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();
builder.Services.AddEndpointsApiExplorer();

builder.Services.AddRateLimiter(options =>
{
    options.AddSlidingWindowLimiter("InventoryReportLimit", limiterOptions =>
    {
        limiterOptions.PermitLimit = 5;                   // allow 5 total requests
        limiterOptions.Window = TimeSpan.FromSeconds(60); // sliding 60-second window
        limiterOptions.SegmentsPerWindow = 3;             // split into 20-second sub-windows
        limiterOptions.QueueLimit = 0;                    // no queuing
    });

    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        await context.HttpContext.Response
            .WriteAsync("Too many report requests. Try again later.", cancellationToken);
    };
});

var app = builder.Build();

app.UseRouting();
app.UseRateLimiter();

// The policy must be attached to an endpoint or it has no effect;
// here it is applied to all mapped controllers.
app.MapControllers().RequireRateLimiting("InventoryReportLimit");

app.Run();
Setting | Meaning (Sliding Window limiter)
PermitLimit = 5 | Maximum number of allowed requests in the rolling window for one client.
Window = 60 seconds | Total length of the sliding time window used to calculate the request rate.
SegmentsPerWindow = 3 | Number of sub-windows the 60-second window is divided into (20 seconds each here).
QueueLimit = 0 | Maximum number of extra requests that can wait in a queue when the limit is reached (0 = none).
This configuration allows 5 requests in any rolling 60-second period, not just per fixed minute block.
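Registering a named policy does not limit anything by itself; it must be attached to an endpoint, either with `RequireRateLimiting` on a route or with the `[EnableRateLimiting]` attribute on a controller. A minimal controller sketch follows; the controller name and route are assumptions for illustration:

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.RateLimiting;

[ApiController]
[Route("api/inventory-report")]
[EnableRateLimiting("InventoryReportLimit")] // attach the policy defined in Program.cs
public class InventoryReportController : ControllerBase
{
    [HttpGet]
    public IActionResult GetReport()
    {
        // Heavy report generation would happen here.
        return Ok("Inventory report generated");
    }
}
```

The attribute approach is convenient when only some controllers need the limit, while `RequireRateLimiting` on `MapControllers()` applies it everywhere.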
Advantages of Sliding Window rate limiter
Smooths out uneven, bursty traffic
Prevents double-burst issues at window boundaries
More accurate and fair than Fixed Window
Ideal for real-time user actions (typing, scrolling, clicking)
Reduces sudden CPU spikes
Keeps APIs responsive during peak usage
Works well for external integrations and mobile apps
When should you use a Sliding Window rate limiter?
Use this limiter when:
Traffic is bursty and unpredictable
Users perform rapid actions (typing, searching, tapping)
Partner systems send requests in uneven batches
You want a fair limit based on the actual last time period
Fixed Window is too rigid for your workload
When should you avoid using a Sliding Window rate limiter?
Avoid this limiter when:
Traffic is consistent and predictable → Fixed Window rate limiter option works better in this case
You want to allow intentional short bursts → Token Bucket rate limiter works better in this case
You want to limit concurrent requests instead of rate → Concurrency rate limiter works better in this case
Benchmark: Sliding Window Rate Limiter (Local Runtime Test)
To understand the runtime impact of the Sliding Window Rate Limiter, a local benchmark was performed using Visual Studio Diagnostic Tools.
Test Environment
Environment: Local development machine
Framework: ASP.NET Core
Monitoring: Visual Studio Diagnostic Tools (Debug mode)
Endpoint: Inventory Report API
Test Scenarios
The same load pattern was applied in both scenarios to ensure a fair comparison.
Concurrent users: 50
Traffic pattern: Continuous requests for 2 minutes
Rate limit configuration: Sliding Window – 5 requests per 1 minute
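The load pattern above can be reproduced with a simple client-side generator. This is a hypothetical sketch (the base URL and endpoint path are assumptions), not the exact tool used for the benchmark:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// 50 concurrent clients sending requests continuously for 2 minutes,
// counting successes vs. 429 rejections.
class LoadTest
{
    static async Task Main()
    {
        var client = new HttpClient { BaseAddress = new Uri("https://localhost:5001") };
        var end = DateTime.UtcNow.AddMinutes(2);
        int ok = 0, rejected = 0;

        var workers = new Task[50];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Run(async () =>
            {
                while (DateTime.UtcNow < end)
                {
                    var response = await client.GetAsync("/api/inventory-report");
                    if ((int)response.StatusCode == 429)
                        Interlocked.Increment(ref rejected);
                    else
                        Interlocked.Increment(ref ok);
                }
            });
        }

        await Task.WhenAll(workers);
        Console.WriteLine($"Succeeded: {ok}, Rejected (429): {rejected}");
    }
}
```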
Metric Category | Metric | Without Rate Limiter | Sliding Window Rate Limiter
CPU Usage | Peak CPU Usage | 72% | 54%
CPU Usage | Average CPU Usage | 35–40% | 15–25%
Thread Pool | Peak Threads | 45 | 25
Thread Pool | Thread Starvation | Yes | No
Requests | Successful Requests | 100% (slow) | 12%
Requests | Rejected (429) | 0 | 82%
Response Time | P95 | 9.8s | 130ms
Note: These benchmarks were captured on a local development machine using Visual Studio Diagnostic Tools. Results may vary based on hardware, workload, and runtime configuration.
Summary
The Sliding Window Rate Limiter is a great choice for APIs that receive uneven, bursty traffic.
By splitting the time window into smaller segments and sliding the window continuously, it keeps your API fast, fair, and stable - even when users or systems send multiple requests in a short time.
If you want to explore other rate limiting strategies, check out the blogs on Fixed Window, Token Bucket, and Concurrency limiters.
FAQs
Q1: Can Sliding Window rate limiting work across multiple servers?
A: No, not by default.
The built-in Sliding Window limiter keeps its counters in memory on each server, so instances do not synchronize with one another.
To make it work across multiple servers, a distributed store such as Redis is required so that all instances share the same counters. ASP.NET Core does not ship a Redis-backed limiter out of the box; community packages (for example, the open-source RedisRateLimiting package) fill this gap. We’ll explore distributed rate limiting in an upcoming blog post.
Q2: How do I test if my rate limiting is working correctly?
A: Rate limiting can be verified using a unit test that sends multiple requests in quick succession and checks that the API returns 429 Too Many Requests once the configured limit is exceeded.
// _client is assumed to come from WebApplicationFactory<Program>
// (Microsoft.AspNetCore.Mvc.Testing), which hosts the API in-memory for tests.
[Fact]
public async Task SlidingWindowLimiter_Returns_429_On_6th_Request()
{
    // Arrange
    var responses = new List<HttpResponseMessage>();

    // Act: send 6 requests back-to-back, one more than the PermitLimit of 5
    for (int i = 0; i < 6; i++)
    {
        responses.Add(await _client.GetAsync("/api/inventory-report"));
    }

    // Assert: the first 5 succeed, the 6th is rejected
    Assert.All(
        responses.Take(5),
        r => Assert.Equal(HttpStatusCode.OK, r.StatusCode));
    Assert.Equal(
        HttpStatusCode.TooManyRequests,
        responses.Last().StatusCode);
}
Q3: Can I use multiple rate limiters on the same endpoint?
A: Yes.
ASP.NET Core allows you to combine multiple rate limiters on a single endpoint.
For example, you can use:
A Sliding Window limiter to control bursty traffic
An IP-based limiter to prevent abuse from a single client
This approach provides better protection for public or high-risk APIs.
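One way to combine both is to register the Sliding Window policy for the endpoint and a global per-IP limiter via `PartitionedRateLimiter`. A sketch, with illustrative limits (the per-IP numbers are assumptions, not recommendations):

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Endpoint policy: sliding window for bursty report traffic
    options.AddSlidingWindowLimiter("InventoryReportLimit", o =>
    {
        o.PermitLimit = 5;
        o.Window = TimeSpan.FromSeconds(60);
        o.SegmentsPerWindow = 3;
    });

    // Global limiter: a fixed-window cap per client IP, checked on every request
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 100,
                Window = TimeSpan.FromMinutes(1)
            }));
});
```

A request must pass the global per-IP limiter first and then the endpoint's Sliding Window policy, so either one can reject it with 429.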
Q4: What happens to requests in the queue (QueueLimit)?
A: When the limit is reached, requests can optionally be queued instead of rejected.
Queued requests wait until permits become available. If the queue is full, new requests are rejected with HTTP 429.
For most production APIs - especially long-running ones - it is recommended to set QueueLimit = 0 and reject excess requests immediately.
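If queuing does make sense for your workload (for example, short, cheap requests), it is enabled by raising `QueueLimit` and choosing a processing order. A sketch with illustrative values:

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddSlidingWindowLimiter("QueuedReportLimit", o =>
    {
        o.PermitLimit = 5;
        o.Window = TimeSpan.FromSeconds(60);
        o.SegmentsPerWindow = 3;
        o.QueueLimit = 2;                                        // up to 2 requests wait for a permit
        o.QueueProcessingOrder = QueueProcessingOrder.OldestFirst; // serve waiters FIFO
    });
});
```

Queued requests hold server resources (a connection and a waiting task) while they wait, which is why rejecting immediately is usually the safer default for long-running endpoints like report generation.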
