Sliding Window Rate Limiter in ASP.NET Core: What is it and when to use it?

.NET Development
📅 Jan 13, 2026

Have you ever noticed that some APIs don't receive traffic evenly?

Instead of steady, predictable calls, certain endpoints get sudden bursts - a few calls at once, followed by quiet periods, and then another burst.

When multiple users trigger these bursts at the same time, the API can become overloaded. Response times increase, CPU usage spikes, and users start experiencing delays. This is where a more advanced rate limiting technique comes into play - the Sliding Window Rate Limiter.

This blog focuses on how the Sliding Window limiter works in ASP.NET Core and how it controls unpredictable traffic spikes.

Rate Limiting: What Is It?

Rate limiting is a technique that controls how many requests a client can send to an API in a given time period so that the system stays stable, secure, and fair for all users. It protects the API when:

  • Traffic suddenly increases

  • Scripts or scheduled jobs repeatedly make calls

  • Brute-force attacks or accidental request floods occur

With rate limiting in place, the API stays fair and consistent for all clients.

For a general introduction to ASP.NET Core rate limiting, refer to the first blog in this series:

"Introduction: Rate Limiting Middleware in ASP.NET Core"

What Is a Sliding Window Rate Limiter?

The Sliding Window Rate Limiter is designed to handle bursty, uneven traffic patterns.

Unlike the Fixed Window limiter (which resets after every full window), the Sliding Window takes a more dynamic approach. It divides the total time window into smaller segments and tracks the number of requests within each segment. As time moves forward, the window slides, and the limiter continuously recalculates the request count based on the recent activity.

This results in a much smoother, fairer distribution of requests over time.

Sliding window rate limiter example

Sliding Window Rate Limiter Example

Think of the Sliding Window as a rolling time period.

  • If the limit is 5 requests per minute, it always counts requests in the last 60 seconds, not only between minute boundaries.

  • The window is divided into smaller segments (e.g., 3 segments of 20 seconds each).

  • When a segment expires, its request count is subtracted cleanly from the rolling total.

  • This prevents a user from making 5 requests at the end of one window and another 5 at the beginning of the next - a common issue in the Fixed Window algorithm.

The Sliding Window limiter gives a more realistic limit based on actual traffic patterns.
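The segment bookkeeping described above can be sketched in a few lines of C#. This is an illustrative model of the algorithm only, not the actual ASP.NET Core implementation; the helper names and the manual `AdvanceSegment` step (standing in for the passage of 20 seconds) are invented for the example.

```csharp
// Illustrative model of the segment bookkeeping: 5 permits,
// three 20-second segments in a rolling 60-second window.
int permitLimit = 5;
int[] segments = new int[3]; // requests counted per 20-second segment
int current = 0;             // index of the active segment
int total = 0;               // requests in the rolling 60-second window

// Try to admit one request under the rolling limit.
bool TryAcquire()
{
    if (total >= permitLimit) return false;
    segments[current]++;
    total++;
    return true;
}

// Called once per segment interval (every 20 seconds):
// the oldest segment leaves the window and its count is subtracted.
void AdvanceSegment()
{
    current = (current + 1) % segments.Length;
    total -= segments[current];
    segments[current] = 0;
}

// A burst of 6 requests: 5 admitted, the 6th rejected.
for (int i = 1; i <= 6; i++)
    Console.WriteLine($"Request {i}: {(TryAcquire() ? "allowed" : "rejected")}");

// 20 seconds later the burst is still inside the rolling window...
AdvanceSegment();
Console.WriteLine(TryAcquire() ? "allowed" : "rejected"); // rejected

// ...but once the full 60 seconds have slid past, permits return.
AdvanceSegment();
AdvanceSegment();
Console.WriteLine(TryAcquire() ? "allowed" : "rejected"); // allowed
```

Because the count always covers the last three segments rather than a fixed minute boundary, a burst is "remembered" until the whole window has slid past it.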

Example Scenario

Consider a real-world production scenario where an API is responsible for generating an Inventory Report.

This Inventory report is heavy, involves multiple database calls, and takes time to process.

Under normal conditions, the API received 1–2 requests per minute, which the system handled easily. During busy periods, or when partner systems triggered reports at slightly different times, requests arrived in short bursts: 5 requests within 30 seconds, a brief pause, and then another burst of 5.

Even though the total number of requests within 1 minute was not very high, the uneven timing of these requests created sudden pressure on the system.

This bursty traffic overloaded the database, slowed down report generation, and affected other APIs running on the same resources.

What happened without a rate limiter?

  • CPU usage spiked

  • Database queries slowed down

  • Report generation time increased

  • Other APIs sharing the same database became slower

  • Occasional timeouts occurred during peak usage

The issue wasn’t the overall volume - it was the uneven timing of the requests.

The system needed a way to smooth out these bursts and prevent too many report requests from arriving too close to each other.

What happened after adding a Sliding Window limiter?

A Sliding Window limiter solved this problem effectively.

Instead of looking at requests only inside a strict 1-minute window, the Sliding Window algorithm:

  • Divided the 1-minute window into three 20-second segments

  • Counted requests across the actual last 1 minute

  • Smoothed out inconsistent spikes

  • Prevented bursts from piling up too quickly

  • Ensured fair request distribution over time

By smoothing out the bursty traffic, the Sliding Window limiter kept the Inventory Report API reliable and prevented sudden overload.

Below is a sample implementation:

Program.cs

// No additional package installation is required - rate limiting is built into ASP.NET Core starting from .NET 7.
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();
builder.Services.AddEndpointsApiExplorer();

builder.Services.AddRateLimiter(options =>
{
    options.AddSlidingWindowLimiter("InventoryReportLimit", limiterOptions =>
    {
        limiterOptions.PermitLimit = 5;                   // allow 5 requests per rolling window
        limiterOptions.Window = TimeSpan.FromSeconds(60); // sliding 60-second window
        limiterOptions.SegmentsPerWindow = 3;             // three 20-second segments
        limiterOptions.QueueLimit = 0;                    // no queuing - reject excess immediately
    });

    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        await context.HttpContext.Response
            .WriteAsync("Too many report requests. Try again later.", cancellationToken);
    };
});

var app = builder.Build();

app.UseRouting();
app.UseRateLimiter();

// The named policy must be applied to take effect; alternatively, decorate
// individual controllers or actions with [EnableRateLimiting("InventoryReportLimit")].
app.MapControllers().RequireRateLimiting("InventoryReportLimit");

app.Run();

| Setting | Meaning (Sliding Window limiter) |
| --- | --- |
| PermitLimit = 5 | Maximum number of allowed requests in the rolling window for one client. |
| Window = 60 seconds | Total length of the sliding time window used to calculate the request rate. |
| SegmentsPerWindow = 3 | Number of sub-windows that the 60-second window is divided into (20 seconds each here). |
| QueueLimit = 0 | Maximum number of extra requests that can wait in the queue when the limit is reached (0 = none). |

This configuration allows 5 requests in any rolling 60-second period, not just per minute block.
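The rolling behaviour can be observed directly with `SlidingWindowRateLimiter` from `System.Threading.RateLimiting`, the same limiter the middleware uses under the hood. In this sketch, `AutoReplenishment` is turned off so that segment boundaries can be advanced manually with `TryReplenish()` purely for demonstration; in a real application the limiter advances segments on a timer.

```csharp
using System.Threading.RateLimiting;

var limiter = new SlidingWindowRateLimiter(new SlidingWindowRateLimiterOptions
{
    PermitLimit = 5,
    Window = TimeSpan.FromSeconds(60),
    SegmentsPerWindow = 3,       // three 20-second segments
    QueueLimit = 0,
    AutoReplenishment = false    // advance segments manually for the demo
});

// Use all 5 permits inside the current segment.
for (int i = 0; i < 5; i++)
{
    limiter.AttemptAcquire();
}

bool sixth = limiter.AttemptAcquire().IsAcquired;    // limit reached

limiter.TryReplenish();                              // +20s: burst still in window
bool after20s = limiter.AttemptAcquire().IsAcquired;

limiter.TryReplenish();                              // +40s
limiter.TryReplenish();                              // +60s: burst segment expires
bool after60s = limiter.AttemptAcquire().IsAcquired;

Console.WriteLine($"{sixth} {after20s} {after60s}"); // False False True
```

Unlike a Fixed Window limiter, the burst does not become "free" again at a minute boundary; it only stops counting once the full window has slid past it.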

Advantages of Sliding Window rate limiter

  • Smooths out uneven, bursty traffic

  • Prevents double-burst issues at window boundaries

  • More accurate and fair than Fixed Window

  • Ideal for real-time user actions (typing, scrolling, clicking)

  • Reduces sudden CPU spikes

  • Keeps APIs responsive during peak usage

  • Works well for external integrations and mobile apps

When should you use a Sliding Window rate limiter?

Use this limiter when:

  • Traffic is bursty and unpredictable

  • Users perform rapid actions (typing, searching, tapping)

  • Partner systems send requests in uneven batches

  • You want a fair limit based on the actual last time period

  • Fixed Window is too rigid for your workload

When should you avoid using a Sliding Window rate limiter?

Avoid this limiter when:

  • Traffic is consistent and predictable → Fixed Window rate limiter works better in this case

  • You want to allow intentional short bursts → Token Bucket rate limiter works better in this case

  • You want to limit concurrent requests instead of rate → Concurrency rate limiter works better in this case

Benchmark: Sliding Window Rate Limiter (Local Runtime Test)

To understand the runtime impact of the Sliding Window Rate Limiter, a local benchmark was performed using Visual Studio Diagnostic Tools.

Test Environment

  • Environment: Local development machine

  • Framework: ASP.NET Core

  • Monitoring: Visual Studio Diagnostic Tools (Debug mode)

  • Endpoint: Inventory Report API

Test Scenarios

The same load pattern was applied in both scenarios to ensure a fair comparison.

  • Concurrent users: 50

  • Traffic pattern: Continuous requests for 2 minutes

  • Rate limit configuration: Sliding Window – 5 requests per 1 minute

| Metric Category | Metric | Without Rate Limiter | Sliding Window Rate Limiter |
| --- | --- | --- | --- |
| CPU Usage | Peak CPU Usage | 72% | 54% |
| | Average CPU Usage | 35–40% | 15–25% |
| Thread Pool | Peak Threads | 45 | 25 |
| | Thread Starvation | Yes | No |
| Requests | Successful Requests | 100% (slow) | 12% |
| | Rejected (429) | 0 | 82% |
| Response Time | P95 | 9.8s | 130ms |

Note: These benchmarks were captured on a local development machine using Visual Studio Diagnostic Tools. Results may vary based on hardware, workload, and runtime configuration.

Summary

The Sliding Window Rate Limiter is a great choice for APIs that receive uneven, bursty traffic.

By splitting the time window into smaller segments and sliding the window continuously, it keeps your API fast, fair, and stable - even when users or systems send multiple requests in a short time.

If you want to explore other rate limiting strategies, check out the blogs on Fixed Window, Token Bucket, and Concurrency limiters.

FAQs

Q1: Can Sliding Window rate limiting work across multiple servers?

A: By default, no.

Sliding Window rate limiting works per server and does not synchronize its counters across multiple server instances.

To enforce one shared limit across servers, a distributed store like Redis is required so all instances share the same counters. The built-in middleware has no Redis backend, so this is typically done with a community package; the registration below is only an illustration of what such a package's API might look like (the method and option names are not part of ASP.NET Core). We'll explore distributed rate limiting in an upcoming blog post.

// Illustrative only: point a Redis-backed limiter at a shared instance.
builder.Services.AddStackExchangeRedisRateLimiter(options =>
{
    options.ConnectionMultiplexer =
        ConnectionMultiplexer.Connect("localhost:6379");
});

Q2: How do I test if my rate limiting is working correctly?

A: Rate limiting can be verified using a unit test that sends multiple requests in quick succession and checks that the API returns 429 Too Many Requests once the configured limit is exceeded.

[Fact]
public async Task SlidingWindowLimiter_Returns_429_On_6th_Request()
{
    // Arrange: _client is an HttpClient created by the test host
    // (e.g., WebApplicationFactory<Program>).
    var responses = new List<HttpResponseMessage>();

    // Act: send 6 requests in quick succession.
    for (int i = 0; i < 6; i++)
    {
        responses.Add(await _client.GetAsync("/api/inventory-report"));
    }

    // Assert: the first 5 succeed, the 6th is rejected.
    Assert.All(
        responses.Take(5),
        r => Assert.Equal(HttpStatusCode.OK, r.StatusCode));

    Assert.Equal(
        HttpStatusCode.TooManyRequests,
        responses.Last().StatusCode);
}

Q3: Can I use multiple rate limiters on the same endpoint?

A: Yes.

ASP.NET Core allows you to combine multiple rate limiters on a single endpoint.

For example, you can use:

  • A Sliding Window limiter to control bursty traffic

  • An IP-based limiter to prevent abuse from a single client

This approach provides better protection for public or high-risk APIs.
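A minimal sketch of this combination, assuming .NET 7 or later: the named Sliding Window policy covers the report endpoint, while a `GlobalLimiter` partitioned by client IP caps every address. The 100-requests-per-minute IP cap is an invented example value, not a recommendation.

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // 1) Named Sliding Window policy for the bursty report endpoint
    //    (attached with [EnableRateLimiting("InventoryReportLimit")]).
    options.AddSlidingWindowLimiter("InventoryReportLimit", o =>
    {
        o.PermitLimit = 5;
        o.Window = TimeSpan.FromSeconds(60);
        o.SegmentsPerWindow = 3;
    });

    // 2) Global limiter partitioned by client IP; it runs in addition
    //    to any endpoint policy, so both limits must be satisfied.
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 100,                  // example value
                Window = TimeSpan.FromMinutes(1)
            }));

    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
});
```

A request to the report endpoint is admitted only if it passes both the per-IP global limiter and the endpoint's Sliding Window policy.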

Q4: What happens to requests in the queue (QueueLimit)?

A: When the limit is reached, requests can optionally be queued instead of rejected.

Queued requests wait until permits become available. If the queue is full, new requests are rejected with HTTP 429.

For most production APIs - especially long-running ones - it is recommended to set QueueLimit = 0 and reject excess requests immediately.
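If queuing is desired instead, the sliding window options expose `QueueLimit` and `QueueProcessingOrder` (from `System.Threading.RateLimiting`). A sketch of the earlier policy with a small queue; the value 2 is an arbitrary example:

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

// Variant of the earlier policy that queues up to 2 excess requests
// instead of rejecting them immediately with 429.
builder.Services.AddRateLimiter(options =>
{
    options.AddSlidingWindowLimiter("InventoryReportLimit", limiterOptions =>
    {
        limiterOptions.PermitLimit = 5;
        limiterOptions.Window = TimeSpan.FromSeconds(60);
        limiterOptions.SegmentsPerWindow = 3;

        // Up to 2 requests wait for a permit; the oldest waiting
        // request is served first when permits become available.
        limiterOptions.QueueLimit = 2;
        limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    });
});
```

Queued callers simply see a slower response rather than a 429, which is why queuing suits short, cheap requests better than long-running report generation.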

Tags

Sliding Window Rate Limiter, ASP.NET Core, Rate Limiting, API Throttling, Distributed Systems, Performance, Web API, Traffic Control