When we first build a backend API, things look very simple.
A request comes in, the server processes it, and a response goes out.
For example, imagine we are building a backend for a school tech fest registration system. Students from different schools can register for events like coding competitions, robotics challenges, and quizzes.
Our backend uses a simple stack:
FastAPI for the backend framework
Gunicorn as the server
PostgreSQL as the database
Redis-backed workers for background jobs
External email service for confirmations
The architecture roughly looks like this:
Students
↓
Load Balancer
↓
Gunicorn
↓
Worker Processes
↓
FastAPI
↓
Database + External APIs
A registration endpoint might look like this:
@app.post("/register")
async def register(student: Student):
    save_registration(student)
    reserve_seat(student)
    generate_ticket(student)
    send_confirmation_email(student)
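For the endpoint above to make sense, there has to be a Student model and four helper functions behind it. Here is a dependency-free sketch of that pipeline; in the real app Student would be a Pydantic model and the helpers would talk to PostgreSQL and the email service, so the field names and stub bodies below are purely illustrative:

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Field names are assumptions for illustration
    name: str
    school: str
    event: str


# Hypothetical stubs standing in for the real helpers;
# each records its step so the pipeline order is visible.
steps: list[str] = []


def save_registration(s: Student) -> None:
    steps.append(f"saved:{s.name}")        # real version: INSERT into PostgreSQL


def reserve_seat(s: Student) -> None:
    steps.append(f"seat:{s.event}")        # real version: decrement a seat count


def generate_ticket(s: Student) -> str:
    ticket = f"TICKET-{s.name}-{s.event}"
    steps.append(f"ticket:{ticket}")
    return ticket


def send_confirmation_email(s: Student) -> None:
    steps.append(f"email:{s.name}")        # real version: call the email API


def register(student: Student) -> None:
    save_registration(student)
    reserve_seat(student)
    generate_ticket(student)
    send_confirmation_email(student)


register(Student("Asha", "Hill School", "robotics"))
print(steps)
```

The four steps run strictly one after another, which is exactly the property that will cause trouble later.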
Looks harmless.
But the moment the registration link is shared in the school WhatsApp group, hundreds of students start clicking Register.
Now the backend has a real problem.
The first challenge: too many requests
If the server processed requests one by one, every student would wait for the previous one.
That is not acceptable for real systems.
Instead, backend servers run multiple worker processes.
A process is simply a running instance of a program managed by the operating system.
When we start our server like this:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app

(FastAPI is an ASGI application, so Gunicorn needs Uvicorn's worker class here; Gunicorn's default worker only speaks WSGI.)
Gunicorn creates:
1 master process
4 worker processes
Each worker loads the same backend code.
Gunicorn Master
|
|--- Worker 1 → FastAPI running
|--- Worker 2 → FastAPI running
|--- Worker 3 → FastAPI running
|--- Worker 4 → FastAPI running
So yes: our backend application is essentially running four times, once inside each worker.
Each worker can:
• receive HTTP requests
• execute backend logic
• return responses
How requests reach workers
Before reaching workers, requests usually pass through a load balancer.
A load balancer distributes incoming traffic across available workers.
Students → Load Balancer
↓
Worker 1 Worker 2 Worker 3 Worker 4
In many production systems the flow is:
Internet
↓
Nginx (Load Balancer)
↓
Gunicorn
↓
Workers
The load balancer spreads traffic, while Gunicorn manages workers.
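The simplest distribution strategy is round-robin, which is also Nginx's default: hand each incoming request to the next worker in turn. A toy sketch of the idea (not how Nginx is implemented):

```python
from itertools import cycle

workers = ["worker-1", "worker-2", "worker-3", "worker-4"]
next_worker = cycle(workers)  # round-robin: cycle through the pool forever

# 8 incoming requests get spread evenly across the 4 workers
assignments = [next(next_worker) for _ in range(8)]
print(assignments)
# → ['worker-1', 'worker-2', 'worker-3', 'worker-4',
#    'worker-1', 'worker-2', 'worker-3', 'worker-4']
```

Real load balancers add refinements such as least-connections or health checks, but the core idea is this simple.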
If there are 4 workers, can only 4 users connect?
No.
Workers are not fixed seats.
Think of them like cashiers in a supermarket. If there are four cashiers, hundreds of customers can still enter the store. Each cashier serves one customer at a time and then moves to the next.
Workers behave the same way.
Why workers alone are not enough
Even with workers, something else slows the system down.
Network calls.
For example:
send_confirmation_email(student)
This operation might take two seconds.
If the worker waits for those two seconds doing nothing, performance drops quickly.
This is where asynchronous programming becomes important.
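A quick taste of the difference, using asyncio.sleep to stand in for the slow email call (scaled down to 0.1 s so the sketch runs fast):

```python
import asyncio
import time


async def send_confirmation_email(student: str) -> None:
    # Stand-in for a slow network call (the real one might take ~2 s)
    await asyncio.sleep(0.1)


async def main() -> float:
    start = time.perf_counter()
    # While one email "call" is waiting on the network, the event
    # loop is free to start the others instead of blocking.
    await asyncio.gather(
        *(send_confirmation_email(s) for s in ["a", "b", "c", "d"])
    )
    return time.perf_counter() - start


elapsed = asyncio.run(main())
print(f"4 overlapping calls finished in {elapsed:.2f}s, not 0.40s")
```

Run sequentially, four 0.1 s calls would take about 0.4 s; overlapped on the event loop they finish in roughly 0.1 s, because the waits happen concurrently.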
And that is where our backend story continues.