When we first build a backend API, things look very simple.
A request comes in, the server processes it, and a response goes out.
For example, imagine we are building a backend for a school tech fest registration system. Students from different schools can register for events like coding competitions, robotics challenges, and quizzes.
Our backend uses a simple stack:
FastAPI for the backend framework
Gunicorn as the server
PostgreSQL as the database
Redis-backed workers for background jobs
External email service for confirmations
The architecture roughly looks like this:
Students
↓
Load Balancer
↓
Gunicorn
↓
Worker Processes
↓
FastAPI
↓
Database + External APIs
A registration endpoint might look like this:
@app.post("/register")
async def register(student: Student):
    save_registration(student)
    reserve_seat(student)
    generate_ticket(student)
    send_confirmation_email(student)
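For the endpoint above to make sense, there has to be a Student model and four helper functions behind it. Here is a dependency-free sketch of that pipeline; in the real app Student would be a Pydantic model and the helpers would talk to PostgreSQL and the email service, so the field names and stub bodies below are purely illustrative:

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Field names are assumptions for illustration
    name: str
    school: str
    event: str


# Hypothetical stubs standing in for the real helpers;
# each records its step so the pipeline order is visible.
steps: list[str] = []


def save_registration(s: Student) -> None:
    steps.append(f"saved:{s.name}")        # real version: INSERT into PostgreSQL


def reserve_seat(s: Student) -> None:
    steps.append(f"seat:{s.event}")        # real version: decrement a seat count


def generate_ticket(s: Student) -> str:
    ticket = f"TICKET-{s.name}-{s.event}"
    steps.append(f"ticket:{ticket}")
    return ticket


def send_confirmation_email(s: Student) -> None:
    steps.append(f"email:{s.name}")        # real version: call the email API


def register(student: Student) -> None:
    save_registration(student)
    reserve_seat(student)
    generate_ticket(student)
    send_confirmation_email(student)


register(Student("Asha", "Hill School", "robotics"))
print(steps)
```

The four steps run strictly one after another, which is exactly the property that will cause trouble later.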
Looks harmless.
But the moment the registration link is shared in the school WhatsApp group, hundreds of students start clicking Register.
Now the backend has a real problem.
The first challenge: too many requests
If the server processed requests one by one, every student would wait for the previous one.
That is not acceptable for real systems.
Instead, backend servers run multiple worker processes.
A process is simply a running instance of a program managed by the operating system.
When we start our server like this:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app

(FastAPI is an ASGI application, so Gunicorn needs Uvicorn's worker class here; Gunicorn's default worker only speaks WSGI.)
Gunicorn creates:
1 master process
4 worker processes
Each worker loads the same backend code.
Gunicorn Master
|
|--- Worker 1 → FastAPI running
|--- Worker 2 → FastAPI running
|--- Worker 3 → FastAPI running
|--- Worker 4 → FastAPI running
So yes: our backend application is essentially running four times, once inside each worker.
Each worker can:
• receive HTTP requests
• execute backend logic
• return responses
How requests reach workers
Before reaching workers, requests usually pass through a load balancer.
A load balancer distributes incoming traffic across available workers.
Students → Load Balancer
↓
Worker 1 Worker 2 Worker 3 Worker 4
In many production systems the flow is:
Internet
↓
Nginx (Load Balancer)
↓
Gunicorn
↓
Workers
The load balancer spreads traffic, while Gunicorn manages workers.
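The simplest distribution strategy is round-robin, which is also Nginx's default: hand each incoming request to the next worker in turn. A toy sketch of the idea (not how Nginx is implemented):

```python
from itertools import cycle

workers = ["worker-1", "worker-2", "worker-3", "worker-4"]
next_worker = cycle(workers)  # round-robin: cycle through the pool forever

# 8 incoming requests get spread evenly across the 4 workers
assignments = [next(next_worker) for _ in range(8)]
print(assignments)
# → ['worker-1', 'worker-2', 'worker-3', 'worker-4',
#    'worker-1', 'worker-2', 'worker-3', 'worker-4']
```

Real load balancers add refinements such as least-connections or health checks, but the core idea is this simple.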
If there are 4 workers, can only 4 users connect?
No.
Workers are not fixed seats.
Think of them like cashiers in a supermarket. If there are four cashiers, hundreds of customers can still enter the store. Each cashier serves one customer at a time and then moves to the next.
Workers behave the same way.
Why workers alone are not enough
Even with workers, something else slows the system down.
Network calls.
For example:
send_confirmation_email(student)
This operation might take two seconds.
If the worker waits for those two seconds doing nothing, performance drops quickly.
This is where asynchronous programming becomes important.
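A quick taste of the difference, using asyncio.sleep to stand in for the slow email call (scaled down to 0.1 s so the sketch runs fast):

```python
import asyncio
import time


async def send_confirmation_email(student: str) -> None:
    # Stand-in for a slow network call (the real one might take ~2 s)
    await asyncio.sleep(0.1)


async def main() -> float:
    start = time.perf_counter()
    # While one email "call" is waiting on the network, the event
    # loop is free to start the others instead of blocking.
    await asyncio.gather(
        *(send_confirmation_email(s) for s in ["a", "b", "c", "d"])
    )
    return time.perf_counter() - start


elapsed = asyncio.run(main())
print(f"4 overlapping calls finished in {elapsed:.2f}s, not 0.40s")
```

Run sequentially, four 0.1 s calls would take about 0.4 s; overlapped on the event loop they finish in roughly 0.1 s, because the waits happen concurrently.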
And that is where our backend story continues.