
Commit 853414d

examples: fix and improve vanilla_http_server (#25905)
1 parent 75ecc8d commit 853414d

File tree: 3 files changed, +537 -128 lines changed

Lines changed: 362 additions & 0 deletions
@@ -0,0 +1,362 @@
# Vanilla HTTP Server - Data Flow

This document describes the complete data flow through the high-performance HTTP server, from
connection establishment to response delivery.

## System Architecture Overview

```
┌────────────────────────────────────────────────────────────┐
│                        Main Thread                         │
│  - Creates 16 worker threads                               │
│  - Each worker gets its own listen socket (SO_REUSEPORT)   │
│  - Each worker gets its own epoll instance                 │
│  - Waits for workers (blocking, no CPU waste)              │
└────────────────────────────────────────────────────────────┘

         ┌──────────────────────────────────────────┐
         │           Worker Threads (x16)           │
         │    Each runs process_events() in loop    │
         └──────────────────────────────────────────┘
```

## Connection Flow

### 1. Initial Setup (per worker)

```
create_server_socket(port)
├── socket(AF_INET, SOCK_STREAM, 0)
├── set_blocking(fd, false)    // Non-blocking listen socket
├── setsockopt(SO_REUSEADDR)   // Allow immediate rebind
├── setsockopt(SO_REUSEPORT)   // Multiple sockets on same port
├── bind(0.0.0.0:3001)
└── listen(backlog=4096)

epoll_create1(0)               // Create per-worker epoll
└── add_fd_to_epoll(listen_fd, EPOLLIN | EPOLLET | EPOLLEXCLUSIVE)
```

**Key Points:**

- Each worker has its own listening socket bound to the same port
- `SO_REUSEPORT` enables kernel-level load balancing
- `EPOLLEXCLUSIVE` prevents thundering herd on accept

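The following is an illustrative C sketch of the setup sequence above, with the port 3001, backlog 4096, and epoll flags taken from the text; the actual server is written in V, so the error handling and exact signatures here are assumptions rather than the real implementation.

```c
// Hedged C sketch of the per-worker setup described above (the real code is V).
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int create_server_socket(uint16_t port) {
    // SOCK_NONBLOCK makes the listen socket non-blocking at creation time.
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0) return -1;

    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one); // allow immediate rebind
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one); // one listen socket per worker

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); // 0.0.0.0
    addr.sin_port = htons(port);              // 3001 in the example
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 4096) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

int create_worker_epoll(int listen_fd) {
    int epoll_fd = epoll_create1(0); // per-worker epoll instance
    if (epoll_fd < 0) return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLET | EPOLLEXCLUSIVE; // edge-triggered, one wakeup per connection
    ev.data.fd = listen_fd;
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev) < 0) {
        close(epoll_fd);
        return -1;
    }
    return epoll_fd;
}
```

Note that `EPOLLEXCLUSIVE` requires Linux 4.5+ (glibc 2.24+); on older systems the registration would fall back to plain `EPOLLIN | EPOLLET`.
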
### 2. Accept New Connection

```
epoll_wait() returns EPOLLIN on listen_fd
└── handle_accept_loop(epoll_fd, listen_fd)
    └── Loop until EAGAIN:
        ├── accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK)
        │   └── Returns client_fd (already non-blocking)
        ├── setsockopt(client_fd, TCP_NODELAY, 1)
        │   └── Disable Nagle's algorithm for low latency
        └── add_fd_to_epoll(epoll_fd, client_fd, EPOLLIN | EPOLLET)
            └── Register client for read events (edge-triggered)
```

**Data Path:**

```
Client connects → Kernel queues on one listen socket (SO_REUSEPORT)
                → One worker woken (EPOLLEXCLUSIVE)
                → accept4() returns non-blocking client_fd
                → TCP_NODELAY enabled
                → Client fd added to worker's epoll
```

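Continuing the sketch in C (same caveat: the real implementation is the V example), the accept loop below follows the call sequence from the diagram: `accept4` with `SOCK_NONBLOCK`, `TCP_NODELAY`, then edge-triggered registration.

```c
// Hedged C sketch of handle_accept_loop; accept4() requires _GNU_SOURCE on glibc.
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void handle_accept_loop(int epoll_fd, int listen_fd) {
    for (;;) {
        // The returned client fd is already non-blocking.
        int client_fd = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
        if (client_fd < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) break; // accept queue drained
            continue; // transient error: a real server would log it and keep accepting
        }

        int one = 1;
        setsockopt(client_fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one); // disable Nagle

        struct epoll_event ev = {0};
        ev.events = EPOLLIN | EPOLLET; // edge-triggered read notifications
        ev.data.fd = client_fd;
        if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_fd, &ev) < 0) {
            close(client_fd);
        }
    }
}
```
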
### 3. Request Processing

```
epoll_wait() returns EPOLLIN on client_fd
└── Process readable event:
    ├── recv(client_fd, buffer, 140-1, 0)
    │   └── Read HTTP request into reused buffer
    │
    ├── decode_http_request(buffer)
    │   ├── parse_request_line()
    │   │   ├── Extract method (GET/POST/etc)
    │   │   ├── Extract path (/user/123)
    │   │   └── Extract HTTP version (HTTP/1.1)
    │   └── Return HttpRequest{buffer, method, path, version}
    │
    ├── server.request_handler(decoded_request)
    │   ├── Route to controller based on method + path
    │   │   ├── GET / → home_controller()
    │   │   ├── GET /user/:id → get_user_controller(id)
    │   │   └── POST /user → create_user_controller()
    │   └── Return response buffer []u8
    │
    └── send(client_fd, response, len, MSG_NOSIGNAL | MSG_DONTWAIT)
        └── Non-blocking send with no SIGPIPE
```

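After decoding, the handler routes on method + path. The sketch below is a hypothetical C rendering of that dispatch over pointer/length views into the request buffer; `view_t`, `view_eq`, and the controller prototypes are illustrative names, not the V example's API.

```c
// Hypothetical C sketch of the routing step shown in the diagram above.
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *ptr; size_t len; } view_t; // non-owning view into the buffer

static bool view_eq(view_t v, const char *s) {
    return v.len == strlen(s) && memcmp(v.ptr, s, v.len) == 0;
}

static bool has_prefix(view_t v, const char *p) {
    size_t n = strlen(p);
    return v.len >= n && memcmp(v.ptr, p, n) == 0;
}

// Each controller returns a ready-to-send HTTP response (defined elsewhere).
const char *home_controller(void);
const char *get_user_controller(view_t id);
const char *create_user_controller(void);

const char *request_handler(view_t method, view_t path) {
    if (view_eq(method, "GET") && view_eq(path, "/"))
        return home_controller();
    if (view_eq(method, "GET") && has_prefix(path, "/user/")) {
        view_t id = { path.ptr + 6, path.len - 6 }; // the ":id" segment
        return get_user_controller(id);
    }
    if (view_eq(method, "POST") && view_eq(path, "/user"))
        return create_user_controller();
    // Unknown route: fall back to a minimal error response.
    return "HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n";
}
```
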
**Data Structure Flow:**

```
Raw bytes [140]u8
    ↓ (push_many)
[]u8 (heap-allocated, exact size)
    ↓ (decode_http_request)
HttpRequest {
    buffer:  []u8
    method:  Slice{start, len}
    path:    Slice{start, len}
    version: Slice{start, len}
}
    ↓ (request_handler)
Response []u8 (e.g., "HTTP/1.1 200 OK\r\n...")
    ↓ (send)
TCP packets to client
```

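To make the slice representation concrete, here is a hypothetical C analogue: each field of the request is an (offset, length) view into the single receive buffer, so parsing allocates and copies nothing. The type and function names are assumptions for illustration only.

```c
// Hypothetical zero-copy request-line parser over a caller-owned buffer.
#include <stddef.h>
#include <string.h>

typedef struct { size_t start; size_t len; } slice_t;

typedef struct {
    const char *buffer;  // the received bytes, owned by the caller
    slice_t method;      // e.g. "GET"
    slice_t path;        // e.g. "/user/123"
    slice_t version;     // e.g. "HTTP/1.1"
} http_request_t;

// Parses "METHOD SP PATH SP VERSION CRLF"; returns 0 on success, -1 on malformed input.
int parse_request_line(const char *buf, size_t len, http_request_t *req) {
    const char *sp1 = memchr(buf, ' ', len);
    if (!sp1) return -1;
    const char *rest = sp1 + 1;
    const char *sp2 = memchr(rest, ' ', len - (size_t)(rest - buf));
    if (!sp2) return -1;
    const char *eol = memchr(sp2 + 1, '\r', len - (size_t)(sp2 + 1 - buf));
    if (!eol) return -1;

    req->buffer  = buf;
    req->method  = (slice_t){0, (size_t)(sp1 - buf)};
    req->path    = (slice_t){(size_t)(rest - buf), (size_t)(sp2 - rest)};
    req->version = (slice_t){(size_t)(sp2 + 1 - buf), (size_t)(eol - (sp2 + 1))};
    return 0;
}
```

With this layout, matching the method or path is a length check plus `memcmp` against the original buffer, which is what keeps the hot path free of string copies.
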
### 4. Keep-Alive Behavior

```
After send():
├── Connection remains open (Connection: keep-alive)
├── Client fd stays in epoll
└── Next request from same client:
    └── epoll_wait() triggers EPOLLIN again
        └── Back to step 3
```

**Connection Lifecycle:**

```
accept4() → add to epoll → [recv/send loop] → close on:
├── recv() = 0 (FIN)
├── recv() < 0 && errno != EAGAIN
└── EPOLLHUP | EPOLLERR
```

### 5. Connection Closure

```
Trigger: recv() == 0 or EPOLLHUP/EPOLLERR
└── handle_client_closure(epoll_fd, client_fd)
    ├── remove_fd_from_epoll(epoll_fd, client_fd)
    │   └── epoll_ctl(EPOLL_CTL_DEL)
    └── close_socket(client_fd)
        └── close(client_fd) with EINTR retry
```

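A minimal C sketch of this two-step closure path, assuming the same structure as the diagram:

```c
// Hedged sketch: deregister from epoll, then close with an EINTR retry,
// mirroring handle_client_closure / close_socket as described above.
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

void handle_client_closure(int epoll_fd, int client_fd) {
    // EPOLL_CTL_DEL ignores the event argument on modern kernels; pass NULL.
    epoll_ctl(epoll_fd, EPOLL_CTL_DEL, client_fd, NULL);

    // Retry close() only if it was interrupted by a signal.
    while (close(client_fd) < 0 && errno == EINTR) {
    }
}
```
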
**Closure Scenarios:**

1. **Clean shutdown**: Client sends FIN → `recv()` returns 0
2. **Error state**: Socket error → `EPOLLERR` event
3. **Client disconnect**: Connection reset → `EPOLLHUP` event
4. **Decode failure**: Invalid HTTP → send 400, then close
5. **Handler error**: Internal error → send 400, then close

## Memory Management

### Buffer Reuse Strategy

```
process_events() {
    mut events := [4096]C.epoll_event{}  // Stack, reused per loop
    mut request_buffer := [140]u8{}      // Stack, reused per loop

    for { // Event loop
        num_events := epoll_wait(&events[0], 4096, -1)

        for i in 0..num_events {
            // Per-request heap allocation only when needed
            mut readed_request_buffer := []u8{cap: bytes_read}
            readed_request_buffer.push_many(&request_buffer[0], bytes_read)

            // HttpRequest references the buffer (no copy)
            decoded := decode_http_request(readed_request_buffer)

            // Response may be static const or dynamically built
            response := request_handler(decoded)
        }
    }
}
```

**Allocation Profile:**

- **Stack**: Event array (32KB), request buffer (140 bytes)
- **Heap**: Per-request buffer allocation (exact size), response buffers
- **Zero-copy**: HttpRequest uses Slices pointing into buffer

## Performance Optimizations

### 1. Edge-Triggered Mode (EPOLLET)

```
Level-triggered (default):
- epoll_wait() returns every time fd is readable
- Wastes CPU on repeated notifications

Edge-triggered (EPOLLET):
- epoll_wait() returns only on state change
- Must drain recv() until EAGAIN
- Higher throughput, lower CPU
```

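The contract edge-triggered mode imposes can be shown in a short C sketch: after an `EPOLLIN` wakeup, keep reading until `recv()` reports `EAGAIN`, otherwise unread bytes never generate another event. Buffer handling is deliberately simplified to a fixed caller-provided buffer.

```c
// Hedged sketch of the drain-until-EAGAIN rule required by EPOLLET.
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

// Returns total bytes read, 0 if the peer closed, or -1 on a fatal error.
long drain_socket(int fd, char *buf, size_t cap) {
    long total = 0;
    for (;;) {
        ssize_t n = recv(fd, buf + total, cap - (size_t)total, 0);
        if (n > 0) {
            total += n;
            if ((size_t)total == cap) return total; // buffer full; caller must handle the rest
            continue;
        }
        if (n == 0) return 0;                                      // peer sent FIN
        if (errno == EAGAIN || errno == EWOULDBLOCK) return total; // socket fully drained
        if (errno == EINTR) continue;                              // interrupted, retry
        return -1;                                                 // real error: close the fd
    }
}
```
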
### 2. EPOLLEXCLUSIVE

```
Without EPOLLEXCLUSIVE:
- New connection wakes ALL workers
- Only one can accept(), others waste CPU (thundering herd)

With EPOLLEXCLUSIVE:
- Kernel wakes only ONE worker
- Eliminates wasted wakeups
- ~10% throughput improvement
```

### 3. SO_REUSEPORT

```
Single listen socket:
- All workers contend on accept() lock
- Kernel bottleneck

Per-worker socket (SO_REUSEPORT):
- Kernel load-balances incoming connections
- No shared lock contention
- Near-linear scaling with cores
```

### 4. TCP_NODELAY

```
Nagle's algorithm (default):
- Buffers small writes for efficiency
- Adds latency (up to 200ms)

TCP_NODELAY:
- Sends immediately
- Critical for request-response patterns
- Reduces P99 latency
```

### 5. MSG_NOSIGNAL | MSG_DONTWAIT

```
Default send():
- Raises SIGPIPE if peer closed
- Blocks if send buffer full

MSG_NOSIGNAL | MSG_DONTWAIT:
- Returns EPIPE instead of signal
- Returns EAGAIN instead of blocking
- Allows graceful error handling
```

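A C sketch of the send path with both flags set, mapping the errno outcomes onto the handling listed in the Error Handling section below; the return convention is illustrative.

```c
// Hedged sketch: non-blocking, SIGPIPE-free send of a fully built response.
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

// Returns 0 on success, 1 if the kernel buffer could not take the whole
// response right now, and -1 if the connection should be closed.
int send_response(int fd, const char *buf, size_t len) {
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL | MSG_DONTWAIT);
    if (n == (ssize_t)len) return 0;
    if (n >= 0) return 1;                                  // short write: socket buffer filled up
    if (errno == EAGAIN || errno == EWOULDBLOCK) return 1; // would block: no stall, no SIGPIPE
    return -1;                                             // EPIPE, ECONNRESET, ...: close it
}
```
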
## Concurrency Model

```
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│  Worker 1   │   │  Worker 2   │   │  Worker N   │
│             │   │             │   │             │
│ listen_fd=3 │   │ listen_fd=4 │   │ listen_fd=N │
│ epoll_fd=5  │   │ epoll_fd=6  │   │ epoll_fd=M  │
│             │   │             │   │             │
│ ┌─────────┐ │   │ ┌─────────┐ │   │ ┌─────────┐ │
│ │ Client  │ │   │ │ Client  │ │   │ │ Client  │ │
│ │  Pool   │ │   │ │  Pool   │ │   │ │  Pool   │ │
│ └─────────┘ │   │ └─────────┘ │   │ └─────────┘ │
└─────────────┘   └─────────────┘   └─────────────┘
       ▲                 ▲                 ▲
       └─────────────────┴─────────────────┘
        Kernel distributes via SO_REUSEPORT
```

**No Shared State:**

- Each worker is independent
- No locks or atomics in hot path
- Strong CPU cache locality (no cross-worker data sharing)

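The share-nothing layout maps directly onto plain threads. The sketch below assumes the helper functions from the earlier C sketches plus a `process_events` event loop (a hypothetical signature), and hard-codes the 16 workers and port 3001 from the text.

```c
// Hedged sketch of the share-nothing worker model: every thread owns its
// listen socket and its epoll instance; nothing is shared after startup.
#include <pthread.h>
#include <stdint.h>

#define NUM_WORKERS 16

int create_server_socket(uint16_t port);          // from the setup sketch above
int create_worker_epoll(int listen_fd);           // from the setup sketch above
void process_events(int epoll_fd, int listen_fd); // hypothetical worker event loop

static void *worker_main(void *arg) {
    (void)arg;
    int listen_fd = create_server_socket(3001);    // own socket, SO_REUSEPORT
    int epoll_fd = create_worker_epoll(listen_fd); // own epoll instance
    process_events(epoll_fd, listen_fd);           // normally never returns
    return NULL;
}

int main(void) {
    pthread_t workers[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker_main, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL); // blocking wait, no CPU spent in main
    return 0;
}
```
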
## Error Handling

```
Request Processing Errors:
├── decode_http_request() fails
│   └── Send 400 Bad Request + close
├── request_handler() fails
│   └── Send 400 Bad Request + close
└── send() returns -1
    ├── EAGAIN → Ignore (retrying would require buffering the unsent response)
    ├── EPIPE  → Close connection
    └── Other  → Log and close

Connection Errors:
├── accept4() fails
│   ├── EAGAIN → Break (no more pending connections)
│   └── Other  → Log, continue loop
├── recv() == 0
│   └── Clean FIN, close gracefully
└── EPOLLHUP | EPOLLERR
    └── Force close
```

## Metrics & Observability

The server is optimized for throughput over observability, so the following metrics were removed:

- No per-request timing
- No closure reason tracking
- No error counters

For debugging, use:

```sh
# System-level metrics
ss -s                      # Socket statistics
netstat -an | grep 3001    # Connection states

# Application tracing
strace -p <pid> -e epoll_wait,accept4,recv,send

# Performance profiling
perf record -F 99 -p <pid>
perf report
```

## Benchmark Results

```
Configuration: 16 workers, 512 concurrent connections, 10 seconds
Hardware: Multi-core x86_64

Results:
- Requests/sec: 510,480
- Latency P50:  1.29ms
- Latency P99:  ~5ms (estimated from stddev)
- Throughput:   30.18 MB/sec
- CPU:          ~95% (16 cores saturated)
```

**Bottlenecks:**

1. Client receive buffer (wrk limitation)
2. Context switching overhead
3. System call cost (recv/send)

**Future Optimizations:**

- **io_uring**: Zero-copy I/O with submission/completion queues, reducing syscall overhead
- **Batched sends**: Use `writev()` or `sendmsg()` with scatter-gather to send a response in one syscall (see the sketch after this list)
- **Response caching**: Pre-serialize common responses at startup, bypass routing/handler for cache hits
- **CPU affinity**: Pin worker threads to specific cores to improve cache locality
- **DPDK bypass**: Bypass the kernel network stack entirely for maximum throughput (userspace TCP/IP)
- **HTTP/2 multiplexing**: Share a single connection for multiple requests, reducing connection overhead
- **Profile-guided optimization**: Build with V's `-prod` plus PGO to optimize hot paths
- **Memory pool**: Pre-allocate buffers in an arena to eliminate allocation overhead
- **Lock-free queues**: If cross-worker communication is needed, use MPSC queues instead of a shared epoll
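
As one example of the batched-sends idea above, a hedged C sketch using `writev()`: the status line, headers, and body live in separate buffers but reach the kernel in a single system call. The function and parameter names are illustrative.

```c
// Hedged sketch of scatter-gather output: one syscall submits three buffers in order.
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

ssize_t send_batched(int fd, const char *status_line, const char *headers, const char *body) {
    struct iovec iov[3] = {
        { .iov_base = (void *)status_line, .iov_len = strlen(status_line) },
        { .iov_base = (void *)headers,     .iov_len = strlen(headers) },
        { .iov_base = (void *)body,        .iov_len = strlen(body) },
    };
    return writev(fd, iov, 3);
}
```

Since `writev()` has no flags argument, combining this with `MSG_NOSIGNAL`/`MSG_DONTWAIT` would mean switching to `sendmsg()` with an equivalent `struct msghdr`.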
