# Vanilla HTTP Server - Data Flow

This document describes the complete data flow through the high-performance HTTP server, from
connection establishment to response delivery.

## System Architecture Overview

```
┌──────────────────────────────────────────────────────────────┐
│                         Main Thread                          │
│  - Creates 16 worker threads                                 │
│  - Each worker gets its own listen socket (SO_REUSEPORT)     │
│  - Each worker gets its own epoll instance                   │
│  - Waits for workers (blocking, no CPU waste)                │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
          ┌──────────────────────────────────────────┐
          │           Worker Threads (x16)           │
          │    Each runs process_events() in loop    │
          └──────────────────────────────────────────┘
                               │
                               ▼
```

## Connection Flow

### 1. Initial Setup (per worker)

```
create_server_socket(port)
├── socket(AF_INET, SOCK_STREAM, 0)
├── set_blocking(fd, false)       // Non-blocking listen socket
├── setsockopt(SO_REUSEADDR)      // Allow immediate rebind
├── setsockopt(SO_REUSEPORT)      // Multiple sockets on same port
├── bind(0.0.0.0:3001)
└── listen(backlog=4096)

epoll_create1(0)                  // Create per-worker epoll
└── add_fd_to_epoll(listen_fd, EPOLLIN | EPOLLET | EPOLLEXCLUSIVE)
```

**Key Points:**

- Each worker has its own listening socket bound to the same port
- `SO_REUSEPORT` enables kernel-level load balancing
- `EPOLLEXCLUSIVE` prevents thundering herd on accept

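For concreteness, here is a minimal C sketch of this setup sequence. The server itself is written in V and drives these same Linux syscalls through C interop; the helper names (`create_server_socket`, `setup_worker`) follow the diagram above, and error handling is reduced to early returns.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>

// One non-blocking listen socket per worker, all bound to the same port
// via SO_REUSEPORT.
static int create_server_socket(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);      // non-blocking listen socket

    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)); // allow immediate rebind
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)); // multiple sockets, same port

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);                    // 0.0.0.0
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) return -1;
    if (listen(fd, 4096) < 0) return -1;
    return fd;
}

// Per-worker epoll instance; the listen fd is registered edge-triggered
// and EPOLLEXCLUSIVE so a new connection wakes only one worker.
static int setup_worker(int listen_fd) {
    int epoll_fd = epoll_create1(0);
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLET | EPOLLEXCLUSIVE;
    ev.data.fd = listen_fd;
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);
    return epoll_fd;
}
```
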
### 2. Accept New Connection

```
epoll_wait() returns EPOLLIN on listen_fd
└── handle_accept_loop(epoll_fd, listen_fd)
    └── Loop until EAGAIN:
        ├── accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK)
        │   └── Returns client_fd (already non-blocking)
        ├── setsockopt(client_fd, TCP_NODELAY, 1)
        │   └── Disable Nagle's algorithm for low latency
        └── add_fd_to_epoll(epoll_fd, client_fd, EPOLLIN | EPOLLET)
            └── Register client for read events (edge-triggered)
```

**Data Path:**

```
Client connects → Kernel queues on one listen socket (SO_REUSEPORT)
                → One worker woken (EPOLLEXCLUSIVE)
                → accept4() returns non-blocking client_fd
                → TCP_NODELAY enabled
                → Client fd added to worker's epoll
```

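A hedged C sketch of the accept path described above: drain `accept4()` until `EAGAIN`, enable `TCP_NODELAY`, and register each client fd for edge-triggered reads. The function name mirrors the diagram; the real implementation lives in the V source.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/epoll.h>
#include <sys/socket.h>

// Called when epoll reports EPOLLIN on the listen socket.
static void handle_accept_loop(int epoll_fd, int listen_fd) {
    for (;;) {
        // SOCK_NONBLOCK: the client fd is non-blocking from the start.
        int client_fd = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
        if (client_fd < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;    // accept queue drained: nothing more pending
            continue;     // transient error: keep accepting
        }

        // Disable Nagle's algorithm for low request/response latency.
        int one = 1;
        setsockopt(client_fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

        // Register the client for edge-triggered read events.
        struct epoll_event ev = {0};
        ev.events = EPOLLIN | EPOLLET;
        ev.data.fd = client_fd;
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_fd, &ev);
    }
}
```
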
### 3. Request Processing

```
epoll_wait() returns EPOLLIN on client_fd
└── Process readable event:
    ├── recv(client_fd, buffer, 140-1, 0)
    │   └── Read HTTP request into reused buffer
    │
    ├── decode_http_request(buffer)
    │   ├── parse_request_line()
    │   │   ├── Extract method (GET/POST/etc)
    │   │   ├── Extract path (/user/123)
    │   │   └── Extract HTTP version (HTTP/1.1)
    │   └── Return HttpRequest{buffer, method, path, version}
    │
    ├── server.request_handler(decoded_request)
    │   ├── Route to controller based on method + path
    │   │   ├── GET / → home_controller()
    │   │   ├── GET /user/:id → get_user_controller(id)
    │   │   └── POST /user → create_user_controller()
    │   └── Return response buffer []u8
    │
    └── send(client_fd, response, len, MSG_NOSIGNAL | MSG_DONTWAIT)
        └── Non-blocking send with no SIGPIPE
```

**Data Structure Flow:**

```
Raw bytes [140]u8
   ↓ (push_many)
[]u8 (heap-allocated, exact size)
   ↓ (decode_http_request)
HttpRequest {
    buffer:  []u8
    method:  Slice{start, len}
    path:    Slice{start, len}
    version: Slice{start, len}
}
   ↓ (request_handler)
Response []u8 (e.g., "HTTP/1.1 200 OK\r\n...")
   ↓ (send)
TCP packets to client
```

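To make the zero-copy layout concrete, here is a small C sketch of a `Slice`-style request-line parse: method, path, and version are stored as offsets into the original buffer rather than copied out. The struct and field names are illustrative stand-ins for the server's actual V types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Offset/length view into the request buffer (no copying).
typedef struct { size_t start; size_t len; } Slice;

typedef struct {
    const char *buffer;            // backing request bytes
    Slice method, path, version;   // views into buffer
} HttpRequest;

// Parse "GET /user/123 HTTP/1.1\r\n..." into three slices.
static bool parse_request_line(const char *buf, size_t n, HttpRequest *req) {
    const char *sp1 = memchr(buf, ' ', n);
    if (!sp1) return false;
    const char *sp2 = memchr(sp1 + 1, ' ', n - (size_t)(sp1 + 1 - buf));
    if (!sp2) return false;
    const char *crlf = memchr(sp2 + 1, '\r', n - (size_t)(sp2 + 1 - buf));
    if (!crlf) return false;

    req->buffer  = buf;
    req->method  = (Slice){ 0, (size_t)(sp1 - buf) };
    req->path    = (Slice){ (size_t)(sp1 + 1 - buf), (size_t)(sp2 - sp1 - 1) };
    req->version = (Slice){ (size_t)(sp2 + 1 - buf), (size_t)(crlf - sp2 - 1) };
    return true;
}
```
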
### 4. Keep-Alive Behavior

```
After send():
├── Connection remains open (Connection: keep-alive)
├── Client fd stays in epoll
└── Next request from same client:
    └── epoll_wait() triggers EPOLLIN again
        └── Back to step 3
```

**Connection Lifecycle:**

```
accept4() → add to epoll → [recv/send loop] → close on:
                                              ├── recv() == 0 (FIN)
                                              ├── recv() < 0 && errno != EAGAIN
                                              └── EPOLLHUP | EPOLLERR
```

### 5. Connection Closure

```
Trigger: recv() == 0 or EPOLLHUP/EPOLLERR
└── handle_client_closure(epoll_fd, client_fd)
    ├── remove_fd_from_epoll(epoll_fd, client_fd)
    │   └── epoll_ctl(EPOLL_CTL_DEL)
    └── close_socket(client_fd)
        └── close(client_fd) with EINTR retry
```

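A C sketch of that closure path, assuming the helper names from the diagram: deregister the fd from epoll first, then close it, retrying on `EINTR` as the flow above describes.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

// close() with EINTR retry, matching the close_socket() helper above.
static void close_socket(int fd) {
    while (close(fd) < 0 && errno == EINTR) {
        // interrupted by a signal: retry
    }
}

static void handle_client_closure(int epoll_fd, int client_fd) {
    // Remove the fd from the interest list; a NULL event is fine on modern kernels.
    epoll_ctl(epoll_fd, EPOLL_CTL_DEL, client_fd, NULL);
    close_socket(client_fd);
}
```
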
**Closure Scenarios:**

1. **Clean shutdown**: Client sends FIN → `recv()` returns 0
2. **Error state**: Socket error → `EPOLLERR` event
3. **Client disconnect**: Connection reset → `EPOLLHUP` event
4. **Decode failure**: Invalid HTTP → send 400, then close
5. **Handler error**: Internal error → send 400, then close

## Memory Management

### Buffer Reuse Strategy

```
process_events() {
    mut events := [4096]C.epoll_event{}    // Stack, reused per loop
    mut request_buffer := [140]u8{}        // Stack, reused per loop

    for {  // Event loop
        num_events := epoll_wait(epoll_fd, &events[0], 4096, -1)

        for i in 0..num_events {
            // Per-request heap allocation only when needed
            mut readed_request_buffer := []u8{cap: bytes_read}
            readed_request_buffer.push_many(&request_buffer[0], bytes_read)

            // HttpRequest references the buffer (no copy)
            decoded := decode_http_request(readed_request_buffer)

            // Response may be static const or dynamically built
            response := request_handler(decoded)
        }
    }
}
```

**Allocation Profile:**

- **Stack**: Event array (4096 × `sizeof(C.epoll_event)`, ≈48 KB on x86-64), request buffer (140 bytes)
- **Heap**: Per-request buffer allocation (exact size), response buffers
- **Zero-copy**: HttpRequest uses Slices pointing into the buffer

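As a rough C analogue of that profile (the V pseudocode above is the real shape), the per-request heap buffer is sized exactly to the bytes read out of the stack scratch buffer; the function name is illustrative.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

enum { REQUEST_SCRATCH_SIZE = 140 };

// Read into a stack scratch buffer, then copy into an exact-size heap
// buffer that the request's slices will point into (analogue of push_many()).
static char *read_request_exact(int client_fd, ssize_t *out_len) {
    char scratch[REQUEST_SCRATCH_SIZE];                  // stack, reused per event
    ssize_t n = recv(client_fd, scratch, sizeof(scratch) - 1, 0);
    if (n <= 0) return NULL;                             // closed, error, or EAGAIN

    char *exact = malloc((size_t)n);                     // heap, exact size
    if (exact == NULL) return NULL;
    memcpy(exact, scratch, (size_t)n);
    *out_len = n;
    return exact;
}
```
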
## Performance Optimizations

### 1. Edge-Triggered Mode (EPOLLET)

```
Level-triggered (default):
- epoll_wait() returns every time fd is readable
- Wastes CPU on repeated notifications

Edge-triggered (EPOLLET):
- epoll_wait() returns only on state change
- Must drain recv() until EAGAIN
- Higher throughput, lower CPU
```

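Because notifications are edge-triggered, the read path has to drain the socket until `EAGAIN`; a minimal C sketch (request handling elided):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

// With EPOLLET, one readiness event may cover many arriving bytes: keep
// calling recv() until EAGAIN, or data can sit unread with no further wakeups.
// Returns true if the connection should stay open.
static bool drain_socket(int client_fd, char *buf, size_t cap) {
    for (;;) {
        ssize_t n = recv(client_fd, buf, cap, 0);
        if (n > 0) {
            // decode + handle + send the request bytes here
            continue;
        }
        if (n == 0)
            return false;                              // peer sent FIN: close
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return true;                               // fully drained: keep connection
        return false;                                  // real error: close
    }
}
```
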
### 2. EPOLLEXCLUSIVE

```
Without EPOLLEXCLUSIVE:
- New connection wakes ALL workers
- Only one can accept(), others waste CPU (thundering herd)

With EPOLLEXCLUSIVE:
- Kernel wakes only ONE worker
- Eliminates wasted wakeups
- ~10% throughput improvement
```

### 3. SO_REUSEPORT

```
Single listen socket:
- All workers contend on accept() lock
- Kernel bottleneck

Per-worker socket (SO_REUSEPORT):
- Kernel load-balances incoming connections
- No shared lock contention
- Near-linear scaling with cores
```

### 4. TCP_NODELAY

```
Nagle's algorithm (default):
- Buffers small writes for efficiency
- Adds latency (up to ~200ms when combined with delayed ACKs)

TCP_NODELAY:
- Sends immediately
- Critical for request-response patterns
- Reduces P99 latency
```

### 5. MSG_NOSIGNAL | MSG_DONTWAIT

```
Default send():
- Raises SIGPIPE if peer closed
- Blocks if send buffer full

MSG_NOSIGNAL | MSG_DONTWAIT:
- Returns EPIPE instead of signal
- Returns EAGAIN instead of blocking
- Allows graceful error handling
```

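A hedged C sketch of the send path with those two flags and the errno branches described above; the surrounding buffering strategy is out of scope here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

// Returns true if the connection should stay open after the send attempt.
static bool send_response(int client_fd, const char *resp, size_t len) {
    ssize_t n = send(client_fd, resp, len, MSG_NOSIGNAL | MSG_DONTWAIT);
    if (n >= 0)
        return true;    // sent (short writes are ignored in this sketch)
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return true;    // send buffer full: a full solution would buffer + wait for EPOLLOUT
    if (errno == EPIPE)
        return false;   // peer already closed: drop the connection
    return false;       // any other error: log and close
}
```
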
## Concurrency Model

```
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│  Worker 1   │   │  Worker 2   │   │  Worker N   │
│             │   │             │   │             │
│ listen_fd=3 │   │ listen_fd=4 │   │ listen_fd=N │
│ epoll_fd=5  │   │ epoll_fd=6  │   │ epoll_fd=M  │
│             │   │             │   │             │
│ ┌─────────┐ │   │ ┌─────────┐ │   │ ┌─────────┐ │
│ │ Client  │ │   │ │ Client  │ │   │ │ Client  │ │
│ │  Pool   │ │   │ │  Pool   │ │   │ │  Pool   │ │
│ └─────────┘ │   │ └─────────┘ │   │ └─────────┘ │
└─────────────┘   └─────────────┘   └─────────────┘
       ▲                 ▲                 ▲
       └─────────────────┴─────────────────┘
        Kernel distributes via SO_REUSEPORT
```

**No Shared State:**

- Each worker is independent
- No locks or atomics in the hot path
- Strong CPU cache locality per worker

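A simplified C sketch of this share-nothing model: the main thread spawns the workers, and each worker builds its own listen socket and epoll instance before entering an independent event loop. `create_server_socket` and `handle_accept_loop` are the helpers sketched earlier in this document (assumptions, not the server's exact V API); the client-handling branch is left as a comment.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>

#define NUM_WORKERS 16
#define PORT 3001

// From the earlier sketches in this document (assumed helpers).
int create_server_socket(uint16_t port);
void handle_accept_loop(int epoll_fd, int listen_fd);

static void *worker_main(void *arg) {
    (void)arg;
    int listen_fd = create_server_socket(PORT);   // own socket: no shared accept lock
    int epoll_fd  = epoll_create1(0);             // own epoll: no shared event state

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLET | EPOLLEXCLUSIVE;
    ev.data.fd = listen_fd;
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[4096];              // stack, reused every iteration
    for (;;) {
        int n = epoll_wait(epoll_fd, events, 4096, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                handle_accept_loop(epoll_fd, listen_fd);
            } else {
                // recv / decode / route / send for a client fd (see step 3)
            }
        }
    }
    return NULL;
}

int main(void) {
    pthread_t workers[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker_main, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);           // main thread blocks; no busy-waiting
    return 0;
}
```
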
## Error Handling

```
Request Processing Errors:
├── decode_http_request() fails
│   └── Send 400 Bad Request + close
├── request_handler() fails
│   └── Send 400 Bad Request + close
└── send() returns -1
    ├── EAGAIN → Ignore (handling it fully would require buffering + EPOLLOUT)
    ├── EPIPE → Close connection
    └── Other → Log and close

Connection Errors:
├── accept4() fails
│   ├── EAGAIN → Break (no more pending connections)
│   └── Other → Log, continue loop
├── recv() == 0
│   └── Clean FIN, close gracefully
└── EPOLLHUP | EPOLLERR
    └── Force close
```

## Metrics & Observability

The server is optimized for throughput over observability, so the following instrumentation was removed:

- No per-request timing
- No closure reason tracking
- No error counters

For debugging, use:

```sh
# System-level metrics
ss -s                      # Socket statistics
netstat -an | grep 3001    # Connection states

# Application tracing
strace -p <pid> -e epoll_wait,accept4,recv,send

# Performance profiling
perf record -F 99 -p <pid>
perf report
```

## Benchmark Results

```
Configuration: 16 workers, 512 concurrent connections, 10 seconds
Hardware: Multi-core x86_64

Results:
- Requests/sec: 510,480
- Latency P50:  1.29ms
- Latency P99:  ~5ms (estimated from stddev)
- Throughput:   30.18 MB/sec
- CPU:          ~95% (16 cores saturated)
```

**Bottlenecks:**

1. Client receive buffer (wrk limitation)
2. Context switching overhead
3. System call cost (recv/send)

**Future Optimizations:**

- **io_uring**: Zero-copy I/O with submission/completion queues, eliminating per-operation syscall
  overhead
- **Batched sends**: Use `writev()` or `sendmsg()` with scatter-gather to send the response in one
  syscall (see the sketch after this list)
- **Response caching**: Pre-serialize common responses at startup, bypass routing/handler for
  cache hits
- **CPU affinity**: Pin worker threads to specific cores to improve cache locality
- **Kernel bypass (DPDK)**: Skip the kernel network stack entirely for maximum throughput
  (userspace TCP/IP)
- **HTTP/2 multiplexing**: Share a single connection for multiple requests, reducing connection
  overhead
- **Profile-guided optimization**: Build with V's `-prod` plus PGO for hot paths
- **Memory pool**: Pre-allocate buffers in an arena to eliminate allocation overhead
- **Lock-free queues**: If cross-worker communication is needed, use MPSC queues instead of a
  shared epoll
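
As an illustration of the batched-sends idea, here is a C sketch that uses `writev()` to hand the status line, headers, and body to the kernel in one syscall. The response strings are placeholders, not the server's actual output.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

// Scatter-gather write: one syscall for status line + headers + body,
// without first concatenating them into a single buffer.
// Note: writev() has no MSG_NOSIGNAL, so SIGPIPE must be ignored separately
// (or use sendmsg() with MSG_NOSIGNAL instead).
static ssize_t send_response_batched(int client_fd, const char *body, size_t body_len) {
    const char *status  = "HTTP/1.1 200 OK\r\n";
    const char *headers = "Content-Type: text/plain\r\nConnection: keep-alive\r\n";

    char clen[64];
    int clen_n = snprintf(clen, sizeof(clen), "Content-Length: %zu\r\n\r\n", body_len);

    struct iovec iov[4] = {
        { .iov_base = (void *)status,  .iov_len = strlen(status)  },
        { .iov_base = (void *)headers, .iov_len = strlen(headers) },
        { .iov_base = clen,            .iov_len = (size_t)clen_n  },
        { .iov_base = (void *)body,    .iov_len = body_len        },
    };
    return writev(client_fd, iov, 4);
}
```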