HTTP Framework Benchmark

⚠️ Disclaimer & Info

This is NOT production-grade benchmark data. This project was created for a CodeDay at {CodeWorks} in a limited timespan, as visual support for a discussion of HTTP frameworks and concurrency models.

Author bias: I'm primarily a Java developer. Java implementations received more attention. Non-Java apps were AI-generated (Claude) and may contain errors.

The web UI was fully vibe-coded with Claude and never manually reviewed. This was intentionally done to test "full vibe coding" as an experiment. It works for my purposes, but may contain bugs, inefficiencies, or break on certain browsers.

Author Bias

I am primarily a Java developer. This inherently introduces bias:

  • I have deeper knowledge of Java frameworks and their optimal configurations
  • I may have inadvertently optimized Java implementations better than others
  • My code review of non-Java implementations may miss language-specific anti-patterns or performance pitfalls
  • Framework choices in other languages were based on popularity rather than personal expertise

On the bright side, my bias occasionally helps catch bugs! When I saw Node.js suspiciously outperforming everything else, I thought "there's no way Node is THAT fast" and investigated. Turns out Claude had implemented the artificial delay with an empty loop that V8 happily optimized away to nothing. So yes, my Java-flavored skepticism saved the benchmark integrity. You're welcome, Node developers. (The bug is fixed now, Node is back to normal performance... most of the time. I still catch it being suspiciously fast in some results.)

Code Authorship

Different parts of this project have different levels of human involvement:

HAND-WRITTEN CODE (by me):

  • Java shared domain module
  • Quarkus implementations
  • Spring implementations
  • Spring Boot 3 implementations
  • Spring Boot 4 implementations
  • Core benchmark logic and delay simulation

AI-ASSISTED (generated by Claude, reviewed by me):

  • Micronaut implementations
  • Helidon SE implementations
  • OpenLiberty implementations
  • WildFly implementations
  • TomEE implementations
  • GraalVM native-image configurations (reflect-config.json and such) for frameworks that didn't work out of the box
  • All build scripts (Python)

FULLY AI-GENERATED (generated by Claude, reviewed but may contain errors):

  • All Go implementations
  • All Rust implementations
  • All Python implementations
  • All PHP implementations
  • All Node.js implementations
  • The entire web report UI

Web Report UI

THE WEB UI WAS FULLY VIBE-CODED WITH CLAUDE AND THE CODE WAS NEVER CHECKED.

Frontend development is not my main expertise. The entire web report interface (HTML, CSS, JavaScript, charting logic) was generated through AI conversation without manual code review. This was intentionally done to test "full vibe coding" as an experiment. It works for my purposes, but:

  • May contain bugs or inefficiencies
  • May not follow frontend best practices
  • May have accessibility issues
  • May break on certain browsers or screen sizes

Use the web UI as a visualization tool, not as a reference implementation.

UI Controls Guide

Filter buttons (Language, Framework, Pattern, Build Type):

  • Left-click: Toggle individual items on/off
  • Right-click on an item: Select ONLY that item (deselect all others)
  • Right-click on an already-solo item: Select ALL items in that group

Special filters:

  • "Merge": Combines multiple runs of the same configuration into one series
  • "Best": Shows only the best-performing variant per framework (does NOT work in all views - it's buggy, you've been warned)

Chart controls:

  • Grid: Display multiple charts in a grid layout on one screen
  • Legend: Toggle the chart legend visibility
  • Crop: Crops the Y-axis to exclude extreme outliers. This was specifically added because some tests failed catastrophically (timeouts, errors) and produced data points that made the rest of the chart unreadable. Use this to focus on the "normal" performance range.

Resolution slider - HUGE WARNING

The resolution slider controls data point sampling/aggregation.

High resolution may CRASH YOUR BROWSER if you have too many series selected. Only use high resolution on heavily filtered views.

Rule of thumb:

  • Many series selected → Use LOW resolution
  • Few series selected → Use HIGH resolution for accurate analysis

Non-Java Implementations

Important caveats about non-Java language implementations:

1. NO SHARED CODE

Unlike the Java apps, which share a common domain module, each non-Java framework is a completely standalone project. This means:

  • Delay simulation logic was duplicated and may have subtle differences (a sketch of the shared idea follows this list)
  • JSON parsing/serialization varies by framework
  • Error handling is inconsistent across implementations
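
For context: in the Java apps the artificial delay lives in the shared module and boils down to a blocking sleep (see the "slow" workload under Benchmark Methodology); every non-Java app re-implements something equivalent on its own, which is exactly where subtle differences, and bugs like the optimized-away Node.js loop mentioned earlier, can creep in. A minimal sketch of the idea, with illustrative names rather than the project's actual API:

```java
// Hypothetical sketch of the duplicated delay simulation; class and method names
// are illustrative, not the actual shared-module API.
public final class DelaySimulator {

    // Blocks the calling thread for the requested number of milliseconds.
    // A real sleep (not an empty busy loop) matters: an empty loop can be removed
    // entirely by an optimizing runtime, which is what happened in the Node.js port.
    public static void simulateDelay(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
}
```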

2. DEV SERVER CONCERNS

Some implementations may be running in development mode rather than production mode. I identified and fixed several cases, but others may remain:

  • Flask/Django running without gunicorn (i.e., on the built-in development server)
  • Node.js with debug flags
  • PHP with development error reporting

If you notice an implementation performing unexpectedly poorly, this could be the cause.

3. LIMITED OPTIMIZATION

I intentionally kept the code simple and avoided framework-specific optimizations to maintain consistency. However, this means:

  • Implementations may not represent the framework's full potential
  • Production-grade code would likely perform differently
  • Connection pooling, caching, and other optimizations were not used

Test Environment

Hardware:

  • CPU: Intel Core i9-14900KS
  • CPU Mode: POWER-SAVING MODE (because Intel's recent CPUs have stability issues at full power - yes, Intel sucks)
  • This means performance numbers are artificially lower than what the hardware could achieve at full power
  • WSL2 allocated memory: 120GB
  • No memory limits were applied to any application
  • Java thrives in environments with abundant memory (as you can see in the benchmarks) - results may differ significantly in memory-constrained environments like containers with low limits

Software Environment:

  • Applications ran inside WSL2 (Windows Subsystem for Linux 2)
  • k6 load testing ran from Windows host
  • Tests used direct WSL2 IP address to minimize network overhead
  • This setup was chosen to reduce interference between load generator and applications under test

Network Path:

Windows k6 → Hyper-V virtual switch → WSL2 VM → Application

This is NOT equivalent to:

  • Native Linux performance
  • Docker container performance
  • Bare metal performance
  • Cloud/VM performance

Compute Endpoint Limitations

The compute workload calculates square roots to a specified precision. There are known issues:

1. DOUBLE PRECISION LIMITATION

The implementation uses double-precision floating point, which has limited precision (~15-17 significant digits). This means:

  • Very high precision targets may not be achievable
  • The algorithm may terminate early or behave unexpectedly
  • This could explain the dual grouping of data points visible in some benchmark charts
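
To make the precision ceiling concrete, here is a tiny stand-alone illustration (not code from the benchmark): near sqrt(2) a double cannot represent steps smaller than roughly 2.2e-16, so any precision target tighter than that is unreachable, and a convergence loop either stops early or runs until its iteration cap.

```java
// Stand-alone illustration of the double-precision limit; not project code.
public class PrecisionLimitDemo {
    public static void main(String[] args) {
        double x = Math.sqrt(2.0);
        System.out.println("sqrt(2)            = " + x);
        System.out.println("ulp near sqrt(2)   = " + Math.ulp(x));   // ~2.2e-16, smallest representable step
        System.out.println("residual x*x - 2.0 = " + (x * x - 2.0)); // ~4.4e-16, cannot shrink further
        // A precision target below ~1e-16 therefore cannot be satisfied with doubles.
    }
}
```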

2. ALGORITHM QUALITY

The square root calculation algorithm was not optimized for numerical accuracy or performance. It's a simple iterative approach that may:

  • Have varying iteration counts for similar inputs
  • Produce inconsistent CPU load across implementations
  • Not be the best representation of "compute-bound" workloads
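
The project's exact algorithm is not reproduced here, but a Newton-style loop with a convergence threshold is one plausible shape of such a "simple iterative approach", and it shows why iteration counts (and therefore CPU load) can vary between nearby inputs. This is an assumption-laden sketch, not the benchmark's actual code:

```java
// Illustrative Newton-iteration square root with a convergence threshold.
// This is an assumed shape of the algorithm, not the project's implementation.
public final class IterativeSqrt {

    // Returns an approximation of sqrt(n); the iteration count depends on n and epsilon.
    public static double sqrt(double n, double epsilon, int maxIterations) {
        double x = n > 1.0 ? n / 2.0 : 1.0;        // crude initial guess
        int iterations = 0;
        while (Math.abs(x * x - n) > epsilon && iterations < maxIterations) {
            x = 0.5 * (x + n / x);                 // Newton step for f(x) = x^2 - n
            iterations++;
        }
        // For epsilon below roughly 1e-15 the condition may never be met, so only the
        // iteration cap stops the loop and CPU cost jumps for "high precision" inputs.
        return x;
    }
}
```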

3. CROSS-LANGUAGE INCONSISTENCY

Floating-point behavior varies by language and runtime:

  • Java, Go, Rust may produce different results for the same input
  • Some languages may optimize the calculation differently
  • JIT compilation may affect computation patterns over time

Benchmark Methodology

Workload Types:

  • "fast": Minimal processing to test pure framework latency. Includes basic payload validation to prevent frameworks from over-optimizing it away.
  • "slow": Fixed artificial delay (Thread.sleep or equivalent)
  • "compute": CPU-bound square root calculation
  • "sqrt_stable": Deterministic code path - the same predictable code runs every time, making it friendly to CPU branch prediction and JIT optimization
  • "sqrt_unstable": Random branch selection - a random choice determines which code branch executes, but the overall computational complexity remains the same. This defeats branch prediction and may show different JIT behavior.
  • "real": Attempts to mimic a realistic endpoint with both computation and wait time, without using any external resources (database, network calls) to avoid contaminating the benchmark with external factors.

Test Configuration:

Load tests (k6):

  • Virtual Users (VUs): Ramped from 0 to target
  • Ramp-up time: 5 minutes
  • Sustained load: 1 minute at target VUs
  • Warmup: JIT-based applications had a warmup period before measurement

Startup benchmarks:

  • Each application was started 10 times
  • 50 HTTP requests (curl) per start to measure latency
  • Memory measured via /proc after startup

What These Benchmarks DO Measure:

  • Relative performance differences between frameworks
  • Throughput under sustained concurrent load
  • Latency distribution patterns
  • How frameworks handle thread/connection exhaustion

What These Benchmarks DO NOT Measure:

  • Real-world application performance (no database, no external services)
  • Cold start in serverless environments
  • Memory efficiency for long-running applications
  • Framework ergonomics or developer productivity

Recommendations

When interpreting results:

  1. Focus on PATTERNS, not absolute numbers
    The test environment introduces overhead. Compare frameworks relative to each other, not against theoretical maximums.
  2. Consider the WORKLOAD type
    Different workloads (IO-bound vs CPU-bound) favor different architectures. A framework's ranking may change significantly between workload types.
  3. Remember the BIAS
    Java implementations received more attention and optimization. Non-Java results should be taken with additional skepticism.
  4. This is a LEARNING PROJECT
    The primary goal was to understand and compare concurrency models, not to definitively rank frameworks for production use.
  5. DO YOUR OWN BENCHMARKS
    For production decisions, benchmark with your actual workload, your actual hardware, and your actual deployment environment.

Attribution

  • Project concept and Java implementations: Human (me)
  • AI assistance: Claude (Anthropic)
  • Load testing: k6 (Grafana Labs)
  • Charting: Plotly.js
  • No actual humans were harmed in the making of this benchmark. Probably.

License & Usage

This project is provided for educational and demonstration purposes. Feel free to use the results in presentations, but please:

  1. Link back to the original project
  2. Include a reference to this disclaimer
  3. Acknowledge the limitations described above

Report Views & Charts

The report has five views: Load Tests, Curl Latency, Startup Benchmarks, Compare, and Bench Stats.

Chart panels include: Requests per Second, Request Latency (solid line = avg, shaded area = p90), Success Rate, Server CPU Usage, Server Memory Usage, and Server Thread Count. Click legend items to toggle a series; double-click to isolate one.

Endpoint Reference

  • Slow (POST /slow): Synthetic I/O workload. Fixed 1,080 ms artificial delay (1 s delay + 80 ms overhead).
  • Fast (POST /validate): No artificial delay. Tests raw request handling speed and framework overhead.
  • Compute Stable (GET /sqrt): Pure CPU work, always runs the full iteration count. Startup bench: 5 million iterations; load test: 200 million iterations.
  • Compute Unstable (GET /sqrt): Pure CPU work, exits early once converged. Startup bench: 5 million iterations; load test: 200 million iterations.
  • Real (POST /compute): Mixed CPU + I/O (6 sqrt calculations with delays). Startup bench: 100K iterations × 6 with a 5 ms delay (~30 ms total); load test: 500K iterations × 6 with a 50 ms delay (~300 ms total). A sketch of this per-request flow follows below.
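
As a rough sketch of what a single request to the Real endpoint does (structure and numbers taken from the description above; names are illustrative, not the actual code): six compute blocks, each followed by a short sleep, so every request mixes CPU time and wait time without touching any external resource.

```java
// Illustrative sketch of the "real" mixed workload; not the actual endpoint code.
public final class RealWorkload {

    // Load-test profile from the list above: 500K iterations per block and a 50 ms
    // delay, repeated six times, gives roughly 300 ms of waiting plus the CPU time.
    public static double handleRequest(long iterationsPerBlock, long delayMillis)
            throws InterruptedException {
        double acc = 0.0;
        for (int block = 0; block < 6; block++) {
            for (long i = 1; i <= iterationsPerBlock; i++) {
                acc += Math.sqrt(i);       // CPU-bound portion
            }
            Thread.sleep(delayMillis);     // I/O-like wait, no external dependency
        }
        return acc;
    }
}
```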

Framework Reference

Java:

  • Quarkus (2019, Red Hat): Container-first Java for Kubernetes and serverless
  • Spring (2003, Rod Johnson): Simpler alternative to complex J2EE/EJB
  • Spring Boot (2014, Pivotal/VMware): Convention over configuration for Spring
  • Micronaut (2018, Graeme Rocher of Grails): Compile-time DI, no reflection overhead
  • Helidon SE (2018, Oracle): Lightweight microservices without an app server
  • Open Liberty (2017, IBM): Open source cloud-native WebSphere runtime
  • WildFly (1999, Marc Fleury / Red Hat): Open source Java EE app server (formerly JBoss AS)
  • TomEE (2011, David Blevins of OpenEJB): Java EE for Tomcat users

Python:

  • FastAPI (2018, Sebastián Ramírez of Typer and SQLModel): Modern async API with type hints and auto-docs
  • Flask (2010, Armin Ronacher of Jinja2, Werkzeug, and Click): Minimalist micro-framework (started as an April Fools' joke)
  • Django (2005, Adrian Holovaty): Batteries-included framework for rapid development
  • aiohttp (2013, Andrew Svetlov): Async HTTP client/server on asyncio
  • Sanic (2016, Adam Hopkins): Node.js-like async performance for Python

PHP:

  • Laravel (2011, Taylor Otwell): Elegant syntax, a better CodeIgniter alternative
  • Symfony (2005, Fabien Potencier of Twig): Enterprise PHP with reusable components
  • Swoole (2012, Rango / 韩天峰): Coroutines and async I/O for PHP
  • RoadRunner (2018, Spiral Scout): Persistent PHP workers in Go for high performance

Go:

  • net/http (2012, Go Team): Standard library HTTP server (Go 1.0)
  • Gin (2014, Manu Martinez-Almeida): Martini-like API, 40x faster with httprouter
  • Fiber (2019, Fenny): Express-inspired API for Go developers
  • Chi (2016, Pressly): Lightweight idiomatic router, stdlib only

Rust:

  • Axum (2021, Tokio Team): Ergonomic Tower-based web framework
  • Actix-web (2017, Nikolay Kim): Actor-based high-performance framework
  • Rocket (2016, Sergio Benitez): Focus on usability, security, and speed

Node.js:

  • Express (2010, TJ Holowaychuk of Koa, Mocha, and Jade): Sinatra-inspired minimal web framework
  • Fastify (2016, Matteo Collina of Pino): Low-overhead, high-performance alternative
  • NestJS (2017, Kamil Myśliwiec): Angular-inspired enterprise framework

.NET:

  • ASP.NET Core (2016, Microsoft): Cross-platform redesign of ASP.NET
  • Minimal APIs (2021, Microsoft): Lightweight API syntax introduced in .NET 6
  • FastEndpoints (2021, Đĵ ΝιΓΞΗΛψΚ): REPR pattern, faster than MVC controllers
  • ServiceStack (2008, Demis Bellot): Message-based web services framework

Runtime Reference

Java:

  • jit: HotSpot JVM (standard JIT compilation at runtime)
  • native: GraalVM Native Image (AOT compiled to native binary)
  • uber: Uber JAR (fat JAR with all dependencies, still JIT)

Python:

  • jit: CPython (standard interpreter, no JIT, bytecode only)
  • native: Nuitka (AOT compiled to native binary)
  • uber: PyPy (alternative runtime with tracing JIT)

Go:

  • native: Go compiler (always AOT compiled to native binary)

Rust:

  • native: rustc + LLVM (always AOT compiled to native binary)

Node.js:

  • jit: V8 engine (JIT compilation at runtime)

.NET:

  • jit: CoreCLR (standard JIT compilation at runtime)
  • native: NativeAOT (AOT compiled to native binary)
  • uber: Single-file + ReadyToRun (self-contained with pre-compiled assemblies)

PHP:

  • jit: Zend Engine (standard interpreter with optional JIT, PHP 8+)