Author Bias
I am primarily a Java developer. This inherently introduces bias:
- I have deeper knowledge of Java frameworks and their optimal configurations
- I may have inadvertently optimized Java implementations better than others
- My code review of non-Java implementations may miss language-specific anti-patterns or performance pitfalls
- Framework choices in other languages were based on popularity rather than personal expertise
On the bright side, my bias occasionally helps catch bugs! When I saw Node.js suspiciously outperforming everything else, I thought "there's no way Node is THAT fast" and investigated. It turned out Claude had implemented the artificial delay as an empty loop, which V8 happily optimized away to nothing. So yes, my Java-flavored skepticism saved the benchmark's integrity. You're welcome, Node developers. (The bug is fixed now and Node is back to normal performance... most of the time. I still catch it being suspiciously fast in some results.)
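For illustration, here is the same class of pitfall sketched in Java. This is not the project's code (the real bug was in the Node.js implementation, and all names below are made up): a delay written as an empty loop has no observable side effects, so an optimizing JIT (V8, HotSpot, ...) may remove it entirely, while a real blocking sleep cannot be optimized away.

```java
// Hypothetical sketch of the "optimized-away delay" pitfall described above.
public final class DelayPitfall {

    // BROKEN: the loop body has no observable side effects, so an optimizing
    // JIT may treat it as dead code and remove it, turning the "delay" into a no-op.
    static void busyWaitDelay(long iterations) {
        for (long i = 0; i < iterations; i++) {
            // intentionally empty -> candidate for dead-code elimination
        }
    }

    // SAFER: an actual blocking sleep cannot be optimized away.
    static void sleepDelay(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }
}
```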
Code Authorship
Different parts of this project have different levels of human involvement:
HAND-WRITTEN CODE (by me):
- Java shared domain module
- Quarkus implementations
- Spring implementations
- Spring Boot 3 implementations
- Spring Boot 4 implementations
- Core benchmark logic and delay simulation
AI-ASSISTED (generated by Claude, reviewed by me):
- Micronaut implementations
- Helidon SE implementations
- OpenLiberty implementations
- WildFly implementations
- TomEE implementations
- GraalVM native-image configurations (reflect-config.json and such) for frameworks that didn't work out of the box
- All build scripts (Python)
FULLY AI-GENERATED (generated by Claude; reviewed lightly at best, may contain errors):
- All Go implementations
- All Rust implementations
- All Python implementations
- All PHP implementations
- All Node.js implementations
- The entire web report UI
Web Report UI
THE WEB UI WAS FULLY VIBE-CODED WITH CLAUDE AND THE CODE WAS NEVER CHECKED.
Frontend development is not my main expertise. The entire web report interface (HTML, CSS, JavaScript, charting logic) was generated through AI conversation without manual code review. This was intentional: an experiment in "full vibe coding". It works for my purposes, but:
- May contain bugs or inefficiencies
- May not follow frontend best practices
- May have accessibility issues
- May break on certain browsers or screen sizes
Use the web UI as a visualization tool, not as a reference implementation.
UI Controls Guide
Filter buttons (Language, Framework, Pattern, Build Type):
- Left-click: Toggle individual items on/off
- Right-click on an item: Select ONLY that item (deselect all others)
- Right-click on an already-solo item: Select ALL items in that group
Special filters:
- "Merge": Combines multiple runs of the same configuration into one series
- "Best": Shows only the best-performing variant per framework (does NOT work in all views - it's buggy, you've been warned)
Chart controls:
- Grid: Display multiple charts in a grid layout on one screen
- Legend: Toggle the chart legend visibility
- Crop: Crops the Y-axis to exclude extreme outliers. This was specifically added because some tests failed catastrophically (timeouts, errors) and produced data points that made the rest of the chart unreadable. Use this to focus on the "normal" performance range.
Resolution slider - HUGE WARNING
The resolution slider controls data point sampling/aggregation.
High resolution may CRASH YOUR BROWSER if you have too many series selected. Only use high resolution on heavily filtered views.
Rule of thumb:
- Many series selected → Use LOW resolution
- Few series selected → Use HIGH resolution for accurate analysis
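For readers wondering what "resolution" means here: conceptually, the slider decides how many aggregated points per series end up in the chart. The sketch below is only an illustration of that idea (bucket averaging) written in Java; it is not the UI's actual JavaScript code, and the names are invented for the example.

```java
// Illustrative sketch of resolution-style downsampling: group raw samples into
// buckets and keep one averaged point per bucket. Higher resolution means
// smaller buckets, i.e. more points for the browser to render per series.
import java.util.ArrayList;
import java.util.List;

public final class Downsample {

    static List<Double> toResolution(List<Double> samples, int targetPoints) {
        int bucketSize = Math.max(1, samples.size() / targetPoints);
        List<Double> result = new ArrayList<>();
        for (int start = 0; start < samples.size(); start += bucketSize) {
            int end = Math.min(start + bucketSize, samples.size());
            double sum = 0;
            for (int i = start; i < end; i++) {
                sum += samples.get(i);
            }
            result.add(sum / (end - start)); // one averaged point per bucket
        }
        return result;
    }
}
```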
Non-Java Implementations
Important caveats about non-Java language implementations:
1. NO SHARED CODE
Unlike the Java apps, which share a domain module, each non-Java framework is a completely standalone project. This means:
- Delay simulation logic was duplicated (may have subtle differences)
- JSON parsing/serialization varies by framework
- Error handling is inconsistent across implementations
2. DEV SERVER CONCERNS
Some implementations may be running in development mode rather than production mode. I identified and fixed several cases, but others may remain:
- Flask/Django running without gunicorn (i.e., on the built-in development server)
- Node.js with debug flags
- PHP with development error reporting
If you notice an implementation performing unexpectedly poorly, this could be the cause.
3. LIMITED OPTIMIZATION
I intentionally kept the code simple and avoided framework-specific optimizations to maintain consistency. However, this means:
- Implementations may not represent the framework's full potential
- Production-grade code would likely perform differently
- Connection pooling, caching, and other optimizations were not used
Test Environment
Hardware:
- CPU: Intel Core i9-14900KS
- CPU Mode: POWER-SAVING MODE (because Intel's recent CPUs have stability issues at full power - yes, Intel sucks)
- This means performance numbers are artificially lower than what the hardware could achieve at full power
- WSL2 allocated memory: 120GB
- No memory limits were applied to any application
- Java thrives in environments with abundant memory (as you can see in the benchmarks) - results may differ significantly in memory-constrained environments like containers with low limits
Software Environment:
- Applications ran inside WSL2 (Windows Subsystem for Linux 2)
- k6 load testing ran from Windows host
- Tests used direct WSL2 IP address to minimize network overhead
- This setup was chosen to reduce interference between load generator and applications under test
Network Path:
Windows k6 → Hyper-V virtual switch → WSL2 VM → Application
This is NOT equivalent to:
- Native Linux performance
- Docker container performance
- Bare metal performance
- Cloud/VM performance
Compute Endpoint Limitations
The compute workload calculates square roots to a specified precision. There are known issues:
1. DOUBLE PRECISION LIMITATION
The implementation uses double-precision floating point, which has limited precision (~15-17 significant digits). This means:
- Very high precision targets may not be achievable
- The algorithm may terminate early or behave unexpectedly
- This could explain the dual grouping of data points visible in some benchmark charts
2. ALGORITHM QUALITY
The square root calculation algorithm was not optimized for numerical accuracy or performance. It's a simple iterative approach (sketched after this list) that may:
- Have varying iteration counts for similar inputs
- Produce inconsistent CPU load across implementations
- Not be the best representation of "compute-bound" workloads
3. CROSS-LANGUAGE INCONSISTENCY
Floating-point behavior varies by language and runtime:
- Java, Go, and Rust may produce different results for the same input
- Some languages may optimize the calculation differently
- JIT compilation may affect computation patterns over time
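To make the precision issue concrete, here is a minimal Java sketch of the kind of iterative square-root routine described above (Newton's method). It is NOT the exact code used by the compute endpoint; it only shows why a sufficiently small precision target collides with the ~15-17 significant digits of a double and forces early termination.

```java
// Minimal sketch (not the project's actual compute endpoint) of an iterative
// square root with a requested precision, illustrating the double-precision limit.
public final class SqrtSketch {

    // Newton's method; assumes value > 0.
    static double sqrt(double value, double targetPrecision) {
        double guess = value > 1 ? value / 2 : value;
        int iterations = 0;
        while (Math.abs(guess * guess - value) > targetPrecision) {
            double next = 0.5 * (guess + value / guess);
            if (next == guess) {
                // The update no longer changes the result: doubles cannot get
                // any closer, so an unreachable precision target forces an
                // early exit here instead of looping forever.
                break;
            }
            guess = next;
            iterations++;
        }
        System.out.printf("target=%g iterations=%d result=%.17f%n",
                targetPrecision, iterations, guess);
        return guess;
    }

    public static void main(String[] args) {
        sqrt(2.0, 1e-9);   // reachable target, converges normally
        sqrt(2.0, 1e-30);  // unreachable target, exits via the next == guess check
    }
}
```

Running the two calls above shows how an unreachable target changes the iteration count and the exit path, which is the kind of behavior that can produce distinct groupings in the charts.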
Benchmark Methodology
Workload Types:
- "fast": Minimal processing to test pure framework latency. Includes basic payload validation to prevent frameworks from over-optimizing it away.
- "slow": Fixed artificial delay (Thread.sleep or equivalent)
- "compute": CPU-bound square root calculation
- "sqrt_stable": Deterministic code path - the same predictable code runs every time, making it friendly to CPU branch prediction and JIT optimization
- "sqrt_unstable": Random branch selection - a random choice determines which code branch executes, but the overall computational complexity remains the same. This defeats branch prediction and may show different JIT behavior.
- "real": Attempts to mimic a realistic endpoint with both computation and wait time, without using any external resources (database, network calls) to avoid contaminating the benchmark with external factors.
Test Configuration:
Load tests (k6):
- Virtual Users (VUs): Ramped from 0 to target
- Ramp-up time: 5 minutes
- Sustained load: 1 minute at target VUs
- Warmup: JIT-compiled applications had a warmup period before measurement
Startup benchmarks:
- Each application was started 10 times
- 50 HTTP requests (curl) per start to measure latency
- Memory measured via /proc after startup (sketched below)
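For clarity on what "memory measured via /proc" means, here is a hedged Java sketch of the idea; the project's own measurement tooling may look different, and the class and method names are invented for the example. The approach is simply to read the VmRSS line from /proc/<pid>/status once the application is up.

```java
// Illustrative sketch: read the resident set size of a started application
// from /proc/<pid>/status.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class ProcMemory {

    // Returns the resident set size in kB for the given PID, or -1 if not found.
    static long rssKiloBytes(long pid) throws IOException {
        for (String line : Files.readAllLines(Path.of("/proc/" + pid + "/status"))) {
            if (line.startsWith("VmRSS:")) {
                // Line looks like: "VmRSS:    123456 kB"
                return Long.parseLong(line.replaceAll("[^0-9]", ""));
            }
        }
        return -1;
    }
}
```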
What These Benchmarks DO Measure:
- Relative performance differences between frameworks
- Throughput under sustained concurrent load
- Latency distribution patterns
- How frameworks handle thread/connection exhaustion
What These Benchmarks DO NOT Measure:
- Real-world application performance (no database, no external services)
- Cold start in serverless environments
- Memory efficiency for long-running applications
- Framework ergonomics or developer productivity
Recommendations
When interpreting results:
- Focus on PATTERNS, not absolute numbers
The test environment introduces overhead. Compare frameworks relative to each other, not against theoretical maximums.
- Consider the WORKLOAD type
Different workloads (IO-bound vs CPU-bound) favor different architectures. A framework's ranking may change significantly between workload types.
- Remember the BIAS
Java implementations received more attention and optimization. Non-Java results should be taken with additional skepticism.
- This is a LEARNING PROJECT
The primary goal was to understand and compare concurrency models, not to definitively rank frameworks for production use.
- DO YOUR OWN BENCHMARKS
For production decisions, benchmark with your actual workload, your actual hardware, and your actual deployment environment.
Attribution
- Project concept and Java implementations: Human (me)
- AI assistance: Claude (Anthropic)
- Load testing: k6 (Grafana Labs)
- Charting: Plotly.js
- No actual humans were harmed in the making of this benchmark. Probably.
License & Usage
This project is provided for educational and demonstration purposes. Feel free to use the results in presentations, but please:
- Link back to the original project
- Include a reference to this disclaimer
- Acknowledge the limitations described above