Scaling Under Load: Our API's Performance Journey
In our ongoing commitment to engineering excellence, we're excited to share the latest results from our comprehensive API performance testing. What makes these results particularly impressive is that they were achieved on relatively modest infrastructure: a virtual machine with just 4 CPU cores and 12GB of RAM, running the API as a Docker container.
Infrastructure Context
Before diving into the performance metrics, it's important to understand the infrastructure that powered these tests:
- Compute Resources: 4 CPU cores, 12GB RAM virtual machine
- Deployment Environment: Containerized using Docker
- Test Duration: Full 10-minute sustained load test
- Virtual Users: Up to 500 concurrent connections
This modest footprint highlights our focus on optimization and efficiency rather than simply throwing hardware at performance challenges.
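The metric excerpts quoted below follow k6's output format. For readers who want to reproduce a comparable setup, the load profile above maps to a k6 script roughly like the following. This is a minimal illustrative sketch, not our actual test script, and the endpoint is a placeholder:

```typescript
// Minimal k6 sketch of the load profile described above: 500 virtual
// users sustained for 10 minutes. Illustrative only; the endpoint is
// a placeholder, not our actual API.
import http from "k6/http";

export const options = {
  vus: 500,        // up to 500 concurrent virtual users
  duration: "10m", // full 10-minute sustained window
};

export default function () {
  http.get("https://api.example.com/health"); // hypothetical endpoint
}
```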
Performance Results: Remarkable Efficiency
http_req_duration: avg=1.34ms min=104µs med=1.07ms max=54.18ms p(90)=2.62ms p(95)=3.34ms
These response times are particularly notable given our limited resources:
- Average response time of 1.34ms: Sub-2ms performance on just 4 CPU cores
- Median (p50) of 1.07ms: Typical requests completing in approximately 1 millisecond
- 95th percentile at 3.34ms: All but the slowest 5% of requests complete in under 4ms
- Maximum response of 54.18ms: Even the single worst request finished in well under a tenth of a second
Achieving these numbers on our modest infrastructure demonstrates the effectiveness of our optimization efforts and architectural decisions.
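As an aside, if you want to hold a service to latency targets like these automatically, k6 lets you encode them as thresholds that fail the run when breached. A minimal sketch follows; the limits shown are illustrative examples, not our actual gates:

```typescript
// Sketch: encoding latency targets as k6 thresholds. The run fails
// automatically if either expression is violated. Limits are
// illustrative, not our production SLOs.
import http from "k6/http";

export const options = {
  thresholds: {
    http_req_duration: ["avg<2", "p(95)<5"], // milliseconds
  },
};

export default function () {
  http.get("https://api.example.com/health"); // hypothetical endpoint
}
```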
Impressive Throughput with Limited Resources
During our 10-minute test window, the system processed:
http_reqs: 16,775,780 (27,959.23/s)
This translates to:
- 16.8 million total requests successfully processed
- ~28,000 requests per second sustained for 10 minutes
- ~7,000 requests per second per CPU core
- 100% success rate across all requests
Processing nearly 28,000 requests per second on just 4 CPU cores represents exceptional efficiency: on average, each core handled roughly 7,000 requests per second.
Resource Utilization
The system maintained efficient resource usage throughout the test:
data_received: 3.2 GB (5.4 MB/s)
data_sent: 6.9 GB (12 MB/s)
This efficient network utilization, coupled with the ability to run within an 8GB RAM ceiling (comfortably under the VM's 12GB), demonstrates our codebase's optimization for both memory and I/O operations.
Perfect Reliability on Modest Hardware
Despite the limited resources, our reliability remained perfect:
http_req_failed: 0.00% (0 out of 16,775,780)
successful_requests: 100.00% (16,775,780 out of 16,775,780)
Every one of the 16.8 million requests completed successfully, with all validation checks passing:
checks_succeeded: 100.00% (50,327,340 out of 50,327,340)
This demonstrates that high reliability doesn't necessarily require excessive hardware resources when the software is properly optimized.
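Note that the checks figure is exactly three times the request count, i.e. three validations per request. We haven't reproduced our test script here, but in k6 that ratio typically comes from a single check() call with three conditions, along these lines (the endpoint and assertions are hypothetical):

```typescript
// Sketch: three validations per request, matching the 3:1 ratio of
// checks to requests above. Endpoint and assertions are hypothetical.
import http from "k6/http";
import { check } from "k6";

export default function () {
  const res = http.get("https://api.example.com/items");
  check(res, {
    "status is 200": (r) => r.status === 200,
    "latency under 50ms": (r) => r.timings.duration < 50,
    "body has expected field": (r) => r.json("ok") === true,
  });
}
```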
Technical Implementation: Doing More With Less
For fellow engineers interested in our approach to resource efficiency, our implementation includes:
- Event-driven architecture: Maximizing CPU utilization through non-blocking I/O (see the sketch after this list)
- Efficient connection management: Minimizing memory overhead per connection
- Optimized containerization: Docker configuration tuned for performance
- Intelligent resource allocation: Prioritizing critical paths and operations
- Careful memory management: Minimizing garbage collection overhead
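This post doesn't spell out our runtime, but to make the first two bullets concrete, here is a minimal sketch of an event-driven, non-blocking HTTP handler in a Node.js-style runtime. Node is an assumption for illustration only; this is not our production code:

```typescript
// Sketch: an event-driven, non-blocking HTTP server. Node.js assumed
// purely for illustration; this is not our production code.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  // The handler never blocks: the event loop multiplexes every open
  // connection on a single thread, so a process per core can keep all
  // four cores saturated with I/O work.
  res.writeHead(200, { "content-type": "application/json" });
  res.end(JSON.stringify({ ok: true }));
});

// Reuse TCP connections between requests so 500 concurrent users don't
// pay a fresh handshake per request; per-connection overhead stays low.
server.keepAliveTimeout = 60_000;

server.listen(8080);
```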
The test simulated 500 virtual users over a full 10-minute window on this modest infrastructure:
vus_max: 500 min=500 max=500
running (10m00.0s), 000/500 VUs, 16,775,780 complete and 0 interrupted iterations
Cost-Efficiency Implications
These results have significant implications for infrastructure cost optimization:
- Lower infrastructure requirements: Delivering excellent performance without expensive hardware
- Reduced cloud costs: Handling high throughput on smaller instance types
- Better scaling economics: Efficient resource usage lets costs scale roughly linearly with load
- Energy efficiency: Accomplishing more with fewer computing resources
Opportunities for Further Optimization
While we're proud of these results, we've identified specific areas for continued improvement:
- Scaling Beyond Current Hardware: While our current performance is excellent for this hardware profile, we're exploring architectural improvements that would allow us to efficiently utilize additional resources when available.
- Request Rate Consistency:
peak_rps: avg=85.92 min=1 med=86 max=95 p(90)=91 p(95)=92
The min=1 value reflects occasional, brief throughput dips that we're addressing in our next iteration.
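peak_rps isn't a built-in k6 metric, so it's presumably recorded as a custom metric in the test script. One way to capture a per-second rate as a k6 Trend looks roughly like this; the windowing logic is illustrative and tracked per VU:

```typescript
// Sketch: one way to record a requests-per-second figure as a custom
// k6 Trend metric. Windowing is approximate and tracked per VU, since
// module-level state in k6 is per-VU.
import http from "k6/http";
import { Trend } from "k6/metrics";

const peakRps = new Trend("peak_rps");

let count = 0;
let windowStart = Date.now();

export default function () {
  http.get("https://api.example.com/health"); // hypothetical endpoint
  count++;

  const elapsed = Date.now() - windowStart;
  if (elapsed >= 1000) {
    // Emit requests completed over the last ~1s window, then reset.
    peakRps.add(count / (elapsed / 1000));
    count = 0;
    windowStart = Date.now();
  }
}
```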
Conclusion: Efficiency Through Engineering
These results demonstrate that exceptional API performance doesn't necessarily require excessive computing resources. Through careful architectural decisions, optimization efforts, and performance-focused development practices, we've achieved:
- Millisecond-level response times on modest hardware
- Nearly 28,000 requests per second on just 4 CPU cores
- Perfect reliability across 16.8 million requests
- Efficient resource utilization within 8GB RAM constraints
This efficiency-first approach not only reduces infrastructure costs but also shrinks our environmental footprint and allows us to scale more economically as demand grows.
As we continue refining our systems, we remain committed to maximizing performance per computing resource rather than simply relying on hardware scaling to solve performance challenges.
What optimization techniques have you found most effective for improving performance on limited hardware? Share your experiences in the comments below.