Scaling Under Load: Our API's Performance Journey
In our ongoing commitment to engineering excellence, we're excited to share the latest results from our comprehensive API performance testing. What makes these results particularly impressive is that they were achieved on relatively modest infrastructure: a virtual machine with just 4 CPU cores and 12GB of RAM, running the API as a Docker container.
Infrastructure Context
Before diving into the performance metrics, it's important to understand the infrastructure that powered these tests:
- Compute Resources: 4 CPU cores, 12GB RAM virtual machine
- Deployment Environment: Containerized using Docker
- Test Duration: Full 10-minute sustained load test
- Virtual Users: Up to 500 concurrent connections
This modest footprint highlights our focus on optimization and efficiency rather than simply throwing hardware at performance challenges.
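The metric excerpts quoted below follow k6's output format. For readers who want to reproduce a comparable setup, the load profile above maps to a k6 script roughly like the following. This is a minimal illustrative sketch, not our actual test script, and the endpoint is a placeholder:

```typescript
// Minimal k6 sketch of the load profile described above: 500 virtual
// users sustained for 10 minutes. Illustrative only; the endpoint is
// a placeholder, not our actual API.
import http from "k6/http";

export const options = {
  vus: 500,        // up to 500 concurrent virtual users
  duration: "10m", // full 10-minute sustained window
};

export default function () {
  http.get("https://api.example.com/health"); // hypothetical endpoint
}
```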
Performance Results: Remarkable Efficiency
http_req_duration: avg=1.34ms min=104µs med=1.07ms max=54.18ms p(90)=2.62ms p(95)=3.34ms
These response times are particularly notable given our limited resources:
- Average response time of 1.34ms: Sub-2ms performance on just 4 CPU cores
- Median (p50) of 1.07ms: Typical requests completing in approximately 1 millisecond
- 95th percentile at 3.34ms: All but the slowest 5% of requests complete in under 4ms
- Maximum response of 54.18ms: Even the single worst request finished in well under a tenth of a second
Achieving these numbers on our modest infrastructure demonstrates the effectiveness of our optimization efforts and architectural decisions.
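As an aside, if you want to hold a service to latency targets like these automatically, k6 lets you encode them as thresholds that fail the run when breached. A minimal sketch follows; the limits shown are illustrative examples, not our actual gates:

```typescript
// Sketch: encoding latency targets as k6 thresholds. The run fails
// automatically if either expression is violated. Limits are
// illustrative, not our production SLOs.
import http from "k6/http";

export const options = {
  thresholds: {
    http_req_duration: ["avg<2", "p(95)<5"], // milliseconds
  },
};

export default function () {
  http.get("https://api.example.com/health"); // hypothetical endpoint
}
```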
Impressive Throughput with Limited Resources
During our 10-minute test window, the system processed:
http_reqs: 16,775,780 (27,959.23/s)
This translates to:
- 16.8 million total requests successfully processed
- ~28,000 requests per second sustained for 10 minutes
- ~7,000 requests per second per CPU core
- 100% success rate across all requests
Processing nearly 28,000 requests per second on just 4 CPU cores represents exceptional efficiency: on average, each core handled roughly 7,000 requests per second.
Resource Utilization
The system maintained efficient resource usage throughout the test:
data_received: 3.2 GB (5.4 MB/s)
data_sent: 6.9 GB (12 MB/s)
This efficient network utilization, coupled with the ability to run within an 8GB RAM ceiling (comfortably under the VM's 12GB), demonstrates our codebase's optimization for both memory and I/O operations.
Perfect Reliability on Modest Hardware
Despite the limited resources, our reliability remained perfect:
http_req_failed: 0.00% (0 out of 16,775,780)
successful_requests: 100.00% (16,775,780 out of 16,775,780)
Every one of the 16.8 million requests completed successfully, with all validation checks passing:
checks_succeeded: 100.00% (50,327,340 out of 50,327,340)
This demonstrates that high reliability doesn't necessarily require excessive hardware resources when the software is properly optimized.
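Note that the checks figure is exactly three times the request count, i.e. three validations per request. We haven't reproduced our test script here, but in k6 that ratio typically comes from a single check() call with three conditions, along these lines (the endpoint and assertions are hypothetical):

```typescript
// Sketch: three validations per request, matching the 3:1 ratio of
// checks to requests above. Endpoint and assertions are hypothetical.
import http from "k6/http";
import { check } from "k6";

export default function () {
  const res = http.get("https://api.example.com/items");
  check(res, {
    "status is 200": (r) => r.status === 200,
    "latency under 50ms": (r) => r.timings.duration < 50,
    "body has expected field": (r) => r.json("ok") === true,
  });
}
```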
Technical Implementation: Doing More With Less
For fellow engineers interested in our approach to resource efficiency, our implementation includes:
- Event-driven architecture: Maximizing CPU utilization through non-blocking I/O (see the sketch after this list)
- Efficient connection management: Minimizing memory overhead per connection
- Optimized containerization: Docker configuration tuned for performance
- Intelligent resource allocation: Prioritizing critical paths and operations
- Careful memory management: Minimizing garbage collection overhead
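This post doesn't spell out our runtime, but to make the first two bullets concrete, here is a minimal sketch of an event-driven, non-blocking HTTP handler in a Node.js-style runtime. Node is an assumption for illustration only; this is not our production code:

```typescript
// Sketch: an event-driven, non-blocking HTTP server. Node.js assumed
// purely for illustration; this is not our production code.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  // The handler never blocks: the event loop multiplexes every open
  // connection on a single thread, so a process per core can keep all
  // four cores saturated with I/O work.
  res.writeHead(200, { "content-type": "application/json" });
  res.end(JSON.stringify({ ok: true }));
});

// Reuse TCP connections between requests so 500 concurrent users don't
// pay a fresh handshake per request; per-connection overhead stays low.
server.keepAliveTimeout = 60_000;

server.listen(8080);
```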
The test simulated 500 virtual users over a full 10-minute window on this modest infrastructure:
vus_max: 500 min=500 max=500
running (10m00.0s), 000/500 VUs, 16,775,780 complete and 0 interrupted iterations
Cost-Efficiency Implications
These results have significant implications for infrastructure cost optimization:
- Lower infrastructure requirements: Delivering excellent performance without expensive hardware
- Reduced cloud costs: Handling high throughput on smaller instance types
- Better scaling economics: Efficient resource usage lets costs scale roughly linearly with load
- Energy efficiency: Accomplishing more with fewer computing resources
Opportunities for Further Optimization
While we're proud of these results, we've identified specific areas for continued improvement:
- Scaling Beyond Current Hardware: While our current performance is excellent for this hardware profile, we're exploring architectural improvements that would allow us to efficiently utilize additional resources when available.
- Request Rate Consistency:
peak_rps: avg=85.92 min=1 med=86 max=95 p(90)=91 p(95)=92
The min=1 value reflects occasional, brief throughput dips that we're addressing in our next iteration.
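peak_rps isn't a built-in k6 metric, so it's presumably recorded as a custom metric in the test script. One way to capture a per-second rate as a k6 Trend looks roughly like this; the windowing logic is illustrative and tracked per VU:

```typescript
// Sketch: one way to record a requests-per-second figure as a custom
// k6 Trend metric. Windowing is approximate and tracked per VU, since
// module-level state in k6 is per-VU.
import http from "k6/http";
import { Trend } from "k6/metrics";

const peakRps = new Trend("peak_rps");

let count = 0;
let windowStart = Date.now();

export default function () {
  http.get("https://api.example.com/health"); // hypothetical endpoint
  count++;

  const elapsed = Date.now() - windowStart;
  if (elapsed >= 1000) {
    // Emit requests completed over the last ~1s window, then reset.
    peakRps.add(count / (elapsed / 1000));
    count = 0;
    windowStart = Date.now();
  }
}
```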
Conclusion: Efficiency Through Engineering
These results demonstrate that exceptional API performance doesn't necessarily require excessive computing resources. Through careful architectural decisions, optimization efforts, and performance-focused development practices, we've achieved:
- Millisecond-level response times on modest hardware
- Nearly 28,000 requests per second on just 4 CPU cores
- Perfect reliability across 16.8 million requests
- Efficient resource utilization within 8GB RAM constraints
This efficiency-first approach not only reduces infrastructure costs but also shrinks our environmental footprint and allows us to scale more economically as demand grows.
As we continue refining our systems, we remain committed to maximizing performance per computing resource rather than simply relying on hardware scaling to solve performance challenges.
What optimization techniques have you found most effective for improving performance on limited hardware? Share your experiences in the comments below.