Performance Benchmarks
Rayforce-Py closely matches native Rayforce and, across the six group-by queries below, runs 5.39x to 10.65x faster than Pandas and 1.48x to 5.28x faster than Polars. Our benchmarks are based on the H2O.ai Group By Benchmark standard.
Test environment: Apple M4 (macOS), 32 GB RAM; 1M rows, 100 groups, median of 50 runs after 20 warmup runs
Methodology
- Dataset: 1,000,000 rows, 6 columns (id1, id2, id3, v1, v2, v3)
- Timing: Median of 50 runs
- Warmup: 20 runs per query to warm caches
- Data: Deterministic (seed=42) for reproducibility
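As an illustration of the "deterministic, seed=42" point, the sketch below builds a six-column dataset with the standard library only. It is not the generator used by the benchmark suite: the function name `make_dataset`, the integer group keys, the float value columns, and the use of 100 groups for every id column are all assumptions made for the example.

```python
import random

COLUMNS = ("id1", "id2", "id3", "v1", "v2", "v3")


def make_dataset(n_rows=1_000_000, n_groups=100, seed=42):
    """Return a column-oriented dict with the six benchmark column names.

    Assumption: id columns are integer keys in [0, n_groups) and value
    columns are floats in [0, 1). The real suite may use other types.
    """
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    data = {}
    for name in COLUMNS:
        if name.startswith("id"):
            data[name] = [rng.randrange(n_groups) for _ in range(n_rows)]
        else:
            data[name] = [rng.random() for _ in range(n_rows)]
    return data


# A small sample keeps the example fast; the same seed gives the same data.
small = make_dataset(n_rows=1_000)
assert small == make_dataset(n_rows=1_000)
print({name: col[:3] for name, col in small.items()})
```

Fixing the seed means every implementation under test sees identical input, so timing differences cannot come from the data.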
Q1: Group by id1, sum v1
| Implementation | Time (μs) | vs Native | vs Pandas | vs Polars |
|---|---|---|---|---|
| Rayforce-Py | 611 | 1.00x | 5.82x | 1.90x |
| Native Rayforce | 612 | 1.00x | 5.81x | 1.90x |
| Polars | 1,162 | 1.90x | 3.06x | 1.00x |
| Pandas | 3,556 | 5.81x | 1.00x | 0.33x |
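The ratio columns appear to follow two conventions, judging from the published numbers: "vs Native" is the row's time divided by the native time (higher means slower), while "vs Pandas" and "vs Polars" are the baseline's time divided by the row's time (higher means faster). The sketch below recomputes three Q1 cells from the median times; the helper names are hypothetical.

```python
# Median times in microseconds, copied from the Q1 table.
times_us = {
    "Rayforce-Py": 611,
    "Native Rayforce": 612,
    "Polars": 1162,
    "Pandas": 3556,
}


def relative_time(impl, baseline):
    """Time of impl relative to baseline (1.00x = equal, higher = slower)."""
    return times_us[impl] / times_us[baseline]


def speedup(impl, baseline):
    """How many times faster impl is than baseline (higher = faster)."""
    return times_us[baseline] / times_us[impl]


print(f"{relative_time('Polars', 'Native Rayforce'):.2f}x")  # 1.90x (Polars row, vs Native)
print(f"{speedup('Rayforce-Py', 'Pandas'):.2f}x")            # 5.82x (Rayforce-Py row, vs Pandas)
print(f"{speedup('Pandas', 'Polars'):.2f}x")                 # 0.33x (Pandas row, vs Polars)
```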
Q2: Group by id1, id2, sum v1
| Implementation | Time (μs) | vs Native | vs Pandas | vs Polars |
|---|---|---|---|---|
| Rayforce-Py | 1,279 | 0.99x | 10.65x | 5.28x |
| Native Rayforce | 1,290 | 1.00x | 10.57x | 5.23x |
| Polars | 6,753 | 5.23x | 2.02x | 1.00x |
| Pandas | 13,631 | 10.57x | 1.00x | 0.50x |
Performance Insight
The multi-column group-by (Q2) shows the largest advantage in the suite: Rayforce-Py is 10.65x faster than Pandas and 5.28x faster than Polars.
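To make the difference between Q1 and Q2 concrete, here is a plain-Python reference for what a single-column and a multi-column group-by sum compute. It uses neither Rayforce-Py nor any other benchmarked library; `group_sum` and the sample rows are hypothetical and exist only to show the semantics.

```python
from collections import defaultdict


def group_sum(rows, key_cols, value_col):
    """Sum value_col per distinct combination of key_cols."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in key_cols)  # composite key for multi-column grouping
        totals[key] += row[value_col]
    return dict(totals)


rows = [
    {"id1": "a", "id2": "x", "v1": 1},
    {"id1": "a", "id2": "y", "v1": 2},
    {"id1": "a", "id2": "x", "v1": 3},
    {"id1": "b", "id2": "x", "v1": 4},
]

print(group_sum(rows, ["id1"], "v1"))         # {('a',): 6, ('b',): 4}
print(group_sum(rows, ["id1", "id2"], "v1"))  # {('a', 'x'): 4, ('a', 'y'): 2, ('b', 'x'): 4}
```

Grouping on two columns keys each row by a pair of values, so the number of distinct groups can grow up to the product of the per-column group counts.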
Q3: Group by id3, sum v1, avg v3
| Implementation | Time (μs) | vs Native | vs Pandas | vs Polars |
|---|---|---|---|---|
| Rayforce-Py | 829 | 1.00x | 5.85x | 1.63x |
| Native Rayforce | 828 | 1.00x | 5.85x | 1.63x |
| Polars | 1,352 | 1.63x | 3.58x | 1.00x |
| Pandas | 4,846 | 5.85x | 1.00x | 0.28x |
Q4: Group by id3, avg v1, v2, v3
| Implementation | Time (μs) | vs Native | vs Pandas | vs Polars |
|---|---|---|---|---|
| Rayforce-Py | 1,044 | 1.00x | 5.96x | 1.52x |
| Native Rayforce | 1,045 | 1.00x | 5.95x | 1.52x |
| Polars | 1,584 | 1.52x | 3.92x | 1.00x |
| Pandas | 6,216 | 5.95x | 1.00x | 0.25x |
Q5: Group by id3, sum v1, v2, v3
| Implementation | Time (μs) | vs Native | vs Pandas | vs Polars |
|---|---|---|---|---|
| Rayforce-Py | 1,049 | 1.01x | 6.55x | 1.48x |
| Native Rayforce | 1,043 | 1.00x | 6.60x | 1.49x |
| Polars | 1,549 | 1.49x | 4.44x | 1.00x |
| Pandas | 6,879 | 6.60x | 1.00x | 0.23x |
Multiple Aggregations
With three sum aggregations in one query, Q5 shows Rayforce-Py running 6.55x faster than Pandas and 1.48x faster than Polars. This is the second-largest margin over Pandas in the suite (after Q2) and the smallest margin over Polars.
Q6: Group by id3, max(v1) - min(v2)
| Implementation | Time (μs) | vs Native | vs Pandas | vs Polars |
|---|---|---|---|---|
| Rayforce-Py | 859 | 1.02x | 5.39x | 3.86x |
| Native Rayforce | 846 | 1.00x | 5.47x | 3.92x |
| Polars | 3,316 | 3.92x | 1.40x | 1.00x |
| Pandas | 4,627 | 5.47x | 1.00x | 0.72x |
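Q6 differs from the other queries in that it combines two aggregates into a single expression per group. The plain-Python reference below shows the intended result for a tiny input; it is a sketch of the semantics only, with a hypothetical function name and sample data, not how any of the benchmarked libraries evaluates the query.

```python
def max_v1_minus_min_v2(rows):
    """For each id3 group, return max(v1) - min(v2)."""
    stats = {}  # id3 -> [running max of v1, running min of v2]
    for row in rows:
        entry = stats.get(row["id3"])
        if entry is None:
            stats[row["id3"]] = [row["v1"], row["v2"]]
        else:
            entry[0] = max(entry[0], row["v1"])
            entry[1] = min(entry[1], row["v2"])
    return {key: hi - lo for key, (hi, lo) in stats.items()}


rows = [
    {"id3": 1, "v1": 5, "v2": 2},
    {"id3": 1, "v1": 9, "v2": 4},
    {"id3": 2, "v1": 3, "v2": 7},
]

print(max_v1_minus_min_v2(rows))  # {1: 7, 2: -4}
```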
Summary
| Query | Rayforce-Py vs Native | Rayforce-Py vs Pandas | Rayforce-Py vs Polars |
|---|---|---|---|
| Q1 | 1.00x | 5.82x | 1.90x |
| Q2 | 0.99x | 10.65x | 5.28x |
| Q3 | 1.00x | 5.85x | 1.63x |
| Q4 | 1.00x | 5.96x | 1.52x |
| Q5 | 1.01x | 6.55x | 1.48x |
| Q6 | 1.02x | 5.39x | 3.86x |
| Average | 1.00x | 6.70x | 2.61x |
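The "Average" row is consistent with the arithmetic mean of the six per-query ratios, which the sketch below recomputes. It also prints the geometric mean as a comparison, since Q2 is a clear outlier and pulls the arithmetic mean upward; the geometric mean is an addition for illustration and is not reported by the benchmark suite.

```python
from statistics import geometric_mean, mean

# Per-query ratios for Rayforce-Py, copied from the summary table (Q1..Q6).
ratios = {
    "vs Native": [1.00, 0.99, 1.00, 1.00, 1.01, 1.02],
    "vs Pandas": [5.82, 10.65, 5.85, 5.96, 6.55, 5.39],
    "vs Polars": [1.90, 5.28, 1.63, 1.52, 1.48, 3.86],
}

averages = {name: mean(values) for name, values in ratios.items()}
geo_means = {name: geometric_mean(values) for name, values in ratios.items()}

for name in ratios:
    # Arithmetic means print as 1.00x, 6.70x, 2.61x, matching the table.
    print(f"{name}: mean {averages[name]:.2f}x, geometric mean {geo_means[name]:.2f}x")
```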
Performance Analysis
Rayforce-Py adds almost no overhead compared to native Rayforce: across the six queries its time stays within 2% of native (0.99x to 1.02x), which shows how thin the Python bindings are. On average, Rayforce-Py is 6.70x faster than Pandas and 2.61x faster than Polars.
Note: In Q1, Q2, and Q4 Rayforce-Py measures marginally faster than native Rayforce. This is due to a difference in measurement methodology, not to the bindings: native Rayforce timings include memory deallocation, while Rayforce-Py timings exclude it. In practice the difference is within measurement noise.
Running Your Own Benchmarks
You can run the benchmarks yourself using the provided benchmark suite:
# Default (15 runs, 5 warmup)
make benchmarkdb
# Custom configuration
make benchmarkdb ARGS="--runs 20 --warmup 5"
For Accurate Results
- Use at least 15-20 runs so that the median is stable
- Ensure your system is idle to minimize interference
- Results report the median (more robust to outliers than the mean) along with the standard deviation
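The warmup-then-median approach described above can be sketched with the standard library. This is not the code behind `make benchmarkdb`; the `benchmark` function is hypothetical, and its defaults simply mirror the documented defaults of 15 runs and 5 warmup runs.

```python
import statistics
import time


def benchmark(fn, runs=15, warmup=5):
    """Time fn and return (median, standard deviation) in microseconds."""
    for _ in range(warmup):
        fn()  # warmup calls are executed but not timed

    samples_us = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_us.append((time.perf_counter() - start) * 1e6)

    # stdev needs at least two samples, so runs must be >= 2.
    return statistics.median(samples_us), statistics.stdev(samples_us)


median_us, stdev_us = benchmark(lambda: sum(range(10_000)))
print(f"median {median_us:.1f} us, stdev {stdev_us:.1f} us")
```

A large standard deviation relative to the median is a sign that background activity is interfering and the run should be repeated on an idle system.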
Learn More
- Getting Started Guide - Learn how to use Rayforce-Py
- Query Guide - Explore query capabilities