Choosing an Algorithm
Guide to selecting the right sorting algorithm for your use case.
Quick Decision Tree
Do you know your data characteristics?
├─ No -> Use smart()
└─ Yes
├─ Need stability?
│ ├─ Yes
│ │ ├─ Nearly sorted? -> tim()
│ │ ├─ Large (>100k)? -> par_sort()
│ │ └─ General -> merge()
│ └─ No
│ ├─ Small (<16)? -> insertion()
│ ├─ Large (>100k)? -> par_sort_unstable()
│ ├─ Integers only? -> radix()
│ └─ General -> pdq() or intro()
By Use Case
General Purpose
Recommendation: smart()
Automatically adapts to your data. Best default choice.
Maximum Performance (Unstable OK)
Recommendation: pdq()
Fastest general-purpose unstable sort. Pattern-defeating optimizations.
Stability Required
Recommendation: tim()
Stable sort optimized for real-world data with natural runs.
Large Datasets
Recommendation: par_sort_unstable() or par_sort()
# Unstable (fastest)
result = ordr.par_sort_unstable(large_arr)
# Stable
result = ordr.par_sort(large_arr)
Leverages multiple cores for arrays > 10,000 elements.
Integer Arrays
Recommendation: radix()
Linear time for integers when range is reasonable.
Nearly Sorted Data
Recommendation: tim()
Approaches O(n) for nearly sorted data.
Small Arrays
Recommendation: insertion()
Minimal overhead for arrays < 16 elements.
By Data Pattern
| Pattern | Best Algorithm | Why |
|---|---|---|
| Random | pdq() |
Pattern-defeating, fast partitioning |
| Sorted | tim() |
Detects runs, O(n) |
| Reverse | tim() or pdq() |
Tim detects reverse runs |
| Nearly sorted | tim() |
Optimized for natural runs |
| Many duplicates | tim() or radix() |
Stable handling or linear time |
| Large random | par_sort_unstable() |
Parallel speedup |
| Integers | radix() |
Linear time |
By Requirements
Need Guaranteed O(n log n)?
Use intro() or heap():
Need Stability?
Use tim(), merge(), or par_sort():
result = ordr.tim(arr) # Best for most cases
result = ordr.merge(arr) # Classic stable sort
result = ordr.par_sort(arr) # Large stable datasets
Need O(1) Space?
Use heap() or insertion():
Need Parallelism?
Use par_sort() or par_sort_unstable():
Performance Comparison
Performance on 1,000,000 random integers (measured with hyperfine, including Python startup):
| Algorithm | Time (ms) | vs builtin |
|---|---|---|
| par_sort_unstable | 462 | 1.46x faster |
| smart | 466 | 1.44x faster |
| radix | 502 | 1.34x faster |
| pdq | 517 | 1.30x faster |
| builtin | 673 | baseline |
Benchmarks measured on Windows with hyperfine 1.20, 3 runs each
Common Mistakes
Incorrect: Always using the same algorithm
Different data patterns benefit from different algorithms.
Correct: Use smart() or choose based on data
# Better
result = ordr.smart(arr)
# Or choose explicitly
if is_nearly_sorted(arr):
result = ordr.tim(arr)
else:
result = ordr.pdq(arr)
Incorrect: Using slow algorithms for large data
Correct: Use efficient algorithms
Incorrect: Ignoring stability requirements
Correct: Use stable sorts when needed
Summary
Default choice: smart() - adapts automatically
High performance: pdq() - fastest general-purpose
Stability: tim() - stable and fast
Large data: par_sort_unstable() - parallel speedup
Integers: radix() - linear time
Guaranteed: intro() - O(n log n) guaranteed