Skip to content

Choosing an Algorithm

Guide to selecting the right sorting algorithm for your use case.

Quick Decision Tree

Do you know your data characteristics?
├─ No -> Use smart()
└─ Yes
   ├─ Need stability?
   │  ├─ Yes
   │  │  ├─ Nearly sorted? -> tim()
   │  │  ├─ Large (>100k)? -> par_sort()
   │  │  └─ General -> merge()
   │  └─ No
   │     ├─ Small (<16)? -> insertion()
   │     ├─ Large (>100k)? -> par_sort_unstable()
   │     ├─ Integers only? -> radix()
   │     └─ General -> pdq() or intro()

By Use Case

General Purpose

Recommendation: smart()

result = ordr.smart(arr)

Automatically adapts to your data. Best default choice.


Maximum Performance (Unstable OK)

Recommendation: pdq()

result = ordr.pdq(arr)

Fastest general-purpose unstable sort. Pattern-defeating optimizations.


Stability Required

Recommendation: tim()

result = ordr.tim(arr)

Stable sort optimized for real-world data with natural runs.


Large Datasets

Recommendation: par_sort_unstable() or par_sort()

# Unstable (fastest)
result = ordr.par_sort_unstable(large_arr)

# Stable
result = ordr.par_sort(large_arr)

Leverages multiple cores for arrays > 10,000 elements.


Integer Arrays

Recommendation: radix()

result = ordr.radix(int_arr)

Linear time for integers when range is reasonable.


Nearly Sorted Data

Recommendation: tim()

result = ordr.tim(nearly_sorted)

Approaches O(n) for nearly sorted data.


Small Arrays

Recommendation: insertion()

result = ordr.insertion(small_arr)

Minimal overhead for arrays < 16 elements.


By Data Pattern

Pattern Best Algorithm Why
Random pdq() Pattern-defeating, fast partitioning
Sorted tim() Detects runs, O(n)
Reverse tim() or pdq() Tim detects reverse runs
Nearly sorted tim() Optimized for natural runs
Many duplicates tim() or radix() Stable handling or linear time
Large random par_sort_unstable() Parallel speedup
Integers radix() Linear time

By Requirements

Need Guaranteed O(n log n)?

Use intro() or heap():

result = ordr.intro(arr)  # Preferred
result = ordr.heap(arr)   # If O(1) space needed

Need Stability?

Use tim(), merge(), or par_sort():

result = ordr.tim(arr)      # Best for most cases
result = ordr.merge(arr)    # Classic stable sort
result = ordr.par_sort(arr) # Large stable datasets

Need O(1) Space?

Use heap() or insertion():

result = ordr.heap(arr)      # O(n log n)
result = ordr.insertion(arr) # O(n²) but good for small

Need Parallelism?

Use par_sort() or par_sort_unstable():

result = ordr.par_sort_unstable(arr)  # Faster
result = ordr.par_sort(arr)           # Stable

Performance Comparison

Performance on 1,000,000 random integers (measured with hyperfine, including Python startup):

Algorithm Time (ms) vs builtin
par_sort_unstable 462 1.46x faster
smart 466 1.44x faster
radix 502 1.34x faster
pdq 517 1.30x faster
builtin 673 baseline

Benchmarks measured on Windows with hyperfine 1.20, 3 runs each

Common Mistakes

Incorrect: Always using the same algorithm

# Don't do this for all cases
result = ordr.pdq(arr)

Different data patterns benefit from different algorithms.

Correct: Use smart() or choose based on data

# Better
result = ordr.smart(arr)

# Or choose explicitly
if is_nearly_sorted(arr):
    result = ordr.tim(arr)
else:
    result = ordr.pdq(arr)

Incorrect: Using slow algorithms for large data

# Don't do this for large arrays
result = ordr.bubble(large_arr)  # O(n²)!

Correct: Use efficient algorithms

# Better
result = ordr.par_sort_unstable(large_arr)

Incorrect: Ignoring stability requirements

# If you need stability, don't use unstable sorts
result = ordr.pdq(arr)  # Unstable!

Correct: Use stable sorts when needed

# Better
result = ordr.tim(arr)  # Stable

Summary

Default choice: smart() - adapts automatically

High performance: pdq() - fastest general-purpose

Stability: tim() - stable and fast

Large data: par_sort_unstable() - parallel speedup

Integers: radix() - linear time

Guaranteed: intro() - O(n log n) guaranteed