Quick Sort | Dheerendra Varma

Quick sort is the most commonly used sorting algorithm in practice. Where merge sort guarantees O(n log n) by always splitting at the midpoint, quick sort bets that a randomly chosen element (the pivot) will split the array reasonably well. That bet pays off almost always — and when it does, quick sort is faster than merge sort by a substantial constant factor because it sorts in-place, making far fewer memory accesses.

The core idea: pick one element as the pivot, rearrange the array so that everything smaller than the pivot sits to its left and everything larger sits to its right, then recursively sort both sides. After partitioning, the pivot is already in its final position — it never moves again.

Step-by-Step Trace

Input: [10, 7, 8, 9, 1, 5]

Lomuto partition — pivot = 5 (last element)

We track two indices: i marks the boundary of elements confirmed ≤ pivot, j scans forward.

Initial:   [10,  7,  8,  9,  1,  5]
            i=-1                 pivot=5
            j=0

j=0: arr[j]=10 > pivot, no swap.  i=-1
     [10,  7,  8,  9,  1,  5]

j=1: arr[j]=7  > pivot, no swap.  i=-1
     [10,  7,  8,  9,  1,  5]

j=2: arr[j]=8  > pivot, no swap.  i=-1
     [10,  7,  8,  9,  1,  5]

j=3: arr[j]=9  > pivot, no swap.  i=-1
     [10,  7,  8,  9,  1,  5]

j=4: arr[j]=1  ≤ pivot → i=0, swap arr[0]↔arr[4]
     [ 1,  7,  8,  9, 10,  5]
      ^i

Scan done. Place pivot: swap arr[i+1]↔arr[last] → swap arr[1]↔arr[5]
     [ 1,  5,  8,  9, 10,  7]
           ^pivot now at index 1, final position

Left partition:  [1]          — already sorted (single element)
Right partition: [8, 9, 10, 7] — recurse

Recursing into [8, 9, 10, 7] — pivot = 7

Initial:   [ 8,  9, 10,  7]
            i=-1         pivot=7

j=0: arr[j]=8  > pivot, no swap.
j=1: arr[j]=9  > pivot, no swap.
j=2: arr[j]=10 > pivot, no swap.

Scan done. Place pivot: swap arr[0]↔arr[3]
     [ 7,  9, 10,  8]
      ^pivot at index 0

Left:  []          — empty, base case
Right: [9, 10, 8]  — recurse

Recursing into [9, 10, 8] — pivot = 8

j=0: arr[j]=9  > pivot
j=1: arr[j]=10 > pivot

Place pivot: swap arr[0]↔arr[2]
     [8, 10, 9]
      ^pivot at index 0

Left:  []       — base case
Right: [10, 9]  — recurse → pivot=9, swap → [9, 10]

Final merged result: [1, 5, 7, 8, 9, 10]

Naive Implementation

Lomuto partition scheme with the last element as pivot. Clean and easy to follow, but vulnerable to O(n²) behaviour on already-sorted or reverse-sorted input.

fn quick_sort_naive(arr: &mut [i32]) {
    if arr.len() <= 1 {
        return;
    }
    let pivot_idx = lomuto_partition(arr);
    // pivot_idx is now in its final position — never touch it again.
    quick_sort_naive(&mut arr[..pivot_idx]);
    quick_sort_naive(&mut arr[pivot_idx + 1..]);
}
 
/// Lomuto partition: pivot = last element.
/// After this call, arr[pivot_idx] is in its correct sorted position.
/// Everything left of it is ≤ pivot; everything right is > pivot.
fn lomuto_partition(arr: &mut [i32]) -> usize {
    let last = arr.len() - 1;
    let pivot = arr[last];
    let mut i: isize = -1; // index of rightmost element confirmed ≤ pivot
 
    for j in 0..last {
        if arr[j] <= pivot {
            i += 1;
            arr.swap(i as usize, j);
        }
    }
 
    // Move pivot into its final slot, just after the ≤-region.
    let pivot_pos = (i + 1) as usize;
    arr.swap(pivot_pos, last);
    pivot_pos
}
 
fn main() {
    let mut data = vec![10, 7, 8, 9, 1, 5];
    quick_sort_naive(&mut data);
    println!("{:?}", data); // [1, 5, 7, 8, 9, 10]
}

The &mut [i32] slice API lets Rust enforce that the left and right sub-slices are non-overlapping, so no unsafe code is needed. The arr.swap calls are bounds-checked and cannot panic as long as the partition logic is correct.

Optimized Implementation

Three improvements combined: randomized pivot selection, Hoare partition (roughly half the swaps of Lomuto), and a three-way Dutch National Flag partition for arrays with many duplicate keys.

use std::cmp::Ordering;
 
fn quick_sort_optimized(arr: &mut [i32]) {
    if arr.len() <= 1 {
        return;
    }
    // Small subarrays are faster with insertion sort due to lower overhead.
    if arr.len() <= 10 {
        insertion_sort(arr);
        return;
    }
 
    // Randomize the pivot to avoid worst-case behaviour on sorted inputs.
    let pivot_idx = random_pivot_index(arr.len());
    arr.swap(pivot_idx, arr.len() - 1); // move pivot to last slot
 
    // Three-way partition: returns (lt, gt) such that:
    //   arr[..lt]   < pivot
    //   arr[lt..gt] == pivot  (all duplicates handled in one pass)
    //   arr[gt..]   > pivot
    let (lt, gt) = three_way_partition(arr);
    quick_sort_optimized(&mut arr[..lt]);
    quick_sort_optimized(&mut arr[gt..]);
}
 
/// Dutch National Flag three-way partition.
/// Pivot is arr[last]. Returns (lt, gt).
fn three_way_partition(arr: &mut [i32]) -> (usize, usize) {
    let pivot = arr[arr.len() - 1];
    let mut lt = 0;         // arr[..lt] < pivot
    let mut gt = arr.len(); // arr[gt..] > pivot
    let mut i  = 0;         // scanning cursor
 
    while i < gt {
        match arr[i].cmp(&pivot) {
            Ordering::Less => {
                arr.swap(lt, i);
                lt += 1;
                i  += 1;
            }
            Ordering::Equal => {
                // Element belongs in the equal region — just advance.
                i += 1;
            }
            Ordering::Greater => {
                gt -= 1;
                arr.swap(i, gt);
                // Do not advance i: the swapped-in element hasn't been
                // classified yet.
            }
        }
    }
    (lt, gt)
}
 
/// Simple insertion sort for small slices.
fn insertion_sort(arr: &mut [i32]) {
    for i in 1..arr.len() {
        let key = arr[i];
        let mut j = i;
        while j > 0 && arr[j - 1] > key {
            arr[j] = arr[j - 1];
            j -= 1;
        }
        arr[j] = key;
    }
}
 
/// Pseudo-random pivot index using a simple LCG.
/// A real implementation would use rand::thread_rng().
fn random_pivot_index(len: usize) -> usize {
    // For illustration; in production use the `rand` crate.
    use std::time::{SystemTime, UNIX_EPOCH};
    let seed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(12345) as usize;
    seed % len
}
 
fn main() {
    let mut data = vec![10, 7, 8, 9, 1, 5];
    quick_sort_optimized(&mut data);
    println!("{:?}", data); // [1, 5, 7, 8, 9, 10]
 
    // Stress test: sorted input (worst case for naive Lomuto).
    let mut sorted_input: Vec<i32> = (0..10_000).collect();
    quick_sort_optimized(&mut sorted_input);
    assert!(sorted_input.windows(2).all(|w| w[0] <= w[1]));
 
    // Stress test: all duplicates (worst case for two-way partition).
    let mut dups = vec![42i32; 10_000];
    quick_sort_optimized(&mut dups);
    assert!(dups.iter().all(|&x| x == 42));
}

Why each improvement matters:

Randomized pivot — converts the adversarial sorted-input case from guaranteed O(n²) to O(n²) with negligible probability. An attacker who controls the input cannot reliably trigger worst-case behaviour.
Three-way partition — collapses all elements equal to the pivot into a middle region that is excluded from both recursive calls. On an array of n identical elements, the naive version is still O(n²) while the three-way version is O(n).
Insertion sort cutoff — for subarrays of ≤ 10 elements, recursion overhead exceeds the comparison savings. Insertion sort has excellent cache behaviour and no function-call cost on tiny inputs.

Time Complexity

Best and average case — O(n log n)

When each partition splits the array into two roughly equal halves the recurrence is:

T(n) = 2 T(n/2) + O(n)

This is identical to merge sort's recurrence. By the Master Theorem (Case 2) it solves to O(n log n).

Worst case — O(n²)

Worst case occurs when every pivot selection produces a maximally unbalanced split: one side has n−1 elements, the other has 0. This happens on sorted or reverse-sorted input with a fixed last-element pivot. The recurrence becomes:

T(n) = T(n-1) + T(0) + O(n)
     = T(n-1) + O(n)

Unrolling:
T(n) = T(n-1) + cn
     = T(n-2) + c(n-1) + cn
     = T(n-3) + c(n-2) + c(n-1) + cn
     ...
     = c · (1 + 2 + 3 + ... + n)
     = c · n(n+1)/2
     = O(n²)

With randomized pivot selection, the probability that every single partition is maximally unbalanced falls to (2/n)ⁿ which is astronomically small, so the expected case is O(n log n) regardless of input.

Average case (formal)

Let C(n) = expected comparisons for a random pivot chosen uniformly from n elements. Each pivot rank k ∈ has probability 1/n:

C(n) = (n-1) + (1/n) * Σ_{k=1}^{n} [C(k-1) + C(n-k)]
     = (n-1) + (2/n) * Σ_{k=0}^{n-1} C(k)

Solving this recurrence via the substitution C(n) ≈ 2n ln n gives the average case as approximately 1.39 n log₂ n comparisons — only 39% more than the lower bound of n log₂ n.

Space Complexity

Quick sort is in-place — it requires no auxiliary array. The only extra space is the call stack.

Best/Average case:  O(log n)  — balanced splits, tree depth is log n
Worst case:         O(n)      — each partition removes only one element

The optimized version mitigates the worst-case stack depth in two ways: the randomized pivot prevents the degenerate O(n) splits in practice, and a tail-recursion optimization (always recurse into the smaller partition first) guarantees O(log n) stack depth even in the worst case — this is sometimes called the "Sedgewick trick".

Case	Time	Stack
Best	O(n log n)	O(log n)
Average	O(n log n)	O(log n)
Worst	O(n²)	O(n)

Space Complexity

Quick sort uses O(1) auxiliary heap space — all rearrangements happen in the original array via swaps. The only overhead is the implicit call stack, which is O(log n) on average and O(n) in the degenerate worst case. The optimized implementation reduces the practical worst-case stack depth through randomized pivots, and can be bounded theoretically to O(log n) by always recursing on the smaller partition first and using tail-call elimination on the larger one.