Tim Sort | Dheerendra Varma

Tim Sort is the production sort algorithm used by Python (list.sort()), Java (Arrays.sort for objects), and Swift. Designed by Tim Peters in 2002, it was built not to win on random benchmarks but to handle the kinds of data that appear in real programs — sequences that are partially sorted, have repeated subsequences, or come from nearly-ordered sources.

The key insight: most real-world data contains natural runs — subsequences that are already ascending (or descending). Pure merge sort ignores this structure and always splits to single elements before merging back up. Tim Sort detects existing runs and works with them directly, dramatically reducing merge work when data is partially ordered. On a fully sorted array, it does O(n) work. On random data, it achieves O(n log n) worst case, guaranteed by a merge stack invariant.

Tim Sort combines two algorithms: insertion sort (O(n²) general, O(n) for nearly-sorted, excellent cache behavior for small arrays) and merge sort (O(n log n) guaranteed, stable). The combination gives the best of both.

Step-by-Step Trace

Input: [5, 21, 7, 23, 19, 10, 3, 17, 1, 8, 14, 6]

n = 12. For this size, minrun = 4 (explained below).

Phase 1 — detect and extend natural runs

Scan left-to-right to find runs. A descending run is immediately reversed.

Scan from index 0:
  [5, 21] → ascending, continue
  [5, 21, 7] → 7 < 21, run ends. Run A = [5, 21]  (length 2)

  Run A is shorter than minrun (4). Extend with insertion sort
  to produce a run of length minrun=4, pulling in next elements:
  Insert 7:  [5, 7, 21]
  Insert 23: [5, 7, 21, 23]
  Run A (extended) = [5, 7, 21, 23]

Scan from index 4:
  [19, 10] → 10 < 19, descending start
  [10, 3]  → 3 < 10, still descending
  [3, 17]  → 17 > 3, descending run ends at index 6
  Descending run reversed: [3, 10, 19]  (length 3)

  Run B is shorter than minrun. Extend:
  Insert 17: [3, 10, 17, 19]
  Run B (extended) = [3, 10, 17, 19]

Scan from index 8:
  [1, 8] → ascending, continue
  [8, 14] → ascending, continue
  [14, 6] → 6 < 14, run ends at index 10
  Run C = [1, 8, 14]  (length 3)

  Extend:
  Insert 6: [1, 6, 8, 14]
  Run C (extended) = [1, 6, 8, 14]

Runs on stack: A=[5,7,21,23]  B=[3,10,17,19]  C=[1,6,8,14]

Phase 2 — merge stack (maintaining invariant)

Merge invariant requires (from stack top, calling runs X, Y, Z):
  |Z| > |Y| + |X|
  |Y| > |X|

With three equal-length runs, invariant is violated.
Merge B and C (the two smallest adjacent):
  B=[3,10,17,19] + C=[1,6,8,14]
  merged BC = [1,3,6,8,10,14,17,19]

Now stack: A=[5,7,21,23]  BC=[1,3,6,8,10,14,17,19]
|A|=4, |BC|=8 → |BC| > |A| ✓

Merge A and BC:
  A=[5,7,21,23] + BC=[1,3,6,8,10,14,17,19]
  merged = [1,3,5,6,7,8,10,14,17,19,21,23]  ✓

Naive Implementation

The naive version finds natural runs, extends short ones with insertion sort, then merges pairs left-to-right without enforcing the stack invariant.

fn insertion_sort_range(data: &mut [i32]) {
    let n = data.len();
    for i in 1..n {
        let key = data[i];
        let mut j = i;
        while j > 0 && data[j - 1] > key {
            data[j] = data[j - 1];
            j -= 1;
        }
        data[j] = key;
    }
}
 
fn merge(left: &[i32], right: &[i32]) -> Vec<i32> {
    let mut result = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            result.push(left[i]);
            i += 1;
        } else {
            result.push(right[j]);
            j += 1;
        }
    }
    result.extend_from_slice(&left[i..]);
    result.extend_from_slice(&right[j..]);
    result
}
 
fn tim_sort_naive(data: &mut Vec<i32>, min_run: usize) {
    let n = data.len();
    if n <= 1 {
        return;
    }
 
    // Phase 1: find runs, extend to min_run with insertion sort
    let mut runs: Vec<Vec<i32>> = Vec::new();
    let mut i = 0;
 
    while i < n {
        let mut run_end = i + 1;
 
        if run_end < n {
            if data[run_end] < data[i] {
                // Descending run — find its extent, then reverse
                while run_end < n && data[run_end] < data[run_end - 1] {
                    run_end += 1;
                }
                data[i..run_end].reverse();
            } else {
                // Ascending run
                while run_end < n && data[run_end] >= data[run_end - 1] {
                    run_end += 1;
                }
            }
        }
 
        // Extend short run to min_run using insertion sort
        let extend_to = (i + min_run).min(n);
        if run_end < extend_to {
            run_end = extend_to;
        }
        insertion_sort_range(&mut data[i..run_end]);
        runs.push(data[i..run_end].to_vec());
 
        i = run_end;
    }
 
    // Phase 2: merge pairs left-to-right (no stack invariant)
    while runs.len() > 1 {
        let mut merged_runs = Vec::new();
        let mut k = 0;
        while k < runs.len() {
            if k + 1 < runs.len() {
                let merged = merge(&runs[k], &runs[k + 1]);
                merged_runs.push(merged);
                k += 2;
            } else {
                merged_runs.push(runs[k].clone());
                k += 1;
            }
        }
        runs = merged_runs;
    }
 
    // Write merged result back
    if let Some(sorted) = runs.into_iter().next() {
        data.copy_from_slice(&sorted);
    }
}
 
fn main() {
    let mut data = vec![5i32, 21, 7, 23, 19, 10, 3, 17, 1, 8, 14, 6];
    tim_sort_naive(&mut data, 4);
    println!("{:?}", data); // [1, 3, 5, 6, 7, 8, 10, 14, 17, 19, 21, 23]
}

The naive version uses a fixed minrun and merges without the invariant, which can produce O(n log² n) behavior on adversarial inputs.

Optimized Implementation

The optimized version adds two of Tim Sort's critical engineering details: the calc_minrun formula that keeps run counts a power of two (ensuring balanced merges), and the merge stack invariant that guarantees O(n log n) worst-case behavior.

/// Compute minrun: the top 6 bits of n, plus 1 if any lower bit is set.
/// This ensures n/minrun is always a power of two (or close to it),
/// making all merges balanced and achieving O(n log n) worst case.
fn calc_minrun(mut n: usize) -> usize {
    let mut r = 0usize; // becomes 1 if any low bit is set
    while n >= 64 {
        r |= n & 1;
        n >>= 1;
    }
    n + r
}
 
fn insertion_sort_range(data: &mut [i32]) {
    for i in 1..data.len() {
        let key = data[i];
        let mut j = i;
        while j > 0 && data[j - 1] > key {
            data[j] = data[j - 1];
            j -= 1;
        }
        data[j] = key;
    }
}
 
fn merge_slices(left: &[i32], right: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        // <= preserves stability: equal elements from left come first
        if left[i] <= right[j] {
            out.push(left[i]);
            i += 1;
        } else {
            out.push(right[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}
 
fn tim_sort(data: &mut Vec<i32>) {
    let n = data.len();
    if n <= 1 {
        return;
    }
 
    let min_run = calc_minrun(n);
 
    // Phase 1: identify runs, extend to min_run, push onto stack
    // Stack holds (start_index, length) pairs
    let mut run_stack: Vec<(usize, usize)> = Vec::new();
    let mut i = 0;
 
    while i < n {
        let mut run_end = i + 1;
 
        if run_end < n {
            if data[run_end] < data[i] {
                // Strictly descending — extend and reverse
                while run_end < n && data[run_end] < data[run_end - 1] {
                    run_end += 1;
                }
                data[i..run_end].reverse();
            } else {
                // Non-descending
                while run_end < n && data[run_end] >= data[run_end - 1] {
                    run_end += 1;
                }
            }
        }
 
        // Extend to min_run if too short
        let extend_to = (i + min_run).min(n);
        if run_end < extend_to {
            run_end = extend_to;
        }
        insertion_sort_range(&mut data[i..run_end]);
 
        run_stack.push((i, run_end - i));
        i = run_end;
 
        // Enforce merge invariant after each push.
        // Let X = stack top, Y = one below, Z = two below.
        // Invariant: |Z| > |Y| + |X|  AND  |Y| > |X|
        // Merge until invariant holds.
        enforce_invariant(data, &mut run_stack);
    }
 
    // Final pass: merge everything remaining on the stack
    while run_stack.len() > 1 {
        let len = run_stack.len();
        let (s1, l1) = run_stack[len - 2];
        let (s2, l2) = run_stack[len - 1];
        let merged = merge_slices(&data[s1..s1 + l1], &data[s2..s2 + l2]);
        data[s1..s1 + l1 + l2].copy_from_slice(&merged);
        run_stack.pop();
        run_stack.last_mut().unwrap().1 += l2;
    }
}
 
fn enforce_invariant(data: &mut Vec<i32>, stack: &mut Vec<(usize, usize)>) {
    loop {
        let len = stack.len();
        if len < 2 {
            break;
        }
 
        let should_merge = if len >= 3 {
            let (_, lz) = stack[len - 3];
            let (_, ly) = stack[len - 2];
            let (_, lx) = stack[len - 1];
            // Merge Y and X if either condition is violated
            lz <= ly + lx || ly <= lx
        } else {
            let (_, ly) = stack[len - 2];
            let (_, lx) = stack[len - 1];
            ly <= lx
        };
 
        if !should_merge {
            break;
        }
 
        // Merge the top two runs
        let (s2, l2) = stack[len - 1];
        let (s1, l1) = stack[len - 2];
        let merged = merge_slices(&data[s1..s1 + l1], &data[s2..s2 + l2]);
        data[s1..s1 + l1 + l2].copy_from_slice(&merged);
        stack.pop();
        stack.last_mut().unwrap().1 += l2;
    }
}
 
fn main() {
    let mut data = vec![5i32, 21, 7, 23, 19, 10, 3, 17, 1, 8, 14, 6];
    tim_sort(&mut data);
    println!("{:?}", data); // [1, 3, 5, 6, 7, 8, 10, 14, 17, 19, 21, 23]
}

calc_minrun explained. The formula extracts the top 6 bits of n and adds 1 if any remaining bits are set. This ensures that n / minrun is always between 32 and 64, and that it is close to a power of two — so the runs merge in balanced pairs rather than degrading to a linked-list-style merge (one run absorbed into a much larger one at each step).

Galloping mode. Real Tim Sort implementations include a galloping optimization: when one side of a merge consistently "wins" many consecutive comparisons, the algorithm switches from element-by-element comparison to an exponential search to find where the winning run ends. This gives O(log k) comparisons instead of O(k) when one run dominates, providing a significant constant-factor improvement on inputs with long sorted subsequences. It is not implemented above to keep the structure clear.

Time Complexity

Case	Time	Explanation
Best	O(n)	Input is a single ascending run; one scan, zero merges
Average	O(n log n)	Random input produces O(n / minrun) runs; balanced merges give O(log n) merge levels
Worst	O(n log n)	Merge invariant guarantees this regardless of run structure

The O(n) best case is a real gain, not a degenerate edge. Partially sorted data — common in practice — produces fewer, longer runs and fewer merge levels, shrinking the constant inside O(n log n) substantially. A perfectly reverse-sorted array still achieves O(n log n) because each descending run is detected and reversed in O(run length), then merged normally.

Space Complexity

O(n): the merge step requires a temporary buffer of up to n/2 elements (Tim Sort merges the smaller run into a buffer, not always the full combined size). The run stack holds at most O(log n) entries. In the implementation above, the merge buffer is the full merged slice, giving O(n) auxiliary space.