Insertion Sort | Dheerendra Varma

Insertion sort builds the sorted array one element at a time by taking each new element and inserting it into its correct position within the already-sorted portion to its left.

The intuition is card sorting. Imagine you are dealt five cards one at a time. When you receive the second card, you compare it to the first and insert it before or after. When you receive the third card, you compare it to the sorted pair and slot it in. By the time all five cards are in your hand, they are sorted — you never thought about the entire hand; you only ever thought about where the next card belongs.

This is the key distinction from bubble and selection sort. Insertion sort maintains a growing sorted prefix and does exactly one insertion per element. It does not scan the full array for a global minimum; it only asks "where does this specific element fit in what I already have sorted?"

Step-by-Step Trace

Starting array: [12, 11, 13, 5, 6]

The vertical bar | separates the sorted prefix (left) from the unsorted remainder (right). Each new element is picked up and shifted left until it finds its slot.

Step 1 — pick up 11, compare against sorted prefix [12]:

[12 | 11, 13, 5, 6]
      ^                  key = 11
 12 > 11 → shift 12 right
[__, 12 | 13, 5, 6]
 ^                       empty slot at index 0
[11, 12 | 13, 5, 6]      insert 11 at index 0

Step 2 — pick up 13, compare against sorted prefix [11, 12]:

[11, 12 | 13, 5, 6]
          ^              key = 13
 12 < 13 → stop, no shift needed
[11, 12, 13 | 5, 6]      13 stays in place

Step 3 — pick up 5, compare against sorted prefix [11, 12, 13]:

[11, 12, 13 | 5, 6]
             ^           key = 5
 13 > 5 → shift 13 right
[11, 12, __, 13 | 6]
         ^               key = 5
 12 > 5 → shift 12 right
[11, __, 12, 13 | 6]
     ^                   key = 5
 11 > 5 → shift 11 right
[__, 11, 12, 13 | 6]
 ^                       empty slot at index 0
[5, 11, 12, 13 | 6]      insert 5 at index 0

Step 4 — pick up 6, compare against sorted prefix [5, 11, 12, 13]:

[5, 11, 12, 13 | 6]
                ^        key = 6
 13 > 6 → shift 13 right
[5, 11, 12, __, 13]
            ^            key = 6
 12 > 6 → shift 12 right
[5, 11, __, 12, 13]
        ^                key = 6
 11 > 6 → shift 11 right
[5, __, 11, 12, 13]
    ^                    key = 6
 5 < 6 → stop
[5, 6, 11, 12, 13]       insert 6 at index 1

Result: [5, 6, 11, 12, 13] — sorted in 4 insertion steps.

Naive Implementation

fn insertion_sort(arr: &mut Vec<i32>) {
    let n = arr.len();
 
    // Start from index 1 — the first element is trivially a sorted prefix of length 1.
    for i in 1..n {
        // Pick up the current element. It needs to find its place in arr[0..i].
        let key = arr[i];
 
        // Walk backwards through the sorted prefix.
        // Shift each element that is larger than `key` one position to the right.
        let mut j = i;
        while j > 0 && arr[j - 1] > key {
            arr[j] = arr[j - 1]; // shift right — this is NOT a swap, it's a single write
            j -= 1;
        }
 
        // j is now the correct insertion index. Place `key` there.
        arr[j] = key;
    }
}

One important subtlety: the inner loop uses shifts, not swaps. A swap touches two elements and requires a temporary; a shift writes only one element. Inserting key into a sorted run of length k takes k shifts but only 1 write for key itself — (k+1) writes total. The equivalent bubble sort approach (pairwise swaps) would take 3k writes. This is why insertion sort is fast in practice for small or nearly-sorted arrays.

Optimized Implementation

Binary insertion sort: the naive version already performs O(1) swaps (it shifts), but it still uses O(n) comparisons per insertion to find the right slot by walking backwards. Since the prefix arr[0..i] is always sorted, we can use binary search to locate the insertion point in O(log n) comparisons. The number of shifts is unchanged (still O(n) in the worst case), but we cut the comparison count from O(n²) to O(n log n).

Rust's slice method partition_point performs this binary search: it returns the leftmost index where the predicate switches from true to false.

fn insertion_sort_optimized(arr: &mut Vec<i32>) {
    let n = arr.len();
 
    for i in 1..n {
        let key = arr[i];
 
        // Binary search the sorted prefix arr[0..i] for the insertion point.
        // partition_point returns the first index where arr[idx] >= key.
        // This is O(log i) comparisons instead of O(i).
        let pos = arr[..i].partition_point(|&x| x < key);
 
        // Shift every element in arr[pos..i] one slot to the right.
        // This makes room at index `pos` for `key`.
        // copy_within handles the overlapping range correctly.
        arr.copy_within(pos..i, pos + 1);
 
        // Place key at its sorted position.
        arr[pos] = key;
    }
}

copy_within(pos..i, pos + 1) is a single memory-move operation — much faster than a manual loop of individual shifts. The comparison count drops to O(n log n), though the overall time complexity remains O(n²) because the shifts still dominate. In practice, binary insertion sort is roughly 2-3× faster than the naive version on random data for small n (≤ 64), which is why it appears as the base case inside Timsort and pdqsort.

Time Complexity

Counting inversions.

An inversion is a pair of indices (i, j) where i < j but arr[i] > arr[j] — an element that is "out of place" relative to some later element. Each shift in the inner loop fixes exactly one inversion. The total work done by insertion sort is proportional to the total number of inversions in the input.

Best case — O(n)

A fully sorted array has zero inversions. The inner while loop never executes because arr[j-1] <= key is immediately true for every i. The outer loop runs n-1 times, each iteration doing one comparison and one write. Total: O(n).

Worst case — O(n²)

A reverse-sorted array has the maximum number of inversions. Every element at index i is greater than all elements to its right, so when we pick up element i, we must shift all i elements in the prefix. Summing:

Total shifts = 0 + 1 + 2 + ... + (n-1)
             = Σ k  for k = 0 to n-1
             = n(n-1) / 2

For n = 5 this is 10. Check against the trace: step 1 did 1 shift, step 3 did 3 shifts, step 4 did 3 shifts — exactly as expected for that particular input.

Average case — O(n²)

For a uniformly random permutation of n elements, the expected number of inversions is:

E[inversions] = C(n, 2) / 2
              = [n(n-1)/2] / 2
              = n(n-1) / 4

This is because for any pair (i, j), the probability that it forms an inversion is exactly 1/2 by symmetry. Since total work = total inversions, expected work = n(n-1)/4 = O(n²).

Case	Inversions	Comparisons (naive)	Comparisons (binary)	Time
Best	0	n-1	n-1	O(n)
Average	n(n-1)/4	~n(n-1)/4	O(n log n)	O(n²)
Worst	n(n-1)/2	n(n-1)/2	O(n log n)	O(n²)

Note: binary insertion sort reduces comparison count to O(n log n) but does not change the time complexity, because the O(n²) shifts remain. It is only a constant-factor win in practice.

Space Complexity

Insertion sort is O(1) auxiliary space. The key variable holds one displaced element — a single scalar. All shifts are in-place writes within the original array; no extra buffer is allocated. The optimized version's copy_within call also operates in-place with no external allocation, using the processor's built-in memory-move instruction internally.