Friday, June 19, 2015

Sorting algorithm - Heap Sort

Recall that selection sort algorithm which sorts by finding the minimum element from the unsorted part and put it at the front (Selection Sort). Similarly, it will work if we find the maximum element from the unsorted part and put it at the end. However, the time complexity of selection sort is \(\Theta(n^2)\). Interestingly, the time complexity can be improved substantially if a right data structure is used. Binary heap is a such data structure.


Binary heap

Binary heap is an almost-complete binary tree with each node satisfying the heap property: If v and p(v) are a node and its parent, then the key of the item stored in p(v) is not less (or greater) than the key of the item stored in v. The former is called as max heap and the latter is called as min heap.

A binary heap T with n nodes can be represented by an array H[1..n] as following:

  • The root of T is stored in H[1].
  • The left child of a node H[i] is stored in H[2j] and the right child is stored in H[2j+1].
  • The parent of a node H[i] is stored in \(H[\lfloor \frac j 2 \rfloor]\).
Below is a max binary heap and its array representation.


Operations on heap

A heap data structure normally supports the following operations:

  • peek(): Retrieve, but don't remove the root element of the heap.
  • poll(): Retrieve and remove the root element of the heap.
  • insert(e): Insert an element into the heap.
  • delete(i): Delete the ith element from the heap.
  • makeHeap(A): Transform array A into a heap.
And there are two basic operations that are used to implement the above operations:
  • shiftUp(i): Move the ith element up in the heap if the key of the element is greater than the key of its parent.
  • shiftDown(i): Move the ith element down in the heap if the key of the element is less than the key of its children.
note: The above operations are for max heap. It is easy to modify them to be used for min heap.

To implement heap sort algorithm, only shiftDown(i) and makeHeap(A) operations are needed. The other operations will not be described here. The pseudo code of the two operations are described in ShiftDown and MakeHeap. (Note: for simplicity, instead of 0 to n-1, we use 1 to n as the indices of the heap)

Algorithm 5.1 ShiftDown
Input: An array A[1..n] and an index i between 1 and n.
Output: A[i] is moved down so it is not smaller than its children.
  1. done = false
  2. if 2i > n then exit {comment: node i has no children}
  3. repeat
  4.     i = 2i
  5.     if \({i+1} \le n\) and key(A[i+1]) > key(A[i]) then
  6.         i = i + 1 {comment: use right child if its key is greater than the left child}
  7.     end if
  8.     if key(\(A[\lfloor \frac i 2 \rfloor]\)) < key(A[i]) then
  9.         swap \(A[\lfloor \frac i 2 \rfloor]\) and A[i]
  10.     else done = true
  11.     end if
  12. until 2i > n or done
Algorithm 5.2 MakeHeap
Input: An array A[1..n] of n elements
Output: A[1..n] is transformed into a heap

  1. for i = \(\lfloor \frac n 2 \rfloor\) to 1
  2.     ShiftDown(A, i)
  3. end for

Heap Sort

As in the heap data structure, the maximum (or minimum) element is the root of the heap. If we choose to use the min heap, poll() operation will give us the root of the heap which is the minimum element. So given an array A[1..n], we can sort it in non-decreasing order as follows. First, transform the array into a min heap. Next, for i = 1 to n, call the poll() operation on the min heap and store the element in A[i]. However, we need extra space to store the min heap in this approach. As stated earlier, we can use ShiftDown and MakeHeap to sort the array in place. First, we transform the array A[1..n] into a max heap, then the maximum element is in A[1]. Next, swap A[1] and A[n] so that A[n] is the maximum element in the array. Now, A[1] may be smaller than its children. Therefore, call ShiftDown(A[1..n-1], 1) to transform A[1..n-1] to a max heap. Next, swap A[1] and A[n-1] and transform A[1..n-2] into a max heap. Repeat the procedure until the heap size becomes to 1 which A[1] is the minimum element. The pseudo code is described in HeapSort.

Algorithm 5 HeapSort
Input: An array A[1..n] of n elements.
Output: A[1..n] sorted in non-decreasing order.

  1. MakeHeap(A)
  2. for i = n to 1
  3.     swap A[1] and A[i]
  4.     ShiftDown(A[1..i-1], 1)
  5. end for
A JAVA implementation of this algorithm can be found on GitHub HeapSort.

Time complexity

Let T be the heap of the array A[1..n]. Because T is an almost-complete binary tree, the height of the tree \(h = \lfloor logn \rfloor\). It is easy to observe that the maximum number of comparison performed by ShiftDown happens when move the root element down to the leaf level, so the number of comparison is at most 2(h-1) = 2logn - 2. So the time complexity of ShiftDown is O(logn).

If we move down the ith element in heap A[1..n], the time complexity of ShiftDown is analysed as following. The level of ith element in the tree \(l = \lfloor logi \rfloor\). It is easy to observe that the number of comparison performed by ShiftDown is at most 2(h - l). Since there are \(2^l\) nodes on level l, \(0 \le l \lt h\), the total number of comparison performed by MakeHeap is 
$$\begin{align} \sum\limits_{l=0}^{h-1}(h-l){2^l} & =2^0(h) + 2^1(h-1) + 2^2(h-2) + \cdots + 2^{k-1}(1) \\ & = 1(2^{h-1}) + 2(2^{h-2}) + \cdots + h(2^{h-h}) \\ & = \sum\limits_{i=1}^{h}i2^{h-i} \\ & =2^h\sum\limits_{i=1}^{h} \frac i {2^i} \\ & \le n\sum\limits_{i=1}^{h} \frac i {2^i} \\ & \lt 2n \\ \end{align}$$
The last second equality is because \(2^h \le n\) in an almost-complete binary tree. The last equality follows the geometric series
$$\sum\limits_{i=0}^n c^i = \frac {c^{n+1}-1} {c-1}$$
Differentiating both side of above equation and multiplying by c yields
$$\sum\limits_{i=0}^n ic^i = \frac {nc^{n+2}-nc^{n+1}-c^{n+1}+c} {(c-1)^2}$$ 
Let c = 1/2 yieds
$$\sum\limits_{i=0}^n \frac i {2^i} = 2 - \frac {n+2}{2^n} \lt 2$$
Thus, the time complexity of MakeHeap is O(n).

Based on the above, the time complexity of HeapSort algorithm is calculated as following. The number of comparison performed by MakeHeap at step 1 is \(\le 2n\). The number of comparison performed by ShiftDown at step 4 is \(\lt (n-1)logn\). Thus, the time complexity of HeapSort is
$$C(n) \lt 2n + (n-1)logn = O(nlogn)$$


Reference:
Algorithm Design Techniques and Analysis by M.H. Alsuwaiyel

No comments:

Post a Comment