Monday, June 29, 2015

Git with Eclipse (4) - pushing local repositories to remote repositories

This post describes how to create a remote repository at GitHub and how to push a local repository to it using EGit.


Create a repository at GitHub

Firstly, we need a remote repository to start. I will use GitHub to host our remote repository. Sign up at GitHub to get an account. After signing in, click the + icon and select New repository.
In the next screen, give our repository the name "HelloWorld". You can also write a brief description of the repository, so other people will know what it is about. Choose Public as the repository type, so it can be read by anyone but modified only by those you grant access; private repositories are available to paid users. If you tick Initialize this repository with a README, GitHub will create a README file in the repository. We leave it unticked so we will have a clean repository. Then click the Create repository button and our HelloWorld repository is created.

In the next screen, you can see that the HelloWorld repository is empty and GitHub provides several suggestions to start with. We will leave it for now and go back to the HelloWorld project in Eclipse.

SSH configuration

If you are going to use the SSH protocol instead of HTTPS for read and write access to the remote repository at GitHub, you have to configure the SSH key in Eclipse and GitHub.

Eclipse SSH configuration

Open the Eclipse preferences via Window -> Preferences. Then go to General -> Network Connections -> SSH2. Select the Key Management tab and click the Generate RSA Key... button. Eclipse will generate a public key; copy it to a text file, as it will be uploaded to GitHub later. Then enter a passphrase in the Passphrase: and Confirm passphrase: fields; it will be used to protect the private key. Click the Save Private Key... button to save the private key on your computer (usually in C:\Users\<username>\.ssh). Then go to the General tab and enter the directory of the private key (usually C:\Users\<username>\.ssh) in the SSH2 home: field. Click the Apply button to save the configuration.

GitHub SSH configuration

Log in to GitHub. Then click your profile icon at the top right of the screen and select Settings in the menu. Choose SSH Keys in the navigator on the left. Then click the Add SSH Key button and paste the Eclipse-generated public key into the Key text box. The Title field is optional.

Click the Add key button and you will be asked to confirm your password. After the password is confirmed, the public key is added to GitHub. Now you are able to use the SSH protocol to access your repositories at GitHub.

Push the local repository to the remote repository

Firstly, we need a local repository to push. Create the HelloWorld project in Eclipse and share it in a local Git repository. Please refer to Git with Eclipse (3) - basic use cases in EGit with local repository for how to share a project in a local repository.

Secondly, because we created an empty repository at GitHub in the previous step, we need to create a branch in the remote repository. Open the Git perspective in Eclipse by selecting Window -> Open Perspective -> Other... in the menu, then choose Git in the opened dialog.

In the Git perspective, expand the HelloWorld repository, right click on Remotes and select Create Remote... in the menu.

Give the remote a name in the opened dialog (usually origin by default) and select Configure push, then click OK.



The Configure Push dialog asks for the URI of the remote repository. We can copy the URI from the HelloWorld repository page at GitHub (we will use the SSH URI in our example). Log in to GitHub and open the HelloWorld repository. In the top section of the page, click on SSH and click the copy icon to copy the SSH URI.

Then go back to Eclipse. Click the Change... button in the Configure Push dialog and paste the SSH URI into the URI: field in the opened Destination Git Repository dialog. Select ssh in the Protocol: field and enter git in the User: field under Authentication. Leave the rest as is and click Finish. You will be asked for the passphrase. Enter the passphrase you set up in the SSH configuration and click OK.

Then click the Save and Push button in the Configure Push dialog.


A confirmation window will be shown once the repository is pushed to GitHub successfully.

Click OK to close the window. Log in to GitHub and open the HelloWorld repository. You will see that the HelloWorld project is now available at GitHub.

Now the HelloWorld project is ready to be forked or collaborated on by other contributors. You can also fetch updates to the HelloWorld project from GitHub and push your changes to GitHub. For how to clone a remote repository and fetch/push changes, please read Git with Eclipse (5) - work with remote repositories (coming soon...).
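As an aside, EGit is built on the JGit library, so the same push can also be done programmatically. Below is a minimal sketch, assuming the local repository lives at C:/workspace/HelloWorld and the remote and branch are named origin and master as configured above (the path is a hypothetical example):

import java.io.File;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.transport.RefSpec;

public class PushHelloWorld {
    public static void main(String[] args) throws Exception {
        // Open the existing local repository (path is an assumption).
        try (Git git = Git.open(new File("C:/workspace/HelloWorld"))) {
            // Push the master branch to the remote named "origin",
            // as configured in the Configure Push dialog above.
            git.push()
               .setRemote("origin")
               .setRefSpecs(new RefSpec("refs/heads/master:refs/heads/master"))
               .call();
        }
    }
}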



Friday, June 19, 2015

Sorting algorithm - Heap Sort

Recall the selection sort algorithm, which sorts by finding the minimum element in the unsorted part and putting it at the front (Selection Sort). Similarly, it works if we find the maximum element in the unsorted part and put it at the end. However, the time complexity of selection sort is \(\Theta(n^2)\). Interestingly, the time complexity can be improved substantially if the right data structure is used. The binary heap is one such data structure.


Binary heap

A binary heap is an almost-complete binary tree in which each node satisfies the heap property: if v and p(v) are a node and its parent, then the key of the item stored in p(v) is not less (or not greater) than the key of the item stored in v. The former is called a max heap and the latter a min heap.

A binary heap T with n nodes can be represented by an array H[1..n] as follows:

  • The root of T is stored in H[1].
  • The left child of a node H[i] is stored in H[2i] and the right child is stored in H[2i+1].
  • The parent of a node H[i] is stored in \(H[\lfloor \frac i 2 \rfloor]\).
Below is a max binary heap and its array representation.
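To make the index arithmetic concrete, here is a small Java sketch; the example array is hypothetical, chosen only to satisfy the max heap property:

public class HeapIndex {
    // 1-based index arithmetic for a heap stored in an array H[1..n].
    static int parent(int i) { return i / 2; }  // floor(i/2)
    static int left(int i)   { return 2 * i; }
    static int right(int i)  { return 2 * i + 1; }

    public static void main(String[] args) {
        // H[0] is unused so that the root sits at index 1.
        int[] h = {0, 20, 17, 10, 15, 8, 3, 2};  // a max heap
        // The children of H[1] = 20 are H[2] = 17 and H[3] = 10.
        System.out.println(h[left(1)] + " " + h[right(1)]);  // prints: 17 10
    }
}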


Operations on heap

A heap data structure normally supports the following operations:

  • peek(): Retrieve, but don't remove, the root element of the heap.
  • poll(): Retrieve and remove the root element of the heap.
  • insert(e): Insert an element into the heap.
  • delete(i): Delete the ith element from the heap.
  • makeHeap(A): Transform array A into a heap.
And there are two basic operations that are used to implement the above operations:
  • shiftUp(i): Move the ith element up in the heap if the key of the element is greater than the key of its parent.
  • shiftDown(i): Move the ith element down in the heap if the key of the element is less than the key of its children.
Note: the above operations are for a max heap. It is easy to modify them for a min heap.

To implement the heap sort algorithm, only the shiftDown(i) and makeHeap(A) operations are needed; the other operations will not be described here. The pseudo code of the two operations is described in ShiftDown and MakeHeap. (Note: for simplicity, we use 1 to n instead of 0 to n-1 as the indices of the heap.)

Algorithm 5.1 ShiftDown
Input: An array A[1..n] and an index i between 1 and n.
Output: A[i] is moved down so it is not smaller than its children.
  1. done = false
  2. if 2i > n then exit {comment: node i has no children}
  3. repeat
  4.     i = 2i
  5.     if \({i+1} \le n\) and key(A[i+1]) > key(A[i]) then
  6.         i = i + 1 {comment: use right child if its key is greater than the left child}
  7.     end if
  8.     if key(\(A[\lfloor \frac i 2 \rfloor]\)) < key(A[i]) then
  9.         swap \(A[\lfloor \frac i 2 \rfloor]\) and A[i]
  10.     else done = true
  11.     end if
  12. until 2i > n or done
Algorithm 5.2 MakeHeap
Input: An array A[1..n] of n elements
Output: A[1..n] is transformed into a heap

  1. for i = \(\lfloor \frac n 2 \rfloor\) downto 1
  2.     ShiftDown(A, i)
  3. end for
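Below is a minimal Java sketch of the two operations. It uses 0-based array indices, so the children of node i are at 2i+1 and 2i+2 rather than 2i and 2i+1; otherwise it follows the pseudo code above.

public class MaxHeap {
    // ShiftDown: move a[i] down until it is not smaller than its children.
    // n is the size of the heap stored in a[0..n-1].
    static void shiftDown(int[] a, int n, int i) {
        while (2 * i + 1 < n) {                    // while node i has a child
            int j = 2 * i + 1;                     // left child
            if (j + 1 < n && a[j + 1] > a[j]) j++; // use the greater child
            if (a[i] >= a[j]) break;               // heap property holds
            int t = a[i]; a[i] = a[j]; a[j] = t;   // swap parent and child
            i = j;                                 // continue one level down
        }
    }

    // MakeHeap: transform the whole array into a max heap in O(n) time.
    static void makeHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)
            shiftDown(a, a.length, i);
    }
}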

Heap Sort

As in the heap data structure, the maximum (or minimum) element is at the root of the heap. If we use a min heap, the poll() operation gives us the root of the heap, which is the minimum element. So given an array A[1..n], we can sort it in non-decreasing order as follows. First, transform the array into a min heap. Next, for i = 1 to n, call the poll() operation on the min heap and store the returned element in A[i]. However, this approach needs extra space to store the min heap. As stated earlier, we can instead use ShiftDown and MakeHeap to sort the array in place. First, transform the array A[1..n] into a max heap, so that the maximum element is in A[1]. Next, swap A[1] and A[n] so that A[n] holds the maximum element of the array. Now A[1] may be smaller than its children, so call ShiftDown(A[1..n-1], 1) to transform A[1..n-1] into a max heap. Next, swap A[1] and A[n-1] and transform A[1..n-2] into a max heap. Repeat the procedure until the heap size becomes 1, at which point A[1] holds the minimum element. The pseudo code is described in HeapSort.

Algorithm 5 HeapSort
Input: An array A[1..n] of n elements.
Output: A[1..n] sorted in non-decreasing order.

  1. MakeHeap(A)
  2. for i = n downto 2
  3.     swap A[1] and A[i]
  4.     ShiftDown(A[1..i-1], 1)
  5. end for
A JAVA implementation of this algorithm can be found on GitHub HeapSort.
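The linked repository is the author's implementation; purely as an illustration, a compact version reusing the MaxHeap sketch above could look like this:

public class HeapSortSketch {
    static void heapSort(int[] a) {
        MaxHeap.makeHeap(a);                       // a[0] now holds the maximum
        for (int i = a.length - 1; i >= 1; i--) {
            int t = a[0]; a[0] = a[i]; a[i] = t;   // move the max to position i
            MaxHeap.shiftDown(a, i, 0);            // re-heapify a[0..i-1]
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 8, 3, 1, 2, 5};
        heapSort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 4, 5, 8, 9]
    }
}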

Time complexity

Let T be the heap of the array A[1..n]. Because T is an almost-complete binary tree, the height of the tree is \(h = \lfloor logn \rfloor\). It is easy to observe that the maximum number of comparisons performed by ShiftDown occurs when the root element is moved down to the leaf level, so the number of comparisons is at most 2(h-1) = 2logn - 2. So the time complexity of ShiftDown is O(logn).

If we move down the ith element in the heap A[1..n], the time complexity of ShiftDown is analysed as follows. The level of the ith element in the tree is \(l = \lfloor logi \rfloor\). It is easy to observe that the number of comparisons performed by ShiftDown is at most 2(h - l). Since there are \(2^l\) nodes on level l, \(0 \le l \lt h\), the total number of comparisons performed by MakeHeap is
$$\begin{align} \sum\limits_{l=0}^{h-1}(h-l){2^l} & =2^0(h) + 2^1(h-1) + 2^2(h-2) + \cdots + 2^{h-1}(1) \\ & = 1(2^{h-1}) + 2(2^{h-2}) + \cdots + h(2^{h-h}) \\ & = \sum\limits_{i=1}^{h}i2^{h-i} \\ & =2^h\sum\limits_{i=1}^{h} \frac i {2^i} \\ & \le n\sum\limits_{i=1}^{h} \frac i {2^i} \\ & \lt 2n \\ \end{align}$$
The second-to-last inequality holds because \(2^h \le n\) in an almost-complete binary tree. The last inequality follows from the geometric series
$$\sum\limits_{i=0}^n c^i = \frac {c^{n+1}-1} {c-1}$$
Differentiating both sides of the above equation with respect to c and multiplying by c yields
$$\sum\limits_{i=0}^n ic^i = \frac {nc^{n+2}-nc^{n+1}-c^{n+1}+c} {(c-1)^2}$$ 
Letting c = 1/2 yields
$$\sum\limits_{i=0}^n \frac i {2^i} = 2 - \frac {n+2}{2^n} \lt 2$$
Thus, the time complexity of MakeHeap is O(n).

Based on the above, the time complexity of the HeapSort algorithm is calculated as follows. The number of comparisons performed by MakeHeap at step 1 is \(\le 2n\). The number of comparisons performed by ShiftDown at step 4 is \(\lt (n-1)logn\). Thus, the time complexity of HeapSort is
$$C(n) \lt 2n + (n-1)logn = O(nlogn)$$


Reference:
Algorithm Design Techniques and Analysis by M.H. Alsuwaiyel

Sunday, June 14, 2015

Sorting algorithm - Merge Sort and Quick Sort

In this post, I will introduce two divide-and-conquer sorting algorithms - Merge Sort and Quick Sort.


Merge Sort
The merge sort algorithm divides the input array into two halves, sorts the two sub arrays respectively, then merges them to get the desired sorted array. To sort each half, the merge sort algorithm calls itself recursively.

Let A[0..n] be an array of n elements. The merge sort algorithm divides A into A[0..\(\frac n 2\)] and A[\(\frac n 2+1\)..n], calls itself on the two sub arrays to sort them, and then merges the two sorted sub arrays to get A[0..n] sorted. The diagram from Wikipedia illustrates the process of the merge sort algorithm.


The pseudo code is described in MergeSort.

Algorithm 3 MergeSort
Input: An array A[0..n] of n elements.
Output: A[0..n] sorted in non-decreasing order.

  1. mergesort(A, 0, n)
Function mergesort(A, low, high)


  1. if low < high
  2.     mid = (low + high)/2
  3.     mergesort(A, low, mid)
  4.     mergesort(A, mid+1, high)
  5.     merge(A, low, mid, high)
  6. end if
Function merge(A, p, q, r)
Input: An array A[0..n] and three indices p, q and r with \(0 \le p \le q \lt r \le n\). Both sub arrays A[p..q] and A[q+1..r] are sorted in non-decreasing order.
Output: A[p..r] contains the elements from A[p..q] and A[q+1..r] sorted in non-decreasing order.
  1. B[p..r] is an auxiliary array
  2. i = p, j = q+1, k = p
  3. while (\(i \le q\)) and (\(j \le r\))
  4.     if A[i] < A[j] then
  5.         B[k] = A[i]
  6.         i = i + 1
  7.     else
  8.         B[k] = A[j]
  9.         j = j + 1
  10.     end if
  11.     k = k + 1
  12. end while
  13. if \(i \le q\) then
  14.     B[k..r] = A[i..q]
  15. end if
  16. if \(j \le r\) then
  17.     B[k..r] = A[j..r]
  18. end if
  19. A[p..r] = B[p..r]

A JAVA implementation of this algorithm can be found on GitHub Merge Sort.
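The linked repository is the author's implementation; as an illustrative sketch only, a minimal Java version of the pseudo code above (0-based indices; mergeSort(a, 0, a.length - 1) sorts the whole array) might be:

public class MergeSortSketch {
    static void mergeSort(int[] a, int low, int high) {
        if (low < high) {
            int mid = (low + high) / 2;
            mergeSort(a, low, mid);                // sort the left half
            mergeSort(a, mid + 1, high);           // sort the right half
            merge(a, low, mid, high);              // merge the sorted halves
        }
    }

    // Merge the sorted runs a[p..q] and a[q+1..r] via an auxiliary array.
    static void merge(int[] a, int p, int q, int r) {
        int[] b = new int[r - p + 1];
        int i = p, j = q + 1, k = 0;
        while (i <= q && j <= r)
            b[k++] = (a[i] < a[j]) ? a[i++] : a[j++];
        while (i <= q) b[k++] = a[i++];            // copy the leftover left run
        while (j <= r) b[k++] = a[j++];            // copy the leftover right run
        System.arraycopy(b, 0, a, p, b.length);    // copy back into a[p..r]
    }
}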

Time complexity
The basic operation of the merge sort algorithm is element comparison. Let C(n) be the number of element comparisons. For simplicity, assume n is a power of 2. If n = 1, C(n) = 0 because the algorithm does not perform any element comparison. If n > 1, in function mergesort(A, low, high), steps 3 and 4 each perform C(\(\frac n 2\)) element comparisons. Step 5 calls function merge(A, p, q, r), and it is easy to see that the number of element comparisons performed by merge is between \(\frac n 2\) and n - 1. Thus, the minimum number of comparisons is

$$C(n) = \begin{cases}0 & n=1 \\ 2C(\frac n 2) + \frac n 2 & n \ge 2 \end{cases}$$

We solve the recurrence by expansion as follows, assuming \(n=2^k\).

$$\begin{align}C(n) & = 2C(\frac n 2) + \frac n 2 \\ & = 2^2C(\frac n {2^2}) + 2\frac n 2 \\ & = ... \\ & = 2^kC(\frac n {2^k}) + k\frac n 2 \\ & = 2^kC(1) + k\frac n 2 \\ & = \frac {nlogn} 2\end{align}$$

The maximum number of comparisons is

$$C(n) = \begin{cases}0 & n=1 \\ 2C(\frac n 2) + n - 1 & n \ge 2 \end{cases}$$

We solve the recurrence by expansion as follows, again assuming \(n=2^k\).

$$\begin{align}C(n) & = 2C(\frac n 2) + n - 1 \\ & = 2^2C(\frac n {2^2}) + 2n - 2 - 1 \\ & = ... \\ & = 2^kC(\frac n {2^k}) + kn - 2^{k-1} - 2^{k-2} - ... - 2 - 1 \\ & = 2^kC(1) + kn - \sum\limits_{i=0}^{k-1} 2^i \\ & = kn - 2^k + 1 \\ & = nlogn - n + 1\end{align}$$


According to the analysis above, \(C(n)=\Omega(nlogn)\) and \(C(n)=O(nlogn)\), so the time complexity of the merge sort algorithm is \(C(n)=\Theta(nlogn)\).

Quick Sort
The quick sort algorithm is another divide-and-conquer sorting algorithm. It divides the input array by a chosen pivot, so that all elements smaller than the pivot are at its left and all greater elements are at its right. This procedure is called partitioning or splitting. An important observation is that after splitting the array by the pivot, the pivot is in its correct position. Recursively performing the same splitting procedure on each side of the pivot until all elements are in their correct positions therefore sorts the array. There are many different ways to choose a pivot; here I will pick the first element as the pivot. However, always picking the first element as the pivot leads to quadratic running time in the worst case. This will be discussed later in the time complexity analysis of the quick sort algorithm. The animation below from Wikipedia illustrates the idea of the quick sort algorithm.



The pseudo code is described in QuickSort.

Algorithm 4 QuickSort
Input: An array A[0..n] of n elements.
Output: A[0..n] sorted in non-decreasing order.

  1. quicksort(A, 0, n)
Function quicksort(A, low, high)
  1. if low < high then
  2.     w = split(A, low, high)
  3.     quicksort(A, low, w-1)
  4.     quicksort(A, w+1, high)
  5. end if
Function split(A, low, high)
Input: An array A[0..n] and two indices low and high with \(0 \le low < high \le n\).
Output: An index w with \(low \le w \le high\) such that w is the correct position of A[low] in A[low..high].
  1. i = low
  2. x = A[low]
  3. for j = i + 1 to high
  4.     if A[j] < x then
  5.         i = i + 1
  6.         if \(i \ne j\) then swap A[i] and A[j] {comment: move the smaller element to the left}
  7.     end if
  8. end for
  9. swap A[low] and A[i] {comment: i is the correct position of A[low]}
  10. return i

A JAVA implementation of this algorithm can be found on GitHub Quick Sort.
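Again, the linked repository is the author's implementation; a minimal Java sketch that mirrors the split function above, with the first element as the pivot, might be:

public class QuickSortSketch {
    static void quickSort(int[] a, int low, int high) {
        if (low < high) {
            int w = split(a, low, high);           // pivot ends up at index w
            quickSort(a, low, w - 1);              // sort the left part
            quickSort(a, w + 1, high);             // sort the right part
        }
    }

    // Partition a[low..high] around the pivot a[low]; return its final position.
    static int split(int[] a, int low, int high) {
        int i = low;
        int x = a[low];                            // the pivot
        for (int j = low + 1; j <= high; j++) {
            if (a[j] < x) {                        // move smaller elements left
                i++;
                if (i != j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
            }
        }
        int t = a[low]; a[low] = a[i]; a[i] = t;   // put the pivot in place
        return i;
    }
}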

Time complexity

The average case scenario
The time complexity of the quick sort algorithm depends on the relative order of the elements in the input array. To analyse the average behaviour of the algorithm, assume all orderings of the input elements are equally likely. Let C(n) be the number of element comparisons performed by the quick sort algorithm. It is easy to see that the number of element comparisons performed by split in step 2 is high - low - 1, which is n - 1 for high = n, low = 0. The recursive calls in steps 3 and 4 cost C(w-1) and C(n-w) respectively. Since each element is equally likely to be the first element of the input array and thus chosen as the pivot, the probability that any given element becomes the pivot is 1/n. Hence, the total number of comparisons is
$$C(n) = (n-1) + \frac {1} {n} \sum\limits_{w=1}^n(C(w-1) + C(n-w))$$
Since
$$\sum\limits_{w=1}^nC(n-w)=C(n-1)+C(n-2)+...+C(0)=\sum\limits_{w=1}^nC(w-1)$$
Thus
$$C(n)=(n-1) + \frac {2} {n} \sum\limits_{w=1}^nC(w-1)$$
Multiply the equation by n
$$nC(n)=n(n-1) + 2\sum\limits_{w=1}^{n}C(w-1)$$
Replace n by n-1
$$(n-1)C(n-1)=(n-1)(n-2) + 2\sum\limits_{w=1}^{n-1}C(w-1)$$
Subtracting the above two equations
$$nC(n) - (n-1)C(n-1)=2(n-1) + 2C(n-1)$$
Rearranging the equation yields
$$\frac {C(n)} {n+1} = \frac {C(n-1)} {n} + \frac {2(n-1)} {n(n+1)}$$
Let \(D(n) = \frac {C(n)} {n+1}\)
$$D(n) = D(n-1) + \frac {2(n-1)} {n(n+1)}$$
Unfolding the recurrence (with D(0) = 0) yields
$$D(n) = 2\sum\limits_{i=1}^n\frac{i-1}{i(i+1)}=2\sum\limits_{i=1}^n\frac 1 i - \frac {4n} {n+1}=2lnn - \Theta(1)=\frac 2 {log e}log n-\Theta(1) \approx 1.39log n$$
Consequently,
$$C(n) = (n+1)D(n) \approx 1.39nlogn = \Theta(nlogn)$$

The worst case scenario
Suppose that in every call to quicksort, the pivot, which is A[low], is the smallest element in the array. As a result, the w returned by split is equal to low. The next recursive calls will then be quicksort(A, 0, -1) and quicksort(A, 1, n), where quicksort(A, 0, -1) does nothing. Thus, if the input array is already sorted in non-decreasing order, the call sequence to quicksort is:

quicksort(A, 0, n), quicksort(A, 1, n), ... , quicksort(A, n-1, n), quicksort(A, n, n)

These result in the following calls to split (quicksort(A, n, n) makes no call to split because low = high):

split(A, 0, n), split(A, 1, n), ... , split(A, n-1, n)

The number of element comparisons performed by split(A, low, high) is high - low - 1. Thus, the total number of comparisons done by the quick sort algorithm in the worst case is
$$(n-1) + (n-2) + ... + 1 + 0 = \frac {n(n-1)} 2 = \Theta(n^2)$$
The scenario above, however, is not the only case that results in quadratic running time. If the pivot always falls among the smallest k elements, with k sufficiently small relative to n, the running time of the quick sort algorithm is also quadratic. This can be improved to \(\Theta(nlogn)\) by always picking the median element as the pivot, which splits the input array into two balanced halves; but the time complexity of the median finding algorithm then needs to be considered carefully. Another approach to improving the quick sort algorithm is to choose the pivot randomly, which results in a very efficient algorithm. It will be discussed in my future post.


Reference:
Algorithm Design Techniques and Analysis by M.H. Alsuwaiyel

Monday, June 8, 2015

Sorting algorithm - Selection Sort and Insertion Sort

This post will introduce two simple sorting algorithms - Selection Sort and Insertion Sort.


Selection Sort

The selection sort algorithm sorts an array by repeatedly finding the minimum element in the unsorted part of the array and putting it at the front of the unsorted part.

Let A[0..n] be an array of n elements. The selection sort algorithm sorts the elements in A as follows. First, it finds the minimum element and stores it in A[0]. Next, it finds the minimum of the remaining n-1 elements and stores it in A[1]. It continues until A is sorted. The pseudo code is described in SelectionSort.

Algorithm 1 SelectionSort

Input: An array A[0..n] of n elements
Output: A[0..n] sorted in non-decreasing order

  1. for i = 0 to n
  2.     k = i
  3.     for j = i + 1 to n
  4.         if A[j] < A[k] then k = j
  5.     end for
  6.     if \(i \neq k\) then swap A[i] and A[k]
  7. end for
A JAVA implementation of this algorithm can be found on GitHub - Selection Sort
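The linked repository is the author's implementation; a minimal Java sketch of the pseudo code above might be:

public class SelectionSortSketch {
    static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int k = i;                             // index of the minimum so far
            for (int j = i + 1; j < a.length; j++)
                if (a[j] < a[k]) k = j;
            if (i != k) { int t = a[i]; a[i] = a[k]; a[k] = t; }
        }
    }
}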

Time complexity 
The basic operation of this algorithm is the element comparison in step 4. It is easy to see that the number of element comparisons performed by the selection sort algorithm is $$ \sum\limits_{i=0}^{n-2} (n-1-i) = (n-1) + (n-2) + ... + 1 = \sum\limits_{i=1}^{n-1} i = \frac{n(n-1)}{2}$$
This count is the same regardless of how the input array is ordered. Thus, the time complexity is \(\Theta(n^2)\).



Insertion Sort

The insertion sort algorithm sorts an array by inserting one element at a time from the unsorted part of the array into the right position in the sorted part. It works the way we sort playing cards in our hands.

Let A[0..n] be an array of n elements. The insertion sort algorithm sorts the elements in A as follows. We begin with the sub array consisting of the first element A[0]; it is already sorted because there is only one element. Next, A[1] is inserted before A[0] if it is smaller than A[0]; otherwise, A[1] is inserted after A[0]. Continuing this way, A[i] is inserted into the sorted sub array A[0..i-1]. This is done by comparing A[i] to A[i-1] down to A[0] until an element less than or equal to A[i] is found, shifting each A[j] (j = i-1 down to 0) that is greater than A[i] to A[j+1]. A[i] is then inserted at the position after the found element. The pseudo code is described in InsertionSort.

Algorithm 2 InsertionSort
Input: An array A[0..n] of n elements
Output: A[0..n] sorted in non-decreasing order


  1. for i = 0 to n
  2.     x = A[i]
  3.     j = i - 1
  4.     while (\(j\ge0\)) and (A[j] > x)
  5.         A[j+1] = A[j]
  6.         j = j - 1
  7.     end while
  8.     A[j+1] = x
  9. end for
A JAVA implementation of this algorithm can be found on GitHub - Insertion Sort.
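The linked repository is the author's implementation; a minimal Java sketch of the pseudo code above might be:

public class InsertionSortSketch {
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i];                          // the element to insert
            int j = i - 1;
            while (j >= 0 && a[j] > x) {           // shift greater elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;                          // insert x into its position
        }
    }
}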

Time complexity
Unlike the selection sort algorithm, the number of element comparisons performed by the insertion sort algorithm depends on the order of the input array. If the input array is already sorted in non-decreasing order, the number of element comparisons is n - 1, because each element A[i] is compared with A[i-1] only. On the other hand, if the input array is sorted in decreasing order, the number of element comparisons is $$ \sum\limits_{i=1}^{n-1}i = \frac{n(n-1)}{2} $$
As stated above, the number of element comparisons performed by insertion sort is between n - 1 and \(\frac{n(n-1)}{2}\). The time complexity can be described using the O-notation as O(\(n^2\)).


Based on the discussion above, we can see that the insertion sort algorithm performs fewer comparisons than the selection sort algorithm unless the input array is sorted in (or close to) decreasing order. But the worst-case time complexity of both algorithms is O(\(n^2\)).

Reference:
Algorithm Design Techniques and Analysis by M.H. Alsuwaiyel