Table of Contents


DATA ANALYTICS REFERENCE DOCUMENT

Document Title:Document Title
Document No.:1552735583
Author(s):Gerhard van der Linde, Rita Raher
Contributor(s):

REVISION HISTORY


Revision


Details of Modification(s)

Reason for modification

Date

By
0 Draft releaseDocument description here 2019/03/16 11:26 Gerhard van der Linde, Rita Raher

Week 8 - Sorting Algorithms Part 1

07_sorting_algorithms_part_1.pdf

Overview

Sorting

Timeline of sorting algorithms

Sorting

Conditions for sorting

Comparing items in a collection

Comparator functions

Inversions

Comparison sorts

Sort keys and satellite data

Desirable properties for sorting algorithms

Stability

Stable sort of flight information

Reference: 1)

Analysing sorting algorithms

Recap: orders of growth

Factors which influence running time

In-place sorting

Overview of sorting algorithms

AlgorithmBest caseWorst caseAverage caseSpace ComplexityStable?
Bubble Sort <m>n</m> <m>n2</m> <m>n2</m> 1 Yes
Selection Sort <m>n2</m> <m>n2</m> <m>n2</m> 1 No
Insertion Sort <m>n</m> <m>n2</m> <m>n2</m> 1 Yes
Merge Sort <m>n log n</m> <m>n log n</m> <m>n log n</m> <m>O(n)</m> Yes
Quicksort <m>n log n</m> <m>n2</m> <m>n log n</m> <m>n</m> (worst case) No*
Heapsort <m>n log n</m> <m>n log n</m> <m>n log n</m> 1 No
Counting Sort <m>n + k</m> <m>n + k</m> <m>n + k</m> <m>n + k</m> Yes
Bucket Sort <m>n + k</m> <m>n2</m> <m>n + k</m> <m>n * k</m> Yes
Timsort <m>n</m> <m>n log n</m> <m>n log n</m> <m>n</m> Yes
Introsort <m>n log n</m> <m>n log n</m> <m>n log n</m> <m>log n</m> No

*the standard Quicksort algorithm is unstable, although stable variations do exist

Criteria for choosing a sorting algorithm

Criteria Sorting algorithm
Small number of items to be sorted Insertion Sort
Items are mostly sorted already Insertion Sort
Concerned about worst-case scenarios Heap Sort
Interested in a good average-case behaviour Quicksort
Items are drawn from a uniform dense universe Bucket Sort
Desire to write as little code as possible Insertion Sort
Stable sorting required Merge Sort

Week 9: Sorting Algorithms Part 2

Overview

Review of sorting

Overview of sorting algorithms

AlgorithmBest caseWorst caseAverage caseSpace ComplexityStable?
Bubble Sort <m>n</m> <m>n2</m> <m>n2</m> 1 Yes
Selection Sort <m>n2</m> <m>n2</m> <m>n2</m> 1 No
Insertion Sort <m>n</m> <m>n2</m> <m>n2</m> 1 Yes
Merge Sort <m>n log n</m> <m>n log n</m> <m>n log n</m> <m>O(n)</m> Yes
Quicksort <m>n log n</m> <m>n2</m> <m>n log n</m> <m>n</m> (worst case) No*
Heapsort <m>n log n</m> <m>n log n</m> <m>n log n</m> 1 No
Counting Sort <m>n + k</m> <m>n + k</m> <m>n + k</m> <m>n + k</m> Yes
Bucket Sort <m>n + k</m> <m>n2</m> <m>n + k</m> <m>n * k</m> Yes
Timsort <m>n</m> <m>n log n</m> <m>n log n</m> <m>n</m> Yes
Introsort <m>n log n</m> <m>n log n</m> <m>n log n</m> <m>log n</m> No

*the standard Quicksort algorithm is unstable, although stable variations do exist

Comparison sorts

Bubble Sort

Bubble Sort procedure

Bubble Sort example

Bubble Sort in Code

public static void bubblesort(int[]a){
  int outer, inner;
  for(outer = a.length - 1; outer > 0;outer--){ //counting down
   for(inner=0;inner < outer;inner++){ //bubbling up
    if(a[inner] > a[inner+1];  { //if out of order....
    int temp = a[inner]; //...then swap
    a[inner] =a[inner+1];
    a[inner +1] = temp;
    }
   }
  }
 }
bubblesort.py
# Bubble Sort in python
def printArray(arr):
  print (' '.join(str(i) for i in arr))
 
 
def bubblesort(arr):
  for i in range(len(arr)):
    for j in range(len(arr) - i - 1):
      if arr[j] > arr[j + 1]:
        temp = arr[j]
        arr[j] = arr[j + 1]
        arr[j + 1] = temp
    # Print array after every pass
    print ("After pass " + str(i) + "  :", printArray(arr))
 
if __name__ == '__main__':
    arr = [10, 7, 3, 1, 9, 7, 4, 3]
    print ("Initial Array :", printArray(arr))
    bubblesort(arr)

Bubble Sort Example

Analysing Bubble Sort (worst case)

for(outer =a.length-1; outer >0; outer--){
 for(inner=0; inner < outer; inner++){
  if(a[inner]>a[inner+1]){
  //swap code omitted
  }
 }
}

Selection Sort

Selection Sort procedure

Selection Sort example

The element at index 4 is the smallest, so swap with index 0
The element at index 2 is the smallest, so swap with index 1
The element at index 3 is the smallest, so swap with index 2
The element at index 3 is the smallest, so swap with index 3

Selection sort might swap an array element with itself; this is harmless, and not worth checking for

Selection Sort in Code

public static void selectionsort(int[]a){
 int outer=0, inner=0, min=0;
 for(outer = 0; outer <a.length-1;outer++){  //outer counts up
 min = outer;
 for(inner = outer +1; inner <a.length; inner++){
  if(a[inner]<a[min]){ //find index of smallest value
   min = inner;
   }
  }
  //swap a [min] with a [outer]
  int temp = a[outer];
  a[outer] = a[min];
  a[min] = temp;
 }
}
selection_sort.py
# selection sort in python
def printArray(arr):
  return (' '.join(str(i) for i in arr))
 
 
def selectionsort(arr):
  N = len(arr)
  for i in range(0, N):
    small = arr[i]
    pos = i
    for j in range(i + 1, N):
      if arr[j] < small:
        small = arr[j]
        pos = j
    temp = arr[pos]
    arr[pos] = arr[i]
    arr[i] = temp
    print ("After pass " + str(i) + "  :", printArray(arr))
 
if __name__ == '__main__':
    arr = [10, 7, 3, 1, 9, 7, 4, 3]
    print ("Initial Array :", printArray(arr))
    selectionsort(arr)

Analysing Selection Sort

in best, worst and average cases

Insertion Sort

Insertion Sort procedure

Insertion Sort example

a[1]=5 is the key; 7>5 so move 7 right by one position, and insert 5 at index 0
a[2]=2 is the key; 7>2 so move both 7 and 5 right by one position, and insert 2 at index 0
a[3]=3 is the key; 7>3 and 5>3 so move both 7 and 5 right by one position, and insert 3 at index 1
a[4]=1 is the key; 7>1, 5>1, 3>1 and 2>1 so move both 7, 5, 3 and 2 right by one position, and insert 1 at index 1

(done)

Insertion Sort in code

public static void insertionsort(int a[]){
 for(int i=1; i<a.length; i++){
  int key =a[i]; //value to be inseted
  int j = i-1;
  //move all elements > key right one position
  while(j>=0 && a[j]>key){
  a[j+1]= a[j];
  j=j-1;
  }
  a[j+1]=key; //insert key in its new position
 }
 
}
insertion_sort.py
# insertion sort
def printArray(arr):
  return(' '.join(str(i) for i in arr))
 
 
def insertionsort(arr):
  N = len(arr)
  for i in range(1, N):
    j = i - 1
    temp = arr[i]
    while j >= 0 and temp < arr[j]:
      arr[j + 1] = arr[j]
      j -= 1
    arr[j + 1] = temp
    print ("After pass " + str(i) + "  :", printArray(arr))
 
if __name__ == '__main__':
    arr = [10, 7, 3, 1, 9, 7, 4, 3]
    print ("Initial Array :", printArray(arr))
    insertionsort(arr)

Analysing Insertion Sort

Comparison of Simple sorting algorithms

Criteria for choosing a sorting algorithm

Criteria Sorting algorithm
Small number of items to be sorted Insertion Sort
Items are mostly sorted already Insertion Sort
Concerned about worst-case scenarios Heap Sort
Interested in a good average-case behaviour Quicksort
Items are drawn from a uniform dense universe Bucket Sort
Desire to write as little code as possible Insertion Sort
Stable sorting required Merge Sort

Recap

Week 10: Sorting Algorithms Part 3

Overview

Overview of sorting algorithms

AlgorithmBest caseWorst caseAverage caseSpace ComplexityStable?
Bubble Sort <m>n</m> <m>n2</m> <m>n2</m> 1 Yes
Selection Sort <m>n2</m> <m>n2</m> <m>n2</m> 1 No
Insertion Sort <m>n</m> <m>n2</m> <m>n2</m> 1 Yes
Merge Sort <m>n log n</m> <m>n log n</m> <m>n log n</m> <m>O(n)</m> Yes
Quicksort <m>n log n</m> <m>n2</m> <m>n log n</m> <m>n</m> (worst case) No*
Heapsort <m>n log n</m> <m>n log n</m> <m>n log n</m> 1 No
Counting Sort <m>n + k</m> <m>n + k</m> <m>n + k</m> <m>n + k</m> Yes
Bucket Sort <m>n + k</m> <m>n2</m> <m>n + k</m> <m>n * k</m> Yes
Timsort <m>n</m> <m>n log n</m> <m>n log n</m> <m>n</m> Yes
Introsort <m>n log n</m> <m>n log n</m> <m>n log n</m> <m>log n</m> No

*the standard Quicksort algorithm is unstable, although stable variations do exist

Merge sort

Merge Sort Example

  • Mergesort is based on the following basic idea:
    • if the size of the list is 0 or 1, return.
    • Otherwise, separate the list into two lists of equal or nearly equal size and recursively sort the first and second halves separately.
    • finally, merge the two sorted halves into one sorted list.
  • Clearly, almost all the work is in the merge step, which should be as efficient as possible.
  • Any merge must take at least time that is linear in the total size of the two lists in the worst case, since every element must be looked at in order to determine the correct ordering.

Merge Sort in code

merge_sort.py
  1. # Merge sort
  2. def mergesort(arr, i, j):
  3. mid = 0
  4. if i < j:
  5. mid = int((i + j) / 2)
  6. mergesort(arr, i, mid)
  7. mergesort(arr, mid + 1, j)
  8. merge(arr, i, mid, j)
  9.  
  10.  
  11. def merge(arr, i, mid, j):
  12. print ("Left: " + str(arr[i:mid + 1]))
  13. print ("Right: " + str(arr[mid + 1:j + 1]))
  14. N = len(arr)
  15. temp = [0] * N
  16. l = i
  17. r = j
  18. m = mid + 1
  19. k = l
  20. while l <= mid and m <= r:
  21. if arr[l] <= arr[m]:
  22. temp[k] = arr[l]
  23. l += 1
  24. else:
  25. temp[k] = arr[m]
  26. m += 1
  27. k += 1
  28.  
  29. while l <= mid:
  30. temp[k] = arr[l]
  31. k += 1
  32. l += 1
  33. while m <= r:
  34. temp[k] = arr[m]
  35. k += 1
  36. m += 1
  37. for i1 in range(i, j + 1):
  38. arr[i1] = temp[i1]
  39. print ("After Merge: " + str(arr[i:j + 1]))
  40.  
  41. if __name__ == '__main__':
  42. arr = [9, 4, 8, 3, 1, 2, 5]
  43. print ("Initial Array: " + str(arr))
  44. mergesort(arr, 0, len(arr) - 1)

Terminal Output:

Initial Array: [9, 4, 8, 3, 1, 2, 5]
Left: [9]
Right: [4]
After Merge: [4, 9]
Left: [8]
Right: [3]
After Merge: [3, 8]
Left: [4, 9]
Right: [3, 8]
After Merge: [3, 4, 8, 9]
Left: [1]
Right: [2]
After Merge: [1, 2]
Left: [1, 2]
Right: [5]
After Merge: [1, 2, 5]
Left: [3, 4, 8, 9]
Right: [1, 2, 5]
After Merge: [1, 2, 3, 4, 5, 8, 9]

Quicksort

Quicksort procedure

The main steps in Quick sort are:

  1. Pivot selection: Pick an element, called a “pivot” from the array
  2. Partioning: reorder the array elements with values < the pivot come beofre it, which all elements the values ≥ than the pivot come after it. After this partioining, the pivot is in its final position.
  3. Recursion: apply steps 1 and 2 above recursively to each of the two subarrays

The base case for the recursion is a subarray of length 1 or 0; by definition these cases do not need to be sorted.

Quicksort example

  • On overage quicksort runs in <m>n</m> log <m>n</m> but if it consistently chooses bad pivots, its performance degrades to <m>n^2</m>.
  • This happens if the pivot is consistently chosen so that all(or too many of) the elements in the array are < the pivot or > than the pivot. (A classic case is when the first or last element is chosen as a pivot and the data is already sorted, or nearly sorted).
  • Some options for choosing the pivot:
  • Always pick the first elements as the pivot.
  • Always pick the last elements as the pivot.
  • Pick a random element as the pivot.
  • Pick the median element as the pivot.

Quick Sort Code

quick_sort.py
  1. # Quick Sort
  2. def printArray(arr):
  3. return (' '.join(str(i) for i in arr))
  4.  
  5. def quicksort(arr, i, j):
  6. if i < j:
  7. pos = partition(arr, i, j)
  8. quicksort(arr, i, pos - 1)
  9. quicksort(arr, pos + 1, j)
  10.  
  11. def partition(arr, i, j):
  12. #pivot = arr[j] # pivot on the last item
  13. pivot = arr[int(j/2)] # pivot on the median
  14. small = i - 1
  15. for k in range(i, j):
  16. if arr[k] <= pivot:
  17. small += 1
  18. swap(arr, k, small)
  19.  
  20. swap(arr, j, small + 1)
  21. print ("Pivot = " + str(arr[small + 1]), " Arr = " + printArray(arr))
  22. return small + 1
  23.  
  24. def swap(arr, i, j):
  25. arr[i], arr[j] = arr[j], arr[i]
  26.  
  27. if __name__ == '__main__':
  28. arr = [9, 4, 8, 3, 1, 2, 5]
  29. print (" Initial Array :",printArray(arr))
  30. quicksort(arr, 0, len(arr) - 1)
 Initial Array : 9 4 8 3 1 2 5
Pivot = 5  Arr = 4 3 1 2 5 9 8
Pivot = 2  Arr = 1 2 4 3 5 9 8
Pivot = 3  Arr = 1 2 3 4 5 9 8
Pivot = 8  Arr = 1 2 3 4 5 8 9

Non-comparison Sorts

Counting Sort

Counting Sort procedure

Counting Sort example

Counting Sort Code

counting_sort.py
  1. # Counting sort
  2. def printArray(arr):
  3. return(' '.join(str(i) for i in arr))
  4.  
  5.  
  6. def countingsort(arr):
  7. count = [0] * 11 # can store the count of positive numbers <= 10
  8. N = len(arr)
  9. for i in range(0, N):
  10. count[arr[i]] += 1
  11. for i in range(1, len(count)):
  12. count[i] += count[i - 1]
  13. print ("Counting Array :",
  14. printArray(count))
  15. output = [0] * N
  16. for i in range(len(arr)):
  17. output[count[arr[i]] - 1] = arr[i]
  18. count[arr[i]] -= 1
  19. print ("After Sorting :",
  20. printArray(output))
  21.  
  22. if __name__ == '__main__':
  23. arr = [10, 7, 3, 1, 9, 7, 4, 3]
  24. print ("Initial Array :",
  25. printArray(arr))
  26. countingsort(arr)
Initial Array   : 10 7 3 1 9 7 4 3
Counting Array  : 0 1 1 3 4 4 4 6 6 7 8
After Sorting   : 1 3 3 4 7 7 9 10

Bucket Sort

Bucket Sort procedure

Bucket Sort example

2)

Hybrid Sorting Algorithms

Introsort

Timsort

Criteria for choosing a sorting algorithm

Criteria Sorting algorithm
Small number of items to be sorted Insertion Sort
Items are mostly sorted already Insertion Sort
Concerned about worst-case scenarios Heap Sort
Interested in a good average-case behaviour Quicksort
Items are drawn from a uniform dense universe Bucket Sort
Desire to write as little code as possible Insertion Sort
Stable sorting required Merge Sort

Conclusion

1)
Pollice G., Selkow S. and Heineman G. (2016). Algorithms in a Nutshell, 2nd Edition. O' Reilly.