DATA ANALYTICS REFERENCE DOCUMENT 


Document Title:  46887  Computational Thinking with Algorithms 
Document No.:  1548328636 
Author(s):  Rita Raher, Gerhard van der Linde 
Contributor(s): 
REVISION HISTORY
Revision  Details of Modification(s)  Reason for modification  Date  By 

0  Draft release  46887 Module summary reference page  2019/01/24 11:17  Gerhard van der Linde 
On completion of this module the learner will/should be able to
Operator  Description  Examples 

+  Additive operator (also string concatenation)  2 + 1 = 3 “abc” + “_” + 123 = “abc_123” 
  Subtraction operator  25 – 12 = 13 
*  Multiplication operator  2 * 25 = 50 
/  Division operator  35 / 5 = 7 
%  Remainder operator  35 % 5 = 0, 36 % 5 = 1 
indicate that a quantity is to be multiplies by itself some number of times
A variable is simply a storage location and associated name which can use to store some information for later use
Type of variables
Operator  Description  Examples 

=  Assignment operator  int number = 23; string myWord = “apple”; 
++  Increment operator; increments a value by 1  int number = 23; number++; System.out.println(number); prints 24 
–  Decrement operator; decrements a value by 1  int number = 23; number–; System.out.println(number); prints 22 
+=  Assignment (shorthand for number = number + value)  int number = 23; number += 2; System.out.println(number); prints 25 
=  Assignment (shorthand for number = number  value)  int number = 23; number = 2; System.out.println(number); prints 21 
==  equality, equal  2==1 false 
!=  equality, not equal  2!=1 true 
&&  Logical AND  2==1 && 1==1 false 
  Logical OR  2==1  1==1 true 
!  inverts the value of the Boolean  !success 
>  Relational, greater than  
>=  Relational, greater than or equal to  
<  Relational, less than  
⇐  Relational, less than or equal to 
array
list
cars = [“Ford”, “Volvo”, “BMW”]
It is important to differentiate between:
Note that complexity affects performance but not the other way around
value of  constant  (linear)  (linearithmic)  (quadratic)  (cubic)  (exponential)  

8  1  3  8  24  64  512  256 
16  1  4  16  64  256  4096  65536 
32  1  5  32  160  1024  32768  4294967296 
64  1  6  64  384  4096  262144  1.84467E+19 
128  1  7  128  896  16384  2097152  3.40282E+38 
256  1  8  256  2048  65536  16777216  1.15792E+77 
512  1  9  512  4608  262144  134217728  1.3408E+154 
Suppose 𝑓(𝑥) and 𝑔(𝑥) are two functions defined on some subset of the set of real numbers
if and only if there exist constants 𝑁 and 𝐶 such that
Intuitively, this means that 𝑓 does not grow more quickly than 𝑔
(Note that N is size of the input set which is large enough for the higher order term to begin to dominate)
// passing an array of integers into a method in Java void myMethod(int[] elements) { // do something with the data here }
def my_function(elements): # do something with the data here test 1 = [5, 1, 12, 5, 16] my_function(test1)
Array with 5 elements:
boolean isFirstElementTwo(int[] elements) { if(elements[0] == 2) { return true; } else { return false; } }
# O(1) Example def is_first_el_two(elements): if elemenents[0] == 2: return True return False test1 = [2, 1, 0, 3] test2 = [0, 2, 3, 4] print(is_first_el_two(test1)) # prints True print(is_first_el_two(test1)) # prints False
boolean containsOne(int[] elements) { for (int i=0; i<elements.size(); i++){ if(elements[i] == 1) { return true; } } return false; }
# O(n) Example def contains_one(elements) for i in range(0, len(elements)): if elements[i] == 1: return True return False test1 = [0, 2, 1, 2] test2 = [0, 2, 3, 4] print(contains_one(test1)) # prints True print(contains_one(test1)) # prints False
boolean containsDuplicates(int[] elements) { for (int i=0; i<elements.length; i++){ for (int j=0; j<elements.length; j++){ if(i == j){ // avoid self comparison continue; } if(elements[i] == elements[j]) { return true; // duplicate found } } } return false; }
# O(n^2) Example def contains_duplicates(elements): for i in range (0, len(elements)): for j in range(0, len(elements)): if i == j: ##avoid self comparison continue if elements[i] == elements[j]: return True # duplicate found return False test1 = [0, 2, 1, 2] test2 = [1, 2, 3, 4] print(contains_duplicates(test1)) # prints True print(contains_duplicates(test2)) # prints False
The video and the code is a very condensed summary of the following section and not part of the syllabus, you can skip to Iteration and recursion
Socratica  Recursion, the Fibonacci Sequence and Memoization
from functools import lru_cache #@lru_cache(maxsize=1000) def fibonacci(n): if n == 1: return 1 elif n == 2: return 1 elif n > 2: return fibonacci(n1) + fibonacci(n2) for n in range(1, 1001): print(n, ":", fibonacci(n))
Run the following code with the highlighted line commented and not commented and observe the difference. The video link above the code explains the details.
void main(){ count(0); } void count(int index){ print(index); if(index <2){ count(index+1); } }
def count(index): print(index) if index < 2: count(index + 1) count(0) # outputs 0 1 2
void infinite(int x){ infinite(x1); }
def infinite(x): infinite(x1) infinite(1) # RecusrionError: # maximimum recursion depth exceeded
void circular(int x){ if(x==1){ circular(x+1); } circular(x1); }
def circular(x): if x ==1: circilar(x + 1) circular(x  1) circilar(1) # RecursioError: # maximum recusrion depth exceeded # in comparison
def factorial(n): answer = 1 while n > 1: answer *= n n =1 return answer print(factorial(5)) #prints 120
def factorial_rec(n): if n <1: return 1 else: return n*factorial_rec(n1) print(factorial_rec(5)) #print 120
def euclid(a, b): while b != 0: temp = b b = a% b a = temp return a print(euclid(35, 49)) # print 7
def euclid (a, b): if b == 0: return a else: return euclid(b, a%b) print(euclid(35, 49)) # prints 7
def fib(n): i, n1, n2 =1, 0, 1 while i <n: temp =n1 n1 = n2 n2 =n1 +temp i = 1+ i return n1 print(fib(5)) #prints 3
def fib(n): if n==1: return 0 elif n ==2: return 1 return fib(n1) +fib(n2) print(fib(5)) #prints 3
07_algorithms_cryptography_ssl_2019_part1.pdf
This is somewhat surprising…..
problems for which no good algorithms are known are crucial here.
A general encryption procedure(left ) and decryption procedure (right) is as follows:
and
Substitution Cipher
EG: Hello  > khoor
So from example above, this kind of cypher is easily broken…
Polyalphabetic substitution:
Polyalphabetic substitution continued
…and each letter of the phrase could take on 1 of 26 possible numeric values.
Ours are (t =20, h= 8 etc…)
As we had 10 characters in our phrase, and each character could be any 1 of 26 possible values, our potential key space is i.e there are 141,167,095,653,376 possible keys or “phrases”.
Prime number: A number is prime if it is greater than 1 and if its only factors are itself and 1.
(Foundation is a number theory in mathematics, based on Euler's generalisation of Fermat's Theorem).
Format
A block of plaintext(message), M, gets transformed into a cipher block C
Requirements:
e.g. PIN = 2345
“When sending us private data such as your PIN, encrypt it using RSA. This is the relevant public key information: e=7, n=33”
DATA ANALYTICS REFERENCE DOCUMENT 


Document Title:  Document Title 
Document No.:  1552735583 
Author(s):  Gerhard van der Linde, Rita Raher 
Contributor(s): 
REVISION HISTORY
Revision  Details of Modification(s)  Reason for modification  Date  By 

0  Draft release  Document description here  2019/03/16 11:26  Gerhard van der Linde, Rita Raher 
Reference: ^{1)}
Algorithm  Best case  Worst case  Average case  Space Complexity  Stable? 

Bubble Sort  1  Yes  
Selection Sort  1  No  
Insertion Sort  1  Yes  
Merge Sort  Yes  
Quicksort  (worst case)  No*  
Heapsort  1  No  
Counting Sort  Yes  
Bucket Sort  Yes  
Timsort  Yes  
Introsort  No 
*the standard Quicksort algorithm is unstable, although stable variations do exist
Criteria  Sorting algorithm 

Small number of items to be sorted  Insertion Sort 
Items are mostly sorted already  Insertion Sort 
Concerned about worstcase scenarios  Heap Sort 
Interested in a good averagecase behaviour  Quicksort 
Items are drawn from a uniform dense universe  Bucket Sort 
Desire to write as little code as possible  Insertion Sort 
Stable sorting required  Merge Sort 
Algorithm  Best case  Worst case  Average case  Space Complexity  Stable? 

Bubble Sort  1  Yes  
Selection Sort  1  No  
Insertion Sort  1  Yes  
Merge Sort  Yes  
Quicksort  (worst case)  No*  
Heapsort  1  No  
Counting Sort  Yes  
Bucket Sort  Yes  
Timsort  Yes  
Introsort  No 
*the standard Quicksort algorithm is unstable, although stable variations do exist
public static void bubblesort(int[]a){ int outer, inner; for(outer = a.length  1; outer > 0;outer){ //counting down for(inner=0;inner < outer;inner++){ //bubbling up if(a[inner] > a[inner+1]; { //if out of order.... int temp = a[inner]; //...then swap a[inner] =a[inner+1]; a[inner +1] = temp; } } } }
# Bubble Sort in python def printArray(arr): print (' '.join(str(i) for i in arr)) def bubblesort(arr): for i in range(len(arr)): for j in range(len(arr)  i  1): if arr[j] > arr[j + 1]: temp = arr[j] arr[j] = arr[j + 1] arr[j + 1] = temp # Print array after every pass print ("After pass " + str(i) + " :", printArray(arr)) if __name__ == '__main__': arr = [10, 7, 3, 1, 9, 7, 4, 3] print ("Initial Array :", printArray(arr)) bubblesort(arr)
for(outer =a.length1; outer >0; outer){ for(inner=0; inner < outer; inner++){ if(a[inner]>a[inner+1]){ //swap code omitted } } }
The element at index 4 is the smallest, so swap with index 0
The element at index 2 is the smallest, so swap with index 1
The element at index 3 is the smallest, so swap with index 2
The element at index 3 is the smallest, so swap with index 3
Selection sort might swap an array element with itself; this is harmless, and not worth checking for
public static void selectionsort(int[]a){ int outer=0, inner=0, min=0; for(outer = 0; outer <a.length1;outer++){ //outer counts up min = outer; for(inner = outer +1; inner <a.length; inner++){ if(a[inner]<a[min]){ //find index of smallest value min = inner; } } //swap a [min] with a [outer] int temp = a[outer]; a[outer] = a[min]; a[min] = temp; } }
# selection sort in python def printArray(arr): return (' '.join(str(i) for i in arr)) def selectionsort(arr): N = len(arr) for i in range(0, N): small = arr[i] pos = i for j in range(i + 1, N): if arr[j] < small: small = arr[j] pos = j temp = arr[pos] arr[pos] = arr[i] arr[i] = temp print ("After pass " + str(i) + " :", printArray(arr)) if __name__ == '__main__': arr = [10, 7, 3, 1, 9, 7, 4, 3] print ("Initial Array :", printArray(arr)) selectionsort(arr)
a[1]=5 is the key; 7>5 so move 7 right by one position, and insert 5 at index 0
a[2]=2 is the key; 7>2 so move both 7 and 5 right by one position, and insert 2 at index 0
a[3]=3 is the key; 7>3 and 5>3 so move both 7 and 5 right by one position, and insert 3 at index 1
a[4]=1 is the key; 7>1, 5>1, 3>1 and 2>1 so move both 7, 5, 3 and 2 right by one position, and insert 1 at index 1
(done)
public static void insertionsort(int a[]){ for(int i=1; i<a.length; i++){ int key =a[i]; //value to be inseted int j = i1; //move all elements > key right one position while(j>=0 && a[j]>key){ a[j+1]= a[j]; j=j1; } a[j+1]=key; //insert key in its new position } }
# insertion sort def printArray(arr): return(' '.join(str(i) for i in arr)) def insertionsort(arr): N = len(arr) for i in range(1, N): j = i  1 temp = arr[i] while j >= 0 and temp < arr[j]: arr[j + 1] = arr[j] j = 1 arr[j + 1] = temp print ("After pass " + str(i) + " :", printArray(arr)) if __name__ == '__main__': arr = [10, 7, 3, 1, 9, 7, 4, 3] print ("Initial Array :", printArray(arr)) insertionsort(arr)
Criteria  Sorting algorithm 

Small number of items to be sorted  Insertion Sort 
Items are mostly sorted already  Insertion Sort 
Concerned about worstcase scenarios  Heap Sort 
Interested in a good averagecase behaviour  Quicksort 
Items are drawn from a uniform dense universe  Bucket Sort 
Desire to write as little code as possible  Insertion Sort 
Stable sorting required  Merge Sort 
Algorithm  Best case  Worst case  Average case  Space Complexity  Stable? 

Bubble Sort  1  Yes  
Selection Sort  1  No  
Insertion Sort  1  Yes  
Merge Sort  Yes  
Quicksort  (worst case)  No*  
Heapsort  1  No  
Counting Sort  Yes  
Bucket Sort  Yes  
Timsort  Yes  
Introsort  No 
*the standard Quicksort algorithm is unstable, although stable variations do exist
# Merge sort def mergesort(arr, i, j): mid = 0 if i < j: mid = int((i + j) / 2) mergesort(arr, i, mid) mergesort(arr, mid + 1, j) merge(arr, i, mid, j) def merge(arr, i, mid, j): print ("Left: " + str(arr[i:mid + 1])) print ("Right: " + str(arr[mid + 1:j + 1])) N = len(arr) temp = [0] * N l = i r = j m = mid + 1 k = l while l <= mid and m <= r: if arr[l] <= arr[m]: temp[k] = arr[l] l += 1 else: temp[k] = arr[m] m += 1 k += 1 while l <= mid: temp[k] = arr[l] k += 1 l += 1 while m <= r: temp[k] = arr[m] k += 1 m += 1 for i1 in range(i, j + 1): arr[i1] = temp[i1] print ("After Merge: " + str(arr[i:j + 1])) if __name__ == '__main__': arr = [9, 4, 8, 3, 1, 2, 5] print ("Initial Array: " + str(arr)) mergesort(arr, 0, len(arr)  1)
Terminal Output:
Initial Array: [9, 4, 8, 3, 1, 2, 5] Left: [9] Right: [4] After Merge: [4, 9] Left: [8] Right: [3] After Merge: [3, 8] Left: [4, 9] Right: [3, 8] After Merge: [3, 4, 8, 9] Left: [1] Right: [2] After Merge: [1, 2] Left: [1, 2] Right: [5] After Merge: [1, 2, 5] Left: [3, 4, 8, 9] Right: [1, 2, 5] After Merge: [1, 2, 3, 4, 5, 8, 9]
The main steps in Quick sort are:
The base case for the recursion is a subarray of length 1 or 0; by definition these cases do not need to be sorted.
# Quick Sort def printArray(arr): return (' '.join(str(i) for i in arr)) def quicksort(arr, i, j): if i < j: pos = partition(arr, i, j) quicksort(arr, i, pos  1) quicksort(arr, pos + 1, j) def partition(arr, i, j): #pivot = arr[j] # pivot on the last item pivot = arr[int(j/2)] # pivot on the median small = i  1 for k in range(i, j): if arr[k] <= pivot: small += 1 swap(arr, k, small) swap(arr, j, small + 1) print ("Pivot = " + str(arr[small + 1]), " Arr = " + printArray(arr)) return small + 1 def swap(arr, i, j): arr[i], arr[j] = arr[j], arr[i] if __name__ == '__main__': arr = [9, 4, 8, 3, 1, 2, 5] print (" Initial Array :",printArray(arr)) quicksort(arr, 0, len(arr)  1)
Initial Array : 9 4 8 3 1 2 5 Pivot = 5 Arr = 4 3 1 2 5 9 8 Pivot = 2 Arr = 1 2 4 3 5 9 8 Pivot = 3 Arr = 1 2 3 4 5 9 8 Pivot = 8 Arr = 1 2 3 4 5 8 9
# Counting sort def printArray(arr): return(' '.join(str(i) for i in arr)) def countingsort(arr): count = [0] * 11 # can store the count of positive numbers <= 10 N = len(arr) for i in range(0, N): count[arr[i]] += 1 for i in range(1, len(count)): count[i] += count[i  1] print ("Counting Array :", printArray(count)) output = [0] * N for i in range(len(arr)): output[count[arr[i]]  1] = arr[i] count[arr[i]] = 1 print ("After Sorting :", printArray(output)) if __name__ == '__main__': arr = [10, 7, 3, 1, 9, 7, 4, 3] print ("Initial Array :", printArray(arr)) countingsort(arr)
Initial Array : 10 7 3 1 9 7 4 3 Counting Array : 0 1 1 3 4 4 4 6 6 7 8 After Sorting : 1 3 3 4 7 7 9 10
Criteria  Sorting algorithm 

Small number of items to be sorted  Insertion Sort 
Items are mostly sorted already  Insertion Sort 
Concerned about worstcase scenarios  Heap Sort 
Interested in a good averagecase behaviour  Quicksort 
Items are drawn from a uniform dense universe  Bucket Sort 
Desire to write as little code as possible  Insertion Sort 
Stable sorting required  Merge Sort 
Reference: Pollice G., Selkow S. and Heineman G. (2016). Algorithms in a Nutshell, 2nd Edition. O' Reilly.
As we saw with Sorting Algorithms, there is no universal best algorithm for every possible use case. The most appropriate search algorithm often depends on the data structure being searched, but also on any a priori knowledge about the data.
Note: Only linear and binary covered in this module.
Search the array a for the target value 7:
Search the array a for the target value 3:
# Searching an element in a list/array in python # can be simply done using \'in\' operator # Example: # if x in arr: # print arr.index(x) # If you want to implement Linear Search in python # Linearly search x in arr[] # If x is present then return its location # else return 1 def search(arr, x): for i in range(len(arr)): if arr[i] == x: return i return 1
Search the array a for the target value 7:
Search the array a for the target value 3:
# Iterative Binary Search Function # It returns location of x in given array arr if present, # else returns 1 def binarySearch(arr, l, r, x): while l <= r: mid = l + (r  l)/2; # Check if x is present at mid if arr[mid] == x: return mid # If x is greater, ignore left half elif arr[mid] < x: l = mid + 1 # If x is smaller, ignore right half else: r = mid  1 # If we reach here, then the element was not present return 1 # Test array arr = [ 2, 3, 4, 10, 40 ] x = 10 # Function call result = binarySearch(arr, 0, len(arr)1, x) if result != 1: print "Element is present at index %d" % result else: print "Element is not present in array"
# Python Program for recursive binary search. # Returns index of x in arr if present, else 1 def binarySearch (arr, l, r, x): # Check base case if r >= l: mid = l + (r  l)/2 # If element is present at the middle itself if arr[mid] == x: return mid # If element is smaller than mid, then it can only # be present in left subarray elif arr[mid] > x: return binarySearch(arr, l, mid1, x) # Else the element can only be present in right subarray else: return binarySearch(arr, mid+1, r, x) else: # Element is not present in the array return 1 # Test array arr = [ 2, 3, 4, 10, 40 ] x = 10 # Function call result = binarySearch(arr, 0, len(arr)1, x) if result != 1: print "Element is present at index %d" % result else: print "Element is not present in array"
import time #log the start time in seconds start_time = time.time() #call the function you want to be benchmarked #log the start time in seconds end_time = time.time() #calculate the elapsed time time_elapsed = end_time  start_time
import time num_runs = 10 # number of times to test the function results = [] # array to store the results for each test # benchmark the function for r in range(num_runs): # log the start time in seconds start_time = time.time() # call the function that you want to benchmark # log the end time in seconds end_time = time.time() # calculate the elapsed time time_elapsed = end_time  start_time results.append(time_elasped)
from numpy.random import randint from time import time def Bubble_Sort(array): # your sort code sorted=array return sorted def Merge_Sort(array): # your sort code sorted=array return sorted def Counting_Sort(array): # your sort code sorted=array return sorted def Quick_Sort(array): # your sort code sorted=array return sorted def Timsort(array): # your sort code sorted=array return sorted # The actual names of your sort functions in your code somewhere, notice no quotes sort_functions_list = [Bubble_Sort, Merge_Sort, Counting_Sort, Quick_Sort, Timsort] # The prescribed array sizes to use in the benchmarking testst for ArraySize in (100, 250, 500, 750, 1000, 1250, 2500, 3750, 5000, 6250, 7500, 8750, 10000): # create a randowm array of values to use in the tests below testArray = randint(1,ArraySize*2,ArraySize) # iterate throug the list of sort functions to call in the test for sort_type in sort_functions_list: # and test every function ten times for i in range(10): start=time() sort_type(testArray) stop=time() print('{} {} {} {}'.format(sort_type, ArraySize, i, (stopstart)*1000))
Python 3 time documentation: https://docs.python.org/3/library/time.html
Python Date and Time Overview https://www.tutorialspoint.com/python/python_date_time.htm
Discussions of benchmarking issues in Python (advanced material) https://www.peterbe.com/plog/howtodoperformancemicrobenchmarksinpython