1. Topics for discussion
slightly.
2. Programming exercises
1. Conduct a random experiment to test the difference between the sequential search algorithm and the binary search algorithm when processing lists of integers.
The binary search algorithm can only search ordered lists, and the sequential search algorithm can search both unordered and ordered lists, so the comparison experiment is done twice.
import timeit import random import numpy as np import matplotlib.pyplot as plt import pandas as pd from chapter5.search import orderSequentialSearch, binarySearch lenx = [] osy = [] bsy = [] color = ['c', 'b', 'g', 'r', 'm', 'y', 'k', 'w'] for i in range(1000000, 100000000, 1000000): t1 = timeit.Timer("orderSequentialSearch(x,random.randrange(%d))" % i, "from __main__ import random, x, orderSequentialSearch") t2 = timeit.Timer("binarySearch(x,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch") x = list(range(i)) # np.random.shuffle(x) findtime1 = t1.timeit(number=1) # x = list(range(i)) findtime2 = t2.timeit(number=1) print("%d, %15.6f,%15.6f" % (i, findtime1, findtime2)) lenx.append(i) osy.append(findtime1) bsy.append(findtime2) ax = plt.gca() # remove border ax.spines['top'].set_color('none') ax.spines['right'].set_color('none') # Shift the position to set the origin to intersect ax.xaxis.set_ticks_position('bottom') ax.spines['bottom'].set_position(('data', 0)) ax.yaxis.set_ticks_position('left') ax.spines['left'].set_position(('data', 0)) # plt.ylim(0, 0.0001) plt.scatter(lenx, osy, c=color[3], edgecolors='r', label='orderSequentialSearch') plt.scatter(lenx, bsy, c=color[1], edgecolors='b', marker='^', label='binarySearch') plt.xlabel('lenth(list)') plt.ylabel('time(/s)') plt.title('Search(Order)_analysis') plt.legend() plt.show()
The result is shown in the figure:
Result analysis: For the sequential search algorithm, when the list to be searched is unordered, its performance is better than binary search.
Result analysis: For the sequential search algorithm, when the list to be searched is ordered, its performance is far lower than that of the binary search algorithm.
2. Randomly generate an ordered list of integers. Analyze the binary search function given in the paper (recursive and recursive versions) by benchmarking. Please explain the results you get.
For benchmarking functions with python, refer to the link: https://blog.csdn.net/qq_21201267/article/details/126241080 link , the timeit package introduced in the second chapter of the book is used here for benchmarking.
def binarySearch(alist, item): first = 0 last = len(alist) - 1 found = False while first <= last and not found: midpoint = (first + last) // 2 if alist[midpoint] == item: found = True else: if item < alist[midpoint]: last = midpoint - 1 else: first = midpoint + 1 return found def binarySearch2(alist, item): if len(alist) == 0: return False else: midpoint = len(alist) // 2 if alist[midpoint] == item: return True else: if item < alist[midpoint]: return binarySearch2(alist[:midpoint], item) else: return binarySearch2(alist[midpoint+1:], item)
import timeit import random import numpy as np import matplotlib.pyplot as plt import pandas as pd from chapter5.search import binarySearch2, binarySearch lenx = [] b1y = [] b2y = [] color = ['c', 'b', 'g', 'r', 'm', 'y', 'k', 'w'] for i in range(1000000, 100000000, 1000000): t1 = timeit.Timer("binarySearch(x,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch") t2 = timeit.Timer("binarySearch2(x,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch2") x = list(range(i)) findtime1 = t1.timeit(number=1) findtime2 = t2.timeit(number=1) print("%d, %15.6f,%15.6f" % (i, findtime1, findtime2)) lenx.append(i) b1y.append(findtime1) b2y.append(findtime2) ax = plt.gca() # remove border ax.spines['top'].set_color('none') ax.spines['right'].set_color('none') # Shift the position to set the origin to intersect ax.xaxis.set_ticks_position('bottom') ax.spines['bottom'].set_position(('data', 0)) ax.yaxis.set_ticks_position('left') ax.spines['left'].set_position(('data', 0)) plt.scatter(lenx, b1y, c=color[3], edgecolors='r', label='binarySearch1') plt.scatter(lenx, b2y, c=color[1], edgecolors='b', marker='^', label='binarySearch2') plt.xlabel('lenth(list)') plt.ylabel('time(/s)') plt.title('binarySearch_analysis') plt.legend() plt.show()
Result analysis: For binary search, as the length of the list grows, the performance of the circular version is higher than that of the recursive version.
This is because: For recursion, the "stack" is called when a function call is made, which stores all local variables and other content (tail recursion can solve this problem, but python3 does not support tail recursion), so the function in the loop Call operations are faster than recursive operations.
3. Implement the recursive version of the binary search algorithm without using the slice operator. Don't forget to pass in the subscripts of the head and tail elements. Randomly generate an ordered list of integers and benchmark.
import timeit import random import numpy as np import matplotlib.pyplot as plt import pandas as pd from chapter5.search import binarySearch2 def binarySearch3(alist, left, right, item): if len(alist) == 0: return False else: # midpoint = len(alist) // 2 midpoint = (left + right) // 2 if alist[midpoint] == item: return True else: if item < alist[midpoint]: return binarySearch3(alist, left, midpoint - 1, item) else: return binarySearch3(alist, midpoint + 1, right, item) lenx = [] b1y = [] b2y = [] color = ['c', 'b', 'g', 'r', 'm', 'y', 'k', 'w'] for i in range(1000000, 100000000, 1000000): t1 = timeit.Timer("binarySearch3(x, 0, len(x) - 1,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch3") t2 = timeit.Timer("binarySearch2(x,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch2") x = list(range(i)) findtime1 = t1.timeit(number=1) findtime2 = t2.timeit(number=1) print("%d, %15.6f,%15.6f" % (i, findtime1, findtime2)) lenx.append(i) b1y.append(findtime1) b2y.append(findtime2) ax = plt.gca() # remove border ax.spines['top'].set_color('none') ax.spines['right'].set_color('none') # Shift the position to set the origin to intersect ax.xaxis.set_ticks_position('bottom') ax.spines['bottom'].set_position(('data', 0)) ax.yaxis.set_ticks_position('left') ax.spines['left'].set_position(('data', 0)) plt.scatter(lenx, b1y, c=color[3], edgecolors='r', label='binarySearch3') plt.scatter(lenx, b2y, c=color[1], edgecolors='b', marker='^', label='binarySearch2') plt.xlabel('lenth(list)') plt.ylabel('time(/s)') plt.title('binarySearch_analysis') plt.legend() plt.show()
As shown in the figure, for the recursive version of the binary search algorithm, the performance of the slicing operation is much lower than that of the incoming head and tail subscript operations.
4. Implement the len method __len__ for the hash table.
&5. Implement the in method __contains__ for the hash table.
def __len__(self): return len(self.slots) def __contains__(self, item): return item in self.slots
6. When the chaining method is used to deal with conflicts, how to delete elements from the hash table? If the open addressing method is used, how to do it? Are there any special cases that must be handled? Please implement del method for HashTable class.
See book p141, that is, find first and then delete, pay attention to conflicts.
def __delitem__(self, key): startslot = self.hashfunction(key, len(self.slots)) stop = False found = False position = startslot while not found and not stop: if self.slots[position] == None: print('Error: None') break elif self.slots[position] == key: found = True self.slots[position] = None self.data[position] = None data = self.data[position] else: position = self.rehash(position, len(self.slots)) if position == startslot: stop = True
7. In this chapter, the size of the hash table is 11. If the table is full, it needs to grow. Please reimplement the put method so that the hash table can automatically adjust the size when the load factor reaches a preset value (you can decide the preset value yourself according to the impact of the load on performance).
def overload(self): slotsB = self.slots dataB = self.data self.slots = [None] * self.size self.data = [None] * self.size for i in range(len(self.slots)): if slotsB[i] != None: hashvalue = self.hashfunction(slotsB[i], len(self.slots)) if self.slots[hashvalue] == None: self.slots[hashvalue] = slotsB[i] self.data[hashvalue] = dataB[i] else: nextslots = self.rehash(hashvalue, len(self.slots)) while self.slots[nextslots] != None and self.slots[nextslots] != slotsB[i]: nextslots = self.rehash(nextslots, len(self.slots)) if self.slots[nextslots] == None: self.slots[nextslots] = slotsB[i] self.data[nextslots] = dataB[i] else: self.data[nextslots] = dataB[i] def put(self, key, data): k = 0.9 sum = 0 for i in range(self.size): if self.slots[i] != None: sum += 1 if sum / self.size >= k: self.size += 6 for j in range(6): self.slots.append(None) self.data.append(None) self.overload() hashvalue = self.hashfunction(key, len(self.slots)) if self.slots[hashvalue] == None: self.slots[hashvalue] = key self.data[hashvalue] = data else: if self.slots[hashvalue] == key: self.data[hashvalue] = data else: nextslots = self.rehash(hashvalue, len(self.slots)) while self.slots[nextslots] != None and self.slots[nextslots] != key: nextslots = self.rehash(nextslots, len(self.slots)) if self.slots[nextslots] == None: self.slots[nextslots] = key self.data[nextslots] = data else: self.data[nextslots] = data
8. Implement the rehashing technique of square detection.
def rehash_square(self, oldhash, size, m): return (oldhash + m**2) % size
Note that the rehash functions in overload(), put(), and get() are all changed to rehash_square.
9. Create a list of 500 integers using a random number generator. Analyze the sorting algorithms in this chapter by benchmarking them. What is the difference in their execution speed?
sort.py:
def bubbleSort(alist): for passnum in range(len(alist) - 1, 0, -1): for i in range(passnum): if alist[i] > alist[i+1]: temp = alist[i] alist[i] = alist[i+1] alist[i+1] = temp def shortBubbleSort(alist): exchange = True passnum = len(alist) - 1 while passnum > 0 and exchange: exchange = False for i in range(passnum): if alist[i] > alist[i + 1]: exchange = True temp = alist[i] alist[i] = alist[i + 1] alist[i + 1] = temp passnum -= 1 def selectionSort(alist): for fileslot in range(len(alist)-1, 0, -1): positionOfMax = 0 for location in range(1, fileslot + 1): if alist[location] > alist[positionOfMax]: positionOfMax = location temp = alist[fileslot] alist[fileslot] = alist[positionOfMax] alist[positionOfMax] = temp def insetionSort(alist): for index in range(1, len(alist)): currentvalue = alist[index] position = index while position > 0 and alist[position-1] > currentvalue: alist[position] = alist[position-1] position -= 1 alist[position] = currentvalue def shellSort(alist): sublistcount = len(alist) // 2 while sublistcount > 0: for startposition in range(sublistcount): gapInsertSort(alist, startposition, sublistcount) # print('After increments of size ', sublistcount, 'The list is: ', alist) sublistcount //= 2 def gapInsertSort(alist, start, gap): for i in range(start+gap, len(alist), gap): currentvalue = alist[i] position = i while position >= gap and alist[position-gap] > currentvalue: alist[position] = alist[position-gap] position -= gap alist[position] = currentvalue def mergeSort(alist): # print('Splitting ', alist) if len(alist) > 1: mid = len(alist) // 2 lefthalf = alist[:mid] righthalf = alist[mid:] mergeSort(lefthalf) mergeSort(righthalf) i, j, k = 0, 0, 0 while i < len(lefthalf) and j < len(righthalf): if lefthalf[i] < righthalf[j]: alist[k] = lefthalf[i] i += 1 else: alist[k] = righthalf[j] j += 1 k += 1 while i < len(lefthalf): alist[k] = lefthalf[i] i += 1 k += 1 while j < len(righthalf): alist[k] = righthalf[j] j += 1 k += 1 # print('Merging', alist) def quickSort(alist): quickSortHelper(alist, 0, len(alist) - 1) def quickSortHelper(alist, first, last): if first < last: splitpoint = partition(alist, first, last) quickSortHelper(alist, first, splitpoint - 1) quickSortHelper(alist, splitpoint + 1, last) def partition(alist, first, last): pivotvalue = alist[first] leftmark = first + 1 rightmark = last done = False while not done: while leftmark <= rightmark and alist[leftmark] <= pivotvalue: leftmark += 1 while leftmark <= rightmark and alist[rightmark] >= pivotvalue: rightmark -= 1 if rightmark < leftmark: done = True else: temp = alist[leftmark] alist[leftmark] = alist[rightmark] alist[rightmark] = temp temp = alist[first] alist[first] = alist[rightmark] alist[rightmark] = temp return rightmark if __name__ == '__main__': # alist = [54, 26, 93, 17, 77, 31, 44, 55, 20] # shellSort(alist) b = [54, 26, 93, 17, 77, 31, 44, 55, 20] quickSort(b) print(b)
import timeit import random from chapter5.sort import bubbleSort, shortBubbleSort, selectionSort, insetionSort, shellSort, mergeSort, quickSort x = [] for i in range(500): x.append(random.randint(1, 1000)) t1 = timeit.Timer("bubbleSort(x)", "from __main__ import x, bubbleSort") print('bubbleSort', t1.timeit(number=1000), 'milliseconds') t2 = timeit.Timer("shortBubbleSort(x)", "from __main__ import x, shortBubbleSort") print('shortBubbleSort', t2.timeit(number=1000), 'milliseconds') t3 = timeit.Timer("selectionSort(x)", "from __main__ import x, selectionSort") print('selectionSort', t3.timeit(number=1000), 'milliseconds') t4 = timeit.Timer("insetionSort(x)", "from __main__ import x, insetionSort") print('insetionSort', t4.timeit(number=1000), 'milliseconds') t5 = timeit.Timer("shellSort(x)", "from __main__ import x, shellSort") print('shellSort', t5.timeit(number=1000), 'milliseconds') t6 = timeit.Timer("mergeSort(x)", "from __main__ import x, mergeSort") print('mergeSort', t6.timeit(number=1000), 'milliseconds') t7 = timeit.Timer("quickSort(x)", "from __main__ import x, quickSort") print('quickSort', t7.timeit(number=1000), 'milliseconds')
The resu lt is shown in the figure. For a list with a length of 500, the sorting speed from slow to fast is: bubble < fast (recursive) < selection < merge < Hill sort < insert < short bubble
10. Use the simultaneous assignment feature to implement bubble sorting.
def bubbleSort(alist): for passnum in range(len(alist) - 1, 0, -1): for i in range(passnum): if alist[i] > alist[i+1]: alist[i], alist[i+1] = alist[i+1], alist[i] if __name__ == '__main__': b = [54, 26, 93, 17, 77, 31, 44, 55, 20] bubbleSort(b) print(b)
11. The bubble sort algorithm can be modified to "bubble" in both directions. The first pass traverses "up" the list, and the second pass traverses "down" the list. Continue this pattern until no more traversals are required. Implement this sorting algorithm, and describe when it is suitable.
def bubbleSort(alist, size): left = 0 right = size - 1 flag = True while flag: flag = False for i in range(left, right, 1): if alist[i] > alist[i+1]: alist[i], alist[i+1] = alist[i+1], alist[i] flag = True right -= 1 for j in range(right, left, -1): if alist[j] < alist[j-1]: alist[j], alist[j-1] = alist[j-1], alist[j] flag = True left += 1 if __name__ == '__main__': b = [54, 26, 93, 17, 77, 31, 44, 55, 20] bubbleSort(b, len(b)) print(b)
The two-way bubbling algorithm is more suitable than the one-way bubbling algorithm when the sequence is basically ordered, but there are small elements at the end. For example: add a smaller integer to the end of an increasing ordered sequence, sorting this new sequence is suitable for bidirectional bubbling (compared to ordinary bubbling).
12. Use the simultaneous assignment feature to implement selection sorting.
def selectionSort(alist): for fileslot in range(len(alist)-1, 0, -1): positionOfMax = 0 for location in range(1, fileslot + 1): if alist[location] > alist[positionOfMax]: positionOfMax = location alist[fileslot], alist[positionOfMax] = alist[positionOfMax], alist[fileslot] if __name__ == '__main__': b = [54, 26, 93, 17, 77, 31, 44, 55, 20] selectionSort(b) print(b)
13. Benchmark Hillsort using different increment sets for the same list.
import timeit import random import math def shellSort(alist): sublistcount = len(alist) // 2 while sublistcount > 0: for startposition in range(sublistcount): gapInsertSort(alist, startposition, sublistcount) # print('After increments of size ', sublistcount, 'The list is: ', alist) sublistcount //= 2 def shellSort2(alist): k = int(math.log(len(alist) + 1, 2)) sublistcount = 2 ** k - 1 while sublistcount > 0: k -= 1 for startposition in range(sublistcount): gapInsertSort(alist, startposition, sublistcount) # print('After increments of size ', sublistcount, 'The list is: ', alist) sublistcount = 2 ** k - 1 def gapInsertSort(alist, start, gap): for i in range(start+gap, len(alist), gap): currentvalue = alist[i] position = i while position >= gap and alist[position-gap] > currentvalue: alist[position] = alist[position-gap] position -= gap alist[position] = currentvalue if __name__ == '__main__': x = [] for i in range(500): x.append(random.randint(1, 1000)) t1 = timeit.Timer("shellSort(x)", "from __main__ import x, shellSort") print('n/2^k:', t1.timeit(number=1000), 'milliseconds') t2 = timeit.Timer("shellSort2(x)", "from __main__ import x, shellSort2") print('2^k - 1:', t2.timeit(number=1000), 'milliseconds') >>> ? n/2^k: 0.095182 milliseconds 2^k - 1: 0.09979709999999999 milliseconds
14. Implement the mergesort function without slicing operators.
def mergeSort(alist, left, right): if right > left : mid = (left + right) // 2 mergeSort(alist, left, mid) mergeSort(alist, mid + 1, right) merge(alist, left, mid, right) else: return def merge(alist, left, mid, right): temp = [] i, j = left, mid + 1 while i <= mid and j <= right: if alist[i] <= alist[j]: temp.append(alist[i]) i += 1 else: temp.append(alist[j]) j += 1 while i <= mid: temp.append(alist[i]) i += 1 while j <= right: temp.append(alist[j]) j += 1 alist[left:right+1] = temp if __name__ == '__main__': a = [2, 6, 3] length = len(a) mid = (0 + len(a) - 1) // 2 merge(a, 0, mid, len(a) - 1) print(a) b = [54, 26, 93, 17, 77, 31, 44, 55, 20] mergeSort(b, 0, len(b) - 1) print(b)
15. There is a way to improve quicksort, which is to use insertion sort when the length of the list is less than a certain value (this value is called "partition limit"). What's the point? Reimplement the quicksort algorithm and film it with a list of random integers. Performance analysis with different partition limits.
Quick sorting is not very good for small-scale data sets, but the quick sorting algorithm uses divide-and-conquer technology. Ultimately, large data sets must be divided into small data sets for processing. The improvement that can be obtained from this is that when the data set is small, it is not necessary to continue to recursively call the quick sort algorithm, but instead call other sorting algorithms that are more capable of processing small-scale data sets.
import timeit import random def quickSort2(alist, k): quickSortHelper2(alist, 0, len(alist) - 1, k) def quickSortHelper2(alist, first, last, k): if first < last: if last - first <= k: temp = insetionSort(alist[first:last+1]) alist[first:last+1] = temp else: splitpoint = partition(alist, first, last) quickSortHelper2(alist, first, splitpoint - 1, k) quickSortHelper2(alist, splitpoint + 1, last, k) def insetionSort(alist): for index in range(1, len(alist)): currentvalue = alist[index] position = index while position > 0 and alist[position-1] > currentvalue: alist[position] = alist[position-1] position -= 1 alist[position] = currentvalue return alist def partition(alist, first, last): pivotvalue = alist[first] leftmark = first + 1 rightmark = last done = False while not done: while leftmark <= rightmark and alist[leftmark] <= pivotvalue: leftmark += 1 while leftmark <= rightmark and alist[rightmark] >= pivotvalue: rightmark -= 1 if rightmark < leftmark: done = True else: temp = alist[leftmark] alist[leftmark] = alist[rightmark] alist[rightmark] = temp temp = alist[first] alist[first] = alist[rightmark] alist[rightmark] = temp return rightmark if __name__ == '__main__': x = [] for i in range(500): x.append(random.randint(1, 1000)) t1 = timeit.Timer("quickSort2(x, 5)", "from __main__ import x, quickSort2") print('k = 5:', t1.timeit(number=1000), 'milliseconds') t2 = timeit.Timer("quickSort2(x, 50)", "from __main__ import x, quickSort2") print('k = 50:', t2.timeit(number=1000), 'milliseconds') t3 = timeit.Timer("quickSort2(x, 100)", "from __main__ import x, quickSort2") print('k = 100:', t3.timeit(number=1000), 'milliseconds') t4 = timeit.Timer("quickSort2(x, 200)", "from __main__ import x, quickSort2") print('k = 200:', t4.timeit(number=1000), 'milliseconds')
The result is shown in the figure: for a sequence of length 500, as the value of k increases, the sorting speed is faster.
16. Modify the quickSort function to use the method of taking the middle of three numbers when selecting the reference value. The performance difference of the two techniques is compared through experiments.
import timeit import random from chapter5.sort import quickSort def quickSort2(alist): quickSortHelper2(alist, 0, len(alist) - 1) def quickSortHelper2(alist, first, last): if first < last: splitpoint = partition2(alist, first, last) quickSortHelper2(alist, first, splitpoint - 1) quickSortHelper2(alist, splitpoint + 1, last) def partition2(alist, first, last): mid = (first + last) // 2 firstvalue = alist[first] midvalue = alist[mid] lastvalue = alist[last] if midvalue > firstvalue and midvalue < lastvalue: alist[first], alist[mid] = alist[mid], alist[first] elif lastvalue > firstvalue and lastvalue < midvalue: alist[first], alist[last] = alist[last], alist[first] pivotvalue = alist[first] leftmark = first + 1 rightmark = last done = False while not done: while leftmark <= rightmark and alist[leftmark] <= pivotvalue: leftmark += 1 while leftmark <= rightmark and alist[rightmark] >= pivotvalue: rightmark -= 1 if rightmark < leftmark: done = True else: temp = alist[leftmark] alist[leftmark] = alist[rightmark] alist[rightmark] = temp temp = alist[first] alist[first] = alist[rightmark] alist[rightmark] = temp return rightmark if __name__ == '__main__': x = [] for i in range(1000): x.append(random.randint(1, 1000)) t1 = timeit.Timer("quickSort(x)", "from __main__ import x, quickSort") print('quickSort:', t1.timeit(number=1000), 'milliseconds') t2 = timeit.Timer("quickSort2(x)", "from __main__ import x, quickSort2") print('quickSort2:', t2.timeit(number=1000), 'milliseconds')
As shown in the figure, the three-number method is much faster than the unimproved quick sort.
3. Summary
Mainly learned binary search, hash table, various mainstream sorting algorithms and their improvements.