slightly.

# 2. Programming exercises

## 1. Conduct a random experiment to test the difference between the sequential search algorithm and the binary search algorithm when processing lists of integers.

The binary search algorithm can only search ordered lists, and the sequential search algorithm can search both unordered and ordered lists, so the comparison experiment is done twice.

```import timeit
import random
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from chapter5.search import orderSequentialSearch, binarySearch

lenx = []
osy = []
bsy = []
color = ['c', 'b', 'g', 'r', 'm', 'y', 'k', 'w']

for i in range(1000000, 100000000, 1000000):
t1 = timeit.Timer("orderSequentialSearch(x,random.randrange(%d))" % i, "from __main__ import random, x, orderSequentialSearch")
t2 = timeit.Timer("binarySearch(x,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch")
x = list(range(i))
# np.random.shuffle(x)
findtime1 = t1.timeit(number=1)
# x = list(range(i))
findtime2 = t2.timeit(number=1)
print("%d, %15.6f，%15.6f" % (i, findtime1, findtime2))
lenx.append(i)
osy.append(findtime1)
bsy.append(findtime2)
ax = plt.gca()
# remove border
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
# Shift the position to set the origin to intersect
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
# plt.ylim(0, 0.0001)
plt.scatter(lenx, osy, c=color, edgecolors='r', label='orderSequentialSearch')
plt.scatter(lenx, bsy, c=color, edgecolors='b', marker='^', label='binarySearch')
plt.xlabel('lenth(list)')
plt.ylabel('time(/s)')
plt.title('Search(Order)_analysis')
plt.legend()
plt.show()

```

The result is shown in the figure: Result analysis: For the sequential search algorithm, when the list to be searched is unordered, its performance is better than binary search. Result analysis: For the sequential search algorithm, when the list to be searched is ordered, its performance is far lower than that of the binary search algorithm.

## 2. Randomly generate an ordered list of integers. Analyze the binary search function given in the paper (recursive and recursive versions) by benchmarking. Please explain the results you get.

For benchmarking functions with python, refer to the link: https://blog.csdn.net/qq_21201267/article/details/126241080 link , the timeit package introduced in the second chapter of the book is used here for benchmarking.

```def binarySearch(alist, item):
first = 0
last = len(alist) - 1
found = False
midpoint = (first + last) // 2
if alist[midpoint] == item:
found = True
else:
if item < alist[midpoint]:
last = midpoint - 1
else:
first = midpoint + 1
return found

def binarySearch2(alist, item):
if len(alist) == 0:
return False
else:
midpoint = len(alist) // 2
if alist[midpoint] == item:
return True
else:
if item < alist[midpoint]:
return binarySearch2(alist[:midpoint], item)
else:
return binarySearch2(alist[midpoint+1:], item)
```
```import timeit
import random
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from chapter5.search import binarySearch2, binarySearch

lenx = []
b1y = []
b2y = []
color = ['c', 'b', 'g', 'r', 'm', 'y', 'k', 'w']

for i in range(1000000, 100000000, 1000000):
t1 = timeit.Timer("binarySearch(x,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch")
t2 = timeit.Timer("binarySearch2(x,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch2")
x = list(range(i))
findtime1 = t1.timeit(number=1)
findtime2 = t2.timeit(number=1)
print("%d, %15.6f，%15.6f" % (i, findtime1, findtime2))
lenx.append(i)
b1y.append(findtime1)
b2y.append(findtime2)
ax = plt.gca()
# remove border
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
# Shift the position to set the origin to intersect
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
plt.scatter(lenx, b1y, c=color, edgecolors='r', label='binarySearch1')
plt.scatter(lenx, b2y, c=color, edgecolors='b', marker='^', label='binarySearch2')
plt.xlabel('lenth(list)')
plt.ylabel('time(/s)')
plt.title('binarySearch_analysis')
plt.legend()
plt.show()

``` Result analysis: For binary search, as the length of the list grows, the performance of the circular version is higher than that of the recursive version.
This is because: For recursion, the "stack" is called when a function call is made, which stores all local variables and other content (tail recursion can solve this problem, but python3 does not support tail recursion), so the function in the loop Call operations are faster than recursive operations.

## 3. Implement the recursive version of the binary search algorithm without using the slice operator. Don't forget to pass in the subscripts of the head and tail elements. Randomly generate an ordered list of integers and benchmark.

```import timeit
import random
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from chapter5.search import binarySearch2

def binarySearch3(alist, left, right, item):
if len(alist) == 0:
return False
else:
# midpoint = len(alist) // 2
midpoint = (left + right) // 2
if alist[midpoint] == item:
return True
else:
if item < alist[midpoint]:
return binarySearch3(alist, left, midpoint - 1, item)
else:
return binarySearch3(alist, midpoint + 1, right, item)

lenx = []
b1y = []
b2y = []
color = ['c', 'b', 'g', 'r', 'm', 'y', 'k', 'w']

for i in range(1000000, 100000000, 1000000):
t1 = timeit.Timer("binarySearch3(x, 0, len(x) - 1,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch3")
t2 = timeit.Timer("binarySearch2(x,random.randrange(%d))" % i, "from __main__ import random, x, binarySearch2")
x = list(range(i))
findtime1 = t1.timeit(number=1)
findtime2 = t2.timeit(number=1)
print("%d, %15.6f，%15.6f" % (i, findtime1, findtime2))
lenx.append(i)
b1y.append(findtime1)
b2y.append(findtime2)
ax = plt.gca()
# remove border
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
# Shift the position to set the origin to intersect
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
plt.scatter(lenx, b1y, c=color, edgecolors='r', label='binarySearch3')
plt.scatter(lenx, b2y, c=color, edgecolors='b', marker='^', label='binarySearch2')
plt.xlabel('lenth(list)')
plt.ylabel('time(/s)')
plt.title('binarySearch_analysis')
plt.legend()
plt.show()
``` As shown in the figure, for the recursive version of the binary search algorithm, the performance of the slicing operation is much lower than that of the incoming head and tail subscript operations.

&

## 5. Implement the in method __contains__ for the hash table.

```    def __len__(self):
return len(self.slots)

def __contains__(self, item):
return item in self.slots
```

## 6. When the chaining method is used to deal with conflicts, how to delete elements from the hash table? If the open addressing method is used, how to do it? Are there any special cases that must be handled? Please implement del method for HashTable class.

See book p141, that is, find first and then delete, pay attention to conflicts.

```    def __delitem__(self, key):
startslot = self.hashfunction(key, len(self.slots))
stop = False
found = False
position = startslot
if self.slots[position] == None:
print('Error: None')
break
elif self.slots[position] == key:
found = True
self.slots[position] = None
self.data[position] = None
data = self.data[position]
else:
position = self.rehash(position, len(self.slots))
if position == startslot:
stop = True
```

## 7. In this chapter, the size of the hash table is 11. If the table is full, it needs to grow. Please reimplement the put method so that the hash table can automatically adjust the size when the load factor reaches a preset value (you can decide the preset value yourself according to the impact of the load on performance).

```    def overload(self):
slotsB = self.slots
dataB = self.data
self.slots = [None] * self.size
self.data = [None] * self.size
for i in range(len(self.slots)):
if slotsB[i] != None:
hashvalue = self.hashfunction(slotsB[i], len(self.slots))
if self.slots[hashvalue] == None:
self.slots[hashvalue] = slotsB[i]
self.data[hashvalue] = dataB[i]
else:
nextslots = self.rehash(hashvalue, len(self.slots))
while self.slots[nextslots] != None and self.slots[nextslots] != slotsB[i]:
nextslots = self.rehash(nextslots, len(self.slots))
if self.slots[nextslots] == None:
self.slots[nextslots] = slotsB[i]
self.data[nextslots] = dataB[i]
else:
self.data[nextslots] = dataB[i]

def put(self, key, data):
k = 0.9
sum = 0
for i in range(self.size):
if self.slots[i] != None:
sum += 1
if sum / self.size >= k:
self.size += 6
for j in range(6):
self.slots.append(None)
self.data.append(None)
hashvalue = self.hashfunction(key, len(self.slots))
if self.slots[hashvalue] == None:
self.slots[hashvalue] = key
self.data[hashvalue] = data
else:
if self.slots[hashvalue] == key:
self.data[hashvalue] = data
else:
nextslots = self.rehash(hashvalue, len(self.slots))
while self.slots[nextslots] != None and self.slots[nextslots] != key:
nextslots = self.rehash(nextslots, len(self.slots))
if self.slots[nextslots] == None:
self.slots[nextslots] = key
self.data[nextslots] = data
else:
self.data[nextslots] = data
```

## 8. Implement the rehashing technique of square detection.

```    def rehash_square(self, oldhash, size, m):
return (oldhash + m**2) % size
```

Note that the rehash functions in overload(), put(), and get() are all changed to rehash_square.

## 9. Create a list of 500 integers using a random number generator. Analyze the sorting algorithms in this chapter by benchmarking them. What is the difference in their execution speed?

sort.py:

```def bubbleSort(alist):
for passnum in range(len(alist) - 1, 0, -1):
for i in range(passnum):
if alist[i] > alist[i+1]:
temp = alist[i]
alist[i] = alist[i+1]
alist[i+1] = temp

def shortBubbleSort(alist):
exchange = True
passnum = len(alist) - 1
while passnum > 0 and exchange:
exchange = False
for i in range(passnum):
if alist[i] > alist[i + 1]:
exchange = True
temp = alist[i]
alist[i] = alist[i + 1]
alist[i + 1] = temp
passnum -= 1

def selectionSort(alist):
for fileslot in range(len(alist)-1, 0, -1):
positionOfMax = 0
for location in range(1, fileslot + 1):
if alist[location] > alist[positionOfMax]:
positionOfMax = location
temp = alist[fileslot]
alist[fileslot] = alist[positionOfMax]
alist[positionOfMax] = temp

def insetionSort(alist):
for index in range(1, len(alist)):
currentvalue = alist[index]
position = index
while position > 0 and alist[position-1] > currentvalue:
alist[position] = alist[position-1]
position -= 1
alist[position] = currentvalue

def shellSort(alist):
sublistcount = len(alist) // 2
while sublistcount > 0:
for startposition in range(sublistcount):
gapInsertSort(alist, startposition, sublistcount)
# print('After increments of size ', sublistcount, 'The list is: ', alist)
sublistcount //= 2

def gapInsertSort(alist, start, gap):
for i in range(start+gap, len(alist), gap):
currentvalue = alist[i]
position = i
while position >= gap and alist[position-gap] > currentvalue:
alist[position] = alist[position-gap]
position -= gap
alist[position] = currentvalue

def mergeSort(alist):
# print('Splitting ', alist)
if len(alist) > 1:
mid = len(alist) // 2
lefthalf = alist[:mid]
righthalf = alist[mid:]
mergeSort(lefthalf)
mergeSort(righthalf)
i, j, k = 0, 0, 0
while i < len(lefthalf) and j < len(righthalf):
if lefthalf[i] < righthalf[j]:
alist[k] = lefthalf[i]
i += 1
else:
alist[k] = righthalf[j]
j += 1
k += 1
while i < len(lefthalf):
alist[k] = lefthalf[i]
i += 1
k += 1
while j < len(righthalf):
alist[k] = righthalf[j]
j += 1
k += 1
# print('Merging', alist)

def quickSort(alist):
quickSortHelper(alist, 0, len(alist) - 1)

def quickSortHelper(alist, first, last):
if first < last:
splitpoint = partition(alist, first, last)
quickSortHelper(alist, first, splitpoint - 1)
quickSortHelper(alist, splitpoint + 1, last)

def partition(alist, first, last):
pivotvalue = alist[first]
leftmark = first + 1
rightmark = last
done = False
while not done:
while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
leftmark += 1
while leftmark <= rightmark and alist[rightmark] >= pivotvalue:
rightmark -= 1
if rightmark < leftmark:
done = True
else:
temp = alist[leftmark]
alist[leftmark] = alist[rightmark]
alist[rightmark] = temp
temp = alist[first]
alist[first] = alist[rightmark]
alist[rightmark] = temp

return rightmark

if __name__ == '__main__':
# alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
# shellSort(alist)
b = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quickSort(b)
print(b)

```
```import timeit
import random
from chapter5.sort import bubbleSort, shortBubbleSort, selectionSort, insetionSort, shellSort, mergeSort, quickSort

x = []
for i in range(500):
x.append(random.randint(1, 1000))

t1 = timeit.Timer("bubbleSort(x)", "from __main__ import x, bubbleSort")
print('bubbleSort', t1.timeit(number=1000), 'milliseconds')

t2 = timeit.Timer("shortBubbleSort(x)", "from __main__ import x, shortBubbleSort")
print('shortBubbleSort', t2.timeit(number=1000), 'milliseconds')

t3 = timeit.Timer("selectionSort(x)", "from __main__ import x, selectionSort")
print('selectionSort', t3.timeit(number=1000), 'milliseconds')

t4 = timeit.Timer("insetionSort(x)", "from __main__ import x, insetionSort")
print('insetionSort', t4.timeit(number=1000), 'milliseconds')

t5 = timeit.Timer("shellSort(x)", "from __main__ import x, shellSort")
print('shellSort', t5.timeit(number=1000), 'milliseconds')

t6 = timeit.Timer("mergeSort(x)", "from __main__ import x, mergeSort")
print('mergeSort', t6.timeit(number=1000), 'milliseconds')

t7 = timeit.Timer("quickSort(x)", "from __main__ import x, quickSort")
print('quickSort', t7.timeit(number=1000), 'milliseconds')
``` The resu lt is shown in the figure. For a list with a length of 500, the sorting speed from slow to fast is: bubble < fast (recursive) < selection < merge < Hill sort < insert < short bubble

## 10. Use the simultaneous assignment feature to implement bubble sorting.

```def bubbleSort(alist):
for passnum in range(len(alist) - 1, 0, -1):
for i in range(passnum):
if alist[i] > alist[i+1]:
alist[i], alist[i+1] = alist[i+1], alist[i]

if __name__ == '__main__':
b = [54, 26, 93, 17, 77, 31, 44, 55, 20]
bubbleSort(b)
print(b)
```

## 11. The bubble sort algorithm can be modified to "bubble" in both directions. The first pass traverses "up" the list, and the second pass traverses "down" the list. Continue this pattern until no more traversals are required. Implement this sorting algorithm, and describe when it is suitable.

```def bubbleSort(alist, size):
left = 0
right = size - 1
flag = True
while flag:
flag = False
for i in range(left, right, 1):
if alist[i] > alist[i+1]:
alist[i], alist[i+1] = alist[i+1], alist[i]
flag = True
right -= 1
for j in range(right, left, -1):
if alist[j] < alist[j-1]:
alist[j], alist[j-1] = alist[j-1], alist[j]
flag = True
left += 1

if __name__ == '__main__':
b = [54, 26, 93, 17, 77, 31, 44, 55, 20]
bubbleSort(b, len(b))
print(b)
```

The two-way bubbling algorithm is more suitable than the one-way bubbling algorithm when the sequence is basically ordered, but there are small elements at the end. For example: add a smaller integer to the end of an increasing ordered sequence, sorting this new sequence is suitable for bidirectional bubbling (compared to ordinary bubbling).

## 12. Use the simultaneous assignment feature to implement selection sorting.

```def selectionSort(alist):
for fileslot in range(len(alist)-1, 0, -1):
positionOfMax = 0
for location in range(1, fileslot + 1):
if alist[location] > alist[positionOfMax]:
positionOfMax = location
alist[fileslot], alist[positionOfMax] = alist[positionOfMax], alist[fileslot]

if __name__ == '__main__':
b = [54, 26, 93, 17, 77, 31, 44, 55, 20]
selectionSort(b)
print(b)
```

## 13. Benchmark Hillsort using different increment sets for the same list.

```import timeit
import random
import math

def shellSort(alist):
sublistcount = len(alist) // 2
while sublistcount > 0:
for startposition in range(sublistcount):
gapInsertSort(alist, startposition, sublistcount)
# print('After increments of size ', sublistcount, 'The list is: ', alist)
sublistcount //= 2

def shellSort2(alist):
k = int(math.log(len(alist) + 1, 2))
sublistcount = 2 ** k - 1
while sublistcount > 0:
k -= 1
for startposition in range(sublistcount):
gapInsertSort(alist, startposition, sublistcount)
# print('After increments of size ', sublistcount, 'The list is: ', alist)
sublistcount = 2 ** k - 1

def gapInsertSort(alist, start, gap):
for i in range(start+gap, len(alist), gap):
currentvalue = alist[i]
position = i
while position >= gap and alist[position-gap] > currentvalue:
alist[position] = alist[position-gap]
position -= gap
alist[position] = currentvalue

if __name__ == '__main__':
x = []
for i in range(500):
x.append(random.randint(1, 1000))

t1 = timeit.Timer("shellSort(x)", "from __main__ import x, shellSort")
print('n/2^k:', t1.timeit(number=1000), 'milliseconds')

t2 = timeit.Timer("shellSort2(x)", "from __main__ import x, shellSort2")
print('2^k - 1:', t2.timeit(number=1000), 'milliseconds')

>>>  ?
n/2^k: 0.095182 milliseconds
2^k - 1: 0.09979709999999999 milliseconds
```

## 14. Implement the mergesort function without slicing operators.

```def mergeSort(alist, left, right):
if right > left :
mid = (left + right) // 2
mergeSort(alist, left, mid)
mergeSort(alist, mid + 1, right)
merge(alist, left, mid, right)
else:
return

def merge(alist, left, mid, right):
temp = []
i, j = left, mid + 1
while i <= mid and j <= right:
if alist[i] <= alist[j]:
temp.append(alist[i])
i += 1
else:
temp.append(alist[j])
j += 1
while i <= mid:
temp.append(alist[i])
i += 1
while j <= right:
temp.append(alist[j])
j += 1
alist[left:right+1] = temp

if __name__ == '__main__':
a = [2, 6, 3]
length = len(a)
mid = (0 + len(a) - 1) // 2
merge(a, 0, mid, len(a) - 1)
print(a)

b = [54, 26, 93, 17, 77, 31, 44, 55, 20]
mergeSort(b, 0, len(b) - 1)
print(b)

```

## 15. There is a way to improve quicksort, which is to use insertion sort when the length of the list is less than a certain value (this value is called "partition limit"). What's the point? Reimplement the quicksort algorithm and film it with a list of random integers. Performance analysis with different partition limits.

Quick sorting is not very good for small-scale data sets, but the quick sorting algorithm uses divide-and-conquer technology. Ultimately, large data sets must be divided into small data sets for processing. The improvement that can be obtained from this is that when the data set is small, it is not necessary to continue to recursively call the quick sort algorithm, but instead call other sorting algorithms that are more capable of processing small-scale data sets.

```import timeit
import random

def quickSort2(alist, k):
quickSortHelper2(alist, 0, len(alist) - 1, k)

def quickSortHelper2(alist, first, last, k):
if first < last:
if last - first <= k:
temp = insetionSort(alist[first:last+1])
alist[first:last+1] = temp
else:
splitpoint = partition(alist, first, last)
quickSortHelper2(alist, first, splitpoint - 1, k)
quickSortHelper2(alist, splitpoint + 1, last, k)

def insetionSort(alist):
for index in range(1, len(alist)):
currentvalue = alist[index]
position = index
while position > 0 and alist[position-1] > currentvalue:
alist[position] = alist[position-1]
position -= 1
alist[position] = currentvalue
return alist

def partition(alist, first, last):
pivotvalue = alist[first]
leftmark = first + 1
rightmark = last
done = False
while not done:
while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
leftmark += 1
while leftmark <= rightmark and alist[rightmark] >= pivotvalue:
rightmark -= 1
if rightmark < leftmark:
done = True
else:
temp = alist[leftmark]
alist[leftmark] = alist[rightmark]
alist[rightmark] = temp
temp = alist[first]
alist[first] = alist[rightmark]
alist[rightmark] = temp

return rightmark

if __name__ == '__main__':
x = []
for i in range(500):
x.append(random.randint(1, 1000))

t1 = timeit.Timer("quickSort2(x, 5)", "from __main__ import x, quickSort2")
print('k = 5:', t1.timeit(number=1000), 'milliseconds')

t2 = timeit.Timer("quickSort2(x, 50)", "from __main__ import x, quickSort2")
print('k = 50:', t2.timeit(number=1000), 'milliseconds')

t3 = timeit.Timer("quickSort2(x, 100)", "from __main__ import x, quickSort2")
print('k = 100:', t3.timeit(number=1000), 'milliseconds')

t4 = timeit.Timer("quickSort2(x, 200)", "from __main__ import x, quickSort2")
print('k = 200:', t4.timeit(number=1000), 'milliseconds')
``` The result is shown in the figure: for a sequence of length 500, as the value of k increases, the sorting speed is faster.

## 16. Modify the quickSort function to use the method of taking the middle of three numbers when selecting the reference value. The performance difference of the two techniques is compared through experiments.

```import timeit
import random
from chapter5.sort import quickSort

def quickSort2(alist):
quickSortHelper2(alist, 0, len(alist) - 1)

def quickSortHelper2(alist, first, last):
if first < last:
splitpoint = partition2(alist, first, last)
quickSortHelper2(alist, first, splitpoint - 1)
quickSortHelper2(alist, splitpoint + 1, last)

def partition2(alist, first, last):
mid = (first + last) // 2
firstvalue = alist[first]
midvalue = alist[mid]
lastvalue = alist[last]
if midvalue > firstvalue and midvalue < lastvalue:
alist[first], alist[mid] = alist[mid], alist[first]
elif lastvalue > firstvalue and lastvalue < midvalue:
alist[first], alist[last] = alist[last], alist[first]
pivotvalue = alist[first]
leftmark = first + 1
rightmark = last
done = False
while not done:
while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
leftmark += 1
while leftmark <= rightmark and alist[rightmark] >= pivotvalue:
rightmark -= 1
if rightmark < leftmark:
done = True
else:
temp = alist[leftmark]
alist[leftmark] = alist[rightmark]
alist[rightmark] = temp
temp = alist[first]
alist[first] = alist[rightmark]
alist[rightmark] = temp
return rightmark

if __name__ == '__main__':
x = []
for i in range(1000):
x.append(random.randint(1, 1000))

t1 = timeit.Timer("quickSort(x)", "from __main__ import x, quickSort")
print('quickSort:', t1.timeit(number=1000), 'milliseconds')

t2 = timeit.Timer("quickSort2(x)", "from __main__ import x, quickSort2")
print('quickSort2:', t2.timeit(number=1000), 'milliseconds')

``` As shown in the figure, the three-number method is much faster than the unimproved quick sort.

# 3. Summary

Mainly learned binary search, hash table, various mainstream sorting algorithms and their improvements.

Posted by trawets on Fri, 20 Jan 2023 16:01:06 +0300