# Data structure - sorting


# 6.1 basic concepts

Sorting orders records by their keywords (keys)

## 6.1.2 classification

### Internal sort

All data elements stay in memory during sorting

### External sort

During sorting, data constantly moves between memory and external storage

## 6.1.3 stability

After sorting, elements with equal keywords keep their original relative order

## 6.1.4 two operations

### Compare

Compare two keywords to determine their order

### Move

Move elements so that the sequence becomes ordered

## 6.1.5 one pass of sorting

A single pass processes every not-yet-processed element once

# 6.2 internal sorting

## 6.2.1 summary

### stability

Stable sorting algorithms:

• Direct insertion sort
• Binary insertion sort
• Bubble sort
• Merge sort
• Radix sort

Unstable sorting algorithms:

• Shell sort
• Quicksort
• Simple selection sort
• Heap sort

### Applicable to linked storage

• Direct insertion sort
• Bubble sort
• Simple selection sort
• Merge sort
• Radix sort

Quicksort is not suitable for linked storage, since partitioning needs random access from both ends

### Time complexity

#### The time complexity is independent of the initial state

Mnemonic (from the Chinese "一堆乌龟选朋友", "a heap of turtles choosing friends"): heap, merge (龟/归), selection, radix

• Simple selection sort $O(n^2)$
• Heap sort $O(n\log n)$
• Merge sort $O(n\log n)$
• Radix sort $O(d(n+r))$

#### Related to the initial state

##### Insertion sort
• Best case: already ordered

Direct insertion $O(n)$

Binary insertion $O(n\log n)$ comparisons

• Worst case: reverse order, $O(n^2)$

##### Swap sort
• Bubble sort

Best case (ordered): $O(n)$

Worst case (reverse order): $O(n^2)$

• Quicksort

Best case (balanced pivot splits): $O(n\log n)$
Worst case (already nearly ordered, or reverse order): $O(n^2)$

#### $O(n\log n)$ on average

• Quicksort
• Heap sort
• Merge sort

### Space complexity

#### O(1)

• Direct insertion
• Binary insertion
• Shell sort
• Bubble sort
• Simple selection
• Heap sort

#### O(logn)

• Quicksort in the best case (recursion stack)

#### O(n)

• Quicksort in the worst case (recursion stack)
• Merge sort
• Radix sort: $O(r)$ for the bucket queues

### Applicable scenarios

#### Take the smallest (or largest) k elements

• k passes of bubble sort
• k passes of simple selection sort
• k passes of heap sort

### Forming an ordered subsequence during sorting

• Direct insertion
• Bubble sort
• Quicksort
• Simple selection
• Heap sort

## 6.2.2 the five categories of sorting

### Insertion sort

#### Idea

Each pass inserts one element into an ordered subsequence; before the final pass, elements may still not be in their final positions

#### Direct insertion sort

##### Implementation

```c
// Sort A[1..n] in ascending order; A[0] is the sentinel
void Direct_InsertSort(ElemType A[], int n){
    int i, j;
    for(i = 2; i <= n; ++i){
        if(A[i] < A[i-1]){                  // A[i] must move into the ordered prefix
            A[0] = A[i];                    // the sentinel records the element to insert,
                                            // since shifting will overwrite A[i]
            for(j = i-1; A[0] < A[j]; --j)  // scan backwards for the insertion position
                A[j+1] = A[j];              // shift larger elements one slot right
            A[j+1] = A[0];
        }
    }
}
```


#### Binary insertion sort

Same as direct insertion, except the insertion position is found by binary search instead of a backward linear scan

##### Implementation

```c
void Binary_InsertSort(ElemType A[], int n){
    int i, j, low, high, mid;

    for(i = 2; i <= n; ++i){       // insert A[2]..A[n] into the ordered subsequence
        A[0] = A[i];               // the sentinel records the element to insert

        low = 1; high = i - 1;
        while(low <= high){        // binary-search the insertion position
            mid = (low + high) / 2;
            if(A[mid] > A[0])
                high = mid - 1;
            else
                low = mid + 1;     // on equal keys go right, keeping the sort stable
        }
        // high + 1 is the insertion position
        for(j = i - 1; j >= high + 1; --j)
            A[j+1] = A[j];         // shift elements right to make room
        A[high + 1] = A[0];
    }
}
```


#### Shell sort

Motivation: since direct insertion sort works well on roughly ordered sequences, the data can be preprocessed before a final direct insertion pass

##### Idea
• Take one element every gap positions to form a subsequence, and insertion-sort each subsequence
• The gap (step size) is the interval between two picks; it shrinks each round until it reaches 1

### Selection sort

Idea: each pass selects the extreme (minimum or maximum) element and appends it to the end of the ordered subsequence

#### Simple selection sort

##### Implementation

```c
void selectSort(ElemType A[], int n){
    int i, j, minIndex;
    for(i = 0; i < n-1; ++i){
        minIndex = i;                        // assume position i holds the minimum
        for(j = i + 1; j < n; ++j)
            if(A[j] < A[minIndex])
                minIndex = j;                // remember the smallest element seen
        if(minIndex != i){                   // swap the minimum into position i
            ElemType tmp = A[i];
            A[i] = A[minIndex];
            A[minIndex] = tmp;
        }
    }
}
```


#### Heap sort

##### Steps
1. Build the initial heap

Adjust the non-leaf nodes, $\lfloor \frac{n}{2} \rfloor \sim 1$, from back to front

2. Swap the root with the last element of the heap

3. Re-adjust the heap and repeat the steps above

##### Operations

Delete (remove the top)

• Compare from top to bottom
• Call HeadAdjust once from the root: compare the two children first, then swap according to the result
• Time complexity $O(h) = O(\log n)$

Insert

• Compare from bottom to top

• Comparison range: the path from the new node to the root

Insert $K_{n+1}$ at the end as a leaf node, then repeatedly compare $K_{n+1}$ with its new parent until $K_{n+1}$ satisfies the heap property

##### Implementation

```c
// Sift the root of the subtree at k down within the heap A[1..n]
void HeadAdjust(ElemType A[], int k, int n){
    A[0] = A[k];                      // A[0] temporarily holds the element being sifted

    for(int i = 2*k; i <= n; i *= 2){ // adjust downward layer by layer
        if(i < n && A[i] < A[i+1])    // if the right child is larger than the left
            i++;                      // take the right child's index
        if(A[0] >= A[i])
            break;                    // the element already dominates both children
        else{
            A[k] = A[i];              // move the larger child up
            k = i;                    // continue from the child's position
        }
    }

    A[k] = A[0];                      // place the element at its final position
}

void BuildMaxHeap(ElemType A[], int n){
    for(int i = n/2; i >= 1; --i)     // adjust all non-leaf nodes, back to front
        HeadAdjust(A, i, n);
}

void HeapSort(ElemType A[], int n){
    BuildMaxHeap(A, n);
    for(int i = n; i > 1; --i){
        ElemType tmp = A[1];          // swap the current maximum to the end
        A[1] = A[i];
        A[i] = tmp;
        HeadAdjust(A, 1, i-1);        // re-adjust the remaining heap
    }
}
```


### Swap sort

#### Bubble sort

##### Idea

Each pass places one maximum (or minimum) element in its final position

##### Implementation

```c
#include <stdbool.h>

void BubbleSort(int arr[], int n){
    for(int i = 0; i < n-1; ++i){
        bool flag = false;                 // records whether this pass swapped anything
        for(int j = 0; j < n-i-1; ++j){    // one bubbling pass
            if(arr[j] > arr[j+1]){         // move the larger element backwards
                int tmp = arr[j];
                arr[j] = arr[j+1];
                arr[j+1] = tmp;
                flag = true;
            }
        }

        if(!flag)                          // no swap in a pass: already ordered
            return;
    }
}
```


#### Quicksort

##### Idea

An improvement on bubble sort using divide and conquer

Each partition pass moves the pivot until it reaches its final (balance) position, splitting the list into two parts that are then sorted recursively

##### Implementation

```c
// Partition A[low..high] around the pivot A[low]; return the pivot's final position
int Partition(ElemType A[], int low, int high){
    ElemType pivot = A[low];
    while(low < high){
        while(low < high && A[high] >= pivot)
            --high;
        A[low] = A[high];          // move the smaller element to the left end
        while(low < high && A[low] <= pivot)
            ++low;
        A[high] = A[low];          // move the larger element to the right end
    }

    A[low] = pivot;                // the pivot lands in its final position

    return low;
}

void QuickSort(ElemType A[], int low, int high){
    if(low < high){
        int pivotpos = Partition(A, low, high);
        QuickSort(A, low, pivotpos-1);   // sort the left part
        QuickSort(A, pivotpos+1, high);  // sort the right part
    }
}
```


### Merge sort

##### Idea

Decompose

Split the list of n elements into two sublists of about $\frac{n}{2}$ elements each, and sort the two sublists

Merge

Merge the two sorted sublists to get the result

##### Implementation

```c
#include <stdlib.h>

void Merge(ElemType A[], int low, int mid, int high){
    // A[low..mid] and A[mid+1..high] are each ordered
    int i, j, k;
    // allocate high+1 slots so B can be indexed just like A, for simplicity
    ElemType *B = (ElemType *)malloc(sizeof(ElemType)*(high+1));

    for(k = low; k <= high; ++k)
        B[k] = A[k];               // copy both runs into the auxiliary array
    for(i = low, j = mid+1, k = low; i <= mid && j <= high; ++k){
        if(B[i] <= B[j])           // <= keeps the sort stable
            A[k] = B[i++];
        else
            A[k] = B[j++];
    }

    while(i <= mid)
        A[k++] = B[i++];           // copy the rest of the first run
    while(j <= high)
        A[k++] = B[j++];           // copy the rest of the second run
    free(B);
}

void MergeSort(ElemType A[], int low, int high){
    if(low < high){
        int mid = (low + high)/2;
        MergeSort(A, low, mid);    // sort the left half
        MergeSort(A, mid+1, high); // sort the right half
        Merge(A, low, mid, high);  // merge the two sorted halves
    }
}
```


### Radix sort

Not based on comparing keywords

#### Idea

The keys are split into digit groups, and each pass sorts by one digit with the same stable procedure

Intuition says to sort the most significant digit first and the less significant ones afterwards

But if the most significant digit is sorted first, later passes on the lower digits would scramble the higher-digit order whenever lower digits tie; in practice the least significant digit is sorted first and the most significant last, relying on the stability of each pass

For example: 010 102 110 212 in radix order

##### Why not most significant first

Sorting by the most significant digit alone cannot break ties in the lower digits

### Bucket sort (counting sort)

Applicable to: sorting integers within a known, limited range

```c
#include <stdio.h>
#include <string.h>

void countSort(int nums[], int n);

int main(){
    int nums[] = {2, 3, 1, 4, 3};

    countSort(nums, 5);

    return 0;
}

void countSort(int nums[], int n){
    // 1. Determine the bucket array:
    //    if the maximum value is 4, the values 0..4 need max+1 = 5 buckets
    int max = 0;
    for(int i = 0; i < n; ++i){
        if(max < nums[i])
            max = nums[i];
    }

    int bucket[max+1];
    memset(bucket, 0, sizeof(bucket));  // a variable-length array cannot take = {0}

    // 2. Count each value
    for(int i = 0; i < n; ++i)
        bucket[nums[i]]++;

    // 3. Print each value and its count
    printf("Value: ");
    for(int i = 0; i <= max; ++i)
        printf("%d ", i);
    printf("\nCount: ");
    for(int i = 0; i <= max; ++i)
        printf("%d ", bucket[i]);
    printf("\n");
}
```


# 6.3 external sorting

## 6.3.1 storage layout

k-way merge sort keeps one output buffer and k input buffers in memory, and sorts a file consisting of r initial merge segments (runs)

## 6.3.2 process

### 1. Generate the initial merge segments

Blocks of the file are read into the input buffers

The records currently in memory are internally sorted to form the ordered initial merge segments (runs)

### 2. Merge in S passes, k ways at a time, where $S = \lceil \log_k r \rceil$

Using the idea of merge sort, the smallest current record among the k segments is selected and placed in the output buffer

• When an input buffer becomes empty, the next block of that segment is read in immediately
• The next round of comparison proceeds only when none of the needed buffers is empty

When the output buffer is full, it is written out to external memory

## 6.3.3 time cost

$t_{total} = t_{external\ I/O} + t_{internal\ sort} + t_{internal\ merge}$

• External reads and writes: the whole file is read and written once per merge pass

• Internal sorting: constructing the initial runs can be optimized, but the number of read/write I/O operations per pass is unchanged

Sorting within the chunks takes time when the initial merge segments are constructed

• Internal merging: selecting the extreme value among the k segment heads and placing it in the output buffer

## 6.3.4 optimization

### Increase the number of merge ways k

#### Cost

More input buffers are required

Selecting one minimum element among k merge segments takes $k-1$ keyword comparisons each time, so the total number of comparisons is
$$S(n-1)(k-1) = \lceil \log_k r \rceil (n-1)(k-1) = \frac{\lceil \log_2 r \rceil (n-1)(k-1)}{\lceil \log_2 k \rceil}$$
which grows with k

#### Internal merge optimization

A loser tree reduces the number of keyword comparisons

A k-way loser tree has k leaves

• Tree height = comparisons per selection $= \lceil \log_2 k \rceil$

Total comparisons

• $S(n-1)\lceil \log_2 k \rceil = \lceil \log_k r \rceil (n-1) \lceil \log_2 k \rceil = \lceil \log_2 r \rceil (n-1)$, which is independent of k

### k-way balanced merge

Each merge combines at most k segments into 1

If a pass starts with m segments, it produces $\lceil \frac{m}{k} \rceil$ segments

### Multiple balanced merge

Posted by bibie on Sun, 21 Aug 2022 09:50:12 +0300