Data structure - sorting

Data structure notes directory:

1. Introduction and time complexity
2. Linear lists
3. Trees
4. Graphs
5. Searching
6. Sorting

6.1 basic concepts

6.1.1 objectives

Arrange records in order of their keywords

6.1.2 classification

Internal sort

Data elements are all in memory during sorting

External sort

During the sorting process, data is constantly moving between internal and external storage

6.1.3 stability

After sorting, elements with equal keywords keep their original relative order

6.1.4 two operations

compare

Determine the relative order of two keywords

move

Move elements to achieve sequence order

6.1.5 one pass of sorting

A single scan that processes each not-yet-sorted element once

6.2 internal sorting

6.2.1 summary

stability

Stable sorting algorithms:

  • Binary insertion sort
  • Direct insertion sort
  • Bubble sort
  • Merge sort
  • Radix sort

Unstable sorting algorithms include:

  • Shell sort
  • Quick sort
  • Simple selection sort
  • Heap sort

Applicable to linked storage:

  • Direct insertion sort
  • Bubble sort
  • Simple selection sort
  • Merge sort
  • Radix sort

No ordered subsequence is formed during the sorting process:

Quick sort

Time complexity

The time complexity is independent of the initial state

  • Simple selection sort O(n^2)
  • Heap sort O(n log n)
  • Merge sort O(n log n)
  • Radix sort O(d(n+r))

The number of comparisons is independent of the initial state

Simple selection sort

Related to initial state

Insertion sort
  • Best case: already ordered

    Direct insertion O(n)

    Binary insertion O(n log n) comparisons

  • Worst case: reverse order, O(n^2)

Swap sort
  • Bubble sort

    Best case: ordered, O(n)

    Worst case: reverse order, O(n^2)

  • Quick sort

    Best case: balanced pivot partitions, O(n log n)
    Worst case: already (nearly) ordered or reverse order, O(n^2)

The number of bubble sorting passes is related to the initial state

O(n log n):

  • Quick sort
  • Heap sort
  • Merge sort

Spatial complexity

O(1)

  • Direct insertion sort
  • Binary insertion sort
  • Shell sort
  • Bubble sort
  • Simple selection sort
  • Heap sort

O(log n)

Quick sort, best case (recursion stack)

O(n)

  • Quick sort, worst case (recursion stack)
  • Merge sort
  • Radix sort, O(r) for the r queues

Applicable scenarios:

By problem scale

Taking the smallest (or largest) k elements:

  • k passes of bubble sort
  • k passes of simple selection sort
  • k passes of heap sort

Forming a partially ordered sequence during sorting:

  • Direct insertion sort
  • Bubble sort
  • Quick sort
  • Simple selection sort
  • Heap sort

6.2.2 the five categories of sorting

Insertion sort

thinking

Before the final pass, no element is guaranteed to be in its final position

Direct insertion sort

thinking

realization
//Sort ascending; A[1..n] holds the data, A[0] is the sentinel
void Direct_InsertSort(ElemType A[],int n){
	int i,j;
    for(i = 2;i <= n;++i){
        if(A[i] < A[i-1]){
            //A[i] is smaller than the ordered tail: search backward for its slot
            A[0] = A[i];
            //The sentinel records the element to be inserted
            for(j = i-1;A[0] < A[j];--j)
                A[j+1] = A[j];  //Shift larger elements one position back
           	A[j+1] = A[0];
        }
    }
}
characteristic

Binary insertion sort

Same as direct insertion, except that the insertion position is found by binary search instead of a linear scan

realization
void Binary_InsertSort(ElemType A[],int n){
    int i,j,low,high,mid;

    for(i = 2;i <= n;++i){//Insert A[2]~A[n] into the ordered subsequence in turn
        A[0] = A[i];//Sentinel records the element to insert

        low = 1;high = i - 1;
        while(low <= high){//Binary search for the insertion position
            mid = (low+high)/2;
            if(A[mid] > A[0])
                high = mid - 1;
            else
                low = mid + 1;//Go right on equality, which keeps the sort stable
        }
        //high + 1 is the insertion position
        for(j = i - 1;j >= high + 1;--j)
            A[j+1] = A[j];//Shift elements back
        A[high + 1] = A[0];
    }
}
characteristic

Shell Sort

Understanding: since direct insertion sort works well on roughly ordered sequences, the data can be pre-processed (made roughly ordered) before a final direct insertion pass

thinking
  • Take one element every "gap" positions and insertion-sort each resulting subsequence
  • The gap (step size) is the interval between two picked elements, and it shrinks each round
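The note gives no code for Shell sort, so here is a hedged sketch; it uses the common gap sequence n/2, n/4, ..., 1 and 0-based indexing with no sentinel, which may differ from the textbook version:

```c
/* Shell sort (ascending): direct insertion within each gap-spaced subsequence. */
void ShellSort(int A[], int n) {
    for (int gap = n / 2; gap >= 1; gap /= 2) {   /* shrink the step size each round */
        for (int i = gap; i < n; ++i) {           /* insert A[i] into its gap-subsequence */
            int tmp = A[i];
            int j;
            for (j = i - gap; j >= 0 && A[j] > tmp; j -= gap)
                A[j + gap] = A[j];                /* shift larger elements back by one gap */
            A[j + gap] = tmp;
        }
    }
}
```

The final round (gap = 1) is exactly a direct insertion sort over an already roughly ordered array.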
characteristic

Selection sort

Idea: select the minimum (or maximum) of the unsorted part and append it to the end of the ordered subsequence

Simple selection sort

thinking


realization
void selectSort(ElemType A[],int n){
    int i,j,minIndex;
    for(i = 0;i < n-1;++i){//Pass i places the i-th smallest element at position i
        minIndex = i;
        for(j = i + 1;j < n;++j)//Scan the unsorted part for the minimum
            if(A[j] < A[minIndex])
                minIndex = j;
        if(minIndex != i){//Swap the minimum into position i
            ElemType t = A[minIndex];
            A[minIndex] = A[i];
            A[i] = t;
        }
    }
}
characteristic

Heap sort

thinking

storage structure

step
  1. Build the initial heap

    Adjust every non-leaf node, from the last one, ⌊n/2⌋, back up to node 1

  2. Swap the root with the last element of the heap

  3. Re-adjust the shrunken heap and repeat the steps above

operation

delete

  • Compare top-down
  • Call HeapAdjust once: starting from the root, first compare the two children, then swap according to the result
  • Time complexity O(h) = O(log n)

insert

  • Compare bottom-up

  • Comparison range: the path from the new node up through this subtree

    Insert K_{n+1} at the end as a leaf node, then repeatedly compare K_{n+1} with its current parent until K_{n+1} satisfies the heap property
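The bottom-up insertion described above can be sketched as follows, for a max-heap stored 1-based in A[1..n]; the function name and the in/out size parameter are my own choices, not from the note:

```c
/* Sketch: insert `key` into a 1-based max-heap A[1..*n], sifting it up. */
void HeapInsert(int A[], int *n, int key) {
    A[++(*n)] = key;                              /* place the new key as the last leaf */
    for (int i = *n; i > 1 && A[i / 2] < A[i]; i /= 2) {
        int t = A[i / 2];                         /* new key beats its parent: swap upward */
        A[i / 2] = A[i];
        A[i] = t;
    }
}
```

At most one comparison is made per level of the heap, so insertion also costs O(log n).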

realization
void HeapAdjust(ElemType A[],int k,int n){
    int i;
    A[0] = A[k];//A[0] temporarily stores the subtree root

    for(i = 2*k;i <= n;i*=2){//Sift down layer by layer along the larger child
        if(i < n && A[i] < A[i+1])//If the right child is larger than the left child
            i++;	//Take the right child's index
        if(A[0] >= A[i])
            break;	//The root is already the largest: stop
        else{
            A[k] = A[i];//Move the larger child up
            k = i;//Continue sifting from the child's position
        }
    }

    A[k] = A[0];//Place the saved root at its final position
}

void BuildMaxHeap(ElemType A[],int n){
    int i;
    for(i = n/2;i >= 1;--i)//Adjust every non-leaf node, last to first
        HeapAdjust(A,i,n);
}

void HeapSort(ElemType A[],int n){
    int i;
    BuildMaxHeap(A,n);
    for(i = n;i > 1;--i){
        swap(A[i],A[1]);//Move the current maximum to the end
        HeapAdjust(A,1,i-1);//Re-adjust the remaining heap A[1..i-1]
    }
}
characteristic

Swap sort

Bubble sort

thinking

Each pass places a maximum or minimum element in the final position

realization
void BubbleSort(int arr[],int n){
    for(int i = 0;i < n;++i){
        bool flag = false;//Marks whether this pass swapped anything
        for(int j = 0;j < n-i-1;++j){//One bubbling pass
            if(arr[j] > arr[j+1]){//Move the larger element backward
                swap(arr[j],arr[j+1]);
                flag = true;
            }
        }

        if(!flag)//No swap in this pass: the array is already ordered
            return ;
    }
}
characteristic

Quick sort

thinking

Improvement of bubble sorting: divide and conquer method

Keep moving elements across the pivot until it settles at its final position (the balance point)

Overall process

One pass sorting

realization
int Partition(ElemType A[],int low,int high){
    ElemType pivot = A[low];//Take the first element as the pivot
    while(low < high){
        while(low < high && A[high] >= pivot)
            --high;
        A[low] = A[high];//Move an element smaller than the pivot to the left end
        while(low < high && A[low] <= pivot)
            ++low;
        A[high] = A[low];//Move an element larger than the pivot to the right end
    }

    A[low] = pivot;//The pivot lands at its final position

    return low;
}

void QuickSort(ElemType A[],int low,int high){
    if(low < high){
        int pivotpos = Partition(A,low,high);
        QuickSort(A,low,pivotpos-1);
        QuickSort(A,pivotpos+1,high);
    }
}
characteristic

Merge sort

thinking

decompose

Divide the n-element list into two sublists of roughly n/2 elements each, and sort the two sublists separately

merge

Merge two sorted sub tables to get the result

realization
void Merge(ElemType A[],int low,int mid,int high){
    //In the table, A[low..mid] and A[mid+1..high] are each already ordered
    int i,j,k;
    ElemType *B = (ElemType *)malloc(sizeof(ElemType)*(high+1));

    for(k = low;k <= high;++k)
        B[k] = A[k];//Copy the working range into the auxiliary array
    for(i = low,j = mid+1,k = low;i <= mid && j <= high;++k){
		if(B[i] <= B[j])//Take the smaller head; <= keeps the sort stable
            A[k] = B[i++];
        else
            A[k] = B[j++];
    }

    while(i <= mid)
        A[k++] = B[i++];//Copy the rest of the first sublist
    while(j <= high)
        A[k++] = B[j++];//Copy the rest of the second sublist
    free(B);
}

void MergeSort(ElemType A[],int low,int high){
	if(low < high){
        int mid = (low + high)/2;
        MergeSort(A,low,mid);//Sort the left half
        MergeSort(A,mid+1,high);//Sort the right half
        Merge(A,low,mid,high);//Merge the two ordered halves
    }
}
characteristic

Radix sort

Not based on keyword comparisons

thinking

Each digit (key position) is sorted with the same distribute-and-collect logic

Least significant keyword first, then the most significant keyword

If the most significant keyword were sorted first, ties in it could not be resolved afterward without recursing into the lower keywords; in practice the least significant keyword is sorted first and the most significant last, and the stability of each pass makes the result correct

For example, radix sorting 010 102 110 212:

Sorting by the most significant digit first does not work in a single distribute-and-collect round

Sorting by the least significant digit first does
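The distribute-and-collect idea on the 010/102/110/212 example can be sketched as an LSD radix sort using counting buckets; a textbook implementation may use linked queues instead, and the 3-digit limit here is an assumption for the example:

```c
#include <string.h>

/* LSD radix sort for non-negative integers < 1000 (radix 10, 3 passes). */
void RadixSort(int A[], int n) {
    int B[n];                                     /* output buffer for one pass */
    for (int exp = 1; exp <= 100; exp *= 10) {    /* units, tens, hundreds */
        int count[10] = {0};
        for (int i = 0; i < n; ++i)
            count[(A[i] / exp) % 10]++;           /* distribute: count each digit */
        for (int d = 1; d < 10; ++d)
            count[d] += count[d - 1];             /* prefix sums give end positions */
        for (int i = n - 1; i >= 0; --i)          /* backward scan keeps the pass stable */
            B[--count[(A[i] / exp) % 10]] = A[i];
        memcpy(A, B, n * sizeof(int));            /* collect back into A */
    }
}
```

Because each pass is stable, records already ordered by the lower digits stay ordered when a higher digit ties.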

characteristic

Bucket sort (counting sort)

Applicable to: sorting integers within a certain range

#include <stdio.h>

void countSort(int nums[],int n);

int main(){
    int nums[] = {2,3,1,4,3}; 

    countSort(nums,5);

    return 0;
}

void countSort(int nums[],int n){
    // 1. Determine the bucket array size
    //If the maximum value is 4, the values 0..4 are possible, so 5 buckets are needed
    int max = 0;
    for(int i = 0;i < n;++i){
        if(max < nums[i])
            max = nums[i];
    }

    int bucket[max+1];
    for(int i = 0;i <= max;++i)
        bucket[i] = 0;//A variable-length array cannot use an initializer

    // 2. Count the occurrences of each value
    for(int i = 0;i < n;++i)
        bucket[nums[i]]++;

    // 3. Print each value and its count
    for(int i = 0;i <= max;++i)
        printf("%d ",i);
    printf("value\n");
    for(int i = 0;i <= max;++i)
        printf("%d ",bucket[i]);
    printf("count\n");
}
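The code above only prints the counts; to actually produce the sorted sequence, one more step walks the buckets in order and writes each value back as many times as it was counted. A sketch (the function name is mine, and calloc replaces the variable-length array):

```c
#include <stdlib.h>

/* Counting sort that writes the sorted result back into nums[]. */
void countSortInPlace(int nums[], int n) {
    int max = 0;
    for (int i = 0; i < n; ++i)                   /* 1. find the maximum value */
        if (nums[i] > max) max = nums[i];

    int *bucket = calloc(max + 1, sizeof(int));   /* zero-initialized counts */
    for (int i = 0; i < n; ++i)                   /* 2. count occurrences */
        bucket[nums[i]]++;

    int k = 0;
    for (int v = 0; v <= max; ++v)                /* 3. emit each value count-many times */
        while (bucket[v]-- > 0)
            nums[k++] = v;
    free(bucket);
}
```

On {2,3,1,4,3} this yields {1,2,3,3,4}, at O(n + max) time and O(max) extra space.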

6.3 external sorting

6.3.1 storage space

k-way merge sort: one output buffer and k input buffers in memory; the data on external storage forms r initial merge segments (runs)

6.3.2 process

1. Generate the initial merge segments

Blocks of data are read from external memory into the input buffers

The records in memory are internally sorted and written back, forming the ordered initial merge segments (r in total)

2. Merge in S passes of k-way merging, where S = ⌈log_k r⌉

The smallest record of each segment is selected from the K merge segments with the idea of merge sorting and temporarily stored in the output buffer

  • When an input buffer is empty, a new block is read immediately
  • The next round of comparison is performed only when the buffer is not empty

When the output buffer is full, it is written out to external memory
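A quick sanity check of the pass-count formula above, with assumed numbers (r = 16 initial runs, k = 4-way merge):

```latex
S = \lceil \log_k r \rceil = \lceil \log_4 16 \rceil = 2
% pass 1: 16 runs -> \lceil 16/4 \rceil = 4 runs;  pass 2: 4 runs -> 1 run
```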

6.3.3 time cost

t_total = t_read/write external memory + t_internal sort + t_internal merge

  • Read/write external memory: each merge pass reads and writes all the data once

  • Internal sorting: the number of initial runs can be optimized, but the read/write I/O count per merge pass is unchanged

    In-run sorting: constructing the initial merge segments takes time

  • Internal merging: select the minimum among the k runs and move it into the output buffer

6.3.4 optimization

Increase the number of merge ways k

cost

Correspondingly more input buffers are required

Selecting the minimum among k merge segments costs k − 1 keyword comparisons each time

Total comparisons: S(n−1)(k−1) = ⌈log_k r⌉(n−1)(k−1) = ⌈log₂r⌉(n−1)(k−1) / ⌈log₂k⌉, which grows as k grows

Internal merger optimization

Loser tree, reduce keyword comparison times

A k-way loser tree has k leaves

  • Tree height = comparisons per selection = ⌈log₂k⌉

Total comparisons

  • S(n−1)⌈log₂k⌉ = ⌈log_k r⌉(n−1)⌈log₂k⌉ = ⌈log₂r⌉(n−1), which is independent of k

k-way balanced merge

Each pass merges up to k runs into 1

If a pass starts with m runs, it leaves ⌈m/k⌉ runs

Reduce the number of initial merging segments

Multiway balanced merge

Tags: Algorithm data structure

Posted by bibie on Sun, 21 Aug 2022 09:50:12 +0300