Data structure and algorithm day 7 common sorting + bubble sorting + quick sorting + file IO + big data sorting + file merging

If you really can't understand it, memorize it by rote. If you don't want to memorize it, treasure it

Chapter I bubble sorting


Bubble Sort is also a simple and intuitive sorting algorithm. It repeatedly visits the sequence to be sorted, compares two elements at a time, and exchanges them if they are in the wrong order. The work of visiting the sequence is repeated until there is no need to exchange, that is, the sequence has been sorted. The name of this algorithm comes from the fact that smaller elements will slowly "float" to the top of the sequence through exchange.

[1]Bubble_Sort.c

/*
 * @Author: your name
 * @Date: 2021-09-03 14:09:13
 * @LastEditTime: 2021-09-03 15:35:44
 * @LastEditors: Please set LastEditors
 * @Description: In User Settings Edit
 * @FilePath: \Desktop\bb.c
 */
#include <stdio.h>
#include <time.h>
#include <stdlib.h>


#define SIZE    10000

//Bubble sorting from small to large
void bubble_sort(int *arr, int len)//100
{
    int sort_count, pos;
    int tmp;

    //How many times to sort: sort_count: several data have been arranged for counting. len-1: the last one does not need to be sorted
    for(sort_count=0; sort_count<len-1; sort_count++)
    {
        //pos: where to start the comparison, len sort_ Count-1: total data length - the number of data already arranged - 1 (this - 1 is because pos + 1 will be accessed in the following operations of pos)
        for(pos=0; pos<len-sort_count-1; pos++)
        {
            if(arr[pos] > arr[pos+1])//Previous data
            {
                //Exchange data
                tmp = arr[pos];
                arr[pos] = arr[pos+1];
                arr[pos+1] = tmp;
            }
        }

    }

}

int main(void)
{
    int *array = malloc(SIZE*sizeof(int));
    int i;

    srand(time(NULL));//Random number seed

    for(i=0; i<SIZE; i++)
    {
        array[i] = rand()%SIZE;//Start seed
    }

    bubble_sort(array, SIZE);//Sorting function


    for(i=0; i<SIZE; i++)//Print
    {
        printf("%d ", array[i]);
    }

    printf("\n");

    return 0;
}

Chapter II quick sort

Quick sort uses the Divide and conquer strategy to divide a list into two smaller and larger subsequences, and then sort the two subsequences recursively.
The steps are:
Select benchmark value: select an element from the sequence, which is called "pivot";

Split: reorder the sequence. All elements smaller than the benchmark value are placed in front of the benchmark, and all elements larger than the benchmark value are placed behind the benchmark (the number equal to the benchmark value can be on either side). After this division, the sorting of benchmark values has been completed;

Recursively sort subsequence: recursively sort subsequences that are less than the reference value element and subsequences that are greater than the reference value element. The judgment condition for recursion to the bottom is that the size of the sequence is zero or one. At this time, the sequence is obviously ordered.
There are several specific methods to select the benchmark value, which has a decisive impact on the time performance of sorting.

[1]quick_sort.c

/*
 * @Author: your name
 * @Date: 2021-09-03 15:21:41
 * @LastEditTime: 2021-09-03 15:34:54
 * @LastEditors: Please set LastEditors
 * @Description: In User Settings Edit
 * @FilePath: \Desktop\quick.c
 */
/**********************************************
 *  File Name: quick.c
 *  Created  @ 2016-08-03 21:20
 *  Author   : Gec
 *  E-mail   : 2034294993@qq.com
 **********************************************/
#include <stdio.h>
#include <time.h>
#include <stdlib.h>


#define SIZE    100000

void quick(int arr[],  int left, int right)//left=0   right=100000-1
{
	int i;
	
	if(left < right)
	{
			int tmp = arr[left];//The first value is given to the temporary variable to make room for the first position
        	int low = left;//Low order 0
        	int high = right;//High 99999
			
			//Quick sort alternate comparison
        	while(low < high)//The loop is based on the fact that the left subscript is less than the right subscript
    		{
//Compare from right to left, and judge whether the left subscript has a value less than the right subscript in real time, and whether the right value is less than the exchange value in the middle
                while(low < high && arr[high] >= tmp)
				{
                	high--;//Until the condition is not satisfied, the value arr[high] is either = = TMP or < TMP 
                }//When you jump out of the loop, it is the first place: the following numbers have been compared with the first one one by one
				//Prove that the first one is really the smallest. It should be recorded at this time,

            	arr[low] = arr[high];//Put the right value into the free space

//Compare from left to right, and judge whether the left subscript is greater than or equal to the right subscript in real time, and whether the right value is less than the exchange value in the middle
                while(low < high && arr[low] < tmp)//Note that the high value has changed at this time
				{
                	low++;
                }

                arr[high] = arr[low];

			}

//The last remaining position is the empty one, and then it's sorted
        arr[low] = tmp;
		
		quick(arr,left,low-1);//Left
		quick(arr,low+1,right);//right
		
    }
}

int main(int argc, char **argv)
{
	
	int *array = malloc(SIZE*sizeof(int));
    int i;

    srand(time(NULL));//seed

    for(i=0; i<SIZE; i++)
    {
        array[i] = rand()%SIZE;//Generate random number
    }
	
	quick(array, 0, SIZE-1);// 0 -- 8 are subscripts
	

	for(i=0; i<SIZE; i++)//Print
    {
        printf("%d ", array[i]);
    }

    printf("\n");


	return 0;
}


Chapter III big data sorting

Here we need to use the method of file io

[1]calloc

calloc Function and malloc The functions are similar in that they allocate memory from the heap. The function declaration is as follows:
void *calloc(int n,int size);
Parameter definition:
size: How big is each piece
n: Number of blocks applied
 Note: the last applied space size is: n and size Multiply

[2]malloc

malloc The function can obtain the memory space of the specified byte from the heap, and its function declaration is as follows:
void * malloc(int n);
Parameter definition:
n: Requested space size (single type size)*Total number)

[3]readlloc

Generally, the size of malloc application is insufficient and needs to be expanded
Prototype: external void * realloc (void mem_address, unsigned int newsize);
Syntax: pointer name = (data type) realloc (pointer name to change memory size, new size).
//If the new size is smaller than the original size, the end of the original data may be lost (overwritten by other data using memory, etc.)
Header file: #include < stdlib h> Some compilers require #include < malloc h>

[4] File IO operation

[4.1]fprintf

[4.2]fscanf

[4.3]fnprintf

// Open a file that contains millions of data levels
    FILE *src = fopen("numbers.txt", "r");//Open in read-only mode
    if(src == NULL)
    {
        perror("fail to open file");
        exit(0);//End program
    }

int fscanf ( FILE *fp, char * format, ... );
int fprintf ( FILE *fp, char * format, ... );

Input file operation: src Is a file descriptor
%u:Is the format taken from the file, obviously unsigned int type
&data[i]:Array address
 Return value: failed EOF=-1
fscanf(src, "%u", &data[i])

Prototype: int snprintf(char *str, size_t size, const char *format,...);

Description: The functions snprintf() write at most size bytes (including the terminating null byte ('\ 0')) to str. that is, snprintf writes up to size characters into STR according to format format, where size characters already include terminators.

[4.4] rename and remove functions

rename The command modifies file names in batch by string replacement.
Format: rename from to file
from :  Indicates the character to be replaced or processed, which is generally a part of the file name,
to : take from Content, replace with to Content of the.
file : Documents to be processed.
remove(f1);//Delete f1 file descriptor
/*
 * @Author: your name
 * @Date: 2021-09-03 15:43:04
 * @LastEditTime: 2021-09-03 15:45:06
 * @LastEditors: Please set LastEditors
 * @Description: In User Settings Edit
 * @FilePath: \Desktop\big_data_sort.c
 */
#include <stdio.h>
#include <stdbool.h>

//Swap two numbers
void swap(int *a, int *b)//When the parameter is a pointer, you will keep the value of all operations on a and B, and there is no return form
{
    int tmp;
    tmp = *a;
    *a = *b;
    *b = tmp;
}

// Merge all files from begin to end and merge them into the file begin=1
void merge(int begin, int end)//end=20
{
    if(end-begin <= 0)
        return;

    // Merge all files from begin+1 to end and merge them into the file begin+1
    merge(begin+1, end);//Recursive self


    // Merge begin and begin+1
    char f1[10], f2[10];
    bzero(f1, 10);
    bzero(f2, 10);
	//Set begin Txt files saved in fx
    snprintf(f1, 10, "%d.txt", begin);
    snprintf(f2, 10, "%d.txt", begin+1);

    //printf("merging% s and% s...\n", f1, f2);

    FILE *fp1 = fopen(f1, "r");
    FILE *fp2 = fopen(f2, "r");

    unsigned n1, n2;
    if(fscanf(fp1, "%u", &n1) == EOF)
    {
        fclose(fp1);
        fclose(fp2);

        remove(f1);//delete
		
        rename(f2, f1);//Rename f2 to f1
        return;
    }
    if(fscanf(fp2, "%u", &n2) == EOF)
    {
        fclose(fp1);
        fclose(fp2);

        remove(f2);
        return;
    }

    // Temporarily store the merged data, and finally change its name to begin
    FILE *tmp = fopen("tmp.txt", "w+");//Open file in read-write mode
    int fp1_done = false;
    int fp2_done = false;
    while(1)
    {
        if(n1 < n2)//Comparison of two file descriptors
        {
            fprintf(tmp, "%u\n", n1);//tmp file descriptor = n1(fp1) file content
            if(fscanf(fp1, "%u", &n1) == EOF)//
            {
                fprintf(tmp, "%u\n", n2);
                fp1_done = true;
                break;
            }
        }
        else
        {
            fprintf(tmp, "%u\n", n2);
            if(fscanf(fp2, "%u", &n2) == EOF)
            {
                fprintf(tmp, "%u\n", n1);
                fp2_done = true;
                break;
            }
        }
    }

    // Put the remaining data into tmp
    if(fp1_done)
    {
        while(fscanf(fp2, "%u", &n2) != EOF)
        {
            fprintf(tmp, "%u\n", n2);
        }
    }
    else
    {
        while(fscanf(fp1, "%u", &n1) != EOF)
        {
            fprintf(tmp, "%u\n", n1);
        }
    }

    // Delete the original file and rename tmp to begin
    fclose(fp1);
    fclose(fp2);

    if(remove(f1) == -1)
    {
        printf("Delete file%s fail:%s\n", f1, strerror(errno));
        exit(0);
    }
    if(remove(f2) == -1)
    {
        printf("Delete file%s fail:%s\n", f2, strerror(errno));
        exit(0);
    }

    fclose(tmp);
    rename("tmp.txt", f1);
}

int main(int argc, char **argv)
{
    // Open a file that contains millions of data levels
    FILE *src = fopen("numbers.txt", "r");//Open in read-only mode
    if(src == NULL)
    {
        perror("fail to open file");
        exit(0);//End program
    }

    // 1. Divide the original data file into N ordered sub files
    bool done = false;
    char file[20];
    int N = 0;
    int wanted = 10*10000; // It is assumed that only 100000 data can be read at a time
    int infact = wanted;//Crazy iron: 100000 Volts
    while(1)
    {
        // Attempted to read wanted data from file
        unsigned *data = calloc(wanted, sizeof(unsigned));//Apply for 100000 pieces, 4 bytes each
        for(int i=0; i<wanted; i++)
        {
            if(fscanf(src, "%u", &data[i]) == EOF)//From the file numbers Txt takes out the unsigned data and stores it in the data array
            {
                done = true;
                infact = i;
                break;
            }
        }

        // Sort the read infact data
        quickSort(data, infact);//Quick sort


        // Create temporary files to store some data
        bzero(file, 20);
		//File array name, 20 array size, format, stored data example: file [0] = 0 txt
        snprintf(file, 20, "%d.txt", ++N);//Used to load file names, up to 20 txt file name
		
        FILE *fp = fopen(file, "w");//Write only mode

        // Write the ordered part of the data into a temporary file and save it
        for(int i=0; i<infact; i++)
        {
            fprintf(fp, "%u\n", data[i]);
        }
        free(data);
        fclose(fp);

        if(done)
            break;
    }
    fclose(src);

    // 2. Merge N ordered sub files into one ordered file
    merge(1, N);

    return 0;
}

Creation is really not easy, but what does it matter if you can help people

Tags: C Algorithm data structure

Posted by nitm on Sun, 01 May 2022 19:51:41 +0300