# Data structure and algorithm day 7 common sorting + bubble sorting + quick sorting + file IO + big data sorting + file merging

If you really can't understand it, memorize it by rote. If you don't want to memorize it, treasure it

# Chapter I bubble sorting

Bubble Sort is also a simple and intuitive sorting algorithm. It repeatedly visits the sequence to be sorted, compares two elements at a time, and exchanges them if they are in the wrong order. The work of visiting the sequence is repeated until there is no need to exchange, that is, the sequence has been sorted. The name of this algorithm comes from the fact that smaller elements will slowly "float" to the top of the sequence through exchange.

## [1]Bubble_Sort.c

/*
* @Date: 2021-09-03 14:09:13
* @LastEditTime: 2021-09-03 15:35:44
* @Description: In User Settings Edit
* @FilePath: \Desktop\bb.c
*/
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define SIZE    10000

//Bubble sorting from small to large
void bubble_sort(int *arr, int len)//100
{
int sort_count, pos;
int tmp;

//How many times to sort: sort_count: several data have been arranged for counting. len-1: the last one does not need to be sorted
for(sort_count=0; sort_count<len-1; sort_count++)
{
//pos: where to start the comparison, len sort_ Count-1: total data length - the number of data already arranged - 1 (this - 1 is because pos + 1 will be accessed in the following operations of pos)
for(pos=0; pos<len-sort_count-1; pos++)
{
if(arr[pos] > arr[pos+1])//Previous data
{
//Exchange data
tmp = arr[pos];
arr[pos] = arr[pos+1];
arr[pos+1] = tmp;
}
}

}

}

int main(void)
{
int *array = malloc(SIZE*sizeof(int));
int i;

srand(time(NULL));//Random number seed

for(i=0; i<SIZE; i++)
{
array[i] = rand()%SIZE;//Start seed
}

bubble_sort(array, SIZE);//Sorting function

for(i=0; i<SIZE; i++)//Print
{
printf("%d ", array[i]);
}

printf("\n");

return 0;
}


# Chapter II quick sort

Quick sort uses the Divide and conquer strategy to divide a list into two smaller and larger subsequences, and then sort the two subsequences recursively.
The steps are:
Select benchmark value: select an element from the sequence, which is called "pivot";

Split: reorder the sequence. All elements smaller than the benchmark value are placed in front of the benchmark, and all elements larger than the benchmark value are placed behind the benchmark (the number equal to the benchmark value can be on either side). After this division, the sorting of benchmark values has been completed;

Recursively sort subsequence: recursively sort subsequences that are less than the reference value element and subsequences that are greater than the reference value element. The judgment condition for recursion to the bottom is that the size of the sequence is zero or one. At this time, the sequence is obviously ordered.
There are several specific methods to select the benchmark value, which has a decisive impact on the time performance of sorting.

## [1]quick_sort.c

/*
* @Date: 2021-09-03 15:21:41
* @LastEditTime: 2021-09-03 15:34:54
* @Description: In User Settings Edit
* @FilePath: \Desktop\quick.c
*/
/**********************************************
*  File Name: quick.c
*  Created  @ 2016-08-03 21:20
*  Author   : Gec
*  E-mail   : 2034294993@qq.com
**********************************************/
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define SIZE    100000

void quick(int arr[],  int left, int right)//left=0   right=100000-1
{
int i;

if(left < right)
{
int tmp = arr[left];//The first value is given to the temporary variable to make room for the first position
int low = left;//Low order 0
int high = right;//High 99999

//Quick sort alternate comparison
while(low < high)//The loop is based on the fact that the left subscript is less than the right subscript
{
//Compare from right to left, and judge whether the left subscript has a value less than the right subscript in real time, and whether the right value is less than the exchange value in the middle
while(low < high && arr[high] >= tmp)
{
high--;//Until the condition is not satisfied, the value arr[high] is either = = TMP or < TMP
}//When you jump out of the loop, it is the first place: the following numbers have been compared with the first one one by one
//Prove that the first one is really the smallest. It should be recorded at this time,

arr[low] = arr[high];//Put the right value into the free space

//Compare from left to right, and judge whether the left subscript is greater than or equal to the right subscript in real time, and whether the right value is less than the exchange value in the middle
while(low < high && arr[low] < tmp)//Note that the high value has changed at this time
{
low++;
}

arr[high] = arr[low];

}

//The last remaining position is the empty one, and then it's sorted
arr[low] = tmp;

quick(arr,left,low-1);//Left
quick(arr,low+1,right);//right

}
}

int main(int argc, char **argv)
{

int *array = malloc(SIZE*sizeof(int));
int i;

srand(time(NULL));//seed

for(i=0; i<SIZE; i++)
{
array[i] = rand()%SIZE;//Generate random number
}

quick(array, 0, SIZE-1);// 0 -- 8 are subscripts

for(i=0; i<SIZE; i++)//Print
{
printf("%d ", array[i]);
}

printf("\n");

return 0;
}



# Chapter III big data sorting

Here we need to use the method of file io

## [1]calloc

calloc Function and malloc The functions are similar in that they allocate memory from the heap. The function declaration is as follows:
void *calloc(int n,int size);
Parameter definition:
size: How big is each piece
n: Number of blocks applied
Note: the last applied space size is: n and size Multiply


## [2]malloc

malloc The function can obtain the memory space of the specified byte from the heap, and its function declaration is as follows:
void * malloc(int n);
Parameter definition:
n: Requested space size (single type size)*Total number)


Generally, the size of malloc application is insufficient and needs to be expanded
Prototype: external void * realloc (void mem_address, unsigned int newsize);
Syntax: pointer name = (data type) realloc (pointer name to change memory size, new size).
//If the new size is smaller than the original size, the end of the original data may be lost (overwritten by other data using memory, etc.)
Header file: #include < stdlib h> Some compilers require #include < malloc h>

## [4] File IO operation

### [4.3]fnprintf

// Open a file that contains millions of data levels
FILE *src = fopen("numbers.txt", "r");//Open in read-only mode
if(src == NULL)
{
perror("fail to open file");
exit(0);//End program
}


int fscanf ( FILE *fp, char * format, ... );
int fprintf ( FILE *fp, char * format, ... );

Input file operation: src Is a file descriptor
%u:Is the format taken from the file, obviously unsigned int type
Return value: failed EOF=-1
fscanf(src, "%u", &data[i])


Prototype: int snprintf(char *str, size_t size, const char *format,...);

Description: The functions snprintf() write at most size bytes (including the terminating null byte ('\ 0')) to str. that is, snprintf writes up to size characters into STR according to format format, where size characters already include terminators.

### [4.4] rename and remove functions

rename The command modifies file names in batch by string replacement.
Format: rename from to file
from :  Indicates the character to be replaced or processed, which is generally a part of the file name,
to : take from Content, replace with to Content of the.
file : Documents to be processed.

remove(f1);//Delete f1 file descriptor

/*
* @Date: 2021-09-03 15:43:04
* @LastEditTime: 2021-09-03 15:45:06
* @Description: In User Settings Edit
* @FilePath: \Desktop\big_data_sort.c
*/
#include <stdio.h>
#include <stdbool.h>

//Swap two numbers
void swap(int *a, int *b)//When the parameter is a pointer, you will keep the value of all operations on a and B, and there is no return form
{
int tmp;
tmp = *a;
*a = *b;
*b = tmp;
}

// Merge all files from begin to end and merge them into the file begin=1
void merge(int begin, int end)//end=20
{
if(end-begin <= 0)
return;

// Merge all files from begin+1 to end and merge them into the file begin+1
merge(begin+1, end);//Recursive self

// Merge begin and begin+1
char f1[10], f2[10];
bzero(f1, 10);
bzero(f2, 10);
//Set begin Txt files saved in fx
snprintf(f1, 10, "%d.txt", begin);
snprintf(f2, 10, "%d.txt", begin+1);

//printf("merging% s and% s...\n", f1, f2);

FILE *fp1 = fopen(f1, "r");
FILE *fp2 = fopen(f2, "r");

unsigned n1, n2;
if(fscanf(fp1, "%u", &n1) == EOF)
{
fclose(fp1);
fclose(fp2);

remove(f1);//delete

rename(f2, f1);//Rename f2 to f1
return;
}
if(fscanf(fp2, "%u", &n2) == EOF)
{
fclose(fp1);
fclose(fp2);

remove(f2);
return;
}

// Temporarily store the merged data, and finally change its name to begin
FILE *tmp = fopen("tmp.txt", "w+");//Open file in read-write mode
int fp1_done = false;
int fp2_done = false;
while(1)
{
if(n1 < n2)//Comparison of two file descriptors
{
fprintf(tmp, "%u\n", n1);//tmp file descriptor = n1(fp1) file content
if(fscanf(fp1, "%u", &n1) == EOF)//
{
fprintf(tmp, "%u\n", n2);
fp1_done = true;
break;
}
}
else
{
fprintf(tmp, "%u\n", n2);
if(fscanf(fp2, "%u", &n2) == EOF)
{
fprintf(tmp, "%u\n", n1);
fp2_done = true;
break;
}
}
}

// Put the remaining data into tmp
if(fp1_done)
{
while(fscanf(fp2, "%u", &n2) != EOF)
{
fprintf(tmp, "%u\n", n2);
}
}
else
{
while(fscanf(fp1, "%u", &n1) != EOF)
{
fprintf(tmp, "%u\n", n1);
}
}

// Delete the original file and rename tmp to begin
fclose(fp1);
fclose(fp2);

if(remove(f1) == -1)
{
printf("Delete file%s fail:%s\n", f1, strerror(errno));
exit(0);
}
if(remove(f2) == -1)
{
printf("Delete file%s fail:%s\n", f2, strerror(errno));
exit(0);
}

fclose(tmp);
rename("tmp.txt", f1);
}

int main(int argc, char **argv)
{
// Open a file that contains millions of data levels
FILE *src = fopen("numbers.txt", "r");//Open in read-only mode
if(src == NULL)
{
perror("fail to open file");
exit(0);//End program
}

// 1. Divide the original data file into N ordered sub files
bool done = false;
char file[20];
int N = 0;
int wanted = 10*10000; // It is assumed that only 100000 data can be read at a time
int infact = wanted;//Crazy iron: 100000 Volts
while(1)
{
// Attempted to read wanted data from file
unsigned *data = calloc(wanted, sizeof(unsigned));//Apply for 100000 pieces, 4 bytes each
for(int i=0; i<wanted; i++)
{
if(fscanf(src, "%u", &data[i]) == EOF)//From the file numbers Txt takes out the unsigned data and stores it in the data array
{
done = true;
infact = i;
break;
}
}

// Sort the read infact data
quickSort(data, infact);//Quick sort

// Create temporary files to store some data
bzero(file, 20);
//File array name, 20 array size, format, stored data example: file [0] = 0 txt
snprintf(file, 20, "%d.txt", ++N);//Used to load file names, up to 20 txt file name

FILE *fp = fopen(file, "w");//Write only mode

// Write the ordered part of the data into a temporary file and save it
for(int i=0; i<infact; i++)
{
fprintf(fp, "%u\n", data[i]);
}
free(data);
fclose(fp);

if(done)
break;
}
fclose(src);

// 2. Merge N ordered sub files into one ordered file
merge(1, N);

return 0;
}


Creation is really not easy, but what does it matter if you can help people

Posted by nitm on Sun, 01 May 2022 19:51:41 +0300