Algorithms: the four LeetCode shell programming problems

Recently, some simple Linux commands and text-processing tools such as awk and sed came up in my interviews, so I worked through the four shell problems on LeetCode as practice. This summary also draws on other people's answers. Additions and discussion are welcome. If anything here infringes on your rights, contact me and I will remove it.

Question 1: Tenth Line (tests printing a specific line)

Example:
Suppose file.txt has the following contents:
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Your script should display the tenth line:
Line 10

Any of the following three approaches works:

1. grep -n "" file.txt | grep '^10:' | cut -d: -f2-
2. sed -n '10p' file.txt
3. awk '{if(NR==10){print $0}}' file.txt

However, the file may contain fewer than 10 lines. To handle that case explicitly:

row_num=$(wc -l < file.txt)    # line count without the file name
if [ "$row_num" -lt 10 ]; then
    echo "The file has fewer than 10 lines"
else
    awk '{if(NR==10){print $0}}' file.txt
fi
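
Another common approach, sketched here as my own assumption rather than one of the answers above, combines tail and head so that nothing is printed when the file is too short. The helper name nth_line is hypothetical.

```shell
# nth_line N FILE: print line N of FILE, or nothing if FILE is shorter.
# (nth_line is a hypothetical helper name used only for illustration.)
nth_line() {
    # tail -n +N starts output at line N; head -n 1 keeps only that line
    tail -n "+$1" "$2" | head -n 1
}

# Build a 10-line sample file like the one in the example
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "Line " i }' > "$sample"

nth_line 10 "$sample"
```

Asking for a line past the end of the file, for example nth_line 11 "$sample", simply prints nothing.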

Supplement: the line count row_num can be obtained in any of the following ways:

1. awk '{print NR}' file.txt | tail -n1
2. awk 'END{print NR}' file.txt
3. grep -nc "" file.txt
4. grep -c "" file.txt
5. grep -vc "^$" file.txt    # counts non-empty lines only
6. grep -n "" file.txt | awk -F: '{print $1}' | tail -n1
7. sed -n '$=' file.txt
8. wc -l file.txt    # prints "10 file.txt"; cat file.txt | wc -l prints only the number
9. wc -l file.txt | cut -d' ' -f1
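
These methods do not always agree. wc -l counts newline characters, while grep -c "" and awk's NR count lines, so the results differ when the last line has no trailing newline; grep -vc "^$" additionally skips empty lines. A small sketch on a made-up sample file (the variable names are mine):

```shell
# Hypothetical sample: three lines, the second one empty,
# and no newline after the last line.
f=$(mktemp)
printf 'a\n\nc' > "$f"

count_wc=$(wc -l < "$f" | tr -d ' ')      # newline characters only
count_grep=$(grep -c "" "$f")             # every line, including the unterminated one
count_awk=$(awk 'END{print NR}' "$f")     # agrees with grep -c ""
count_nonempty=$(grep -vc '^$' "$f")      # non-empty lines only

echo "wc -l: $count_wc, grep -c: $count_grep, awk NR: $count_awk, non-empty: $count_nonempty"
```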

Question 2: Valid Phone Numbers (tests regular expressions)

Given a text file file.txt containing a list of phone numbers (one phone number per line), write a bash script that prints all valid phone numbers.
You can assume that a valid phone number must be in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx. (x represents a digit)
You can also assume that no line has leading or trailing whitespace.
Example:
Suppose file.txt contains:
987-123-4567
123 456 7890
(123) 456-7890

Your script should output the following valid phone numbers:
987-123-4567
(123) 456-7890

Background knowledge:
1. Special characters
2. Quantifiers
Note: a quantifier specifies how many times the character before it may occur.
3. Anchors

Four ways to write it with grep:

1. grep -P '^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$' file.txt
2. grep -P '^(\(\d{3}\) |\d{3}-)\d{3}-\d{4}$' file.txt
3. grep -P '^([(]\d{3}[)] |\d{3}-)\d{3}-\d{4}$' file.txt
4. grep -E '^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$' file.txt

Notes: do not drop the space after the closing parenthesis; the parentheses in a phone number are ordinary characters, so they must be escaped as \( and \); and do not drop the quotes around the pattern.
^: start of line; here the line must start with (xxx) followed by a space, or with xxx-
(): grouping; together with | it selects either \([0-9]{3}\) followed by a space, or [0-9]{3}-
|: alternation ("or"); matches the expression on either side
[]: a character class that matches a single character; [0-9] matches one digit
{n}: matches exactly n repetitions; [0-9]{3} matches three consecutive digits
$: end of line
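
To see the pattern in action, the sketch below feeds the three sample numbers to the grep -E form through a here-document. The function name valid_phones is hypothetical.

```shell
# valid_phones: read phone numbers on stdin and print only the valid ones.
# (hypothetical helper name; the pattern is the grep -E form shown above)
valid_phones() {
    grep -E '^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$'
}

valid_phones <<'EOF'
987-123-4567
123 456 7890
(123) 456-7890
EOF
```

The second number is rejected because its groups are separated by spaces instead of hyphens.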

Three ways to write it with awk '/regular expression/ {print $0}':

1. awk '/^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$/' file.txt
2. gawk '/^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$/' file.txt
3. awk '/^\([0-9]{3}\)\s[0-9]{3}\-[0-9]{4}$|^[0-9]{3}\-[0-9]{3}\-[0-9]{4}$/{print $0}' file.txt

In 3, \s matches a whitespace character (\s is a GNU awk extension).
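
Not every awk accepts \s or the {n} repetition syntax (I am assuming here that some older mawk builds do not). A more portable sketch spells the digits out and builds the pattern from strings; the function name phones_awk and the variables d3, d4 and re are hypothetical.

```shell
# phones_awk: same filter as above, using only basic regex syntax.
# (hypothetical helper; avoids \s and {n}, which some awk versions lack)
phones_awk() {
    awk '
    BEGIN {
        d3 = "[0-9][0-9][0-9]"      # three digits, written out
        d4 = d3 "[0-9]"             # four digits
        # "\\(" in an awk string becomes \( in the regular expression
        re = "^(\\(" d3 "\\) |" d3 "-)" d3 "-" d4 "$"
    }
    $0 ~ re'
}

printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' | phones_awk
```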

Question 3: Word Frequency

Write a bash script to count the frequency of each word in a text file words.txt.
For simplicity, you can assume:
words.txt contains only lowercase letters and space (' ') characters.
Each word consists of only lowercase letters.
Words are separated by one or more space characters.

Example:
Suppose words.txt contains:
the day is sunny the the
the sunny is is

Your script should output (in descending order of word frequency):
the 4
is 3
sunny 2
day 1

Method 1:

Approach:
Extract each word: NF is the number of fields in the current line; a for loop prints the fields one per line
Group identical words together: sort
Merge identical words and prefix each with its count: uniq -c (c stands for count)
Sort by count in descending order: sort -k1,1nr (-k1,1 sorts on the first field, n compares numerically, r reverses to descending order)
Print the word first, then the count: awk '{print $2,$1}'
Code:

awk '{for(i=1;i<=NF;i++){print $i}}' words.txt | sort | uniq -c | sort -k1,1nr | awk '{print $2,$1}'
Method 2:

Approach:
xargs splits its input on whitespace; -n 1 outputs one word per line (GNU xargs also accepts -d to specify the delimiter)
uniq only merges adjacent duplicates, so identical words must be brought together with sort first; uniq -c then prefixes each word with its number of occurrences
sort -nr: -n compares the counts as numbers (compared as strings, 11 would sort before 2) and -r reverses the order to descending
Code:

cat words.txt | xargs -n 1 | sort | uniq -c | sort -nr | awk '{print $2" "$1}'
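
The difference between string and numeric comparison is easy to check on two made-up counts. A minimal sketch (the variable names are mine):

```shell
# Two counts, one per line, as they might appear after uniq -c
string_order=$(printf '2\n11\n' | sort -r | paste -sd ' ' -)     # compared as text
numeric_order=$(printf '2\n11\n' | sort -nr | paste -sd ' ' -)   # compared as numbers

echo "sort -r : $string_order"
echo "sort -nr: $numeric_order"
```

With sort -r the text "2" ranks above "11" because only the first characters are compared; sort -nr puts 11 first.
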
Method 3:

Read the contents of words.txt
Replace spaces with newlines so that each word is on its own line
Sort alphabetically
Use uniq -c to count frequencies
Sort numerically in descending order
Filter out empty entries (left by consecutive spaces) and print the results
Code:

cat words.txt | tr ' ' '\n' | sort | uniq -c | sort -nr | awk '$2!="" {print $2" "$1}'
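
As a further variant, not taken from the answers above, the counting can be done inside awk with an associative array, leaving only the final ordering to sort. The function name word_freq is hypothetical, and breaking ties alphabetically is my own choice.

```shell
# word_freq FILE: print "word count" pairs, highest count first.
# (hypothetical helper name used only for illustration)
word_freq() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }       # tally every field
         END { for (w in count) print w, count[w] }' "$1" |
        sort -k2,2nr -k1,1                               # by count, then by word
}

words=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$words"
word_freq "$words"
```

Because awk splits fields on runs of whitespace, consecutive spaces produce no empty words and no extra filtering step is needed.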

Question 4: Transpose File

Given a text file file.txt, transpose its contents.
You can assume that every row has the same number of columns and that fields are separated by ' ' (a space).
Example:
Suppose file.txt contains:
name age
alice 21
ryan 30

Should output:
name alice ryan
age 21 30

Here is a particularly clear explanation from another blog post, reproduced below:

awk command:

awk is a report generator with powerful text formatting capabilities. It was created by Alfred Aho, Peter Weinberger and Brian Kernighan; the name awk is formed from the first letters of their surnames.

The basic syntax of awk is awk [options] 'Pattern{Action}' file.

Starting with the simplest command, omit [options] and Pattern, and set Action to a plain print:

$ echo abc > test.txt
$ awk '{print}' test.txt

abc
This is the simplest use of awk: it processes each line of test.txt, and the action applied to each line is print.
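
For contrast, here is a minimal sketch of the other parts of the syntax on a made-up three-line file: a Pattern with no Action, and a Pattern together with an Action. The file contents and the variable t are assumptions for illustration.

```shell
t=$(mktemp)
printf 'abc\nabd\nxyz\n' > "$t"

# Pattern only: the default action prints every line matching /^ab/
awk '/^ab/' "$t"

# Pattern and Action: for line 3 only, print the line number and the text
awk 'NR == 3 {print NR ": " $0}' "$t"
```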

Solution:
awk '{
    for (i=1;i<=NF;i++){
        if (NR==1){  
            res[i]=$i
        }
        else{
            res[i]=res[i]" "$i
        }
    }
}END{
    for(j=1;j<=NF;j++){
        print res[j]
    }
}' file.txt

Explanation:
awk processes text files line by line. Execution proceeds as follows:

  1. First, run the {Action} attached to BEGIN, which acts as a header
  2. Then run the main {Action} on every line of the file
  3. Finally, run the {Action} attached to END

Two frequently used awk built-in variables: NF is the number of fields in the current line; NR is the number of the line currently being processed.
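
The three phases and the two variables can be observed together in a short sketch on the sample input. The function name show_phases and the printed labels are my own.

```shell
# show_phases FILE: print a marker from each phase of an awk run.
# (hypothetical helper name used only for illustration)
show_phases() {
    awk 'BEGIN { print "start" }                            # before any input
         { print "line " NR " has " NF " fields" }          # once per line
         END { print "total lines: " NR }' "$1"             # after the last line
}

f=$(mktemp)
printf 'name age\nalice 21\nryan 30\n' > "$f"
show_phases "$f"
```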

Note that this is a transposition: if the original text has m rows and n columns (fields), the transposed text has n rows and m columns, so each field position of the original text corresponds to one row of the new text. We can use the array res to store the new text, with each row of the new text saved as one element of res.

In the main block (before END) we traverse file.txt line by line: on the first line, each field is stored in res in order; from the second line on, each field is appended to the end of the corresponding element, preceded by a space.

Once the whole file has been processed, the result still has to be printed: the END block traverses the array and prints each row. Note that print appends a newline, whereas printf does not.
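
The solution reads NF inside END, which assumes awk still holds the field count of the last line at that point. As a hedged variant, the sketch below records the column count in its own variable while reading. The names transpose and ncols are hypothetical.

```shell
# transpose FILE: print FILE with rows and columns swapped.
# (hypothetical helper; ncols tracks the widest row seen so far)
transpose() {
    awk '{
        if (NF > ncols) ncols = NF
        for (i = 1; i <= NF; i++)
            # first line starts each output row, later lines append to it
            res[i] = (NR == 1) ? $i : res[i] " " $i
    }
    END { for (j = 1; j <= ncols; j++) print res[j] }' "$1"
}

input=$(mktemp)
printf 'name age\nalice 21\nryan 30\n' > "$input"
transpose "$input"
```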

Posted by ruzztec on Wed, 25 May 2022 10:59:59 +0300