[Must learn] awk, one of the "Three Musketeers" of Linux: do you dare not learn it?

awk is a programming language for processing text based on pattern matching. Together with sed and grep, it is commonly known as one of the "Three Musketeers" of Linux. Learning awk gives you one more option for processing text on the Linux command line. This article focuses on how to use it: after reading it, you should roughly understand how awk works and be able to use it for simple tasks.

Terminology

awk treats a text file as a simple database made up of records and fields. By default, each line is one record, i.e. the record separator is \n; it can be changed through the built-in variable RS.
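A minimal sketch of changing the record separator, assuming a POSIX awk (the -v option assigns a variable before any input is read). The input below uses ";" instead of newlines between records:

```shell
# Treat ";" as the record separator instead of the default newline.
# printf writes no trailing newline here, so the last record is just "c".
printf 'a;b;c' | awk -v RS=';' '{print NR ": " $0}'
# Output:
#   1: a
#   2: b
#   3: c
```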

Each record is in turn divided into fields. By default, fields are separated by one or more spaces or tabs.
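For instance, with the default separator, leading blanks are skipped and a run of spaces or tabs counts as a single separator, so the line below has three fields. NF, which holds the number of fields, is covered in the Variables section:

```shell
# Leading blanks are ignored; runs of spaces/tabs act as one separator.
printf '  alpha   beta\tgamma\n' | awk '{print NF ": " $1 "," $2 "," $3}'
# Output:
#   3: alpha,beta,gamma
```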

1. Basic usage

Like the other Linux commands we use every day, awk is invoked with a fixed format:

# usage format
awk '{action to execute}' file

# e.g.:
[root@iamshuaidi ~]# awk '{print $0}' test.txt
my first language:Java
second languange:python
third language:C


Here, print prints its arguments, $0 stands for the entire record, and test.txt is the input file. So

awk '{print $0}' test.txt

prints every record (line) of test.txt.
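If you want to try the commands yourself, the sample file can be recreated as below. Its contents are assumed from the output shown above, spelling included. Also note that print with no arguments prints $0:

```shell
# Recreate the sample file: printf '%s\n' writes each argument on its own line.
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

# Both commands print every record unchanged.
awk '{print $0}' test.txt
awk '{print}' test.txt
```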

As mentioned above, a record is made up of fields, and by default fields are separated by spaces or tabs. The following prints the first field of each record:

# print the first field of each line
[root@iamshuaidi ~]# awk '{print $1}' test.txt

$0 is the whole record, while $1, $2, $3, ... are the first, second, third, ... fields of that record.
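A sketch of what $1 and $2 contain for the sample file (recreated here so the snippet runs on its own). With the default separator, the ":" is not a separator, so it stays inside a field:

```shell
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

# First blank-separated field of each record
awk '{print $1}' test.txt
# Output:
#   my
#   second
#   third

# Second blank-separated field of each record
awk '{print $2}' test.txt
# Output:
#   first
#   languange:python
#   language:C
```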

Spaces and tabs are only the default field separator, which means you can also specify one explicitly. Let's use ":" as the separator:

# print the second field
[root@iamshuaidi ~]# awk -F ':' '{print $2}' test.txt

The -F option sets the field separator: whenever you need a separator other than blanks, pass it with -F.
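With ":" as the separator, each sample line splits into exactly two fields. The sketch below also shows an equivalent form that assigns the built-in field separator variable FS (listed in the Variables section) with -v:

```shell
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

# Split on ":" and print the second field
awk -F ':' '{print $2}' test.txt
# Output:
#   Java
#   python
#   C

# Same result: set FS directly before input is read
awk -v FS=':' '{print $2}' test.txt
```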

2. Conditions

When printing text, you can add a condition so that the action runs only for records that satisfy it. The format is as follows:

awk [options] 'condition {action to execute}' file

For example, with ":" as the separator, the following prints the second field of every record whose second field is "Java":

# print the second field of records whose second field is "Java"
[root@iamshuaidi ~]# awk -F ':' '$2 == "Java" {print $2}' test.txt

Print the second field of the odd-numbered lines:

# print the second field of odd-numbered records
[root@iamshuaidi ~]# awk -F ':' 'NR % 2 == 1 {print $2}' test.txt

Here NR is a built-in variable holding the number of the record currently being processed: 1 for the first record, 2 for the second, and so on.
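Running these condition examples on the sample file gives the output below; records for which the condition is false are simply skipped. The last command also shows that a condition with no action prints the whole matching record:

```shell
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

# Only the first record has "Java" as its second field
awk -F ':' '$2 == "Java" {print $2}' test.txt
# Output:
#   Java

# Records 1 and 3 are odd-numbered
awk -F ':' 'NR % 2 == 1 {print $2}' test.txt
# Output:
#   Java
#   C

# No action: the default is to print the whole record ($0)
awk 'NR % 2 == 0' test.txt
# Output:
#   second languange:python
```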

3. Conditional Statements

As in other programming languages, awk provides control-flow statements such as if/else and while.

For example, print the second field of every record from the second record onward:

[root@iamshuaidi ~]# awk '{if(NR > 1) print $2}' test.txt

Note that the field separator here is the default one (blanks), and that the if statement is written inside the braces {} of the action.
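A sketch of the if example's output on the sample file, followed by a while loop (not part of the original examples) that walks through every field of each record using NF, the field count:

```shell
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

# Second field of records 2 and onward
awk '{if (NR > 1) print $2}' test.txt
# Output:
#   languange:python
#   language:C

# Print "record.field: value" for every field; i restarts at 1 for each record
awk '{ i = 1; while (i <= NF) { print NR "." i ": " $i; i++ } }' test.txt
# Output:
#   1.1: my
#   1.2: first
#   1.3: language:Java
#   2.1: second
#   2.2: languange:python
#   3.1: third
#   3.2: language:C
```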

Let's look at another example:

# If the first field is greater than "s", print the first field, otherwise print the second field
[root@iamshuaidi ~]# awk '{if($1 > "s") print $1; else print $2}' test.txt


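This prints the first field when it is greater than "s" and the second field otherwise. Because "s" is a string constant, the comparison is a string comparison in dictionary order: "my" sorts before "s", while "second" and "third" sort after it.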
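A sketch of the rule described in the comment above (first field greater than "s") on the sample file. Only "my" fails the test, so the first line falls back to its second field:

```shell
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

# String comparison: "my" < "s", while "second" and "third" are > "s"
awk '{if ($1 > "s") print $1; else print $2}' test.txt
# Output:
#   first
#   second
#   third
```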

4. Function

awk provides a number of built-in functions. Commonly used ones include:

tolower(s): converts a string to lowercase
toupper(s): converts a string to uppercase
length(s): returns the length of a string
substr(s, m, n): returns the substring of s that starts at position m and is at most n characters long (n is optional)
sqrt(x): returns the square root of x
rand(): returns a random number between 0 (inclusive) and 1 (exclusive)

For example, to print the first field in uppercase:

# Convert the first field to uppercase output
[root@iamshuaidi ~]# awk '{print toupper($1)}' test.txt
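A few of the other functions applied to the sample file, as a sketch. Positions in substr start at 1; rand() is left out because its output is random. The last command uses a BEGIN block, which runs before any input is read, so it needs no file:

```shell
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

# Length of the first field, and its first three characters
awk '{print length($1), substr($1, 1, 3)}' test.txt
# Output:
#   2 my
#   6 sec
#   5 thi

# Lowercase the part after ":"
awk -F ':' '{print tolower($2)}' test.txt
# Output:
#   java
#   python
#   c

# Numeric function without any input file
awk 'BEGIN {print sqrt(16)}'
# Output:
#   4
```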

5. Variables

NR, mentioned earlier, is one of awk's built-in variables. Commonly used built-in variables are:

NR: number of the record (line) currently being processed
NF: number of fields in the current record
FILENAME: name of the current input file
FS: input field separator; default is blanks (spaces and tabs)
RS: input record separator, used to split the input into records; default is newline
OFS: output field separator, inserted between fields when printing; default is a single space
ORS: output record separator, written after each record when printing; default is newline
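A sketch showing several of these variables on the sample file. OFS is only inserted where print's arguments are separated by commas, which is why the second command lists $1, $2 with a comma:

```shell
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

# File name, record number and field count (with ":" as the separator)
awk -F ':' '{print FILENAME, NR, NF}' test.txt
# Output:
#   test.txt 1 2
#   test.txt 2 2
#   test.txt 3 2

# Custom output separators: " -> " between fields, ";" after each record
awk -F ':' -v OFS=' -> ' -v ORS=';' '{print $1, $2}' test.txt
# Output (a single line, no trailing newline):
#   my first language -> Java;second languange -> python;third language -> C;
```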

For example, since NF is the number of fields, $NF is the last field, so the following prints the last field of each record:

[root@iamshuaidi ~]# awk '{print $NF}' test.txt
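On the sample file, $NF picks the last blank-separated field; combined with -F ':' it gives the language name. $(NF-1), the second-to-last field, works the same way (the parentheses are needed so the subtraction happens before $ is applied):

```shell
printf '%s\n' 'my first language:Java' 'second languange:python' 'third language:C' > test.txt

awk '{print $NF}' test.txt
# Output:
#   language:Java
#   languange:python
#   language:C

awk -F ':' '{print $NF}' test.txt
# Output:
#   Java
#   python
#   C

awk '{print $(NF-1)}' test.txt
# Output:
#   first
#   second
#   third
```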

The NR variable from earlier is also handy, for example for numbering lines:

# number each line, which makes the output easier to read
[root@iamshuaidi ~]# awk '{print NR ". "  $0}' test.txt
1. my first language:Java
2. second languange:python
3. third language:C

That's about it for this article. It is an introduction that leaves out many details and only shows the basics of using awk. For more specific usage, look up the relevant features for whatever you want to accomplish.


Posted by hemantraijain on Thu, 05 May 2022 11:13:01 +0300