Analyzing Why Swap Usage Rises on Linux

Machine configuration: 2 CPUs, 8 GB of memory

Install the required tools first. sar is part of the sysstat package (for example, yum install sysstat); cachetop, used later, is part of the bcc tools.

Run the free command in the terminal to check the usage of Swap.

$ free
             total        used        free      shared  buff/cache   available
Mem:        8169348      331668     6715972         696     1121708     7522896
Swap:             0           0           0

The free output shows that the size of Swap is 0, which means this machine has no Swap configured. To continue with the case, you need to configure and enable Swap first. If Swap is already enabled in your environment, you can skip the following steps.

Linux supports two types of Swap: the Swap partition and the Swap file. Taking a Swap file as an example, run the following commands as root in the first terminal to enable Swap. The Swap file configured here is 8 GB:

# Create the Swap file
$ fallocate -l 8G /mnt/swapfile
# Restrict permissions so that only the root user can access the file
$ chmod 600 /mnt/swapfile
# Format the file as Swap space
$ mkswap /mnt/swapfile
# Turn on Swap
$ swapon /mnt/swapfile

Then run the free command again to confirm that Swap was configured successfully:

$ free
             total        used        free      shared  buff/cache   available
Mem:        8169348      331668     6715972         696     1121708     7522896
Swap:       8388604           0     8388604
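
Besides free, the kernel lists the active Swap areas in /proc/swaps. The following is a minimal sketch, assuming the usual /proc/swaps layout with the size in the third column; swap_total_kb is a hypothetical helper name, and the sample input is only an illustration built from the Swap size shown above.

```shell
# swap_total_kb: sum the Size column (kB) of /proc/swaps-style input on stdin.
# (Hypothetical helper name, used only for this illustration.)
swap_total_kb() {
    # Skip the header line, then add up the third column (Size).
    awk 'NR > 1 { total += $3 } END { printf "%d\n", total }'
}

# Illustrative input in /proc/swaps format, using the Swap size from the free output.
swap_total_kb <<'EOF'
Filename                                Type            Size            Used            Priority
/mnt/swapfile                           file            8388604         0               -2
EOF
```

On a live system, swap_total_kb < /proc/swaps should print the same total that free reports in the Swap row.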

Now, in the free output, the total and free Swap space have both changed from 0 to 8 GB, which indicates that Swap has been enabled. Next, in the first terminal, run the following dd command to simulate reading a large file:

# Writing to the null device /dev/null means that only disk read requests are actually issued
# Replace /dev/sda1 with a disk partition that exists on your machine
$ dd if=/dev/sda1 of=/dev/null bs=1G count=2048

Then, in the second terminal, run the sar command to watch how the memory metrics change:

# Output one set of data every second
# -r shows memory usage, -S shows Swap usage
$ sar -r -S 1
04:39:56    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
04:39:57      6249676   6839824   1919632     23.50    740512     67316   1691736     10.22    815156    841868         4

04:39:56    kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
04:39:57      8388604         0      0.00         0      0.00

04:39:57    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
04:39:58      6184472   6807064   1984836     24.30    772768     67380   1691736     10.22    847932    874224        20

04:39:57    kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
04:39:58      8388604         0      0.00         0      0.00


04:44:06    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
04:44:07       152780   6525716   8016528     98.13   6530440     51316   1691736     10.22    867124   6869332         0

04:44:06    kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
04:44:07      8384508      4096      0.05        52      1.27

The sar output consists of two tables: the first shows memory usage and the second shows Swap usage. The kb prefix on each metric name indicates that the unit is KB.

We have already seen most of these metrics. Let's briefly introduce the new ones.

kbcommit is the amount of memory required by the current system workload. It is an estimate of how much memory is needed to guarantee that the system does not run out of memory. %commit is this value as a percentage of total memory (RAM plus Swap).

kbactive is active memory, that is, recently used memory, which is generally not reclaimed by the system.

kbinact is inactive memory, that is, memory that is not accessed often and may be reclaimed by the system.
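
The percentage columns in the sar output can be reproduced approximately from the raw KB values. The sketch below assumes the Mem and Swap totals reported by free earlier in this case; pct is a hypothetical helper name, not part of sysstat.

```shell
# pct: print part/total as a percentage with two decimals.
# (Hypothetical helper, used only for this illustration.)
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", part / total * 100 }'
}

mem_total=8169348    # Mem total from free, in KB
swap_total=8388604   # Swap total from free, in KB

# %memused = kbmemused / total memory
pct 1919632 "$mem_total"                   # first sample, prints 23.50
pct 8016528 "$mem_total"                   # last sample, prints 98.13

# %commit = kbcommit / (total memory + total Swap)
pct 1691736 $((mem_total + swap_total))    # prints 10.22

# %swpused = kbswpused / total Swap
pct 4096 "$swap_total"                     # prints 0.05
```

The results match the %memused, %commit and %swpused columns above, which also confirms that %commit is measured against RAM plus Swap rather than RAM alone.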

Now that the meaning of these metrics is clear, we can analyze the behavior using the actual values. The overall memory utilization (%memused) rises from 23% to 98%, and most of that memory is occupied by buffers (kbbuffers). Specifically:

At the beginning, free memory (kbmemfree) keeps decreasing while buffers (kbbuffers) keep increasing, which shows that free memory is continuously being allocated to buffers.

After some time, very little free memory remains and buffers occupy most of the memory. At this point, Swap usage starts to increase gradually, while buffers and free memory fluctuate only within a small range.

To find out which process is driving the growth in buffers, we also need to look at cache usage per process. cachetop does exactly that.

In the second terminal, press Ctrl+C to stop the sar command, and then run the following cachetop command to observe the usage of the cache:

$ cachetop 5
12:28:28 Buffers MB: 6349 / Cached MB: 87 / Sort: HITS / Order: ascending
PID      UID      CMD              HITS     MISSES   DIRTIES  READ_HIT%  WRITE_HIT%
   18280 root     python                 22        0        0     100.0%       0.0%
   18279 root     dd                  41088    41022        0      50.0%      50.0%

The cachetop output shows that the read and write hit rates of the dd process are only 50%, and that it missed the cache 41022 times (MISSES, counted in pages). This confirms that the dd process started at the beginning of the case is what drives the growth in buffers.
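
As a rough cross-check, the overall hit ratio and the amount of data behind the misses can be computed from the HITS and MISSES columns. This is only a sketch: hit_ratio and pages_to_mb are hypothetical helpers, a 4 KB page size is assumed, and cachetop derives its own READ_HIT% and WRITE_HIT% columns internally, so they need not follow exactly this formula.

```shell
# hit_ratio: percentage of cache accesses that were hits.
hit_ratio() {
    awk -v hits="$1" -v misses="$2" \
        'BEGIN { printf "%.1f\n", hits / (hits + misses) * 100 }'
}

# pages_to_mb: convert a page count to MB (page size in KB, assumed to be 4).
pages_to_mb() {
    awk -v pages="$1" -v page_kb="${2:-4}" \
        'BEGIN { printf "%.1f\n", pages * page_kb / 1024 }'
}

hit_ratio 41088 41022   # HITS and MISSES of the dd row, prints 50.0
pages_to_mb 41022       # data read past the cache in one interval, prints 160.2
```

With the 5-second interval used here, about 160 MB of misses corresponds to roughly 32 MB per second being pulled from disk into buffers.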

To understand why Swap usage still rises, you need to further observe free memory, the memory watermarks, and the activity of anonymous pages and file pages in /proc/zoneinfo.

In the second terminal, press Ctrl+C to stop the cachetop command. Then run the following command to observe how these metrics change in /proc/zoneinfo:

# -d highlights the fields that change between updates
# -A 15 prints the matching Normal line and the 15 lines that follow it
$ watch -d grep -A 15 'Normal' /proc/zoneinfo
Node 0, zone   Normal
  pages free     21328
        min      14896
        low      18620
        high     22344
        spanned  1835008
        present  1835008
        managed  1796710
        protection: (0, 0, 0, 0, 0)
      nr_free_pages 21328
      nr_zone_inactive_anon 79776
      nr_zone_active_anon 206854
      nr_zone_inactive_file 918561
      nr_zone_active_file 496695
      nr_zone_unevictable 2251
      nr_zone_write_pending 0

You can see that free memory (pages free) keeps fluctuating within a small range. When it drops below the low watermark (pages low), it suddenly jumps to a value above the high watermark (pages high).

Combined with the changes in free memory and buffers seen earlier with sar, we can deduce that these fluctuations come from a cycle of memory reclaim and cache reallocation. When free memory drops below the low watermark, the system reclaims some cache and anonymous memory to increase free memory. Reclaiming cache reduces the buffers reported by sar, while reclaiming anonymous memory increases Swap usage. Then, as dd continues, the freed memory is allocated to the cache again, so free memory decreases and buffers increase.
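
The comparison between free pages and the watermarks can be sketched as a small filter over /proc/zoneinfo. This is an illustration only: zone_watermarks is a hypothetical helper, the status labels are my own wording, and the sample input reuses the values from the watch output above.

```shell
# zone_watermarks: read /proc/zoneinfo-style input on stdin and, for the first
# Normal zone, print free/min/low/high (in pages) plus a rough status.
zone_watermarks() {
    awk '
        /^Node/            { in_normal = ($4 == "Normal") }   # start of a zone block
        !in_normal || done { next }                           # skip other zones
        $1 == "pages" && $2 == "free" { free = $3 }
        $1 == "min"        { min = $2 }
        $1 == "low"        { low = $2 }
        $1 == "high"       {
            high = $2; done = 1
            status = (free + 0 < low + 0) ? "below low watermark" : "at or above low watermark"
            printf "free=%d min=%d low=%d high=%d (%s)\n", free, min, low, high, status
        }
    '
}

# Sample input taken from the watch output in this case.
zone_watermarks <<'EOF'
Node 0, zone   Normal
  pages free     21328
        min      14896
        low      18620
        high     22344
EOF
```

Running zone_watermarks < /proc/zoneinfo repeatedly while dd is active should show free dipping toward low and then jumping back up toward high, which is the reclaim cycle described above.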

There is another interesting phenomenon. If you run dd and sar several times, you may find that across runs Swap is sometimes used more, while at other times Swap is used less but the buffers fluctuate more. In other words, when the system reclaims memory, it sometimes reclaims more file pages and sometimes more anonymous pages. The system does not appear to have a strong preference for either type. This should remind you of swappiness, mentioned in the last lesson, which is the option that adjusts how strongly the system prefers to reclaim each type of memory.

In the second terminal, press Ctrl+C to stop the watch command, and then run the following command to view the swappiness setting:


$ cat /proc/sys/vm/swappiness

swappiness shows the default value of 60, which is a fairly neutral setting, so the system chooses which type to reclaim based on actual conditions, for example inactive anonymous pages or inactive file pages.
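
If you want to experiment with this setting, the value can be read and changed at runtime. The sketch below is only an illustration: show_swappiness is a hypothetical helper, and the value 10 in the comment is an arbitrary example, not a recommendation.

```shell
# show_swappiness: print the current swappiness value.
# The optional path argument only exists so the helper can be tried against a
# copy of the file; by default it reads the live kernel setting.
show_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# To make the kernel less inclined to swap anonymous pages until the next
# reboot (requires root), a lower value can be written, for example:
#   sysctl -w vm.swappiness=10
# Setting it back to 60 restores the default seen in this case.
```

A lower value makes the kernel favor reclaiming file pages over swapping out anonymous pages, but it does not disable Swap, so rerunning the dd test with a different value is a simple way to see the tendency shift.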

At this point, we have found the root cause of the rise in Swap usage.

The proc file system can also show how much of each process's virtual memory has been swapped out. The value is stored in the VmSwap field of /proc/pid/status (run man proc to look up the meaning of the other fields).
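
For a single process, the field can be pulled out directly. This is a minimal sketch: vmswap_kb is a hypothetical helper, and the sample input is an illustrative excerpt that reuses the dockerd values reported in this case.

```shell
# vmswap_kb: print the VmSwap value (kB) from /proc/<pid>/status-style input
# on stdin. Prints 0 when the field is absent, as it is for kernel threads.
vmswap_kb() {
    awk '$1 == "VmSwap:" { kb = $2 } END { printf "%d\n", kb }'
}

# Illustrative excerpt of a status file.
vmswap_kb <<'EOF'
Name:   dockerd
Pid:    2226
VmSwap:    10728 kB
EOF
```

On a live system, vmswap_kb < /proc/self/status reports the value for the current shell; replace self with any process ID of interest.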

Run the following command in the second terminal to view the processes that use the most Swap.

# Sort processes by VmSwap usage and print the process name, process ID and Swap usage
$ for file in /proc/*/status ; do awk '/VmSwap|Name|^Pid/{printf "%s %s", $2, $3}END{ print ""}' "$file"; done | sort -k 3 -n -r | head
dockerd 2226 10728 kB
docker-containe 2251 8516 kB
snapd 936 4020 kB
networkd-dispat 911 836 kB
polkitd 1004 44 kB

As you can see, the dockerd and docker-containe processes use the most Swap. When dockerd later accesses memory that has been swapped out to disk, the access will be relatively slow.

This also shows that although the cache is reclaimable memory, in scenarios such as copying large files the system still uses the Swap mechanism to reclaim anonymous memory, rather than reclaiming only the file pages that occupy most of the memory.

Finally, if you configured Swap at the beginning, don't forget to turn it off at the end of the case. You can do so by running the following command:

$ swapoff -a

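In fact, turning Swap off and back on is also a common way to clear Swap space. Because swapon -a only enables the Swap areas listed in /etc/fstab, a Swap file enabled manually as in this case has to be turned on again with swapon /mnt/swapfile instead. For Swap configured in /etc/fstab: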

$ swapoff -a && swapon -a 
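
Turning Swap off forces everything held in Swap back into memory, so it is worth checking first that enough memory is available. The following is a rough heuristic sketch, not a guarantee: can_swapoff is a hypothetical helper, and the sample input reuses the kbavail and kbswpfree values from the last sar sample above.

```shell
# can_swapoff: read /proc/meminfo-style input on stdin and report whether
# MemAvailable exceeds the amount of data currently held in Swap.
can_swapoff() {
    awk '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { free  = $2 }
        END {
            used = total - free
            verdict = (avail > used) ? "ok" : "insufficient"
            printf "swap_used_kb=%d mem_available_kb=%d %s\n", used, avail, verdict
        }
    '
}

# Sample input built from the last sar sample in this case.
can_swapoff <<'EOF'
MemAvailable:    6525716 kB
SwapTotal:       8388604 kB
SwapFree:        8384508 kB
EOF
```

On a live system, can_swapoff < /proc/meminfo gives the same comparison for the current state; if it reports insufficient, free up memory before running swapoff.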


Posted by integravtec on Wed, 18 May 2022 16:30:28 +0300