MVCC for PostgreSQL -- In-page vacuum and HOT updates

in-page vacuum and HOT update, both techniques can be called optimization. They are important but not actually included in the documentation.

Perform in-page vacuum during regular updates

When a page is accessed for a read or update, PostgreSQL can perform a quick in-page vacuum if it knows the page is out of space. Happens in the following two situations:

1. A previous update on this page could not find enough space to allocate a new row version in the same page. This will be recorded in the page header, and the page will be vacuum ed next time.

2. The page has exceeded the percentage set by fillfactor. In this case, the vacuum is performed immediately without delaying until the next time.

fillfactor is a storage parameter that can be defined for tables (and indexes). PostgresSQL will insert a new row in a page only if the page is less than the fillfactor percentage. The remaining space is reserved for new tuples produced by the update. The default value for a table is 100, i.e. no space is reserved (the default value for an index is 90).

in-page vacuum removes tuples that are not visible in any snapshot (those are outside the transaction scope of the database, which was discussed last), but does so strictly within a table page. Since the pointer to the vacuum ed tuple can be referenced from the index, the pointer is not freed, the index is in another page. in-page vacuum never exceeds one table page, but works very quickly.

For the same reason, the free space map is also not updated. This also reserves extra space for updates instead of inserts. The visibility map is also not updated.

The fact that pages can be vacuum ed during reads means that SELECT queries may change pages. In addition to the delayed change of hint bits discussed earlier, this is another case.

Let's see an example of how this works.

=> CREATE TABLE hot(id integer, s char(2000)) WITH (fillfactor = 75);
=> CREATE INDEX hot_id ON hot(id);
=> CREATE INDEX hot_s ON hot(s);

If the s column stores only Latin characters, each row version will take up 2004 bytes plus the 24 bytes of the header. We set the fillfactor storage parameter to 75%, which will reserve enough space for three rows.

To easily view the contents of the table page, let's recreate an already familiar function by adding two more fields to the output:

=> CREATE FUNCTION heap_page(relname text, pageno integer)
RETURNS TABLE(ctid tid, state text, xmin text, xmax text, hhu text, hot text, t_ctid tid)
AS $$
SELECT (pageno,lp)::text::tid AS ctid,
       CASE lp_flags
         WHEN 0 THEN 'unused'
         WHEN 1 THEN 'normal'
         WHEN 2 THEN 'redirect to '||lp_off
         WHEN 3 THEN 'dead'
       END AS state,
       t_xmin || CASE
         WHEN (t_infomask & 256) > 0 THEN ' (c)'
         WHEN (t_infomask & 512) > 0 THEN ' (a)'
         ELSE ''
       END AS xmin,
       t_xmax || CASE
         WHEN (t_infomask & 1024) > 0 THEN ' (c)'
         WHEN (t_infomask & 2048) > 0 THEN ' (a)'
         ELSE ''
       END AS xmax,
       CASE WHEN (t_infomask2 & 16384) > 0 THEN 't' END AS hhu,
       CASE WHEN (t_infomask2 & 32768) > 0 THEN 't' END AS hot,
       t_ctid
FROM heap_page_items(get_raw_page(relname,pageno))
ORDER BY lp;
$$ LANGUAGE SQL;

Let's also create a function to view the index page:

=> CREATE FUNCTION index_page(relname text, pageno integer)
RETURNS TABLE(itemoffset smallint, ctid tid)
AS $$
SELECT itemoffset,
       ctid
FROM bt_page_items(relname,pageno);
$$ LANGUAGE SQL;

Let's take a look at how in-page vacuum works. To do this, we insert a line and modify it a few times:

=> INSERT INTO hot VALUES (1, 'A');
=> UPDATE hot SET s = 'B';
=> UPDATE hot SET s = 'C';
=> UPDATE hot SET s = 'D';

There are now four tuples in the page:

=> SELECT * FROM heap_page('hot',0);
 ctid  | state  |   xmin   |   xmax   | hhu | hot | t_ctid 
-------+--------+----------+----------+-----+-----+--------
 (0,1) | normal | 3979 (c) | 3980 (c) |     |     | (0,2)
 (0,2) | normal | 3980 (c) | 3981 (c) |     |     | (0,3)
 (0,3) | normal | 3981 (c) | 3982     |     |     | (0,4)
 (0,4) | normal | 3982     | 0 (a)    |     |     | (0,4)
(4 rows)

As expected, we just passed the fillfactor threshold. This is clear from the difference between the pagesize value and the upper limit: it exceeds a threshold equal to 75% of the page size, which is 6144 bytes.

=> SELECT lower, upper, pagesize FROM page_header(get_raw_page('hot',0));
 lower | upper | pagesize 
-------+-------+----------
    40 |    64 |     8192
(1 row)

Therefore, the next time the page is accessed, in-page vacuum must occur. Let's check.

=> UPDATE hot SET s = 'E';
=> SELECT * FROM heap_page('hot',0);
 ctid  | state  |   xmin   | xmax  | hhu | hot | t_ctid 
-------+--------+----------+-------+-----+-----+--------
 (0,1) | dead   |          |       |     |     | 
 (0,2) | dead   |          |       |     |     | 
 (0,3) | dead   |          |       |     |     | 
 (0,4) | normal | 3982 (c) | 3983  |     |     | (0,5)
 (0,5) | normal | 3983     | 0 (a) |     |     | (0,5)
(5 rows)

Clear all dead tuples (0,1), (0,2) and (0,3); after that, add a new tuple (0,5) in the freed space.

The tuples that survived are physically moved to the upper addresses of the page, so all free space is represented by a contiguous area. The value of the pointer changes accordingly. So there is no problem with free space fragmentation in the page.

Since the pointers to the vacuum ed tuples are referenced from the index page, they cannot be freed. Let's look at the first page of the hot_s index (since the zeroth page is occupied by meta information):

=> SELECT * FROM index_page('hot_s',1);
 itemoffset | ctid  
------------+-------
          1 | (0,1)
          2 | (0,2)
          3 | (0,3)
          4 | (0,4)
          5 | (0,5)
(5 rows)

We see the same result in another index:

=> SELECT * FROM index_page('hot_id',1);
 itemoffset | ctid  
------------+-------
          1 | (0,5)
          2 | (0,4)
          3 | (0,3)
          4 | (0,2)
          5 | (0,1)
(5 rows)

You may notice that pointers to table rows follow here in reverse order, but that makes no difference since all tuples have the same value: id=1. But in the previous index, the pointers are ordered by the value of s, which is essential.

With indexed access, PostgreSQL can obtain (0,1), (0,2) or (0,3) as tuple identifiers. It will then try to get the appropriate row version from the table page, but due to the "dead" status of the pointer, PostgreSQL will see that version no longer exists and will ignore it. (Actually, once the version of the table row is found to be unavailable, PostgreSQL will change the pointer state in the index page so that the table page is no longer accessed)

Importantly, in-page vacuum can only work within a table page, not index pages.

HOT update

Why is there no benefit to storing references to all row versions in the index?

First, for any change to the row, all indexes created for the table must be updated: after a new row version is created, it needs to be referenced. And we still need to do this even if we change an unindexed field. This is obviously not very efficient.

Second, the index accumulates references to historical tuples, which are then cleaned up along with the tuples themselves (we'll discuss how this is done later).

Also, B-trees in PostgreSQL have implementation details. If the index page does not have enough space to insert a new row, the page will be split in two and all data will be distributed between them. This is called page splitting. However, when a row is deleted, the two index pages are not merged into one. Therefore, the index size may not be reduced even if a considerable portion of the data is removed.

Naturally, the more indexes you create on a table, the more complexity you encounter.

However, if you change a value in a column that is not indexed at all, it doesn't make sense to create an extra B-tree row containing the same key value. This is how the so-called HOT update (heap-tuple-only update) optimization works.

During this update, the index page contains only one row that refers to the first version of the row in the table page. Inside the table page, a chain of tuples has been organized:

· The updated row in the chain is marked with the heap hot updated bit. Rows not referenced by an index are marked with the heap only tupe bit. · Typically, row versions are linked via the ctid field.

If, during an index scan, PostgreSQL accesses a table page and finds a tuple marked Heap Hot Updated, pg knows that it shouldn't stop, but must follow the HOT chain, considering each tuple in it. Of course, for all tuples obtained this way, visibility is checked before returning it to the client.

To see how HOT updates work, let's drop an index and clear the table.

=> DROP INDEX hot_s;
=> TRUNCATE TABLE hot;

Now we redo the insert and update of a row.

=> INSERT INTO hot VALUES (1, 'A');
=> UPDATE hot SET s = 'B';

The content of the table page is as follows:

=> SELECT * FROM heap_page('hot',0);
 ctid  | state  |   xmin   | xmax  | hhu | hot | t_ctid 
-------+--------+----------+-------+-----+-----+--------
 (0,1) | normal | 3986 (c) | 3987  | t   |     | (0,2)
 (0,2) | normal | 3987     | 0 (a) |     | t   | (0,2)
(2 rows)

There are a series of changes in the page:

· The Heap Hot Updated flag indicates that the ctid chain must be followed. · The Heap Only Tuple flag indicates that the new line is not referenced by the index.

The chain will grow as the page changes (within the page):

=> UPDATE hot SET s = 'C';
=> UPDATE hot SET s = 'D';
=> SELECT * FROM heap_page('hot',0);
 ctid  | state  |   xmin   |   xmax   | hhu | hot | t_ctid 
-------+--------+----------+----------+-----+-----+--------
 (0,1) | normal | 3986 (c) | 3987 (c) | t   |     | (0,2)
 (0,2) | normal | 3987 (c) | 3988 (c) | t   | t   | (0,3)
 (0,3) | normal | 3988 (c) | 3989     | t   | t   | (0,4)
 (0,4) | normal | 3989     | 0 (a)    |     | t   | (0,4)
(4 rows)

But there is only one reference to the chain head in the index:

=> SELECT * FROM index_page('hot_id',1);
 itemoffset | ctid  
------------+-------
          1 | (0,1)
(1 row)

It is important to emphasize that HOT updates work when the field to be updated is not indexed. Otherwise, some indexes will directly contain references to the new row version, which is incompatible with the concept of this optimization.

The optimization is only done within one page, so additional traversal in the chain does not require visiting other pages and does not affect performance.

Perform in-page vacuum during HOT update

A vacuum during a HOT update is a special but important case of in-page vacuum.

As above, we have exceeded the fillfactor threshold, so the next update must cause in-page vacuum. But this time there is a series of updates on the page. Since the index already refers to the head of that HOT chain, it must always remain in its place, while the rest of the pointers can be freed: they are known to have no external references.

In order not to change the head pointer, indirection is used: in this example, the index refers to (0,1), and the pointer state is redirected to the appropriate tuple.

=> UPDATE hot SET s = 'E';
=> SELECT * FROM heap_page('hot',0);
 ctid  |     state     |   xmin   | xmax  | hhu | hot | t_ctid 
-------+---------------+----------+-------+-----+-----+--------
 (0,1) | redirect to 4 |          |       |     |     | 
 (0,2) | normal        | 3990     | 0 (a) |     | t   | (0,2)
 (0,3) | unused        |          |       |     |     | 
 (0,4) | normal        | 3989 (c) | 3990  | t   | t   | (0,2)
(4 rows)

Notice:

· Clear the tuples (0,1), (0,2) and (0,3). · The head pointer (0,1) remains, but it gets the "redirected" state. · Since there are definitely no references to the tuple, the new row version has overwritten (0,2) and the pointer is freed ("unused" state).

Let's make a few more updates:

=> UPDATE hot SET s = 'F';
=> UPDATE hot SET s = 'G';
=> SELECT * FROM heap_page('hot',0);
 ctid  |     state     |   xmin   |   xmax   | hhu | hot | t_ctid 
-------+---------------+----------+----------+-----+-----+--------
 (0,1) | redirect to 4 |          |          |     |     | 
 (0,2) | normal        | 3990 (c) | 3991 (c) | t   | t   | (0,3)
 (0,3) | normal        | 3991 (c) | 3992     | t   | t   | (0,5)
 (0,4) | normal        | 3989 (c) | 3990 (c) | t   | t   | (0,2)
 (0,5) | normal        | 3992     | 0 (a)    |     | t   | (0,5)
(5 rows)

The next update causes the in-page vacuum to be performed again:

=> UPDATE hot SET s = 'H';
=> SELECT * FROM heap_page('hot',0);
 ctid  |     state     |   xmin   | xmax  | hhu | hot | t_ctid 
-------+---------------+----------+-------+-----+-----+--------
 (0,1) | redirect to 5 |          |       |     |     | 
 (0,2) | normal        | 3993     | 0 (a) |     | t   | (0,2)
 (0,3) | unused        |          |       |     |     | 
 (0,4) | unused        |          |       |     |     | 
 (0,5) | normal        | 3992 (c) | 3993  | t   | t   | (0,2)
(5 rows)

Also, some tuples are cleared and the pointer to the head of the chain is moved accordingly.

Conclusion: If columns that are not indexed are frequently updated, it may make sense to reduce the fillfactor parameter in order to reserve some page space for the update. However, we should take into account that the less the fill factor, the more space is left in the page and therefore the physical size of the table increases.

HOT chain breaks

If the page lacks free space to allocate a new tuple, the chain will break. Also, we will create separate references for rows located in different pages.

To reproduce this situation, let's start a concurrent transaction and build a snapshot of the data in it.

|  => BEGIN ISOLATION LEVEL REPEATABLE READ;
|  => SELECT count(*) FROM hot;
|   count 
|  -------
|       1
|  (1 row)

Snapshots do not allow tuples to be flushed from the page. Now let's make an update in the first session:

=> UPDATE hot SET s = 'I';
=> UPDATE hot SET s = 'J';
=> UPDATE hot SET s = 'K';
=> SELECT * FROM heap_page('hot',0);
 ctid  |     state     |   xmin   |   xmax   | hhu | hot | t_ctid 
-------+---------------+----------+----------+-----+-----+--------
 (0,1) | redirect to 2 |          |          |     |     | 
 (0,2) | normal        | 3993 (c) | 3994 (c) | t   | t   | (0,3)
 (0,3) | normal        | 3994 (c) | 3995 (c) | t   | t   | (0,4)
 (0,4) | normal        | 3995 (c) | 3996     | t   | t   | (0,5)
 (0,5) | normal        | 3996     | 0 (a)    |     | t   | (0,5)
(5 rows)

At the next update, the page won't have enough space, but in-page vacuum won't be able to clear anything:

=> UPDATE hot SET s = 'L';

|  => COMMIT; -- snapshot no longer needed

=> SELECT * FROM heap_page('hot',0);
 ctid  |     state     |   xmin   |   xmax   | hhu | hot | t_ctid 
-------+---------------+----------+----------+-----+-----+--------
 (0,1) | redirect to 2 |          |          |     |     | 
 (0,2) | normal        | 3993 (c) | 3994 (c) | t   | t   | (0,3)
 (0,3) | normal        | 3994 (c) | 3995 (c) | t   | t   | (0,4)
 (0,4) | normal        | 3995 (c) | 3996 (c) | t   | t   | (0,5)
 (0,5) | normal        | 3996 (c) | 3997     |     | t   | (1,1)
(5 rows)

In the tuple (0,5) there is a reference to (1,1) in page 1

=> SELECT * FROM heap_page('hot',1);
 ctid  | state  | xmin | xmax  | hhu | hot | t_ctid 
-------+--------+------+-------+-----+-----+--------
 (1,1) | normal | 3997 | 0 (a) |     |     | (1,1)
(1 row)

There are now two rows in the index, each pointing to the start of its HOT chain:

=> SELECT * FROM index_page('hot_id',1);
 itemoffset | ctid  
------------+-------
          1 | (1,1)
          2 | (0,1)
(2 rows)

  

 

Original address:

https://habr.com/en/company/postgrespro/blog/483768/

Tags: postgres pg

Posted by halfman on Thu, 19 May 2022 14:41:11 +0300