Indexes in PostgreSQL -- Btree

We have already discussed the PostgreSQL indexing engine, the interface of access methods, and the hash index. Now we will consider B-tree, the most traditional and widely used index. This article is long, so be patient.

Btree structure

The B-tree index type, implemented by the «btree» access method, is suitable for data that can be sorted. In other words, the «greater», «greater or equal», «less», «less or equal», and «equal» operators must be defined for the data type. Note that the same data can sometimes be sorted in different ways, which brings us back to the concept of operator families.

The index rows of a B-tree are packed into pages. In leaf pages, these rows contain the data to be indexed (keys) and references to table rows (TIDs). In internal pages, each row references a child page of the index and contains the minimum value in that page.

B-tree has some important characteristics:

·The B-tree is balanced, that is, every leaf page is separated from the root by the same number of internal pages. Therefore, searching for any value takes roughly the same time.

·The B-tree is multi-branched, that is, each page (usually 8 KB) contains many (hundreds of) index rows. As a result, the depth of a B-tree is very small, typically no more than 4-5 even for very large tables.

·The data in the index is sorted in nondecreasing order (both between pages and within each page), and pages at the same level are connected to each other by a doubly linked list. Therefore, we can obtain an ordered data set just by traversing the list in one direction or the other, without returning to the root each time.
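The characteristics above can be illustrated with a minimal Python sketch. This is a toy model, not PostgreSQL's actual on-disk layout: leaf pages are plain lists, and the doubly linked list of leaves is modeled as a Python list of pages traversed in order.

```python
# Toy two-level B-tree: the internal level stores the minimum key of each
# child page; leaf pages are linked, so an ordered scan never revisits the root.
leaves = [[4, 12, 20], [32, 40, 49], [64, 70, 95]]   # sorted within and across pages
internal = [(4, 0), (32, 1), (64, 2)]                # (minimum key in child, leaf index)

# Walking the leaf list left to right yields the whole key set already sorted:
ordered = [k for leaf in leaves for k in leaf]
assert ordered == sorted(ordered)
```

Walking the same list right to left would yield the keys in descending order, which is how backward index scans work.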

The following is a simplified example of an index on a field with integer keys.

 

The first page of the index is a metadata page, which references the index root. Internal nodes are located below the root, and leaf pages are in the bottom row. The downward arrows represent references from leaf index rows to table rows (TIDs).

Search by equality

Let's consider searching the tree for a value based on the condition "indexed field = expression". For example, we are interested in key 49.

The search starts from the root node, and we need to determine which child node to descend to. By looking at the keys in the root node (4, 32, 64), we can figure out the range of values in each child node. Since 32 ≤ 49 < 64, we need to descend to the second child node. Next, the same process is repeated recursively until we reach a leaf node from which we can get the required TIDs.

In reality, a few special cases complicate this seemingly simple process. For example, an index can contain non-unique keys, and there may be so many equal values that they do not fit on one page. Going back to our example, it would seem that we should descend from the internal node by the reference to the value 49. But as the figure clearly shows, that way we would skip one «49» key in the preceding leaf page. Therefore, once we find an exactly equal key in an internal page, we must step one position to the left and then look through the index rows of the underlying level from left to right in search of the key we need.
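The "step one position left on an exact match" rule can be sketched in Python. Again, this is a toy model under the same assumptions as before (internal level stores child minimums, leaves linked left to right), not PostgreSQL's actual code:

```python
import bisect

# Internal level: each entry is the minimum key of a child leaf page.
# A «49» also ends the second leaf, so descending straight to the child
# labelled 49 would miss it.
internal_keys = [4, 32, 49, 64]
leaves = [[4, 12, 20], [32, 40, 49],
          [49, 49, 55], [64, 70, 95]]       # leaves are linked left to right

def find_leaf(key):
    # bisect_left returns the first position whose key is >= the search key;
    # stepping one position left implements both the ordinary descent and the
    # "on an exact match, go one position left" rule described above
    return max(bisect.bisect_left(internal_keys, key) - 1, 0)

def find_all(key):
    out = []
    for leaf in leaves[find_leaf(key):]:    # walk the leaf list left to right
        for k in leaf:
            if k == key:
                out.append(k)
            elif k > key:
                return out                  # keys are sorted: we can stop
    return out

print(find_all(49))   # → [49, 49, 49]: includes the 49 from the preceding page
```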

(Another complication is that other processes can change the data during the search: the tree can be rebuilt, a page can be split in two, and so on. All the algorithms are designed so that these concurrent operations do not interfere with one another and do not cause extra locks wherever possible. But we will avoid going into detail on this.)

Search by inequality

When searching by the condition «indexed field ≤ expression» (or «indexed field ≥ expression»), we first find a value (if any) in the index by the equality condition «indexed field = expression» and then traverse the leaf pages in the appropriate direction to the very end.

The process when n ≤ 35 is shown in the figure:

The «greater» and «less» operators are supported in a similar way, except that the value found initially must be dropped.
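On our toy model (the leaf level flattened into one sorted list), the difference between «≤» and «<» is exactly which boundary position the initial equality search picks:

```python
import bisect

# Flattened leaf level of the toy B-tree: keys sorted, pages linked
leaf_keys = [4, 12, 20, 32, 40, 49, 64, 70, 95]

def scan_le(bound):
    # n <= bound: locate the boundary as in an equality search,
    # then take everything up to and including the equal values
    pos = bisect.bisect_right(leaf_keys, bound)
    return leaf_keys[:pos]

def scan_lt(bound):
    # n < bound: the same, but the initially found equal value is dropped
    pos = bisect.bisect_left(leaf_keys, bound)
    return leaf_keys[:pos]

print(scan_le(35))   # → [4, 12, 20, 32]
print(scan_lt(32))   # → [4, 12, 20]
```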

Search by range

When searching by the range «expression1 ≤ indexed field ≤ expression2», we first find a value by the condition «indexed field = expression1» and then keep traversing the leaf pages while the condition «indexed field ≤ expression2» is met; or the other way round: start with the second expression and walk in the opposite direction until we reach the first one.

The process under condition 23 ≤ n ≤ 64 is shown in the figure:

Examples

Let's look at an example of a query plan. As usual, we use the demo database, and this time we will consider the aircrafts table. It contains only 9 rows, and since the whole table fits on a single page, the planner would prefer not to use an index at all; that is why «enable_seqscan» is turned off below.

demo=# select * from aircrafts;
 aircraft_code |        model        | range 
---------------+---------------------+-------
 773           | Boeing 777-300      | 11100
 763           | Boeing 767-300      |  7900
 SU9           | Sukhoi SuperJet-100 |  3000
 320           | Airbus A320-200     |  5700
 321           | Airbus A321-200     |  5600
 319           | Airbus A319-100     |  6700
 733           | Boeing 737-300      |  4200
 CN1           | Cessna 208 Caravan  |  1200
 CR2           | Bombardier CRJ-200  |  2700
(9 rows)
demo=# create index on aircrafts(range);

demo=# set enable_seqscan = off;

«btree» is the default access method, so CREATE INDEX builds a B-tree index unless another method is specified.

Search by equality:

demo=# explain(costs off) select * from aircrafts where range = 3000;
                    QUERY PLAN                     
---------------------------------------------------
 Index Scan using aircrafts_range_idx on aircrafts
   Index Cond: (range = 3000)
(2 rows)

Search by inequality:

demo=# explain(costs off) select * from aircrafts where range < 3000;
                    QUERY PLAN                    
---------------------------------------------------
 Index Scan using aircrafts_range_idx on aircrafts
   Index Cond: (range < 3000) 
(2 rows)

Search by range:

demo=# explain(costs off) select * from aircrafts
where range between 3000 and 5000;
                     QUERY PLAN                      
-----------------------------------------------------
 Index Scan using aircrafts_range_idx on aircrafts
   Index Cond: ((range >= 3000) AND (range <= 5000))
(2 rows)

Sorting

Let's emphasize once again that with any type of scan (index, index-only, or bitmap) the «btree» access method returns ordered data, as the figures above clearly show.

Therefore, if a table has an index that matches the sort clause, the optimizer considers both options: an index scan of the table, which readily returns the data in sorted order, and a sequential scan of the table followed by sorting the result.

Sort order

When creating an index, we can explicitly specify the sort order. For example, we can create an index based on the flight range in the following ways:

demo=# create index on aircrafts(range desc);

In this case, larger values will appear in the left part of the tree and smaller values in the right. Why would we need this if we can traverse index values in either direction anyway?

The point is multicolumn indexes. Let's create a view that shows aircraft models with a conventional division into short-, medium-, and long-range aircraft:

demo=# create view aircrafts_v as
select model,
       case
           when range < 4000 then 1
           when range < 10000 then 2
           else 3
       end as class
from aircrafts;

demo=# select * from aircrafts_v;
        model        | class
---------------------+-------
 Boeing 777-300      |     3
 Boeing 767-300      |     2
 Sukhoi SuperJet-100 |     1
 Airbus A320-200     |     2
 Airbus A321-200     |     2
 Airbus A319-100     |     2
 Boeing 737-300      |     2
 Cessna 208 Caravan  |     1
 Bombardier CRJ-200  |     1
(9 rows)

Let's create an index (using an expression):

demo=# create index on aircrafts(
  (case when range < 4000 then 1 when range < 10000 then 2 else 3 end),
  model);

Now we can use this index to get the data in ascending order of two columns:

demo=# select class, model from aircrafts_v order by class, model;
 class |        model        
-------+---------------------
     1 | Bombardier CRJ-200
     1 | Cessna 208 Caravan
     1 | Sukhoi SuperJet-100
     2 | Airbus A319-100
     2 | Airbus A320-200
     2 | Airbus A321-200
     2 | Boeing 737-300
     2 | Boeing 767-300
     3 | Boeing 777-300
(9 rows)
demo=# explain(costs off)
select class, model from aircrafts_v order by class, model;
                       QUERY PLAN                       
--------------------------------------------------------
 Index Scan using aircrafts_case_model_idx on aircrafts
(1 row)

Similarly, we can execute a query to sort the data in descending order:

demo=# select class, model from aircrafts_v order by class desc, model desc;
 class |        model        
-------+---------------------
     3 | Boeing 777-300
     2 | Boeing 767-300
     2 | Boeing 737-300
     2 | Airbus A321-200
     2 | Airbus A320-200
     2 | Airbus A319-100
     1 | Sukhoi SuperJet-100
     1 | Cessna 208 Caravan
     1 | Bombardier CRJ-200
(9 rows)
demo=# explain(costs off)
select class, model from aircrafts_v order by class desc, model desc;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Index Scan BACKWARD using aircrafts_case_model_idx on aircrafts
(1 row)

However, we cannot use this index to get data sorted in descending order by one column and ascending order by the other. A separate sort is needed:

demo=# explain(costs off)
select class, model from aircrafts_v order by class ASC, model DESC;
                   QUERY PLAN                    
-------------------------------------------------
 Sort
   Sort Key: (CASE ... END), aircrafts.model DESC
   ->  Seq Scan on aircrafts
(3 rows)

Note that as a last resort the planner chooses a sequential scan, despite the «enable_seqscan = off» setting made earlier. This is because the setting does not actually forbid scanning the table but only assigns it an enormous cost: look at the plan with «costs on».

For this query to use an index, the index must be built with the required sort directions:

demo=# create index aircrafts_case_asc_model_desc_idx on aircrafts(
 (case
    when range < 4000 then 1
    when range < 10000 then 2
    else 3
  end) ASC,
  model DESC);

demo=# explain(costs off)
select class, model from aircrafts_v order by class ASC, model DESC;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Index Scan using aircrafts_case_asc_model_desc_idx on aircrafts
(1 row)

Order of columns

Another question about using a multicolumn index is the order in which columns are listed in it. For a B-tree this order is very important: the data inside the pages is sorted by the first field, then by the second field, and so on.

We can represent the index built on the class and model fields symbolically like this:

Actually, such a small index would fit on a single root page; in the figure it is deliberately spread across several pages for clarity.

It is clear from this diagram that searching by a predicate such as «class = 3» (search on the first field only) or «class = 3 and model = 'Boeing 777-300'» (search on both fields) will be efficient.

However, searching by the predicate «model = 'Boeing 777-300'» is much less efficient: starting from the root, we cannot determine which child node to descend to, so we have to descend into all of them. This does not mean such an index can never be used; it is its efficiency that is questionable. For example, if we had three classes of aircraft with many models in each, we would have to look through about one third of the index, which may or may not be more efficient than a full table scan.
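The asymmetry between the leading and trailing columns follows directly from the sort order of the entries. A small Python sketch (Python tuples sort componentwise, just like the index entries described above):

```python
import bisect

# Toy multicolumn index on (class, model): entries sorted by class first,
# then by model, mirroring the pages pictured above
index = sorted([
    (1, 'Bombardier CRJ-200'), (1, 'Cessna 208 Caravan'), (1, 'Sukhoi SuperJet-100'),
    (2, 'Airbus A319-100'), (2, 'Boeing 737-300'), (2, 'Boeing 767-300'),
    (3, 'Boeing 777-300'),
])

# Predicate on the leading column («class = 3»): an efficient range lookup,
# because all matching entries are contiguous
lo = bisect.bisect_left(index, (3,))
hi = bisect.bisect_left(index, (4,))
class_3 = index[lo:hi]                 # only class-3 entries are touched

# Predicate on the second column alone («model = ...»): nothing to
# binary-search on, so every entry has to be examined
boeing_777 = [e for e in index if e[1] == 'Boeing 777-300']
```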

However, if we create such an index:

demo=# create index on aircrafts(
  model,
  (case when range < 4000 then 1 when range < 10000 then 2 else 3 end));

The order of the fields will change:

With this index, a search by the predicate «model = 'Boeing 777-300'» will be efficient, but a search by the predicate «class = 3» will not.

NULL value

The «btree» access method indexes NULLs and supports searching by the conditions IS NULL and IS NOT NULL.

Let's consider the flights table, where NULLs occur:

demo=# create index on flights(actual_arrival);

demo=# explain(costs off) select * from flights where actual_arrival is null;
                      QUERY PLAN                       
-------------------------------------------------------
 Bitmap Heap Scan on flights
   Recheck Cond: (actual_arrival IS NULL)
   ->  Bitmap Index Scan on flights_actual_arrival_idx
         Index Cond: (actual_arrival IS NULL)
(4 rows)

NULLs are located at one or the other end of the leaf level, depending on how the index was created (NULLS FIRST or NULLS LAST). This matters if the query involves sorting: the index can be used if the ORDER BY clause of the SELECT command specifies the same placement of NULLs (NULLS FIRST or NULLS LAST) as the one specified when the index was built.

In the following example, these orders are the same, so we can use the index:

demo=# explain(costs off)
select * from flights order by actual_arrival NULLS LAST;
                       QUERY PLAN                      
--------------------------------------------------------
 Index Scan using flights_actual_arrival_idx on flights
(1 row)

Here, these orders are different. The optimizer selects sequential scanning and subsequent sorting:

demo=# explain(costs off)
select * from flights order by actual_arrival NULLS FIRST;
               QUERY PLAN              
----------------------------------------
 Sort
   Sort Key: actual_arrival NULLS FIRST
   ->  Seq Scan on flights
(3 rows)

To use an index here, NULLs must be placed at the beginning of it:

demo=# create index flights_nulls_first_idx on flights(actual_arrival NULLS FIRST);

demo=# explain(costs off)
select * from flights order by actual_arrival NULLS FIRST;
                     QUERY PLAN                      
-----------------------------------------------------
 Index Scan using flights_nulls_first_idx on flights
(1 row)

This issue arises because NULLs are not orderable: the result of comparing NULL with any other value is undefined:

demo=# \pset null NULL

demo=# select null < 42;
 ?column?
----------
 NULL
(1 row)

This runs counter to the very concept of the B-tree and does not fit the general pattern. However, NULLs play such an important role in databases that we always have to make exceptions for them.

Since NULLs can be indexed, an index can be used even when no conditions are imposed on the table (after all, the index must contain information on all rows of the table). This makes sense if the query needs the data sorted and the index guarantees the required order; in that case the planner can choose index access to save a separate sort.

Properties

Let's look at the properties of the «btree» access method (the queries were provided earlier).

 

postgres=# select a.amname, p.name, pg_indexam_has_property(a.oid,p.name)
from pg_am a,
     unnest(array['can_order','can_unique','can_multi_col','can_exclude']) p(name)
where a.amname = 'btree'
order by a.amname;
 amname |     name      | pg_indexam_has_property
--------+---------------+-------------------------
 btree  | can_order     | t
 btree  | can_unique    | t
 btree  | can_multi_col | t
 btree  | can_exclude   | t
(4 rows)

As we can see, B-trees can order data and support uniqueness; this is the only access method that provides us with these properties. Multicolumn indexes are also permitted, but other access methods (although not all of them) may support them too. We will discuss support for exclusion constraints next time.

postgres=# select p.name, pg_index_has_property('t_a_idx'::regclass,p.name)
from unnest(array[
       'clusterable','index_scan','bitmap_scan','backward_scan'
     ]) p(name);
     name      | pg_index_has_property
---------------+-----------------------
 clusterable   | t
 index_scan    | t
 bitmap_scan   | t
 backward_scan | t
(4 rows)

The «btree» access method supports both techniques for getting values: index scan and bitmap scan. And, as we can see, the access method can traverse the tree both «forward» and «backward».

postgres=# select p.name,
     pg_index_column_has_property('t_a_idx'::regclass,1,p.name)
from unnest(array[
       'asc','desc','nulls_first','nulls_last','orderable','distance_orderable',
       'returnable','search_array','search_nulls'
     ]) p(name);
        name        | pg_index_column_has_property
--------------------+------------------------------
 asc                | t
 desc               | f
 nulls_first        | f
 nulls_last         | t
 orderable          | t
 distance_orderable | f
 returnable         | t
 search_array       | t
 search_nulls       | t
(9 rows)

The first four properties at this level explain exactly how values of the particular column are sorted. In this example, values are sorted in ascending order («asc») and NULLs are placed at the end («nulls_last»). But, as we have seen, other combinations are possible.

 

The «search_array» property indicates that expressions like the following are supported by the index:

demo=# explain(costs off)
select * from aircrafts where aircraft_code in ('733','763','773');
                           QUERY PLAN                            
-----------------------------------------------------------------
 Index Scan using aircrafts_pkey on aircrafts
   Index Cond: (aircraft_code = ANY ('{733,763,773}'::bpchar[]))
(2 rows)

The «returnable» property indicates that index-only scans are supported, which is reasonable since index rows store the indexed values themselves (unlike, say, a hash index). In this connection it is worth saying a few words about covering indexes based on B-trees.

Unique indexes with additional columns

As we discussed earlier, a covering index is one that stores all the values needed by a query, making access to the table itself (almost) unnecessary.

Now assume we want to add extra columns needed by a query to a unique index. But the uniqueness of such a composite value does not guarantee the uniqueness of the key alone, so two indexes on the same column would be required: a unique one to support the integrity constraint, and a non-unique one for covering. This is certainly inefficient.

In our company, Anastasiya Lubennikova (lubennikovaav) improved the «btree» method so that additional, non-unique columns can be included in a unique index. We hope this patch will be adopted by the community and become part of PostgreSQL, but it did not make it into version 10. At the moment, the patch is available in Postgres Pro Standard 9.5+, and here is what it looks like.

(In fact, this patch was committed to PostgreSQL 11.)

Let's consider the bookings table:

demo=# \d bookings
              Table "bookings.bookings"
    Column    |           Type           | Modifiers
--------------+--------------------------+-----------
 book_ref     | character(6)             | not null
 book_date    | timestamp with time zone | not null
 total_amount | numeric(10,2)            | not null
Indexes:
    "bookings_pkey" PRIMARY KEY, btree (book_ref)
Referenced by:
    TABLE "tickets" CONSTRAINT "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings(book_ref)

In this table, the primary key (book_ref, booking code) is provided by a regular « btree » index. Let's create a new unique index with an additional column:

demo=# create unique index bookings_pkey2 on bookings(book_ref) INCLUDE (book_date);

Now let's replace the existing index with the new one (inside a transaction, so that all the changes apply simultaneously):

demo=# begin;

demo=# alter table bookings drop constraint bookings_pkey cascade;

demo=# alter table bookings add primary key using index bookings_pkey2;

demo=# alter table tickets add foreign key (book_ref) references bookings (book_ref);

demo=# commit;

This is what we get:

demo=# \d bookings
              Table "bookings.bookings"
    Column    |           Type           | Modifiers
--------------+--------------------------+-----------
 book_ref     | character(6)             | not null
 book_date    | timestamp with time zone | not null
 total_amount | numeric(10,2)            | not null
Indexes:
    "bookings_pkey2" PRIMARY KEY, btree (book_ref) INCLUDE (book_date)
Referenced by:
    TABLE "tickets" CONSTRAINT "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings(book_ref)

Now a single index serves both as a uniqueness constraint and as a covering index for queries such as:

demo=# explain(costs off)
select book_ref, book_date from bookings where book_ref = '059FC4';
                    QUERY PLAN                    
--------------------------------------------------
 Index Only Scan using bookings_pkey2 on bookings
   Index Cond: (book_ref = '059FC4'::bpchar)
(2 rows)

Index creation

It is common knowledge, but no less important for that, that for a large table it is better to load the data without indexes and create the needed indexes afterwards. This is not only faster, but the resulting index is also likely to be smaller.

The point is that building a «btree» index uses a more efficient process than row-by-row insertion into the tree. Roughly speaking, all the data available in the table is sorted, and the leaf pages are created from it. Internal pages are then «built up» on top of this base until the whole pyramid converges to the root.
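The bottom-up build can be sketched in a few lines of Python. This only mirrors the idea (PostgreSQL's actual implementation lives in nbtsort.c and deals with pages, tuples, and WAL):

```python
# Sketch of a bottom-up B-tree build: sort all keys once, slice them into
# leaf pages, then build each internal level from the minimum key of every
# page below, until a single root page remains.
def bulk_build(keys, page_size=3):
    level = sorted(keys)                         # leaf level: all data, sorted once
    levels = []
    while True:
        pages = [level[i:i + page_size] for i in range(0, len(level), page_size)]
        levels.append(pages)
        if len(pages) == 1:                      # the pyramid has converged to a root
            return levels
        level = [page[0] for page in pages]      # internal level: minimum of each child

tree = bulk_build([49, 4, 64, 32, 20, 12, 70, 40, 95])
# leaf level: [[4, 12, 20], [32, 40, 49], [64, 70, 95]]; root: [[4, 32, 64]]
```

Compared with inserting keys one by one, each level is written exactly once and every page comes out tightly packed, which is why the resulting index also tends to be smaller.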

The speed of this process depends on the amount of RAM available, which is limited by the «maintenance_work_mem» parameter. Therefore, increasing the parameter value can speed things up. For unique indexes, memory of size «work_mem» is allocated in addition to «maintenance_work_mem».
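For example, the limit can be raised just for the duration of a bulk build (the value and the index below are purely illustrative; the right setting depends on the RAM available):

```sql
-- Illustrative session-level tuning for a one-off index build;
-- RESET returns the parameter to its normal value afterwards
SET maintenance_work_mem = '1GB';
CREATE INDEX ON bookings(book_date);
RESET maintenance_work_mem;
```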

Comparison semantics

As we mentioned last time, PostgreSQL needs to know which hash function to call for values of different types, and this association is stored in the «hash» access method. In the same way, the system must figure out how to order values. This is needed for sorting, grouping (sometimes), merge joins, and so on. PostgreSQL does not bind itself to operator names (such as >, <, =), since users can define their own data types and give the corresponding operators whatever names they like. Instead, the operators are identified through the operator family used by the «btree» access method.

For example, these comparison operators are contained in the «bool_ops» operator family:

postgres=# select   amop.amopopr::regoperator as opfamily_operator,
         amop.amopstrategy
from     pg_am am,
         pg_opfamily opf,
         pg_amop amop
where    opf.opfmethod = am.oid
and      amop.amopfamily = opf.oid
and      am.amname = 'btree'
and      opf.opfname = 'bool_ops'
order by amopstrategy;
  opfamily_operator  | amopstrategy
---------------------+-------------- 
 <(boolean,boolean)  |            1
 <=(boolean,boolean) |            2
 =(boolean,boolean)  |            3
 >=(boolean,boolean) |            4
 >(boolean,boolean)  |            5
(5 rows) 

Here we see five comparison operators, but as already mentioned, we should not rely on their names. To figure out which comparison each operator performs, the concept of a strategy is introduced. Five strategies are defined to describe the meaning of operators:

·1 -- less

·2 -- less or equal

·3 -- equal

·4 -- greater or equal

·5 -- greater
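A small Python sketch of the idea: the planner resolves an operator through its family and strategy number, not its spelling, so even an oddly named operator still "means less" if it is registered as strategy 1:

```python
import operator

# The five B-tree strategy numbers, independent of operator spelling
BTREE_STRATEGIES = {
    1: 'less',
    2: 'less or equal',
    3: 'equal',
    4: 'greater or equal',
    5: 'greater',
}

# Toy mapping from strategy number to an actual comparison, mirroring how
# the system looks up comparison semantics by strategy rather than by name
compare = {1: operator.lt, 2: operator.le, 3: operator.eq,
           4: operator.ge, 5: operator.gt}

print(compare[1](2, 3))   # strategy 1 ("less") → True
```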

Some operator families can contain multiple operators implementing one strategy. For example, the «integer_ops» operator family contains the following operators for strategy 1:

postgres=# select   amop.amopopr::regoperator as opfamily_operator
from     pg_am am,
         pg_opfamily opf,
         pg_amop amop
where    opf.opfmethod = am.oid
and      amop.amopfamily = opf.oid
and      am.amname = 'btree'
and      opf.opfname = 'integer_ops'
and      amop.amopstrategy = 1
order by opfamily_operator;
  opfamily_operator  
---------------------- 
 <(integer,bigint)
 <(smallint,smallint)
 <(integer,integer)
 <(bigint,bigint)
 <(bigint,integer)
 <(smallint,integer)
 <(integer,smallint)
 <(smallint,bigint)
 <(bigint,smallint)
(9 rows) 

Thanks to this, the optimizer can avoid type casts when comparing values of different types that belong to one operator family.

Indexes that support new data types

The documentation (https://postgrespro.com/docs/postgrespro/9.6/xindex) provides an example of creating a new data type for complex numbers, along with an operator class to sort values of this type. That example uses the C language, which is absolutely reasonable when speed is critical. But to better understand the comparison semantics, we can use pure SQL for the same experiment.

Let's create a new composite type with two fields: the real and imaginary parts:

postgres=# create type complex as (re float, im float);

We can create a table with new type fields and add some values to the table:

postgres=# create table numbers(x complex);

postgres=# insert into numbers values ((0.0, 10.0)), ((1.0, 3.0)), ((1.0, 1.0));

Now a question arises: in what order should complex numbers be sorted if no order relation is defined for them in the mathematical sense?

As it turns out, the comparison operators are already defined for us:

postgres=# select * from numbers order by x;
   x    
--------
 (0,10)
 (1,1)
 (1,3)
(3 rows)

By default, composite types are sorted componentwise: first fields are compared first, then second fields, and so on, roughly the same way text strings are compared character by character. But we can define a different order. For example, complex numbers can be treated as vectors and ordered by modulus (length), which is computed as the square root of the sum of the squares of the coordinates (the Pythagorean theorem). To define such an order, let's create an auxiliary function that computes the modulus:

postgres=# create function modulus(a complex) returns float as $$
    select sqrt(a.re*a.re + a.im*a.im);
$$ immutable language sql;

Now we define auxiliary functions for the five future operators:

postgres=# create function complex_lt(a complex, b complex) returns boolean as $$
    select modulus(a) < modulus(b);
$$ immutable language sql;

postgres=# create function complex_le(a complex, b complex) returns boolean as $$
    select modulus(a) <= modulus(b);
$$ immutable language sql;

postgres=# create function complex_eq(a complex, b complex) returns boolean as $$
    select modulus(a) = modulus(b);
$$ immutable language sql;

postgres=# create function complex_ge(a complex, b complex) returns boolean as $$
    select modulus(a) >= modulus(b);
$$ immutable language sql;

postgres=# create function complex_gt(a complex, b complex) returns boolean as $$
    select modulus(a) > modulus(b);
$$ immutable language sql;

Now we create the operators themselves. To illustrate that they do not have to be called «>», «<», and so on, let's give them weird names.

postgres=# create operator #<#(leftarg=complex, rightarg=complex, procedure=complex_lt);

postgres=# create operator #<=#(leftarg=complex, rightarg=complex, procedure=complex_le);

postgres=# create operator #=#(leftarg=complex, rightarg=complex, procedure=complex_eq);

postgres=# create operator #>=#(leftarg=complex, rightarg=complex, procedure=complex_ge);

postgres=# create operator #>#(leftarg=complex, rightarg=complex, procedure=complex_gt);

In this way, we can compare the numbers:

postgres=# select (1.0,1.0)::complex #<# (1.0,3.0)::complex;
 ?column?
----------
 t
(1 row)

In addition to the five operators, the «btree» access method requires one more function to be defined (redundant, but convenient): it must return -1, 0, or 1 if the first value is less than, equal to, or greater than the second. Such an auxiliary function is called a support function. Other access methods may require other support functions to be defined.

postgres=# create function complex_cmp(a complex, b complex) returns integer as $$
    select case when modulus(a) < modulus(b) then -1
                when modulus(a) > modulus(b) then 1 
                else 0
           end;
$$ language sql;

Now we are ready to create an operator class (an operator family with the same name will be automatically created):

postgres=# create operator class complex_ops
default for type complex
using btree as
    operator 1 #<#,
    operator 2 #<=#,
    operator 3 #=#,
    operator 4 #>=#,
    operator 5 #>#,
    function 1 complex_cmp(complex,complex);

And here is the sorting:

postgres=# select * from numbers order by x;
   x    
--------
 (1,1)
 (1,3)
 (0,10)
(3 rows)

And it will certainly be supported by the « btree » index.

You can get support functions through this query:

postgres=# select amp.amprocnum,
       amp.amproc,
       amp.amproclefttype::regtype,
       amp.amprocrighttype::regtype
from   pg_opfamily opf,
       pg_am am,
       pg_amproc amp
where  opf.opfname = 'complex_ops'
and    opf.opfmethod = am.oid
and    am.amname = 'btree'
and    amp.amprocfamily = opf.oid;
 amprocnum |   amproc    | amproclefttype | amprocrighttype
-----------+-------------+----------------+-----------------
         1 | complex_cmp | complex        | complex
(1 row)

Internals

We can explore the internal structure of a B-tree using the «pageinspect» extension:

demo=# create extension pageinspect;

Index metadata page:

demo=# select * from bt_metap('ticket_flights_pkey');
 magic  | version | root | level | fastroot | fastlevel
--------+---------+------+-------+----------+-----------
 340322 |       2 |  164 |     2 |      164 |         2
(1 row)

The most interesting thing here is the index depth («level»): for a table with about a million rows, this index on two columns needs only two levels (not counting the root).

Statistics of block 164 (root):

demo=# select type, live_items, dead_items, avg_item_size, page_size, free_size
from bt_page_stats('ticket_flights_pkey',164);
 type | live_items | dead_items | avg_item_size | page_size | free_size
------+------------+------------+---------------+-----------+-----------
 r    |         33 |          0 |            31 |      8192 |      6984
(1 row)

Data in the block (the «data» field, which contains the binary representation of the index key, is truncated here to save screen width):

demo=# select itemoffset, ctid, itemlen, left(data,56) as data
from bt_page_items('ticket_flights_pkey',164) limit 5;
 itemoffset |  ctid   | itemlen |                           data                           
------------+---------+---------+----------------------------------------------------------
          1 | (3,1)   |       8 |
          2 | (163,1) |      32 | 1d 30 30 30 35 34 33 32 33 30 35 37 37 31 00 00 ff 5f 00
          3 | (323,1) |      32 | 1d 30 30 30 35 34 33 32 34 32 33 36 36 32 00 00 4f 78 00
          4 | (482,1) |      32 | 1d 30 30 30 35 34 33 32 35 33 30 38 39 33 00 00 4d 1e 00
          5 | (641,1) |      32 | 1d 30 30 30 35 34 33 32 36 35 35 37 38 35 00 00 2b 09 00
(5 rows)

The first item is technical: it specifies the upper bound for all elements in the block (an implementation detail we will not discuss), and the data itself starts from the second item. Evidently, the leftmost child node is block 163, then comes block 323, and so on. They, in turn, can be explored with the same functions.
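As an aside, the «data» bytes can be decoded by hand. A small Python sketch, assuming the 1-byte ("short") varlena header used for the char(13) ticket number on little-endian builds, where the header byte encodes the total length (header included) shifted left by one:

```python
# First data row of the root page, as printed by bt_page_items above
hex_data = "1d 30 30 30 35 34 33 32 33 30 35 37 37 31 00 00 ff 5f 00"
raw = bytes.fromhex(hex_data.replace(" ", ""))

# 0x1d is a short varlena header: total length (header included) is 0x1d >> 1
total = raw[0] >> 1                  # 14 bytes: 1 header byte + 13 characters
ticket_no = raw[1:total].decode()    # the char(13) ticket number
print(ticket_no)                     # → 0005432305771
```

The bytes after the padding belong to the second key column (the integer flight_id), truncated here because the «data» field was cut to 56 characters.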

Another potentially useful extension is «amcheck», which was incorporated into PostgreSQL 10; for lower versions it is available from GitHub. This extension checks the logical consistency of the data in B-trees and enables us to detect corruption in advance.

 

Tags: postgres pg

Posted by acook on Sun, 22 May 2022 15:15:48 +0300