F.50. pgpro_gbtree — global index on a partitioned table #
The pgpro_gbtree extension enables creation and use of a global index on a partitioned table.
A global index is an index that is created for one or several columns of a partitioned table and does not include the partition key. At the physical level, a global index on a partitioned table contains data related to the indexed columns for all the partitions, which means that the global index allows you to perform index scan without iterating through all the partitions.
A global index provides the following benefits:
It enables a much faster search of rows of the partitioned table by a condition on the indexed columns. The search appears to be faster because the global index is not partitioned, and therefore, there is no need to perform index scans of individual partitions.
It lifts a Postgres Pro restriction that does not allow creation of a unique index for columns that do not contain the partition key.
F.50.1. Global Index Operation #
These are a few details of how a global index works:
A global index is created for a user-defined set of columns, and the primary key columns of the partitioned table are automatically added to the global index as
INCLUDEcolumns.An index scan with a global index is performed in two steps:
The value is searched for in the global index.
The value of the primary key in
INCLUDEindex columns is used to search for the partition and for the value in this partition.
A global index is not rebuilt for
DELETEandDETACH PARTITIONoperations. So it is OK that a global index contains some outdated data, which can be cleaned up with VACUUM.A global index is not partitioned. Therefore, the data size limitation of 32 TB holds for it. It is assumed that the global index fits these 32 TB as it contains only part of data of the partitioned table (this assumption may, however, appear not to be true in some circumstances).
F.50.2. Limitations #
Creation of global indexes has the following limitations:
The partitioned table for which a global index is created must have a primary key.
Expressions used instead of column names and predicates in the
WHEREclause are not currently supported.Specifying non-key columns (in the
INCLUDEclause) is not supported. TheCREATE INDEXcommand issues a warning if such columns are specified and ignores them.Duplicate columns in a global index are not allowed.
Usage of global indexes has the following limitations:
A global index, like a standard b-tree index, is not suitable for columns that contain few unique values. An example of an unsuitable column is the
gendercolumn with two valuesMandF. In addition to a high overhead of the pure index, for the global index, when a new value is added, a hash sum of the value gets locked. With few possible values, modifying transactions are likely to block one another, that is, hang on this lock.It is forbidden to add a partitioned table with a global index, as a partition, to another partitioned table.
If a partitioned table has a global index, deleting its primary key is not allowed. Delete the global index first.
The CLUSTER command for global indexes is not currently supported.
Operations with the
WHERE CURRENT OFcondition are not supported for cursors that use global indexes.It is forbidden to create unique constraints for unique global indexes. Consequently, global indexes cannot have
DEFERRABLEconstraints.
F.50.3. Installation #
The pgpro_gbtree extension is included in Postgres Pro Enterprise. To enable pgpro_gbtree, create the extension using the following query:
CREATE EXTENSION pgpro_gbtree;
F.50.4. Usage #
F.50.4.1. Creating and Removing Global Indexes #
To create a global index, execute the CREATE INDEX command specifying the gbtree access method from the pgpro_gbtree extension. The following command creates a global index for the t_int table:
CREATE INDEX t_int_v ON t_int USING gbtree (v);
To remove a global index, execute the DROP INDEX command. For example:
DROP INDEX t_int_v;
F.50.4.2. Using Global Indexes in Queries #
It is not possible to implement a usual Postgres Pro behavior for a global index as in the standard implementation, an entire partitioned table, on which the global index is created, is not regarded as a node when the query plan is built. Plans get linked to partitions, and then the best scan method is chosen for each partition, such as SeqScan, IndexOnlyScan, IndexScan, and so on. For the Postgres Pro planner, a partitioned table only provides the list of partitions and is not considered a planning object. A global index can be used if the partitioned table becomes a planning object. And this only happens under the following conditions:
The query does not contain a condition on the partition key that excludes at least one partition. It is assumed that the search by a condition on the partition key in a query is more efficient than the global index. This can be controlled using the gbtree.usage_partition_fraction and gbtree.partitions_number_threshold configuration parameters. For example, if
gbtree.partitions_number_thresholdis set to2, a global index is used when the planner's condition on the partition key retains more than two partitions.The query contains a condition on columns of the global index.
F.50.4.3. Vacuuming Global Indexes #
For global indexes on a partitioned table, vacuuming may operate in two modes:
When regular vacuuming is performed on a partitioned table, it processes only those pages of the global index that are marked as containing index tuples bearing a deletion mark. Furthermore, only the tuples that have this deletion mark are processed. For each of these tuples, the corresponding index value is searched for within the table partitions. If the value is not found in any partition, the index tuple is removed. Conversely, if the value is found, the deletion mark is removed from the index tuple.
Index tuples are marked for deletion when a
SELECTquery that utilizes the global index determines that the index value is absent from the partitions. Note that regular vacuuming may skip index tuples that have no corresponding values in the partitions if those tuples are not marked as deleted. Consequently, it is not guaranteed that all obsolete index tuples will be completely removed. This operating mode also applies to autovacuum.When the
VACUUMcommand is executed with theDISABLE_PAGE_SKIPPINGparameter enabled, it iterates through all pages of the global index and examines every index tuple residing on those pages. For each tuple, the system searches for the corresponding value within the table partitions. This mode requires substantially more system resources than regular vacuuming.The
VACUUM FULLcommand has no effect when executed on a partitioned table and produces no output messages.Partitioned tables can be processed by autovacuum. Autovacuum behavior for such tables is controlled with the table storage parameters listed below. These parameters operate in the same manner as they do for regular (non-partitioned) tables:
Note that this threshold applies to all global indexes of a partitioned table collectively. Consequently, autovacuum may be triggered when the total number of index tuples marked for deletion across all global indexes exceeds this value.
F.50.5. Configuration Parameters #
gbtree.usage_partition_fraction(int) #Specifies the percentage of partitions that determines whether a global index is used. For example:
The value is set to
100: If a query contains a condition on the partition key that excludes at least one partition, a global index is not used.The value is set to
50: If a query contains a condition on the partition key that excludes fewer than half of all partitions, a global index is used.
The default value is
100.gbtree.partitions_number_threshold(int) #Specifies the number of partitions that determines whether a global index is used. If a query contains a condition on the partition key that retains the number of partitions greater than the value set in this configuration parameter, a global index is used.
This configuration parameter is designed for cases where gbtree.usage_partition_fraction does not provide adequate control, for example, when a partitioned table contains thousands or tens of thousands of partitions.
The default value is
0, meaning that the configuration parameter is not set.
Note
Postgres Pro checks gbtree.usage_partition_fraction first, then gbtree.partitions_number_threshold.
F.50.6. Example #
The following example illustrates usage of a global index.
-- Creating the pgpro_gbtree extension CREATE EXTENSION pgpro_gbtree; -- Creating a partitioned table with three partitions CREATE TABLE t_int (i int PRIMARY KEY, v int, x int) PARTITION BY RANGE (i); CREATE TABLE t_int_1 PARTITION OF t_int FOR VALUES FROM (0) TO (100); CREATE TABLE t_int_2 PARTITION OF t_int FOR VALUES FROM (100) TO (200); CREATE TABLE t_int_3 PARTITION OF t_int FOR VALUES FROM (200) TO (300); -- Creating a global index for the v column CREATE INDEX t_int_v ON t_int USING gbtree (v); -- Creating a normal index for the x column CREATE INDEX t_int_x ON t_int (x); -- Query plan for a condition on the column with the global index EXPLAIN SELECT * FROM t_int WHERE v > 100; -- Query plan for a condition on the column with the normal index EXPLAIN SELECT * FROM t_int WHERE x > 100;
The example produces the following output:
CREATE EXTENSION
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE INDEX
QUERY PLAN
---------------------------------------------------------------------
Index Scan using t_int_v on t_int (cost=0.12..8.14 rows=0 width=0)
Index Cond: (v > 100)
(2 rows)
QUERY PLAN
------------------------------------------------------------------------------------
Append (cost=9.42..93.97 rows=2040 width=12)
-> Bitmap Heap Scan on t_int_1 (cost=9.42..27.92 rows=680 width=12)
Recheck Cond: (x > 100)
-> Bitmap Index Scan on t_int_1_x_idx (cost=0.00..9.25 rows=680 width=0)
Index Cond: (x > 100)
-> Bitmap Heap Scan on t_int_2 (cost=9.42..27.92 rows=680 width=12)
Recheck Cond: (x > 100)
-> Bitmap Index Scan on t_int_2_x_idx (cost=0.00..9.25 rows=680 width=0)
Index Cond: (x > 100)
-> Bitmap Heap Scan on t_int_3 (cost=9.42..27.92 rows=680 width=12)
Recheck Cond: (x > 100)
-> Bitmap Index Scan on t_int_3_x_idx (cost=0.00..9.25 rows=680 width=0)
Index Cond: (x > 100)
(13 rows)
The result of this example shows that with the global index, the index scan is performed for the entire partitioned table, while with the normal index, the index scan is performed for each partition individually, and this search can be slow when the number of partitions is high.