PostgreSQL 17: part 4 or CommitFest 2024-01
Spring is in full swing as we bring you the hottest winter news of the January Commitfest. Let’s get to the good stuff right away!
Previous articles about PostgreSQL 17: 2023-07, 2023-09, 2023-11.
- Incremental backup
- Logical replication: maintaining the subscription status when upgrading the subscriber server
- Dynamic shared memory registry
- EXPLAIN (memory): report memory usage for planning
- pg_stat_checkpointer: restartpoint monitoring on replicas
- Building BRIN indexes in parallel mode
- Queries with the IS [NOT] NULL condition for NOT NULL columns
- Optimization of SET search_path
- GROUP BY optimization
- Support planner functions for range types
- PL/pgSQL: %TYPE and %ROWTYPE arrays
- jsonpath: new data conversion methods
- COPY ... FROM: ignoring format conversion errors
- to_timestamp: format codes TZ and OF
- GENERATED AS IDENTITY in partitioned tables
- ALTER COLUMN ... SET EXPRESSION
Incremental backup
To quickly restore to a point in time in the past, backups should be made frequently so that there are fewer WAL segments to go through to get to the desired point. However, full backups of large databases take up a lot of space, take a long time to complete, and put unwanted load on the system.
PostgreSQL 17 solves this with incremental backups, which save only changes made relative to another backup.
Here’s how it works.
It all starts with walsummarizer, a process that scans WAL and collects information about modified pages. The process requires the summarize_wal parameter to be switched on. No restart is required to change the parameter value, just a configuration reload. You can also run the process on replicas, not just on the primary server.
ALTER SYSTEM SET summarize_wal = on;
SELECT pg_reload_conf();
The new process appears in pg_stat_activity and starts recording page changes in pg_wal/summaries in a compact form.
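You can check that the summarizer is running by looking it up in pg_stat_activity. A minimal check, assuming the process reports itself under the backend_type walsummarizer:
-- The summarizer should appear as a background process
SELECT pid, backend_type
FROM pg_stat_activity
WHERE backend_type = 'walsummarizer';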
First, create a full backup.
$ pg_basebackup -c fast -D backups/full
Now, introduce some changes; for example, make a copy of a table.
CREATE TABLE tickets_copy AS SELECT * FROM tickets;
Next, create an incremental backup relative to the full one. To do this, specify the full backup manifest file in the -i parameter.
$ pg_basebackup -c fast -D backups/increment1 \
-i backups/full/backup_manifest
Make some more changes.
DELETE FROM tickets_copy;
Create another incremental backup that will be based on the last incremental backup, not on the full one. Specify the incremental backup manifest file with the -i parameter.
$ pg_basebackup -c fast -D backups/increment2 \
-i backups/increment1/backup_manifest
To recover from an incremental backup, first you need to reconstruct the full backup using the pg_combinebackup tool. Here, to recover from the second incremental backup, we specify the two incremental backups, the full backup, and a location for the new reconstructed full backup.
$ pg_combinebackup backups/full backups/increment1 backups/increment2 \
-o backups/full_combined
Let’s look at the backup sizes. As expected, incremental copies are significantly smaller than full ones.
$ du -h -s backups/*
2.7G backups/full
2.7G backups/full_combined
411M backups/increment1
25M backups/increment2
Start the server from the new full backup on a free port:
$ pg_ctl start -D backups/full_combined \
-o '-p 5400' \
-l backups/full_combined/logfile
Confirm that the changes from the second incremental backup have been restored.
$ psql -p 5400 -c 'SELECT count(*) FROM tickets_copy'
count
-------
0
(1 row)
Let’s summarize. The server architecture offers two new components to support incremental backup.
- The walsummarizer process, controlled by the summarize_wal parameter, collects information about modified pages.
- The pg_walsummary tool and the SQL functions pg_available_wal_summaries, pg_wal_summary_contents and pg_get_wal_summarizer_state serve as monitoring and diagnostics tools for the walsummarizer process and the contents of the pg_wal/summaries directory (see the example below).
- The replication protocol is updated with the UPLOAD_MANIFEST command, and the BASE_BACKUP command gets the INCREMENTAL parameter.
- pg_basebackup gets the -i (--incremental) parameter that lets it create incremental backups using the updated replication protocol.
- The pg_combinebackup tool reconstructs a full backup from an incremental copy and all copies on which it depends.
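For example, the accumulated WAL summaries can be listed with one of the new functions (a quick sketch; the output depends on your WAL activity):
-- Each row describes one summary in pg_wal/summaries
SELECT * FROM pg_available_wal_summaries();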
See also:
Incremental Backup: What To Copy? (Robert Haas)
Incremental Backups: Evergreen and Other Use Cases (Robert Haas)
Waiting for Postgres 17: Incremental base backups (Lukas Fittl)
Waiting for PostgreSQL 17 -- Add support for incremental backup. (Hubert ’depesz’ Lubaczewski)
Logical replication: maintaining the subscription status when upgrading the subscriber server
commit: 9a17be1e
The previous article explained migrating replication slots on the publishing server during an upgrade to a new major version. This is another patch concerning upgrading the server in the context of logical replication. Now, when upgrading the subscriber server, the status of subscriptions is preserved, which will allow the subscriber to continue receiving changes from the publisher without re-synchronizing data.
Dynamic shared memory registry
To be able to use shared memory, modules and libraries usually need to be loaded via shared_preload_libraries, and this requires restarting the server. Processes can be allocated dynamic shared memory segments (via the DSM API), but in order for other processes to access these, they need to know where to look. This is where the patch comes in: it creates and maintains a registry of dynamic shared memory segments (first commit).
The first extension to use the new interface is pg_prewarm (second commit). Calling the autoprewarm_start_worker and autoprewarm_dump_now functions no longer overconsumes shared memory if the pg_prewarm library was not loaded at server start.
EXPLAIN (memory): report memory usage for planning
commit: 5de890e3
PostgreSQL needs memory not only to execute queries, but also to plan them. The new memory parameter for EXPLAIN shows how much memory was used for constructing the plan.
EXPLAIN (memory, costs off)
SELECT * FROM tickets;
QUERY PLAN
-------------------------------------------------
Seq Scan on tickets
Planning:
Memory: used=7920 bytes allocated=8192 bytes
(3 rows)
EXPLAIN (memory, costs off)
SELECT * FROM tickets a, tickets b;
QUERY PLAN
---------------------------------------------------
Nested Loop
-> Seq Scan on tickets a
-> Materialize
-> Seq Scan on tickets b
Planning:
Memory: used=19392 bytes allocated=65536 bytes
(6 rows)
The more tables involved in the query, the more memory is needed to create a plan. This is especially true for partitioned tables.
pg_stat_checkpointer: restartpoint monitoring on replicas
commit: 12915a58
pg_stat_checkpointer, a new view in PostgreSQL 17, gets columns for restartpoint monitoring: restartpoints_timed, restartpoints_req and restartpoints_done. The last column shows how many restartpoints were actually performed. This is needed because restartpoints on a replica cannot occur more often than checkpoints on the primary: until the replica receives a WAL record saying that a checkpoint was completed on the primary, an attempt to perform a restartpoint on the replica will fail.
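On a replica, the counters can be checked directly in the view (a minimal sketch):
-- Compare scheduled and requested restartpoints with those actually performed
SELECT restartpoints_timed, restartpoints_req, restartpoints_done
FROM pg_stat_checkpointer;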
Building BRIN indexes in parallel mode
commit: b4375717
The patch enables building BRIN indexes by multiple worker processes in parallel. Previously, this was possible only for B-tree indexes.
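For example, a sketch against the demo database (the index name is made up; the number of workers is capped by max_parallel_maintenance_workers):
-- Allow up to two parallel workers for maintenance operations
SET max_parallel_maintenance_workers = 2;
-- With a large enough table, the BRIN index is built by parallel workers
CREATE INDEX flights_departure_brin ON flights USING brin (scheduled_departure);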
Queries with the IS [NOT] NULL condition for NOT NULL columns
commit: b262ad44
In PostgreSQL 16, for each row of the tickets table, a check is performed to ensure that the ticket_no column is not NULL:
16=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NOT NULL;
QUERY PLAN
-----------------------------------
Seq Scan on tickets
Filter: (ticket_no IS NOT NULL)
(2 rows)
However, the ticket_no column has a NOT NULL constraint, so it can never be NULL, and checking it again is a waste of resources. It gets even worse with the IS NULL condition: the whole table has to be scanned (by index, but nevertheless) just to confirm that there’s nothing to return.
16=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NULL;
QUERY PLAN
------------------------------------------
Index Scan using tickets_pkey on tickets
Index Cond: (ticket_no IS NULL)
(2 rows)
Of course, you shouldn’t write such conditions in a query, but nothing prevents you from doing so.
Moreover, in some cases, when optimizing min/max aggregates, the planner itself may add an IS NOT NULL condition to the query when rewriting it. This, in turn, may lead to poor index selection. There are more details on that in the patch discussion.
In PostgreSQL 17, to improve planning of queries with IS [NOT] NULL conditions, the planner now takes the column's NOT NULL constraint into account and eliminates the redundant checks.
17=# EXPLAIN (costs off)
SELECT * FROM bookings WHERE book_date IS NOT NULL;
QUERY PLAN
----------------------
Seq Scan on bookings
(1 row)
17=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NULL;
QUERY PLAN
--------------------------
Result
One-Time Filter: false
(2 rows)
Note that the constraint must be an actual NOT NULL constraint. A CHECK (column_name IS NOT NULL) constraint performs the same check, but is not taken into account by the planner.
Optimization of SET search_path
Setting the search_path parameter in a function definition makes working with database objects safer. However, this comes at a cost.
Let’s run some simple functions in psql and time them. First, without setting the parameter:
16=# CREATE FUNCTION f() RETURNS int AS 'SELECT 0' LANGUAGE SQL;
16=# \timing on
16=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 1909.958 ms (00:01.910)
Now with the parameter set:
16=# ALTER FUNCTION f SET search_path = a,b,c;
16=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 12594.300 ms (00:12.594)
In PostgreSQL 17, search_path handling has been optimized with a hash table of recently used values that is checked during function calls. This speeds up the same test query:
17=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 9501.717 ms (00:09.502)
GROUP BY optimization
commit: 0452b461
This optimization almost premiered in PostgreSQL 15, but was rolled back shortly before release. This is the second attempt.
In short, in queries where several columns are listed in GROUP BY, the planner can swap the columns around in search of the optimal plan, for example, to use an index or incremental sort. The output is unaffected, but performance may increase.
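For illustration, here is a sketch against the demo database; the index is hypothetical and exists only to give the planner a reason to reorder:
-- With an index on (book_ref, ticket_no), the planner may evaluate
-- GROUP BY ticket_no, book_ref as GROUP BY book_ref, ticket_no
-- so that grouping can follow the index order
CREATE INDEX tickets_book_ticket_idx ON tickets (book_ref, ticket_no);
EXPLAIN (costs off)
SELECT book_ref, ticket_no
FROM tickets
GROUP BY ticket_no, book_ref;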
See also:
GROUP BY reordering (Adrien Nayrat)
Support planner functions for range types
commit: 075df6b2
Suppose we need to find out which flights departed during the first week of August 2017:
16=# EXPLAIN (costs off)
SELECT *
FROM flights
WHERE actual_departure <@ tstzrange('2017-08-01', '2017-08-08', '[)');
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on flights
Filter: (actual_departure <@ '["2017-08-01 00:00:00+03","2017-08-08 00:00:00+03")'::tstzrange)
(2 rows)
Note that the actual_departure column has an index, but it cannot be used with the <@ operator.
In PostgreSQL 17, the planner gets support functions for the range operators @> and <@. These functions rewrite the condition into a check against both boundaries of the range.
Here’s the plan of the same query after the patch:
17=# EXPLAIN (costs off)
SELECT *
FROM flights
WHERE actual_departure <@ tstzrange('2017-08-01', '2017-08-08', '[)');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using flights_actual_departure_idx on flights
Index Cond: ((actual_departure >= '2017-08-01 00:00:00+03'::timestamp with time zone) AND (actual_departure < '2017-08-08 00:00:00+03'::timestamp with time zone))
(2 rows)
This way, the index is used as intended.
PL/pgSQL: %TYPE and %ROWTYPE arrays
commit: 5e8674dc
PL/pgSQL now supports declaring arrays using the %TYPE and %ROWTYPE constructs.
DO $$
DECLARE
seats_list seats.seat_no%TYPE[] :=
(SELECT array_agg(seat_no)
FROM seats
WHERE aircraft_code = '733' AND
fare_conditions = 'Business');
BEGIN
RAISE NOTICE '%', seats_list;
END;
$$ LANGUAGE plpgsql;
NOTICE: {1A,1C,1D,1F,2A,2C,2D,2F,3A,3C,3D,3F}
DO
Similarly, you can create an array of a composite type with table_name%ROWTYPE[].
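Here is a minimal sketch of the %ROWTYPE[] variant, collecting whole rows of the seats table into an array:
DO $$
DECLARE
  business_seats seats%ROWTYPE[] :=
    (SELECT array_agg(s)
       FROM seats s
      WHERE aircraft_code = '733' AND
            fare_conditions = 'Business');
BEGIN
  -- array_length reports how many whole rows were collected
  RAISE NOTICE '% rows', array_length(business_seats, 1);
END;
$$ LANGUAGE plpgsql;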
See also:
Waiting for PostgreSQL 17 -- In plpgsql, allow %TYPE and %ROWTYPE to be followed by array decoration. (Hubert ’depesz’ Lubaczewski)
jsonpath: new data conversion methods
commit: 66ea94e8
Support for new data conversion methods has been added to the jsonpath language in accordance with the SQL standard: .bigint(), .boolean(), .date(), .decimal(), .integer(), .number(), .string(), .time(), .time_tz(), .timestamp(), .timestamp_tz().
Some usage examples:
\set json '{"key1": 42, "key2": "t", "key3": "Hello, World!"}'
SELECT jsonb_path_query(:'json', '$.key1.integer()'),
jsonb_path_query(:'json', '$.key2.boolean()'),
jsonb_path_query(:'json', '$.key3.string()');
jsonb_path_query | jsonb_path_query | jsonb_path_query
------------------+------------------+------------------
42 | true | "Hello, World!"
COPY ... FROM: ignoring format conversion errors
commit: 9e2d8701, b725b7ee, 72943960
If an error occurs while loading a row during COPY ... FROM execution, the entire transaction is rolled back. It would be nice to be able to load all the correct rows and then process the erroneous ones separately.
The first step has been taken in this direction. The COPY ... FROM command can now ignore errors related to incorrect value formats in columns. This builds on the “soft” error handling feature introduced in PostgreSQL 16.
Here’s an example table:
CREATE TABLE t (id int PRIMARY KEY);
The patch gives the COPY command a new on_error parameter. The default value stop makes it behave as before, stopping after the first error. The second, and so far the only other, value ignore makes the command skip rows with format conversion errors:
COPY t FROM STDIN (on_error 'ignore');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1
>> two
>> 3
>> \.
NOTICE: 1 row was skipped due to data type incompatibility
COPY 2
Two of the three rows were loaded. One was not, hence the notice. If the process takes a while, the number of skipped rows can be monitored in the tuples_skipped column of the pg_stat_progress_copy view (the third commit).
How do we find out which rows were skipped? In the future, the on_error parameter will get the values file and table to specify the name of a file or table for the skipped rows.
Note that only format conversion errors will be ignored. Any other row load errors will still interrupt the process. For example, the first row here leads to a primary key integrity error, which is not allowed:
COPY t FROM STDIN (on_error 'ignore');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 3
>> 4
>> \.
ERROR: duplicate key value violates unique constraint "t_pkey"
DETAIL: Key (id)=(3) already exists.
CONTEXT: COPY t, line 1
SELECT * FROM t;
id
----
1
3
(2 rows)
See also:
Waiting for PostgreSQL 17 -- Add new COPY option SAVE_ERROR_TO / Rename COPY option from SAVE_ERROR_TO to ON_ERROR (Hubert ’depesz’ Lubaczewski)
to_timestamp: format codes TZ and OF
commit: 8ba6fdf9
The to_char function learned to understand the TZ (time zone abbreviation) and OF (time zone offset from UTC) format codes a while back. The patch adds support for these codes to the reverse conversion function to_timestamp:
SELECT to_timestamp('2024-02-18 23:50:00MSK', 'YYYY-MM-DD HH24:MI:SSTZ') tz,
to_timestamp('2024-02-18 23:50:00+03', 'YYYY-MM-DD HH24:MI:SSOF') of;
tz | of
------------------------+------------------------
2024-02-18 23:50:00+03 | 2024-02-18 23:50:00+03
GENERATED AS IDENTITY in partitioned tables
commit: 69958631
Identity columns declared as GENERATED AS IDENTITY are fully supported in partitioned tables.
When creating a new partition or attaching one from an existing table (ATTACH PARTITION), the identity column is associated with the sequence defined for the partitioned table. New values are drawn from the sequence both when inserting into an individual partition and into the partitioned table itself. When a partition is detached (DETACH PARTITION), the column of the now independent table is disassociated from the sequence.
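A minimal sketch (the table and partition names are made up for illustration):
-- The identity sequence belongs to the partitioned table
CREATE TABLE events (
  id         int  GENERATED ALWAYS AS IDENTITY,
  created_at date NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Both inserts draw id values from the same sequence
INSERT INTO events (created_at) VALUES ('2024-02-18');
INSERT INTO events_2024 (created_at) VALUES ('2024-03-01');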
See also:
Waiting for PostgreSQL 17 -- Support identity columns in partitioned tables (Hubert ’depesz’ Lubaczewski)
ALTER COLUMN ... SET EXPRESSION
commit: 5d06e99a
The patch makes it possible to change the expression of a generated table column. Previously, such an expression could only be dropped.
As an example, let’s distribute the rows of a table across three nodes using the node column:
CREATE TABLE t (
id int PRIMARY KEY,
node int GENERATED ALWAYS AS (mod(id, 3)) STORED
);
WITH ins AS (
INSERT INTO t SELECT x FROM generate_series(1,100) AS x
RETURNING *
)
SELECT node, count(*) FROM ins GROUP BY 1 ORDER BY 1;
node | count
------+-------
0 | 33
1 | 34
2 | 33
(3 rows)
When changing the number of nodes, you can replace the expression with a single command:
ALTER TABLE t ALTER COLUMN node SET EXPRESSION AS (mod(id, 4));
Notably, the command not only sets a new expression, but also completely rewrites the table, updating existing values:
SELECT node, count(*) FROM t GROUP BY 1 ORDER BY 1;
node | count
------+-------
0 | 25
1 | 25
2 | 25
3 | 25
(4 rows)
See also:
Waiting for PostgreSQL 17 -- ALTER TABLE command to change generation expression (Hubert ’depesz’ Lubaczewski)
This is all for now. News from the March Commitfest, the last one of this release cycle, will follow soon.