PostgreSQL 17: part 4 or CommitFest 2024-01
Spring is in full swing as we bring you the hottest winter news of the January Commitfest. Let’s get to the good stuff right away!
Previous articles about PostgreSQL 17: 2023-07, 2023-09, 2023-11.
- Incremental backup
- Logical replication: maintaining the subscription status when upgrading the subscriber server
- Dynamic shared memory registry
- EXPLAIN (memory): report memory usage for planning
- pg_stat_checkpointer: restartpoint monitoring on replicas
- Building BRIN indexes in parallel mode
- Queries with the IS [NOT] NULL condition for NOT NULL columns
- Optimization of SET search_path
- GROUP BY optimization
- Support planner functions for range types
- PL/pgSQL: %TYPE and %ROWTYPE arrays
- jsonpath: new data conversion methods
- COPY ... FROM: ignoring format conversion errors
- to_timestamp: format codes TZ and OF
- GENERATED AS IDENTITY in partitioned tables
- ALTER COLUMN ... SET EXPRESSION
Incremental backup
To quickly restore to a point in time in the past, backups should be made frequently so that there are fewer WAL segments to go through to get to the desired point. However, full backups of large databases take up a lot of space, take a long time to complete, and put unwanted load on the system.
PostgreSQL 17 solves this with incremental backups, which save only changes made relative to another backup.
Here’s how it works.
It all starts with walsummarizer, a process that scans WAL and collects information about modified pages. The process requires the summarize_wal parameter to be switched on. No restart is required to change the parameter value, just a configuration reload. You can also run the process on replicas, not just on the primary server.
ALTER SYSTEM SET summarize_wal = on;
SELECT pg_reload_conf();
The new process appears in pg_stat_activity and starts recording page changes in pg_wal/summaries in a compact form.
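You can check that the summarizer is running by looking it up in pg_stat_activity. A minimal check, assuming the process reports itself under the backend_type walsummarizer:
-- The summarizer should appear as a background process
SELECT pid, backend_type
FROM pg_stat_activity
WHERE backend_type = 'walsummarizer';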
First, create a full backup.
$ pg_basebackup -c fast -D backups/full
Now, introduce some changes; for example, make a copy of a table.
CREATE TABLE tickets_copy AS SELECT * FROM tickets;
Next, create an incremental backup relative to the full one. To do this, specify the full backup manifest file in the -i parameter.
$ pg_basebackup -c fast -D backups/increment1 \
-i backups/full/backup_manifest
Make some more changes.
DELETE FROM tickets_copy;
Create another incremental backup that will be based on the last incremental backup, not on the full one. Specify the incremental backup manifest file with the -i parameter.
$ pg_basebackup -c fast -D backups/increment2 \
-i backups/increment1/backup_manifest
To recover from an incremental backup, first you need to reconstruct the full backup using the pg_combinebackup tool. Here, to recover from the second incremental backup, we specify the two incremental backups, the full backup, and a location for the new reconstructed full backup.
$ pg_combinebackup backups/full backups/increment1 backups/increment2 \
-o backups/full_combined
Let’s look at the backup sizes. As expected, incremental copies are significantly smaller than full ones.
$ du -h -s backups/*
2.7G backups/full
2.7G backups/full_combined
411M backups/increment1
25M backups/increment2
Start the server from the new full backup on a free port:
$ pg_ctl start -D backups/full_combined \
-o '-p 5400' \
-l backups/full_combined/logfile
Confirm that the changes from the second incremental backup have been restored.
$ psql -p 5400 -c 'SELECT count(*) FROM tickets_copy'
count
-------
0
(1 row)
Let’s summarize. The server architecture offers two new components to support incremental backup.
- The walsummarizer process, controlled by the summarize_wal parameter, collects information about modified pages.
- The pg_walsummary tool and the SQL functions pg_available_wal_summaries, pg_wal_summary_contents and pg_get_wal_summarizer_state serve as monitoring and diagnostics tools for the walsummarizer process and the contents of the pg_wal/summaries directory (see the example below).
- The replication protocol is updated with the UPLOAD_MANIFEST command, and the BASE_BACKUP command gets the INCREMENTAL parameter.
- pg_basebackup gets the -i (--incremental) parameter that lets it create incremental backups using the updated replication protocol.
- The pg_combinebackup tool reconstructs a full backup from an incremental copy and all copies on which it depends.
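For example, the accumulated WAL summaries can be listed with one of the new functions (a quick sketch; the output depends on your WAL activity):
-- Each row describes one summary in pg_wal/summaries
SELECT * FROM pg_available_wal_summaries();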
See also:
Incremental Backup: What To Copy? (Robert Haas)
Incremental Backups: Evergreen and Other Use Cases (Robert Haas)
Waiting for Postgres 17: Incremental base backups (Lukas Fittl)
Waiting for PostgreSQL 17 -- Add support for incremental backup. (Hubert ’depesz’ Lubaczewski)
Logical replication: maintaining the subscription status when upgrading the subscriber server
commit: 9a17be1e
The previous article explained migrating replication slots on the publishing server during an upgrade to a new major version. This is another patch concerning upgrading the server in the context of logical replication. Now, when upgrading the subscriber server, the status of subscriptions is preserved, which will allow the subscriber to continue receiving changes from the publisher without re-synchronizing data.
Dynamic shared memory registry
To be able to use shared memory, modules and libraries usually need to be loaded via shared_preload_libraries, and this requires restarting the server. Processes can be allocated dynamic shared memory segments (via the DSM API), but in order for other processes to access these, they need to know where to look. This is where the patch comes in: it creates and maintains a registry of dynamic shared memory segments (first commit).
The first extension to use the new interface is pg_prewarm (second commit). Calling the autoprewarm_start_worker and autoprewarm_dump_now functions no longer overconsumes shared memory if the pg_prewarm library was not loaded at server start.
EXPLAIN (memory): report memory usage for planning
commit: 5de890e3
PostgreSQL needs memory not only to execute queries, but also to plan them. The new memory parameter for EXPLAIN shows how much memory was used for constructing the plan.
EXPLAIN (memory, costs off)
SELECT * FROM tickets;
QUERY PLAN
-------------------------------------------------
Seq Scan on tickets
Planning:
Memory: used=7920 bytes allocated=8192 bytes
(3 rows)
EXPLAIN (memory, costs off)
SELECT * FROM tickets a, tickets b;
QUERY PLAN
---------------------------------------------------
Nested Loop
-> Seq Scan on tickets a
-> Materialize
-> Seq Scan on tickets b
Planning:
Memory: used=19392 bytes allocated=65536 bytes
(6 rows)
The more tables involved in the query, the more memory is needed to create a plan. This is especially true for partitioned tables.
pg_stat_checkpointer: restartpoint monitoring on replicas
commit: 12915a58
pg_stat_checkpointer, a new view in PostgreSQL 17, gets columns for restartpoint monitoring: restartpoints_timed, restartpoints_req and restartpoints_done. The last column shows how many restartpoints were actually performed. This is needed because restartpoints on a replica cannot occur more often than checkpoints on the primary: until the replica receives a WAL record saying that a checkpoint was completed on the primary, an attempt to perform a restartpoint on the replica will fail.
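On a replica, the counters can be checked directly in the view (a minimal sketch):
-- Compare scheduled and requested restartpoints with those actually performed
SELECT restartpoints_timed, restartpoints_req, restartpoints_done
FROM pg_stat_checkpointer;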
Building BRIN indexes in parallel mode
commit: b4375717
The patch enables building BRIN indexes by multiple worker processes in parallel. Previously, this was possible only for B-tree indexes.
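For example, a sketch against the demo database (the index name is made up; the number of workers is capped by max_parallel_maintenance_workers):
-- Allow up to two parallel workers for maintenance operations
SET max_parallel_maintenance_workers = 2;
-- With a large enough table, the BRIN index is built by parallel workers
CREATE INDEX flights_departure_brin ON flights USING brin (scheduled_departure);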
Queries with the IS [NOT] NULL condition for NOT NULL columns
commit: b262ad44
In PostgreSQL 16, for each row of the tickets table, a check is performed to ensure that the ticket_no column is not NULL:
16=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NOT NULL;
QUERY PLAN
-----------------------------------
Seq Scan on tickets
Filter: (ticket_no IS NOT NULL)
(2 rows)
However, the ticket_no column has a NOT NULL constraint, so it can never be NULL, and checking it again is a waste of resources. It gets even worse with the IS NULL condition: the whole table has to be scanned (by index, but nevertheless) just to confirm that there’s nothing to return.
16=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NULL;
QUERY PLAN
------------------------------------------
Index Scan using tickets_pkey on tickets
Index Cond: (ticket_no IS NULL)
(2 rows)
Of course, you shouldn’t write such conditions in a query, but nothing prevents you from doing so.
Moreover, in some cases, when optimizing min/max aggregates, the planner itself may add an IS NOT NULL condition to the query when rewriting it. This, in turn, may lead to poor index selection. There are more details on that in the patch discussion.
In PostgreSQL 17, to improve planning of queries with IS [NOT] NULL conditions, the planner now takes the column's NOT NULL constraint into account and eliminates the redundant checks.
17=# EXPLAIN (costs off)
SELECT * FROM bookings WHERE book_date IS NOT NULL;
QUERY PLAN
----------------------
Seq Scan on bookings
(1 row)
17=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NULL;
QUERY PLAN
--------------------------
Result
One-Time Filter: false
(2 rows)
Note that the constraint must be an actual NOT NULL constraint. A CHECK (column_name IS NOT NULL) constraint performs the same check, but is not taken into account by the planner.
Optimization of SET search_path
Setting the search_path parameter in a function definition makes working with database objects safer. However, this comes at a cost.
Let’s run some simple functions in psql and time them. First, without setting the parameter:
16=# CREATE FUNCTION f() RETURNS int AS 'SELECT 0' LANGUAGE SQL;
16=# \timing on
16=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 1909.958 ms (00:01.910)
Now with the parameter set:
16=# ALTER FUNCTION f SET search_path = a,b,c;
16=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 12594.300 ms (00:12.594)
In PostgreSQL 17, search_path handling has been optimized with a hash table of recently used values that is checked during function calls. This speeds up the same test query:
17=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 9501.717 ms (00:09.502)
GROUP BY optimization
commit: 0452b461
This optimization almost premiered in PostgreSQL 15, but was rolled back shortly before release. This is the second attempt.
In short, in queries where several columns are listed in GROUP BY, the planner can swap the columns around in search of the optimal plan, for example, to use an index or incremental sort. The output is unaffected, but performance may increase.
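For illustration, here is a sketch against the demo database; the index is hypothetical and exists only to give the planner a reason to reorder:
-- With an index on (book_ref, ticket_no), the planner may evaluate
-- GROUP BY ticket_no, book_ref as GROUP BY book_ref, ticket_no
-- so that grouping can follow the index order
CREATE INDEX tickets_book_ticket_idx ON tickets (book_ref, ticket_no);
EXPLAIN (costs off)
SELECT book_ref, ticket_no
FROM tickets
GROUP BY ticket_no, book_ref;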
See also:
GROUP BY reordering (Adrien Nayrat)
Support planner functions for range types
commit: 075df6b2
Suppose we need to find out which flights departed during the first week of August 2017:
16=# EXPLAIN (costs off)
SELECT *
FROM flights
WHERE actual_departure <@ tstzrange('2017-08-01', '2017-08-08', '[)');
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on flights
Filter: (actual_departure <@ '["2017-08-01 00:00:00+03","2017-08-08 00:00:00+03")'::tstzrange)
(2 rows)
Note that the actual_departure column has an index, but it cannot be used with the <@ operator.
In PostgreSQL 17, the planner gets support functions for the range operators @> and <@. These functions rewrite the condition into a check against both boundaries of the range.
Here’s the plan of the same query after the patch:
17=# EXPLAIN (costs off)
SELECT *
FROM flights
WHERE actual_departure <@ tstzrange('2017-08-01', '2017-08-08', '[)');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using flights_actual_departure_idx on flights
Index Cond: ((actual_departure >= '2017-08-01 00:00:00+03'::timestamp with time zone) AND (actual_departure < '2017-08-08 00:00:00+03'::timestamp with time zone))
(2 rows)
This way, the index is used as intended.
PL/pgSQL: %TYPE and %ROWTYPE arrays
commit: 5e8674dc
PL/pgSQL now supports declaring arrays using the %TYPE and %ROWTYPE constructs.
DO $$
DECLARE
seats_list seats.seat_no%TYPE[] :=
(SELECT array_agg(seat_no)
FROM seats
WHERE aircraft_code = '733' AND
fare_conditions = 'Business');
BEGIN
RAISE NOTICE '%', seats_list;
END;
$$ LANGUAGE plpgsql;
NOTICE: {1A,1C,1D,1F,2A,2C,2D,2F,3A,3C,3D,3F}
DO
Similarly, you can create an array of a composite type with table_name%ROWTYPE[].
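Here is a minimal sketch of the %ROWTYPE[] variant, collecting whole rows of the seats table into an array:
DO $$
DECLARE
  business_seats seats%ROWTYPE[] :=
    (SELECT array_agg(s)
       FROM seats s
      WHERE aircraft_code = '733' AND
            fare_conditions = 'Business');
BEGIN
  -- array_length reports how many whole rows were collected
  RAISE NOTICE '% rows', array_length(business_seats, 1);
END;
$$ LANGUAGE plpgsql;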
See also:
Waiting for PostgreSQL 17 -- In plpgsql, allow %TYPE and %ROWTYPE to be followed by array decoration. (Hubert ’depesz’ Lubaczewski)
jsonpath: new data conversion methods
commit: 66ea94e8
Support for new data conversion methods has been added to the jsonpath language in accordance with the SQL standard: .bigint(), .boolean(), .date(), .decimal(), .integer(), .number(), .string(), .time(), .time_tz(), .timestamp(), .timestamp_tz().
Some usage examples:
\set json '{"key1": 42, "key2": "t", "key3": "Hello, World!"}'
SELECT jsonb_path_query(:'json', '$.key1.integer()'),
jsonb_path_query(:'json', '$.key2.boolean()'),
jsonb_path_query(:'json', '$.key3.string()');
jsonb_path_query | jsonb_path_query | jsonb_path_query
------------------+------------------+------------------
42 | true | "Hello, World!"
COPY ... FROM: ignoring format conversion errors
commit: 9e2d8701, b725b7ee, 72943960
If an error occurs while loading a row during COPY ... FROM execution, the entire transaction is rolled back. It would be nice to be able to load all the correct rows and then process the erroneous ones separately.
The first step has been taken in this direction. The COPY ... FROM command can now ignore errors related to incorrect value formats in columns. This builds on the “soft” error handling feature introduced in PostgreSQL 16.
Here’s an example table:
CREATE TABLE t (id int PRIMARY KEY);
The patch gives the COPY command a new on_error parameter. The default value stop makes it behave as before, stopping after the first error. The second, and so far the only other, value ignore makes the command skip rows with format conversion errors:
COPY t FROM STDIN (on_error 'ignore');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1
>> two
>> 3
>> \.
NOTICE: 1 row was skipped due to data type incompatibility
COPY 2
Two of the three rows were loaded. One was not, hence the notice. If the process takes a while, the number of skipped rows can be monitored in the tuples_skipped column of the pg_stat_progress_copy view (the third commit).
How do we find out which rows were skipped? In the future, the on_error parameter will get the values file and table to specify the name of a file or table for the skipped rows.
Note that only format conversion errors will be ignored. Any other row load errors will still interrupt the process. For example, the first row here leads to a primary key integrity error, which is not allowed:
COPY t FROM STDIN (on_error 'ignore');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 3
>> 4
>> \.
ERROR: duplicate key value violates unique constraint "t_pkey"
DETAIL: Key (id)=(3) already exists.
CONTEXT: COPY t, line 1
SELECT * FROM t;
id
----
1
3
(2 rows)
See also:
Waiting for PostgreSQL 17 -- Add new COPY option SAVE_ERROR_TO / Rename COPY option from SAVE_ERROR_TO to ON_ERROR (Hubert ’depesz’ Lubaczewski)
to_timestamp: format codes TZ and OF
commit: 8ba6fdf9
The to_char function learned to understand the TZ (time zone abbreviation) and OF (time zone offset from UTC) format codes a while back. The patch adds support for these codes to the reverse conversion function to_timestamp:
SELECT to_timestamp('2024-02-18 23:50:00MSK', 'YYYY-MM-DD HH24:MI:SSTZ') tz,
to_timestamp('2024-02-18 23:50:00+03', 'YYYY-MM-DD HH24:MI:SSOF') of;
tz | of
------------------------+------------------------
2024-02-18 23:50:00+03 | 2024-02-18 23:50:00+03
GENERATED AS IDENTITY in partitioned tables
commit: 69958631
Identity columns declared as GENERATED AS IDENTITY are fully supported in partitioned tables.
When creating a new partition or attaching one from an existing table (ATTACH PARTITION), the identity column is associated with the sequence defined for the partitioned table. New values are drawn from the sequence both when inserting into an individual partition and into the partitioned table itself. When a partition is detached (DETACH PARTITION), the column of the now independent table is disassociated from the sequence.
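A minimal sketch (the table and partition names are made up for illustration):
-- The identity sequence belongs to the partitioned table
CREATE TABLE events (
  id         int  GENERATED ALWAYS AS IDENTITY,
  created_at date NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Both inserts draw id values from the same sequence
INSERT INTO events (created_at) VALUES ('2024-02-18');
INSERT INTO events_2024 (created_at) VALUES ('2024-03-01');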
See also:
Waiting for PostgreSQL 17 -- Support identity columns in partitioned tables (Hubert ’depesz’ Lubaczewski)
ALTER COLUMN ... SET EXPRESSION
commit: 5d06e99a
The patch makes it possible to change the expression of a generated table column. Previously, such an expression could only be dropped.
As an example, let’s distribute the rows of a table across three nodes using the node column:
CREATE TABLE t (
id int PRIMARY KEY,
node int GENERATED ALWAYS AS (mod(id, 3)) STORED
);
WITH ins AS (
INSERT INTO t SELECT x FROM generate_series(1,100) AS x
RETURNING *
)
SELECT node, count(*) FROM ins GROUP BY 1 ORDER BY 1;
node | count
------+-------
0 | 33
1 | 34
2 | 33
(3 rows)
When changing the number of nodes, you can replace the expression with a single command:
ALTER TABLE t ALTER COLUMN node SET EXPRESSION AS (mod(id, 4));
Notably, the command not only sets a new expression, but also completely rewrites the table, updating existing values:
SELECT node, count(*) FROM t GROUP BY 1 ORDER BY 1;
node | count
------+-------
0 | 25
1 | 25
2 | 25
3 | 25
(4 rows)
See also:
Waiting for PostgreSQL 17 -- ALTER TABLE command to change generation expression (Hubert ’depesz’ Lubaczewski)
This is all for now. News from the March Commitfest, the last one of this release cycle, will follow soon.