PostgreSQL 18: part 3 or CommitFest 2024-11
We continue the series of articles about new patches coming to PostgreSQL 18, this one covering the news of the November CommitFest.
If you missed the previous reviews, you can check them out here: 2024-07, 2024-09.
- initdb: checksum calculation enabled by default
- Planner: array lookup instead of multiple similar conditions under OR
- Planner: switching around expressions in DISTINCT clauses
- GROUPING SETS: HAVING -> WHERE
- Data type cache entry invalidation and temporary tables
- Planner: incremental sorting during Merge Join
- New function array_reverse
- Functions min and max for the type bytea
- Parallel worker usage statistics
- New function pg_ls_summariesdir
- New contrib module: pg_logicalsnapinspect
- Improved extension installation error messages
- NOT NULL constraints in the system catalog
- A TOAST table for pg_index
- COPY ... FROM and file_fdw: rejected row limit
- LIKE support with nondeterministic collations
- TLS v1.3: cipher suites
initdb: checksum calculation enabled by default
A simple but important change. When initializing a cluster with initdb, specifying the option --data-checksums (-k) is no longer necessary: checksum calculation is enabled by default.
If you donʼt need checksums, use the new option --no-data-checksums.
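To verify the state of an existing cluster from SQL, you can look at the read-only data_checksums parameter. This is just a quick illustration, not something added by the patch:
SHOW data_checksums;
-- expected to report "on" for a cluster initialized with the PostgreSQL 18 defaults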
Planner: array lookup instead of multiple similar conditions under OR
In this query, the planner in PostgreSQL 17 scans the index three times for each of the similar conditions:
17=# EXPLAIN (costs off)
SELECT * FROM flights WHERE flight_id = 1 OR flight_id = 2 OR flight_id = 3;
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on flights
Recheck Cond: ((flight_id = 1) OR (flight_id = 2) OR (flight_id = 3))
-> BitmapOr
-> Bitmap Index Scan on flights_pkey
Index Cond: (flight_id = 1)
-> Bitmap Index Scan on flights_pkey
Index Cond: (flight_id = 2)
-> Bitmap Index Scan on flights_pkey
Index Cond: (flight_id = 3)
(9 rows)
The query could be rewritten more efficiently, so that a single index scan is enough:
17=# EXPLAIN (costs off)
SELECT * FROM flights WHERE flight_id = ANY (ARRAY[1,2,3]);
QUERY PLAN
--------------------------------------------------------
Index Scan using flights_pkey on flights
Index Cond: (flight_id = ANY ('{1,2,3}'::integer[]))
(2 rows)
Such an optimization requires rewriting the query, though, which is not always an option, especially for queries coming from ORMs. In PostgreSQL 18, the planner can recognize situations where the query can be optimized in this way:
18=# EXPLAIN (costs off)
SELECT * FROM flights WHERE flight_id = 1 OR flight_id = 2 OR flight_id = 3;
QUERY PLAN
--------------------------------------------------------
Index Scan using flights_pkey on flights
Index Cond: (flight_id = ANY ('{1,2,3}'::integer[]))
(2 rows)
Planner: switching around expressions in DISTINCT clauses
commit: a8ccf4e93
Here is another example of the planner no longer requiring the query to be written in a particular way.
In PostgreSQL 17, the planner learned to rearrange expressions in GROUP BY according to their positions in the index. The ticket_flights table has an index for the composite primary key. Letʼs look at the order of the columns in the index.
\d ticket_flights_pkey
Index "bookings.ticket_flights_pkey"
Column | Type | Key? | Definition
-----------+---------------+------+------------
ticket_no | character(13) | yes | ticket_no
flight_id | integer | yes | flight_id
primary key, btree, for table "bookings.ticket_flights"
In the following query, the planner will use the index instead of scanning the whole table, as it would have done in earlier versions, even though the columns in GROUP BY are listed in a different order.
17=# EXPLAIN (costs off)
SELECT flight_id,ticket_no
FROM ticket_flights
GROUP BY flight_id,ticket_no; -- the columns are in the index
QUERY PLAN
-------------------------------------------------------------------
Group
Group Key: ticket_no, flight_id
-> Index Only Scan using ticket_flights_pkey on ticket_flights
(3 rows)
Following suit, the new patch teaches the planner to do the same for DISTINCT, including DISTINCT ON. Here it is in PostgreSQL 18:
18=# EXPLAIN (costs off)
SELECT DISTINCT flight_id,ticket_no
FROM ticket_flights;
QUERY PLAN
-------------------------------------------------------------------
Unique
-> Index Only Scan using ticket_flights_pkey on ticket_flights
(2 rows)
Both optimizations can be toggled via configuration parameters and are enabled by default.
\dconfig enable_*_reordering
List of configuration parameters
Parameter | Value
----------------------------+-------
enable_distinct_reordering | on
enable_group_by_reordering | on
(2 rows)
GROUPING SETS: HAVING -> WHERE
commit: 67a54b9e8
Note the Filter line in the plan below. It sits under the Seq Scan node, while it used to be under HashAggregate. This means that the HAVING condition is checked while scanning table rows, that is, before aggregation.
EXPLAIN (costs off)
SELECT aircraft_code, status, COUNT(*)
FROM flights
GROUP BY GROUPING SETS ((aircraft_code, status), (status))
HAVING status = 'Arrived';
QUERY PLAN
----------------------------------------------------
HashAggregate
Hash Key: status, aircraft_code
Hash Key: status
-> Seq Scan on flights
Filter: ((status)::text = 'Arrived'::text)
(5 rows)
Of course, the query could be rewritten and the condition moved from HAVING to WHERE. But it is great to see the planner learning to find more efficient solutions on its own.
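For comparison, here is the manual rewrite the paragraph above refers to; it produces the same result, with the filter applied before aggregation:
SELECT aircraft_code, status, count(*)
FROM flights
WHERE status = 'Arrived'
GROUP BY GROUPING SETS ((aircraft_code, status), (status));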
Data type cache entry invalidation and temporary tables
commit: cc5ef90ed, d0f020037, b85a9d046
Temporary tables should be used very carefully. When a temporary table is created, the information about the table and related entities, such as data types, is stored in the system catalog. The danger here is not only possible bloating of system tables, but also the overhead of keeping the system catalog cache up to date in each client backend process.
In the example below, two anonymous blocks run sequentially in the same session. In the first block, twenty thousand temporary tables are created. The tricky construction under VALUES is there to take the type just created for the temporary table and put it into the cache. In the second block, the temporary tables are dropped. The data types are removed together with their tables, and the related system catalog cache entries are invalidated. This cache entry invalidation is what the patch optimizes.
\timing on
DO $$BEGIN
FOR i IN 1 .. 20_000 LOOP
EXECUTE format('CREATE TEMP TABLE t%s (id int)', i);
EXECUTE format('INSERT INTO t%s VALUES ( (ROW(0)::t%s).id)', i, i);
END LOOP;
END; $$;
DO $$BEGIN
FOR i IN 1 .. 20_000 LOOP
EXECUTE format('DROP TABLE t%s', i);
END LOOP;
END; $$;
The execution times for each block in PostgreSQL 17:
Time: 12251,384 ms (00:12,251)
Time: 22494,163 ms (00:22,494)
And in PostgreSQL 18:
Time: 2872,074 ms (00:02,872)
Time: 1495,051 ms (00:01,495)
The difference is plain to see, especially for the second block.
Planner: incremental sorting during Merge Join
commit: 828e94c9d
As Tomas Vondra aptly noted, incremental sorting is added incrementally.
Now, during a Merge Join, the outer set (higher up in the plan) can be sorted incrementally, provided that it is already partially sorted.
An example from the patch description:
CREATE TABLE t (a int, b int);
INSERT INTO t SELECT random(1,100), random(1,100) FROM generate_series(1,100000);
CREATE INDEX ON t (a);
ANALYZE t;
EXPLAIN (costs off)
SELECT *
FROM (SELECT * FROM t ORDER BY a) t1
JOIN t t2 ON t1.a = t2.a AND t1.b = t2.b
ORDER BY t1.a, t1.b;
QUERY PLAN
-----------------------------------------------
Merge Join
Merge Cond: ((t.a = t2.a) AND (t.b = t2.b))
-> Incremental Sort
Sort Key: t.a, t.b
Presorted Key: t.a
-> Index Scan using t_a_idx on t
-> Sort
Sort Key: t2.a, t2.b
-> Seq Scan on t t2
(9 rows)
Previously, the Sort node was used for the outer set every time, so it was always sorted in full, which may be less efficient:
QUERY PLAN
-----------------------------------------------
Merge Join
Merge Cond: ((t.a = t2.a) AND (t.b = t2.b))
-> Sort
Sort Key: t.a, t.b
-> Index Scan using t_a_idx on t
-> Materialize
-> Sort
Sort Key: t2.a, t2.b
-> Seq Scan on t t2
(9 rows)
New function array_reverse
commit: 49d6c7d8d
The name of the function speaks for itself: it returns the elements of the array in the reverse order.
SELECT array_reverse(string_to_array('PostgreSQL',NULL));
array_reverse
-----------------------
{L,Q,S,e,r,g,t,s,o,P}
(1 row)
See also:
- Waiting for PostgreSQL 18 – Add SQL function array_reverse() (Hubert ʻdepeszʼ Lubaczewski)
Functions min and max for the type bytea
commit: 2d24fd942
The functions min and max are now implemented for the bytea data type.
WITH t AS (
SELECT book_ref::bytea AS c FROM bookings
)
SELECT min(c), max(c) FROM t;
min | max
----------------+----------------
\x303030303034 | \x464646464639
(1 row)
Parallel worker usage statistics
The statistics views pg_stat_statements (the first commit) and pg_stat_database (the second commit) get two new columns: parallel_workers_to_launch and parallel_workers_launched. They show how many parallel workers were planned and how many were actually launched.
Letʼs check it out, using the default parallelization settings:
\dconfig max_parallel_workers*|max_worker*
List of configuration parameters
Parameter | Value
---------------------------------+-------
max_parallel_workers | 8
max_parallel_workers_per_gather | 2
max_worker_processes | 8
(3 rows)
Clear the statistics before testing:
SELECT pg_stat_statements_reset();
SELECT pg_stat_reset();
Using pgbench, letʼs start 10 clients, each running queries in parallel mode:
$ pgbench --client=10 --time=5 --file=test.sql
The queries from test.sql aggregate fairly large tables, forcing the planner to choose a parallel plan in each case:
$ cat test.sql
SELECT count(*) FROM bookings;
SELECT count(*) FROM tickets;
The pool of parallel workers is not big enough for the number of workers planned by all 10 clients, as is evident from the query-level statistics:
SELECT query,
parallel_workers_to_launch AS plan,
parallel_workers_launched AS fact
FROM pg_stat_statements
WHERE query ~ 'tickets|bookings';
query | plan | fact
-------------------------------+------+------
SELECT count(*) FROM bookings | 84 | 42
SELECT count(*) FROM tickets | 84 | 40
(2 rows)
And at the database level:
SELECT parallel_workers_to_launch AS plan,
parallel_workers_launched AS fact
FROM pg_stat_database
WHERE datname = current_database();
plan | fact
------+------
168 | 82
(1 row)
If the planned values are regularly much higher than the actual numbers, it may be a hint to reconsider your parallel execution settings, as long as the hardware allows it.
New function pg_ls_summariesdir
commit: 4e1fad378
The new function pg_ls_summariesdir will come in handy for incremental backup monitoring. It shows the contents of pg_wal/summaries and, unlike the good old pg_ls_dir('pg_wal/summaries'), also displays file sizes and last modification times.
SELECT * FROM pg_ls_summariesdir();
name | size | modification
--------------------------------------------------+-------+------------------------
0000000100000001412CF0980000000162404D08.summary | 17164 | 2025-02-03 15:22:33+03
(1 row)
This is a good place to recall that the list of functions to display the contents of service directories is quite impressive:
\df pg_ls_*dir
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+---------------------------+------------------+--------------------------------------------------------------------------------------------+------
pg_catalog | pg_ls_archive_statusdir | SETOF record | OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
pg_catalog | pg_ls_dir | SETOF text | text | func
pg_catalog | pg_ls_dir | SETOF text | text, boolean, boolean | func
pg_catalog | pg_ls_logdir | SETOF record | OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
pg_catalog | pg_ls_logicalmapdir | SETOF record | OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
pg_catalog | pg_ls_logicalsnapdir | SETOF record | OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
pg_catalog | pg_ls_replslotdir | SETOF record | slot_name text, OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
pg_catalog | pg_ls_summariesdir | SETOF record | OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
pg_catalog | pg_ls_tmpdir | SETOF record | OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
pg_catalog | pg_ls_tmpdir | SETOF record | tablespace oid, OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
pg_catalog | pg_ls_waldir | SETOF record | OUT name text, OUT size bigint, OUT modification timestamp with time zone | func
(11 rows)
New contrib module: pg_logicalsnapinspect
commit: 7cdfeee32
The new module pg_logicalsnapinspect lets you inspect logical decoding components, for example for debugging or for studying how logical replication works.
It comes with just two functions for now: pg_get_logical_snapshot_meta and pg_get_logical_snapshot_info. They retrieve snapshot data from files in $PGDATA/pg_logical/snapshots.
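A rough sketch of how the functions might be used, assuming they take the snapshot LSN (which is encoded in the snapshot file name) as their argument; the LSN literal below is purely illustrative:
-- list the available snapshot files
SELECT name FROM pg_ls_logicalsnapdir();
-- pass the LSN that corresponds to one of them (e.g. 0-1549E38.snap -> '0/1549E38')
SELECT * FROM pg_get_logical_snapshot_info('0/1549E38');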
Improved extension installation error messages
commit: 14e5680ee, 774171c4f, 924e03917
As an example, letʼs install the extension uom, which comes in three different versions:
SELECT name, version
FROM pg_available_extension_versions
WHERE name = 'uom';
name | version
------+---------
uom | 1.0
uom | 1.1
uom | 1.2
(3 rows)
I have added an invalid command to one of the SQL installation scripts. Letʼs attempt to install the extension:
17=# CREATE EXTENSION uom VERSION '1.2';
ERROR: division by zero
An error! But in which file?
In PostgreSQL 18, the error message and context for extension installation errors are significantly improved:
18=# CREATE EXTENSION uom VERSION '1.2';
ERROR: division by zero
CONTEXT: SQL statement "SELECT 1/0"
extension script file "uom--1.1--1.2.sql", near line 9
NOT NULL constraints in the system catalog
commit: 14e87ffa5
This is the second attempt to start recording NOT NULL constraints into pg_constraint. The first one was in PostgreSQL 17; that patch was reverted, but the description remains accurate.
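As a quick illustration (assuming that NOT NULL constraints now get their own rows in pg_constraint), a query like this should show them for a newly created table:
CREATE TABLE people (id int NOT NULL, name text);
SELECT conname, contype
FROM pg_constraint
WHERE conrelid = 'people'::regclass;
-- the NOT NULL constraint on id is expected to appear as a separate row here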
A TOAST table for pg_index
commit: b52c4fc3c
The system catalog pg_index didnʼt have a TOAST table before, since its column values were never long enough to need one. Now pg_index has a TOAST table:
SELECT reltoastrelid::regclass
FROM pg_class
WHERE oid = 'pg_index'::regclass;
reltoastrelid
------------------------
pg_toast.pg_toast_2610
(1 row)
The change will be useful primarily for expression indexes, where the expression can be quite long.
COPY ... FROM and file_fdw: rejected row limit
commit: 4ac2a9bec, a39297ec0, 6c8f67032
Now you can not only ignore format conversion errors, but also use the new REJECT_LIMIT option to specify how many erroneous rows are allowed when running COPY.
CREATE TABLE t (id int PRIMARY KEY);
COPY t FROM STDIN (on_error 'ignore', log_verbosity verbose, reject_limit 1);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1
>> one
>> two
>> 3
>> \.
NOTICE: skipping row due to data type incompatibility at line 2 for column "id": "one"
NOTICE: skipping row due to data type incompatibility at line 3 for column "id": "two"
ERROR: skipped more than REJECT_LIMIT (1) rows due to data type incompatibility
CONTEXT: COPY t, line 3, column id: "two"
Exceeding the limit results in an error.
The third commit is a similar patch for file_fdw. In the previous article, I talked about how the extension learned to handle format conversion errors. Now you can specify a reject_limit for such foreign tables as well.
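A sketch of what this might look like (the file path and names here are hypothetical, and the options are assumed to be accepted at the foreign table level, like the other COPY-derived file_fdw options):
CREATE EXTENSION file_fdw;
CREATE SERVER import_srv FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE t_ext (id int)
SERVER import_srv
OPTIONS (filename '/tmp/t.csv', format 'csv',
         on_error 'ignore', reject_limit '2');
-- querying t_ext should fail once more than two rows are rejected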
LIKE support with nondeterministic collations
commit: 85b7efa1c
With nondeterministic collations, strings can differ byte-wise but still be considered equal. The following example defines a case-insensitive collation. Under this rule, strings that differ only in letter case are equal:
CREATE COLLATION ignore_case
(provider = icu, locale = 'und-u-ks-level2', deterministic = false);
SELECT 'postgres' = 'POSTGRES' COLLATE "ignore_case" AS "ignore_case";
ignore_case
-------------
t
(1 row)
However, the LIKE operator did not support such collations before. Now it does:
SELECT 'PostgreSQL' LIKE 'post%' COLLATE "ignore_case";
?column?
----------
t
(1 row)
See also:
- Waiting for PostgreSQL 18 – Support LIKE with nondeterministic collations (Hubert ʻdepeszʼ Lubaczewski)
TLS v1.3: cipher suites
commit: 45188c2ea, 3d1ef3a15, 6c66b7443
The new parameter ssl_tls13_ciphers defines the list of cipher suites to be used with TLS 1.3. For older TLS versions, the existing parameter ssl_ciphers is still used.
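A minimal sketch of how this could be configured, assuming that, like ssl_ciphers, the parameter can be applied with a configuration reload; the suite names are standard TLS 1.3 cipher suites chosen purely as examples:
ALTER SYSTEM SET ssl_tls13_ciphers = 'TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256';
SELECT pg_reload_conf();
-- ssl_ciphers continues to control the suites for TLS 1.2 and below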
Thatʼs all for now. Next: the best commits of the January CommitFest.