
PostgreSQL 18: part 3 or CommitFest 2024-11

We continue the series of articles about new patches coming to PostgreSQL 18, this one covering the news of the November CommitFest.

If you missed the previous reviews, you can check them out here: 2024-07, 2024-09.

initdb: checksum calculation enabled by default

commit: 983a588e0, 04bec894a

A simple but important change. When initializing a cluster with initdb, specifying the option --data-checksums (-k) is no longer necessary: checksum calculation is enabled by default.

If you donʼt need checksums, use the new --no-data-checksums option.
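
To check whether an existing cluster was initialized with checksums, you can query the read-only data_checksums parameter; in a cluster initialized with the new defaults it returns on:

SHOW data_checksums;
 data_checksums
----------------
 on
(1 row)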

Planner: array lookup instead of multiple similar conditions under OR

commit: d4378c000, ae4569161

In the following query, the PostgreSQL 17 planner scans the index three times, once for each of the similar conditions:

17=# EXPLAIN (costs off)
SELECT * FROM flights WHERE flight_id = 1 OR flight_id = 2 OR flight_id = 3;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Bitmap Heap Scan on flights
   Recheck Cond: ((flight_id = 1) OR (flight_id = 2) OR (flight_id = 3))
   ->  BitmapOr
         ->  Bitmap Index Scan on flights_pkey
               Index Cond: (flight_id = 1)
         ->  Bitmap Index Scan on flights_pkey
               Index Cond: (flight_id = 2)
         ->  Bitmap Index Scan on flights_pkey
               Index Cond: (flight_id = 3)
(9 rows)

The query could be rewritten more efficiently, so that a single index scan is enough:

17=# EXPLAIN (costs off)
SELECT * FROM flights WHERE flight_id = ANY (ARRAY[1,2,3]);
                       QUERY PLAN                       
--------------------------------------------------------
 Index Scan using flights_pkey on flights
   Index Cond: (flight_id = ANY ('{1,2,3}'::integer[]))
(2 rows)

This optimization requires rewriting the query, though, which is not always an option, especially for queries generated by ORMs. In PostgreSQL 18, the planner recognizes such situations and performs the transformation itself:

18=# EXPLAIN (costs off)
SELECT * FROM flights WHERE flight_id = 1 OR flight_id = 2 OR flight_id = 3;
                       QUERY PLAN                       
--------------------------------------------------------
 Index Scan using flights_pkey on flights
   Index Cond: (flight_id = ANY ('{1,2,3}'::integer[]))
(2 rows)

Planner: switching around expressions in DISTINCT clauses

commit: a8ccf4e93

Here is another case where the planner has become less demanding about how a query is written.

In PostgreSQL 17, the planner learned to rearrange expressions in GROUP BY according to their positions in the index. The ticket_flights table has an index for the composite primary key. Letʼs look at the order of the columns in the index.

\d ticket_flights_pkey
     Index "bookings.ticket_flights_pkey"
  Column   |     Type      | Key? | Definition
-----------+---------------+------+------------
 ticket_no | character(13) | yes  | ticket_no
 flight_id | integer       | yes  | flight_id
primary key, btree, for table "bookings.ticket_flights"

In the following query, the planner uses the index even though the GROUP BY columns are listed in a different order than in the index; earlier versions would have scanned the whole table.

17=# EXPLAIN (costs off)
SELECT flight_id,ticket_no          
FROM ticket_flights
GROUP BY flight_id,ticket_no; -- the columns are in the index
                            QUERY PLAN                             
-------------------------------------------------------------------
 Group
   Group Key: ticket_no, flight_id
   ->  Index Only Scan using ticket_flights_pkey on ticket_flights
(3 rows)

Following suit, the new patch teaches the planner to do the same for DISTINCT, including DISTINCT ON. Here it is in PostgreSQL 18:

18=# EXPLAIN (costs off)
SELECT DISTINCT flight_id,ticket_no 
FROM ticket_flights;
                            QUERY PLAN                             
-------------------------------------------------------------------
 Unique
   ->  Index Only Scan using ticket_flights_pkey on ticket_flights
(2 rows)

Both optimizations are controlled by configuration parameters and are enabled by default.

\dconfig enable_*_reordering
  List of configuration parameters
         Parameter          | Value
----------------------------+-------
 enable_distinct_reordering | on
 enable_group_by_reordering | on
(2 rows)
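
If you need to compare plans with and without these transformations, either parameter can be turned off for the current session, for example:

SET enable_distinct_reordering = off;
-- run EXPLAIN here to see the plan without the reordering
RESET enable_distinct_reordering;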

GROUPING SETS: HAVING -> WHERE

commit: 67a54b9e8

Note the Filter line in the plan below. It sits under the Seq Scan node, while it used to be under HashAggregate. This means that the HAVING condition is checked when scanning table rows, that is, before aggregation.

EXPLAIN (costs off)
SELECT aircraft_code, status, COUNT(*)
FROM flights
GROUP BY GROUPING SETS ((aircraft_code, status), (status))
HAVING status = 'Arrived';
                     QUERY PLAN                     
----------------------------------------------------
 HashAggregate
   Hash Key: status, aircraft_code
   Hash Key: status
   ->  Seq Scan on flights
         Filter: ((status)::text = 'Arrived'::text)
(5 rows)

Of course, the query could be rewritten and the condition moved from HAVING to WHERE. But it is great to see the planner learning to find more efficient solutions on its own.
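
For comparison, the manual rewrite that used to be required looks like this; since status is part of both grouping sets, the two forms return the same result:

SELECT aircraft_code, status, COUNT(*)
FROM flights
WHERE status = 'Arrived'
GROUP BY GROUPING SETS ((aircraft_code, status), (status));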

Data type cache entry invalidation and temporary tables

commit: cc5ef90ed, d0f020037, b85a9d046

Temporary tables should be used very carefully. When a temporary table is created, the information about the table and related entities, such as data types, is stored in the system catalog. The danger here is not only possible bloating of system tables, but also the overhead of keeping the system catalog cache up to date in each client backend process.

In the example below, two anonymous blocks run sequentially in the same session. The first block creates twenty thousand temporary tables. The tricky construction inside VALUES references the composite type just created for the temporary table, so that it ends up in the cache. The second block drops the temporary tables. The data types are removed together with their tables, and the related system catalog cache entries are invalidated. This cache entry invalidation is what the patch optimizes.

\timing on
DO $$BEGIN
  FOR i IN 1 .. 20_000 LOOP
    EXECUTE format('CREATE TEMP TABLE t%s (id int)', i);
    EXECUTE format('INSERT INTO t%s VALUES ( (ROW(0)::t%s).id)', i, i);
  END LOOP;
END; $$;
DO $$BEGIN
  FOR i IN 1 .. 20_000 LOOP
    EXECUTE format('DROP TABLE t%s', i);
  END LOOP;
END; $$;

The execution times for each block in PostgreSQL 17:


Time: 12251,384 ms (00:12,251)
Time: 22494,163 ms (00:22,494)

And in PostgreSQL 18:


Time: 2872,074 ms (00:02,872)
Time: 1495,051 ms (00:01,495)

The difference is plain to see, especially for the second block.

Planner: incremental sorting during Merge Join

commit: 828e94c9d

As Tomas Vondra aptly noted, incremental sorting is added incrementally.

Now, during a Merge Join, the outer set (the one higher up in the plan) can be sorted incrementally, provided that it is already partially sorted.

An example from the patch description:

CREATE TABLE t (a int, b int);
INSERT INTO t SELECT random(1,100), random(1,100) FROM generate_series(1,100000);
CREATE INDEX ON t (a);
ANALYZE t;

EXPLAIN (costs off)
SELECT *
FROM (SELECT * FROM t ORDER BY a) t1
     JOIN t t2 ON t1.a = t2.a AND t1.b = t2.b
ORDER BY t1.a, t1.b;
                  QUERY PLAN                   
-----------------------------------------------
 Merge Join
   Merge Cond: ((t.a = t2.a) AND (t.b = t2.b))
   ->  Incremental Sort
         Sort Key: t.a, t.b
         Presorted Key: t.a
         ->  Index Scan using t_a_idx on t
   ->  Sort
         Sort Key: t2.a, t2.b
         ->  Seq Scan on t t2
(9 rows)

Previously, a Sort node was always used for the outer set, so it was sorted in full every time, which can be less efficient:

                  QUERY PLAN                   
-----------------------------------------------
 Merge Join
   Merge Cond: ((t.a = t2.a) AND (t.b = t2.b))
   ->  Sort
         Sort Key: t.a, t.b
         ->  Index Scan using t_a_idx on t
   ->  Materialize
         ->  Sort
               Sort Key: t2.a, t2.b
               ->  Seq Scan on t t2
(9 rows)

New function array_reverse

commit: 49d6c7d8d

The name of the function speaks for itself: it returns the elements of the array in reverse order.

SELECT array_reverse(string_to_array('PostgreSQL',NULL));
     array_reverse     
-----------------------
 {L,Q,S,e,r,g,t,s,o,P}
(1 row)


Functions min and max for the type bytea

commit: 2d24fd942

The functions min and max are now implemented for the bytea data type.

WITH t AS (
    SELECT book_ref::bytea AS c FROM bookings
)
SELECT min(c), max(c) FROM t;
      min       |      max       
----------------+----------------
 \x303030303034 | \x464646464639
(1 row)

Parallel worker usage statistics

commit: cf54a2c00, e7a9496de

The statistics views pg_stat_statements (the first commit) and pg_stat_database (the second commit) get two new columns: parallel_workers_to_launch and parallel_workers_launched. They show how many parallel workers were planned and how many were actually launched.

Letʼs check it out, using the default parallelization settings:

\dconfig max_parallel_workers*|max_worker*
    List of configuration parameters
            Parameter            | Value
---------------------------------+-------
 max_parallel_workers            | 8
 max_parallel_workers_per_gather | 2
 max_worker_processes            | 8
(3 rows)

Clear the statistics before testing:

SELECT pg_stat_statements_reset();
SELECT pg_stat_reset();

Using pgbench, start 10 client sessions, each running queries that will use parallel plans:

$ pgbench --client=10 --time=5 --file=test.sql

The queries from test.sql aggregate fairly large tables, forcing the planner to choose a parallel plan in each case:

$ cat test.sql
SELECT count(*) FROM bookings;
SELECT count(*) FROM tickets;

The pool of parallel workers is not large enough to satisfy the number of workers planned across all 10 clients, as is evident from the statistics at the query level:

SELECT query,
       parallel_workers_to_launch AS plan,
       parallel_workers_launched AS fact
FROM pg_stat_statements
WHERE query ~ 'tickets|bookings';
             query             | plan | fact
-------------------------------+------+------
 SELECT count(*) FROM bookings |   84 |   42
 SELECT count(*) FROM tickets  |   84 |   40
(2 rows)

And at the database level:

SELECT parallel_workers_to_launch AS plan,
       parallel_workers_launched AS fact
FROM pg_stat_database
WHERE datname = current_database();
 plan | fact
------+------
  168 |   82
(1 row)

If the planned values are regularly much higher than the actual ones, it may be a hint to reconsider your parallel execution settings, as long as the hardware allows it.
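
To keep an eye on this, the two counters can be compared directly; a simple sketch at the database level, using the new columns:

SELECT parallel_workers_to_launch AS plan,
       parallel_workers_launched AS fact,
       round(100.0 * parallel_workers_launched /
             NULLIF(parallel_workers_to_launch, 0), 1) AS launched_pct
FROM pg_stat_database
WHERE datname = current_database();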

New function pg_ls_summariesdir

commit: 4e1fad378

The new function pg_ls_summariesdir will come in handy for incremental backup monitoring. It shows the contents of pg_wal/summaries, and, unlike the good old pg_ls_dir('pg_wal/summaries'), displays the file sizes and last modification times.

SELECT * FROM pg_ls_summariesdir();
                       name                       | size  |      modification      
--------------------------------------------------+-------+------------------------
 0000000100000001412CF0980000000162404D08.summary | 17164 | 2025-02-03 15:22:33+03
(1 row)

This is a good place to recall that the list of functions to display the contents of service directories is quite impressive:

\df pg_ls_*dir
                                                                       List of functions
   Schema   |          Name             | Result data type |                                    Argument data types                                     | Type
------------+---------------------------+------------------+--------------------------------------------------------------------------------------------+------
 pg_catalog | pg_ls_archive_statusdir   | SETOF record     | OUT name text, OUT size bigint, OUT modification timestamp with time zone                  | func
 pg_catalog | pg_ls_dir                 | SETOF text       | text                                                                                       | func
 pg_catalog | pg_ls_dir                 | SETOF text       | text, boolean, boolean                                                                     | func
 pg_catalog | pg_ls_logdir              | SETOF record     | OUT name text, OUT size bigint, OUT modification timestamp with time zone                  | func
 pg_catalog | pg_ls_logicalmapdir       | SETOF record     | OUT name text, OUT size bigint, OUT modification timestamp with time zone                  | func
 pg_catalog | pg_ls_logicalsnapdir      | SETOF record     | OUT name text, OUT size bigint, OUT modification timestamp with time zone                  | func
 pg_catalog | pg_ls_replslotdir         | SETOF record     | slot_name text, OUT name text, OUT size bigint, OUT modification timestamp with time zone  | func
 pg_catalog | pg_ls_summariesdir        | SETOF record     | OUT name text, OUT size bigint, OUT modification timestamp with time zone                  | func
 pg_catalog | pg_ls_tmpdir              | SETOF record     | OUT name text, OUT size bigint, OUT modification timestamp with time zone                  | func
 pg_catalog | pg_ls_tmpdir              | SETOF record     | tablespace oid, OUT name text, OUT size bigint, OUT modification timestamp with time zone  | func
 pg_catalog | pg_ls_waldir              | SETOF record     | OUT name text, OUT size bigint, OUT modification timestamp with time zone                  | func
(11 rows)

New contrib module: pg_logicalsnapinspect

commit: 7cdfeee32

The new module pg_logicalsnapinspect serves to inspect the contents of logical decoding components: for example, for debugging or research into logical replication.

It comes with just two functions for now: pg_get_logical_snapshot_meta and pg_get_logical_snapshot_info. The functions retrieve snapshot data from files in $PGDATA/pg_logical/snapshots.
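
A minimal usage sketch, assuming a logical replication slot already exists and has serialized at least one snapshot; the snapshot file name below is hypothetical, take a real one from pg_ls_logicalsnapdir:

CREATE EXTENSION pg_logicalsnapinspect;

-- which snapshot files are there?
SELECT name FROM pg_ls_logicalsnapdir();

-- inspect one of them by file name
SELECT * FROM pg_get_logical_snapshot_meta('0-40796E18.snap');
SELECT * FROM pg_get_logical_snapshot_info('0-40796E18.snap');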

Improved extension installation error messages

commit: 14e5680ee, 774171c4f, 924e03917

As an example, letʼs install the extension uom, which comes in three different versions:

SELECT name, version
FROM pg_available_extension_versions
WHERE name = 'uom';
 name | version
------+---------
 uom  | 1.0
 uom  | 1.1
 uom  | 1.2
(3 rows)

I have added an invalid command into one of the SQL installation scripts.
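
To illustrate, the offending fragment of the uom--1.1--1.2.sql upgrade script might look something like this (hypothetical contents, reconstructed from the error context shown further below):

/* ... valid upgrade statements ... */
SELECT 1/0;    -- the deliberately broken statement

An attempt to install the extension: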

17=# CREATE EXTENSION uom VERSION '1.2';
ERROR:  division by zero

An error! But in which file?

In PostgreSQL 18, the error message and context for extension installation errors are significantly improved:

18=# CREATE EXTENSION uom VERSION '1.2';
ERROR:  division by zero
CONTEXT:  SQL statement "SELECT 1/0"
extension script file "uom--1.1--1.2.sql", near line 9

NOT NULL constraints in the system catalog

commit: 14e87ffa5

This is the second attempt to start recording NOT NULL constraints in pg_constraint. The first was made in PostgreSQL 17; the patch was reverted at the time, but its description is still accurate.
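
In PostgreSQL 18, such constraints can be seen directly in the catalog, where NOT NULL constraints get the contype value 'n'. A small sketch (the constraint name is generated automatically, so it may differ on your system):

CREATE TABLE t_nn (id int NOT NULL);

SELECT conname, contype
FROM pg_constraint
WHERE conrelid = 't_nn'::regclass;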

A TOAST table for pg_index

commit: b52c4fc3c

The pg_index system catalog used to lack a TOAST table: its column values were never long enough to need one. Now pg_index has a TOAST table:

SELECT reltoastrelid::regclass
FROM pg_class
WHERE oid = 'pg_index'::regclass;
     reltoastrelid      
------------------------
 pg_toast.pg_toast_2610
(1 row)

The change will be useful primarily for expression indexes, where the expression can be quite long.

COPY... FROM and file_fdw: rejected row limit

commit: 4ac2a9bec, a39297ec0, 6c8f67032

Now you can not only ignore format conversion errors during COPY, but also use the new REJECT_LIMIT option to specify how many erroneous rows are allowed.

CREATE TABLE t (id int PRIMARY KEY);
COPY t FROM STDIN (on_error 'ignore', log_verbosity verbose, reject_limit 1);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1
>> one
>> two
>> 3
>> \.
NOTICE:  skipping row due to data type incompatibility at line 2 for column "id": "one"
NOTICE:  skipping row due to data type incompatibility at line 3 for column "id": "two"
ERROR:  skipped more than REJECT_LIMIT (1) rows due to data type incompatibility
CONTEXT:  COPY t, line 3, column id: "two"

Exceeding the limit results in an error.

The third commit is a similar patch for file_fdw. In the previous article, I talked about how the extension learned to handle format conversion errors. Now a reject_limit option can be specified for foreign tables as well.
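
A rough sketch of what that could look like; the server name and file path here are made up for the example:

CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE t_ext (id int)
SERVER csv_files
OPTIONS (filename '/tmp/ids.csv', format 'csv',
         on_error 'ignore', log_verbosity 'verbose',
         reject_limit '2');

SELECT count(*) FROM t_ext;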

LIKE support for nondeterministic collations

commit: 85b7efa1c

With nondeterministic collations, strings can consist of different bytes but still be considered equal. The following example defines a case-insensitive collation: under this rule, strings that differ only in letter case are equal:

CREATE COLLATION ignore_case
  (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
SELECT 'postgres' = 'POSTGRES' COLLATE "ignore_case" AS "ignore_case";
 ignore_case
-------------
 t
(1 row)

However, the LIKE operator did not support nondeterministic collations before. Now it does:

SELECT 'PostgreSQL' LIKE 'post%' COLLATE "ignore_case";
 ?column?
----------
 t
(1 row)


TLS v1.3: cipher suites

commit: 45188c2ea, 3d1ef3a15, 6c66b7443

The new parameter ssl_tls13_ciphers defines the list of cipher suites to be used with TLS 1.3. For older TLS versions, the existing ssl_ciphers parameter is still used.
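
For example, to restrict TLS 1.3 connections to a couple of specific suites, postgresql.conf could contain something like this; the list format follows OpenSSL naming, so pick suites according to your own security requirements:

ssl = on
ssl_tls13_ciphers = 'TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256'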


Thatʼs all for now. Next: the best commits of the January CommitFest.

Pavel Luzanov