•   PostgreSQL   •   By Pavel Luzanov

PostgreSQL 17: part 3 or CommitFest 2023-11

The November commitfest is ripe with new interesting features! Without further ado, let’s proceed with the review.

If you missed our July and September commitfest reviews, you can check them out here: 2023-07, 2023-09.

ON LOGIN trigger

commit: e83d1b0c

The ability to put event an trigger on a database login event is coming to PostgreSQL.

As usual, the trigger is created in two steps. First, the trigger function:

CREATE FUNCTION check_login() RETURNS event_trigger
AS $$
    IF session_user = 'postgres' THEN RETURN; END IF;

    IF to_char(current_date, 'DY') IN ('SAT','SUN')
        RAISE 'Have a nice weekend, see you on Monday!';
    END IF;
$$ LANGUAGE plpgsql;

Then the trigger itself:

    EXECUTE FUNCTION check_login();

Now, to everyone’s delight, users will not bother the administrator on weekends. :)

$ psql -U alice -d postgres

psql: error: connection to server on socket "/tmp/.s.PGSQL.5401" failed: FATAL:  Have a nice weekend, see you on Monday!
CONTEXT:  PL/pgSQL function check_login() line 6 at RAISE

See also:

Waiting for PostgreSQL 17 – Add support event triggers on authenticated login – select * from depesz;

Event triggers for REINDEX

commit: f21848de

Event triggers now work with the REINDEX command, allowing you to create triggers for ddl_command_start and ddl_command_end events.

ALTER OPERATOR: commutator, negator, hashes, merges

commit: 2b5154be

In the ALTER OPERATOR command, you can now specify a commutator and a negator for the operator being defined, if they were not specified in the CREATE OPERATOR already. You can also add support for hash join and merge join operations (HASHES and MERGES)

pg_dump --filter=dump.txt

commit: a5cf808b

The new parameter --filter allows the user to specify a file name with a list of objects to include or exclude from the dump:

$ cat dump.txt

include table      bookings
exclude table_data bookings

$ pg_dump -d demo --filter=dump.txt |grep -v -E '^SET|^SELECT|^--|^$'

CREATE TABLE bookings.bookings (
    book_ref character(6) NOT NULL,
    book_date timestamp with time zone NOT NULL,
    total_amount numeric(10,2) NOT NULL
ALTER TABLE bookings.bookings OWNER TO postgres;
COMMENT ON TABLE bookings.bookings IS 'Bookings';
COMMENT ON COLUMN bookings.bookings.book_ref IS 'Booking number';
COMMENT ON COLUMN bookings.bookings.book_date IS 'Booking date';
COMMENT ON COLUMN bookings.bookings.total_amount IS 'Total booking cost';
ALTER TABLE ONLY bookings.bookings
    ADD CONSTRAINT bookings_pkey PRIMARY KEY (book_ref);

The exact file syntax is described in the documentation.

This parameter may come in handy when the list of database objects is so long that it does not fit into the pg_dump command’s character limit.

--filter has also been added to pg_dumpall and pg_restore.

psql: displaying default privileges

commit: d1379ebf

psql commands were never a good way to view default privileges. We will use schemas in this example, although the same logic applies to any type of database object.


When a schema is created, its owner gets both its privileges, USAGE and CREATE, by default. However, if you try to check the privileges using the \dn+ command, it will return NULL, since the privileges aren’t recorded into pg_namespace.nspacl.

16=# SELECT nspacl IS NULL
FROM pg_namespace
WHERE oid = 's'::regnamespace;


16=# \pset null '(null)'
16=# \dn+ s

                  List of schemas
 Name |  Owner   | Access privileges | Description
 s    | postgres |                   |

Note how even if you set \pset null, it does not affect the NULL values being displayed. When viewing system catalog objects, psql commands used to ignore this setting.

Running any GRANT or REVOKE command will have the privileges recorded into the system catalog explicitly. The following commands grant and revoke privileges for the schema. Nothing changes as a result, except that the default privileges now appear in the table:


16=# \dn+ s

                   List of schemas
 Name |  Owner   |  Access privileges   | Description
 s    | postgres | postgres=UC/postgres |

The owner may revoke their own privileges (and grant them back again later):

16=# REVOKE ALL ON SCHEMA s FROM postgres;

At the moment, pg_namespace.nspacl is an empty aclitem[] array. An empty array does not equal NULL, but how to distinguish between the two in the command output?

16=# \dn+ s

                  List of schemas
 Name |  Owner   | Access privileges | Description
 s    | postgres |                   |

The only solution used to be to query pg_namespace.

In PostgreSQL 17 this is no longer the case. The \pset null setting now properly affects NULL values:


17=# \pset null '(null)'
17=# \dn+ s

                  List of schemas
 Name |  Owner   | Access privileges | Description
 s    | postgres | (null)            | (null)

And the absence of privileges (an empty array) is displayed with a special value (none):

17=# REVOKE ALL ON SCHEMA s FROM postgres;
17=# \dn+ s

                  List of schemas
 Name |  Owner   | Access privileges | Description
 s    | postgres | (none)            | (null)

pg_stat_statements: track statement entry timestamps and reset min/max statistics

commit: dc9f8a79

pg_stat_statements now has a new column stats_since, which records the time when collecting statistics for each operator was initiated. In addition to that, you can now reset the statistics in the min/max* columns for specific operators by calling pg_stat_statements with the parameter minmax_only. The reset time for the statistics is recorded in the minmax_stats_since column.

These new features will come in handy for monitoring systems that rely on sampling information from pg_stat_statements. In particular, they may skip collecting all the statistics before starting a snapshot.

pg_stat_checkpointer: checkpointer process statistics

commit: 96f05261, 74604a37

Buffer cache changes can be written to disk by three types of processes: background writer, checkpointer, and user backend processes. For the longest time, all the relevant statistics have been tracked in a single pg_stat_bgwriter view.

The new patch trims the number of columns in pg_stat_bgwriter down considerably. All checkpointer statistics have been moved into a new view pg_stat_checkpointer (first commit).

At the same time, the buffers_backend and buffers_backend_fsync columns were removed from pg_stat_bgwriter, since more accurate and detailed information about the backend processes can now be found in the pg_stat_io view that was introduced in PostgreSQL 16 (second commit).

pg_stats: statistics for range type columns

commit: bc3c8db8

Statistics for range type columns have traditionally been collected and stored in the pg_statistic table. However, this information was not displayed in the pg_stats view.

This patch fixes that by adding three new columns to pg_stats: range_length_histogram, range_empty_frac, and range_bounds_histogram.

Planner: exclusion of unnecessary table self-joins

commit: d3d55ce5

In a poorly composed query, a table may be joined with itself for any reason, as the SQL syntax allows it. Such queries often come from all sorts of ORMs, and the burden of optimisation often falls onto those who are powerless to change the way the queries are structured in the first place.

In PostgreSQL 17, the planner now can catch these unnecessary joins and exclude them from the query plan. The following example demonstrates an unnecessary semijoin of the table bookings with itself.

17=# EXPLAIN (costs off)
    SELECT book_ref FROM bookings
FROM bookings
WHERE book_ref IN (SELECT book_ref FROM b);

            QUERY PLAN            
 Seq Scan on bookings
   Filter: (book_ref IS NOT NULL)

The planner is justified in deciding that the table should only be scanned once.

This optimization is toggled by the parameter enable_self_join_removal. Switching it off leads to the behavior observed in the older versions:

17=# SET enable_self_join_removal = off;

17=# EXPLAIN (costs off)
    SELECT book_ref FROM bookings
FROM bookings
WHERE book_ref IN (SELECT book_ref FROM b);

                       QUERY PLAN                       
 Hash Join
   Hash Cond: (bookings.book_ref = bookings_1.book_ref)
   ->  Seq Scan on bookings
   ->  Hash
         ->  Seq Scan on bookings bookings_1

Planner: materialized CTE statistics

commit: f7816aec

Let’s try the same example, but with CTE materialization. This is how it plays out in PostgreSQL 16 (jit off):

16=# EXPLAIN (analyze,timing off)
    SELECT book_ref FROM bookings
FROM bookings
WHERE book_ref IN (SELECT book_ref FROM b);

                                                    QUERY PLAN                                                     
 Nested Loop  (cost=82058.51..83469.66 rows=1055555 width=21) (actual rows=2111110 loops=1)
   CTE b
     ->  Seq Scan on bookings bookings_1  (cost=0.00..34558.10 rows=2111110 width=7) (actual rows=2111110 loops=1)
   ->  HashAggregate  (cost=47499.98..47501.98 rows=200 width=28) (actual rows=2111110 loops=1)
         Group Key: b.book_ref
         Batches: 141  Memory Usage: 11113kB  Disk Usage: 57936kB
         ->  CTE Scan on b  (cost=0.00..42222.20 rows=2111110 width=28) (actual rows=2111110 loops=1)
   ->  Index Scan using bookings_pkey on bookings  (cost=0.43..8.37 rows=1 width=21) (actual rows=1 loops=2111110)
         Index Cond: (book_ref = b.book_ref)
 Planning Time: 0.237 ms
 Execution Time: 10847.689 ms                                                    

The planner has decided to group the CTE by book_ref before joining with the outer query (the HashAggregate node). But book_ref is the primary key of the bookings table, so the number of rows after grouping will remain at ~2 million, not at the estimated 200 rows. As a result, the planner selects a wrong join type (nested loop) for the CTE and the outer query.

In PostgreSQL 17, the planner has access to some statistics about columns from materialized CTE and can use those in the outer parts of the plan. This leads to more precise cardinality estimates. This is how the same query runs now:

                                                    QUERY PLAN                                                     
 Hash Semi Join  (cost=117601.18..222997.93 rows=2111110 width=21) (actual rows=2111110 loops=1)
   Hash Cond: (bookings.book_ref = b.book_ref)
   CTE b
     ->  Seq Scan on bookings bookings_1  (cost=0.00..34558.10 rows=2111110 width=7) (actual rows=2111110 loops=1)
   ->  Seq Scan on bookings  (cost=0.00..34558.10 rows=2111110 width=21) (actual rows=2111110 loops=1)
   ->  Hash  (cost=42222.20..42222.20 rows=2111110 width=28) (actual rows=2111110 loops=1)
         Buckets: 131072  Batches: 32  Memory Usage: 3529kB
         ->  CTE Scan on b  (cost=0.00..42222.20 rows=2111110 width=28) (actual rows=2111110 loops=1)
 Planning Time: 0.127 ms
 Execution Time: 1556.576 ms

With the unnecessary grouping gone and using hash join for the CTE and the outer query, the execution is about 7 times faster.

Planner: accessing a table with multiple clauses

commit: 5d8aa8bc

When a table is queried with multiple restriction clauses, should it always be queried in the first place? This patch introduces a small optimisation along this line of thinking.

Consider this query in PostgreSQL 16.

16=# EXPLAIN (costs off, analyze, timing off, summary off)
FROM tickets
WHERE ticket_no = '0005432' AND ticket_no = '0005000';

                           QUERY PLAN                            
 Result (actual rows=0 loops=1)
   One-Time Filter: false
   ->  Index Scan using tickets_pkey on tickets (never executed)
         Index Cond: (ticket_no = '0005432'::bpchar)

Two clauses for the tickets table contradict each other, but the planner does not immediately figure this out and proceeds to select a table access method. It concludes at a later stage that there is no need to access the table after all, but the resources spent on planning end up wasted.

Now, here is the plan for the same query in PostgreSQL 17:

           QUERY PLAN           
 Result (actual rows=0 loops=1)
   One-Time Filter: false

Both the clauses are analyzed in advance. The plan not only becomes cheaper to build, but looks much neater as well.

Index range scan optimization

commit: e0b1ee17

When a range scan is performed using a B-tree index, each scanned index page is searched for values within the given range.

Previously, PostgreSQL used to check if every value in an index page is within the range. With the new patch, it first checks the last value in a page: if the last value is within the range, so must be all other values. The more index pages are there to scan and the more expensive the comparison operator, the greater the optimization effect.

In the following example, more than 40 thousand index pages are scanned. This is the query in PostgreSQL 16:

16=# EXPLAIN (analyze, buffers, costs off, timing off)
FROM tickets
WHERE ticket_no > '0005432' AND ticket_no < '0005434';

                                     QUERY PLAN                                      
 Index Scan using tickets_pkey on tickets (actual rows=1469571 loops=1)
   Index Cond: ((ticket_no > '0005432'::bpchar) AND (ticket_no < '0005434'::bpchar))
   Buffers: shared hit=10675 read=30185
   Buffers: shared read=4
 Planning Time: 0.214 ms
 Execution Time: 683.801 ms

The same query in PostgreSQL 17 executes much faster:

                                     QUERY PLAN                                      
 Index Scan using tickets_pkey on tickets (actual rows=1469571 loops=1)
   Index Cond: ((ticket_no > '0005432'::bpchar) AND (ticket_no < '0005434'::bpchar))
   Buffers: shared hit=10690 read=30170
   Buffers: shared read=4
 Planning Time: 0.268 ms
 Execution Time: 237.177 ms

dblink, postgres_fdw: detailed wait events

commit: c789f0f6, d61f2538

Extensions dblink, postgres_fdw first started using the new interface for creating wait events. The new wait events are explained in each extension’s documentation .

Logical replication: migration of replication slots during publisher upgrade

commit: 29d0a77f

Upgrading the publisher with pg_upgrade creates an issue with logical replication. Publications migrate to the new version, but replication slots do not. This forces subscribers to synchronize data again when the publisher is upgraded.

This patch makes pg_upgrade migrate the replication slots to the new server, so the subscribers only have to adjust the connection string and then carry on collecting updates.

Replication slot use log

commit: 7c3fb505

In addition to logging replication commands, the parameter log_replication_commands now also enables logging of wal sender processes acquiring and releasing replication slots.

Analyzing these log messages may show how long replication slots remain idle, which in turn may help hunt down issues with consumers (such as replicas or logical replication subscribers).

Let’s enable log_replication_commands and make a backup using pg_basebackup. There are new messages in the log:

LOG:  acquired physical replication slot "pg_basebackup_339820"
LOG:  released physical replication slot "pg_basebackup_339820"

Unicode: new information functions

commit: a02b37fc

The function unicode_assigned returns True if all every character in a string has a Unicode code point assigned:

SELECT unicode_assigned('Hello World!');


The other two functions return the Unicode version for PostgreSQL and ICU:

SELECT unicode_version(), icu_unicode_version();

 unicode_version | icu_unicode_version
 15.1            | 14.0

New function: xmltext

commit: 526fe0d7

xmltext is a standard SQL function. It converts an input string into an XML value, properly escaping any special characters:

SELECT xmltext('<Hello & World>');

 <Привет & Мир>

AT LOCAL support

commit: 97957fdb

The construction AT TIME ZONE is used to explicitly specify a time zone when converting values. The SQL standard has an abbreviation AT LOCAL to indicate the current time zone (defined by the timezone parameter value). Now AT LOCAL can be used in PostgreSQL. The following two expressions for the Europe/Moscow timezone are now equivalent:

SELECT  now() AT TIME ZONE 'Europe/Moscow',
        now() AT LOCAL

-[ RECORD 1 ]------------------------
timezone | 2023-12-18 12:57:29.612578
timezone | 2023-12-18 12:57:29.612578

Infinite intervals

commit: 519fc1bd

The type interval now works with infinite values:

SELECT 'infinity'::interval, '-infinity'::interval;

 interval | interval  
 infinity | -infinity

This allows for some new arithmetic operations:

SELECT now() + 'infinity'::interval,
       now() - 'infinity'::interval;

 ?column? | ?column?  
 infinity | -infinity

ALTER SYSTEM with unrecognized custom parameters

commit: 2d870b4a

The ALTER SYSTEM command now can write custom parameters to postgresql.auto.conf:

=# ALTER SYSTEM SET myapp.today = '2023-12-06';

By default, only the superuser can do this, but the privileges for specific parameters can be granted with the GRANT ... ON PARAMETER command.

In fact, ALTER SYSTEM has been capable of working with custom parameters even in previous versions, but they had to be declared in-session with the SET command and stored in the hash table in memory. This strange behavior was declared a mistake and is now fixed.

Again, you can still set custom parameters in the main configuration file postgresql.conf.

Building the server from source

commit: 721856ff

Tar archives with source code will no longer contain files pre-generated with Flex, Bison, and perl utilities. There will also be no generated documentation files and man pages. Therefore, the build from the source code obtained from the Downloads section will be identical to the build from the git repository source code. This means that in any case, flex, bison and perl will be required.

This was done primarily for the sake of the assembly system meson, because it can’t compile a server from tar archives as they exist now.


That’s all I’ve got in 2023. Looking forward to the next PostgreSQL 17 commitfest in January!

← Back to all articles

Pavel Luzanov