
PostgreSQL 16: part 5 or CommitFest 2023-03

The March CommitFest concludes the acceptance of patches for PostgreSQL 16. Let’s take a look at some of the exciting new features it brings.

I hope that this review together with the previous articles in the series (2022-07, 2022-09, 2022-11, 2023-01) will give you a coherent idea of the new features of PostgreSQL 16.

As usual, the March Commitfest introduces a ton of new changes. I’ve split them into several sections for convenience.

Monitoring

Client applications

Server administration and maintenance

Localization

Security

SQL functions and commands

Performance

Logical replication

Monitoring

pg_stat_io: input/output statistics

commit: a9c70b46, ac8d53da, 8aaa04b3

The new pg_stat_io view displays input/output statistics.

To be clear, this is disk input/output as seen by PostgreSQL. Lower level caching (OS, disk controller) is not taken into account here. In addition, only buffer cache operations are currently monitored. Transferring tables and indexes to another tablespace or WAL operations may be added in the future.

Each row of the view is identified by three columns:

  • backend_type — process type, same as in pg_stat_activity.backend_type,
  • object ― object type, such as relation (permanent tables, indexes, ...) or temp relation,
  • context ― type of operation performed on the object: normal, vacuum, bulkread, bulkwrite.

The names of the operations are intuitive, and details are listed in the documentation: pg_stat_io.

Let’s have a look at one of the rows:

SELECT *
FROM pg_stat_io
WHERE backend_type = 'client backend' AND
      object = 'relation' AND
      context = 'bulkread'
\gx
-[ RECORD 1 ]+------------------------------
backend_type | client backend
object       | relation
context      | bulkread
reads        | 427497
read_time    | 752.489
writes       | 77220
write_time   | 215.93
extends      |
extend_time  |
op_bytes     | 8192
hits         | 38683
evictions    | 94210
reuses       | 330437
fsyncs       |
fsync_time   |
stats_reset  | 2023-04-29 09:13:58.798952+03

The row shows cumulative statistics for all client backend processes that performed bulk read operations on permanent relations. Such operations include sequential scanning of large tables using a ring buffer.

Displayed statistics include buffer cache hits, evictions, and ring buffer reuses. You can also see that the clients themselves had to flush most of the evicted buffers to disk (writes).

The reads, writes and extends (relation file extension) columns show the number of performed operations. To convert to memory units, multiply the values by op_bytes. Since only buffer cache operations are monitored so far, op_bytes is always equal to the page size (8 kB). To collect execution time statistics (*_time columns), the track_io_timing parameter must be enabled.

As another example, let’s look at which processes wrote buffers from the cache to disk and compare their impacts:

SELECT backend_type, SUM(writes) blocks,
       pg_size_pretty(SUM(writes*op_bytes)) size,
       round(SUM(write_time)) "time, ms"
FROM pg_stat_io
WHERE writes > 0
GROUP BY ROLLUP (backend_type)
ORDER BY blocks;
   backend_type    | blocks |  size   | time, ms
-------------------+--------+---------+----------
 background writer |  17198 | 134 MB  |      187
 checkpointer      |  30436 | 238 MB  |      139
 background worker |  76929 | 601 MB  |      213
 autovacuum worker |  88870 | 694 MB  |      528
 client backend    | 369031 | 2883 MB |     1055
                   | 582464 | 4551 MB |     2122
(6 rows)

The two main disk-writing processes, background writer and checkpointer, wrote the least. This is a clear sign that the system configuration is suboptimal: perhaps the buffer cache should be larger and the background writer configured more aggressively.

The pg_stat_bgwriter view shows similar data. The buffers_clean and buffers_checkpoint columns display how many buffers the background writer and checkpointer processes have written to disk. However, the buffers_backend column isn’t really representative. It collects data from not only client processes, but also others, including autovacuum. In addition to that, relation file extensions aren’t tracked separately (as in pg_stat_io), but rather are simply added to buffers_backend, despite the operation having nothing to do with flushing cache to disk.

See also:

Waiting for PostgreSQL 16 – Add pg_stat_io view, providing more detailed IO statistics (Hubert 'depesz' Lubaczewski)

Waiting for Postgres 16: Cumulative I/O statistics with pg_stat_io (Lukas Fittl)

Counter for new row versions moved to another page when performing an UPDATE

commit: ae4fdde1

A new column has appeared in the pg_stat_*_tables family of statistics views: n_tup_newpage_upd. Together with the existing columns n_tup_hot_upd and n_tup_upd, it helps evaluate the effectiveness of HOT optimization.

HOT optimization kicks in when an UPDATE touches only non-indexed columns. A new row version is created within the same page, and no new entries are added to the table’s indexes; that is the essence of HOT. For it to work, however, there must be enough free space in the page for the new row version. If there is not, the new row version is placed in another page and new entries are added to all the indexes. These are the updates tracked by the new n_tup_newpage_upd counter.

If the counter value is too high, you may want to consider reducing the fillfactor value for the table. By reserving more page space, you increase the chances for HOT optimization to work.

Let’s create a separate database and set up pgbench:

=# CREATE DATABASE test_fillfactor;
=# \c test_fillfactor
$ pgbench -i test_fillfactor

Run pgbench for 10 seconds and look at row change statistics for the pgbench_accounts table:

$ pgbench -T 10 test_fillfactor
SELECT relname, n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd
FROM pg_stat_all_tables
WHERE relname = 'pgbench_accounts'\gx
-[ RECORD 1 ]-----+-----------------
relname           | pgbench_accounts
n_tup_upd         | 11307
n_tup_hot_upd     | 9644
n_tup_newpage_upd | 1663

The sum of n_tup_hot_upd and n_tup_newpage_upd equals n_tup_upd. This means that during the test none of the UPDATE commands touched indexed columns, so HOT optimization was always possible. However, in about 15% of cases it did not happen because there was not enough free space in the page for the new row version.

Let’s reset the statistics, decrease the fillfactor value and try again.

ALTER TABLE pgbench_accounts SET (fillfactor = 80);
SELECT pg_stat_reset_single_table_counters('pgbench_accounts'::regclass);
$ pgbench -T 10 test_fillfactor
SELECT relname, n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd
FROM pg_stat_all_tables
WHERE relname = 'pgbench_accounts'\gx
-[ RECORD 1 ]-----+-----------------
relname           | pgbench_accounts
n_tup_upd         | 11707
n_tup_hot_upd     | 11704
n_tup_newpage_upd | 3

The result is radically different. There were only three instances where there was not enough space in the page.

Curiously, the pg_statistic system table is a candidate for decreasing fillfactor:

VACUUM ANALYZE;

SELECT relname, n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd
FROM pg_stat_all_tables
WHERE relname = 'pg_statistic'\gx
-[ RECORD 1 ]-----+-------------
relname           | pg_statistic
n_tup_upd         | 458
n_tup_hot_upd     | 182
n_tup_newpage_upd | 276

pg_buffercache: new pg_buffercache_usage_counts function

commit: f3fa3132

A pg_buffercache_summary function was added to the pg_buffercache extension during the November Commitfest.

Now another function has been added: pg_buffercache_usage_counts shows aggregated buffer cache data broken down by usage count.

SELECT * FROM pg_buffercache_usage_counts();
 usage_count | buffers | dirty | pinned
-------------+---------+-------+--------
           0 |   15791 |     0 |      0
           1 |     105 |     1 |      0
           2 |      89 |     4 |      0
           3 |      22 |     1 |      0
           4 |      32 |     3 |      0
           5 |     345 |    27 |      0
(6 rows)

The main advantage of pg_buffercache_summary and pg_buffercache_usage_counts over the pg_buffercache view is speed. The new functions do not require buffer locks, so they perform much faster.
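As a reminder, the summary function added earlier can be called like this; it returns a single row of totals, and the numbers will of course vary from server to server:

SELECT * FROM pg_buffercache_summary();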

Normalization of DDL and utility commands, continued

commit: daa8365a

This is a continuation of the query normalization effort in PostgreSQL 16.

This time, the affected commands are DECLARE, EXPLAIN, CREATE MATERIALIZED VIEW and CREATE TABLE AS. An especially notable feature is that normalization now takes into account any constants in the query text and substitutes them with parameters.

Let’s create three temporary tables with a constant in the AS SELECT statement:

SELECT format('create temp table tmp on commit drop as select %s', g.i)
FROM generate_series(1,3) as g(i)\gexec
SELECT 1
SELECT 1
SELECT 1

Now, verify that pg_stat_statements counts the three commands as a single one.

SELECT queryid, query, calls
FROM pg_stat_statements
WHERE query ILIKE 'create temp table%';
       queryid       |                       query                       | calls
---------------------+---------------------------------------------------+-------
 2417319575362882285 | create temp table tmp on commit drop as select $1 |     3
(1 row)

See also:

Postgres 16 highlight - Normalization of utilities in pg_stat_statements (Michael Paquier)

EXPLAIN (generic_plan): generic plan of a parameterized query

commit: 3c05284d

The idea behind this tool is as follows.

Queries with long execution times are recorded into the server log for analysis.

SHOW log_min_duration_statement;
 log_min_duration_statement
----------------------------
 100ms

An application uses the extended query protocol to send queries with parameters to the server. To imitate that, use the new command \bind in psql:

\bind ABCDEF
SELECT count(*) FROM tickets WHERE book_ref = $1;
 count
-------
     0
(1 row)

We don’t care about the result here, just that it took over 100 ms to execute and therefore was recorded in the server log.

$ tail -2 logfile
LOG:  duration: 172.195 ms  execute <unnamed>: SELECT count(*) FROM tickets WHERE book_ref = $1;
DETAIL:  parameters: $1 = 'ABCDEF'

Now, we can start figuring out why the query took so long to execute. To do that, we need the query plan. However, simply adding the EXPLAIN word to the query text returns an error:

EXPLAIN SELECT count(*) FROM tickets WHERE book_ref = $1;
ERROR:  there is no parameter $1
LINE 1: EXPLAIN SELECT count(*) FROM tickets WHERE book_ref = $1;
                                                              ^

The problem is that EXPLAIN cannot build plans for queries with parameters, since different parameter values can result in completely different plans.

Of course, you can manually insert the parameter value from the log and get a plan, but there could be many such queries in the log and with multiple parameters each. Automating the substitution process can be time-consuming, too.

Here is where the generic plan comes in. A query’s generic plan does not depend on the query’s parameter values. And now you can get one for your query:

EXPLAIN (generic_plan)
SELECT count(*) FROM tickets WHERE book_ref = $1;
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Aggregate  (cost=65779.07..65779.08 rows=1 width=8)
   ->  Gather  (cost=1000.00..65779.07 rows=2 width=0)
         Workers Planned: 2
         ->  Parallel Seq Scan on tickets  (cost=0.00..64778.87 rows=1 width=0)
               Filter: (book_ref = $1)

The planner wants several workers to scan the table sequentially, which means the table is large. Yet only a single row will match. The book_ref column probably lacks an index that would speed up the query.
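A plausible follow-up, assuming you are free to create indexes in this database: add an index on book_ref and request the generic plan again.

CREATE INDEX tickets_book_ref_idx ON tickets (book_ref);

EXPLAIN (generic_plan)
SELECT count(*) FROM tickets WHERE book_ref = $1;

With the index in place, the generic plan should switch from a parallel sequential scan to an index scan on tickets.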

See also:

EXPLAIN (GENERIC_PLAN): New in PostgreSQL 16 (Laurenz Albe)

auto_explain: logging the query ID

commit: 9d2d9728

According to the documentation, logging queries with the auto_explain.log_verbose parameter enabled is equivalent to EXPLAIN output with the VERBOSE option. However, the query ID was missing from the output; this has now been fixed:

SET compute_query_id = on;

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_verbose = on;
SET auto_explain.log_level = 'NOTICE';

SELECT 1;
NOTICE:  duration: 0.009 ms  plan:
Query Text: SELECT 1;
Result  (cost=0.00..0.01 rows=1 width=4)
  Output: 1
Query Identifier: -7037075969593950510
 ?column?
----------
        1
(1 row)

The change has not been backported to previous PostgreSQL versions, because it might affect server log analysis tools.

PL/pgSQL: GET DIAGNOSTICS .. PG_ROUTINE_OID

commit: d3d53f95

You can now get a routine’s OID from within its own code.

CREATE FUNCTION f (OUT func_oid oid)
AS 'BEGIN GET DIAGNOSTICS func_oid = PG_ROUTINE_OID; END;'
LANGUAGE plpgsql;

SELECT f() oid, f()::regprocedure signature;
  oid  | signature
-------+-----------
 16528 | f()

Client applications

psql: variables SHELL_ERROR and SHELL_EXIT_CODE

commit: b0d8f2d9, 31ae2aa9

Two new variables show how the last OS command completed:

16=# \! pwd
/home/pal/pg16
16=# \echo SHELL_ERROR: :SHELL_ERROR, SHELL_EXIT_CODE: :SHELL_EXIT_CODE
SHELL_ERROR: false, SHELL_EXIT_CODE: 0

The first commit adds these variables for the \! command and commands in back quotes (``). The second one adds them to \g, \o, \w and \copy.

In some cases, the variables are intentionally left unset: when calling the pager program, when calling the editor (\e), and when setting the prompt with a command in back quotes (\set PROMPT1 '%`command`').
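These variables make it easy to react to failed OS commands in psql scripts. A small sketch (the directory path is made up):

\! test -d /var/lib/postgresql/backups
\if :SHELL_ERROR
    \echo 'backup directory is missing, aborting'
    \quit
\endif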

psql: \watch and the number of repetitions

commit: 00beecfe

The \watch command repeats the query until terminated. The new patch allows you to specify the number of repetitions.

16=# SELECT 1 \watch interval=1 count=2
Fri 14 Apr 2023 02:47:28 PM MSK (every 1s)

 ?column?
----------
        1
(1 row)

Fri 14 Apr 2023 02:47:29 PM MSK (every 1s)

 ?column?
----------
        1
(1 row)

16=#

psql: \df+ does not show the source code of functions

commit: 3dfae91f

The \df+ command, among other things, outputs the source code of functions, which can take up a lot of space. This makes it difficult to view the list of functions.

In PostgreSQL 16, \df+ no longer shows the source code of functions, with the exception of internal ones: their “source code” is just the name of a C function and always fits on one line.

You can still view function source codes with other commands: \sf and \ef.

pg_dump: support for LZ4 and zstd compression methods

commit: 0da243fe, 84adc8e2

Before PostgreSQL 16, the only compression method supported by pg_dump was gzip. The two commits add support for LZ4 and zstd, respectively.

Backup compression is available for all formats except tar.
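For example, the new methods can be requested like this (the paths and database name here are illustrative):

$ pg_dump -Fd -Z lz4 -f backup_dir demo
$ pg_dump -Fc --compress=zstd:5 -f demo.dump demo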

See also:

LZ4 and ZSTD pg_dump compression in PostgreSQL 16 (Pavlo Golub)

pg_dump and partitioned tables

commit: a563c24c

Dumping individual tables is done with the --table parameter. But if you want to dump a partitioned table with all its partitions, you either need to list every partition explicitly or use a wildcard pattern that matches all of them. Neither option is ideal: the set of partitions may change over time, and a pattern may match partition names poorly or accidentally pick up unrelated tables.

In PostgreSQL 16, the new pg_dump parameter --table-and-children allows you to leave the old approach behind. The parameter works not only for partitioned tables, but also for table inheritance.

For table exclusion, similar parameters are added: --exclude-table-and-children and --exclude-table-data-and-children.
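A quick illustration with a hypothetical partitioned table measurements:

$ pg_dump -Fc --table-and-children=measurements -f measurements.dump demo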

pg_verifybackup --progress

commit: d07c2948

The pg_basebackup utility with the --progress parameter shows the backup creation progress.

The same parameter has been added to the backup verification utility pg_verifybackup. During verification, the status is updated every second, and at completion the output looks like this:

$ pg_verifybackup --progress /home/pal/pg16/backup/
2723991/2723991 kB (100%) verified
backup successfully verified

libpq: balancing connections

commit: 7f5b1981

The ability to specify multiple PostgreSQL instances in a connection string is not new. You can list several hosts in the host, hostaddr and port parameters, and the client will try to connect to them in the specified order.

psql "host=replica1,replica2,replica3"

However, with a large number of connections, the first replica in the list will take more load than the others, while the last one may sit idle.

To distribute connections between replicas evenly, you can shuffle the list of replicas on the application side when forming the connection string. Or you can use the new connection parameter load_balance_hosts:

psql "host=replica1,replica2,replica3 load_balance_hosts=random"

load_balance_hosts=random indicates that the list of nodes will be shuffled before attempting to connect.

Server administration and maintenance

initdb: setting configuration parameters during cluster initialization

commit: 3e51b278

The new initdb option -c (or --set) sets the value of any configuration parameter for the new cluster.

For the experiment, let’s create a second server instance on a free port, with the wal_level parameter set to logical:

$ initdb -k -U postgres "-c wal_level=logical" "-c port=5402" -D ~/pg16/data2
$ pg_ctl start -D ~/pg16/data2 -l logfile2
$ psql -p 5402 -c "SHOW wal_level"
 wal_level
-----------
 logical

The parameters specified during initialization are stored at the end of the postgresql.conf file:

$ tail -2 ~/pg16/data2/postgresql.conf
 wal_level = logical
 port = 5402

Autovacuum: balancing I/O impact on the fly

commit: 7d71d3dd

When autovacuum takes too long to process a particularly large table, it can now be sped up on the fly. This requires modifying the autovacuum_vacuum_cost_limit and/or autovacuum_vacuum_cost_delay parameters and reloading the configuration.

In previous PostgreSQL versions, autovacuum workers re-read the configuration only after finishing the current table. With the new patch, it is re-read before every cost-based delay (autovacuum_vacuum_cost_delay).
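A sketch of speeding up a long-running autovacuum on the fly (the values are only an example):

ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 10000;
SELECT pg_reload_conf();

The running autovacuum workers should pick up the new values at their next cost-based delay rather than after finishing the current table.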

See also:

Updating the Cost Limit On The Fly (Robert Haas)

Managing the size of shared memory for vacuuming and analysis

commit: 1cbbee03, 4830f102

To protect the buffer cache from bulk eviction, table vacuuming uses only a small fraction of it, organized as a ring buffer. The ring buffer is relatively tiny, only 256 kB. Still, when a large table needs vacuuming, performance of the whole server may suffer: for pages of the small ring buffer to be reused, dirty pages and WAL records have to be flushed to disk constantly, and frequent WAL synchronization may become a bottleneck for the I/O system.

The first commit introduces a new parameter vacuum_buffer_usage_limit, which can be used to control the size of the ring buffer. The default value remains the same at 256 kB, but the parameter can accept values from 12 kB to 16 GB (but no more than ⅛ of the buffer cache can be used).

When running VACUUM and ANALYZE manually, you can specify the size with the BUFFER_USAGE_LIMIT option. If you set it to 0, vacuuming will be allowed to use the whole buffer cache:

VACUUM (ANALYZE, BUFFER_USAGE_LIMIT 0);

SELECT buffers_used, buffers_unused FROM pg_buffercache_summary();
 buffers_used | buffers_unused
--------------+----------------
        16384 |              0

This may be useful after uploading a large amount of new data or after bulk changes to existing tables.

When vacuuming aggressively to prevent transaction ID wraparound (the second commit), the process must complete as quickly as possible, so the ring buffer limitation is disabled by default.

The ring buffer operation is now explained in the glossary under the term "Buffer Access Strategy".

VACUUM for TOAST tables only

commit: 4211fbd8

Vacuuming just a TOAST table requires the USAGE privilege for the pg_toast schema. You also have to know the name of the relevant TOAST table.

SELECT reltoastrelid::regclass FROM pg_class WHERE oid = 'tickets'::regclass;
      reltoastrelid      
-------------------------
 pg_toast.pg_toast_16443
VACUUM pg_toast.pg_toast_16443;

The new PROCESS_MAIN option lets you vacuum just the TOAST table without looking up its name.

VACUUM (PROCESS_MAIN false) tickets;

In the same vein, the vacuumdb tool has received a new parameter --no-process-main.

The vacuum_defer_cleanup_age parameter has been removed

commit: be504a3e, 1118cd37

Setting vacuum_defer_cleanup_age to a value higher than the current transaction ID may result in data corruption. See the first commit for details.

Developing comprehensive tests and backporting them to previous releases is a difficult task that may simply be unnecessary: the vacuum_defer_cleanup_age parameter is probably rarely used, since hot_standby_feedback and replication slots, which appeared later, are much more convenient alternatives.

Therefore, the second commit removes the vacuum_defer_cleanup_age parameter altogether. It wasn’t a very convenient tool in the first place: too high values would result in bloating, and too low values wouldn’t do much to avoid conflicts while applying WAL records on replicas.

Even if you don’t plan on upgrading to PostgreSQL 16 any time soon, it would still be wise to take stock of any vacuum_defer_cleanup_age settings in your configuration files and prepare to get rid of them.

pg_walinspect: interpretation of the end_lsn parameter

commit: 5c1b6628

The interpretation of the end_lsn parameter has been changed for pg_get_wal_records_info, pg_get_wal_stats and pg_get_wal_block_info. If no value is specified, the functions return data from start_lsn till the end of WAL.

This made it possible to remove the duplicate functions pg_get_wal_records_info_till_end_of_wal and pg_get_wal_stats_till_end_of_wal.

pg_walinspect: pg_get_wal_fpi_info → pg_get_wal_block_info

commit: 9ecb134a

The previous article mentioned the new pg_get_wal_fpi_info function of the pg_walinspect extension. The function has been significantly redesigned and renamed to pg_get_wal_block_info.

Localization

ICU: UNICODE collation

commit: 0d21d4b9

In accordance with the SQL standard, the UNICODE collation for the ICU provider has been implemented in PostgreSQL 16.

\dO unicode
                                     List of collations
   Schema   |  Name   | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic?
------------+---------+----------+---------+-------+------------+-----------+----------------
 pg_catalog | unicode | icu      |         |       | und        |           | yes

The collation uses the default sorting algorithm as described in the UNICODE technical standard.
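A minimal usage example; the exact ordering depends on the ICU library the server was built with:

SELECT c
FROM (VALUES ('bank'), ('Bänk'), ('BANK')) AS t(c)
ORDER BY c COLLATE unicode;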

ICU: canonicalization of locales

commit: ea1db8ae, 1671f990

Locale names for the ICU provider are now converted to canonical form in accordance with BCP 47. As a side effect, this prevents the creation of collations with invalid locales.

CREATE COLLATION test_icu(provider=icu, locale='invalid_locale');
NOTICE:  using standard form "invalid-locale" for locale "invalid_locale"
ERROR:  ICU locale "invalid-locale" has unknown language "invalid"
HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.

To disable locale validation, you can use the new icu_validation_level parameter (second commit).

ICU: custom rules for tailoring the sort order

commit: 30a53b79

The new rules option of the CREATE COLLATION command allows you to customize the sorting algorithm for specific needs.

Let’s create a collation where the seasons of the year are sorted starting from winter:

CREATE COLLATION seasons (
    provider = icu, locale = 'en', rules = '& W < Sp < Su < A'
);

WITH seasons(name) AS (
    VALUES ('Summer'), ('Winter'), ('Spring'), ('Autumn')
)
SELECT * FROM seasons ORDER BY name COLLATE seasons;
 name  
-------
 Winter
 Spring
 Summer
 Autumn
(4 rows)

Custom rules can also be specified in the CREATE DATABASE command and when running createdb and initdb.

See also:

ICU Documentation: Collation Customization

How collation works (Peter Eisentraut)

How collation of punctuation and whitespace works (Peter Eisentraut)

Security

libpq: new parameter require_auth

commit: 3a465cc6

The pg_hba.conf file is configured as follows:

SELECT rule_number rule, type, database, user_name, auth_method
FROM pg_hba_file_rules
WHERE database != '{replication}';
 rule | type  | database | user_name |  auth_method  
------+-------+----------+-----------+---------------
    1 | local | {all}    | {all}     | trust
    2 | host  | {all}    | {all}     | scram-sha-256

The new libpq parameter require_auth allows you to specify the authentication method that the server should use. You can list several methods to allow the server to use any of them:

$ psql 'host=localhost require_auth=md5,scram-sha-256' -c 'SELECT system_user'
Password for user postgres:
      system_user       
------------------------
 scram-sha-256:postgres
(1 row)

In this example, the second rule with the scram-sha-256 method should be used for the TCP/IP connection. Since this method, along with md5, is listed in the client’s require_auth parameter, the connection succeeds. (You can read more about the new system_user function in PostgreSQL 16 in the September Commitfest article.)

If the authentication method in require_auth does not match the method from pg_hba.conf, then you will not be able to connect:

$ psql 'host=localhost require_auth=md5' -c 'SELECT system_user'
psql: error: connection to server at "localhost" (127.0.0.1), port 5416 failed: auth method "md5" requirement failed: server requested SASL authentication

You can forbid specific methods from being used by listing them with a preceding exclamation mark.

$ psql 'host=localhost require_auth=!md5,!password' -c 'SELECT system_user'
Password for user postgres:
      system_user       
------------------------
 scram-sha-256:postgres
(1 row)

The special value none is used when a connection without authentication is required. In our case, this will work for the local connection under the first rule in pg_hba.conf:

$ psql 'require_auth=none' -c 'SELECT system_user'
 system_user
-------------
 
(1 row)

Instead of the require_auth parameter, you can use the PGREQUIREAUTH environment variable.

See also:

Postgres 16 highlight - require_auth for libpq (Michael Paquier)

scram_iterations: iteration counter for password encryption using SCRAM-SHA-256

commit: b5777430

In previous versions, the counter value was hard-coded as 4096.

Now, more resilient password verifiers can be generated by raising the iteration count via the new scram_iterations parameter. RFC 7677 recommends an iteration count of at least 15,000.

Keep in mind that higher counter values increase security, but also the time it takes to authenticate the user.
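A short sketch (the role name is hypothetical). The iteration count in effect when the password is set becomes part of the stored verifier:

SET scram_iterations = 15000;
CREATE ROLE alice LOGIN PASSWORD 'secret';

-- the verifier should start with SCRAM-SHA-256$15000:
SELECT substring(rolpassword, 1, 20) FROM pg_authid WHERE rolname = 'alice';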

See also:

Postgres 16 highlight - Control of SCRAM iterations (Michael Paquier)

SQL functions and commands

SQL/JSON standard support

commit: 7081ac46, 6ee30209

This big update was accepted a year ago, but was rolled back shortly before the PostgreSQL 15 release.

The patch for PostgreSQL 16 introduces support for constructor functions (JSON_ARRAY, JSON_ARRAYAGG, JSON_OBJECT, JSON_OBJECTAGG) and predicates (IS JSON [VALUE], IS JSON ARRAY, IS JSON OBJECT, IS JSON SCALAR).
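A few of the new constructors and predicates in action (a quick sketch, output omitted):

SELECT JSON_OBJECT('release': 16, 'feature': 'SQL/JSON') AS obj,
       JSON_ARRAY(1, 2, 3) AS arr,
       '{"a": 1}' IS JSON OBJECT AS is_object,
       '[1, 2' IS JSON AS is_json;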

The work will continue in the next major release.

New functions pg_input_error_info and pg_input_is_valid

commit: b8da37b3, 1939d262

The pg_input_is_valid function checks whether the first argument is a valid input value for the data type from the second argument:

SELECT pg_input_is_valid('value', 'jsonb');
 pg_input_is_valid
-------------------
 f

The pg_input_error_info function returns detailed information about the error:

SELECT * FROM pg_input_error_info('value', 'jsonb')\gx
-[ RECORD 1 ]--+-----------------------------------
message        | invalid input syntax for type json
detail         | Token "value" is invalid.
hint           |
sql_error_code | 22P02

The functions perform a “soft” check of input values, reporting problems without raising errors. The lack of such functionality in the previous release cycle is what prevented the SQL/JSON patch from shipping with PostgreSQL 15.

See also:

Waiting for PostgreSQL 16 – Add test scaffolding for soft error reporting from input functions (Hubert 'depesz' Lubaczewski)

The Daitch-Mokotoff Soundex

commit: a290378a

Daitch-Mokotoff Soundex support is added to the fuzzystrmatch extension.

CREATE EXTENSION fuzzystrmatch;

SELECT daitch_mokotoff('Holubica'),
       daitch_mokotoff('Golubitsa'),
       daitch_mokotoff('Holubica') && daitch_mokotoff('Golubitsa');
 daitch_mokotoff | daitch_mokotoff | ?column?
-----------------+-----------------+----------
 {587500,587400} | {587400}        | t

See also:

Soundexing and Genealogy by Gary Mokotoff

New functions array_shuffle and array_sample

commit: 888f2ea0

The array_shuffle function shuffles the elements of the input array, and array_sample returns an array with the specified number of randomly selected elements of the input array:

WITH a(digits) AS (
    SELECT '{1,2,3,4,5,6,7,8,9,0}'::int[]
)
SELECT array_shuffle(a.digits),
       array_sample(a.digits, 3)
FROM a;
     array_shuffle     | array_sample
-----------------------+--------------
 {1,0,9,8,2,3,5,7,4,6} | {9,1,4}

The functions can be helpful when using the Monte Carlo method.

See also:

Waiting for PostgreSQL 16 – Add array_sample() and array_shuffle() functions (Hubert 'depesz' Lubaczewski)

New aggregate function any_value

commit: 2ddab010

The SQL-standard aggregate function any_value returns an arbitrary non-null value for a group of rows.

Consider a table without a unique key in which some rows are duplicated.

CREATE TABLE t (id int);
INSERT INTO t VALUES(1),(1),(2),(3),(3) RETURNING *;
 id
----
  1
  1
  2
  3
  3
(5 rows)

You need to delete all duplicates in one query. (Similar tasks are common in job interviews.)

To select one of the two duplicates, the min and max functions are commonly applied to the system column ctid. This works, but is a bit misleading, because we don’t really need the minimum or the maximum value. Either one will do, and the new any_value function expresses exactly that:

DELETE FROM t 
WHERE ctid IN (SELECT any_value(ctid)
               FROM t 
               GROUP BY id HAVING count(*) = 2
);
DELETE 2
SELECT * FROM t;
 id
----
  1
  2
  3
(3 rows)

COPY: inserting default values

commit: 9f8377f7

Consider a table with two mandatory date columns. If a value is missing, a date from the distant past or future is used as a placeholder.

CREATE TABLE periods (
    id int PRIMARY KEY,
    date_from date NOT NULL DEFAULT '0001-01-01',
    date_to   date NOT NULL DEFAULT '3999-12-31'
);

The table is populated from external files by means of the COPY command. The providers of the files use 'N/A' for missing date values, and you have no control over how the files are produced.

\! cat data.txt
1    2023-04-01    2023-05-01
2    2023-04-08    N/A
3    N/A    2023-04-22

Importing such a file into the table is now made easier by the DEFAULT option of the COPY command. When a value specified under DEFAULT is matched in the data, the default value for the column is used instead:

\COPY periods FROM 'data.txt' (DEFAULT 'N/A');

SELECT * FROM periods;
 id | date_from  |  date_to   
----+------------+------------
  1 | 2023-04-01 | 2023-05-01
  2 | 2023-04-08 | 3999-12-31
  3 | 0001-01-01 | 2023-04-22
(3 rows)

timestamptz: adding and subtracting time intervals

commit: 75bd846b

The + and - operators can be used to add and subtract time intervals from values of the timestamp with time zone type (timestamptz). The result of such operations depends on the time zone set in the timezone parameter.

To make the result independent of the current time zone, you can use the new functions date_add and date_subtract. The last parameter of these functions allows you to explicitly specify the time zone for calculations:

SET timezone = 'UTC';

SELECT '2021-10-31 00:00:00+02'::timestamptz + '1 day'::interval,
  date_add('2021-10-31 00:00:00+02'::timestamptz, '1 day'::interval, 'Europe/Warsaw');
        ?column?        |        date_add        
------------------------+------------------------
 2021-10-31 22:00:00+00 | 2021-10-31 23:00:00+00

The time zone parameter is also added to the generate_series function.

Because the time zone variants do not depend on session parameters, all three functions could be declared immutable.
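Here is what the generate_series variant with an explicit time zone might look like (output omitted):

SELECT * FROM generate_series(
    '2021-10-30 00:00:00+02'::timestamptz,
    '2021-11-01 00:00:00+01'::timestamptz,
    '1 day'::interval,
    'Europe/Warsaw');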

XML: formatting values

commit: 483bdb2a

The XMLSERIALIZE function now has an INDENT option for pretty-printing its output:

SELECT XMLSERIALIZE(
    DOCUMENT '<feature><name>Support [NO] INDENT option in XMLSERIALIZE()</name><author>Jim Jones</author><reviewers>Peter Smith and Tom Lane</reviewers><committer>Tom Lane</committer></feature>' AS TEXT
    INDENT);
                        xmlserialize                         
-------------------------------------------------------------
 <feature>                                                  +
   <name>Support [NO] INDENT option in XMLSERIALIZE()</name>+
   <author>Jim Jones</author>                               +
   <reviewers>Peter Smith and Tom Lane</reviewers>          +
   <committer>Tom Lane</committer>                          +
 </feature>                                                 +

pg_size_bytes: support for "B"

commit: ce1215d9

Some parameter values can be set in bytes.

SHOW log_parameter_max_length;
 log_parameter_max_length
--------------------------
 512B

You can use pg_size_bytes to get the number of bytes:

SELECT pg_size_bytes(current_setting('log_parameter_max_length'));
 pg_size_bytes
---------------
           512

In previous versions, the pg_size_bytes function would throw an error here, because it understood the unit bytes but not B.

New functions: erf, erfc

commit: d5d57414

Together with random_normal, PostgreSQL 16 introduces the error function erf and the complementary error function erfc. The 68-95-99.7 rule can now be checked like this:

SELECT erf(1 / sqrt(2)) AS erf_1,
       erf(2 / sqrt(2)) AS erf_2,
       erf(3 / sqrt(2)) AS erf_3;
       erf_1        |       erf_2        |       erf_3        
--------------------+--------------------+--------------------
 0.6826894921370859 | 0.9544997361036416 | 0.9973002039367398

The erfc function computes 1 - erf(x), avoiding the loss of precision when the result is close to zero.

Performance

Parallel execution of full and right hash joins

commit: 11c2d6fd

In previous versions, the query below could not use a parallel plan. In PostgreSQL 16, parallel execution is possible:

EXPLAIN (costs off)
SELECT count(*)
FROM bookings b FULL OUTER JOIN tickets t USING (book_ref);
                          QUERY PLAN                           
---------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Hash Full Join
                     Hash Cond: (t.book_ref = b.book_ref)
                     ->  Parallel Seq Scan on tickets t
                     ->  Parallel Hash
                           ->  Parallel Seq Scan on bookings b

Options for the right antijoin

commit: 16dc2703

Let’s look at the plan of a query that lists all the flights for which no tickets have been issued yet.

15=# EXPLAIN
SELECT * FROM flights f
WHERE NOT EXISTS (SELECT 1 FROM ticket_flights t WHERE t.flight_id = f.flight_id);
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Hash Anti Join  (cost=291530.67..338842.08 rows=133627 width=63)
   Hash Cond: (f.flight_id = t.flight_id)
   ->  Seq Scan on flights f  (cost=0.00..4772.67 rows=214867 width=63)
   ->  Hash  (cost=153851.52..153851.52 rows=8391852 width=4)
         ->  Seq Scan on ticket_flights t  (cost=0.00..153851.52 rows=8391852 width=4)

In PostgreSQL 15, the query is executed using an antijoin (the Hash Anti Join node). The hash table is built from ticket_flights, the larger of the two tables in the query. Building the hash table takes time and memory, which is reflected in the query’s high startup cost relative to its total cost.

In PostgreSQL 16, the planner also considers the antijoin where the table positions are reversed. If the reverse plan turns out to be cheaper, it will be selected instead.

16=# EXPLAIN         
SELECT * FROM flights f
WHERE NOT EXISTS (SELECT 1 FROM ticket_flights t WHERE t.flight_id = f.flight_id);
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Hash Right Anti Join  (cost=9767.51..288502.03 rows=134347 width=63)
   Hash Cond: (t.flight_id = f.flight_id)
   ->  Seq Scan on ticket_flights t  (cost=0.00..153850.08 rows=8391708 width=4)
   ->  Hash  (cost=4772.67..4772.67 rows=214867 width=63)
         ->  Seq Scan on flights f  (cost=0.00..4772.67 rows=214867 width=63)

The modified plan node is now called Hash Right Anti Join.

The Merge Join has received a similar modification.

Relation extension mechanism rework

commit: 31966b15, 00d1e02b

When records are frequently added to a table, its files have to be extended with new pages. To prevent multiple clients from adding new pages at the same time, a relation extension lock is used. Such locks may significantly affect performance when many clients insert records concurrently.

The idea behind this update is to have only the bare minimum of operations done while the lock is held, and perform the rest without the lock.

Behind this modest summary is a tremendous amount of work which resulted in 13 separate patches, of which I’ve listed just two.

As a result, not only will applications with frequent concurrent inserts by multiple clients benefit; single-threaded insertion will also become cheaper.

BRIN indexes don’t block HOT updates

commit: 19d8e230

HOT updates are not applied if at least one column modified by the UPDATE command is indexed by any type of index.

However, BRIN indexes do not contain references to individual table rows, so it is safe to use HOT updates on columns covered only by such indexes. With the new patch, BRIN indexes no longer prevent HOT optimization:

CREATE INDEX flights_bi ON flights USING brin(actual_departure);
SELECT pg_stat_reset_single_table_counters('flights'::regclass);

UPDATE flights SET actual_departure = actual_departure + '5 min'::interval
WHERE flight_id = 1;

SELECT n_tup_hot_upd, n_tup_newpage_upd
FROM pg_stat_all_tables
WHERE relname = 'flights';
 n_tup_hot_upd | n_tup_newpage_upd
---------------+-------------------
             1 |                 0

The HOT update counter ticks up, while n_tup_newpage_upd stays at zero.

The patch had already been accepted into PostgreSQL 15 once, but was rolled back before the release. This is the second attempt.

postgres_fdw: aborting transactions on remote servers in parallel mode

commit: 983ec230

PostgreSQL 15 introduced the ability to asynchronously commit transactions affecting foreign tables connected via postgres_fdw.

Now, with the new parallel_abort option for the foreign server, transactions can be aborted asynchronously as well.
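The option is set on the foreign server, for example (the server name is hypothetical):

ALTER SERVER remote_server OPTIONS (ADD parallel_commit 'true', ADD parallel_abort 'true');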

force_parallel_mode → debug_parallel_query

commit: 5352ca22

The parameter force_parallel_mode is now called debug_parallel_query.

This was done because many users mistakenly enable the parameter, thinking that it will increase query execution speed. In fact, the result is quite the opposite!

When switched on, the parameter makes the server always prefer the parallel plan, even if it is the slower one. The parameter should be used for query debugging, whenever the developer wants to figure out why the server prefers a sequential plan over its parallel equivalent. The new name reflects what the parameter does more precisely and should not provoke users to switch it on based on the name alone.

See also:

New old “debug_parallel_query” setting in PostgreSQL 16 (Pavlo Golub)

Direct I/O (for developers only)

commit: faeedbce

PostgreSQL reads and writes files through the operating system, not directly. This is an effort to introduce direct I/O for data and WAL. The work is ongoing, so there are no user-facing features yet.

There is a new configuration parameter io_direct, but you should avoid switching it on. The documentation honestly states that doing so will decrease performance, because it would disable the operating system’s prefetching, and PostgreSQL’s own prefetching has not been developed yet. The parameter is here for the developers and is intended for testing purposes only.

Logical replication

Logical replication from a physical replica

commit: 0fdab27a

This patch makes it possible to subscribe to a physical replica of a publication server rather than to the server itself.

The wal_level parameter must be set to logical both on the primary server and the physical replica for this to work:

replica=# SELECT pg_is_in_recovery(), current_setting('wal_level');
 pg_is_in_recovery | current_setting
-------------------+-----------------
 t                 | logical

Create a publication on the primary server:

primary=# CREATE TABLE t (id int primary key);
primary=# INSERT INTO t VALUES (42);

primary=# CREATE PUBLICATION pub FOR TABLE t;

Make sure that both the table and the publication are present on the replica:

replica=# SELECT * FROM t;
 id
----
 42
(1 row)
replica=# \dRp+
                              Publication pub
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
 postgres | f          | t       | t       | t       | t         | f
Tables:
    "public.t"

Now, on a separate server (subscriber), subscribe to the publication. In the connection string, specify the replica that is located on port 5401:

subscriber=# CREATE TABLE t (id int primary key);
subscriber=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=5401' PUBLICATION pub;

This command should create a replication slot on the physical replica. If the primary server is inactive at this moment, it might take some time. You can speed things up by executing the following command on the primary server:

primary=# SELECT pg_log_standby_snapshot();

The replica will “wake up”, and the slot will be created there (and only there):

replica=# SELECT slot_name, active FROM pg_replication_slots WHERE slot_name = 'sub';
 slot_name | active
-----------+--------
 sub       | t
(1 row)
primary=# SELECT slot_name, active FROM pg_replication_slots WHERE slot_name = 'sub';
 slot_name | active
-----------+--------
(0 rows)

As for logical replication, it is already working:

subscriber=# SELECT * FROM t;
 id
----
 42
(1 row)

See also:

Postgres 16 highlight: Logical decoding on standby (Bertrand Drouvot)

Using non-unique indexes with REPLICA IDENTITY FULL

commit: 89e46da5

If a table in a publication has no primary key or unique index, REPLICA IDENTITY FULL has to be used as its replica identity. On the subscriber side, modified and deleted rows are then located via sequential scans.

With the new patch, logical replication can use a non-unique index to identify modified or deleted rows on the subscriber. The index cannot be chosen explicitly: if the table has several suitable indexes, one of them is picked arbitrarily. Only B-tree indexes that are not partial and include at least one table column (that is, not purely expression indexes) can be used.
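A rough sketch, with made-up table and column names: the table has no primary key, so REPLICA IDENTITY FULL is set on the publisher, and an ordinary B-tree index on the subscriber can now speed up locating rows to update or delete.

-- on the publisher
ALTER TABLE events REPLICA IDENTITY FULL;

-- on the subscriber
CREATE INDEX ON events (event_time);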

Initial synchronization in binary format

commit: ecb69652

To speed up replication, you can enable the parameter binary for the subscription so that the publishing server sends changes in binary format:

CREATE SUBSCRIPTION .. WITH (binary);

Before PostgreSQL 16, this affected just the format the changes were replicated in, while during initial synchronization, only the textual format was used to transmit the data. Now the COPY command that performs the initial synchronization will also work in binary format.

Privileges for creating subscriptions and applying changes

commit: c3afe8cf, 1e10d49b, 48267598

The first commit creates a predefined role pg_create_subscription. Members of this role can create subscriptions (and replication slots on the publishing server) without superuser privileges.
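A minimal sketch of what this enables (role and database names are hypothetical); in addition to the predefined role, the user still needs the CREATE privilege on the database where the subscription is created:

GRANT pg_create_subscription TO repl_admin;
GRANT CREATE ON DATABASE db TO repl_admin;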

For security reasons, the subscription owner must switch to the table owner role for any INSERT, UPDATE, DELETE and TRUNCATE operations (second commit). Therefore, the subscription owner must have the right to execute the necessary SET ROLE commands.

The old behavior, when the subscription owner has the rights to apply changes, can be enabled by the new subscription parameter run_as_owner (third commit):

CREATE SUBSCRIPTION .. WITH (run_as_owner);

Applying changes in parallel mode (for developers only)

commit: c3afe8cf, 1e10d49b, 9f2213a7

Multiple processes will be able to apply changes on the subscriber at the same time. This will speed up the processing of large transactions received from the publication server.

The feature is not implemented in full yet, so no details so far. However, there is a hint in the patch: a new parameter logical_replication_mode in the development section.


That’s all for now. Nothing new will get into PostgreSQL 16 at this point. Looking forward to the release in the autumn!


Pavel Luzanov