PostgreSQL 18: part 5 or CommitFest 2025-03
September 25th marks the release of PostgreSQL 18. This article covers the March CommitFest and concludes the series on the new features of the upcoming release. It turned out quite large: the final March CommitFest is traditionally the biggest and richest in new features.
You can find previous reviews of PostgreSQL 18 CommitFests here: 2024-07, 2024-09, 2024-11, 2025-01.
Client and Server Applications
- pg_dump[all]/pg_restore: dump and restore statistics
- Collect statistics after server upgrade
- pg_upgrade --swap: migrate directories to a new cluster
- pg_combinebackup --link: hard links instead of copying files
- pg_dump[all], pg_restore: --no-policies
- pg_createsubscriber: enable two_phase for all subscriptions
- pg_createsubscriber: remove publications on subscriber
- pg_createsubscriber: create subscriptions for every database on the publication server
- psql: pipeline mode
- psql: current connection information
- psql: set the default time interval for \watch
- psql: \dx displays the default extension version
Monitoring
- NUMA: monitoring tools for non-uniform memory access architectures
- pg_stat_get_backend_wal: WAL statistics for a specific process
- EXPLAIN: actual rows shown to two decimal places
- EXPLAIN: an interface for adding more options to the command
- Log failed lock attempts
- Log session time-to-connect
- log_line_prefix: local server IP address
- pg_stat_statements: normalize commands with lists of constants in IN
- Additional WAL buffer overflow monitoring tools
- Track vacuum and analyze delays
[Auto]vacuum and Analysis
- vacuum_truncate: manage the truncation of empty pages at the end of a table
- More frequent autovacuuming of dead tuples in large tables
- Autovacuum triggers sooner after new rows are inserted
- Eager freezing to amortize aggressive vacuum
Performance
- Asynchronous input/output
- io_combine_limit: increase maximum allowed limit to 1MB
- Read stream interface usage
- BTree indexes: skipping the condition on the index’s leading column
- GiST indexes: fast index creation using the sortsupport method
- Planner: replacing IN (VALUES ...) with array search
- Planner: eliminating redundant self-joins
Procedural Languages
- PL/pgSQL: passing named values of cursor parameters using =>
Replication
- multiple_unique_conflicts: another logical replication conflict
- idle_replication_slot_timeout: canceling inactive slots by timeout
- max_active_replication_origins: maximum number of replication origins
Security
- Support for the OAuth 2.0 authorization protocol
- dblink: using pass-through SCRAM authentication
- pgcrypto: support for sha256crypt and sha512crypt password hashing algorithms
- pgcrypto: support for CFB mode for the AES encryption standard
- ALTER DEFAULT PRIVILEGES and large objects
- libpq: the sslkeylogfile parameter
Server
- Temporal primary, unique, and foreign keys
- Virtual computed columns
- NOT NULL integrity constraints: creation without validation, enabling and disabling inheritance
- Integrity constraints: NOT ENFORCED
- file_copy_method: method for copying files when creating a database
- extension_control_path: location of extension control files
- Nondeterministic collations: support for substring search functions
- The gamma and lgamma functions
- Conversion of integer types to bytea and back
- The pg_get_loaded_modules function: information about libraries loaded into shared memory
- pg_buffercache: evict tables or the entire cache
- amcheck: GIN index verification
SQL commands and built-in functions
- CREATE FOREIGN TABLE LIKE
- COPY ... TO: materialized view support
- The json_strip_nulls function removes empty values in arrays
- New array_sort function
pg_dump[all]/pg_restore: dump and restore statistics
commit: ce207d2a7, dbe6bd434, 779972e53, b391d882f, bde2fb797, 650ab8aaf, 1fd1bd871, 34eb2a80d
The pg_dump and pg_dumpall utilities can now dump basic statistics for tables, columns, and indexes, and pg_restore can restore those statistics right back. By “basic statistics” here I mean the statistics stored in the pg_statistic catalog. Extended statistics are still dumped only as CREATE STATISTICS definitions and still have to be rebuilt after restore.
Both pg_dump and pg_dumpall get three new mutually exclusive options:
$ pg_dump --with-statistics | --no-statistics | --statistics-only
For backward compatibility, statistics are not dumped by default.
Let’s test it. Here’s a table:
CREATE TABLE tab (id int PRIMARY KEY);
INSERT INTO tab VALUES (1),(2),(3);
ANALYZE tab;
Run pg_dump with the --statistics-only option and grep the lines related to statistics:
$ pg_dump -t tab --statistics-only |egrep -v '^($|--|SET|.*set_config)'
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '190000'::integer,
'schemaname', 'public',
'relname', 'tab',
'relpages', '1'::integer,
'reltuples', '3'::real,
'relallvisible', '0'::integer,
'relallfrozen', '0'::integer
);
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
'version', '190000'::integer,
'schemaname', 'public',
'relname', 'tab',
'attname', 'id',
'inherited', 'f'::boolean,
'null_frac', '0'::real,
'avg_width', '4'::integer,
'n_distinct', '-1'::real,
'histogram_bounds', '{1,2,3}'::text,
'correlation', '1'::real
);
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '190000'::integer,
'schemaname', 'public',
'relname', 'tab_pkey',
'relpages', '2'::integer,
'reltuples', '3'::real,
'relallvisible', '0'::integer,
'relallfrozen', '0'::integer
);
The new function pg_restore_relation_stats is called for tables and indexes, and pg_restore_attribute_stats restores column statistics.
For pg_restore, the same three options are available:
$ pg_restore --with-statistics | --no-statistics | --statistics-only
But here the default is --with-statistics: everything that was dumped should be restored.
Regenerate statistics after server upgrade
commit: 1fd1bd871, 9c03c8d18, edba754f0, d5f1b6a75
The pg_upgrade utility uses pg_dump and pg_restore to transfer the database schema to the new cluster. This patch makes pg_upgrade run pg_dump with the --with-statistics option by default, so basic statistics are transferred to the upgraded cluster.
This makes the old recommendation to refresh statistics on all objects after an upgrade obsolete. However, if the old cluster had extended statistics, those still need to be rebuilt.
Item 15 of the pg_upgrade manual recommends regenerating missing statistics after an upgrade:
$ vacuumdb --all --analyze-in-stages --missing-stats-only
--missing-stats-only is a new option that makes vacuumdb analyze only the objects with missing statistics.
And only then run:
$ vacuumdb --all --analyze-only
This refreshes the cumulative statistics counters, which are used to trigger autoanalyze.
If for some reason statistics do not need to be migrated, you can run pg_upgrade with the --no-statistics option.
pg_upgrade --swap: migrate directories to a new cluster
commit: 626d7236b
Several pg_upgrade options govern how data files are transferred from the old cluster to the new one: --copy, --copy-file-range, --clone, --link.
The new --swap option adds a transfer method that is probably the fastest possible: instead of copying files or creating new links, --swap moves whole data directories. The more databases and objects (and therefore files) they contain, the greater the gain in transfer speed.
A word of caution: as soon as the catalog migration process begins, the old cluster will no longer be usable. Make sure you have a backup ready.
This patch is another addition to the optimization effort for upgrading clusters with a very large number of databases and objects. I reviewed previous optimization patches here and here.
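For illustration, a minimal invocation sketch (the paths and binary directories are placeholders, not taken from the article):
$ pg_upgrade \
    --old-bindir /usr/lib/postgresql/17/bin \
    --new-bindir /usr/lib/postgresql/18/bin \
    --old-datadir /var/lib/postgresql/17/data \
    --new-datadir /var/lib/postgresql/18/data \
    --swap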
pg_combinebackup --link: hard links instead of copying files
commit: 99aeb8470
pg_combinebackup gets a new option --link. It creates hard links in the reconstructed full backup pointing to the files in the source backups, saving both disk space and the time spent copying files.
The drawback is that any changes to the recovered copy (for example, after starting the server) will affect the original backup. Likewise, changes to the source backups will affect the recovered full backup.
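A hypothetical invocation, assuming a full backup and one incremental backup (directory names are placeholders):
$ pg_combinebackup --link -o /backups/restored \
    /backups/full /backups/incr1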
pg_dump[all], pg_restore: --no-policies
commit: cd3c45125
The new --no-policies option for pg_dump and pg_dumpall omits row-level security policy commands from the dump. Similarly, pg_restore with this option skips such commands if they are present in the dump.
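For example (the database name is a placeholder):
$ pg_dump --no-policies demo > demo_without_policies.sql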
pg_createsubscriber: enable two_phase for all subscriptions
commit: e117cfb2f
The new pg_createsubscriber option --enable-two-phase switches on the two_phase parameter for all created subscriptions.
You can enable the parameter manually after pg_createsubscriber is done, but you have to disable the subscriptions first, which is not always convenient.
pg_createsubscriber: remove publications on subscriber
commit: e5aeed4b8
After running pg_createsubscriber, the subscriber retains the publications that were transferred while it was still a physical replica. If they are not needed, the new pg_createsubscriber option --clean=publications removes them.
The --clean option may get other arguments in the future, but for now only publications is supported.
pg_createsubscriber: create subscriptions for every database on the publication server
commit: fb2ea12f4
When launching pg_createsubscriber with the -d option, you can list the publication server databases for which to create subscriptions.
If you want to replicate all databases, use the new --all option instead. It creates a subscription for every database on the server, except template databases and those with connections disabled.
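A sketch of a call with --all; the data directory and the publisher connection string are placeholders:
$ pg_createsubscriber --all \
    --pgdata /var/lib/postgresql/18/standby \
    --publisher-server 'host=primary port=5432 dbname=postgres'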
psql: pipeline mode
commit: 41625ab8e, 3ce357584, 17caf6644, 2cce0fe44
libpq has supported pipeline mode since PostgreSQL 14. This mode allows sending multiple SQL commands to the server without waiting for each one to complete first; the results can be retrieved later. When there is significant network latency between the client and the server, pipeline mode offers considerable optimization by reducing round-trip wait time.
The patch adds psql commands to facilitate pipeline mode.
Let’s demonstrate this. First, add artificial lag:
$ sudo tc qdisc add dev lo root netem delay 500ms
The testing script nopipeline.sql sends three simple queries within a single transaction:
$ cat pg18/nopipeline.sql
BEGIN;
SELECT clock_timestamp();
SELECT clock_timestamp();
SELECT clock_timestamp();
COMMIT;
Time it:
$ time psql -X -t -h localhost -f pg18/nopipeline.sql
BEGIN
2025-06-20 15:14:09.353107+03
2025-06-20 15:14:10.353771+03
2025-06-20 15:14:11.35475+03
COMMIT
real 0m7,015s
user 0m0,001s
sys 0m0,006s
Another script, pipeline.sql, runs nopipeline.sql between the pipeline mode commands:
$ cat pg18/pipeline.sql
\startpipeline
\ir nopipeline.sql
\flushrequest
\getresults
\endpipeline
The total execution time is noticeably reduced:
$ time psql -X -t -h localhost -f pg18/pipeline.sql
BEGIN
2025-06-20 15:14:20.610045+03
2025-06-20 15:14:20.610109+03
2025-06-20 15:14:20.610149+03
COMMIT
real 0m4,014s
user 0m0,001s
sys 0m0,006s
The more small commands are executed in pipeline mode, the greater the optimization.
psql: current connection information
commit: bba2fbc62
The \conninfo output has been reworked and now displays much more detailed information:
\conninfo
Connection Information
Parameter | Value
----------------------+----------
Database | demo
Client User | postgres
Socket Directory | /tmp
Server Port | 5401
Options |
Protocol Version | 3.0
Password Used | false
GSSAPI Authenticated | false
Backend PID | 28907
TLS Connection | false
Superuser | on
Hot Standby | off
(12 rows)
psql: set the default time interval for \watch
commit: 1a759c832
By default, the \watch command repeats the query every 2 seconds. Previously, the only way to use a different interval was to specify it in the \watch call itself. Now you can set it in the new psql variable WATCH_INTERVAL.
\echo :WATCH_INTERVAL
2
\set WATCH_INTERVAL 1
\t
SELECT to_char(now(), 'SS')\watch count=3
17
18
19
psql: \dx displays the default extension version
commit: d696406a9
After installing a new version of an extension in the file system, it is important to run ALTER EXTENSION ... UPDATE. The new Default version column in the \dx output now shows directly whether the installed version differs from the default:
\dx
List of installed extensions
Name | Version | Default version | Schema | Description
---------+---------+-----------------+------------+------------------------------
plpgsql | 1.0 | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)
NUMA: monitoring tools for non-uniform memory access architectures
commit: 65c298f61, 8cc139bec, ba2a3c230
The patches add monitoring tools for architectures with non-uniform memory access (NUMA), so far only for Linux. The PostgreSQL server must be built with the --with-libnuma option.
There is a function to check if it was:
SELECT pg_numa_available();
pg_numa_available
-------------------
t
(1 row)
The pg_shmem_allocations_numa view shows the shared memory distribution across NUMA nodes:
SELECT * FROM pg_shmem_allocations_numa LIMIT 3;
name | numa_node | size
------------------+-----------+--------
commit_timestamp | 0 | 274432
multixact_member | 0 | 270336
multixact_offset | 0 | 139264
(3 rows)
Another patch adds the pg_buffercache_numa view to the pg_buffercache extension; it shows the distribution of shared memory buffers across NUMA nodes:
CREATE EXTENSION pg_buffercache;
SELECT * FROM pg_buffercache_numa LIMIT 3;
bufferid | os_page_num | numa_node
----------+-------------+-----------
1 | 0 | 0
1 | 1 | 0
2 | 2 | 0
(3 rows)
pg_stat_get_backend_wal: WAL statistics for a specific process
commit: 76def4cdd
The previous article talked about the new function pg_stat_get_backend_io, which returns input/output statistics for a specific client process. This patch adds a sister function, pg_stat_get_backend_wal, for client process WAL statistics. The function returns a single record in the pg_stat_wal format:
SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid())
\gx
-[ RECORD 1 ]----+-------
wal_records | 161
wal_fpi | 48
wal_bytes | 228461
wal_buffers_full | 0
stats_reset |
One interesting application is joining pg_stat_activity with pg_stat_get_backend_wal. For example, here is a query showing the client processes that write to WAL most actively:
SELECT *
FROM pg_stat_activity a,
pg_stat_get_backend_wal(a.pid) b
WHERE a.backend_type = 'client backend'
ORDER BY b.wal_bytes DESC;
The statistics are automatically deleted when the client process ends. Alternatively, they can be reset manually with pg_stat_reset_backend_stats.
EXPLAIN: actual rows shown to two decimal places
During execution plan analysis, you sometimes need to calculate the cardinality of individual plan nodes. The calculation is done by multiplying the number of actual rows by the number of loops.
In EXPLAIN ANALYZE, the actual rows count was always rounded to an integer. In some cases this distorts the calculation significantly: 0.49, for example, rounds down to 0. The planner itself works with the precise value, but it was not reflected in the plan.
The new patch makes the actual rows count display with two decimal places for all plan nodes:
EXPLAIN (analyze, costs off, buffers off, summary off, timing off)
SELECT count(*) FROM tickets;
QUERY PLAN
--------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1.00 loops=1)
-> Gather (actual rows=3.00 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (actual rows=1.00 loops=3)
-> Parallel Seq Scan on tickets (actual rows=983285.67 loops=3)
(6 rows)
EXPLAIN: an interface for adding more options to the command
commit: c65bc2e1d, 50ba65e73, 8d5ceb113
Different options of the EXPLAIN command affect what query plan information is displayed. Internally, the planner works with much more information that remains hidden but may be useful in certain cases.
Now extension developers get an interface for creating new EXPLAIN options that display additional information about the query plan.
As an example of such an extension, pg_overexplain has been added to contrib. It adds two new EXPLAIN options: debug and range_table. The extension does not create any database objects, so there is no need to run CREATE EXTENSION. Just load the module in your session:
LOAD 'pg_overexplain';
Now you can use the new options:
EXPLAIN (debug)
SELECT * FROM bookings;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on bookings (cost=0.00..34558.10 rows=2111110 width=21)
Disabled Nodes: 0
Parallel Safe: true
Plan Node ID: 0
PlannedStmt:
Command Type: select
Flags: canSetTag
Subplans Needing Rewind: none
Relation OIDs: 16422
Executor Parameter Types: none
Parse Location: 0 to end
(11 rows)
EXPLAIN (range_table)
SELECT * FROM bookings;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on bookings (cost=0.00..34558.10 rows=2111110 width=21)
Scan RTI: 1
RTI 1 (relation, in-from-clause):
Eref: bookings (book_ref, book_date, total_amount)
Relation: bookings
Relation Kind: relation
Relation Lock Mode: AccessShareLock
Permission Info Index: 1
Unprunable RTIs: 1
(9 rows)
To find out exactly what additional information these two options output, check out the extension documentation or the source code.
Log failed lock attempts
commit: 6d376c3b0
The new parameter log_lock_failures enables detailed logging of failed attempts to obtain row locks with the SELECT ... NOWAIT command.
Start a session and lock a row:
BEGIN;
SELECT pg_current_xact_id(), pg_backend_pid();
pg_current_xact_id | pg_backend_pid
--------------------+----------------
981 | 112875
(1 row)
SELECT * FROM bookings WHERE book_ref = '000004' FOR UPDATE NOWAIT;
book_ref | book_date | total_amount
----------+------------------------+--------------
000004 | 2016-08-13 15:40:00+03 | 55800.00
(1 row)
Start another session and try to lock the same row:
SET log_lock_failures = on;
BEGIN;
SELECT pg_current_xact_id(), pg_backend_pid();
pg_current_xact_id | pg_backend_pid
--------------------+----------------
982 | 113146
(1 row)
SELECT * FROM bookings WHERE book_ref = '000004' FOR UPDATE NOWAIT;
ERROR: could not obtain lock on row in relation "bookings"
ROLLBACK;
The failed lock attempt is meticulously logged:
\! tail -5 logfile
2025-08-07 11:06:16.882 MSK [113146] LOG: process 113146 could not obtain ShareLock on transaction 981
2025-08-07 11:06:16.882 MSK [113146] DETAIL: Process holding the lock: 112875, Wait queue: .
2025-08-07 11:06:16.882 MSK [113146] STATEMENT: SELECT * FROM bookings WHERE book_ref = '000004' FOR UPDATE NOWAIT;
2025-08-07 11:06:16.882 MSK [113146] ERROR: could not obtain lock on row in relation "bookings"
2025-08-07 11:06:16.882 MSK [113146] STATEMENT: SELECT * FROM bookings WHERE book_ref = '000004' FOR UPDATE NOWAIT;
Log session time-to-connect
commit: 9219093ca, 18cd15e70, cb1456423
The parameter log_connections controls logging of new session connections. Previously, it was a boolean (on or off). Now it accepts a list of values, each responsible for a different aspect of connection logging: receipt, authentication, authorization, setup_durations, or all.
If the value is an empty string, no session connection logging occurs. The first three values together (receipt, authentication, authorization) reproduce the behavior of the previous boolean value on.
The new value setup_durations logs the time it takes to establish the connection, including backend process startup. This should make troubleshooting slow connection attempts easier.
A log example (log_connections = 'all'):
2025-08-07 12:13:05.100 MSK [115446] LOG: connection received: host=[local]
2025-08-07 12:13:05.101 MSK [115446] LOG: connection authenticated: user="postgres" method=trust (/home/pal/master/data/pg_hba.conf:117)
2025-08-07 12:13:05.101 MSK [115446] LOG: connection authorized: user=postgres database=demo application_name=psql
2025-08-07 12:13:05.102 MSK [115446] LOG: connection ready: setup total=2.234 ms, fork=0.413 ms, authentication=0.208 ms
For backward compatibility, the boolean values on, off, true, false, yes, no, 1, and 0 are still supported, but have been deprecated.
log_line_prefix: local server IP address
commit: 3516ea768
The parameter listen_addresses takes one or more addresses at which the server accepts connections. If you need to know which address a client connected to, add the new escape sequence %L to the log_line_prefix parameter.
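For example, a sketch of adding the local address to the prefix (the rest of the format string is just an assumption for illustration):
ALTER SYSTEM SET log_line_prefix = '%m [%p] %L ';
SELECT pg_reload_conf();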
pg_stat_statements: normalize commands with lists of constants in IN
Before PostgreSQL 18, each of these three queries occupied a separate row in pg_stat_statements, even though the only difference between them is the number of constants after the IN operator.
SELECT pg_stat_statements_reset();
SELECT * FROM flights WHERE flight_id IN (1,2);
SELECT * FROM flights WHERE flight_id IN (1,2,3);
SELECT * FROM flights WHERE flight_id IN (1,2,3,4);
Now, only one entry remains in the view. The list of constants is collapsed to a single placeholder, followed by a comment noting that the actual query may have contained more constants.
SELECT queryid, query, calls
FROM pg_stat_statements
WHERE query ~ 'flights'\gx
-[ RECORD 1 ]-----------------------------------------------------
queryid | 7334393817289890276
query | SELECT * FROM flights WHERE flight_id IN ($1 /*, ... */)
calls | 3
Additional WAL buffer overflow monitoring tools
commit: eaf502747, ce5bcc4a9, 320545bfc, 6a8a7ce47
The pg_stat_wal view with the wal_buffers_full column appeared back in PostgreSQL 14. The column counts how many times data had to be flushed to disk to avoid a buffer overflow, an important metric for tuning wal_buffers.
The wal_buffers_full counter is now available at lower levels as well: in pg_stat_statements, in VACUUM and ANALYZE with the verbose option, and in EXPLAIN with the wal option:
CREATE TABLE bookings_copy (LIKE bookings);
EXPLAIN (wal, analyze, summary off, buffers off, timing off, costs off)
INSERT INTO bookings_copy SELECT * FROM bookings;
QUERY PLAN
-------------------------------------------------------------
Insert on bookings_copy (actual rows=0.00 loops=1)
WAL: records=2111110 bytes=164490318 buffers full=18464
-> Seq Scan on bookings (actual rows=2111110.00 loops=1)
(3 rows)
Track vacuum and analyze delays
The patches add a new configuration parameter for tracking delays during vacuum and analyze operations:
\dconfig+ track_cost_delay_timing
List of configuration parameters
Parameter | Value | Type | Context | Access privileges
-------------------------+-------+------+-----------+-------------------
track_cost_delay_timing | off | bool | superuser |
(1 row)
By default, the VACUUM and ANALYZE commands run without delays. Autovacuum and autoanalyze, on the other hand, have a delay built in.
\dconfig *vacuum_cost_(delay|limit)
List of configuration parameters
Parameter | Value
------------------------------+-------
autovacuum_vacuum_cost_delay | 2ms
autovacuum_vacuum_cost_limit | -1
vacuum_cost_delay | 0
vacuum_cost_limit | 200
(4 rows)
When tweaking cost_limit and cost_delay, you can now take into account the ratio of total execution time to delay time.
There are multiple ways to get the delay information:
- A new column delay_time in the pg_stat_progress_vacuum and pg_stat_progress_analyze views.
- Both VACUUM and ANALYZE with the verbose option output a delay time line.
- A delay time line is recorded in the server log for vacuum and analyze operations.
Let’s enable delay tracking and full autovacuum logging:
ALTER SYSTEM SET track_cost_delay_timing = on;
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();
Create a new non-empty table, and autovacuum will process it:
CREATE TABLE tickets_copy AS SELECT * FROM tickets;
SELECT 2949857
After a while, delay time records for the new table appear in the server log:
2025-06-30 16:47:54.144 MSK [211678] LOG: automatic vacuum of table "demo.bookings.tickets_copy": index scans: 0
...
delay time: 11564.377 ms
...
2025-06-30 16:47:55.488 MSK [211678] LOG: automatic analyze of table "demo.bookings.tickets_copy"
delay time: 622.403 ms
...
vacuum_truncate: manage the truncation of empty pages at the end of a table
commit: 0164a0f9e
By default, the vacuum process truncates empty pages at the end of a table, reducing the file size. Some utilities even rely on this behavior to combat bloat. But there is a downside: while truncating, the vacuum process briefly takes an exclusive lock on the table. With frequent updates (for example, in queue tables), access to the table may be hindered, including queries on replicas.
PostgreSQL 18 introduces a new parameter vacuum_truncate. It enables (the default) or disables empty page truncation for all tables, affecting both manual vacuum and autovacuum.
Previously, there were two ways to disable truncation: run vacuum manually with the option disabled:
VACUUM (truncate off);
or disable the storage parameter for specific tables:
ALTER TABLE some_table SET (vacuum_truncate = off);
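In PostgreSQL 18, the same effect can be achieved cluster-wide with the new parameter; a minimal sketch:
ALTER SYSTEM SET vacuum_truncate = off;
SELECT pg_reload_conf();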
More frequent autovacuuming of dead tuples in large tables
commit: 306dc520b
Autovacuum triggers when the number of dead tuples in a table reaches the number defined as:
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × number_of_rows_in_the_table
This works great for small and medium tables, but with very large tables, the time between autovacuums may be quite significant. This is not always desirable.
To make autovacuum trigger for large tables more often, a new parameter autovacuum_vacuum_max_threshold has been added, and the calculation now looks like this:
least(
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × number_of_rows_in_the_table,
autovacuum_vacuum_max_threshold
)
The old formula applies as long as its result does not exceed the new autovacuum_vacuum_max_threshold value; beyond that point, autovacuum_vacuum_max_threshold is used.
\dconfig+ autovacuum_vacuum_[m|s|t]*
List of configuration parameters
Parameter | Value | Type | Context | Access privileges
---------------------------------+-----------+---------+---------+-------------------
autovacuum_vacuum_max_threshold | 100000000 | integer | sighup |
autovacuum_vacuum_scale_factor | 0.2 | real | sighup |
autovacuum_vacuum_threshold | 50 | integer | sighup |
(3 rows)
With the default settings, the new parameter is applied to tables larger than 500 million rows.
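A quick worked example with the default settings for a table of 1 billion rows:
least(50 + 0.2 × 1,000,000,000, 100,000,000) = least(200,000,050, 100,000,000) = 100,000,000
So autovacuum now triggers after 100 million dead tuples instead of roughly 200 million.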
Autovacuum triggers sooner after new rows are inserted
Autovacuum is triggered not only by the growing number of dead tuples, but also when new rows are inserted. Tables with data constantly added get autovacuumed when the number of new rows since the last autovacuum reaches:
autovacuum_vacuum_insert_threshold + autovacuum_vacuum_insert_scale_factor × number_of_rows_in_the_table
As the table grows, this autovacuum trigger will fire less and less frequently. To increase the frequency on larger tables, the formula was changed. Now it uses the ratio of new rows to the number of unfrozen rows in the table (more precisely, to the number of unfrozen pages).
autovacuum_vacuum_insert_threshold + autovacuum_vacuum_insert_scale_factor × number_of_unfrozen_pages_in_the_table
The other patch adds a new column relallfrozen to pg_class, which conveniently exposes this value.
Eager freezing to amortize aggressive vacuum
commit: 052026c9b
Regular, non-aggressive vacuuming skips pages whose tuples are all visible but not yet frozen (marked all_visible in the visibility map but not all_frozen). For workloads with few changes (for example, mostly inserts), such pages can accumulate quickly. They remain unfrozen until aggressive vacuuming comes around (configured by vacuum_freeze_table_age).
Aggressive vacuuming checks all unfrozen pages, visible or not. It’s highly likely that many of these pages have been evicted not only from the buffer cache, but also from the OS file cache. In that case, they will have to be read from disk, leading to a significant drop in performance.
To ease the burden on aggressive vacuuming, regular vacuuming now also goes through all_visible pages and tries to mark each one all_frozen. This is referred to as eager vacuuming.
To avoid freezing all qualifying pages at once, eager vacuuming stops scanning all_visible pages once the number of successfully frozen pages reaches 20% (a hard-coded ratio) of the pages that are all_visible but not all_frozen.
Eager vacuuming also leaves all_visible pages alone if the number of failed attempts to make pages all_frozen exceeds the new parameter vacuum_max_eager_freeze_failure_rate. By default, this is 3% of the total number of pages in the table:
\dconfig+ vacuum_max_eager_freeze_failure_rate
List of configuration parameters
Parameter | Value | Type | Context | Access privileges
--------------------------------------+-------+------+---------+-------------------
vacuum_max_eager_freeze_failure_rate | 0.03 | real | user |
(1 row)
Set it to 0 to disable eager vacuuming altogether.
These changes should speed up aggressive vacuuming by letting it skip more pages already marked as all_frozen.
Asynchronous input/output
commit: fdd146a8e, 93bc3d75d, 46250cdcb, 60f566b4f, 2a5e709e7, ae3df4b34, f4d0730bb, b27f8637e, 12ce89fd0, 047cba7fa, 50cb7505b, c325a7633, 247ce06b8, 55b454d0e, da7226993, ...
Asynchronous I/O in PostgreSQL is now a reality. The method of asynchronous operation is defined by the new parameter io_method:
\dconfig+ io_method
List of configuration parameters
Parameter | Value | Type | Context | Access privileges
-----------+--------+------+------------+-------------------
io_method | worker | enum | postmaster |
(1 row)
There are three methods available in total.
The first method, worker, is available on all platforms and is used by default. Asynchronous operations are performed by separate processes; their number is defined by io_workers (3 by default):
SELECT pid
FROM pg_stat_activity
WHERE backend_type = 'io worker';
pid
-------
17630
17631
17632
(3 rows)
The second method, io_uring, is available on Linux and provides maximum efficiency there. The server must be built with the --with-liburing option.
The third method, sync, stands for the traditional synchronous behavior.
It’s hard to estimate the performance gain right now. We will have to see how asynchronous I/O behaves in production environments first.
You can check out the technical description of the implementation in this README.md.
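For example, switching to io_uring on a Linux server built with --with-liburing (a sketch; the parameter has postmaster context, so a restart is required):
ALTER SYSTEM SET io_method = 'io_uring';
-- restart the server for the new value to take effect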
io_combine_limit: increase maximum allowed limit to 1MB
First introduced in PostgreSQL 17, the read stream interface is controlled by the parameter io_combine_limit. Its default value is 128KB, and the maximum used to be 256KB.
The new patches don’t change the default value, but increase the maximum to 1MB. It cannot exceed the value of the new parameter io_max_combine_limit, which cannot be changed without restarting the server.
\dconfig+ io_*combine_limit
List of configuration parameters
Parameter | Value | Type | Context | Access privileges
----------------------+-------+---------+------------+-------------------
io_combine_limit | 128kB | integer | user |
io_max_combine_limit | 128kB | integer | postmaster |
(2 rows)
In fact, io_combine_limit may exceed io_max_combine_limit, but the smaller of the two values will be used anyway.
Read stream interface usage
commit: 32acad7d1, 925682260, c3e775e60, c5c239e26, 69273b818, e215166c9, 2b73a8cd3, d9c7911e1, 043799fa0
The read stream interface is now used not only for sequential scanning and analysis (as in PostgreSQL 17), but also for the following operations:
- Table and index vacuuming (Btree, GiST, SP-GiST)
- Bitmap heap scanning
- Buffer cache pre-warming (pg_prewarm)
- Relation verification (the verify_heapam function of the amcheck module)
BTree indexes: skipping the condition on the index’s leading column
commit: 9a2e2a285, 92fe23d93, 0fbceae84
It is better to see this optimization right away than to explain it beforehand.
In the sales table, data can be analyzed in two dimensions: by department (deptno_id) and by product (product_id):
CREATE TABLE sales (
deptno_id int,
product_id int,
amount numeric(20,2)
);
INSERT INTO sales
SELECT random(1,100) AS deptno_id,
random(1,1000) AS product_id,
random (0.01, 10_000.00) AS amount
FROM generate_series(1, 10_000_000);
To speed up queries, a multicolumn index has been created with deptno_id in the first position:
CREATE INDEX idx ON sales (deptno_id, product_id);
VACUUM ANALYZE sales;
Such an index is ideal for queries that filter by department and product simultaneously. It can also be used for queries on departments alone.
But will the index be used in queries by product only? Let’s try:
EXPLAIN (analyze, costs off, summary off, timing off)
SELECT COUNT(*)
FROM sales
WHERE product_id = 42;
QUERY PLAN
------------------------------------------------------------------------
Aggregate (actual rows=1.00 loops=1)
Buffers: shared hit=506
-> Index Only Scan using idx on sales (actual rows=9887.00 loops=1)
Index Cond: (product_id = 42)
Heap Fetches: 0
Index Searches: 102
Buffers: shared hit=506
The index is used! But how is this possible?
Imagine that another condition is implicitly added to the WHERE clause:
WHERE product_id = 42 AND deptno_id = ANY(list of all possible values)
This is exactly the kind of condition that PostgreSQL has been able to execute efficiently on a column like deptno_id since version 17.
First, a list of all possible values of the deptno_id column is collected. Since the column is in the first position, this is just a matter of traversing the index. Then the first value from the list is taken and the index is searched for all rows where product_id = 42. Then the second value is taken and the index is searched again for the desired product, and so on for each department in the list, scanning the index multiple times. The exact number of passes through the index is shown in the plan in the separate Index Searches line. In our example, it is 102: 100 for the number of departments and apparently another 2 for building the department list.
Of course, if the index began with the product_id column, it would be scanned only once, which would be more efficient. But that requires creating a second index, with all the overhead it brings. Even with multiple index scans, though, the query read only 506 pages, while without the index it would have had to read the entire table, which is two orders of magnitude more:
SELECT relpages FROM pg_class WHERE oid = 'sales'::regclass;
relpages
----------
54055
(1 row)
GiST indexes: fast index creation using the sortsupport method
The ability to speed up GiST index creation using the sortsupport method first appeared in PostgreSQL 14. It requires a support function that compares two values for the corresponding operator class. In PostgreSQL 14, such a function was created for the point_ops operator class, which significantly sped up index creation for the point type.
In PostgreSQL 18, support comparison functions were added for the operator classes used in the btree_gist extension, as well as for range_ops. Since sortsupport is the fastest index build method, it is used by default:
CREATE TABLE t
AS
SELECT g.id,
format('[%s,%s]', -1*random(0,1000000), random(0,1000000))::int4range AS r
FROM generate_series(1,1_000_000) AS g(id);
CREATE INDEX ON t USING gist(id, r);
Time: 762,247 ms
If the quality of the resulting index turns out worse, you can still build it the old way with the buffering method by specifying it explicitly in the index parameters:
CREATE INDEX ON t USING gist(id, r) WITH (buffering=on);
Time: 17579,742 ms (00:17,580)
Planner: replacing IN (VALUES ...) with array search
commit: c0962a113
PostgreSQL 17 plan:
EXPLAIN (costs off)
SELECT * FROM flights
WHERE flight_id IN (VALUES(1),(2),(3));
QUERY PLAN
------------------------------------------------------
Nested Loop
-> HashAggregate
Group Key: "*VALUES*".column1
-> Values Scan on "*VALUES*"
-> Index Scan using flights_pkey on flights
Index Cond: (flight_id = "*VALUES*".column1)
(6 rows)
In PostgreSQL 18, the IN (VALUES ...) construct is replaced by an array search, which in many cases is more efficient:
QUERY PLAN
--------------------------------------------------------
Index Scan using flights_pkey on flights
Index Cond: (flight_id = ANY ('{1,2,3}'::integer[]))
(2 rows)
Planner: eliminating redundant self-joins
commit: fc069a3a6
The new optimization allows the planner to avoid redundant scans of the table if the query uses a self-join on a unique integrity constraint. A simple example:
CREATE TABLE t (id int PRIMARY KEY);
EXPLAIN (costs off)
SELECT *
FROM t a1 JOIN t a2 USING (id)
WHERE a1.id > 0 OR a2.id < 1000;
QUERY PLAN
-------------------------------------
Seq Scan on t a2
Filter: ((id > 0) OR (id < 1000))
(2 rows)
The optimization can be controlled by the enable_self_join_elimination parameter, which is enabled by default:
\dconfig enable_self_join_elimination
List of configuration parameters
Parameter | Value
------------------------------+-------
enable_self_join_elimination | on
(1 row)
If the parameter is disabled, we will get a plan with two table scans:
QUERY PLAN
------------------------------------------------
Hash Join
Hash Cond: (a1.id = a2.id)
Join Filter: ((a1.id > 0) OR (a2.id < 1000))
-> Seq Scan on t a1
-> Hash
-> Seq Scan on t a2
(6 rows)
PL/pgSQL: passing named values of cursor parameters using =>
commit: 246dedc5d
In previous PostgreSQL versions, when passing cursor parameters by name, only the name := value syntax was allowed, although subprogram parameters accept two name/value separators: := and =>.
Now both separators are available for cursor parameters as well:
DO $$ DECLARE
cur CURSOR(x int) FOR
SELECT g.x
FROM generate_series(1,100) AS g(x)
WHERE g.x = cur.x;
BEGIN
OPEN cur (x => 42); CLOSE cur;
OPEN cur (x := 42); CLOSE cur;
END;
$$;
Support for the => separator will facilitate migration from Oracle PL/SQL, where this separator is used for passing parameters by name.
multiple_unique_conflicts: another logical replication conflict
commit: 73eba5004
The classification of logical replication conflicts appeared in PostgreSQL 18 and was described in the September CommitFest article. Conflicts are described in the documentation, the pg_stat_subscription_stats view has conflict counters, and messages about conflicts are written to the server log.
Now the list of conflicts has grown. The new conflict type is multiple_unique_conflicts. It is reported when applying a row turns out to violate not just one but several unique constraints at once. This information makes it easier to resolve the conflict in one go.
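Assuming the counter follows the existing naming pattern in pg_stat_subscription_stats (a column presumably named confl_multiple_unique_conflicts), the new conflict can be tracked with a query along these lines:
SELECT subname, confl_multiple_unique_conflicts
FROM pg_stat_subscription_stats;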
idle_replication_slot_timeout: canceling inactive slots by timeout
commit: ac0e33136
The new parameter idle_replication_slot_timeout invalidates replication slots that have been inactive for longer than the specified time. Once the timeout has passed, the next checkpoint invalidates the idle slots.
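A sketch: invalidate slots that have been idle for more than a day (assuming the parameter accepts the usual time units and that the default of 0 disables the mechanism):
ALTER SYSTEM SET idle_replication_slot_timeout = '1d';
SELECT pg_reload_conf();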
max_active_replication_origins: maximum number of replication origins
commit: 04ff636cb
The maximum number of replication origins used to be determined by the same parameter as the number of replication slots: max_replication_slots. However, replication slots are needed on the publisher, while replication origins are used by subscriptions on the subscriber.
The new parameter max_active_replication_origins controls the maximum number of replication origins, whereas max_replication_slots still limits the maximum number of replication slots.
\dconfig+ max_active_replication_origins
List of configuration parameters
Parameter | Value | Type | Context | Access privileges
--------------------------------+-------+---------+------------+-------------------
max_active_replication_origins | 10 | integer | postmaster |
(1 row)
Support for the OAuth 2.0 authorization protocol
commit: b3f0be788
Support for the OAuth authorization protocol version 2.0 has been added. To use it, set up the new oauth authentication method in pg_hba.conf.
This approach centralizes authorization and access management in one place, an external service, without the need to enter passwords when connecting.
Clients working via libpq can set the appropriate connection parameters for OAuth authorization. To verify tokens, a validator module must be plugged in (set in the oauth_validator_libraries parameter). Validator modules are not shipped with the PostgreSQL server, so you will need to develop one yourself against the interface described in the documentation, or use a third-party solution.
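As a rough, unverified sketch of what a pg_hba.conf entry might look like (the issuer and scope option names and all values here are assumptions; check the documentation):
# TYPE  DATABASE  USER  ADDRESS  METHOD
host    all       all   all      oauth issuer="https://idp.example.com" scope="openid postgres"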
dblink: using pass-through SCRAM authentication
commit: 3642df265
The previous article already discussed using the new use_scram_passthrough parameter to connect to external databases through postgres_fdw without storing a password in the USER MAPPING settings.
A similar capability has been added to the dblink extension. To set up pass-through SCRAM authentication in dblink_fdw, use the same use_scram_passthrough parameter. See the link above for configuration examples.
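A sketch of a dblink_fdw foreign server with pass-through SCRAM (the host and database names are placeholders):
CREATE SERVER remote_srv FOREIGN DATA WRAPPER dblink_fdw
    OPTIONS (host 'remote-host', dbname 'demo', use_scram_passthrough 'true');
CREATE USER MAPPING FOR CURRENT_USER SERVER remote_srv;  -- no password stored
SELECT * FROM dblink('remote_srv', 'SELECT 42') AS t(x int);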
pgcrypto: support for sha256crypt and sha512crypt password hashing algorithms
commit: 749a9e20c
The pgcrypto extension’s crypt and gen_salt functions now support the SHA-2-based algorithms sha256crypt and sha512crypt:
WITH cte AS (
SELECT crypt('Hello, World!', gen_salt('sha256crypt')) AS passwd
)
SELECT cte.passwd,
crypt('Hello, World!', cte.passwd) = cte.passwd AS match
FROM cte\gx
-[ RECORD 1 ]-----------------------------------------------------------------------
passwd | $5$rounds=5000$O57oLxAqYK9wqn9m$qgCp3qJAoeS7ufx9cmX5ZYqfk2x5cO3/FtRCpQYa6D/
match | t
pgcrypto: support for CFB mode for the AES encryption standard
commit: 9ad1b3d01
The encrypt and decrypt functions of the pgcrypto extension now support Cipher Feedback mode (CFB):
WITH cte AS (
SELECT encrypt('Hello, World!', '0123456789', 'aes-cfb') AS encrypted_str
)
SELECT encode(
decrypt(cte.encrypted_str, '0123456789', 'aes-cfb'),
'escape'
)
FROM cte;
encode
---------------
Hello, World!
(1 row)
ALTER DEFAULT PRIVILEGES and large objects
commit: 0d6c47766
Default privileges for large objects can now be configured using the ALTER DEFAULT PRIVILEGES command:
ALTER DEFAULT PRIVILEGES GRANT SELECT ON LARGE OBJECTS TO public;
SELECT lo_create(42);
lo_create
-----------
42
(1 row)
\lo_list+
Large objects
ID | Owner | Access privileges | Description
----+----------+----------------------+-------------
42 | postgres | =r/postgres +|
| | postgres=rw/postgres |
(1 row)
libpq: the sslkeylogfile parameter
commit: 2da74d8d6
The new libpq connection parameter sslkeylogfile sets the name of a file where the TLS connection secrets are recorded in the NSS key log format.
This parameter is intended for debugging the client’s interaction with the server.
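For example, a connection string that writes the session secrets to a file, which can then be fed to a traffic analyzer such as Wireshark (a sketch; the path is a placeholder):
$ psql "host=localhost dbname=demo sslmode=require sslkeylogfile=/tmp/tls-keys.log"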
Temporal primary, unique, and foreign keys
commit: fc0438b4e, 89f908a6d, 34768ee36
Working with temporal data is described in the SQL standard. In PostgreSQL 17, the first attempt was made to implement temporal integrity constraints. But shortly before the release, support for primary, unique, and foreign keys was reverted. This is the second attempt.
When created, a temporal primary key must include a time-range column with the keywords WITHOUT OVERLAPS:
CREATE EXTENSION btree_gist;
CREATE TABLE prices(
product_id int,
price numeric,
during daterange,
PRIMARY KEY (product_id, during WITHOUT OVERLAPS)
);
In the prices table, each row is identified by the combination of product_id and the date range in during. This ensures that the same product cannot have different prices at the same point in time.
INSERT INTO prices VALUES
(1, 42, '[2024-01-01, 2025-01-01)'),
(1, 43, '[2025-01-01, 2026-01-01)');
INSERT 0 2
INSERT INTO prices VALUES
(1, 44, '[2025-09-01, 2026-01-01)');
ERROR: conflicting key value violates exclusion constraint "prices_pkey"
DETAIL: Key (product_id, during)=(1, [2025-09-01,2026-01-01)) conflicts with existing key (product_id, during)=(1, [2025-01-01,2026-01-01)).
Such a constraint can also be implemented without temporal keys, using an EXCLUDE constraint; including an additional column in it requires the btree_gist extension. Temporal keys also require btree_gist, and internally an EXCLUDE constraint is used, as the error message shows.
A temporal primary or unique key can be referenced by a temporal foreign key, whose definition uses the PERIOD keyword. In the following example, a product may not be included in a commercial offer if the offer's date range does not fall within the date range in the prices table.
CREATE TABLE offers(
offer_id int,
product_id int,
during daterange,
PRIMARY KEY (offer_id, product_id),
FOREIGN KEY (product_id, PERIOD during)
REFERENCES prices (product_id, PERIOD during)
);
INSERT INTO offers VALUES
(1, 1, '[2025-09-01, 2026-03-01)');
ERROR: insert or update on table "offers" violates foreign key constraint "offers_product_id_during_fkey"
DETAIL: Key (product_id, during)=(1, [2025-09-01,2026-03-01)) is not present in table "prices".
So far, the full SQL standard functionality for temporal keys has not been implemented. The work continues. Perhaps we will see new capabilities in future versions.
Virtual computed columns
Computed columns were introduced back in PostgreSQL 12. The following construct is used to define them:
GENERATED ALWAYS AS (expr) STORED
The STORED keyword indicates that when a row is inserted or updated, the column value is computed and written to disk.
The SQL standard also provides another kind of computed column: VIRTUAL, where the value is not stored on disk but is calculated when the column is read. PostgreSQL 18 now supports it.
Let’s create a virtual column that extracts the timestamp from the primary key value:
CREATE TABLE t (
id uuid PRIMARY KEY,
creation_date timestamptz
GENERATED ALWAYS AS (uuid_extract_timestamp(id)) VIRTUAL
);
In many cases this can be considered the record creation time:
INSERT INTO t
SELECT uuidv7()
FROM generate_series(1,3);
SELECT * FROM t;
id | creation_date
--------------------------------------+----------------------------
01990953-f898-7c70-926e-2812a5972378 | 2025-09-02 10:28:42.136+03
01990953-f899-7177-9035-ceeff8f87ea5 | 2025-09-02 10:28:42.137+03
01990953-f899-71f6-8bde-52e6b1c7df39 | 2025-09-02 10:28:42.137+03
(3 rows)
The creation_date column takes up no space on disk, but it can be useful for data analysis.
Virtual computed columns also have disadvantages. For example, they cannot be indexed.
NOT NULL integrity constraints: creation without validation, enabling and disabling inheritance
NOT NULL integrity constraints are now recorded in pg_constraint, so they can be referred to by name. What does this give us?
The constraint can be created without checking existing rows (first commit):
CREATE TABLE t (id int);
INSERT INTO t VALUES (null);
ALTER TABLE t ADD CONSTRAINT id_nn NOT NULL id NOT VALID;
\d+ t
Table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
id | integer | | not null | | plain | | |
Not-null constraints:
"id_nn" NOT NULL "id" NOT VALID
Access method: heap
And later perform the check, correcting the data if necessary:
ALTER TABLE t VALIDATE CONSTRAINT id_nn;
ERROR: column "id" of relation "t" contains null values
UPDATE t SET id = 1;
UPDATE 1
ALTER TABLE t VALIDATE CONSTRAINT id_nn;
ALTER TABLE
The INHERIT property can be set and removed (second commit). When set, the constraint is added to all child tables, if any.
ALTER TABLE t ALTER CONSTRAINT id_nn INHERIT;
After inheritance is disabled, the corresponding constraints on the child tables will remain, but as independent constraints not linked to the parent table.
ALTER TABLE t ALTER CONSTRAINT id_nn NO INHERIT;
You can also drop a NOT NULL constraint by name:
ALTER TABLE t DROP CONSTRAINT id_nn;
Integrity constraints: NOT ENFORCED
commit: ca87c415e, eec0040c4, b663b9436
According to the SQL standard, an integrity constraint can be declared as NOT ENFORCED. PostgreSQL 18 adds this capability for foreign keys and CHECK constraints.
CREATE TABLE t (
id int PRIMARY KEY,
parent_id int REFERENCES t NOT ENFORCED
);
The internal triggers implementing the foreign key are not created, and, unlike with NOT VALID, no checks are performed even for new rows.
You can add a new disabled constraint:
ALTER TABLE t ADD CONSTRAINT id_check CHECK (id > 0) NOT ENFORCED;
\d t
Table "public.t"
Column | Type | Collation | Nullable | Default
-----------+---------+-----------+----------+---------
id | integer | | not null |
parent_id | integer | | |
Indexes:
"t_pkey" PRIMARY KEY, btree (id)
Check constraints:
"id_check" CHECK (id > 0) NOT ENFORCED
Foreign-key constraints:
"t_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES t(id) NOT ENFORCED
Referenced by:
TABLE "t" CONSTRAINT "t_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES t(id) NOT ENFORCED
You can enable a foreign key constraint; this will create all necessary internal triggers, check all rows in the table, and the constraint will take effect:
ALTER TABLE t ALTER CONSTRAINT t_parent_id_fkey ENFORCED;
However, this does not work with CHECK constraints:
ALTER TABLE t ALTER CONSTRAINT id_check ENFORCED;
ERROR: cannot alter enforceability of constraint "id_check" of relation "t"
To enable it, you will need to drop it and recreate it.
During the implementation, an issue arose with partitioned tables: constraints created as NOT ENFORCED are also marked as NOT VALID, and foreign keys with NOT VALID were not supported on partitioned tables. This has been fixed (third commit).
file_copy_method: method for copying files when creating a database
commit: f78ca6f3e
In PostgreSQL 15, it became possible to choose the strategy for creating a new database. Two strategies are available:
- WAL_LOG records the changes in the WAL and is well suited for creating databases from small templates such as template1. This is the default strategy.
- FILE_COPY copies the files directly and is better suited for cloning large databases.
In PostgreSQL 18, the new file_copy_method parameter lets you specify how files are copied under the FILE_COPY strategy.
\dconfig+ file_copy_method
List of configuration parameters
Parameter | Value | Type | Context | Access privileges
------------------+-------+------+---------+-------------------
file_copy_method | copy | enum | user |
(1 row)
The default value is COPY, which corresponds to the old behavior. The new CLONE value uses the copy_file_range system call on Linux and FreeBSD, or copyfile on macOS; support for other operating systems may be added in the future. The CLONE value will be useful on file systems that support copy-on-write (COW).
The file_copy_method parameter is used not only when creating a database with the CREATE DATABASE command:
CREATE DATABASE ... STRATEGY=FILE_COPY
but also when moving the database to another tablespace:
ALTER DATABASE ... SET TABLESPACE ...
extension_control_path: location of extension control files
commit: 4f7f7b037
The new extension_control_path parameter specifies the directory where to look for extension control files. This may be useful if you need to place extensions outside the PostgreSQL installation, for example when testing extensions.
\dconfig+ extension_control_path
List of configuration parameters
Parameter | Value | Type | Context | Access privileges
------------------------+---------+--------+-----------+-------------------
extension_control_path | $system | string | superuser |
(1 row)
The default value $system points to the usual location in the SHAREDIR directory:
SELECT setting || '/extension' FROM pg_config() WHERE name = 'SHAREDIR';
?column?
--------------------------------------------
/home/pal/master/share/postgresql/extension
(1 row)
Nondeterministic collations: support for substring search functions
commit: 329304c90
In PostgreSQL 18, strings with nondeterministic collations can be used not only with LIKE, but also with the substring search functions position, strpos, replace, split_part, string_to_array, and string_to_table.
CREATE COLLATION ignore_case
(provider = icu, locale = 'und-u-ks-level2', deterministic = false);
SELECT strpos('PostgreSQL' COLLATE "ignore_case", 'sql');
strpos
--------
8
(1 row)
The gamma and lgamma functions
commit: a3b6dfd41
Functions gamma and lgamma are now available in SQL.
SELECT g.x, gamma(g.x), lgamma(g.x),
factorial(g.x-1), ln(gamma(g.x))
FROM generate_series(1,5) AS g(x);
x | gamma | lgamma | factorial | ln
---+-------+--------------------+-----------+--------------------
1 | 1 | 0 | 1 | 0
2 | 1 | 0 | 1 | 0
3 | 2 | 0.6931471805599453 | 2 | 0.6931471805599453
4 | 6 | 1.791759469228055 | 6 | 1.791759469228055
5 | 24 | 3.1780538303479458 | 24 | 3.1780538303479458
(5 rows)
Conversion of integer types to bytea and back
commit: 6da469bad
Integer types (smallint, int, and bigint) can now be converted to bytea. The reverse conversion is also supported:
SELECT 42::bytea::int;
int4
------
42
(1 row)
The pg_get_loaded_modules function: information about libraries loaded into shared memory
Some extensions do not have an SQL interface; their functionality lives in a library loaded into shared memory. One example is auto_explain.
For such extensions, it is difficult to tell which version is currently in use. The new pg_get_loaded_modules function answers this question by showing the loaded libraries and their versions:
LOAD 'auto_explain';
SELECT * FROM pg_get_loaded_modules();
module_name | version | file_name
--------------+---------+-----------------
auto_explain | 18beta1 | auto_explain.so
(1 row)
pg_buffercache: evict tables or the entire cache
commit: dcf7e1697
In PostgreSQL 17, the pg_buffercache extension got the pg_buffercache_evict function, which evicts a single buffer from the shared buffer cache. To evict all buffers of a table or index, or to clear the cache entirely, you had to call the function repeatedly, which was not so much laborious as slow. Yet for tests and experiments such functionality is very useful.
To speed up mass eviction, the pg_buffercache extension adds the functions pg_buffercache_evict_relation and pg_buffercache_evict_all, which evict all unpinned buffers of the specified relation or of the entire cache, respectively.
Let’s run a query on the tickets table and see how many buffers it occupies in the cache:
CREATE EXTENSION pg_buffercache;
SELECT count(*) FROM tickets;
count
---------
2949857
(1 row)
SELECT count(*)
FROM pg_buffercache
WHERE relfilenode = (
SELECT relfilenode
FROM pg_class
WHERE oid = 'tickets'::regclass
);
count
-------
282
(1 row)
Let’s evict all buffers of the table from the cache:
SELECT * FROM pg_buffercache_evict_relation('tickets'::regclass);
buffers_evicted | buffers_flushed | buffers_skipped
-----------------+-----------------+-----------------
282 | 0 | 0
(1 row)
SELECT count(*)
FROM pg_buffercache
WHERE relfilenode = (
SELECT relfilenode
FROM pg_class
WHERE oid = 'tickets'::regclass
);
count
-------
0
(1 row)
The new functions will very likely find their way into the buffer cache demonstrations in the next version of our DBA2 course.
amcheck: GIN index verification
commit: 14ffaece0
The gin_index_check function has been added to the amcheck extension for verifying GIN indexes.
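A quick sketch, assuming the function takes the index as a regclass argument (the index name is a placeholder):
CREATE EXTENSION amcheck;
SELECT gin_index_check('tickets_gin_idx'::regclass);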
CREATE FOREIGN TABLE LIKE
commit: 302cf1575
You can now define the structure of a foreign table using the LIKE clause; the structure is copied from a local table. This is convenient for foreign data wrappers that do not support the IMPORT FOREIGN SCHEMA command.
For example, to load booking data into the bookings table via file_fdw, we can create an intermediate foreign table by running:
CREATE EXTENSION file_fdw;
CREATE SERVER srv FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE ft_bookings
(LIKE bookings INCLUDING all EXCLUDING comments)
SERVER srv OPTIONS (program 'some_program');
\d ft_bookings
Foreign table "bookings.ft_bookings"
Column | Type | Collation | Nullable | Default | FDW options
--------------+--------------------------+-----------+----------+---------+-------------
book_ref | character(6) | | not null | |
book_date | timestamp with time zone | | not null | |
total_amount | numeric(10,2) | | not null | |
Server: srv
FDW options: (program 'some_program')
Among other things, INCLUDING supports comments, constraints, defaults, generated, statistics, and all. The list is shorter than for regular tables, because not all properties make sense for foreign tables.
COPY ... TO: materialized view support
commit: 534874fac
To export data from a materialized view, the COPY command previously had to use a subquery, something like this:
COPY (SELECT * FROM mat_view) TO ...
Now COPY TO can work directly with materialized views:
COPY mat_view TO ...
The json_strip_nulls function removes empty values in arrays
commit: 4603903d2
The json[b]_strip_nulls function removes all fields with NULL values from a JSON document, but NULL elements inside arrays remain. Here, the field f2 is removed, while the NULL element in the f3 array stays:
SELECT json_strip_nulls('{"f1":1, "f2":null, "f3":[1,2,null]}');
json_strip_nulls
--------------------------
{"f1":1,"f3":[1,2,null]}
(1 row)
The new strip_in_arrays parameter allows you to remove NULL elements from arrays as well:
SELECT json_strip_nulls('{"f1":1, "f2":null, "f3":[1,2,null]}',
strip_in_arrays=>true
);
json_strip_nulls
---------------------
{"f1":1,"f3":[1,2]}
(1 row)
New array_sort function
As the name implies, the function sorts arrays of any sortable type. Additional parameters allow sorting in reverse order or putting NULL values at the beginning of the array.
SELECT array_sort(ARRAY[42.2,null,21.1,5,10],
descending=>true,
nulls_first=>true
);
array_sort
-----------------------
{NULL,42.2,21.1,10,5}
(1 row)
And the second commit optimizes array sorting.
That’s all for the last PostgreSQL 18 CommitFest. Time to start exploring the new features of PostgreSQL 19.