
PostgreSQL 18: part 4 or CommitFest 2025-01

We continue to follow the news about PostgreSQL 18. The January CommitFest brings in some notable improvements to monitoring, as well as other new features.

You can find previous reviews of PostgreSQL 18 CommitFests here: 2024-07, 2024-09, 2024-11.

EXPLAIN (analyze): buffers on by default

commit: c2a4078eb

After a lengthy discussion, the decision to display buffer information in EXPLAIN ANALYZE by default has prevailed.

Time to get used to the more verbose output:

EXPLAIN (analyze, costs off)
SELECT * FROM bookings;
                               QUERY PLAN                               
------------------------------------------------------------------------
 Seq Scan on bookings (actual time=0.086..119.142 rows=2111110 loops=1)
   Buffers: shared hit=160 read=13287
 Planning:
   Buffers: shared hit=57
 Planning Time: 0.543 ms
 Execution Time: 180.270 ms
(6 rows)

However, the commit message acknowledges that the decision was not unanimous, and even allows for a revert before the official release if it causes a major uproar.
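
If you prefer the previous compact output, buffer statistics can still be switched off explicitly:

EXPLAIN (analyze, buffers off, costs off)
SELECT * FROM bookings;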

See also:

Waiting for PostgreSQL 18 – Enable BUFFERS with EXPLAIN ANALYZE by default (Hubert ʻdepeszʼ Lubaczewski)

pg_stat_io: input/output statistics in bytes instead of pages

commit: f92c854cf

The pg_stat_io view gets three new columns: read_bytes, write_bytes, and extend_bytes. They track the volumes of read, write, and extend operations in bytes. The operation counts still reside under reads, writes, extends.

Previously, to get the volumes, you had to take the page size from op_bytes and multiply it by the reads, writes, or extends count. Since there are separate columns for the volumes, the op_bytes column is removed.
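
For comparison, here is how you would get read volumes before and how you get them now:

-- PostgreSQL 17: derive the volume from the operation count and the page size
SELECT backend_type, context, reads * op_bytes AS read_bytes
FROM pg_stat_io
WHERE object = 'relation' AND reads > 0;

-- PostgreSQL 18: the volume is tracked directly
SELECT backend_type, context, read_bytes
FROM pg_stat_io
WHERE object = 'relation' AND reads > 0;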

This patch makes it possible to track I/O operations that bypass the buffer cache in the pg_stat_io view: for example, WAL operations, where the volume of an operation is not tied to the page size.

Speaking of which, even buffer cache operations are not all one-page-sized. The io_combine_limit parameter (introduced in PostgreSQL 17) allows combining reads into larger chunks for certain operations: sequential scans, ANALYZE, and a few others.

Letʼs test io_combine_limit with sequential scan. Run a tiny psql script:

$ cat pg18/pg_stat_io.sql
\o /dev/null
SELECT pg_stat_reset_shared('io');
SET io_combine_limit = :'io_combine_limit';

CREATE TABLE bookings_copy
AS SELECT * FROM bookings;
DROP TABLE bookings_copy;
\o
SELECT context,
       reads, read_bytes,
       writes, write_bytes,
       extends, extend_bytes
FROM pg_stat_io
WHERE backend_type = 'client backend' AND
      object = 'relation' AND
      (reads > 0 OR writes > 0);

The script resets the statistics, sets io_combine_limit, then executes CREATE TABLE AS SELECT, which scans the bookings table sequentially and writes out its copy. The last query returns the I/O statistics.

Run the script with the default io_combine_limit:

$ psql -f pg18/pg_stat_io.sql -v io_combine_limit='128kB'
  context  | reads | read_bytes | writes | write_bytes | extends | extend_bytes
-----------+-------+------------+--------+-------------+---------+--------------
 bulkread  |   789 |   99753984 |      0 |           0 |         |             
 bulkwrite |     0 |          0 |  11456 |    93847552 |     217 |    110624768
(2 rows)

When the io_combine_limit is doubled, the number of read operations is just about halved:

$ psql -f pg18/pg_stat_io.sql -v io_combine_limit='256kB'
  context  | reads | read_bytes | writes | write_bytes | extends | extend_bytes
-----------+-------+------------+--------+-------------+---------+--------------
 bulkread  |   396 |   99491840 |      0 |           0 |         |             
 bulkwrite |     0 |          0 |  11456 |    93847552 |     217 |    110624768
(2 rows)

pg_stat_io: WAL statistics

commit: a051e71e2

Thanks to the patch above, I/O statistics are no longer tied to the page size, making it possible to start tracking WAL statistics in pg_stat_io.

WAL operations are tracked across various processes, and within two contexts:

  • normal – operations with existing WAL segments,
  • init – operations that include initializing a new WAL segment.

For example:

SELECT backend_type, object, context,
       writes, write_bytes, pg_size_pretty(round(write_bytes/writes)) bytes_per_op
FROM pg_stat_io
WHERE object = 'wal' AND writes > 0;
   backend_type    | object | context | writes | write_bytes | bytes_per_op
-------------------+--------+---------+--------+-------------+--------------
 client backend    | wal    | init    |      2 |    33554432 | 16 MB
 client backend    | wal    | normal  | 207883 |  2198847488 | 10 kB
 autovacuum worker | wal    | init    |      4 |    67108864 | 16 MB
 autovacuum worker | wal    | normal  |  25595 |  1446313984 | 55 kB
 background worker | wal    | normal  |  19012 |   155746304 | 8192 bytes
 background writer | wal    | normal  |      1 |        8192 | 8192 bytes
 checkpointer      | wal    | init    |      1 |    16777216 | 16 MB
 checkpointer      | wal    | normal  |     12 |     8495104 | 691 kB
 walwriter         | wal    | normal  |    314 |   452460544 | 1407 kB
(9 rows)

pg_stat_get_backend_io: I/O statistics for a specific process

commit: 9aea73fc6

The new function pg_stat_get_backend_io returns I/O statistics for a client backend. It takes the process ID and returns a set of rows in the pg_stat_io format.

For example:

SELECT object, context,
       reads, read_bytes,
       writes, write_bytes
FROM pg_stat_get_backend_io(pg_backend_pid())
WHERE reads > 0 OR writes > 0;
  object  | context | reads | read_bytes | writes | write_bytes
----------+---------+-------+------------+--------+-------------
 relation | normal  |   543 |    4489216 |      0 |           0
 relation | vacuum  | 76666 | 1122197504 |  34301 |   280993792
 wal      | normal  |     0 |          0 |    256 |   283475968
(3 rows)

The function only works with client backends. Statistics accumulate for as long as the process runs and are discarded when it terminates. You can also reset them manually by passing the process ID to pg_stat_reset_backend_stats.

Combining pg_stat_get_backend_io with pg_stat_activity opens up powerful insights: not just which process has been running the longest, but which of the currently running processes touches the disk the most.
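
For example, a query along these lines (an illustration, not part of the patch) ranks active client backends by the total volume of I/O they have performed:

SELECT a.pid, a.state,
       sum(coalesce(io.read_bytes,0) + coalesce(io.write_bytes,0)) AS total_bytes
FROM pg_stat_activity a,
     LATERAL pg_stat_get_backend_io(a.pid) AS io
WHERE a.backend_type = 'client backend'
GROUP BY a.pid, a.state
ORDER BY total_bytes DESC;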

See also:

Postgres backend statistics (Part 1): I/O statistics (Bertrand Drouvot)

VACUUM (verbose): visibility map information

commit: dc6acfd91, 4b565a198

A newly created table does not have a visibility map yet, so both all-visible and all-frozen counters are at 0.

CREATE EXTENSION pg_visibility;

CREATE TABLE bookings_copy WITH (autovacuum_enabled = off)
AS SELECT * FROM bookings;

SELECT * FROM pg_visibility_map_summary('bookings_copy'::regclass);
 all_visible | all_frozen
-------------+------------
           0 |          0
(1 row)

The VACUUM (verbose) output now includes a new line, visibility map. It shows how many pages were set all-visible and all-frozen during the vacuum run; the number in parentheses (0 here, since the table was just created) counts the pages that were already all-visible before.

VACUUM(freeze, verbose) bookings_copy;
INFO:  aggressively vacuuming "demo.bookings.bookings_copy"
INFO:  finished vacuuming "demo.bookings.bookings_copy": index scans: 0
pages: 0 removed, 13504 remain, 13504 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 2111110 remain, 0 are dead but not yet removable
removable cutoff: 964, which was 0 XIDs old when operation ended
new relfrozenxid: 964, which is 1 XIDs ahead of previous value
frozen: 13447 pages from table (99.58% of total) had 2111110 tuples frozen
visibility map: 13447 pages set all-visible, 13447 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 269.712 MB/s, avg write rate: 269.806 MB/s
buffer usage: 15623 hits, 11456 reads, 11460 dirtied
WAL usage: 26898 records, 4 full page images, 5963212 bytes
system usage: CPU: user: 0.19 s, system: 0.03 s, elapsed: 0.33 s
VACUUM

Now all pages are frozen and visible to all transactions.

SELECT * FROM pg_visibility_map_summary('bookings_copy'::regclass);
 all_visible | all_frozen
-------------+------------
       13447 |      13447
(1 row)

Total vacuum and analysis time per table

commit: 30a6ed0ce

The pg_stat_all_tables view gets four new columns prefixed with total_: total_vacuum_time, total_autovacuum_time, total_analyze_time, and total_autoanalyze_time. They accumulate the time spent on vacuum, autovacuum, analyze, and autoanalyze for each table.

Operation counts are already tracked per table, so together these metrics yield the average execution time of each operation. This query returns the table that took the most total time to vacuum and analyze:

SELECT relname,
       vacuum_count,
       total_vacuum_time,
       total_vacuum_time/nullif(vacuum_count,0) avg_vacuum_time,
       autovacuum_count,
       total_autovacuum_time,
       total_autovacuum_time/nullif(autovacuum_count,0) avg_autovacuum_time,
       analyze_count,
       total_analyze_time,
       total_analyze_time/nullif(analyze_count,0) avg_analyze_time,
       autoanalyze_count,
       total_autoanalyze_time,
       total_autoanalyze_time/nullif(autoanalyze_count,0) avg_autoanalyze_time
FROM pg_stat_all_tables
ORDER BY total_autovacuum_time + total_vacuum_time +
         total_analyze_time + total_autoanalyze_time DESC
LIMIT 1
\gx
-[ RECORD 1 ]----------+------------------
relname                | ticket_flights
vacuum_count           | 3
total_vacuum_time      | 2992
avg_vacuum_time        | 997.3333333333334
autovacuum_count       | 0
total_autovacuum_time  | 0
avg_autovacuum_time    |
analyze_count          | 2
total_analyze_time     | 594
avg_analyze_time       | 297
autoanalyze_count      | 0
total_autoanalyze_time | 0
avg_autoanalyze_time   |

The times are in milliseconds. Clearly, vacuuming and analysis in this test database do not take long.

See also:

PostgreSQL 18: Per-relation cumulative statistics for [auto]vacuum and [auto]analyze (Daniel Westermann)

autovacuum: change the number of workers without restarting the server

commit: c758119e5

The number of autovacuum workers can now be changed without restarting the server: set a new value for autovacuum_max_workers and reload the configuration, as long as the new value does not exceed the number of slots allocated by the new autovacuum_worker_slots parameter. Changing the number of slots still requires a restart.

\dconfig+ autovacuum*worker*
                      List of configuration parameters
        Parameter        | Value |  Type   |  Context   | Access privileges
-------------------------+-------+---------+------------+-------------------
 autovacuum_max_workers  | 3     | integer | sighup     |
 autovacuum_worker_slots | 16    | integer | postmaster |
(2 rows)
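
For example, raising the number of workers takes effect on reload, since 8 fits within the 16 allocated slots shown above:

ALTER SYSTEM SET autovacuum_max_workers = 8;
SELECT pg_reload_conf();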

See also:

Waiting for PostgreSQL 18 – Allow changing autovacuum_max_workers without restarting (Hubert ʻdepeszʼ Lubaczewski)

psql: connection service information

commit: 477728b5d

The connection service file pg_service.conf stores named sets of connection parameters, so that you do not have to enter them manually every time.

$ grep -A 6 '^\[replica\]' ~/.pg_service.conf
[replica]
# Primary server replica
host=localhost
port=5433
user=postgres
dbname=demo
options=-c search_path=bookings

Any application using libpq can connect by the service name replica. In psql, the service name is now stored in the variable SERVICE:

$ psql service=replica -c '\echo :SERVICE'
replica

You can also put the service name into the psql prompt with the %s sequence.
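
For example, add the service name to the beginning of the prompt:

\set PROMPT1 '%s %n@%/%R%# '

For the connection above, the prompt would then look like replica postgres@demo=#.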

See also:

Waiting for PostgreSQL 18 – psql: Add more information about service name (Hubert ʻdepeszʼ Lubaczewski)

psql: expanded display for \d* commands

commit: 00f4c2959

Previously, to get expanded output from a \d* command, you had to switch expanded mode on before running the command and switch it off afterwards. This was inconvenient to type every time, and the mode change notices were just noise.

\x \du+ \x
Expanded display is on.
List of roles
-[ RECORD 1 ]-----------------------------------------------------------
Role name   | postgres
Attributes  | Superuser, Create role, Create DB, Replication, Bypass RLS
Description |

Expanded display is off.

Now, commands that return object lists support the x flag that enables the expanded display mode for the current command only:

\dux+
List of roles
-[ RECORD 1 ]-----------------------------------------------------------
Role name   | postgres
Attributes  | Superuser, Create role, Create DB, Replication, Bypass RLS
Description |

The behavior is similar to \gx, the expanded-output counterpart of \g for queries. The x and + flags can come in any order.

\dt+x tickets
List of tables
-[ RECORD 1 ]-+----------
Schema        | bookings
Name          | tickets
Type          | table
Owner         | postgres
Persistence   | permanent
Access method | heap
Size          | 386 MB
Description   | Tickets

psql: leakproof flag in \df* output

commit: 2355e5111

The commands \df+, \do+, \dAo+, and \dC+ now show the leakproof attribute in their output. Combined with the new x flag, checking a function takes one short command:

\df+x bookings.now
List of functions
-[ RECORD 1 ]-------+--------------------------------------------------------
Schema              | bookings
Name                | now
Result data type    | timestamp with time zone
Argument data types |
Type                | func
Volatility          | immutable
Parallel            | unsafe
Owner               | postgres
Security            | invoker
Leakproof?          | no
Access privileges   |
Language            | sql
Internal name       |
Description         | Point in time according to which the data are generated

jsonb: null conversion to other types

commit: a5579a90a

A JSON null cast to the SQL text type becomes the string 'null'. However, casting it to another scalar type, such as int or float, used to throw an error:

17=# WITH t AS (
    SELECT '{"a": null}'::jsonb col
)
SELECT t.col['a']::int  null_as_int,
       t.col['a']::text null_as_text
FROM t;
ERROR:  cannot cast jsonb null to type integer

This patch fixes that, converting JSON nulls to SQL NULLs properly. Note that the cast to text still returns the string 'null', as before:

18=# WITH t AS (
    SELECT '{"a": null}'::jsonb col
)
SELECT t.col['a']::int  null_as_int,
       t.col['a']::text null_as_text
FROM t;
 null_as_int | null_as_text
-------------+--------------
             | null
(1 row)

MD5 encryption algorithm: end-of-support preparations

commit: db6a4a985

The security issues of MD5-encrypted passwords are notorious, so the algorithm will no longer be supported starting with PostgreSQL 19.

In PostgreSQL 18, MD5 still works, but it has been declared obsolete in the documentation, and attempting to create an MD5-encrypted password shows a warning:

SET password_encryption = 'md5';
CREATE ROLE alice LOGIN PASSWORD 'secret';
WARNING:  setting an MD5-encrypted password
DETAIL:  MD5 password support is deprecated and will be removed in a future release of PostgreSQL.
HINT:  Refer to the PostgreSQL documentation for details about migrating to another password type.
CREATE ROLE

You can disable the warning with the parameter md5_password_warnings, but that should not delay your transition away from MD5.
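
For example, the warning goes away for the session while you prepare the migration:

SET md5_password_warnings = off;
ALTER ROLE alice PASSWORD 'secret';
ALTER ROLE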

New function uuidv7

commit: 78c5e141e

The existing function gen_random_uuid returns a version 4 UUID. The values are generated randomly, so sorting them says nothing about the order in which they were created:

SELECT g.x, gen_random_uuid()
from generate_series(1,5) AS g(x)
ORDER BY 2;
 x |           gen_random_uuid            
---+--------------------------------------
 4 | 00743ce3-1e99-4cf6-8d74-6c057018b7d6
 3 | 26d9a9bc-f7a0-4c91-8afe-945960f2f59b
 5 | 384869c0-8d7d-4201-945c-ee339b62e6b5
 1 | c970dce9-d883-440a-8f41-0d47cc818328
 2 | dd5cc4ed-adcc-4dbe-8926-d2222b826316
(5 rows)

The UUIDv7 specification says that the values must begin with a Unix-epoch timestamp in milliseconds, so generated values increase over time. The new function uuidv7 guarantees this monotonic growth within a single client backend:

SELECT g.x, uuidv7()
from generate_series(1,5) AS g(x)
ORDER BY 2;
 x |                uuidv7                
---+--------------------------------------
 1 | 01953813-d813-7c2f-81b0-9f74e90928b8
 2 | 01953813-d813-7cb9-b5bb-1df8003f07b6
 3 | 01953813-d813-7ce1-b13b-4f87ceb29651
 4 | 01953813-d813-7d11-a4e6-cc1f343741b8
 5 | 01953813-d813-7d37-94b0-bb2977fb3f4f
(5 rows)

In addition, the function uuid_extract_timestamp returns the date and time when a UUID was created, and uuidv7 itself accepts an interval to shift the embedded timestamp in either direction:

SELECT g.x,
       uuid_extract_timestamp(
         uuidv7(format('%s day',g.x)::interval)
       )
from generate_series(-2,2) AS g(x)
ORDER BY 2;
 x  |   uuid_extract_timestamp   
----+----------------------------
 -2 | 2025-02-22 16:22:06.566+03
 -1 | 2025-02-23 16:22:06.566+03
  0 | 2025-02-24 16:22:06.566+03
  1 | 2025-02-25 16:22:06.566+03
  2 | 2025-02-26 16:22:06.566+03
(5 rows)
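
You can also confirm the version of a generated value with uuid_extract_version (available since PostgreSQL 17):

SELECT uuid_extract_version(uuidv7());
 uuid_extract_version 
----------------------
                    7
(1 row)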

See also:

Waiting for PostgreSQL 18 – Add UUID version 7 generation function (Hubert ʻdepeszʼ Lubaczewski)

postgres_fdw: SCRAM authentication without storing the password

commit: 761c79508

For password authentication against a remote server to work, user passwords had to be stored explicitly in the user mapping configuration (CREATE USER MAPPING). To avoid storing passwords in the open, postgres_fdw gets the new option use_scram_passthrough (set on the foreign server in the example below).

Create a role alice on a remote server with a password encrypted with the scram-sha-256 algorithm. In pg_hba.conf, the only authentication method allowed for alice is SCRAM.

remote=# SELECT * FROM pg_hba_file_rules LIMIT 1 \gx
-[ RECORD 1 ]------------------------------------------------
rule_number | 1
file_name   | /home/pal/master/pg18/remote_server/pg_hba.conf
line_number | 1
type        | host
database    | {all}
user_name   | {alice}
address     | 127.0.0.1
netmask     | 255.255.255.255
auth_method | scram-sha-256
options     |
error       |

This is what postgres_fdw configuration on the local server will look like:

local=# CREATE EXTENSION postgres_fdw;

Enable the new parameter use_scram_passthrough:

local=# CREATE SERVER remote_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (
    host 'localhost',
    port '5400',
    dbname 'postgres',
    use_scram_passthrough 'true'
);

alice will be accessing the remote server from the local one. We will set her local password in a bit. Note that the user mapping does not include aliceʼs remote password:

local=# CREATE ROLE alice LOGIN;
local=# CREATE USER MAPPING FOR alice
  SERVER remote_server
  OPTIONS (
     user 'alice'
);
local=# GRANT USAGE ON FOREIGN SERVER remote_server TO alice;

Now, for the local alice, set the same password as the remote one. You need to copy the entire encrypted password string from the remote serverʼs pg_authid:

remote=# SELECT rolpassword FROM pg_authid WHERE rolname = 'alice';

... and put it in as the local aliceʼs password:

local=# ALTER ROLE alice PASSWORD '...';

Now, the local alice can create and access remote tables without entering a password.
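
For example, connected as the local alice, importing and querying a remote table requires no password prompt (t here is a hypothetical table existing on the remote server):

local=> IMPORT FOREIGN SCHEMA public LIMIT TO (t)
        FROM SERVER remote_server INTO public;
local=> SELECT count(*) FROM t;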

passwordcheck: minimum password length

commit: f7e1b3828

The new parameter for the passwordcheck extension sets the minimum accepted password length.

SET passwordcheck.min_password_length = 8;
ALTER ROLE postgres PASSWORD '123';
ERROR:  password is too short
DETAIL:  password must be at least "passwordcheck.min_password_length" (8) bytes long

Transmitting passwords in plain text is unsafe, so most client applications can encrypt passwords before sending them to the server: the \password command in psql, for example. However, passwordcheck.min_password_length cannot verify the length of such a pre-encrypted password.

New function casefold and pg_unicode_fast collation

commit: bfc599206, d3d098316, 286a365b9

casefold helps with case-insensitive searches in situations where simply converting strings to upper or lower case does not suffice: depending on the collation, a single character can have several lowercase variants.

Here is an example for the Eszett:

SELECT c, lower(c), casefold(c), upper(c)
FROM (VALUES('ß' collate "pg_unicode_fast")) AS t(c);
 c | lower | casefold | upper
---+-------+----------+-------
 ß | ß     | ss       | SS
(1 row)

The new collation pg_unicode_fast is used here. It matches the SQL standard ucs_basic collation more closely.
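
This matters for comparisons. In the example below, lower() leaves the two spellings distinct, while casefold() maps both to the same string:

SELECT lower('STRASSE') = lower('straße') AS lower_match,
       casefold('STRASSE' collate "pg_unicode_fast") =
       casefold('straße' collate "pg_unicode_fast") AS casefold_match;
 lower_match | casefold_match
-------------+----------------
 f           | t
(1 row)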

DML commands: RETURNING with OLD and NEW

commit: 80feb727c

In the INSERT, UPDATE, DELETE, and MERGE commands, the RETURNING clause can now return both old values (as they were before the command) and new ones, using OLD and NEW, respectively (similar to the old and new rows in row-level triggers).

CREATE TABLE t(
    id integer PRIMARY KEY,
    s text
);

Since the old values in a plain INSERT are always NULL, returning them serves no purpose.

INSERT INTO t
VALUES (1,'aaa'), (2, 'bbb')
RETURNING old.*, new.*;
 id | s | id |  s  
----+---+----+-----
    |   |  1 | aaa
    |   |  2 | bbb
(2 rows)

For DELETE, there are no NEW values:

DELETE FROM t
WHERE id > 1
RETURNING old.*, new.*;
 id |  s  | id | s
----+-----+----+---
  2 | bbb |    |
(1 row)

But with UPDATE, being able to access both states may come in handy for change tracking:

UPDATE t SET s = 'ccc'
WHERE id = 1
RETURNING old.*, new.*,
          format('%s->%s', old.s, new.s) AS change;
 id |  s  | id |  s  |  change  
----+-----+----+-----+----------
  1 | aaa |  1 | ccc | aaa->ccc
(1 row)

Same goes for INSERT with ON CONFLICT:

INSERT INTO t
VALUES (1,'ddd'), (3,'eee')
ON CONFLICT(id) DO UPDATE SET s = EXCLUDED.s
RETURNING to_json(old.*) AS old,
          to_json(new.*) AS new;
        old         |        new         
--------------------+--------------------
 {"id":1,"s":"ccc"} | {"id":1,"s":"ddd"}
                    | {"id":3,"s":"eee"}
(2 rows)
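
MERGE works the same way; together with the merge_action() function (available since PostgreSQL 17), it is easy to see what happened to each row. A sketch continuing the example above:

MERGE INTO t USING (VALUES (3,'fff'), (4,'ggg')) AS v(id,s)
ON t.id = v.id
WHEN MATCHED THEN UPDATE SET s = v.s
WHEN NOT MATCHED THEN INSERT VALUES (v.id, v.s)
RETURNING merge_action(), old.*, new.*;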

See also:

Waiting for PostgreSQL 18 – Add OLD/NEW support to RETURNING in DML queries (Hubert ʻdepeszʼ Lubaczewski)

to_number: convert a string of roman numerals to numeric

commit: 172e6b3ad

Converting an integer to roman numerals has long been possible with to_char. Now to_number supports the reverse conversion:

SELECT to_char(42, 'RN'),
       to_number('XLII', 'RN');
     to_char     | to_number
-----------------+-----------
            XLII |        42
(1 row)

See also:

Waiting for PostgreSQL 18 – Support RN (roman-numeral format) in to_number() (Hubert ʻdepeszʼ Lubaczewski)

 

This is all for now. A review of the March CommitFest, the last one for PostgreSQL 18, is coming up soon!


Pavel Luzanov