Home > Blog

November 14, 2022 • PostgreSQL • By Pavel Luzanov

PostgreSQL 16: part 2 or CommitFest 2022-09

PostgreSQL

Source:
habr.com

Subscribe to blog

Source:
habr.com

Subscribe to blog

It’s official! PostgreSQL 15 is out, and the community is abuzz discussing all the new features of the fresh release.

Meanwhile, the October CommitFest for PostgreSQL 16 had come and gone, with its own notable additions to the code.

If you missed the July CommitFest, our previous article will get you up to speed in no time.

Here are the patches I want to talk about:

SYSTEM_USER function
Frozen pages/tuples information in autovacuum's server log
pg_stat_get_backend_idset returns the actual backend ID
Improved performance of ORDER BY / DISTINCT aggregates
Faster bulk-loading into partitioned tables
Optimized lookups in snapshots
Bidirectional logical replication
pg_auth_members: pg_auth_members: role membership granting management
pg_auth_members: role membership and privilege inheritance
pg_receivewal and pg_recvlogical can now handle SIGTERM

SYSTEM_USER function

commit: 0823d061

The SYSTEM_USER function is a part of the SQL standard which has not been implemented in PostgreSQL so far. It displays information about the system user connected to the database server and the authentification method used:

SELECT session_user, system_user;

 session_user | system_user  
--------------+--------------
 alice        | peer:student

The example shows that alice has connected to the database as student (OS username) using peer authentification. For trust authentification, the function always returns NULL.

These connection details are also obtainable from the server log (if the parameter log_connections is on).

Frozen pages/tuples information in autovacuum's server log

commit: d977ffd9

The autovacuum log and the VACUUM (verbose) output now display additional information on the number of frozen pages and tuples in a given table.

CREATE TEMP TABLE t AS SELECT 42;

VACUUM (freeze, verbose) t;

INFO:  aggressively vacuuming "demo.pg_temp_3.t"
INFO:  finished vacuuming "demo.pg_temp_3.t": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total)
tuples: 0 removed, 1 remain, 0 are dead but not yet removable
removable cutoff: 945, which was 0 XIDs old when operation ended
new relfrozenxid: 945, which is 1 XIDs ahead of previous value
frozen: 1 pages from table (100.00% of total) had 1 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 91.912 MB/s, avg write rate: 0.000 MB/s
buffer usage: 5 hits, 4 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 188 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM

See the line that starts with frozen:.

pg_stat_get_backend_idset returns the actual backend ID

commit: d7e39d72

Consider a system that runs several concurrent sessions. The sessions work with temporary tables, which are stored in temporary schemas. The list of temporary schemas:

SELECT n.nspname,
       regexp_replace(n.nspname, '^pg_temp_', '')::int AS backend_id
FROM   pg_namespace n                                               	 
WHERE  n.nspname ~ '^pg_temp_';

  nspname  | backend_id
-----------+------------
 pg_temp_5 |      	5
 pg_temp_3 |      	3
 pg_temp_4 |      	4
(3 rows)

Let me explain this query. The backend_id column lists backend IDs. The system uses them to name temporary schemas. Backend IDs may be reused, and a new backend may get the same name as a recently terminated one. When a backend terminates, its temporary schemas don't get deleted, but simply remain unused until a new backend with the same ID is created, at which point they are assigned to it.

But how do you draw a line between a temporary schema and a backend process? You may need to do that, for example, to terminate a hung up session when objects in a temporary schema prevent freezing, potentially leading to transaction ID wraparound.

The updated function pg_stat_get_backend_idset and related statistics functions can help you out with that. Before, the function returned indexes from the array of open sessions as the backend IDs. However, an index value for a session may change after opening or terminating a session and rebuilding statistics. The new pg_stat_get_backend_idset now reliably returns actual backend session IDs, which can be linked to backend_id from the query above.

This allows us to retrieve a list of backend IDs and temporary schemas:

WITH tmp AS (
	SELECT n.nspname,                                          	 
           regexp_replace(n.nspname, '^pg_temp_', '')::int AS backend_id
	FROM   pg_namespace n
	WHERE  n.nspname ~ '^pg_temp_'
)
SELECT pg_stat_get_backend_pid(backend_id) AS pid,
       nspname
FROM   tmp
       LEFT JOIN pg_stat_get_backend_idset() AS b(backend_id)
           USING (backend_id);

  pid   |  nspname  
--------+-----------
 336661 | pg_temp_3
 349872 | pg_temp_4
        | pg_temp_5
(3 rows)

Also note that the pg_temp_5 schema is not currently used.

Improved performance of ORDER BY / DISTINCT aggregates

commit: 1349d279

Aggregates with ORDER BY and DISTINCT can now use indexes to skip sorting tuples when the inputs are known to be pre-sorted.

Example:

SELECT flight_no,
       array_agg(status ORDER BY status),
       array_agg(DISTINCT status)
FROM flights
GROUP BY flight_no;

In both cases, array_agg has to sort the statuses for each flight. Now let’s create an index:

CREATE INDEX ON flights (flight_no, status);

The PostgreSQL 16 planner knows that the input from the index comes in pre-sorted. Therefore, the statuses from each group can be passed to the transition function right away, without the need to wait for them all to be fetched and sorted.

As a result, the operation executes on my laptop 1.5 times faster in PostgreSQL 16 than an identical operation in PostgreSQL 15.

Faster bulk-loading into partitioned tables

commit: 3592e0ff

The patch increases bulk-loading and update speeds for partitioned tables. Operations that load records into the same partition will spend less time looking for specific partitions for each record. When 16 (the value is hard-coded) records in a row are loaded into the same partition, the partition’s name is cached, and if the following records must go to the same partition, they are processed faster.

The optimization works for RANGE and LIST partitions, but it’s of no use for HASH partitioned tables.

Developer testing shows a 15–20% decrease in execution time.

Optimized lookups in snapshots

commit: 37a6e5df, b6ef1675, 82739d4a

The patch optimizes linear searches of arrays (first commit) for x86-64 (second commit) and ARM (third commit). The new algorithm using SIMD instructions was applied to snapshot->xip array search. At a large number (hundreds) of concurrent writers, it significantly increases the visibility check speed, which is an overall performance boost.

Bidirectional logical replication

commit: 36628396

Bidirectional replication occurs when server A subscribes to table changes from server B, and server B subscribes to the same table changes from server A. As a result, all changes made on either server will be applied to both.

Previous PostgreSQL releases didn't support this replication setup. Originally, a subscriber replicated all changes made to the publisher, be it SQL commands or replicated changes. When server A runs an SQL command, server B receives and runs the command as well. The same changes, then, are sent to the first server, resulting in an infinite replication loop. To avoid it, PostgreSQL returns an error when attempting to create a second subscription that will result in a loop.

To implement bidirectional replication, the developers needed a way to apply only changes made by SQL commands but not those from replication. And they found one!

PostgreSQL can distinguish between the two sources of changes, as all logical replication changes are written into WAL together with their origins. Now, when you create a subscription, you can set the publisher to not publish any changes with replication origins.

Let’s set up bidirectional replication between node_1 and node_2, both located on the same server, on ports 5401 and 5402.

wal_level must allow logical replication on both nodes:

node_1=# SHOW wal_level;

wal_level 
-----------
 logical
(1 row)

Let’s create a table and a publication. The node column in the primary key may look redundant, but it’s helpful for avoiding replication conflicts.

node_1=# CREATE TABLE test (
             node text DEFAULT 'node_1',
             id int,
             PRIMARY KEY (node, id)
         );
node_1=# CREATE PUBLICATION pub1 FOR TABLE test;

Next, let’s create a table with an identical structure on the second node and a publication for it:

node_2=# CREATE TABLE test (
             node text DEFAULT 'node_2',
             id int,
             PRIMARY KEY (node, id)
         );
node_2=# CREATE PUBLICATION pub2 FOR TABLE test;

Subscribe node_1 to the node_2 publication, and vice versa.

node_1=# CREATE SUBSCRIPTION sub1_pub2
             CONNECTION 'port=5402 user=postgres dbname=postgres'
             PUBLICATION pub2
             WITH (copy_data = false, origin = none);
node_2=# CREATE SUBSCRIPTION sub2_pub1
             CONNECTION 'port=5401 user=postgres dbname=postgres'
             PUBLICATION pub1
             WITH (copy_data = false, origin = none);

A few things to pay attention to:

origin = none ― this is a new subscription parameter that handles replication cycles. It tells the server to only send out the changes that don't originate from replication (with a nonempty origin), i.e. changes made by SQL commands. The second possible parameter value, any, is the default. It enables the behavior as seen in the previous versions.
copy_data = false ― disable automatic synchronization. Also set when both tables are empty, which is the case in our example.
The tables must not be modified until we complete the setup.

Let’s add a row into each table:

node_1=# INSERT INTO test (id) VALUES (1);
node_2=# INSERT INTO test (id) VALUES (1);

Verify the contents.

node_1=# SELECT * FROM test ORDER BY 1,2;

 node   | id 
--------+----
 node_1 |  1
 node_2 |  1
(2 rows)

node_2=# SELECT * FROM test ORDER BY 1,2;

 node   | id 
--------+----
 node_1 |  1
 node_2 |  1
(2 rows)

The replication works!

Let’s add another node at port 5403. The tables must remain unchanged during setup.

Create a table and a publication:

node_3=# CREATE TABLE test (
             node text DEFAULT 'node_3',
             id int,
             PRIMARY KEY (node, id)
         );
node_3=# CREATE PUBLICATION pub3 FOR TABLE test;

Subscribe the first two nodes to the publication.

node_1=# CREATE SUBSCRIPTION sub1_pub3
             CONNECTION 'port=5403 user=postgres dbname=postgres'
             PUBLICATION pub3
             WITH (copy_data = false, origin = none);
node_2=# CREATE SUBSCRIPTION sub2_pub3
             CONNECTION 'port=5403 user=postgres dbname=postgres'
             PUBLICATION pub3
             WITH (copy_data = false, origin = none);

Subscribe the third node to the first two.

node_3=# CREATE SUBSCRIPTION sub3_pub1
             CONNECTION 'port=5401 user=postgres dbname=postgres'
             PUBLICATION pub1
             WITH (copy_data = true, origin = none);

WARNING:  subscription "sub3_pub1" requested copy_data with origin = NONE but might copy data that had a different origin
DETAIL:  Subscribed publication "pub1" is subscribing to other publications.
HINT:  Verify that initial data copied from the publisher tables did not come from other origins.
NOTICE:  created replication slot "sub3_pub1" on publisher
CREATE SUBSCRIPTION

Note the subscription parameters: copy_data = true, origin = none. CREATE SUBSCRIPTION returns a warning that the third node is going to copy all data from the first one, including the changes with the replication origin. This is exactly what we want for the initial synchronization of the third node: to copy the two records.

There is no need to repeat the initial synchronization when subscribing to the second node.

node_3=# CREATE SUBSCRIPTION sub3_pub2
             CONNECTION 'port=5402 user=postgres dbname=postgres'
             PUBLICATION pub2
             WITH (copy_data = false, origin = none);

The setup is complete. Let’s add a record to the third node:

node_3=# INSERT INTO test (id) VALUES (1);

Verify the contents on all nodes.

node_1=# SELECT * FROM test ORDER BY 1,2;

 node   | id 
--------+----
 node_1 |  1
 node_2 |  1
 node_3 |  1
(3 rows)

node_2=# SELECT * FROM test ORDER BY 1,2;

 node   | id 
--------+----
 node_1 |  1
 node_2 |  1
 node_3 |  1
(3 rows)

node_3=# SELECT * FROM test ORDER BY 1,2;

 node   | id 
--------+----
 node_1 |  1
 node_2 |  1
 node_3 |  1
(3 rows)

A multimaster out of the box!

pg_auth_members: role membership granting management

commit: ce6b672e, 6566133c

To demonstrate this, let’s take three basic roles with LOGIN privileges:

CREATE ROLE alice LOGIN;
CREATE ROLE bob LOGIN;
CREATE ROLE charlie LOGIN;

First, let’s observe how the “old” PostgreSQL 15 behaves, then switch to 16.

Let’s grant alice membership in bob. The GRANTED BY clause of the GRANT command allows to specify which role grants the membership. This information is recorded in the system catalog:

15=# GRANT bob TO alice GRANTED BY charlie;

15=# SELECT roleid::regrole,
            member::regrole,
            grantor::regrole,
            *
FROM pg_auth_members
WHERE roleid::regrole::text !~ '^pg_';

 roleid | member | grantor | roleid | member | grantor | admin_option
--------+--------+---------+--------+--------+---------+--------------
 bob	| alice  | charlie |  24590 |  24589 |   24591 | f

Seeing the pg_auth_members record alone, one may wonder how it was possible for charlie to grant alice membership in bob. To do that, charlie must be a member of bob and have the privilege that allows granting the membership to other roles (WITH ADMIN OPTION). And charlie isn't even a member of bob.

We know that GRANT was run by superuser (postgres), which skips these sorts of privilege checks. But the system catalog record shows that charlie granted alice membership in bob while not having the appropriate privileges, which is not what happened.

But that is not all. What if we drop the role charlie altogether?

15=# DROP ROLE charlie;

DROP ROLE

The role is no more. But what about alice’s membership in bob, which charlie has granted?

15=# SELECT roleid::regrole,
            member::regrole,
            grantor::regrole,
            *
FROM pg_auth_members
WHERE roleid::regrole::text !~ '^pg_';

 roleid | member | grantor | roleid | member | grantor | admin_option
--------+--------+---------+--------+--------+---------+--------------
 bob	| alice  | 24591   |  24590 |  24589 |   24591 | f

The membership is still there, and so is grantor info! 24591 is the OID of the dropped charlie role. Now we have nonsense in pg_auth_members, and, what is worse, the OID may now be reused when creating another role. It is improbable, but not impossible to end up with a record that shows some random new role as the grantor for alice’s membership in bob.

Let’s move on to PostgreSQL 16.

After creating the three roles, let’s try to grant alice membership in bob with charlie as the grantor:

16=# GRANT bob TO alice GRANTED BY charlie;

ERROR:  grantor must have ADMIN OPTION on "bob"

The command returns an error because charlie must be a member of bob and must have the privileges necessary to grant the membership. All as it should be.

16=# GRANT bob TO charlie WITH ADMIN OPTION;

GRANT ROLE

16=# GRANT bob TO alice GRANTED BY charlie;

GRANT ROLE

Now alice is successfully granted the membership.

Let’s try to drop the role charlie:

16=# DROP ROLE charlie;

ERROR:  role "charlie" cannot be dropped because some objects depend on it
DETAIL:  privileges for membership of role alice in role bob

Or try to revoke ADMIN OPTION:

16=# REVOKE ADMIN OPTION FOR bob FROM charlie ;

ERROR:  dependent privileges exist
HINT:  Use CASCADE to revoke them too.

You can only revoke ADMIN OPTION if you add CASCADE, and you can't drop the role. The role charlie now has a dependency because it has granted a role membership. The dependency uses the new OID column in pg_auth_members, and the dependency itself is recorded in pg_shdepend. Let’s have a look at it:

16=# SELECT oid,
            roleid::regrole,
            member::regrole,
            grantor::regrole
FROM pg_auth_members
WHERE oid = 16572;

  oid  | roleid | member | grantor
-------+--------+--------+---------
 16572 | bob	| alice  | charlie

16=# SELECT d.classid::regclass,
            (SELECT m FROM pg_auth_members m WHERE m.oid = d.objid) AS objid,
            d.refclassid::regclass,
            d.refobjid::regrole
FROM pg_shdepend d\gx

-[ RECORD 1 ]-----------------------------
classid	   | pg_auth_members
objid  	   | (16572,16569,16568,16570,f,t)
refclassid | pg_authid
refobjid   | charlie

Granting membership on behalf of third-party roles may not be the most demanded feature, but nevertheless PostgreSQL 16 will make the behavior more logical and predictable.

pg_auth_members: role membership and privilege inheritance

commit: e3ce2de0, 48a257d4

While the previous patch dealt with membership granters, this one concerns grantees.

Let’s look at PostgreSQL 15 first:

15-postgres=# CREATE ROLE alice LOGIN;
15-postgres=# CREATE ROLE bob LOGIN;

15-postgres=# GRANT bob TO alice;

alice is granted membership in bob so that it can use bob’s privileges.

15-postgres=# CREATE TABLE t AS SELECT 42 AS col;
15-postgres=# GRANT SELECT ON t TO bob;

Both bob and alice can see the table contents:

15-alice=> SELECT * FROM t;

 col
-----
  42
(1 row)

This works because the role alice has the INHERIT attribute (all roles have it by default), which makes alice inherit all bob’s privileges automatically.

15-alice=> SELECT rolname, rolinherit FROM pg_roles WHERE rolname = 'alice';

 rolname | rolinherit
---------+------------
 alice   | t

If this attribute is disabled, alice will have to switch roles in order to access the table:

15-postgres=# ALTER ROLE alice NOINHERIT;

15-alice=> SELECT * FROM t;

ERROR:  permission denied for table t

15-alice=> SET ROLE bob;
15-alice=> SELECT * FROM t;

 col
-----
  42
(1 row)

Let’s say that this is the desired behavior.

Now, consider that we want to grant alice some system monitoring capabilities. There’s a predefined role pg_read_all_settings, which has the privileges required to view system configuration parameters. We grant alice membership in it.

15-postgres=# GRANT pg_read_all_settings TO alice;

15-alice=> SHOW data_directory;

ERROR:  must be superuser or have privileges of pg_read_all_settings to examine "data_directory"

The lack of the INHERIT attribute makes alice unable to view the parameters without explicitly switching roles.

15-alice=> SET ROLE pg_read_all_settings;
15-alice=> SHOW data_directory;

   data_directory    
---------------------
 /home/pal/pg15/data

But switching into the role pg_read_all_settings is a major hassle. We want alice to have all pg_read_all_settings privileges by default, but we also want to have them switch to bob explicitly when bob’s privileges are required.

And the new patch allows us to achieve that!

Let’s reproduce the situation in PostgreSQL 16:

16-postgres=# CREATE ROLE alice LOGIN;
16-postgres=# CREATE ROLE bob LOGIN;

16-postgres=# CREATE TABLE t AS SELECT 42 AS col;
16-postgres=# GRANT SELECT ON t TO bob;

The GRANT syntax has been extended. Now, when granting a role membership, you can specify whether or not the grantee inherits the role’s privileges:

16-postgres=# GRANT bob TO alice WITH INHERIT FALSE;
16-postgres=# GRANT pg_read_all_settings TO alice WITH INHERIT TRUE;

Here, alice may use pg_read_all_settings privileges right away but needs to switch to bob to run the get_ultimate_answer function:

16-alice=> SHOW data_directory;

   data_directory    
---------------------
 /home/pal/pg16/data

16-alice=> SELECT * FROM t;

ERROR:  permission denied for table t

16-alice=> SET ROLE bob;
16-alice=> SELECT * FROM t;

 col
-----
  42
(1 row)

GRANT data is stored in pg_auth_members. Now it also includes the inherit_option value:

16-postgres=# SELECT roleid::regrole, member::regrole, grantor::regrole,
                     admin_option, inherit_option
FROM pg_auth_members
WHERE member = 'alice'::regrole;

        roleid        | member | grantor  | admin_option | inherit_option
----------------------+--------+----------+--------------+----------------
 bob                  | alice  | postgres | f            | f
 pg_read_all_settings | alice  | postgres | f            | t

But what about the INHERIT attribute? Now it is only used in GRANT whenever the WITH INHERIT clause is not present. In this example, alice has the INHERIT attribute set, which means that the following command will record it in pg_auth_members:

16-postgres=# GRANT pg_read_all_data TO alice;

The default value will change if the attribute is revoked.

16-postgres=# ALTER ROLE alice NOINHERIT;
16-postgres=# GRANT pg_write_all_data TO alice;

16-postgres=# SELECT roleid::regrole, member::regrole, grantor::regrole,
                     admin_option, inherit_option
FROM pg_auth_members
WHERE member = 'alice'::regrole;

        roleid        | member | grantor  | admin_option | inherit_option
----------------------+--------+----------+--------------+----------------
 bob                  | alice  | postgres | f            | f
 pg_read_all_settings | alice  | postgres | f            | t
 pg_read_all_data     | alice  | postgres | f            | t
 pg_write_all_data    | alice  | postgres | f            | f

Important takeaway: the privilege inheritance behavior in PostgreSQL 16 changes. The INHERIT no longer affects privilege inheritance and only affects subsequent GRANT commands without WITH INHERIT.

The other commit is for the ALTER DEFAULT PRIVILEGES command. Now you need the INHERIT attribute to be able to modify other roles' privileges. In our example, alice can't use ALTER DEFAULT PRIVILEGES to manage privileges for objects within bob’s schema.

pg_receivewal and pg_recvlogical now can handle SIGTERM

commit: 8b60db77

pg_receivewal and pg_recvlogical can run as daemons. When they do, systemd can only send SIGTERM to stop them, but both tools expect SIGINT to stop cleanly and just terminate on SIGTERM. This is not an issue in itself, but when compression for pg_receivewal is enabled, it only dumps data on disk on a clean exit.

So the developers ended up teaching the tools to recognize SIGTERM properly.

This is all for now. Looking forward to the results of the November CommitFest!

← Back to all articles