•   PostgreSQL   •   By Pavel Luzanov

PostgreSQL 16: part 3 or CommitFest 2022-11

We continue to follow the news of the upcoming PostgreSQL 16. The third CommitFest concluded in early December. Let's look at the results.

If you missed the previous CommitFests, check out our reviews: 2022-07, 2022-09.

Here are the patches I want to talk about:

meson: a new source code build system

commit: e6927270, 4c72102e

Meson support was first introduced in a commit way back in September. Several dozen more commits followed. Several dozens more followed. In early December, a documentation section describing the new build process was committed. There is also a wiki page that explains, among other things, what was behind the decision to introduce the new building tool. One of the reasons is to simplify building the server for Windows systems and to be able to eventually stop supporting the PostgreSQL's own build tools from the /src/tools/msvc entirely.

The new build system will be available alongside with autoconf and make, there is no plans to replace the good old tools.

However, I specifically used Meson to build the fresh PostgreSQL 16 testing environment for this article. On Ubuntu, the building went as follows:

$ git clone git://git.postgresql.org/git/postgresql.git
$ cd postgresql
$ meson setup build --prefix=/home/pal/pg16 -Dpgport=5416
$ cd build
$ ninja
$ ninja install

I left everything on default, except for the installation location and the server port.

All what's left is to initialize the cluster and start the server, which you do the same way regardless of the build tool used.

Documentation: a new chapter on transaction processing

commit: 66bc9d2d

A new chapter has been added to the documentation: 74. Transaction processing. It lives in the Internals section and gives a brief overview of what virtual and real transaction identifiers are, how they are shown in pg_locks, and how subtransactions and two-phase transactions work.

psql: \d+ indicates foreign partitions in a partitioned table

commit: bd95816f

For a partitioned table, the \d+ command will mark sections that are foreign tables with the word FOREIGN.

\d+ data
                                    Partitioned table "public.data"
  Column   | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-----------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 data_year | text |           |          |         | extended |             |              |
Partition key: LIST (data_year)
Partitions: data_2020 FOR VALUES IN ('2020'), FOREIGN,
        	data_2021 FOR VALUES IN ('2021'), FOREIGN,
        	data_2022 FOR VALUES IN ('2022')

psql: extended query protocol support

commit: 5b66de34

Until now, psql has not directly supported the extended query protocol. Therefore, parameterized queries could only be performed indirectly, such as with the PREPARE statement.

Now the \bind command has been added to psql to bind values to the parameters of the next query. Here's how it works:

SET log_statement='all';

\bind 42 'Answer: '

SELECT $2||$1;
  ?column?  
------------
Answer: 42
(1 row)
\! tail -2 logfile
2022-12-05 12:58:05.924 MSK [16566] LOG:  execute : SELECT $2||$1;
2022-12-05 12:58:05.924 MSK [16566] DETAIL:  parameters: $1 = '42', $2 = 'Answer: '

Only a fraction of the capabilities of the extended query protocol has been implemented, but the initiative is very welcome.

Predicate locks on materialized views

commit: 43351557

Predicate locks are used to ensure consistency for transactions with the SERIALIZABLE isolation level. They were never designed to update materialized views with the REFRESH MATERIALIZED VIEW command, because it exclusively locks the view in order to prevent any concurrent access anomalies.

But when updating with the CONCURRENTLY parameter and without predicate locks, transactions with the SERIALIZABLE isolation level could experience write skew. Now the problem is fixed.

Tracking last scan time of indexes and tables

commit: c0374718

In addition to counting table and index scans, the cumulative statistics system now records time and date of the last recorded scan. This data is stored in the last_seq_scan and last_idx_scan columns in pg_stat_all_tables and the last_idx_scan column in pg_stat_all_indexes.

SELECT seq_scan, last_seq_scan, idx_scan, last_idx_scan
FROM   pg_stat_all_tables
WHERE  relid = 'tickets'::regclass;
 seq_scan |         last_seq_scan         | idx_scan |         last_idx_scan    	 
----------+-------------------------------+----------+-------------------------------
       23 | 2022-12-07 15:17:57.261575+03 |        2 | 2022-12-05 14:59:53.564968+03
(1 row)
SELECT indexrelname, idx_scan, last_idx_scan
FROM pg_stat_all_indexes
WHERE relid = 'tickets'::regclass;
     indexrelname     | idx_scan |         last_idx_scan    	 
----------------------+----------+-------------------------------
 tickets_pkey         |        2 | 2022-12-05 14:59:53.564968+03
 tickets_book_ref_idx |        0 |
(2 rows)

It can be helpful for analyzing index and table use if no dedicated monitoring system is implemented.

pg_buffercache: a new function pg_buffercache_summary

commit: 2589434a

The pg_buffercache extension now can display aggregated buffer cache data:

CREATE EXTENSION pg_buffercache;

SELECT * FROM pg_buffercache_summary()\gx
-[ RECORD 1 ]--+------------------
buffers_used   | 2544
buffers_unused | 13840
buffers_dirty  | 38
buffers_pinned | 0
usagecount_avg | 2.183176100628931

We used to obtain this data by aggregating the pg_buffercache view, but the new function does not lock the buffers, therefore it is cheaper and can be easily integrated into monitoring systems.

walsender displays the database name in the process status

commit : af205152

walsender is the process that handles replication protocol connections. For physical replication or backup, walsender connects to an instance, but not to a specific database. When you create a logical replication subscription, however, the publisher creates a logical replication slot, and walsender connects to the database where the publication is created.

The patch adds the name of the database to walsender process description in case of logical replication, which makes monitoring those processes more convenient.

16(sub)=# CREATE SUBSCRIPTION sub CONNECTION 'port=5416 user=postgres dbname=demo' PUBLICATION pub;
NOTICE:  created replication slot "sub" on publisher
CREATE SUBSCRIPTION
16(pub)=# \! ps -o pid,command --ppid `head -n 1 /home/pal/pg16/data/postmaster.pid`
	PID COMMAND
  38514 postgres: checkpointer
  38515 postgres: background writer
  38517 postgres: walwriter
  38518 postgres: autovacuum launcher
  38519 postgres: logical replication launcher
  38522 postgres: postgres demo [local] idle
  38662 postgres: walsender postgres demo [local] START_REPLICATION

Here, the walsender process status displays the demo database name.

Reducing the WAL overhead of freezing tuples

commit: 9e540599

Peter Geoghegan is working on proactive tuple freezing. The purpose is clear: to avoid transaction counter wraparound, you need to start freezing in advance. However, an earlier freeze leads to a larger WAL volume and increases the associated overhead (WAL replication and archiving).

In this patch, Peter proposed to form WAL freezing records more compactly (~5 times so).

Reduced power consumption when idle

commit: cd4329d9, 05a7be93

When the main server is idle, the startup process on physical replicas still wakes up every 5 seconds to check if the file specified in the promote_trigger_file has appeared. But there are other mechanisms for promoting a replica: pg_ctl promote and the pg_promote function. To reduce resource consumption for such wake-ups, the promote_trigger_file parameter was removed.

Similar work was done with the walreceiver process, which woke up 10 times a second, checking whether it needed to do something. Now, the process calculates the time of its next start in advance.

postgres_fdw: batch mode for COPY

commit: 97da4824

Batch-inserting records into foreign tables first became possible in the 14th release. The batch size is set by the batch_size parameter at the foreign server or table level. However, it was implemented only for the INSERT command. This patch adds support for batch paste mode for the COPY FROM command.

Let's compare the insertion speed by different commands. To do this, I first create an empty bookings table in the postgres database, into which we will insert data.

postgres=# CREATE TABLE public.bookings (
	book_ref char(6),
	book_date timestamptz,
	total_amount numeric(10,2)
);

In the demo database in the same cluster, let's create a foreign table for the bookings table.

CREATE EXTENSION postgres_fdw;

CREATE SERVER srv
	FOREIGN DATA WRAPPER postgres_fdw
	OPTIONS (dbname 'postgres', batch_size '1');

CREATE USER MAPPING FOR postgres
	SERVER srv
	OPTIONS (user 'postgres');

CREATE FOREIGN TABLE bookings_remote (
    book_ref char(6),
    book_date timestamptz,
    total_amount numeric(10,2)
) SERVER srv 
    OPTIONS (schema_name 'public', table_name 'bookings');

The batch size is set at the server level as 1, which means that the batch mode is disabled and records are sent one at a time.

Let's dump the bookings table contents from the demo database into a file:

\COPY bookings TO 'bookings.txt'

Now, upload the file contents into the foreign table and measure the time:

\timing on
\COPY bookings_remote FROM 'bookings.txt'
COPY 2111110
Time: 57327,152 ms (00:57,327)

It took almost a minute to insert a little more than two million rows. Set the batch size to 100 and repeat the test:

TRUNCATE bookings_remote;
ALTER SERVER srv OPTIONS (SET batch_size '100');

\COPY bookings_remote FROM 'bookings.txt'
COPY 2111110
Time: 8780,000 ms (00:08,780)

The insertion speed has increased by more than 6 times!

Curiously, the INSERT command works even faster in batch mode:

TRUNCATE bookings_remote;
INSERT INTO bookings_remote SELECT * FROM bookings;
INSERT 0 2111110
Time: 6669,504 ms (00:06,670)

So when it comes to insertion with the COPY command, there's room for optimization yet! Also, it would be great to teach postgres_fdw to use batch mode to modify and delete rows, not just read and insert.

Modernizing the GUC infrastructure

commit: f13b2088, 3057465a, 407b50f2, 9c911ec0

This series of commits optimizes storage and speeds up access to configuration parameters, including user-generated ones.

The optimization is most evident in the example that Tom Lane gave in the letter that started the discussion:

do $$
    begin
        for i in 1..10000 loop
        perform set_config('foo.bar' || i::text, i::text, false);
    end loop;
end $$;

On my computer, creating ten thousand user parameters in PostgreSQL 15 took about six seconds, and after applying the patches, the time was reduced to ~ 30 milliseconds.

Hash index build optimization

commit: e09d7a12, d09dbeb9

When building hash indexes, the values were sorted only by the bucket number. If you additionally sort by value, then inserting subsequent values is faster.

The first commit was made in July, and the second in November. By now, the hash index build speed increased by 5-15%.

MAINTAIN ― a new privilege for table maintenance

commit: 60684dd8

A number of maintenance actions for tables and materialized views are not regulated by privileges and are usually available only to the object owner and the superuser.

PostgreSQL 16 introduces the MAINTAIN privilege, which can be granted on materialized views and tables, including system ones:

GRANT MAINTAIN ON pg_class, pg_attribute, pg_type TO alice;

The \dp output uses the m character for this privilege:

\dp pg_catalog.pg_class
                                     Access privileges
   Schema   |   Name   | Type  |     Access privileges      | Column privileges | Policies
------------+----------+-------+----------------------------+-------------------+----------
 pg_catalog | pg_class | table | postgres=arwdDxtm/postgres+|                   |
            |          |       | =r/postgres               +|                   |
            |          |       | alice=m/postgres           |                   |

What actions are granted by this privilege? These include executing the commands ANALYZE, VACUUM (including VACUUM FULL), CLUSTER, REINDEX, REFRESH MATERIALIZED VIEW and LOCK TABLE.

In addition, a predefined pg_maintain role has been added. Membership in this role grants the MAINTAIN privilege on all relations in the database. This allows you to grant maintenance privileges to a regular role that does not have the rights to perform DDL and DML operations in the database otherwise.

The initial patch proposed the addition of two privileges for VACUUM and ANALYZE, but eventually the community settled on the MAINTAIN privilege with more extensive maintenance capabilities.

SET ROLE: better role change management

commit: 3d14e171

This patch continues the effort to build a more robust roles and privileges management system. The previous article talked about granting membership in roles and inheriting privileges.

The new patch introduces switching to another role with the SET ROLE command. Suppose we decide to include the alice role into pg_read_all_data with privilege inheritance enabled:

postgres=# GRANT pg_read_all_data TO alice WITH INHERIT TRUE;

The alice role can now use pg_read_all_data privileges. But nothing is stopping alice from switching role to pg_read_all_data and starting to create objects on its behalf:

postgres=# \c - alice
You are now connected to database "postgres" as user "alice".
alice=> CREATE TABLE t (id int);
CREATE TABLE
alice=> \dt t
            List of relations
 Schema | Name | Type  |      Owner       
--------+------+-------+------------------
 public | t    | table | pg_read_all_data

Actually, there is. To create objects, the pg_read_all_data role must have the CREATE privilege in some schema, such as in public. And starting from PostgreSQL 15, the pseudo-role public no longer has this privilege. As for the example above, I had to run the following command in advance:

postgres=# GRANT CREATE ON SCHEMA public TO public;

Nevertheless, switching to another role may be undesirable in certain situations. If we provided membership in a role with INHERIT TRUE, then why switch at all? In the new version, this can be prohibited by the same GRANT command:

postgres=# GRANT pg_read_all_data TO alice WITH INHERIT TRUE, SET FALSE;

Alice can now use pg_read_all_data privileges, but will not be able to switch to it.

alice=> SET ROLE pg_read_all_data;
ERROR:  permission denied to set role "pg_read_all_data"

The ability to switch to a role is stored in the set_option column of the pg_auth_members table, next to inherit_option and admin_option.

SELECT roleid::regrole, member::regrole, grantor::regrole,
       admin_option, inherit_option, set_option
FROM pg_auth_members
WHERE member = 'alice'::regrole\gx
-[ RECORD 1 ]--+-----------------
roleid         | pg_read_all_data
member         | alice
grantor        | postgres
admin_option   | f
inherit_option | t
set_option     | f

Support for file inclusion directives in pg_hba.conf and pg_ident.conf

commit: a54b658c

The configuration files pg_hba.conf and pg_ident.conf, as well as postgresql.conf, now support the directives to include other files: include, include_if_exists, include_dir.

Columns containing the file name and rule/map number have been added to pg_hba_file_rules and pg_ident_file_mappings:

SELECT * FROM pg_hba_file_rules
WHERE file_name LIKE '%hba_ident_test.conf' \gx
-[ RECORD 1 ]---------------------------------
rule_number | 1
file_name   | /home/pal/pg/hba_ident_test.conf
line_number | 1
type        | local
database.   | {all}
user_name   | {alice}
address     |
netmask     |
auth_method | peer
options     | {map=m1}
error       |
SELECT * FROM pg_ident_file_mappings
WHERE file_name LIKE '%ident_test.conf' \gx
-[ RECORD 1 ]-----------------------------
map_number  | 1
file_name   | /home/pal/pg/ident_test.conf
line_number | 1
map_name.   | m1
sys_name.   | student
pg_username | alice
error       |

Regular expressions support in pg_hba.conf

commit: 8fea8683, a9039713, fc579e11

pg_hba.conf supports regular expressions for the user name and the database name. If any of these fields starts with a slash (/), then the value is considered a regular expression.

 

This is all for now. Looking forward to the results of the January CommitFest!

← Back to all articles

Pavel Luzanov