Thank you for your feedback. Could you please try increasing the number of clients (the pgbench -c option) to 20 or more? It seems I forgot to specify it.
I'd like to present and discuss a problem where 2PC transactions are applied quite slowly on a replica during logical replication. There is a master and a replica, with logical replication established from the master to the replica and the subscription created with two_phase = true. At some load level on the master, the replica starts to lag behind, and the lag keeps increasing. We have to decrease the load on the master significantly to allow the replica to catch up. Such a problem may create significant difficulties in production. The problem appears at least on the REL_16_STABLE branch.
To reproduce the problem:
1. Set up logical replication from the master to the replica with the subscription parameter two_phase = true.
2. Create a moderate load on the master (use pgbench with a custom SQL script that does PREPARE TRANSACTION + COMMIT PREPARED).
3. Optionally, switch off the replica for some time (keep the load on the master).
4. Switch on the replica and wait until it catches up with the master.
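For reference, the setup in the steps above could look roughly like the sketch below. This is only an illustration under stated assumptions, not the exact configuration used in the report: the publication name `pub`, the subscription name `sub`, the connection string, the table definition (including the primary key, assumed here so that DELETE can be replicated) and the pgbench duration are all hypothetical.

```sql
-- On both nodes (assumed values; both settings require a server restart):
--   wal_level = logical              -- needed on the master
--   max_prepared_transactions = 100  -- must be > 0 to use PREPARE TRANSACTION

-- On the master. The column type and primary key are assumptions;
-- the thread only shows that table "test" has a column "v".
CREATE TABLE test (v int PRIMARY KEY);
CREATE PUBLICATION pub FOR TABLE test;

-- On the replica. The connection string is a placeholder.
CREATE TABLE test (v int PRIMARY KEY);
CREATE SUBSCRIPTION sub
    CONNECTION 'host=master port=5432 dbname=postgres'
    PUBLICATION pub
    WITH (two_phase = true);

-- Load on the master, run from a shell (duration is an assumption):
--   pgbench -n -c 20 -T 60 -f test.sql postgres
```

The `-c 20` value follows the suggestion to use 20 or more clients; `-n` skips the vacuum step that pgbench would otherwise attempt on its standard tables.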
The replica never catches up with the master, even under a fairly low load on the master. If the load is removed, the replica does catch up, but it takes much longer than expected. I tried the same with regular transactions, and the problem does not appear even under a decent load.
I tried this setup, and I see that the logical subscriber does catch up with the master in a short time. I'm not sure what I'm missing. I stopped the logical subscriber while pgbench was running, then started it again and ran the following:
postgres=# SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication;
 sent_lsn  | pg_current_wal_lsn
-----------+--------------------
 0/6793FA0 | 0/6793FA0          <=== caught up
(1 row)
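To quantify the lag rather than compare LSNs by eye, a query along the following lines could be run on the master. It is a sketch, not part of the original test. Note that `sent_lsn` only reflects what the walsender has sent; `replay_lsn` is, as far as I understand, closer to what the subscriber reports as applied, so it may be the more telling column here.

```sql
-- Run on the master: per-subscriber lag in bytes relative to the current WAL position.
SELECT application_name,
       sent_lsn,
       replay_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)   AS send_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

Running this periodically while pgbench is active should show whether `replay_lag_bytes` keeps growing or shrinks back toward zero.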
cat test.sql
SELECT md5(random()::text) as mygid \gset
BEGIN;
DELETE FROM test WHERE v = pg_backend_pid();
INSERT INTO test(v) SELECT pg_backend_pid();
PREPARE TRANSACTION $$:mygid$$;
COMMIT PREPARED $$:mygid$$;