Thread: Does having pg_last_wal_replay_lsn[replica] >= pg_current_wal_insert_lsn[master] guarantee that the replica is caught up?
Does having pg_last_wal_replay_lsn[replica] >= pg_current_wal_insert_lsn[master] guarantee that the replica is caught up?
From
Dmitry Koterov
Date:
Hi.
replica_lsn = query(replica, "SELECT pg_last_wal_replay_lsn()")
Suppose on master, I run a multi-query using PQexec and save the value returned by pg_current_wal_insert_lsn:
master_lsn = query(master, "INSERT INTO some VALUES (...); SELECT pg_current_wal_insert_lsn()")
Then I run a PQexec query on a replica and save the value returned by pg_last_wal_replay_lsn:
replica_lsn = query(replica, "SELECT pg_last_wal_replay_lsn()")
The question to experts in PG internals: is it guaranteed that, as long as replica_lsn >= master_lsn (GREATER OR EQUAL, not just greater), then a subsequent read from replica will always return me the inserted record (i.e. the replica is up to date), considering noone updates/deletes in that table?
I'm asking, because according to some hints in the docs, this should be true. But for some reason, we have to use "greater" (not "greater or equals") condition in the real code, since with just ">=" the replica doesn't sometimes read the written data.
Re: Does having pg_last_wal_replay_lsn[replica] >= pg_current_wal_insert_lsn[master] guarantee that the replica is caught up?
From
Kyotaro Horiguchi
Date:
I'm not sure this fits -hackers.. At Tue, 2 Aug 2022 18:57:41 -0700, Dmitry Koterov <dmitry.koterov@gmail.com> wrote in > Hi. > > Suppose on master, I run a *multi-query* using PQexec and save the value > returned by pg_current_wal_insert_lsn: > > master_lsn = query(master, "INSERT INTO some VALUES (...); SELECT > pg_current_wal_insert_lsn()") > > Then I run a PQexec query on a replica and save the value returned by > pg_last_wal_replay_lsn: > > replica_lsn = query(replica, "SELECT pg_last_wal_replay_lsn()") > > The question to experts in PG internals: *is it guaranteed that, as long as > replica_lsn >= master_lsn (GREATER OR EQUAL, not just greater), then a > subsequent read from replica will always return me the inserted record* > (i.e. the replica is up to date), considering noone updates/deletes in that > table? https://www.postgresql.org/docs/devel/libpq-exec.html > The command string can include multiple SQL commands (separated by > semicolons). Multiple queries sent in a single PQexec call are > processed in a single transaction, unless there are explicit > BEGIN/COMMIT commands included in the query string to divide it into > multiple transactions. If the query() runs PQexec() with the same string, the call to pg_current_wal_insert_lsn() is made before the insert is commited. That behavior can be emulated on psql. (The backslash before semicolon is crucial. It lets the connected queries be sent in a single PQexec()) =# select pg_current_wal_insert_lsn(); pg_current_wal_insert_lsn --------------------------- 0/68E5038 (1 row) =# insert into t values(0)\; select pg_current_wal_lsn(); INSERT 0 1 pg_current_wal_lsn -------------------- 0/68E5038 (1 row) =# select pg_current_wal_insert_lsn(); pg_current_wal_insert_lsn --------------------------- 0/68E50A0 (1 row) $ pg_waldump -s'0/68E5038' -e'0/68E50A0' $PGDATA/pg_wal/000000010000000000000006 rmgr: Heap len (rec/tot): 59/ 59, tx: 770, lsn: 0/068E5038, prev 0/068E5000, desc: INSERT off 15 flags0x00, blkref #0: rel 1663/5/16424 blk 0 rmgr: Transaction len (rec/tot): 34/ 34, tx: 770, lsn: 0/068E5078, prev 0/068E5038, desc: COMMIT 2022-08-0315:49:43.749158 JST So, the replica cannot show the inserted data at the LSN the function returned. If you explicitly ended transaction before pg_current_wal_insert_lsn() call, the expected LSN would be returned. =# select pg_current_wal_insert_lsn(); pg_current_wal_insert_lsn --------------------------- 0/68E75C8 (1 row) =# begin\;insert into t values(0)\;commit\; select pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/68E7958 $ pg_waldump -s'0/68E75C8' -e'0/68E7958' $PGDATA/pg_wal/000000010000000000000006 prev 0/068E7590, desc: INSERT off 22 flags 0x00, blkref #0: rel 1663/5/16424 blk 0 FPW rmgr: Transaction len (rec/tot): 34/ 34, tx: 777, lsn: 0/068E7930, prev 0/068E75C8, desc: COMMIT 2022-08-0316:09:13.516498 JST > I'm asking, because according to some hints in the docs, this should be > true. But for some reason, we have to use "greater" (not "greater or > equals") condition in the real code, since with just ">=" the replica > doesn't sometimes read the written data. Thus the wrong LSN appears to have caused the behavior. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Does having pg_last_wal_replay_lsn[replica] >= pg_current_wal_insert_lsn[master] guarantee that the replica is caught up?
From
Dmitry Koterov
Date:
Thank you for the detailed explanation!
I doubt many people from -general would actually be able to provide such info since the spirit of that list is to find work-arounds for problems and questions at user level rather than dig into the details on how something actually works.
It's worth adding to the documentation, with that exact example BTW:
(I can try submitting a docs PR if you think it's a good idea).
Also, when I said that we use PQexec, I did it just for an illustration: in practice we use the node-postgres JS library which sends multi-statement protocol messages. So - transaction wise - it works the same way as PQexec with multiple queries, but it returns responses for ALL queries in the batch, not just for the last one (very convenient BTW, saves on network round-trip latency). This mode is fully supported by PG wire protocol: https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-MULTI-STATEMENT
On Wed, Aug 3, 2022 at 12:32 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
<snip>
=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E5038
(1 row)
=# insert into t values(0)\; select pg_current_wal_lsn();
INSERT 0 1
pg_current_wal_lsn
--------------------
0/68E5038
(1 row)
=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E50A0
(1 row)
<snip>
=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E75C8
(1 row)
=# begin\;insert into t values(0)\;commit\; select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/68E7958