I'm not sure this fits -hackers..
At Tue, 2 Aug 2022 18:57:41 -0700, Dmitry Koterov <dmitry.koterov@gmail.com> wrote in
> Hi.
>
> Suppose on master, I run a *multi-query* using PQexec and save the value
> returned by pg_current_wal_insert_lsn:
>
> master_lsn = query(master, "INSERT INTO some VALUES (...); SELECT
> pg_current_wal_insert_lsn()")
>
> Then I run a PQexec query on a replica and save the value returned by
> pg_last_wal_replay_lsn:
>
> replica_lsn = query(replica, "SELECT pg_last_wal_replay_lsn()")
>
> The question to experts in PG internals: *is it guaranteed that, as long as
> replica_lsn >= master_lsn (GREATER OR EQUAL, not just greater), then a
> subsequent read from replica will always return me the inserted record*
> (i.e. the replica is up to date), considering noone updates/deletes in that
> table?
https://www.postgresql.org/docs/devel/libpq-exec.html
> The command string can include multiple SQL commands (separated by
> semicolons). Multiple queries sent in a single PQexec call are
> processed in a single transaction, unless there are explicit
> BEGIN/COMMIT commands included in the query string to divide it into
> multiple transactions.
If the query() runs PQexec() with the same string, the call to
pg_current_wal_insert_lsn() is made before the insert is commited.
That behavior can be emulated on psql. (The backslash before semicolon
is crucial. It lets the connected queries be sent in a single
PQexec())
=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E5038
(1 row)
=# insert into t values(0)\; select pg_current_wal_lsn();
INSERT 0 1
pg_current_wal_lsn
--------------------
0/68E5038
(1 row)
=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E50A0
(1 row)
$ pg_waldump -s'0/68E5038' -e'0/68E50A0' $PGDATA/pg_wal/000000010000000000000006
rmgr: Heap len (rec/tot): 59/ 59, tx: 770, lsn: 0/068E5038, prev 0/068E5000, desc: INSERT off 15
flags0x00, blkref #0: rel 1663/5/16424 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 770, lsn: 0/068E5078, prev 0/068E5038, desc: COMMIT
2022-08-0315:49:43.749158 JST
So, the replica cannot show the inserted data at the LSN the function
returned. If you explicitly ended transaction before
pg_current_wal_insert_lsn() call, the expected LSN would be returned.
=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E75C8
(1 row)
=# begin\;insert into t values(0)\;commit\; select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/68E7958
$ pg_waldump -s'0/68E75C8' -e'0/68E7958' $PGDATA/pg_wal/000000010000000000000006
prev 0/068E7590, desc: INSERT off 22 flags 0x00, blkref #0: rel 1663/5/16424 blk 0 FPW
rmgr: Transaction len (rec/tot): 34/ 34, tx: 777, lsn: 0/068E7930, prev 0/068E75C8, desc: COMMIT
2022-08-0316:09:13.516498 JST
> I'm asking, because according to some hints in the docs, this should be
> true. But for some reason, we have to use "greater" (not "greater or
> equals") condition in the real code, since with just ">=" the replica
> doesn't sometimes read the written data.
Thus the wrong LSN appears to have caused the behavior.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center