Thread: Duplicated LSN in ReorderBuffer
Hi hackers,
I believe we found a bug in logical decoding. It only occurs with
cassert enabled. It was originally discovered and reproduced by Murat
Kabilov and Ildus Kurbangaliev. Here is the stack trace we've got:
#0 0x00007facc66ef82f in raise () from /usr/lib/libc.so.6
#1 0x00007facc66da672 in abort () from /usr/lib/libc.so.6
#2 0x0000000000ac4ebf in ExceptionalCondition (
conditionName=0xccdea8 "!(prev_first_lsn < cur_txn->first_lsn)",
errorType=0xccdce4 "FailedAssertion", fileName=0xccdd38 "reorderbuffer.c",
lineNumber=680) at assert.c:54
#3 0x00000000008a9515 in AssertTXNLsnOrder (rb=0x25ca128) at reorderbuffer.c:680
#4 0x00000000008a900f in ReorderBufferTXNByXid (rb=0x25ca128, xid=65609, create=true,
is_new=0x0, lsn=211590864, create_as_top=true) at reorderbuffer.c:559
#5 0x00000000008abf0d in ReorderBufferAddNewTupleCids (rb=0x25ca128, xid=65609,
lsn=211590864, node=..., tid=..., cmin=0, cmax=4294967295, combocid=4294967295)
at reorderbuffer.c:2098
#6 0x00000000008b096b in SnapBuildProcessNewCid (builder=0x25d0158, xid=65610,
lsn=211590864, xlrec=0x25d60b8) at snapbuild.c:781
#7 0x000000000089d01c in DecodeHeap2Op (ctx=0x25ba0b8, buf=0x7ffd0e294da0) at decode.c:382
#8 0x000000000089c8ca in LogicalDecodingProcessRecord (ctx=0x25ba0b8, record=0x25ba378)
at decode.c:125
#9 0x00000000008a124c in DecodingContextFindStartpoint (ctx=0x25ba0b8) at logical.c:492
#10 0x00000000008b9c3d in CreateReplicationSlot (cmd=0x257be20) at walsender.c:957
#11 0x00000000008baa60 in exec_replication_command (
cmd_string=0x24f5b08 "CREATE_REPLICATION_SLOT temp_slot_name TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1531
#12 0x0000000000937230 in PostgresMain (argc=1, argv=0x25233a8, dbname=0x2523380 "postgres",
username=0x24f23c8 "zilder") at postgres.c:4245
#13 0x0000000000881453 in BackendRun (port=0x251a900) at postmaster.c:4431
#14 0x0000000000880b4f in BackendStartup (port=0x251a900) at postmaster.c:4122
#15 0x000000000087cbbe in ServerLoop () at postmaster.c:1704
#16 0x000000000087c34a in PostmasterMain (argc=3, argv=0x24f0330) at postmaster.c:1377
#17 0x00000000007926b6 in main (argc=3, argv=0x24f0330) at main.c:228
After inspecting the core dump we see that
`prev_first_lsn == cur_txn->first_lsn`.
The problem seems to be that ReorderBuffer adds two ReorderBufferTXNs
with the same LSN, but different transaction ids: subxid and top-level
xid. See FIX part below.
STEPS TO REPRODUCE
------------------
We were able to reproduce it on 10, 11 and on the master branch. Postgres was
configured as:
./configure --enable-cassert CFLAGS='-ggdb3 -O0' --prefix=$HOME/pg12
Additional options in postgresql.conf:
wal_level='logical'
max_connections=1000
max_replication_slots=100
max_wal_senders=100
max_logical_replication_workers=100
pgbench scripts:
$ cat create_table.sql
BEGIN;
SAVEPOINT p1;
CREATE temp TABLE t_t (id INT) ON COMMIT DROP;
ROLLBACK TO SAVEPOINT p1;
ROLLBACK;
$ cat create_slot.sql
BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
SELECT pg_create_logical_replication_slot('test' || pg_backend_pid(), 'pgoutput', true);
SELECT pg_drop_replication_slot('test' || pg_backend_pid());
ROLLBACK;
Run in parallel terminals:
$HOME/pg12/bin/pgbench postgres -f create_table.sql -T1000 -c50 -j50
$HOME/pg12/bin/pgbench postgres -f create_slot.sql -T1000 -c50 -j50
It may take some time; on my local machine it breaks within a few seconds.
FIX?
----
I can't say that I have a deep understanding of what's going on in the
logical decoding code. But the one thing I've noticed is an inconsistency
in the xids used for the ReorderBufferTXNByXid() calls:
1. first, in DecodeHeap2Op(), ReorderBufferProcessXid() is called
with the subtransaction id; it actually creates a ReorderBufferTXN
and adds it to the reorder buffer's hash table and toplevel_by_lsn list;
2. second, ReorderBufferXidSetCatalogChanges() uses the same
subxid to look up the ReorderBufferTXN that was created before,
successfully;
3. now ReorderBufferAddNewTupleCids() instead uses the top-level
transaction id for the lookup; it cannot find that xid in the hash
table and tries to add a new entry with the same LSN. That fails,
since this LSN is already in the toplevel_by_lsn list.
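To make the three steps above concrete, here is a toy model of the bookkeeping involved. This is plain Python with made-up names, not the actual reorderbuffer.c logic: a buffer keyed by xid that also keeps a toplevel_by_lsn list with the strictly-increasing-LSN invariant that AssertTXNLsnOrder enforces.

```python
# Toy model of the ReorderBuffer bookkeeping described above.
# All names and data structures are illustrative, not PostgreSQL code.

class ToyReorderBuffer:
    def __init__(self):
        self.by_xid = {}            # xid -> first_lsn (stands in for the hash table)
        self.toplevel_by_lsn = []   # first_lsn of each "top-level" entry, in order

    def txn_by_xid(self, xid, lsn):
        if xid not in self.by_xid:
            # Invariant checked by AssertTXNLsnOrder: first_lsn must
            # strictly increase along toplevel_by_lsn.
            assert all(prev < lsn for prev in self.toplevel_by_lsn), \
                "duplicate first_lsn in toplevel_by_lsn"
            self.by_xid[xid] = lsn
            self.toplevel_by_lsn.append(lsn)
        return self.by_xid[xid]

rb = ToyReorderBuffer()
rb.txn_by_xid(65610, 211590864)      # step 1: entry created with the subxid
rb.txn_by_xid(65610, 211590864)      # step 2: lookup with the same subxid succeeds
try:
    rb.txn_by_xid(65609, 211590864)  # step 3: top-level xid, same LSN
    failed = False
except AssertionError:
    failed = True
print("assertion failed:", failed)   # prints: assertion failed: True
```

In this simplified model, step 3 trips exactly because the subxid's entry already claimed that LSN as a "top-level" first_lsn.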
Attached is a simple patch that uses the subxid instead of the top-level
xid in the ReorderBufferAddNewTupleCids() call. It seems to fix the bug,
but I'm not sure that this is a valid change. Can someone please verify
it or maybe suggest a better solution for the issue?
Best regards,
Ildar
Attachment
On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin <ildar@adjust.com> wrote:
> Attached is a simple patch that uses subxid instead of top-level xid
> in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> i'm not sure that this is a valid change. Can someone please verify it
> or maybe suggest a better solution for the issue?

Hello Ildar,

I hope someone more familiar with this code than me can comment, but
while going through the Commitfest CI results I saw this segfault with
your patch:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/555184304

At a glance, HistoricSnapshotGetTupleCids() returned NULL in
HeapTupleSatisfiesHistoricMVCC(), so ResolveCminCmaxDuringDecoding()
blew up.

--
Thomas Munro
https://enterprisedb.com
On Mon, Jul 8, 2019 at 9:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin <ildar@adjust.com> wrote:
> > Attached is a simple patch that uses subxid instead of top-level xid
> > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> > i'm not sure that this is a valid change. Can someone please verify it
> > or maybe suggest a better solution for the issue?

I've reproduced this issue with the script Ildar provided. I haven't
found the root cause yet, and I'm not sure the patch takes a correct way
to fix this.

In my environment, I got the following pg_waldump output, and the
logical decoding failed at 0/019FA058 when processing NEW_CID.

rmgr: Transaction len (rec/tot): 38/38, tx: 1999, lsn: 0/019F9E80, prev 0/019F9E38,
  desc: ASSIGNMENT xtop 1998: subxacts: 1999
rmgr: Standby len (rec/tot): 405/405, tx: 0, lsn: 0/019F9EA8, prev 0/019F9E80,
  desc: RUNNING_XACTS nextXid 2000 latestCompletedXid 1949 oldestRunningXid 1836;
  48 xacts: 1990 1954 1978 1850 1944 1972 1940 1924 1906 1970 1985 1998 1966 1987
  1975 1858 1914 1982 1958 1840 1920 1926 1992 1962 1910 1950 1874 1928 1974 1968
  1946 1912 1918 1996 1922 1930 1964 1952 1994 1934 1980 1836 1984 1960 1956 1916
  1908 1938
rmgr: Heap2 len (rec/tot): 60/60, tx: 1999, lsn: 0/019FA058, prev 0/019F9EA8,
  desc: NEW_CID rel 1663/12678/2615; tid 11/59; cmin: 0, cmax: 4294967295,
  combo: 4294967295
rmgr: Heap len (rec/tot): 127/127, tx: 1999, lsn: 0/019FA098, prev 0/019FA058,
  desc: INSERT off 59 flags 0x00, blkref #0: rel 1663/12678/2615 blk 11

I thought that logical decoding doesn't create a ReorderBufferTXN for
xid=1999 when processing NEW_CID, since it decodes the ASSIGNMENT of
xid=1999 beforehand. But what actually happens is that it skips NEW_CID
while the state of the snapshot builder is still
SNAPBUILD_BUILDING_SNAPSHOT; the state then becomes
SNAPBUILD_FULL_SNAPSHOT when processing RUNNING_XACTS, and therefore it
creates two ReorderBufferTXN entries for xid=1999 and xid=1998 as
top-level transactions when processing NEW_CID
(ReorderBufferXidSetCatalogChanges creates xid=1999 and
ReorderBufferAddNewTupleCids creates xid=1998). And therefore it got the
assertion failure when adding the ReorderBufferTXN of xid=1998.

I'll look into this more deeply tomorrow.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
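The ordering effect described above (the assignment link skipped while the snapshot is still being built, NEW_CID processed only after RUNNING_XACTS promotes the state) can be sketched as a toy state machine. This is hypothetical Python with made-up names, not the real snapbuild.c/decode.c:

```python
# Toy replay of the WAL sequence above; states and handlers are
# simplified stand-ins for snapbuild.c / decode.c, not real code.
BUILDING = "SNAPBUILD_BUILDING_SNAPSHOT"
FULL = "SNAPBUILD_FULL_SNAPSHOT"

state = BUILDING
subxid_of = {}   # subxid -> top-level xid, filled by ASSIGNMENT
toplevel = {}    # xid -> first_lsn of entries created as "top-level"

def decode(rec, lsn, xid=None, top_xid=None):
    global state
    if rec == "ASSIGNMENT":
        if state != FULL:
            return            # skipped: the subxid -> top_xid link is lost
        subxid_of[xid] = top_xid
    elif rec == "RUNNING_XACTS":
        state = FULL          # the snapshot builder advances here
    elif rec == "NEW_CID" and state == FULL:
        # One lookup uses the record's xid (the subxid), the other the
        # top-level xid; without the earlier assignment, both entries
        # get created as if they were top-level transactions.
        toplevel.setdefault(xid, lsn)
        toplevel.setdefault(top_xid, lsn)

decode("ASSIGNMENT", 0x019F9E80, xid=1999, top_xid=1998)
decode("RUNNING_XACTS", 0x019F9EA8)
decode("NEW_CID", 0x019FA058, xid=1999, top_xid=1998)

# Two "top-level" entries now share one first_lsn -- the condition
# the AssertTXNLsnOrder assertion trips over.
assert toplevel[1999] == toplevel[1998]
assert subxid_of == {}   # the assignment was never recorded
```

The point of the sketch is only the ordering: the same three records with the assignment processed first would leave a single top-level entry.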
On Mon, Jul 8, 2019 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> [..snip..]
>
> I thought that logical decoding doesn't create a ReorderBufferTXN for
> xid=1999 when processing NEW_CID, since it decodes the ASSIGNMENT of
> xid=1999 beforehand. But what actually happens is that it skips NEW_CID
> while the state of the snapshot builder is still
> SNAPBUILD_BUILDING_SNAPSHOT; the state then becomes
> SNAPBUILD_FULL_SNAPSHOT when processing RUNNING_XACTS, and therefore it
> creates two ReorderBufferTXN entries for xid=1999 and xid=1998 as
> top-level transactions when processing NEW_CID
> (ReorderBufferXidSetCatalogChanges creates xid=1999 and
> ReorderBufferAddNewTupleCids creates xid=1998).

I think the cause of this bug would be that a ReorderBufferTXN entry of
a sub transaction is created as a top-level transaction. And this
happens because we skip decoding ASSIGNMENT while the state of the
snapshot builder is < SNAPBUILD_FULL.

@@ -778,7 +778,7 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
     */
    ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

-   ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+   ReorderBufferAddNewTupleCids(builder->reorder, xid, lsn,
                                 xlrec->target_node, xlrec->target_tid,
                                 xlrec->cmin, xlrec->cmax, xlrec->combocid);

The above change in the proposed patch changes SnapBuildProcessNewCid so
that it passes the sub transaction id instead of the top transaction id
to ReorderBufferAddNewTupleCids, which adds a (relfilenode, tid) ->
(cmin, cmax) mapping to the transaction. But I think the fix is not
correct, since as the comment of ReorderBufferTXN describes, the
mappings are always assigned to the top-level transaction.

In reorderbuffer.h:

/*
 * List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
 * Those are always assigned to the toplevel transaction. (Keep track of
 * #entries to create a hash of the right size)
 */
dlist_head  tuplecids;
uint64      ntuplecids;

Instead, I wonder if we can decode ASSIGNMENT even when the state of the
snapshot builder is < SNAPBUILD_FULL_SNAPSHOT. That way, the
ReorderBufferTXN entries of both the top transaction and the sub
transaction are created properly before we decode NEW_CID. The attached
patch does that. In my environment the issue seems to be fixed, but I'm
still not confident that this is the right fix. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
On 2019-Jul-09, Masahiko Sawada wrote:

> I think the cause of this bug would be that a ReorderBufferTXN entry
> of sub transaction is created as top-level transaction. And this
> happens because we skip to decode ASSIGNMENT during the state of
> snapshot builder < SNAPBUILD_FULL.

That explanation seems to make sense.

> Instead, I wonder if we can decode ASSIGNMENT even when the state of
> snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
> ReorderBufferTXN entries of both top transaction and sub transaction
> are created properly before we decode NEW_CID.

Yeah, that seems a sensible remediation to me. I would reduce the scope
a little bit -- only create the assignment in the BUILDING state, and
skip it in the START state. I'm not sure that it's possible to get
assignments while in START state that are significant (I'm still trying
to digest SnapBuildFindSnapshot). I would propose the attached.

Andres, do you have an opinion on this?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi,

Petr, Simon, see the potential issue related to fast forward at the
bottom.

On 2019-07-26 18:46:35 -0400, Alvaro Herrera wrote:
> On 2019-Jul-09, Masahiko Sawada wrote:
>
> > I think the cause of this bug would be that a ReorderBufferTXN entry
> > of sub transaction is created as top-level transaction. And this
> > happens because we skip to decode ASSIGNMENT during the state of
> > snapshot builder < SNAPBUILD_FULL.
>
> That explanation seems to make sense.

Yea. The comment that "in the assignment case we'll not decode those
xacts" is true, but it misses that we *do* currently process
XLOG_HEAP2_NEW_CID records for transactions that started before
reaching FULL_SNAPSHOT.

Thinking about it, it was not immediately clear to me that it is
necessary to process XLOG_HEAP2_NEW_CID at that stage. We only need the
cid mapping when decoding the content of the transaction that the
XLOG_HEAP2_NEW_CID record was about - which will not happen if it
started before SNAPBUILD_FULL. Except that these records also are
responsible for signalling that a transaction performed catalog
modifications (cf the ReorderBufferXidSetCatalogChanges() call), which
in turn is important for SnapBuildCommitTxn() to know whether that
transaction needs to be included in historic snapshots.

So unless I am missing something - which is entirely possible, I've had
this code thoroughly swapped out - that means that we only need to
process XLOG_HEAP2_NEW_CID < SNAPBUILD_FULL if there can be
transactions with relevant catalog changes that don't have any
invalidation messages. After thinking about it for a bit, that's not
guaranteed however. For one, even for system catalog tables, looking at
CacheInvalidateHeapTuple() et al there can be catalog modifications
that create neither a snapshot invalidation message, nor a catcache
one. There's also the remote scenario that we possibly might be only
modifying a toast relation. But more importantly, the only modified
table could be a user defined catalog table (i.e. WITH
(user_catalog_table = true)). Which in all likelihood won't cause
invalidation messages.

So I think currently it is required to process NEW_CID records -
although we don't need to actually execute the
ReorderBufferAddNewTupleCids() etc calls. Perhaps the right fix for the
future would actually be to not rely on NEW_CID for recognizing such
transactions, but instead have an xact.c marker that signals whether a
transaction performed catalog modifications. Hm, need to think more
about this.

> > Instead, I wonder if we can decode ASSIGNMENT even when the state of
> > snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
> > ReorderBufferTXN entries of both top transaction and sub transaction
> > are created properly before we decode NEW_CID.
>
> Yeah, that seems a sensible remediation to me.

That does seem like a reasonable approach. I can see two alternatives:

1) Change SnapBuildProcessNewCid()'s
   ReorderBufferXidSetCatalogChanges() call to reference the toplevel
   xid. That has the disadvantage that if the subtransaction that
   performed DDL rolls back, the main transaction will still be treated
   as a catalog transaction - I have a hard time seeing that being
   common, however. That'd then also require SnapBuildProcessNewCid()
   in SNAPBUILD_FULL_SNAPSHOT to return before processing any data
   assigned to subtransactions. Which would be good, because that's
   currently unnecessarily stored in memory.

2) We could simply assign the subtransaction to the parent using
   ReorderBufferAssignChild() in SnapBuildProcessNewCid() or its
   caller. That ought to also fix the bug. It also has the advantage
   that we can save some memory in transactions that have some, but
   fewer than the ASSIGNMENT limit, subtransactions, because it allows
   us to avoid having a separate base snapshot for them
   (c.f. ReorderBufferTransferSnapToParent()). Like 1) that could be
   combined with adding an early return when < SNAPBUILD_FULL_SNAPSHOT,
   after ReorderBufferXidSetCatalogChanges(), but I don't think it'd be
   required for correctness in contrast to 1).

Both of these would have the advantage that we only would track
additional information for transactions that have modified the catalog,
whereas the proposal to process ASSIGNMENT earlier would mean that we
additionally track all transactions with more than 64 children. So
provided that I didn't mis-analyze here, I think both of my
alternatives are preferable? I think 2) is simpler?

> /*
> - * No point in doing anything yet, data could not be decoded anyway. It's
> - * ok not to call ReorderBufferProcessXid() in that case, except in the
> - * assignment case there'll not be any later records with the same xid;
> - * and in the assignment case we'll not decode those xacts.
> + * If the snapshot isn't yet fully built, we cannot decode anything, so
> + * bail out.
> + *
> + * However, it's critical to process XLOG_XACT_ASSIGNMENT records even
> + * when the snapshot is being built: it is possible to get later records
> + * that require subxids to be properly assigned.
> */

I think I would want this comment to be slightly more expansive. It's
not exactly obvious why such records would exist, at least to me. I
can't quite come up with something much shorter than the above
braindump right now however. I'll try to come up with something more
concise. Probably worthwhile to add somewhere, even if we go for one of
my alternative proposals.

This actually made me look at the nearby changes due to

commit 9c7d06d60680c7f00d931233873dee81fdb311c6
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   2018-01-17 11:38:34 +0000

    Ability to advance replication slots

and uhm, I'm not sure they're fully baked. Something like:

    /*
     * If we don't have snapshot or we are just fast-forwarding, there is no
     * point in decoding changes.
     */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
        ctx->fast_forward)
        return;

    case XLOG_HEAP2_MULTI_INSERT:
        if (!ctx->fast_forward &&
            SnapBuildProcessChange(builder, xid, buf->origptr))
            DecodeMultiInsert(ctx, buf);
        break;

is not very suggestive of that (note the second check).

And related to the point of the theorizing above, I don't think
skipping XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a
NEW_CID record does not imply an invalidation message, as discussed
above, we'll afaict compute wrong snapshots when such transactions are
encountered during forwarding. And we'll then log those snapshots to
disk. Which then means the slot cannot safely be used for actual
decoding anymore - as we'll use that snapshot when starting up decoding
without fast forwarding.

Greetings,

Andres Freund
Hi,

On 27. 07. 19 3:15, Andres Freund wrote:
> Hi,
>
> Petr, Simon, see the potential issue related to fast forward at the
> bottom.
>
> [..snip..]
>
> This actually made me look at the nearby changes due to
>
> commit 9c7d06d60680c7f00d931233873dee81fdb311c6
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Date:   2018-01-17 11:38:34 +0000
>
>     Ability to advance replication slots
>
> and uhm, I'm not sure they're fully baked. Something like:
>
>     /*
>      * If we don't have snapshot or we are just fast-forwarding, there is no
>      * point in decoding changes.
>      */
>     if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
>         ctx->fast_forward)
>         return;
>
>     case XLOG_HEAP2_MULTI_INSERT:
>         if (!ctx->fast_forward &&
>             SnapBuildProcessChange(builder, xid, buf->origptr))
>             DecodeMultiInsert(ctx, buf);
>         break;
>
> is not very suggestive of that (note the second check).

You mean that it's redundant, yeah... although given your next point,
see below.

> And related to the point of the theorizing above, I don't think
> skipping XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a
> NEW_CID record does not imply an invalidation message as discussed
> above, we'll afaict compute wrong snapshots when such transactions are
> encountered during forwarding. And we'll then log those snapshots to
> disk. Which then means the slot cannot safely be used for actual
> decoding anymore - as we'll use that snapshot when starting up
> decoding without fast forwarding.

Hmm, I guess that's true. I think I had convinced myself that the CID
does not matter outside of this transaction, but since we might
actually share the computed snapshot via file save/restore with other
slots, any non-fast-forwarding decoding that reads the same transaction
could miss the CID thanks to the shared snapshot which does not include
it.

Given that we don't process any other records in this function besides
XLOG_HEAP2_MULTI_INSERT and XLOG_HEAP2_NEW_CID, it seems like the
simplest fix is to just remove the first check for fast forward and
keep the one in XLOG_HEAP2_MULTI_INSERT.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
On Mon, 29 Jul 2019 at 00:09, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Given that we don't process any other records in this function besides
> XLOG_HEAP2_MULTI_INSERT and XLOG_HEAP2_NEW_CID, it seems like simplest
> fix is to just remove the first check for fast forward and keep the one
> in XLOG_HEAP2_MULTI_INSERT.
Fix proposed by Petr, with comments as explained by Andres.
Attachment
On 2019-Jul-26, Andres Freund wrote:

> 2) We could simply assign the subtransaction to the parent using
>    ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
>    caller. That ought to also fix the bug
>
>    I also has the advantage that we can save some memory in transactions
>    that have some, but fewer than the ASSIGNMENT limit subtransactions,
>    because it allows us to avoid having a separate base snapshot for
>    them (c.f. ReorderBufferTransferSnapToParent()).

I'm not sure I understood this suggestion correctly. I first tried with
this, which seems the simplest rendition:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -772,6 +772,12 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
+	if ((SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) &&
+		(xlrec->top_xid != xid))
+	{
+		ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
+	}
+
 	/*
 	 * we only log new_cid's if a catalog tuple was modified, so mark the
 	 * transaction as containing catalog modifications

test_decoding's tests pass with that, but if I try the example script
provided by Ildar, all pgbench clients die with this:

client 19 script 1 aborted in command 1 query 0: ERROR:  subtransaction logged without previous top-level txn record

I thought I would create the main txn before calling AssignChild in
snapbuild; however, ReorderBufferTXNByXid is static in reorderbuffer.c.
So that seems out.

My next try was to remove the elog() that was causing the failure ...
but that leads pretty quickly to a crash with this backtrace:

#2  0x00005653241fb823 in ExceptionalCondition (
    conditionName=conditionName@entry=0x5653243c1960 "!(prev_first_lsn < cur_txn->first_lsn)",
    errorType=errorType@entry=0x565324250596 "FailedAssertion",
    fileName=fileName@entry=0x5653243c18e8 "/pgsql/source/master/src/backend/replication/logical/reorderbuffer.c",
    lineNumber=lineNumber@entry=680)
    at /pgsql/source/master/src/backend/utils/error/assert.c:54
#3  0x0000565324062a84 in AssertTXNLsnOrder (rb=rb@entry=0x565326304fa8)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:680
#4  0x0000565324062e39 in ReorderBufferTXNByXid (rb=rb@entry=0x565326304fa8,
    xid=<optimized out>, xid@entry=185613, create=create@entry=true,
    is_new=is_new@entry=0x0, lsn=lsn@entry=2645271944,
    create_as_top=create_as_top@entry=true)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:559
#5  0x0000565324067365 in ReorderBufferAddNewTupleCids (rb=0x565326304fa8,
    xid=185613, lsn=lsn@entry=2645271944, node=..., tid=..., cmin=0,
    cmax=4294967295, combocid=4294967295)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:2100
#6  0x0000565324069451 in SnapBuildProcessNewCid (builder=0x56532630afd8,
    xid=185614, lsn=2645271944, xlrec=0x5653262efc78)
    at /pgsql/source/master/src/backend/replication/logical/snapbuild.c:787

Now this failure goes away if I relax the < to <= in the
complained-about line ... but at this point it's two sanity checks that
I've lobotomized in order to get this to run at all. Not really
comfortable with that.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi,

On 2019-08-07 16:19:13 -0400, Alvaro Herrera wrote:
> On 2019-Jul-26, Andres Freund wrote:
>
> > 2) We could simply assign the subtransaction to the parent using
> >    ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
> >    caller. That ought to also fix the bug
>
> I'm not sure I understood this suggestion correctly. I first tried with
> this, which seems the simplest rendition:
>
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -772,6 +772,12 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
>  {
>  	CommandId	cid;
>
> +	if ((SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) &&
> +		(xlrec->top_xid != xid))
> +	{
> +		ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
> +	}
> +

I think we would need to do this for all values of
SnapBuildCurrentState() - after all, the problem occurs because we
*previously* didn't assign subxids to the toplevel xid. Compared to the
cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
I don't think there's any problem just calling it unconditionally (when
top_xid <> xid, of course).

If the above is the only change, I think the body of the if should be
unreachable, DecodeHeap2Op guards against that:

    /*
     * If we don't have snapshot or we are just fast-forwarding, there is no
     * point in decoding changes.
     */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
        ctx->fast_forward)
        return;

> I thought I would create the main txn before calling AssignChild in
> snapbuild; however, ReorderBufferTXNByXid is static in reorderbuffer.c.
> So that seems out.

There shouldn't be any need for doing that, ReorderBufferAssignChild
does it.

Greetings,

Andres Freund
On 2019-Aug-07, Andres Freund wrote:

> I think we would need to do this for all values of
> SnapBuildCurrentState() - after all the problem occurs because we
> *previously* didn't assign subxids to the toplevel xid. Compared to the
> cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
> I don't think there's any problem just calling it unconditionally (when
> top_xid <> xid, of course).

BTW I wrote the code as suggested and it passes all the tests ... but I
then noticed that the unpatched code doesn't fail Ildar's original
pgbench-based test for me, either. So maybe my laptop is not powerful
enough to reproduce it, or maybe I'm doing something wrong.

I'm tempted to just push it, since it seems "obviously" more correct
than the original.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Aug-07, Andres Freund wrote:
>
> > I think we would need to do this for all values of
> > SnapBuildCurrentState() - after all the problem occurs because we
> > *previously* didn't assign subxids to the toplevel xid. Compared to the
> > cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
> > I don't think there's any problem just calling it unconditionally (when
> > top_xid <> xid, of course).
>
> BTW I wrote the code as suggested and it passes all the tests ... but I
> then noticed that the unpatched code doesn't fail Ildar's original
> pgbench-based test for me, either. So maybe my laptop is not powerful
> enough to reproduce it, or maybe I'm doing something wrong.

If you share the patch fixing this issue I'll test it on my environment
where I could reproduce the original problem.

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On 2019-Aug-19, Masahiko Sawada wrote:

> On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> > BTW I wrote the code as suggested and it passes all the tests ... but I
> > then noticed that the unpatched code doesn't fail Ildar's original
> > pgbench-based test for me, either. So maybe my laptop is not powerful
> > enough to reproduce it, or maybe I'm doing something wrong.
>
> If you share the patch fixing this issue I'll test it on my
> environment where I could reproduce the original problem.

Never mind. I was able to reproduce it later, and verify that Andres'
proposed strategy doesn't seem to fix the problem. I'm going to study
the problem again today.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Aug-19, Masahiko Sawada wrote:
>
> > On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > > BTW I wrote the code as suggested and it passes all the tests ... but I
> > > then noticed that the unpatched code doesn't fail Ildar's original
> > > pgbench-based test for me, either. So maybe my laptop is not powerful
> > > enough to reproduce it, or maybe I'm doing something wrong.
> >
> > If you share the patch fixing this issue I'll test it on my
> > environment where I could reproduce the original problem.
>
> Never mind. I was able to reproduce it later, and verify that Andres'
> proposed strategy doesn't seem to fix the problem. I'm going to study
> the problem again today.

Could you post the patch?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi,

On 2019-08-19 08:51:43 -0700, Andres Freund wrote:
> On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > On 2019-Aug-19, Masahiko Sawada wrote:
> >
> > > On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > >
> > > > BTW I wrote the code as suggested and it passes all the tests ... but I
> > > > then noticed that the unpatched code doesn't fail Ildar's original
> > > > pgbench-based test for me, either. So maybe my laptop is not powerful
> > > > enough to reproduce it, or maybe I'm doing something wrong.
> > >
> > > If you share the patch fixing this issue I'll test it on my
> > > environment where I could reproduce the original problem.
> >
> > Never mind. I was able to reproduce it later, and verify that Andres'
> > proposed strategy doesn't seem to fix the problem. I'm going to study
> > the problem again today.
>
> Could you post the patch?

Ping?

Greetings,

Andres Freund
Hello,

On 2019-Sep-06, Andres Freund wrote:

> On 2019-08-19 08:51:43 -0700, Andres Freund wrote:
> > On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > Never mind. I was able to reproduce it later, and verify that Andres'
> > > proposed strategy doesn't seem to fix the problem. I'm going to study
> > > the problem again today.
> >
> > Could you post the patch?

Here's a couple of patches.

always_decode_assignment.patch is Masahiko Sawada's patch, which has
been confirmed to fix the assertion failure.

assign-child.patch is what I understood you were proposing -- namely to
assign the subxid to the top-level xid on NEW_CID. In order for it to
work at all, I had to remove a different safety check; but the assertion
still hits when running Ildar's test case. So the patch doesn't
actually fix anything. And I think it makes sense that it fails, since
the first thing that's happening in this patch is that we create both
the top-level xact and the subxact with the same LSN value, which is
what triggers the assertion in the first place.

It's possible that I misunderstood what you were suggesting. If you
want to propose a different fix, be my guest, but failing that I'm
inclined to push always_decode_assignment.patch sometime before the end
of the week.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2019-Sep-10, Alvaro Herrera from 2ndQuadrant wrote:

> Here's a couple of patches.
>
> always_decode_assignment.patch is Masahiko Sawada's patch, which has
> been confirmed to fix the assertion failure.

I pushed this one to all branches. Thanks Ildar for reporting, and
Sawada-san for fixing, and reviewers.

If you (Andres) want to propose a different fix for this, be my guest.
We can always revert this one if you have a different, better fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2019-09-13 16:42:47 -0300, Alvaro Herrera wrote:
> On 2019-Sep-10, Alvaro Herrera from 2ndQuadrant wrote:
>
> > Here's a couple of patches.
> >
> > always_decode_assignment.patch is Masahiko Sawada's patch, which has
> > been confirmed to fix the assertion failure.
>
> I pushed this one to all branches. Thanks Ildar for reporting and
> Sawada-san for fixing, and reviewers.
>
> If you (Andres) want to propose a different fix for this, be my guest.
> We can always revert this one if you have a different better fix.

I'm a bit surprised to see this go in just now, after I asked for the
changes you were reporting as not working for three weeks, and you sent
them out three days ago (during which I was at the Linux Plumbers
Conference)...

Greetings,

Andres Freund
On 2019-Jul-26, Andres Freund wrote:

> Petr, Simon, see the potential issue related to fast forward at the
> bottom.

I think we neglected this bit. I looked at the patch Simon submitted
downthread, and while I vaguely understand that we need to process
NEW_CID records during fast-forwarding, I don't quite understand why we
can still skip XLOG_INVALIDATION messages. I *think* we should process
those too. Here's a patch that also contains that change; I also
reworded Simon's proposed comment. I appreciate reviews.

Thoughts? Relevant extracts from Andres' message below.

> Thinking about it, it was not immediately clear to me that it is
> necessary to process XLOG_HEAP2_NEW_CID at that stage. We only need the
> cid mapping when decoding the content of the transaction that the
> XLOG_HEAP2_NEW_CID record was about - which will not happen if it
> started before SNAPBUILD_FULL.
>
> Except that they also are responsible for signalling that a transaction
> performed catalog modifications (cf. the
> ReorderBufferXidSetCatalogChanges() call), which in turn is important
> for SnapBuildCommitTxn() to know whether that transaction needs to be
> included in historic snapshots.
>
> So unless I am missing something - which is entirely possible, I've had
> this code thoroughly swapped out - that means that we only need to
> process XLOG_HEAP2_NEW_CID < SNAPBUILD_FULL if there can be
> transactions with relevant catalog changes that don't have any
> invalidation messages.
>
> After thinking about it for a bit, that's not guaranteed however. For
> one, even for system catalog tables, looking at
> CacheInvalidateHeapTuple() et al, there can be catalog modifications
> that create neither a snapshot invalidation message nor a catcache one.
> There's also the remote scenario that we might possibly be modifying
> only a toast relation.
>
> But more importantly, the only modified table could be a user-defined
> catalog table (i.e. WITH (user_catalog_table = true)). Which in all
> likelihood won't cause invalidation messages. So I think currently it
> is required to process NEW_CID records - although we don't need to
> actually execute the ReorderBufferAddNewTupleCids() etc. calls.

[...]

> And related to the point of the theorizing above, I don't think
> skipping XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a
> NEW_CID record does not imply an invalidation message, as discussed
> above, we'll afaict compute wrong snapshots when such transactions are
> encountered during forwarding. And we'll then log those snapshots to
> disk. Which then means the slot cannot safely be used for actual
> decoding anymore - as we'll use that snapshot when starting up decoding
> without fast forwarding.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, Jan 31, 2020 at 12:35 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Jul-26, Andres Freund wrote:
>
> > Petr, Simon, see the potential issue related to fast forward at the
> > bottom.
>
> I think we neglected this bit. I looked at the patch Simon submitted
> downthread, and while I vaguely understand that we need to process
> NEW_CID records during fast-forwarding,

Right. IIUC, that is mainly to mark that the txn has catalog changes
(ReorderBufferXidSetCatalogChanges) so that later such a transaction
can be used to build historic snapshots (during SnapBuildCommitTxn).
Now, if that is true, then won't we need a similar change in
DecodeHeapOp for the XLOG_HEAP_INPLACE case as well? Also, I am not
sure whether SnapBuildProcessNewCid needs to create the Cid map in such
a case.

> I don't quite understand why we
> still can skip XLOG_INVALIDATION messages. I *think* we should process
> those too.

I also think so. If you are allowing invalidations to be executed
irrespective of fast_forward in DecodeStandbyOp, then why not do the
same in DecodeCommit, where we add ReorderBufferAddInvalidations?

> Here's a patch that also contains that change; I also
> reworded Simon's proposed comment. I appreciate reviews.

Even though, in theory, the changes look to be in the right direction,
it would be better if we could create a test case that reproduces the
problem. I am not sure, but I think we need to generate a few DDLs in
the transaction we fast-forward over; DMLs that depend on those DDLs'
WAL should then create some problem, because we will have skipped
executing invalidations and adding such a transaction to the historic
snapshot.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com