Thread: Duplicated LSN in ReorderBuffer

Duplicated LSN in ReorderBuffer

From
Ildar Musin
Date:
Hi hackers,

I believe we found a bug in logical decoding. It only occures with
casserts enabled. It was originally discovered and reproduced by Murat
Kabilov and Ildus Kurbangaliev. Here is the stacktrace we've got:

#0  0x00007facc66ef82f in raise () from /usr/lib/libc.so.6
#1  0x00007facc66da672 in abort () from /usr/lib/libc.so.6
#2  0x0000000000ac4ebf in ExceptionalCondition (
    conditionName=0xccdea8 "!(prev_first_lsn < cur_txn->first_lsn)",
    errorType=0xccdce4 "FailedAssertion", fileName=0xccdd38 "reorderbuffer.c",
    lineNumber=680) at assert.c:54
#3  0x00000000008a9515 in AssertTXNLsnOrder (rb=0x25ca128) at reorderbuffer.c:680
#4  0x00000000008a900f in ReorderBufferTXNByXid (rb=0x25ca128, xid=65609, create=true,
    is_new=0x0, lsn=211590864, create_as_top=true) at reorderbuffer.c:559
#5  0x00000000008abf0d in ReorderBufferAddNewTupleCids (rb=0x25ca128, xid=65609,
    lsn=211590864, node=..., tid=..., cmin=0, cmax=4294967295, combocid=4294967295)
    at reorderbuffer.c:2098
#6  0x00000000008b096b in SnapBuildProcessNewCid (builder=0x25d0158, xid=65610,
    lsn=211590864, xlrec=0x25d60b8) at snapbuild.c:781
#7  0x000000000089d01c in DecodeHeap2Op (ctx=0x25ba0b8, buf=0x7ffd0e294da0) at decode.c:382
#8  0x000000000089c8ca in LogicalDecodingProcessRecord (ctx=0x25ba0b8, record=0x25ba378)
    at decode.c:125
#9  0x00000000008a124c in DecodingContextFindStartpoint (ctx=0x25ba0b8) at logical.c:492
#10 0x00000000008b9c3d in CreateReplicationSlot (cmd=0x257be20) at walsender.c:957
#11 0x00000000008baa60 in exec_replication_command (
    cmd_string=0x24f5b08 "CREATE_REPLICATION_SLOT temp_slot_name TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1531
#12 0x0000000000937230 in PostgresMain (argc=1, argv=0x25233a8, dbname=0x2523380 "postgres",
    username=0x24f23c8 "zilder") at postgres.c:4245
#13 0x0000000000881453 in BackendRun (port=0x251a900) at postmaster.c:4431
#14 0x0000000000880b4f in BackendStartup (port=0x251a900) at postmaster.c:4122
#15 0x000000000087cbbe in ServerLoop () at postmaster.c:1704
#16 0x000000000087c34a in PostmasterMain (argc=3, argv=0x24f0330) at postmaster.c:1377
#17 0x00000000007926b6 in main (argc=3, argv=0x24f0330) at main.c:228

After viewing coredump we see that
`prev_first_lsn == cur_txn->first_lsn`

The problem seems to be that ReorderBuffer adds two ReorderBufferTXNs
with the same LSN, but different transaction ids: subxid and top-level
xid. See FIX part below.


STEPS TO REPRODUCE
------------------

We were able reproduce it on 10, 11 and on master branch. Postgres was
configured as:

./configure --enable-cassert CFLAGS='-ggdb3 -O0' --prefix=$HOME/pg12

Additional options in postgresql.conf:

wal_level='logical'
max_connections=1000
max_replication_slots=100
max_wal_senders=100
max_logical_replication_workers=100

pgbench scripts:

$ cat create_table.sql
BEGIN;
SAVEPOINT p1;
CREATE temp TABLE t_t (id INT) ON COMMIT DROP;
ROLLBACK TO SAVEPOINT p1;
ROLLBACK;

$ cat create_slot.sql
BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
SELECT pg_create_logical_replication_slot('test' || pg_backend_pid(), 'pgoutput', true);
SELECT pg_drop_replication_slot('test' || pg_backend_pid());
ROLLBACK;

Run in parallel terminals:

$HOME/pg12/bin/pgbench postgres -f create_table.sql -T1000 -c50 -j50
$HOME/pg12/bin/pgbench postgres -f create_slot.sql -T1000 -c50 -j50

It may take some time. On my local machine it breaks in few seconds.


FIX?
----

Can't say that i have enough understanding of what's going on in the
logical decoding code. But the one thing i've noticed is inconsistency
of xids used to make ReorderBufferTXNByXid() call:

1. first, in DecodeHeap2Op() function ReorderBufferProcessXid() is
called with subtransaction id; it actually creates ReorderBufferTXN
and adds it to reorder buffer's hash table and toplevel_by_lsn list;
2. second, within ReorderBufferXidSetCatalogChanges() it uses same
subxid to lookup the ReorderBufferTXN that was created before,
successfully;
3. now in ReorderBufferAddNewTupleCids() it uses top-level transaction
id instead for lookup; it cannot find xid in hash table and tries to
add a new record with the same LSN. And it fails since this LSN is
already in toplevel_by_lsn list.

Attached is a simple patch that uses subxid instead of top-level xid
in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
i'm not sure that this is a valid change. Can someone please verify it
or maybe suggest a better solution for the issue?

Best regards,
Ildar
Attachment

Re: Duplicated LSN in ReorderBuffer

From
Thomas Munro
Date:
On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin <ildar@adjust.com> wrote:
> Attached is a simple patch that uses subxid instead of top-level xid
> in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> i'm not sure that this is a valid change. Can someone please verify it
> or maybe suggest a better solution for the issue?

Hello Ildar,

I hope someone more familiar with this code than me can comment, but
while going through the Commitfest CI results I saw this segfault with
your patch:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/555184304

At a glance, HistoricSnapshotGetTupleCids() returned NULL in
HeapTupleSatisfiesHistoricMVCC(), so ResolveCminCmaxDuringDecoding()
blew up.

-- 
Thomas Munro
https://enterprisedb.com



Re: Duplicated LSN in ReorderBuffer

From
Masahiko Sawada
Date:
On Mon, Jul 8, 2019 at 9:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin <ildar@adjust.com> wrote:
> > Attached is a simple patch that uses subxid instead of top-level xid
> > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> > i'm not sure that this is a valid change. Can someone please verify it
> > or maybe suggest a better solution for the issue?
>

I've reproduced this issue with script Ildar provided. I don't find
out the root cause yet and I'm not sure the patch takes a correct way
to fix this.

In my environment, I got the following pg_waldump output and the
logical decoding failed at 0/019FA058 when processing NEW_CID.

  90489 rmgr: Transaction len (rec/tot):     38/    38, tx:
1999, lsn: 0/019F9E80, prev 0/019F9E38, desc: ASSIGNMENT xtop 1998:
subxacts: 1999
  90490 rmgr: Standby     len (rec/tot):    405/   405, tx:
0, lsn: 0/019F9EA8, prev 0/019F9E80, desc: RUNNING_XACTS nextXid 2000
latestCompletedXid 1949 oldestRunningXid 1836; 48 xacts: 1990 1954
1978 1850 1944 1972 1940 1924 1906 1970 1985 1998 1966 1987 1975 1858
1914 1982 1958 1840 1920 1926 1992 1962 1
  90490 910 1950 1874 1928 1974 1968 1946 1912 1918 1996 1922 1930
1964 1952 1994 1934 1980 1836 1984 1960 1956 1916 1908 1938
  90491 rmgr: Heap2       len (rec/tot):     60/    60, tx:
1999, lsn: 0/019FA058, prev 0/019F9EA8, desc: NEW_CID rel
1663/12678/2615; tid 11/59; cmin: 0, cmax: 4294967295, combo:
4294967295
  90492 rmgr: Heap        len (rec/tot):    127/   127, tx:
1999, lsn: 0/019FA098, prev 0/019FA058, desc: INSERT off 59 flags
0x00, blkref #0: rel 1663/12678/2615 blk 11

I thought that the logical decoding doesn't create ReorderBufferTXN of
xid=1999 when processing NEW_CID since it decodes ASSIGNMENT of
xid=1999 beforehand. But what actually happen is that it skips NEW_CID
since the state of snapshot builder is SNAPBUILD_BUILDING_SNAPSHOT yet
and then the state becomes SNAPBUILD_FULL_SNAPSHOT when processing
RUNNING_XACTS , and therefore it creates two ReorderBufferTXN entries
for xid = 1999 and xid = 1998 as top-level transactions when
processing NEW_CID (ReorderBufferXidSetCatalogChanges creates xid=1999
and ReorderBufferAddNewTupleCids creates xid = 1998). And therefore it
got the assertion failure when adding ReorderBufferTXN of xid = 1998.

I'll look into this more deeply tomorrow.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Duplicated LSN in ReorderBuffer

From
Masahiko Sawada
Date:
On Mon, Jul 8, 2019 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jul 8, 2019 at 9:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> >
> > On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin <ildar@adjust.com> wrote:
> > > Attached is a simple patch that uses subxid instead of top-level xid
> > > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> > > i'm not sure that this is a valid change. Can someone please verify it
> > > or maybe suggest a better solution for the issue?
> >
>
> I've reproduced this issue with script Ildar provided. I don't find
> out the root cause yet and I'm not sure the patch takes a correct way
> to fix this.
>
> In my environment, I got the following pg_waldump output and the
> logical decoding failed at 0/019FA058 when processing NEW_CID.
>
>   90489 rmgr: Transaction len (rec/tot):     38/    38, tx:
> 1999, lsn: 0/019F9E80, prev 0/019F9E38, desc: ASSIGNMENT xtop 1998:
> subxacts: 1999
>   90490 rmgr: Standby     len (rec/tot):    405/   405, tx:
> 0, lsn: 0/019F9EA8, prev 0/019F9E80, desc: RUNNING_XACTS nextXid 2000
> latestCompletedXid 1949 oldestRunningXid 1836; 48 xacts: 1990 1954
> 1978 1850 1944 1972 1940 1924 1906 1970 1985 1998 1966 1987 1975 1858
> 1914 1982 1958 1840 1920 1926 1992 1962 1
>   90490 910 1950 1874 1928 1974 1968 1946 1912 1918 1996 1922 1930
> 1964 1952 1994 1934 1980 1836 1984 1960 1956 1916 1908 1938
>   90491 rmgr: Heap2       len (rec/tot):     60/    60, tx:
> 1999, lsn: 0/019FA058, prev 0/019F9EA8, desc: NEW_CID rel
> 1663/12678/2615; tid 11/59; cmin: 0, cmax: 4294967295, combo:
> 4294967295
>   90492 rmgr: Heap        len (rec/tot):    127/   127, tx:
> 1999, lsn: 0/019FA098, prev 0/019FA058, desc: INSERT off 59 flags
> 0x00, blkref #0: rel 1663/12678/2615 blk 11
>
> I thought that the logical decoding doesn't create ReorderBufferTXN of
> xid=1999 when processing NEW_CID since it decodes ASSIGNMENT of
> xid=1999 beforehand. But what actually happen is that it skips NEW_CID
> since the state of snapshot builder is SNAPBUILD_BUILDING_SNAPSHOT yet
> and then the state becomes SNAPBUILD_FULL_SNAPSHOT when processing
> RUNNING_XACTS , and therefore it creates two ReorderBufferTXN entries
> for xid = 1999 and xid = 1998 as top-level transactions when
> processing NEW_CID (ReorderBufferXidSetCatalogChanges creates xid=1999
> and ReorderBufferAddNewTupleCids creates xid = 1998).

I think the cause of this bug would be that a ReorderBufferTXN entry
of sub transaction is created as top-level transaction. And this
happens because we skip to decode ASSIGNMENT during the state of
snapshot builder < SNAPBUILD_FULL.

@@ -778,7 +778,7 @@ SnapBuildProcessNewCid(SnapBuild *builder,
TransactionId xid,
  */
  ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

- ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+ ReorderBufferAddNewTupleCids(builder->reorder, xid, lsn,
  xlrec->target_node, xlrec->target_tid,
  xlrec->cmin, xlrec->cmax,
  xlrec->combocid);

The above change in the proposed patch changes SnapBuildProcessNewCid
so that it passes sub transaction id instead of top transaction id to
ReorderBufferAddNewTupleCids that adds a (relfilenode, tid) -> (cmin,
cmax) mapping to the transaction. But I think the fix is not correct
since as the comment of ReorderBufferTXN describes, the mappings are
always assigned to the top-level transaction.

in reorderbuffer.h,
    /*
     * List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
     * Those are always assigned to the toplevel transaction. (Keep track of
     * #entries to create a hash of the right size)
     */
    dlist_head  tuplecids;
    uint64      ntuplecids;

Instead, I wonder if we can decode ASSIGNMENT even when the state of
snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
ReorderBufferTXN entries of both top transaction and sub transaction
are created properly before we decode NEW_CID.

Attached patch do that. In my environment the issue seems to be fixed
but I'm still not confident that this is the right fix. Please review
it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Duplicated LSN in ReorderBuffer

From
Alvaro Herrera
Date:
On 2019-Jul-09, Masahiko Sawada wrote:

> I think the cause of this bug would be that a ReorderBufferTXN entry
> of sub transaction is created as top-level transaction. And this
> happens because we skip to decode ASSIGNMENT during the state of
> snapshot builder < SNAPBUILD_FULL.

That explanation seems to make sense.

> Instead, I wonder if we can decode ASSIGNMENT even when the state of
> snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
> ReorderBufferTXN entries of both top transaction and sub transaction
> are created properly before we decode NEW_CID.

Yeah, that seems a sensible remediation to me.

I would reduce the scope a little bit -- only create the assignment in
the BUILDING state, and skip it in the START state.  I'm not sure that
it's possible to get assignments while in START state that are
significant (I'm still trying to digest SnapBuildFindSnapshot).

I would propose the attached.  Andres, do you have an opinion on this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Duplicated LSN in ReorderBuffer

From
Andres Freund
Date:
Hi,

Petr, Simon, see the potential issue related to fast forward at the
bottom.


On 2019-07-26 18:46:35 -0400, Alvaro Herrera wrote:
> On 2019-Jul-09, Masahiko Sawada wrote:
>
> > I think the cause of this bug would be that a ReorderBufferTXN entry
> > of sub transaction is created as top-level transaction. And this
> > happens because we skip to decode ASSIGNMENT during the state of
> > snapshot builder < SNAPBUILD_FULL.
>
> That explanation seems to make sense.

Yea. The comment that "in the assignment case we'll not decode those
xacts" is true, but it misses that we *do* currently process
XLOG_HEAP2_NEW_CID records for transactions that started before reaching
FULL_SNAPSHOT.

Thinking about it, it was not immediately clear to me that it is
necessary to process XLOG_HEAP2_NEW_CID at that stage. We only need the
cid mapping when decoding content of the transaction that the
XLOG_HEAP2_NEW_CID record was about - which will not happen if it
started before SNAPBUILD_FULL.

Except that they also are responsible for signalling that a transaction
performed catalog modifications (cf ReorderBufferXidSetCatalogChanges()
call), which in turn is important for SnapBuildCommitTxn() to know
whether to include that transaction needs to be included in historic
snapshots.

So unless I am missing something - which is entirely possible, I've had
this code thoroughly swapped out - that means that we only need to
process XLOG_HEAP2_NEW_CID < SNAPBUILD_FULL if there can be transactions
with relevant catalog changes, that don't have any invalidations
messages.

After thinking about it for a bit, that's not guaranteed however. For
one, even for system catalog tables, looking at
CacheInvalidateHeapTuple() et al there can be catalog modifications that
create neither a snapshot invalidation message, nor a catcache
one. There's also the remote scenario that we possibly might be only
modifying a toast relation.

But more importantly, the only modified table could be a user defined
catalog table (i.e. WITH (user_catalog_table = true)). Which in all
likelihood won't cause invalidation messages. So I think currently it is
required to process NEW_ID records - although we don't need to actually
execute the ReorderBufferAddNewTupleCids() etc calls.

Perhaps the right fix for the future would actually be to not rely on on
NEW_ID for recognizing transactions as such, but instead have an xact.c
marker that signals whether a transaction performed catalog
modifications.

Hm, need to think more about this.


> > Instead, I wonder if we can decode ASSIGNMENT even when the state of
> > snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
> > ReorderBufferTXN entries of both top transaction and sub transaction
> > are created properly before we decode NEW_CID.
>
> Yeah, that seems a sensible remediation to me.

That does seems like a reasonable approach. I can see two alternatives:

1) Change SnapBuildProcessNewCid()'s ReorderBufferXidSetCatalogChanges()
   call to reference the toplevel xid. That has the disadvantage that if
   the subtransaction that performed DDL rolls back, the main
   transaction will still be treated as a catalog transaction - i have a
   hard time seeing that being common, however.

   That'd then also require SnapBuildProcessNewCid() in
   SNAPBUILD_FULL_SNAPSHOT to return before processing any data assigned
   to subtransactions. Which would be good, because that's currently
   unnecessarily stored in memory.

2) We could simply assign the subtransaction to the parent using
   ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
   caller. That ought to also fix the bug

   I also has the advantage that we can save some memory in transactions
   that have some, but fewer than the ASSIGNMENT limit subtransactions,
   because it allows us to avoid having a separate base snapshot for
   them (c.f. ReorderBufferTransferSnapToParent()).

   Like 1) that could be combined with adding an early return when <
   SNAPBUILD_FULL_SNAPSHOT, after ReorderBufferXidSetCatalogChanges(),
   but I don't think it'd be required for correctness in contrast to 1).

Both of these would have the advantage that we only would track
additional information for transactions that have modified the catalog,
whereas the proposal to process ASSIGNMENT earlier, would mean that we
additionally track all transactions with more than 64 children.  So
provided that I didn't mis-analyze here, I think both of my alternatives
are preferrable? I think 2) is simpler?


>      /*
> -     * No point in doing anything yet, data could not be decoded anyway. It's
> -     * ok not to call ReorderBufferProcessXid() in that case, except in the
> -     * assignment case there'll not be any later records with the same xid;
> -     * and in the assignment case we'll not decode those xacts.
> +     * If the snapshot isn't yet fully built, we cannot decode anything, so
> +     * bail out.
> +     *
> +     * However, it's critical to process XLOG_XACT_ASSIGNMENT records even
> +     * when the snapshot is being built: it is possible to get later records
> +     * that require subxids to be properly assigned.
>       */

I think I would want this comment to be slightly more expansive. It's
not exactly obvious why such records would exist, at least to me. I
can't quite come up with something much shorter than the above braindump
right now however. I'll try to come up with something more concise.
Probably worthwhile to add somewhere, even if we go for one of my
alternative proposals.


This actually made me look at the nearby changes due to

commit 9c7d06d60680c7f00d931233873dee81fdb311c6
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   2018-01-17 11:38:34 +0000

    Ability to advance replication slots

and uhm, I'm not sure they're fully baked. Something like:

    /*
     * If we don't have snapshot or we are just fast-forwarding, there is no
     * point in decoding changes.
     */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
        ctx->fast_forward)
        return;

        case XLOG_HEAP2_MULTI_INSERT:
            if (!ctx->fast_forward &&
                SnapBuildProcessChange(builder, xid, buf->origptr))
                DecodeMultiInsert(ctx, buf);
            break;

is not very suggestive of that (note the second check).


And related to the point of the theorizing above, I don't think skipping
XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a NEW_CID
record does not imply an invalidation message as discussed above, we'll
afaict compute wrong snapshots when such transactions are encountered
during forwarding. And we'll then log those snapshots to disk. Which
then means the slot cannot safely be used for actual decoding anymore -
as we'll use that snapshot when starting up decoding without fast
forwarding.

Greetings,

Andres Freund



Re: Duplicated LSN in ReorderBuffer

From
Petr Jelinek
Date:
Hi,

On 27. 07. 19 3:15, Andres Freund wrote:
> Hi,
> 
> Petr, Simon, see the potential issue related to fast forward at the
> bottom.
> 
> [..snip..]
> 
> This actually made me look at the nearby changes due to
> 
> commit 9c7d06d60680c7f00d931233873dee81fdb311c6
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Date:   2018-01-17 11:38:34 +0000
> 
>      Ability to advance replication slots
> 
> and uhm, I'm not sure they're fully baked. Something like:
> 
>     /*
>      * If we don't have snapshot or we are just fast-forwarding, there is no
>      * point in decoding changes.
>      */
>     if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
>         ctx->fast_forward)
>         return;
> 
>         case XLOG_HEAP2_MULTI_INSERT:
>             if (!ctx->fast_forward &&
>                 SnapBuildProcessChange(builder, xid, buf->origptr))
>                 DecodeMultiInsert(ctx, buf);
>             break;
> 
> is not very suggestive of that (note the second check).
> 

You mean that it's redundant, yeah.., although given your next point, 
see bellow.

> 
> And related to the point of the theorizing above, I don't think skipping
> XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a NEW_CID
> record does not imply an invalidation message as discussed above, we'll
> afaict compute wrong snapshots when such transactions are encountered
> during forwarding. And we'll then log those snapshots to disk. Which
> then means the slot cannot safely be used for actual decoding anymore -
> as we'll use that snapshot when starting up decoding without fast
> forwarding.
> 

Hmm, I guess that's true. I think I have convinced myself that CID does 
not matter outside of this transaction, but since we might actually 
share the computed snapshot via file save/restore with other slots, any 
non-fast-forwarding decoding that reads the same transaction could miss 
the CID thanks to the shared snapshot which does not include it.

Given that we don't process any other records in this function besides 
XLOG_HEAP2_MULTI_INSERT and XLOG_HEAP2_NEW_CID, it seems like simplest 
fix is to just remove the first check for fast forward and keep the one 
in XLOG_HEAP2_MULTI_INSERT.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: Duplicated LSN in ReorderBuffer

From
Simon Riggs
Date:
On Mon, 29 Jul 2019 at 00:09, Petr Jelinek <petr@2ndquadrant.com> wrote:
 
Given that we don't process any other records in this function besides
XLOG_HEAP2_MULTI_INSERT and XLOG_HEAP2_NEW_CID, it seems like simplest
fix is to just remove the first check for fast forward and keep the one
in XLOG_HEAP2_MULTI_INSERT.

Fix proposed by Petr, with comments as explained by Andres.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Solutions for the Enterprise
Attachment

Re: Duplicated LSN in ReorderBuffer

From
Alvaro Herrera
Date:
On 2019-Jul-26, Andres Freund wrote:

> 2) We could simply assign the subtransaction to the parent using
>    ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
>    caller. That ought to also fix the bug
> 
>    I also has the advantage that we can save some memory in transactions
>    that have some, but fewer than the ASSIGNMENT limit subtransactions,
>    because it allows us to avoid having a separate base snapshot for
>    them (c.f. ReorderBufferTransferSnapToParent()).

I'm not sure I understood this suggestion correctly.  I first tried with
this, which seems the simplest rendition:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -772,6 +772,12 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
     CommandId    cid;
 
+    if ((SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) &&
+        (xlrec->top_xid != xid))
+    {
+        ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
+    }
+
     /*
      * we only log new_cid's if a catalog tuple was modified, so mark the
      * transaction as containing catalog modifications

test_decoding's tests pass with that, but if I try the example script
provided by Ildar, all pgbench clients die with this:

client 19 script 1 aborted in command 1 query 0: ERROR:  subtransaction logged without previous top-level txn record

I thought I would create the main txn before calling AssignChild in
snapbuild; however, ReorderBufferTXNByXid is static in reorderbuffer.c.
So that seems out.  My next try was to remove the elog() that was
causing the failure ... but that leads pretty quickly to a crash with
this backtrace:

#2  0x00005653241fb823 in ExceptionalCondition (conditionName=conditionName@entry=0x5653243c1960 "!(prev_first_lsn <
cur_txn->first_lsn)",
 
    errorType=errorType@entry=0x565324250596 "FailedAssertion", 
    fileName=fileName@entry=0x5653243c18e8 "/pgsql/source/master/src/backend/replication/logical/reorderbuffer.c", 
    lineNumber=lineNumber@entry=680) at /pgsql/source/master/src/backend/utils/error/assert.c:54
#3  0x0000565324062a84 in AssertTXNLsnOrder (rb=rb@entry=0x565326304fa8)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:680
#4  0x0000565324062e39 in ReorderBufferTXNByXid (rb=rb@entry=0x565326304fa8, xid=<optimized out>, xid@entry=185613,
create=create@entry=true,
 
    is_new=is_new@entry=0x0, lsn=lsn@entry=2645271944, create_as_top=create_as_top@entry=true)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:559
#5  0x0000565324067365 in ReorderBufferAddNewTupleCids (rb=0x565326304fa8, xid=185613, lsn=lsn@entry=2645271944,
node=...,tid=..., cmin=0, 
 
    cmax=4294967295, combocid=4294967295) at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:2100
#6  0x0000565324069451 in SnapBuildProcessNewCid (builder=0x56532630afd8, xid=185614, lsn=2645271944,
xlrec=0x5653262efc78)
    at /pgsql/source/master/src/backend/replication/logical/snapbuild.c:787

Now this failure goes away if I relax the < to <= in the
complained-about line ... but at this point it's two sanity checks that
I've lobotomized in order to get this to run at all.  Not really
comfortable with that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Duplicated LSN in ReorderBuffer

From
Andres Freund
Date:
Hi,

On 2019-08-07 16:19:13 -0400, Alvaro Herrera wrote:
> On 2019-Jul-26, Andres Freund wrote:
> 
> > 2) We could simply assign the subtransaction to the parent using
> >    ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
> >    caller. That ought to also fix the bug
> > 
> >    I also has the advantage that we can save some memory in transactions
> >    that have some, but fewer than the ASSIGNMENT limit subtransactions,
> >    because it allows us to avoid having a separate base snapshot for
> >    them (c.f. ReorderBufferTransferSnapToParent()).
> 
> I'm not sure I understood this suggestion correctly.  I first tried with
> this, which seems the simplest rendition:
> 
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -772,6 +772,12 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
>  {
>      CommandId    cid;
>  
> +    if ((SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) &&
> +        (xlrec->top_xid != xid))
> +    {
> +        ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
> +    }
> +

I think we would need to do this for all values of
SnapBuildCurrentState() - after all the problem occurs because we
*previously* didn't assign subxids to the toplevel xid.  Compared to the
cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
I don't think there's any problem just calling it unconditionally (when
top_xid <> xid, of course).

If the above is the only change, I think the body of the if should be
unreachable, DecodeHeap2Op guards against that:

    /*
     * If we don't have snapshot or we are just fast-forwarding, there is no
     * point in decoding changes.
     */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
        ctx->fast_forward)
        return;


> I thought I would create the main txn before calling AssignChild in
> snapbuild; however, ReorderBufferTXNByXid is static in reorderbuffer.c.
> So that seems out.

There shouldn't be any need for doing that, ReorderBufferAssignChild
does that.

Greetings,

Andres Freund



Re: Duplicated LSN in ReorderBuffer

From
Alvaro Herrera
Date:
On 2019-Aug-07, Andres Freund wrote:

> I think we would need to do this for all values of
> SnapBuildCurrentState() - after all the problem occurs because we
> *previously* didn't assign subxids to the toplevel xid.  Compared to the
> cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
> I don't think there's any problem just calling it unconditionally (when
> top_xid <> xid, of course).

BTW I wrote the code as suggested and it passes all the tests ... but I
then noticed that the unpatched code doesn't fail Ildar's original
pgbench-based test for me, either.  So maybe my laptop is not powerful
enough to reproduce it, or maybe I'm doing something wrong.

I'm tempted to just push it, since it seems "obviously" more correct
than the original.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Duplicated LSN in ReorderBuffer

From
Masahiko Sawada
Date:
On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Aug-07, Andres Freund wrote:
>
> > I think we would need to do this for all values of
> > SnapBuildCurrentState() - after all the problem occurs because we
> > *previously* didn't assign subxids to the toplevel xid.  Compared to the
> > cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
> > I don't think there's any problem just calling it unconditionally (when
> > top_xid <> xid, of course).
>
> BTW I wrote the code as suggested and it passes all the tests ... but I
> then noticed that the unpatched code doesn't fail Ildar's original
> pgbench-based test for me, either.  So maybe my laptop is not powerful
> enough to reproduce it, or maybe I'm doing something wrong.

If you share the patch fixing this issue I'll test it on my
environment where I could reproduce the original problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Duplicated LSN in ReorderBuffer

From
Alvaro Herrera
Date:
On 2019-Aug-19, Masahiko Sawada wrote:

> On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> > BTW I wrote the code as suggested and it passes all the tests ... but I
> > then noticed that the unpatched code doesn't fail Ildar's original
> > pgbench-based test for me, either.  So maybe my laptop is not powerful
> > enough to reproduce it, or maybe I'm doing something wrong.
> 
> If you share the patch fixing this issue I'll test it on my
> environment where I could reproduce the original problem.

Never mind.  I was able to reproduce it later, and verify that Andres'
proposed strategy doesn't seem to fix the problem.  I'm going to study
the problem again today.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Duplicated LSN in ReorderBuffer

From
Andres Freund
Date:
Hi,

On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>On 2019-Aug-19, Masahiko Sawada wrote:
>
>> On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera
><alvherre@2ndquadrant.com> wrote:
>
>> > BTW I wrote the code as suggested and it passes all the tests ...
>but I
>> > then noticed that the unpatched code doesn't fail Ildar's original
>> > pgbench-based test for me, either.  So maybe my laptop is not
>powerful
>> > enough to reproduce it, or maybe I'm doing something wrong.
>>
>> If you share the patch fixing this issue I'll test it on my
>> environment where I could reproduce the original problem.
>
>Never mind.  I was able to reproduce it later, and verify that Andres'
>proposed strategy doesn't seem to fix the problem.  I'm going to study
>the problem again today.

Could you post the patch?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Duplicated LSN in ReorderBuffer

From
Andres Freund
Date:
Hi,

On 2019-08-19 08:51:43 -0700, Andres Freund wrote:
> On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >On 2019-Aug-19, Masahiko Sawada wrote:
> >
> >> On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera
> ><alvherre@2ndquadrant.com> wrote:
> >
> >> > BTW I wrote the code as suggested and it passes all the tests ...
> >but I
> >> > then noticed that the unpatched code doesn't fail Ildar's original
> >> > pgbench-based test for me, either.  So maybe my laptop is not
> >powerful
> >> > enough to reproduce it, or maybe I'm doing something wrong.
> >> 
> >> If you share the patch fixing this issue I'll test it on my
> >> environment where I could reproduce the original problem.
> >
> >Never mind.  I was able to reproduce it later, and verify that Andres'
> >proposed strategy doesn't seem to fix the problem.  I'm going to study
> >the problem again today.
> 
> Could you post the patch?

Ping?

Greetings,

Andres Freund



Re: Duplicated LSN in ReorderBuffer

From
Alvaro Herrera from 2ndQuadrant
Date:
Hello,

On 2019-Sep-06, Andres Freund wrote:

> On 2019-08-19 08:51:43 -0700, Andres Freund wrote:
> > On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> > >Never mind.  I was able to reproduce it later, and verify that Andres'
> > >proposed strategy doesn't seem to fix the problem.  I'm going to study
> > >the problem again today.
> > 
> > Could you post the patch?

Here's a couple of patches.

always_decode_assignment.patch is Masahiko Sawada's patch, which has
been confirmed to fix the assertion failure.

assign-child.patch is what I understood you were proposing -- namely to
assign the subxid to the top-level xid on NEW_CID.  In order for it to
work at all, I had to remove a different safety check; but the assertion
still hits when running Ildar's test case.  So the patch doesn't
actually fix anything.  And I think it makes sense that it fails, since
the first thing that's happening in this patch is that we create both
the top-level xact and the subxact with the same LSN value, which is
what triggers the assertion in the first place.  It's possible that I
misunderstood what you were suggesting.

If you want to propose a different fix, be my guest, but failing that
I'm inclined to push always_decode_assignment.patch sometime before the
end of the week.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Duplicated LSN in ReorderBuffer

From
Alvaro Herrera
Date:
On 2019-Sep-10, Alvaro Herrera from 2ndQuadrant wrote:

> Here's a couple of patches.
> 
> always_decode_assignment.patch is Masahiko Sawada's patch, which has
> been confirmed to fix the assertion failure.

I pushed this one to all branches.  Thanks Ildar for reporting and
Sawada-san for fixing, and reviewers.

If you (Andres) want to propose a different fix for this, be my guest.
We can always revert this one if you have a different better fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Duplicated LSN in ReorderBuffer

From
Andres Freund
Date:
Hi,

On 2019-09-13 16:42:47 -0300, Alvaro Herrera wrote:
> On 2019-Sep-10, Alvaro Herrera from 2ndQuadrant wrote:
> 
> > Here's a couple of patches.
> > 
> > always_decode_assignment.patch is Masahiko Sawada's patch, which has
> > been confirmed to fix the assertion failure.
> 
> I pushed this one to all branches.  Thanks Ildar for reporting and
> Sawada-san for fixing, and reviewers.
> 
> If you (Andres) want to propose a different fix for this, be my guest.
> We can always revert this one if you have a different better fix.

I'm bit surprised to see this go in just now, after I asked for the
changes you were reporting as not working for three weeks, and you sent
them out three days ago (during which I was at the linux plumbers
conference)...

Greetings,

Andres Freund



Re: Duplicated LSN in ReorderBuffer

From
Alvaro Herrera
Date:
On 2019-Jul-26, Andres Freund wrote:

> Petr, Simon, see the potential issue related to fast forward at the
> bottom.

I think we neglected this bit.  I looked at the patch Simon submitted
downthread, and while I vaguely understand that we need to process
NEW_CID records during fast-forwarding, I don't quite understand why we
still can skip XLOG_INVALIDATION messages.  I *think* we should process
those too.  Here's a patch that also contains that change; I also
reworded Simon's proposed comment.  I appreciate reviews.

Thoughts?  Relevant extracts from Andres' message below.

> Thinking about it, it was not immediately clear to me that it is
> necessary to process XLOG_HEAP2_NEW_CID at that stage. We only need the
> cid mapping when decoding content of the transaction that the
> XLOG_HEAP2_NEW_CID record was about - which will not happen if it
> started before SNAPBUILD_FULL.
> 
> Except that they also are responsible for signalling that a transaction
> performed catalog modifications (cf ReorderBufferXidSetCatalogChanges()
> call), which in turn is important for SnapBuildCommitTxn() to know
> whether to include that transaction needs to be included in historic
> snapshots.
> 
> So unless I am missing something - which is entirely possible, I've had
> this code thoroughly swapped out - that means that we only need to
> process XLOG_HEAP2_NEW_CID < SNAPBUILD_FULL if there can be transactions
> with relevant catalog changes, that don't have any invalidations
> messages.
> 
> After thinking about it for a bit, that's not guaranteed however. For
> one, even for system catalog tables, looking at
> CacheInvalidateHeapTuple() et al there can be catalog modifications that
> create neither a snapshot invalidation message, nor a catcache
> one. There's also the remote scenario that we possibly might be only
> modifying a toast relation.
> 
> But more importantly, the only modified table could be a user defined
> catalog table (i.e. WITH (user_catalog_table = true)). Which in all
> likelihood won't cause invalidation messages. So I think currently it is
> required to process NEW_ID records - although we don't need to actually
> execute the ReorderBufferAddNewTupleCids() etc calls.

[...]

> And related to the point of the theorizing above, I don't think skipping
> XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a NEW_CID
> record does not imply an invalidation message as discussed above, we'll
> afaict compute wrong snapshots when such transactions are encountered
> during forwarding. And we'll then log those snapshots to disk. Which
> then means the slot cannot safely be used for actual decoding anymore -
> as we'll use that snapshot when starting up decoding without fast
> forwarding.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Duplicated LSN in ReorderBuffer

From
Amit Kapila
Date:
On Fri, Jan 31, 2020 at 12:35 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>
> On 2019-Jul-26, Andres Freund wrote:
>
> > Petr, Simon, see the potential issue related to fast forward at the
> > bottom.
>
> I think we neglected this bit.  I looked at the patch Simon submitted
> downthread, and while I vaguely understand that we need to process
> NEW_CID records during fast-forwarding,
>

Right, IIUC, that is mainly to mark txn has catalog changes
(ReorderBufferXidSetCatalogChanges) so that later such a transaction
can be used to build historic snapshots (during SnapBuildCommitTxn).
Now, if that is true, then won't we need a similar change in
DecodeHeapOp for XLOG_HEAP_INPLACE case as well?  Also, I am not sure
if  SnapBuildProcessNewCid needs to create Cid map in such a case.


> I don't quite understand why we
> still can skip XLOG_INVALIDATION messages.  I *think* we should process
> those too.
>

I also think so.  If you are allowing to execute invalidation
irrespective of fast_forward in DecodeStandbyOp, then why not do the
same in DecodeCommit where we add
ReorderBufferAddInvalidations?

>   Here's a patch that also contains that change; I also
> reworded Simon's proposed comment.  I appreciate reviews.
>

Even though, in theory, the changes look to be in the right direction,
but it is better if we can create a test case to reproduce the
problem.  I am not sure, but I think we need to generate a few DDLs
for the transaction for which we want to fast forward and then after
moving forward the DMLs dependent on those WAL should create some
problem as we have skipped executing invalidations and addition of
such a transaction in the historic snapshot.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com