Re: Duplicated LSN in ReorderBuffer - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Duplicated LSN in ReorderBuffer
Date
Msg-id 20190727011508.io6q6sum4tzvopad@alap3.anarazel.de
Whole thread Raw
In response to Re: Duplicated LSN in ReorderBuffer  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Duplicated LSN in ReorderBuffer
Re: Duplicated LSN in ReorderBuffer
Re: Duplicated LSN in ReorderBuffer
List pgsql-hackers
Hi,

Petr, Simon, see the potential issue related to fast forward at the
bottom.


On 2019-07-26 18:46:35 -0400, Alvaro Herrera wrote:
> On 2019-Jul-09, Masahiko Sawada wrote:
>
> > I think the cause of this bug would be that a ReorderBufferTXN entry
> > of sub transaction is created as top-level transaction. And this
> > happens because we skip to decode ASSIGNMENT during the state of
> > snapshot builder < SNAPBUILD_FULL.
>
> That explanation seems to make sense.

Yea. The comment that "in the assignment case we'll not decode those
xacts" is true, but it misses that we *do* currently process
XLOG_HEAP2_NEW_CID records for transactions that started before reaching
FULL_SNAPSHOT.

Thinking about it, it was not immediately clear to me that it is
necessary to process XLOG_HEAP2_NEW_CID at that stage. We only need the
cid mapping when decoding content of the transaction that the
XLOG_HEAP2_NEW_CID record was about - which will not happen if it
started before SNAPBUILD_FULL.

Except that they also are responsible for signalling that a transaction
performed catalog modifications (cf ReorderBufferXidSetCatalogChanges()
call), which in turn is important for SnapBuildCommitTxn() to know
whether to include that transaction needs to be included in historic
snapshots.

So unless I am missing something - which is entirely possible, I've had
this code thoroughly swapped out - that means that we only need to
process XLOG_HEAP2_NEW_CID < SNAPBUILD_FULL if there can be transactions
with relevant catalog changes, that don't have any invalidations
messages.

After thinking about it for a bit, that's not guaranteed however. For
one, even for system catalog tables, looking at
CacheInvalidateHeapTuple() et al there can be catalog modifications that
create neither a snapshot invalidation message, nor a catcache
one. There's also the remote scenario that we possibly might be only
modifying a toast relation.

But more importantly, the only modified table could be a user defined
catalog table (i.e. WITH (user_catalog_table = true)). Which in all
likelihood won't cause invalidation messages. So I think currently it is
required to process NEW_ID records - although we don't need to actually
execute the ReorderBufferAddNewTupleCids() etc calls.

Perhaps the right fix for the future would actually be to not rely on on
NEW_ID for recognizing transactions as such, but instead have an xact.c
marker that signals whether a transaction performed catalog
modifications.

Hm, need to think more about this.


> > Instead, I wonder if we can decode ASSIGNMENT even when the state of
> > snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
> > ReorderBufferTXN entries of both top transaction and sub transaction
> > are created properly before we decode NEW_CID.
>
> Yeah, that seems a sensible remediation to me.

That does seems like a reasonable approach. I can see two alternatives:

1) Change SnapBuildProcessNewCid()'s ReorderBufferXidSetCatalogChanges()
   call to reference the toplevel xid. That has the disadvantage that if
   the subtransaction that performed DDL rolls back, the main
   transaction will still be treated as a catalog transaction - i have a
   hard time seeing that being common, however.

   That'd then also require SnapBuildProcessNewCid() in
   SNAPBUILD_FULL_SNAPSHOT to return before processing any data assigned
   to subtransactions. Which would be good, because that's currently
   unnecessarily stored in memory.

2) We could simply assign the subtransaction to the parent using
   ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
   caller. That ought to also fix the bug

   I also has the advantage that we can save some memory in transactions
   that have some, but fewer than the ASSIGNMENT limit subtransactions,
   because it allows us to avoid having a separate base snapshot for
   them (c.f. ReorderBufferTransferSnapToParent()).

   Like 1) that could be combined with adding an early return when <
   SNAPBUILD_FULL_SNAPSHOT, after ReorderBufferXidSetCatalogChanges(),
   but I don't think it'd be required for correctness in contrast to 1).

Both of these would have the advantage that we only would track
additional information for transactions that have modified the catalog,
whereas the proposal to process ASSIGNMENT earlier, would mean that we
additionally track all transactions with more than 64 children.  So
provided that I didn't mis-analyze here, I think both of my alternatives
are preferrable? I think 2) is simpler?


>      /*
> -     * No point in doing anything yet, data could not be decoded anyway. It's
> -     * ok not to call ReorderBufferProcessXid() in that case, except in the
> -     * assignment case there'll not be any later records with the same xid;
> -     * and in the assignment case we'll not decode those xacts.
> +     * If the snapshot isn't yet fully built, we cannot decode anything, so
> +     * bail out.
> +     *
> +     * However, it's critical to process XLOG_XACT_ASSIGNMENT records even
> +     * when the snapshot is being built: it is possible to get later records
> +     * that require subxids to be properly assigned.
>       */

I think I would want this comment to be slightly more expansive. It's
not exactly obvious why such records would exist, at least to me. I
can't quite come up with something much shorter than the above braindump
right now however. I'll try to come up with something more concise.
Probably worthwhile to add somewhere, even if we go for one of my
alternative proposals.


This actually made me look at the nearby changes due to

commit 9c7d06d60680c7f00d931233873dee81fdb311c6
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   2018-01-17 11:38:34 +0000

    Ability to advance replication slots

and uhm, I'm not sure they're fully baked. Something like:

    /*
     * If we don't have snapshot or we are just fast-forwarding, there is no
     * point in decoding changes.
     */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
        ctx->fast_forward)
        return;

        case XLOG_HEAP2_MULTI_INSERT:
            if (!ctx->fast_forward &&
                SnapBuildProcessChange(builder, xid, buf->origptr))
                DecodeMultiInsert(ctx, buf);
            break;

is not very suggestive of that (note the second check).


And related to the point of the theorizing above, I don't think skipping
XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a NEW_CID
record does not imply an invalidation message as discussed above, we'll
afaict compute wrong snapshots when such transactions are encountered
during forwarding. And we'll then log those snapshots to disk. Which
then means the slot cannot safely be used for actual decoding anymore -
as we'll use that snapshot when starting up decoding without fast
forwarding.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Chengchao Yu
Date:
Subject: RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs
Next
From: Andres Freund
Date:
Subject: warning on reload for PGC_POSTMASTER, guc.c duplication, ...