Re: Duplicated LSN in ReorderBuffer - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: Duplicated LSN in ReorderBuffer
Date
Msg-id 6ba31f29-5f93-5abb-55fa-350667b452b7@2ndquadrant.com
Whole thread Raw
In response to Re: Duplicated LSN in ReorderBuffer  (Andres Freund <andres@anarazel.de>)
Responses Re: Duplicated LSN in ReorderBuffer  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 27. 07. 19 3:15, Andres Freund wrote:
> Hi,
> 
> Petr, Simon, see the potential issue related to fast forward at the
> bottom.
> 
> [..snip..]
> 
> This actually made me look at the nearby changes due to
> 
> commit 9c7d06d60680c7f00d931233873dee81fdb311c6
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Date:   2018-01-17 11:38:34 +0000
> 
>      Ability to advance replication slots
> 
> and uhm, I'm not sure they're fully baked. Something like:
> 
>     /*
>      * If we don't have snapshot or we are just fast-forwarding, there is no
>      * point in decoding changes.
>      */
>     if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
>         ctx->fast_forward)
>         return;
> 
>         case XLOG_HEAP2_MULTI_INSERT:
>             if (!ctx->fast_forward &&
>                 SnapBuildProcessChange(builder, xid, buf->origptr))
>                 DecodeMultiInsert(ctx, buf);
>             break;
> 
> is not very suggestive of that (note the second check).
> 

You mean that it's redundant, yeah.., although given your next point, 
see bellow.

> 
> And related to the point of the theorizing above, I don't think skipping
> XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a NEW_CID
> record does not imply an invalidation message as discussed above, we'll
> afaict compute wrong snapshots when such transactions are encountered
> during forwarding. And we'll then log those snapshots to disk. Which
> then means the slot cannot safely be used for actual decoding anymore -
> as we'll use that snapshot when starting up decoding without fast
> forwarding.
> 

Hmm, I guess that's true. I think I have convinced myself that CID does 
not matter outside of this transaction, but since we might actually 
share the computed snapshot via file save/restore with other slots, any 
non-fast-forwarding decoding that reads the same transaction could miss 
the CID thanks to the shared snapshot which does not include it.

Given that we don't process any other records in this function besides 
XLOG_HEAP2_MULTI_INSERT and XLOG_HEAP2_NEW_CID, it seems like simplest 
fix is to just remove the first check for fast forward and keep the one 
in XLOG_HEAP2_MULTI_INSERT.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Commitfest 2019-07, the first of five* for PostgreSQL 13
Next
From: Thomas Munro
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs