Home > mailing lists

Re: Potential data loss due to race condition during logical replication slot creation - Mailing list pgsql-bugs

From	Masahiko Sawada
Subject	Re: Potential data loss due to race condition during logical replication slot creation
Date	June 21, 2024 06:46:18
Msg-id	CAD21AoCf+PiedCR5ODquL=tnp_2+nj_jemydnyu+KwPQ8beovA@mail.gmail.com Whole thread Raw
In response to	Re: Potential data loss due to race condition during logical replication slot creation (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Potential data loss due to race condition during logical replication slot creation
List	pgsql-bugs

Tree view

On Thu, Jun 20, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 16, 2024 at 9:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Pondering further, I came across the question; in what case we would
> > need to restore the snapshot and jump to the consistent state when
> > finding the initial start point?
> >
> > When creating a slot, in ReplicationSlotReserveWal(), we determine the
> > restart_lsn and write a RUNNING_XACT record. This record would be the
> > first RUNNING_XACT record that the logical decoding decodes in most
> > cases.
> >
>
> This probably won't be true on standby as we don't LOG RUNNING_XACT
> record in that case.

Yes, on the standby we wait for the RUNNING_XACT record before
starting actual decoding.

>
> > In SnapBuildFindSnapshot(), if the record satisfies (a) (i.e.
> > running->oldestRunningXid == running->nextXid), it can jump to the
> > consistent state. If not, it means there were running transactions at
> > that time of the RUNNING_XACT record being produced. Therefore, we
> > must not restore the snapshot and jump to the consistent state.
> > Because otherwise, we end up deciding the start point in the middle of
> > a transaction that started before the RUNNING_XACT record. Probably
> > the same is true for all subsequent snapbuild states.
> >
>
> One thing to consider on this is that we can only restore the snapshot
> if by that time some other get changes (in your case, step
> "s2_get_changes_slot0" as in your draft patch) would have serialized
> the consistent snapshot at that LSN. One could question if we can't
> reach a consistent state at a particular LSN then why in the first
> place the snapshot has been serialized at that LSN? The answer could
> be that it is okay to use such a serialized snapshot after initial
> snapshot creation because we know that the restart_lsn of a slot in
> such cases would be a location where we won't see the data for partial
> transactions.

True. After the CreateInitDecodingContext() the restart_lsn is just an
arbitrary LSN in a sense; either the latest replay LSN or the current
insertion LSN. We need to find an LSN for the start point that
corresponds to that restart_lsn. So I think we need to find the start
point while disabling snapshot restores.

>
> > I might be missing something but I could not find the case where we
> > can or want to restore the serialized snapshot when finding the
> > initial start point. If my analysis is correct, we can go with either
> > (a) or (c) I proposed before[1]. Between these two options, it also
> > could be an option that (a) is for backbranches for safety and (c) is
> > for master.
> >
>
> The approach (a) has a downside, it will lead to tracking more
> transactions (non-catalog) than required without any benefit for the
> user. Considering that is true, I wouldn't prefer that approach.

Yes, it will lead to tracking non-catalog-change transactions as well.
If there are many subtransactions, the overhead could be noticeable.
But it happens only once when creating a slot.

Another variant of (a) is that we skip snapshot restores if the
initial_xmin_hirizon is a valid transaction id. The
initia_xmin_horizon is always set to a valida transaction id when
initializing the decoding context, e.g. during
CreateInitDecodingContext(). That way, we don't need to track
non-catalog-change transctions. A downside is that this approach
assumes that DecodingContextFindStartpoint() is called with the
decoding context created by CreateInitDecodingContxt(), which is true
in the core codes, but might not be true in third party extensions.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

pgsql-bugs by date:

From: Michael Paquier
Date: 21 June 2024, 01:54:09
Subject: Re: BUG #18499: Reindexing spgist index concurrently triggers Assert("TransactionIdIsValid(state->myXid)")

From: Alvaro Herrera
Date: 21 June 2024, 12:39:11
Subject: Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943

Re: Potential data loss due to race condition during logical replication slot creation - Mailing list pgsql-bugs

Previous

Next