Re: Potential data loss due to race condition during logical replication slot creation - Mailing list pgsql-bugs
From | Masahiko Sawada |
---|---|
Subject | Re: Potential data loss due to race condition during logical replication slot creation |
Date | |
Msg-id | CAD21AoCf+PiedCR5ODquL=tnp_2+nj_jemydnyu+KwPQ8beovA@mail.gmail.com Whole thread Raw |
In response to | Re: Potential data loss due to race condition during logical replication slot creation (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Potential data loss due to race condition during logical replication slot creation
|
List | pgsql-bugs |
On Thu, Jun 20, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Apr 16, 2024 at 9:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Pondering further, I came across the question; in what case we would > > need to restore the snapshot and jump to the consistent state when > > finding the initial start point? > > > > When creating a slot, in ReplicationSlotReserveWal(), we determine the > > restart_lsn and write a RUNNING_XACT record. This record would be the > > first RUNNING_XACT record that the logical decoding decodes in most > > cases. > > > > This probably won't be true on standby as we don't LOG RUNNING_XACT > record in that case. Yes, on the standby we wait for the RUNNING_XACT record before starting actual decoding. > > > In SnapBuildFindSnapshot(), if the record satisfies (a) (i.e. > > running->oldestRunningXid == running->nextXid), it can jump to the > > consistent state. If not, it means there were running transactions at > > that time of the RUNNING_XACT record being produced. Therefore, we > > must not restore the snapshot and jump to the consistent state. > > Because otherwise, we end up deciding the start point in the middle of > > a transaction that started before the RUNNING_XACT record. Probably > > the same is true for all subsequent snapbuild states. > > > > One thing to consider on this is that we can only restore the snapshot > if by that time some other get changes (in your case, step > "s2_get_changes_slot0" as in your draft patch) would have serialized > the consistent snapshot at that LSN. One could question if we can't > reach a consistent state at a particular LSN then why in the first > place the snapshot has been serialized at that LSN? The answer could > be that it is okay to use such a serialized snapshot after initial > snapshot creation because we know that the restart_lsn of a slot in > such cases would be a location where we won't see the data for partial > transactions. True. After the CreateInitDecodingContext() the restart_lsn is just an arbitrary LSN in a sense; either the latest replay LSN or the current insertion LSN. We need to find an LSN for the start point that corresponds to that restart_lsn. So I think we need to find the start point while disabling snapshot restores. > > > I might be missing something but I could not find the case where we > > can or want to restore the serialized snapshot when finding the > > initial start point. If my analysis is correct, we can go with either > > (a) or (c) I proposed before[1]. Between these two options, it also > > could be an option that (a) is for backbranches for safety and (c) is > > for master. > > > > The approach (a) has a downside, it will lead to tracking more > transactions (non-catalog) than required without any benefit for the > user. Considering that is true, I wouldn't prefer that approach. Yes, it will lead to tracking non-catalog-change transactions as well. If there are many subtransactions, the overhead could be noticeable. But it happens only once when creating a slot. Another variant of (a) is that we skip snapshot restores if the initial_xmin_hirizon is a valid transaction id. The initia_xmin_horizon is always set to a valida transaction id when initializing the decoding context, e.g. during CreateInitDecodingContext(). That way, we don't need to track non-catalog-change transctions. A downside is that this approach assumes that DecodingContextFindStartpoint() is called with the decoding context created by CreateInitDecodingContxt(), which is true in the core codes, but might not be true in third party extensions. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-bugs by date:
Previous
From: Michael PaquierDate:
Subject: Re: BUG #18499: Reindexing spgist index concurrently triggers Assert("TransactionIdIsValid(state->myXid)")
Next
From: Alvaro HerreraDate:
Subject: Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943