On Tue, Dec 1, 2020 at 12:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> So what caused it to skip due to start_decoding_at? Because the commit
> where the snapshot became consistent is after Prepare. Does it happen
> due to the below code in SnapBuildFindSnapshot() where we bump
> start_decoding_at.
>
> {
> ...
> if (running->oldestRunningXid == running->nextXid)
> {
> if (builder->start_decoding_at == InvalidXLogRecPtr ||
> builder->start_decoding_at <= lsn)
> /* can decode everything after this */
> builder->start_decoding_at = lsn + 1;
I think the reason is that in the function
DecodingContextFindStartpoint(), the code
loops till it finds the consistent snapshot. Then once consistent
snapshot is found, it sets
slot->data.confirmed_flush = ctx->reader->EndRecPtr; This will be used
as the start_decoding_at when the slot is
restarted for decoding.
> Sure, but you can see in your example above it got skipped due to
> start_decoding_at not due to DecodingContextReady. So, the problem as
> mentioned by me previously was how we distinguish those cases because
> it can skip due to start_decoding_at during restart as well when we
> would have already sent the prepare to the subscriber.
The distinguishing factor is that at restart, the Prepare does satisfy
DecodingContextReady (because the snapshot is consistent then).
In both cases, the prepare is prior to start_decoding_at, but when the
prepare is before a consistent point,
it does not satisfy DecodingContextReady. Which is why I suggested
using the check DecodingContextReady to mark the prepare as 'Not
decoded".
regards,
Ajin Cherian
Fujitsu Australia