Re: Timeline following for logical slots - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Timeline following for logical slots
Date
Msg-id 20160404063917.GD2431@awork2.anarazel.de
Whole thread Raw
In response to Re: Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On 2016-04-04 14:18:53 +0800, Craig Ringer wrote:
> I want to go over the rest of your reply in more detail, but regarding this
> and the original comment, I'm going by:
> 
>     if (start_lsn == InvalidXLogRecPtr)
>     {
>         /* continue from last position */
>         start_lsn = slot->data.confirmed_flush;
>     }
>     ....
>     ctx = StartupDecodingContext(output_plugin_options,
>                                  start_lsn, InvalidTransactionId,
>                                  read_page, prepare_write, do_write);
> 
> ...  in CreateDecodingContext(), which in turn does ...
> 
>     ctx->snapshot_builder =
>         AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn);
> 
> ... in StartupDecodingContext(...).
> 
> The snapshot builder controls when the snapshot builder returns
> SNAPBUILD_CONSISTENT, therefore when we start returning decoded commits to
> the client. Right?

> We pass InvalidXLogRecPtr as the start_lsn
> in pg_logical_slot_get_changes_guts(...).
> 
> So, when I wrote that:
> 
>         /*
>          * We start reading xlog from the restart lsn, even though in
>          * CreateDecodingContext we set the snapshot builder up using the
>          * slot's confirmed_flush. This means we might read xlog we don't
>          * actually decode rows from, but the snapshot builder might need it
>          * to get to a consistent point. The point we start returning data
> to
>          * *users* at is the confirmed_flush lsn set up in the decoding
>          * context.
>          */
> 
> I don't see what's wrong there at all.

They're independent. What the snapshot builder gets as the start
position isn't the same as where we start reading WAL.


> We do "start reading xlog from the slot's restart_lsn". That's what gets
> passed to set up the xlogreader. That's where we read the first xlog
> records from.
> 
> In CreateDecodingContext we _do_ "set the snapshot builder up using the
> slot's confirmed_flush" [unless overridden by explicit argument to
> CreateDecodingContext, which isn't given here].

Yes. But again, that hasn't really got anything to do with where we read
WAL from. We always have to read older WAL than were confirmed_flush is
set to; to assemble all the changes from transactions that are
in-progress at the point of confirmed_flush.


> This is presumably what you're taking issue with:
> 
>          * This means we might read xlog we don't
>          * actually decode rows from, but the snapshot builder might need it
>          * to get to a consistent point.
> 
> and would be better worded as something like:
> 
> This means we will read and decode xlog from before the point we actually
> start returning decoded commits to the client, but the snapshot builder may
> need this additional xlog to get to a consistent point.

> right?

No. The snapshot builder really hasn't gotten that much to do with
things. The fundamental thing we have to do is re-assemble all
in-progress transactions (as in reorderbuffer.c, not snapbuild.c).


> I think
> 
>             The point we start returning data to
>          * *users* at is the confirmed_flush lsn set up in the decoding
>          * context.
> 
> is still right.
> 
> What I was proposing instead is, in pg_logical_slot_get_changes_guts,
> changing:
> 
>         ctx = CreateDecodingContext(InvalidXLogRecPtr,
>                                     options,
>                                     logical_read_local_xlog_page,
>                                     LogicalOutputPrepareWrite,
>                                     LogicalOutputWrite);
> 
> so that instead of InvalidXLogRecPtr we explicitly pass
> 
>    MyReplicationSlot->data.confirmed_flush;
> 
> thus making it way easier to see what's going on without having to chase
> way deeper to figure out why data isn't returned from the restart_lsn.
> Where, y'know, you'd expect decoding to restart from... and where it does
> restart from, just not where it resumes sending output to the client from.

I just don't buy this. It's absolutely fundamental to understand that
the commit LSN isn't the point where all the data of transactions is
stored.


Andres



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Timeline following for logical slots
Next
From: Andres Freund
Date:
Subject: Re: Timeline following for logical slots