Re: Timeline following for logical slots - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Timeline following for logical slots
Date
Msg-id CAMsr+YHUP2VaFek6NEXGCGVMsvf_m7RtkR0Wnpz_5daUxn4cvQ@mail.gmail.com
Whole thread Raw
In response to Re: Timeline following for logical slots  (Andres Freund <andres@anarazel.de>)
Responses Re: Timeline following for logical slots  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 1 April 2016 at 14:46, Andres Freund <andres@anarazel.de> wrote:
 

> You're thinking from the perspective of someone who knows this code
> intimately.

Maybe. But that's not addressed by adding half-true comments to places
they only belong to peripherally. And not to all the relevant places,
since you've not added the same comment to StartLogicalReplication().

I want to go over the rest of your reply in more detail, but regarding this and the original comment, I'm going by:

    if (start_lsn == InvalidXLogRecPtr)
    {
        /* continue from last position */
        start_lsn = slot->data.confirmed_flush;
    }
    ....
    ctx = StartupDecodingContext(output_plugin_options,
                                 start_lsn, InvalidTransactionId,
                                 read_page, prepare_write, do_write);

...  in CreateDecodingContext(), which in turn does ...

    ctx->snapshot_builder =
        AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn);

... in StartupDecodingContext(...).

The snapshot builder controls when the snapshot builder returns SNAPBUILD_CONSISTENT, therefore when we start returning decoded commits to the client. Right?

We pass InvalidXLogRecPtr as the start_lsn in pg_logical_slot_get_changes_guts(...). 

So, when I wrote that:

        /*
         * We start reading xlog from the restart lsn, even though in
         * CreateDecodingContext we set the snapshot builder up using the
         * slot's confirmed_flush. This means we might read xlog we don't
         * actually decode rows from, but the snapshot builder might need it
         * to get to a consistent point. The point we start returning data to
         * *users* at is the confirmed_flush lsn set up in the decoding
         * context.
         */

I don't see what's wrong there at all.

We do "start reading xlog from the slot's restart_lsn". That's what gets passed to set up the xlogreader. That's where we read the first xlog records from.

In CreateDecodingContext we _do_ "set the snapshot builder up using the slot's confirmed_flush" [unless overridden by explicit argument to CreateDecodingContext, which isn't given here]. 

This is presumably what you're taking issue with:

         * This means we might read xlog we don't
         * actually decode rows from, but the snapshot builder might need it
         * to get to a consistent point.

and would be better worded as something like:

This means we will read and decode xlog from before the point we actually start returning decoded commits to the client, but the snapshot builder may need this additional xlog to get to a consistent point.

right?

I think

            The point we start returning data to
         * *users* at is the confirmed_flush lsn set up in the decoding
         * context.

is still right.

What I was proposing instead is, in pg_logical_slot_get_changes_guts, changing:

        ctx = CreateDecodingContext(InvalidXLogRecPtr,
                                    options,
                                    logical_read_local_xlog_page,
                                    LogicalOutputPrepareWrite,
                                    LogicalOutputWrite);

so that instead of InvalidXLogRecPtr we explicitly pass

   MyReplicationSlot->data.confirmed_flush;

thus making it way easier to see what's going on without having to chase way deeper to figure out why data isn't returned from the restart_lsn. Where, y'know, you'd expect decoding to restart from... and where it does restart from, just not where it resumes sending output to the client from.

> Speaking of which, did you see the proposed README I sent for
> src/backend/replication/logical ?

I skimmed it. But given we have a CF full of patches, some submitted
over a year ago, it seems unfair to spend time on a patch submitted a
few days ago

Of course. I wasn't expecting to wedge this into the release, nor do I see any value in doing so. I just thought you'd probably rather know about it than not and if it was obviously wrong it'd be good to know. I've found it very hard to get my head around how logical decoding's parts fit together, and now that there are some other folks in 2ndQ getting involved  it seemed like a good idea to start working on making that easier for the next guy.

Thanks for linking to your old docs. While a bit outdated as you noted, they'll be simple to update and would be really valuable to have in a README where they're discoverable.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Rajkumar Raghuwanshi
Date:
Subject: postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
Next
From: Andres Freund
Date:
Subject: Re: pgbench more operators & functions