Re: Timeline following for logical slots - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Timeline following for logical slots |
Date | |
Msg-id | 20160404063917.GD2431@awork2.anarazel.de Whole thread Raw |
In response to | Re: Timeline following for logical slots (Craig Ringer <craig@2ndquadrant.com>) |
List | pgsql-hackers |
On 2016-04-04 14:18:53 +0800, Craig Ringer wrote: > I want to go over the rest of your reply in more detail, but regarding this > and the original comment, I'm going by: > > if (start_lsn == InvalidXLogRecPtr) > { > /* continue from last position */ > start_lsn = slot->data.confirmed_flush; > } > .... > ctx = StartupDecodingContext(output_plugin_options, > start_lsn, InvalidTransactionId, > read_page, prepare_write, do_write); > > ... in CreateDecodingContext(), which in turn does ... > > ctx->snapshot_builder = > AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn); > > ... in StartupDecodingContext(...). > > The snapshot builder controls when the snapshot builder returns > SNAPBUILD_CONSISTENT, therefore when we start returning decoded commits to > the client. Right? > We pass InvalidXLogRecPtr as the start_lsn > in pg_logical_slot_get_changes_guts(...). > > So, when I wrote that: > > /* > * We start reading xlog from the restart lsn, even though in > * CreateDecodingContext we set the snapshot builder up using the > * slot's confirmed_flush. This means we might read xlog we don't > * actually decode rows from, but the snapshot builder might need it > * to get to a consistent point. The point we start returning data > to > * *users* at is the confirmed_flush lsn set up in the decoding > * context. > */ > > I don't see what's wrong there at all. They're independent. What the snapshot builder gets as the start position isn't the same as where we start reading WAL. > We do "start reading xlog from the slot's restart_lsn". That's what gets > passed to set up the xlogreader. That's where we read the first xlog > records from. > > In CreateDecodingContext we _do_ "set the snapshot builder up using the > slot's confirmed_flush" [unless overridden by explicit argument to > CreateDecodingContext, which isn't given here]. Yes. But again, that hasn't really got anything to do with where we read WAL from. We always have to read older WAL than were confirmed_flush is set to; to assemble all the changes from transactions that are in-progress at the point of confirmed_flush. > This is presumably what you're taking issue with: > > * This means we might read xlog we don't > * actually decode rows from, but the snapshot builder might need it > * to get to a consistent point. > > and would be better worded as something like: > > This means we will read and decode xlog from before the point we actually > start returning decoded commits to the client, but the snapshot builder may > need this additional xlog to get to a consistent point. > right? No. The snapshot builder really hasn't gotten that much to do with things. The fundamental thing we have to do is re-assemble all in-progress transactions (as in reorderbuffer.c, not snapbuild.c). > I think > > The point we start returning data to > * *users* at is the confirmed_flush lsn set up in the decoding > * context. > > is still right. > > What I was proposing instead is, in pg_logical_slot_get_changes_guts, > changing: > > ctx = CreateDecodingContext(InvalidXLogRecPtr, > options, > logical_read_local_xlog_page, > LogicalOutputPrepareWrite, > LogicalOutputWrite); > > so that instead of InvalidXLogRecPtr we explicitly pass > > MyReplicationSlot->data.confirmed_flush; > > thus making it way easier to see what's going on without having to chase > way deeper to figure out why data isn't returned from the restart_lsn. > Where, y'know, you'd expect decoding to restart from... and where it does > restart from, just not where it resumes sending output to the client from. I just don't buy this. It's absolutely fundamental to understand that the commit LSN isn't the point where all the data of transactions is stored. Andres
pgsql-hackers by date: