Re: Possible bug in logical replication. - Mailing list pgsql-hackers
From | Arseny Sher |
---|---|
Subject | Re: Possible bug in logical replication. |
Date | |
Msg-id | 87woutq467.fsf@ars-thinkpad Whole thread Raw |
In response to | Re: Possible bug in logical replication. (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Possible bug in logical replication.
|
List | pgsql-hackers |
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote: >> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote: >> It seems to me that we still want to have the slot forwarding finish in >> this case even if this is interrupted. Petr, isn't that the intention >> here? > > I have been chewing a bit more on the proposed patch, finishing with the > attached to close the loop. Thoughts? Sorry for being pedantic, but it seems to me worthwhile to mention *why* we need decoding machinery at all -- like I wrote: + * While we could just do LogicalConfirmReceivedLocation updating + * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn + * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples). Also, > * The slot's restart_lsn is used as start point for reading records, This is clearly seen from the code, I propose to remove this. > * while confirmed_lsn is used as base point for the decoding context. And as I wrote, this doesn't matter as changes are not produced. > * The LSN position to move to is checked by doing a per-record scan and > * logical decoding which makes sure that confirmed_lsn is updated to a > * LSN which allows the future slot consumer to get consistent logical > - * changes. > + * changes. As decoding is done with fast_forward mode, no changes are > + * actually generated. confirmed_lsn is always updated to `moveto` unless we run out of WAL earlier (and unless we try to move slot backwards, which is obviously forbidden) -- consistent changes are practically irrelevant here. Moreover, we can directly set confirmed_lsn and still have consistent changes further as restart_lsn and xmin of the slot are not touched. What we actually do here is trying to advance *restart_lsn and xmin* as far as we can but up to the point which ensures that decoding can assemble a consistent snapshot allowing to fully decode all COMMITs since updated `confirmed_flush_lsn`. All this happens in SnapBuildProcessRunningXacts. > It seems to me that we still want to have the slot forwarding finish in > this case even if this is interrupted. Petr, isn't that the intention > here? Probably, though I am not sure what is the point of this. Ok, I keep this check in the updated (with your comments) patch and CC'ing Petr. -- Arseny Sher Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c index 61588d626f..76bafca41c 100644 --- a/src/backend/replication/logical/logical.c +++ b/src/backend/replication/logical/logical.c @@ -335,11 +335,14 @@ CreateInitDecodingContext(char *plugin, * The LSN at which to start decoding. If InvalidXLogRecPtr, restart * from the slot's confirmed_flush; otherwise, start from the specified * location (but move it forwards to confirmed_flush if it's older than - * that, see below). + * that, see below). Doesn't matter in fast_forward mode. * * output_plugin_options * contains options passed to the output plugin. * + * fast_forward + * bypasses the generation of logical changes. + * * read_page, prepare_write, do_write, update_progress * callbacks that have to be filled to perform the use-case dependent, * actual work. diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 2806e1076c..0a4985ef8c 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -341,12 +341,10 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto) /* * Helper function for advancing logical replication slot forward. - * The slot's restart_lsn is used as start point for reading records, - * while confirmed_lsn is used as base point for the decoding context. - * The LSN position to move to is checked by doing a per-record scan and - * logical decoding which makes sure that confirmed_lsn is updated to a - * LSN which allows the future slot consumer to get consistent logical - * changes. + * While we could just do LogicalConfirmReceivedLocation updating + * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn + * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples). + * We do it in special fast_forward mode without actual replay. */ static XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto) @@ -358,7 +356,6 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto) PG_TRY(); { - /* restart at slot's confirmed_flush */ ctx = CreateDecodingContext(InvalidXLogRecPtr, NIL, true, @@ -388,10 +385,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto) */ startlsn = InvalidXLogRecPtr; - /* - * The {begin_txn,change,commit_txn}_wrapper callbacks above will - * store the description into our tuplestore. - */ + /* Changes are not actually produced in fast_forward mode. */ if (record != NULL) LogicalDecodingProcessRecord(ctx, ctx->reader);
pgsql-hackers by date: