Re: A few patches to clarify snapshot management - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: A few patches to clarify snapshot management |
Date | |
Msg-id | 20250809222338.cc.nmisch@google.com Whole thread Raw |
In response to | Re: A few patches to clarify snapshot management (Heikki Linnakangas <hlinnaka@iki.fi>) |
List | pgsql-hackers |
On Tue, Jan 07, 2025 at 11:55:00AM +0200, Heikki Linnakangas wrote: > On 07/01/2025 00:00, Andres Freund wrote: > > On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote: > > > While playing around some more with this, I noticed that this code in > > > GetTransactionSnapshot() is never reached, and AFAICS has always been dead > > > code: > > > > > > > Snapshot > > > > GetTransactionSnapshot(void) > > > > { > > > > /* > > > > * Return historic snapshot if doing logical decoding. We'll never need a > > > > * non-historic transaction snapshot in this (sub-)transaction, so there's > > > > * no need to be careful to set one up for later calls to > > > > * GetTransactionSnapshot(). > > > > */ > > > > if (HistoricSnapshotActive()) > > > > { > > > > Assert(!FirstSnapshotSet); > > > > return HistoricSnapshot; > > > > } > > > > > > when you think about it, that's good, because it doesn't really make sense > > > to call GetTransactionSnapshot() during logical decoding. We jump through > > > hoops to make the historic catalog decoding possible with historic > > > snapshots, tracking subtransactions that modify catalogs and WAL-logging > > > command ids, but they're not suitable for general purpose queries. So I > > > think we should turn that into an error, per attached patch. > > > > Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C > > code in an output functions calling GetTransactionSnapshot() or such to do > > some internal lookups? > > I haven't seen any. And I don't think that would work correctly while doing > logical decoding anyway, because historical snapshots only track XIDs that > modify catalogs. regclassout and enumout do work because they use the > catalog snapshot rather than GetTransactionSnapshot(). > > (I committed that change in commit 1585ff7387 already, but discussion is > still welcome of course) https://github.com/2ndQuadrant/pglogical does rely on the pre-1585ff7387 code for its row_filter feature. row_filter calls ExecEvalExpr() from the output plugin, e.g. to evaluate expression "id between 2 AND 4" arising from this configuration in the pglogical test suite: SELECT * FROM pglogical.replication_set_add_table('default', 'basic_dml', false, row_filter := $rf$id between 2 AND 4$rf$); One of the GetTransactionSnapshot() calls is in pglogical_output_plugin.c itself. For that, I could work around the change by forcing the old HistoricSnapshot use: - PushActiveSnapshot(GetTransactionSnapshot()); + Assert(HistoricSnapshotActive()); + PushActiveSnapshot(GetCatalogSnapshot(InvalidOid)); That doesn't get far. Calling a plpgsql function in the expression reaches the "cannot take query snapshot during logical decoding" error via this stack trace: GetTransactionSnapshot at snapmgr.c:279:3 exec_eval_simple_expr at pl_exec.c:6214:3 (inlined by) exec_eval_expr at pl_exec.c:5699:6 exec_stmt_raise at pl_exec.c:3820:8 (inlined by) exec_stmts at pl_exec.c:2096:10 exec_stmt_block at pl_exec.c:1955:6 exec_toplevel_block at pl_exec.c:1646:7 plpgsql_exec_function at pl_exec.c:636:5 plpgsql_call_handler at pl_handler.c:278:11 fmgr_security_definer at fmgr.c:755:52 ExecInterpExpr at execExprInterp.c:927:7 pglogical_change_filter at pglogical_output_plugin.c:663:7 (inlined by) pg_decode_change at pglogical_output_plugin.c:691:7 change_cb_wrapper at logical.c:1121:22 ReorderBufferApplyChange at reorderbuffer.c:2078:3 (inlined by) ReorderBufferProcessTXN at reorderbuffer.c:2383:7 DecodeCommit at decode.c:743:3 (inlined by) xact_decode at decode.c:242:5 LogicalDecodingProcessRecord at decode.c:123:1 XLogSendLogical at walsender.c:3442:33 WalSndLoop at walsender.c:2837:7 StartLogicalReplication at walsender.c:1504:2 (inlined by) exec_replication_command at walsender.c:2158:6 PostgresMain at postgres.c:4762:10 BackendMain at backend_startup.c:80:2 postmaster_child_launch at launch_backend.c:291:3 BackendStartup at postmaster.c:3587:8 (inlined by) ServerLoop at postmaster.c:1702:6 PostmasterMain at postmaster.c:1252:6 main at main.c:165:4 > > Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C > > code in an output functions calling GetTransactionSnapshot() or such to do > > some internal lookups? I think pglogical_output_plugin.c w/ plpgsql is largely sane when used with a plpgsql function that consults only catalogs and the output tuple. If the pglogical test suite is representative, that's the usual case for a row_filter. A plpgsql function that reads user tables will be fragile with concurrent pruning, but a user might sanely accept that fragility. A plpgsql function that writes tuples is not sane in a row_filter. How do you see it? So far, I know of these options: 1. Make pglogical block the row_filter feature for any v18+ origin. 2. Revert postgresql.git commit 1585ff7387. 3. Make pglogical use HistoricSnapshot where pglogical_output_plugin.c handles snapshots directly. That should keep simple row_filter expressions like "col > 0" functioning. Entering plpgsql or similarly-complex logic will fail with "cannot take query snapshot during logical decoding", and we'll consider that to be working as intended. 4. Fail later and lazily, for just the most-unreasonable cases. For example, fail when HistoricSnapshot applies to a write operation. (Maybe this already fails. I didn't check.) Which of those or other options should we consider? For reference, https://github.com/2ndQuadrant/pglogical/pull/503 is an otherwise-working port of pglogical to v18. Its Makefile currently disables the tests that reach "cannot take query snapshot during logical decoding".
pgsql-hackers by date: