On Mon, Jul 15, 2024 at 10:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> > On Fri, Jul 5, 2024 at 9:49 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> >> To fix, how about instead modifying postgres_fdw so that if the remote
> >> server is v15 or later, it just recreates a cursor when rewinding it
> >> is needed, like the attached?
>
> Uh ... does this ensure that the data hasn't changed?
postgres_fdw opens a remote transaction using REPEATABLE READ or
SERIALIZABLE, so the recreated cursor uses the same snapshot except
changes made by the remote transaction.
This causes eg, a join-UPDATE query where multiple rows join to the
same foreign target row to repeatedly update the target row, as shown
below, which would never happen if rewinding the cursor.
Slightly modified version of the test case in the proposed patch where
the foreign row (1001, ‘foo’) is repeatedly updated:
CREATE TABLE loct1 (c1 int);
CREATE TABLE loct2 (c1 int, c2 text);
INSERT INTO loct1 VALUES (1001);
INSERT INTO loct1 VALUES (1001);
INSERT INTO loct2 SELECT id, to_char(id, 'FM0000') FROM
generate_series(1, 1000) id;
INSERT INTO loct2 VALUES (1001, 'foo');
INSERT INTO loct2 VALUES (1002, 'bar');
CREATE FOREIGN TABLE remt2 (c1 int, c2 text) SERVER loopback OPTIONS
(table_name 'loct2');
ANALYZE loct1;
ANALYZE remt2;
SET enable_mergejoin TO false;
SET enable_hashjoin TO false;
SET enable_material TO false;
EXPLAIN (VERBOSE, COSTS OFF)
UPDATE remt2 SET c2 = remt2.c2 || remt2.c2 FROM loct1 WHERE loct1.c1 =
remt2.c1 RETURNING remt2.*;
QUERY PLAN
--------------------------------------------------------------------------------
Update on public.remt2
Output: remt2.c1, remt2.c2
Remote SQL: UPDATE public.loct2 SET c2 = $2 WHERE ctid = $1 RETURNING c1, c2
-> Nested Loop
Output: (remt2.c2 || remt2.c2), remt2.ctid, remt2.*, loct1.ctid
Join Filter: (remt2.c1 = loct1.c1)
-> Seq Scan on public.loct1
Output: loct1.ctid, loct1.c1
-> Foreign Scan on public.remt2
Output: remt2.c2, remt2.ctid, remt2.*, remt2.c1
Remote SQL: SELECT c1, c2, ctid FROM public.loct2 FOR UPDATE
(11 rows)
UPDATE remt2 SET c2 = remt2.c2 || remt2.c2 FROM loct1 WHERE loct1.c1 =
remt2.c1 RETURNING remt2.*;
c1 | c2
------+--------------
1001 | foofoo
1001 | foofoofoofoo
(2 rows)
Note that postgres_fdw already recreates a cursor when doing a rescan
with parameter changes, so we already have this issue. IMO I think we
should avoid writing a query like this.
Best regards,
Etsuro Fujita