Thread: FDW INSERT batching can change behavior
Hi,

According to the code, foreign data wrapper INSERT batching has several
limitations, such as no RETURNING clause and no row triggers(?). I found one
case that is not disallowed and ends up causing a behavior difference
depending on whether batching is enabled or not. This example is derived from
the contrib/postgres_fdw pg_regress test:

CREATE EXTENSION postgres_fdw;

CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
DO $d$
    BEGIN
        EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (dbname '$$||current_database()||$$',
                     port '$$||current_setting('port')||$$'
            )$$;
    END;
$d$;

CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

create table gloc1 (
  a int PRIMARY KEY,
  b int generated always as (a * 2) stored);
alter table gloc1 set (autovacuum_enabled = 'false');
create foreign table grem1 (
  a int not null,
  b int generated always as (a * 2) stored)
  server loopback options (table_name 'gloc1');

create function counter() returns int8 language sql as $$select count(*) from grem1$$;
ALTER FOREIGN TABLE grem1 ALTER COLUMN a SET DEFAULT (counter());

insert into grem1 (a) values (default), (default), (default), (default), (default);

alter server loopback options (add batch_size '3');

insert into grem1 (a) values (default), (default), (default), (default), (default);

The first insert does not use batching, so it goes R W R W R W R W R W (R for
executing the default function to generate a slot: nodeModifyTable.c
ExecModifyTable "context.planSlot = ExecProcNode(subplanstate);"; W for
inserting into the table). This way, whenever the default function is called,
it returns a new value.

The second insert uses batching, so it goes R R R W W W R R W W. The function
returns the same value within a batch, and in this case, it causes a conflict:

ERROR:  duplicate key value violates unique constraint "gloc1_pkey"
DETAIL:  Key (a)=(5) already exists.
CONTEXT:  remote SQL command: INSERT INTO public.gloc1(a, b) VALUES ($1, DEFAULT), ($2, DEFAULT), ($3, DEFAULT)

Tested on 15.2 and 16.4, which I compiled myself.

Jason
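PS: To make the R/W interleaving concrete, here is what I would expect the
first (non-batched) insert to leave behind, assuming a fresh database -
counter() returns the number of rows visible at the moment each default is
evaluated:

select * from grem1 order by a;
--  a | b
-- ---+---
--  0 | 0
--  1 | 2
--  2 | 4
--  3 | 6
--  4 | 8

With batch_size '3', the second insert then evaluates counter() three times
before flushing anything, so all three rows of the first batch get a = 5,
which is exactly the duplicate key the error complains about.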
On 8/9/24 05:07, git@jasonk.me wrote:
> Hi,
>
> ...
>
> The first insert does not use batching, so it goes R W R W R W R W R W (R for
> executing the default function to generate a slot: nodeModifyTable.c
> ExecModifyTable "context.planSlot = ExecProcNode(subplanstate);"; W for
> inserting into the table). This way, whenever the default function is called,
> it returns a new value.
>
> The second insert uses batching, so it goes R R R W W W R R W W. The function
> returns the same value within a batch, and in this case, it causes a conflict:
>
> ERROR:  duplicate key value violates unique constraint "gloc1_pkey"
> DETAIL:  Key (a)=(5) already exists.
> CONTEXT:  remote SQL command: INSERT INTO public.gloc1(a, b) VALUES ($1, DEFAULT), ($2, DEFAULT), ($3, DEFAULT)
>
> Tested on 15.2 and 16.4, which I compiled myself.
>

Yeah, we don't seem to check for this. I suspect it simply didn't occur to me
that we could have a DEFAULT on the foreign table.

We could/should disable batching, but I'm not quite sure what exactly to
check. AFAIK this can happen only when there are default expressions on the
foreign table, so maybe that? Or maybe only when the DEFAULT calls a volatile
function?

Thanks for the report.

--
Tomas Vondra
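PS: If the "DEFAULT calls a volatile function" route is the way to go, the
volatility is at least easy to get at from the catalogs. Just as a sketch -
counter() is a SQL function declared without a volatility marker, so it
defaults to VOLATILE:

select proname, provolatile  -- 'v' = volatile, 's' = stable, 'i' = immutable
  from pg_proc
 where proname = 'counter';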
On 2024-08-09T21:55:00+0200, Tomas Vondra wrote:
> Yeah, we don't seem to check for this. I suspect it simply didn't occur to
> me that we could have a DEFAULT on the foreign table.
>
> We could/should disable batching, but I'm not quite sure what exactly to
> check. AFAIK this can happen only when there are default expressions on
> the foreign table, so maybe that? Or maybe only when the DEFAULT calls a
> volatile function?

I didn't bother checking, but if CHECK constraints can call volatile
functions, that is another avenue for differing behavior.

Here is a completely different example concerning the way batches to
different partitions are flushed per-partition, resulting in out-of-order
insertion:

CREATE EXTENSION postgres_fdw;

CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
DO $d$
    BEGIN
        EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (dbname '$$||current_database()||$$',
                     port '$$||current_setting('port')||$$'
            )$$;
    END;
$d$;

CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

create table itrtest (a int) partition by range (a);
create table loct1 (a int check (a % 100 != 3));
create foreign table remp1 (a int check (a % 100 != 3)) server loopback options (table_name 'loct1');
create table loct2 (a int check (a % 100 != 3));
create foreign table remp2 (a int check (a % 100 != 3)) server loopback options (table_name 'loct2');
alter table itrtest attach partition remp1 for values from (1) to (100);
alter table itrtest attach partition remp2 for values from (101) to (200);

insert into itrtest values (1), (2), (101), (103), (3);
truncate itrtest;
alter server loopback options (add batch_size '3');
insert into itrtest values (1), (2), (101), (103), (3);

The first insert (non-batched) gives

ERROR:  new row for relation "loct2" violates check constraint "loct2_a_check"
DETAIL:  Failing row contains (103).
CONTEXT:  remote SQL command: INSERT INTO public.loct2(a) VALUES ($1)

But the second insert (batched) gives

ERROR:  new row for relation "loct1" violates check constraint "loct1_a_check"
DETAIL:  Failing row contains (3).
CONTEXT:  remote SQL command: INSERT INTO public.loct1(a) VALUES ($1), ($2), ($3)

This is because (103) is queued up in the batch and not actually inserted.
There might be a more severe example than this, but I did not think too much
about it.

Jason
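PS: For completeness, resetting the batch size should bring back the original
(non-batched) error, which suggests the difference really is just the
per-partition queueing (nothing was inserted by either failed statement, so
this can be re-run as-is):

alter server loopback options (set batch_size '1');
insert into itrtest values (1), (2), (101), (103), (3);
-- ERROR:  new row for relation "loct2" violates check constraint "loct2_a_check"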
On 8/13/24 06:57, Jason Kim wrote:
> On 2024-08-09T21:55:00+0200, Tomas Vondra wrote:
>> Yeah, we don't seem to check for this. I suspect it simply didn't occur
>> to me that we could have a DEFAULT on the foreign table.
>>
>> We could/should disable batching, but I'm not quite sure what exactly to
>> check. AFAIK this can happen only when there are default expressions on
>> the foreign table, so maybe that? Or maybe only when the DEFAULT calls a
>> volatile function?
>
> I didn't bother checking, but if CHECK constraints can call volatile
> functions, that is another avenue for differing behavior.
>

While we technically allow volatile functions in CHECK constraints, the
documentation [1] says:

    PostgreSQL assumes that CHECK constraints' conditions are immutable,
    that is, they will always give the same result for the same input row.

So the expectation is that CHECK only calls immutable stuff.

[1] https://www.postgresql.org/docs/current/ddl-constraints.html

> Here is a completely different example concerning the way batches to
> different partitions are flushed per-partition, resulting in out-of-order
> insertion:
>
> CREATE EXTENSION postgres_fdw;
>
> CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
> DO $d$
>     BEGIN
>         EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
>             OPTIONS (dbname '$$||current_database()||$$',
>                      port '$$||current_setting('port')||$$'
>             )$$;
>     END;
> $d$;
>
> CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
>
> create table itrtest (a int) partition by range (a);
> create table loct1 (a int check (a % 100 != 3));
> create foreign table remp1 (a int check (a % 100 != 3)) server loopback options (table_name 'loct1');
> create table loct2 (a int check (a % 100 != 3));
> create foreign table remp2 (a int check (a % 100 != 3)) server loopback options (table_name 'loct2');
> alter table itrtest attach partition remp1 for values from (1) to (100);
> alter table itrtest attach partition remp2 for values from (101) to (200);
>
> insert into itrtest values (1), (2), (101), (103), (3);
> truncate itrtest;
> alter server loopback options (add batch_size '3');
> insert into itrtest values (1), (2), (101), (103), (3);
>
> The first insert (non-batched) gives
>
> ERROR:  new row for relation "loct2" violates check constraint "loct2_a_check"
> DETAIL:  Failing row contains (103).
> CONTEXT:  remote SQL command: INSERT INTO public.loct2(a) VALUES ($1)
>
> But the second insert (batched) gives
>
> ERROR:  new row for relation "loct1" violates check constraint "loct1_a_check"
> DETAIL:  Failing row contains (3).
> CONTEXT:  remote SQL command: INSERT INTO public.loct1(a) VALUES ($1), ($2), ($3)
>
> This is because (103) is queued up in the batch and not actually inserted.
> There might be a more severe example than this, but I did not think too much
> about it.
>

I agree this is a bit annoying, because it makes the observed error a bit
unpredictable. But as long as it does not allow inserting rows violating the
CHECK constraint, I wouldn't call this a bug.

The only way to prevent this would be to ensure the inserts actually happen
in the input order, and that's inherently against the whole idea of batching.
So if you want that, you have to disable batching.

regards

--
Tomas Vondra
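PS: Note that batching doesn't have to be disabled for the whole server -
postgres_fdw also accepts batch_size on individual foreign tables, and the
table-level value overrides the server-level one. So something like this
pins just the affected partitions back to row-by-row inserts:

alter foreign table remp1 options (add batch_size '1');
alter foreign table remp2 options (add batch_size '1');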
Hi,

I took a closer look at this, planning a way to fix it, but I think it's
actually a bit worse than reported - both in impact and in the ways it might
be fixed.

The problem is it's not really specific to DEFAULT values. The exact same
issue exists whenever the insert uses the expressions directly. That is, if
you do this:

insert into grem1 (a) values (counter()), (counter()),
                             (counter()), (counter()),
                             (counter());

it will misbehave in exactly the same way as with the default values. Of
course, this also means that my original idea to disable batching if the
foreign table has a (volatile) expression in the DEFAULT value won't fly.

This can happen whenever the to-be-inserted rows have any expressions. But
those expressions are evaluated *outside* ModifyTable - in the nodes that
produce the rows. In the above example it's a ValuesScan. But it could be
any other node. For example:

insert into grem1 (a) select counter() from generate_series(1,5);

does that in a subquery. But AFAICS it could be any other node.

Ideally we'd simply set batch_size=1 for those cases, but at this point I
have no idea how to check this from ModifyTable :-(

In retrospect the issue is pretty obvious, yet I haven't thought about this
while working on the batching. This is embarrassing.

regards

--
Tomas Vondra
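PS: The division of labor is visible in the plan shape - the Insert
(ModifyTable) node merely consumes rows from its subplan, and the subplan is
where the counter() calls actually run:

explain (verbose, costs off)
insert into grem1 (a) values (counter()), (counter()), (counter());
-- roughly:
--   Insert on public.grem1
--     ->  Values Scan on "*VALUES*"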
On 8/13/24 20:31, Tomas Vondra wrote:
> Hi,
>
> I took a closer look at this, planning a way to fix it, but I think it's
> actually a bit worse than reported - both in impact and in the ways it
> might be fixed.
>
> The problem is it's not really specific to DEFAULT values. The exact same
> issue exists whenever the insert uses the expressions directly. That is,
> if you do this:
>
> insert into grem1 (a) values (counter()), (counter()),
>                              (counter()), (counter()),
>                              (counter());
>
> it will misbehave in exactly the same way as with the default values. Of
> course, this also means that my original idea to disable batching if the
> foreign table has a (volatile) expression in the DEFAULT value won't fly.
>
> This can happen whenever the to-be-inserted rows have any expressions.
> But those expressions are evaluated *outside* ModifyTable - in the nodes
> that produce the rows. In the above example it's a ValuesScan. But it
> could be any other node. For example:
>
> insert into grem1 (a) select counter() from generate_series(1,5);
>
> does that in a subquery. But AFAICS it could be any other node.
>
> Ideally we'd simply set batch_size=1 for those cases, but at this point
> I have no idea how to check this from ModifyTable :-(
>
> In retrospect the issue is pretty obvious, yet I haven't thought about
> this while working on the batching. This is embarrassing.
>

I've been thinking about this a bit more, and I'm not really sure using
counter() as a default value can ever be "correct". The problem is it's
inherently broken with concurrency - even with no batching, it'll fail if
two of these inserts run at the same time. The batching only makes that
more obvious / easier to hit, but it's not really the root cause.

regards

--
Tomas Vondra
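PS: Spelling out the concurrent failure, with batching out of the picture
entirely (assume gloc1 already holds the five rows from the first insert):

-- session 1:
begin;
insert into grem1 (a) values (default);  -- counter() sees 5 rows -> a = 5

-- session 2, before session 1 commits:
insert into grem1 (a) values (default);  -- counter() also sees 5 rows -> a = 5;
                                         -- this should block on the unique
                                         -- index and fail with the same
                                         -- duplicate key error once session 1
                                         -- commits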
On 8/14/24 13:31, Tomas Vondra wrote:
> On 8/13/24 20:31, Tomas Vondra wrote:
>> Hi,
>>
>> I took a closer look at this, planning a way to fix it, but I think it's
>> actually a bit worse than reported - both in impact and in the ways it
>> might be fixed.
>>
>> The problem is it's not really specific to DEFAULT values. The exact same
>> issue exists whenever the insert uses the expressions directly. That is,
>> if you do this:
>>
>> insert into grem1 (a) values (counter()), (counter()),
>>                              (counter()), (counter()),
>>                              (counter());
>>
>> it will misbehave in exactly the same way as with the default values. Of
>> course, this also means that my original idea to disable batching if the
>> foreign table has a (volatile) expression in the DEFAULT value won't fly.
>>
>> This can happen whenever the to-be-inserted rows have any expressions.
>> But those expressions are evaluated *outside* ModifyTable - in the nodes
>> that produce the rows. In the above example it's a ValuesScan. But it
>> could be any other node. For example:
>>
>> insert into grem1 (a) select counter() from generate_series(1,5);
>>
>> does that in a subquery. But AFAICS it could be any other node.
>>
>> Ideally we'd simply set batch_size=1 for those cases, but at this point
>> I have no idea how to check this from ModifyTable :-(
>>
>> In retrospect the issue is pretty obvious, yet I haven't thought about
>> this while working on the batching. This is embarrassing.
>>
>
> I've been thinking about this a bit more, and I'm not really sure using
> counter() as a default value can ever be "correct". The problem is it's
> inherently broken with concurrency - even with no batching, it'll fail if
> two of these inserts run at the same time. The batching only makes that
> more obvious / easier to hit, but it's not really the root cause.
>

FWIW I've brought this up on pgsql-hackers, just to get more opinions on
whether this should be considered a bug, and I think the conclusion from
that thread is "not a bug".

I do agree the behavior change is a bit surprising, but even ignoring the
concurrency issues, this INSERT/SELECT is inherently unsafe. There's no
guarantee the INSERT and the SELECT will proceed in lockstep row by row,
which is what the counter() function relies on.

We'd be well within our rights to e.g. run the SELECT to completion first,
stash the results into a tuplestore (in a Materialize node or something like
that), and only then actually start processing the INSERT. While that
currently does not happen, it's not unthinkable we might do something like
that in a future release (e.g. by doing a bit of batching or vectorization
in the executor).

regards

--
Tomas Vondra
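PS: For anyone who actually needs a unique, locally generated default on a
foreign table, a sequence sidesteps both problems - nextval() is volatile
too, but it returns a distinct value on every call, so neither batching nor
concurrency can make two rows collide. A sketch, using a hypothetical
sequence name:

create sequence grem1_a_seq;
alter foreign table grem1 alter column a set default nextval('grem1_a_seq');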