FDW INSERT batching can change behavior

From: git@jasonk.me
Date: 2024-08-09 05:07
Hi,

According to the code, foreign data wrapper INSERT batching has several
limitations, such as no RETURNING clause and no row triggers(?).  I found one
case that is not disallowed and that ends up causing a behavior difference
depending on whether batching is enabled or not.

This example is derived from the contrib/postgres_fdw pg_regress test:

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
    DO $d$
        BEGIN
            EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
                OPTIONS (dbname '$$||current_database()||$$',
                         port '$$||current_setting('port')||$$'
                )$$;
        END;
    $d$;

    CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

    create table gloc1 (
      a int PRIMARY KEY,
      b int generated always as (a * 2) stored);
    alter table gloc1 set (autovacuum_enabled = 'false');
    create foreign table grem1 (
      a int not null,
      b int generated always as (a * 2) stored)
      server loopback options(table_name 'gloc1');

    create function counter() returns int8 language sql as $$select count(*) from grem1$$;
    ALTER FOREIGN TABLE grem1 ALTER COLUMN a SET DEFAULT (counter());

    insert into grem1 (a) values (default), (default), (default), (default), (default);

    alter server loopback options (add batch_size '3');
    insert into grem1 (a) values (default), (default), (default), (default), (default);

The first insert does not use batching, so it goes R W R W R W R W R W (R for
executing the default function to generate a slot: nodeModifyTable.c
ExecModifyTable context.planSlot = ExecProcNode(subplanstate); W for inserting
into the table).  This way, whenever the default function is called, it returns
a new value.
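
For reference, after this first insert the keys are strictly increasing; a
sketch of the expected contents, given that counter() starts at 0 and b is
generated as a * 2:

    select * from gloc1 order by a;
    --  a | b
    -- ---+---
    --  0 | 0
    --  1 | 2
    --  2 | 4
    --  3 | 6
    --  4 | 8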

The second insert uses batching, so it goes R R R W W W R R W W.  The function
returns the same value within a batch, and in this case, it causes a conflict:

    ERROR:  duplicate key value violates unique constraint "gloc1_pkey"
    DETAIL:  Key (a)=(5) already exists.
    CONTEXT:  remote SQL command: INSERT INTO public.gloc1(a, b) VALUES ($1, DEFAULT), ($2, DEFAULT), ($3, DEFAULT)
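
In other words, with batch_size = 3 the first batch evaluates counter() three
times before any row reaches the remote table, so every parameter in the batch
gets the same value.  Roughly:

    -- gloc1 already holds the 5 rows from the first insert, so each of the
    -- three evaluations sees count(*) = 5:
    --   INSERT INTO public.gloc1(a, b) VALUES (5, DEFAULT), (5, DEFAULT), (5, DEFAULT)
    -- the second row collides with the first one within the same statement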

Tested on 15.2 and 16.4, which I compiled myself.

Jason



Re: FDW INSERT batching can change behavior

From: Tomas Vondra
Date: 2024-08-09T21:55:00+0200
On 8/9/24 05:07, git@jasonk.me wrote:
> Hi,
>
> ...
>
> The first insert does not use batching, so it goes R W R W R W R W R W (R for
> executing the default function to generate a slot: nodeModifyTable.c
> ExecModifyTable context.planSlot = ExecProcNode(subplanstate); W for inserting
> into the table).  This way, whenever the default function is called, it returns
> a new value.
> 
> The second insert uses batching, so it goes R R R W W W R R W W.  The function
> returns the same value within a batch, and in this case, it causes a conflict:
> 
>     ERROR:  duplicate key value violates unique constraint "gloc1_pkey"
>     DETAIL:  Key (a)=(5) already exists.
>     CONTEXT:  remote SQL command: INSERT INTO public.gloc1(a, b) VALUES ($1, DEFAULT), ($2, DEFAULT), ($3, DEFAULT)
> 
> Tested on 15.2 and 16.4, which I compiled myself.
> 

Yeah, we don't seem to check for this. I don't recall; perhaps it simply
didn't occur to me that we could have a DEFAULT on the foreign table.

We could/should disable batching, but I'm not quite sure what exactly to
check. AFAIK this can happen only when there are default expressions on
the foreign table, so maybe that? Or maybe only when the DEFAULT calls a
volatile function?
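
FWIW the counter() function in the example is indeed volatile (the default for
a SQL function created without a volatility marker), which a quick catalog
lookup confirms:

    select provolatile from pg_proc where proname = 'counter';
    --  provolatile
    -- -------------
    --  v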

Thanks for the report.

-- 
Tomas Vondra



Re: FDW INSERT batching can change behavior

From: Jason Kim
Date: 2024-08-13 06:57
On 2024-08-09T21:55:00+0200, Tomas Vondra wrote:
> Yeah, we don't seem to check for this. I don't recall; perhaps it simply
> didn't occur to me that we could have a DEFAULT on the foreign table.
> 
> We could/should disable batching, but I'm not quite sure what exactly to
> check. AFAIK this can happen only when there are default expressions on
> the foreign table, so maybe that? Or maybe only when the DEFAULT calls a
> volatile function?

I didn't bother checking, but if CHECK constraints can call volatile functions,
that is another avenue for differing behavior.
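
Something along these lines might show a similar difference (a hypothetical
sketch, reusing the volatile counter() function from my first example; locv1
and remv1 are made-up names):

    create table locv1 (a int check (a > counter()));
    create foreign table remv1 (a int check (a > counter()))
      server loopback options (table_name 'locv1');
    -- whether a row passes the remote CHECK now depends on how many rows
    -- counter() sees when the row is actually inserted, i.e. on whether
    -- inserts are batched and when each batch is flushed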

Here is a completely different example concerning the way batches to different
partitions are flushed per partition, resulting in out-of-order insertion:

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
    DO $d$
    BEGIN
        EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (dbname '$$||current_database()||$$',
                     port '$$||current_setting('port')||$$'
            )$$;
    END;
    $d$;

    CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

    create table itrtest (a int) partition by range (a);
    create table loct1 (a int check (a % 100 != 3));
    create foreign table remp1 (a int check (a % 100 != 3)) server loopback options (table_name 'loct1');
    create table loct2 (a int check (a % 100 != 3));
    create foreign table remp2 (a int check (a % 100 != 3)) server loopback options (table_name 'loct2');
    alter table itrtest attach partition remp1 for values from (1) to (100);
    alter table itrtest attach partition remp2 for values from (101) to (200);

    insert into itrtest values (1), (2), (101), (103), (3);
    truncate itrtest;
    alter server loopback options (add batch_size '3');
    insert into itrtest values (1), (2), (101), (103), (3);

The first insert (non-batched) gives

    ERROR:  new row for relation "loct2" violates check constraint "loct2_a_check"
    DETAIL:  Failing row contains (103).
    CONTEXT:  remote SQL command: INSERT INTO public.loct2(a) VALUES ($1)

But the second insert (batched) gives

    ERROR:  new row for relation "loct1" violates check constraint "loct1_a_check"
    DETAIL:  Failing row contains (3).
    CONTEXT:  remote SQL command: INSERT INTO public.loct1(a) VALUES ($1), ($2), ($3)

This is because (103) is queued up in the batch for remp2 and has not actually
been inserted by the time the batch for remp1 fills up and fails on (3).
There might be a more severe example than this, but I did not think too much
about it.
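
To spell out the ordering with batch_size = 3, a sketch of the per-partition
queues at the point of failure:

    -- rows arrive in input order but are buffered per partition:
    --   remp1 queue: (1), (2), (3)   -- (3) fills the batch; the flush fails on (3)
    --   remp2 queue: (101), (103)    -- still buffered, never flushed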

Jason



Re: FDW INSERT batching can change behavior

From: Tomas Vondra
Date:
On 8/13/24 06:57, Jason Kim wrote:
> On 2024-08-09T21:55:00+0200, Tomas Vondra wrote:
>> Yeah, we don't seem to check for this. I don't recall; perhaps it simply
>> didn't occur to me that we could have a DEFAULT on the foreign table.
>>
>> We could/should disable batching, but I'm not quite sure what exactly to
>> check. AFAIK this can happen only when there are default expressions on
>> the foreign table, so maybe that? Or maybe only when the DEFAULT calls a
>> volatile function?
> 
> I didn't bother checking, but if CHECK constraints can call volatile functions,
> that is another avenue for differing behavior.
> 

While we technically allow volatile functions in CHECK constraints, the
documentation [1] says:

  PostgreSQL assumes that CHECK constraints' conditions are immutable,
  that is, they will always give the same result for the same input row.

So the expectation is that CHECK only calls immutable stuff.


[1] https://www.postgresql.org/docs/current/ddl-constraints.html

> Here is a completely different example concerning the way batches to different
> partitions are flushed per partition, resulting in out-of-order insertion:
> 
>     CREATE EXTENSION postgres_fdw;
> 
>     CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
>     DO $d$
>     BEGIN
>         EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
>             OPTIONS (dbname '$$||current_database()||$$',
>                      port '$$||current_setting('port')||$$'
>             )$$;
>     END;
>     $d$;
> 
>     CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
> 
>     create table itrtest (a int) partition by range (a);
>     create table loct1 (a int check (a % 100 != 3));
>     create foreign table remp1 (a int check (a % 100 != 3)) server loopback options (table_name 'loct1');
>     create table loct2 (a int check (a % 100 != 3));
>     create foreign table remp2 (a int check (a % 100 != 3)) server loopback options (table_name 'loct2');
>     alter table itrtest attach partition remp1 for values from (1) to (100);
>     alter table itrtest attach partition remp2 for values from (101) to (200);
> 
>     insert into itrtest values (1), (2), (101), (103), (3);
>     truncate itrtest;
>     alter server loopback options (add batch_size '3');
>     insert into itrtest values (1), (2), (101), (103), (3);
> 
> The first insert (non-batched) gives
> 
>     ERROR:  new row for relation "loct2" violates check constraint "loct2_a_check"
>     DETAIL:  Failing row contains (103).
>     CONTEXT:  remote SQL command: INSERT INTO public.loct2(a) VALUES ($1)
> 
> But the second insert (batched) gives
> 
>     ERROR:  new row for relation "loct1" violates check constraint "loct1_a_check"
>     DETAIL:  Failing row contains (3).
>     CONTEXT:  remote SQL command: INSERT INTO public.loct1(a) VALUES ($1), ($2), ($3)
> 
> This is because (103) is queued up in the batch for remp2 and has not actually
> been inserted by the time the batch for remp1 fills up and fails on (3).
> There might be a more severe example than this, but I did not think too much
> about it.
> 

I agree this is a bit annoying, because it makes the observed error a
bit unpredictable. But as long as it does not allow inserting rows
violating the CHECK constraint, I wouldn't call this a bug.

The only way to prevent this would be to ensure the inserts actually
happen in the input order, and that's inherently against the whole idea
of batching. So if you want that, you have to disable batching.
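
That can be done per server or per foreign table, e.g.:

    -- batch_size = 1 effectively disables batching; SET because the option
    -- was already added at the server level in the example above
    alter server loopback options (set batch_size '1');
    -- or just for one partition:
    alter foreign table remp1 options (add batch_size '1');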


regards

-- 
Tomas Vondra



Re: FDW INSERT batching can change behavior

From: Tomas Vondra
Date: 2024-08-13 20:31
Hi,

I took a closer look at this, planning a way to fix it, but I think
it's actually a bit worse than reported - both in impact and in how to
fix it.

The problem is it's not really specific to DEFAULT values. The exact
same issue exists whenever the insert uses the expressions directly.
That is, if you do this:


    insert into grem1 (a) values (counter()), (counter()),
                                 (counter()), (counter()),
                                 (counter());

it will misbehave in exactly the same way as with the default values. Of
course, this also means that my original idea to disable batching if the
foreign table has a (volatile) expression in the DEFAULT value won't fly.

This can happen whenever the to-be-inserted rows have any expressions.
But those expressions are evaluated *outside* ModifyTable - in the nodes
that produce the rows. In the above example it's a ValuesScan. But it could
be any other node. For example:

    insert into grem1 (a) select counter() from generate_series(1,5);

does the same in a subquery. But AFAICS it could be any node that produces the rows.
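
The producing node is visible in the plan; the output is roughly:

    explain (costs off)
      insert into grem1 (a) values (counter()), (counter());
    --  Insert on grem1
    --    ->  Values Scan on "*VALUES*"

    explain (costs off)
      insert into grem1 (a) select counter() from generate_series(1,5);
    --  Insert on grem1
    --    ->  Function Scan on generate_series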

Ideally we'd simply set batch_size=1 for those cases, but at this point
I have no idea how to check this from ModifyTable :-(

In retrospect the issue is pretty obvious, yet I didn't think about this
this while working on the batching. This is embarrassing.


regards

-- 
Tomas Vondra



Re: FDW INSERT batching can change behavior

From: Tomas Vondra
Date:
On 8/13/24 20:31, Tomas Vondra wrote:
> Hi,
> 
> I took a closer look at this, planning a way to fix it, but I think
> it's actually a bit worse than reported - both in impact and in how to
> fix it.
> 
> The problem is it's not really specific to DEFAULT values. The exact
> same issue exists whenever the insert uses the expressions directly.
> That is, if you do this:
> 
> 
>     insert into grem1 (a) values (counter()), (counter()),
>                                  (counter()), (counter()),
>                                  (counter());
> 
> it will misbehave in exactly the same way as with the default values. Of
> course, this also means that my original idea to disable batching if the
> foreign table has a (volatile) expression in the DEFAULT value won't fly.
> 
> This can happen whenever the to-be-inserted rows have any expressions.
> But those expressions are evaluated *outside* ModifyTable - in the nodes
> that produce the rows. In the above example it's a ValuesScan. But it could
> be any other node. For example:
> 
>     insert into grem1 (a) select counter() from generate_series(1,5);
> 
> does the same in a subquery. But AFAICS it could be any node that produces the rows.
> 
> Ideally we'd simply set batch_size=1 for those cases, but at this point
> I have no idea how to check this from ModifyTable :-(
> 
> In retrospect the issue is pretty obvious, yet I didn't think about this
> this while working on the batching. This is embarrassing.
> 

I've been thinking about this a bit more, and I'm not really sure using
counter() as a default value can ever be "correct". The problem is it's
inherently broken with concurrency - even with no batching, it'll fail
if two of these inserts run at the same time. The batching only makes
that more obvious / easier to hit, but it's not really the root cause.
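
A sketch of the race, with no batching involved (the value 10 is made up):

    -- session A: insert into grem1 (a) values (default);
    --            counter() -> 10, inserts (10), not yet committed
    -- session B: insert into grem1 (a) values (default);
    --            counter() -> 10 as well, since A's row is not visible yet
    -- session B blocks on the unique index and fails with a duplicate key
    -- on (a)=(10) once A commits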

regards

-- 
Tomas Vondra