Re: Problem with transition tables on partitioned tables with foreign-table partitions - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Problem with transition tables on partitioned tables with foreign-table partitions
Date
Msg-id CA+HiwqGAEP8auzZmP4Q7e9p7uR=J+WqMhyzYyZqO+xLc4s_YUw@mail.gmail.com
Whole thread Raw
In response to Re: Problem with transition tables on partitioned tables with foreign-table partitions  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: Problem with transition tables on partitioned tables with foreign-table partitions
List pgsql-hackers
On Wed, Jul 2, 2025 at 7:05 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Tue, Jul 1, 2025 at 4:42 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Tue, Jul 1, 2025 at 11:55 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > > While working on something else, I noticed that while we disallow
> > > transition tables on foreign tables, we allow transition tables on
> > > partitioned tables with foreign-table partitions, which produces
> > > incorrect results.  Here is an example using postgres_fdw:
> > >
> > > create table parent (a text, b int) partition by list (a);
> > > create table loct (a text, b int);
> > > create foreign table child (a text, b int)
> > >   server loopback options (table_name 'loct');
> > > alter table parent attach partition child for values in ('AAA');
> > >
> > > create function dump_insert() returns trigger language plpgsql as
> > > $$
> > >   begin
> > >     raise notice 'trigger = %, new table = %',
> > >                  TG_NAME,
> > >                  (select string_agg(new_table::text, ', ' order by a)
> > > from new_table);
> > >     return null;
> > >   end;
> > > $$;
> > > create trigger parent_insert_trig
> > >   after insert on parent referencing new table as new_table
> > >   for each statement execute procedure dump_insert();
> > >
> > > create function intercept_insert() returns trigger language plpgsql as
> > > $$
> > >   begin
> > >     new.b = new.b + 1000;
> > >     return new;
> > >   end;
> > > $$;
> > > create trigger intercept_insert_loct
> > >   before insert on loct
> > >   for each row execute procedure intercept_insert();
> > >
> > > insert into parent values ('AAA', 42);
> > > NOTICE:  trigger = parent_insert_trig, new table = (AAA,42)
> > > INSERT 0 1
> > >
> > > The trigger shows the original tuple created by the core, not the
> > > actual tuple inserted into the foreign-table partition, as
> > > postgres_fdw does not collect the actual tuple, of course!
> >
> > Maybe I'm missing something, but given that the intercept_insert()
> > function is applied during the "remote" operation, isn't it expected
> > that the parent table's trigger for a "local" operation shows the
> > original tuple?
>
> That is the question of how we define the after image of a row
> inserted into a foreign table, but consider the case where the
> partition is a plain table:
>
> create table parent (a text, b int) partition by list (a);
> create table child partition of parent for values in ('AAA');
> create trigger intercept_insert_child
>   before insert on child
>   for each row execute procedure intercept_insert();
> insert into parent values ('AAA', 42);
> NOTICE:  trigger = parent_insert_trig, new table = (AAA,1042)
> INSERT 0 1
>
> The trigger shows the final tuple, not the original tuple.  So from a
> consistency perspective, I thought it would be good if the trigger
> does so even in the case where the partition is a foreign table.

Ok, but if you define the trigger on the foreign table partition
(child) as follows, you do get what I think is the expected result?

create trigger intercept_insert_foreign_child
  before insert on child
  for each row execute procedure intercept_insert();

insert into parent values ('AAA', 42);
NOTICE:  trigger = parent_insert_trig, new table = (AAA,1042)

-- 2042, because row modified by both triggers
table parent;
  a  |  b
-----+------
 AAA | 2042
(1 row)

Or perhaps you're saying that the row returned by this line in ExecInsert():

       /*
         * insert into foreign table: let the FDW do it
         */
        slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
                                                               resultRelInfo,
                                                               slot,
                                                               planSlot);

is not the expected "after image", and thus should not be added to the
parent's transition table?

IIUC, to prevent that, we now hit the following error in:

void
ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo,
                     TupleTableSlot *slot, List *recheckIndexes,
                     TransitionCaptureState *transition_capture)
{
    TriggerDesc *trigdesc = relinfo->ri_TrigDesc;

    if (relinfo->ri_FdwRoutine && transition_capture &&
        transition_capture->tcs_insert_new_table)
    {
        Assert(relinfo->ri_RootResultRelInfo);
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot collect transition tuples from child
foreign tables")));
    }

> > >  So I would
> > > like to propose to fix this by the following: 1) disable using direct
> > > modify to modify foreign-table partitions if there are any
> > > transition-table triggers on the partitioned table, and then 2) throw
> > > an error in ExecARInsertTriggers()/ExecARUpdateTriggers()/ExecARDeleteTriggers()
> > > if they collects transition tuple(s) from a foreign-table partition.
> >
> > Is (2) intended to catch cases that occur during a foreign insert and
> > foreign/non-direct update/delete?
>
> That is right; the patch forces the FDW to perform ExecForeign*
> functions, and then throws an error in ExecAR* functions.  One good
> thing about this is that we are able to avoid throwing the error when
> local/remote row-level BEFORE triggers return NULL.

Given my question above, I’m not sure I fully understand the intention
behind adding these checks.

--
Thanks, Amit Langote



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Adding basic NUMA awareness
Next
From: Fujii Masao
Date:
Subject: Re: Don't keep closed WAL segment in page cache after replay