Re: a misbehavior of partition row movement (?) - Mailing list pgsql-hackers

From Amit Langote
Subject Re: a misbehavior of partition row movement (?)
Date
Msg-id CA+HiwqHf-i7zdxhkOJ2wB8CocmkULER7OO3B0zfeYjjT5Pqu8w@mail.gmail.com
Whole thread Raw
In response to Re: a misbehavior of partition row movement (?)  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: a misbehavior of partition row movement (?)
List pgsql-hackers
On Sat, Oct 3, 2020 at 8:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sat, Oct 3, 2020 at 8:15 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote
> > On Sat, Oct 03, 2020 at 11:42:21AM +0900, Amit Langote wrote:
> > >On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
> > ><david.g.johnston@gmail.com> wrote:
> > >> On Friday, October 2, 2020, Amit Langote <amitlangote09@gmail.com>
> > >> wrote:
> > >>>
> > >>>
> > >>> Reporter on that thread says that the last update should have failed
> > >>> and I don't quite see a workable alternative to that.
> > >>
> > >>
> > >> To be clear the OP would rather have it just work, the same as the
> > >> non-row-movement version.  Maybe insert the new row first, execute
> > >> the on update trigger chained from the old row, then delete the old
> > >> row?
> > >
> > >I was thinking yesterday about making it just work, but considering the
> > >changes that would need to be made to how the underlying triggers fire,
> > >it does not seem we would be able to back-port the solution.
> > >
> >
> > I think we need to differentiate between master and backbranches. IMO we
> > should try to make it "just work" in master, and the amount of code
> > should not be an issue there I think (no opinion on whether insert and
> > update trigger is the way to go). For backbranches we may need to do
> > something less intrusive, of course.
>
> Sure, that makes sense.  I will try making a patch for HEAD to make it
> just work unless someone beats me to it.

After working on this for a while, here is my proposal for HEAD.

To reiterate, the problem occurs when an UPDATE of a partitioned PK
table is turned into a DELETE + INSERT.  In that case, the UPDATE RI
triggers are not fired at all, but DELETE ones are, so the foreign key
may fail to get enforced correctly.  For example, even though the new
row after the update is still logically present in the PK table, it
would wrongly get deleted because of the DELETE RI trigger firing if
there's a ON DELETE CASCADE clause on the foreign key.

To fix that, I propose that we teach trigger.c to skip queuing the
events that would be dangerous to fire, such as that for the DELETE on
the source leaf partition mentioned above.  Instead, queue an UPDATE
event on the root target table, matching the actual operation being
performed.  Note though that this new arrangement doesn't affect the
firing of any other triggers except those that are relevant to the
reported problem, viz. the PK-side RI triggers.  All other triggers
fire exactly as they did before.

To make that happen, I had to:

1. Make RI triggers available on partitioned tables at all, which they
are not today.  Also, mark the RI triggers in partitions correctly as
*clones* of the RI triggers in their respective parents.

2. Make it possible to allow AfterTriggerSaveEvent() to access the
query's actual target relation, that is, in addition to the target
relation on which an event was fired.  Also, added a bunch of code to
AFTER TRIGGER infrastructure to handle events fired on partitioned
tables.  Because those events cannot contain physical references to
affected tuples, I generalized the code currently used to handle after
triggers on foreign tables by storing the tuples in and retrieving
them from a tuple store.  I read a bunch of caveats of that
implementation (such as its uselessness for deferred triggers), but
for the limited cases for which it will be used for partitioned
tables, it seems safe, because it won't be used for deferred triggers
on partitioned tables.

Attached patches 0001 and 0002 implement 1 and 2, respectively.
Later, I will post an updated version of the patch for the
back-branches, which, as mentioned upthread, is to prevent the
cross-partition updates on foreign key PK tables.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: VACUUM (DISABLE_PAGE_SKIPPING on)
Next
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Disable WAL logging to speed up data loading