Thread: BEFORE ROW triggers for partitioned tables
Enabling BEFORE triggers FOR EACH ROW in partitioned tables is very easy -- just remove the check against them. With that, they work fine. The main problem is that the executor is not prepared to re-route the tuple if the user decides to change the partitioning columns (so you get the error that the partitioning constraint is violated by the partition, which makes no sense if you're inserting in the top-level partitioned table). There are several views about that: 1. Just let it be. If the user does something silly, it's their problem if they get an ugly error message. 2. If the partition keys are changed, raise an error. The trigger is allowed to change everything but those columns. Then there's no conflict, and it allows desirable use cases. 3. Allow the partition keys to change, as long as the tuple ends up in the same partition. This is the same as (1) except the error message is nicer. The attached patch implements (2). The cases that are allowed by (3) are a strict superset of those allowed by (2), so if we decide to allow it in the future, it is possible without breaking anything that works after implementing (2). The implementation harnesses the newly added pg_trigger.tgparentid column; if that is set to a non-zero value, then we search up the partitioning hierarchy for each partitioning key column, and verify the values are bitwise equal, up to the "root". Notes: * We must check all levels, not just the one immediately above, because the routing might involve crawling down several levels, and any of those might become invalidated if the trigger changes values. * The "root" is not necessarily the root partitioned table, but instead it's the table that was named in the command. Because of this, we don't need to acquire locks on the tables, since the executor already has the tables open and locked (thus they cannot be modified by concurrent commands). * I find it a little odd that the leaf partition doesn't have any intel on what its partitioning columns are. I guess they haven't been needed thus far, and it seems inappropriate for this admittedly very small feature to add such a burden on the rest of the system. * The new function I added, ReportTriggerPartkeyChange(), contains one serious bug (namely: it doesn't map attribute numbers properly if partitions are differently defined). Also, it has a performance issue, namely that we do heap_getattr() potentially repeatedly -- maybe it'd be better to "slotify" the tuple prior to doing the checks. Another possible controversial point is that its location in commands/trigger.c isn't great. (Frankly, I don't understand why the executor trigger firing functions are in commands/ at all.) Thoughts? -- Álvaro Herrera
Attachment
On 2020-02-27 17:51, Alvaro Herrera wrote: > Enabling BEFORE triggers FOR EACH ROW in partitioned tables is very easy > -- just remove the check against them. With that, they work fine. This looks good to me in principle. It's a useful thing to support. > 1. Just let it be. If the user does something silly, it's their problem > if they get an ugly error message. > > 2. If the partition keys are changed, raise an error. The trigger is > allowed to change everything but those columns. Then there's no > conflict, and it allows desirable use cases. > > 3. Allow the partition keys to change, as long as the tuple ends up in > the same partition. This is the same as (1) except the error message is > nicer. > > The attached patch implements (2). That seems alright to me. > * The new function I added, ReportTriggerPartkeyChange(), contains one > serious bug (namely: it doesn't map attribute numbers properly if > partitions are differently defined). Also, it has a performance issue, > namely that we do heap_getattr() potentially repeatedly -- maybe it'd be > better to "slotify" the tuple prior to doing the checks. The attribute ordering issue obviously needs to be addressed, but the performance issue is probably not so important. How many levels of partitioning are we expecting? > Another > possible controversial point is that its location in commands/trigger.c > isn't great. (Frankly, I don't understand why the executor trigger > firing functions are in commands/ at all.) yeah ... -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > * The "root" is not necessarily the root partitioned table, but instead > it's the table that was named in the command. Because of this, we don't > need to acquire locks on the tables, since the executor already has the > tables open and locked (thus they cannot be modified by concurrent > commands). I believe this is because of the partition level constraints on the table that was named in the command would catch any violation in the partition key change in the levels above that table. Will it be easier to subject the new tuple to the partition level constraints themselves and report if those are violated. See RelationGetPartitionQual() for getting partition constraints. This function includes partition constraints from all the levels so in your function you don't have to walk up the partition tree. It includes constraints from the level above the table that was named in the command, but that might be fine. We will catch the error earlier and may be provide a better error message. > > * The new function I added, ReportTriggerPartkeyChange(), contains one > serious bug (namely: it doesn't map attribute numbers properly if > partitions are differently defined). IIUC the code in your patch, it seems you are just looking at partnatts. But partition key can contain expressions also which are stored in partexprs. So, I think the code won't catch change in the partition key values when it contains expression. Using RelationGetPartitionQual() will fix this problem and also problem of attribute remapping across the partition hierarchy. -- Best Wishes, Ashutosh Bapat
On Wed, Mar 11, 2020 at 8:53 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > > > * The "root" is not necessarily the root partitioned table, but instead > > it's the table that was named in the command. Because of this, we don't > > need to acquire locks on the tables, since the executor already has the > > tables open and locked (thus they cannot be modified by concurrent > > commands). > > I believe this is because of the partition level constraints on the > table that was named in the command would catch any violation in the > partition key change in the levels above that table. > > Will it be easier to subject the new tuple to the partition level > constraints themselves and report if those are violated. See > RelationGetPartitionQual() for getting partition constraints. This > function includes partition constraints from all the levels so in your > function you don't have to walk up the partition tree. It includes > constraints from the level above the table that was named in the > command, but that might be fine. We will catch the error earlier and > may be provide a better error message. I realized that this will implement the third option in your original proposal, not the second one. I suppose that's fine too? -- Best Wishes, Ashutosh Bapat
On 2020-03-12 05:17, Ashutosh Bapat wrote: > On Wed, Mar 11, 2020 at 8:53 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: >> Will it be easier to subject the new tuple to the partition level >> constraints themselves and report if those are violated. See >> RelationGetPartitionQual() for getting partition constraints. This >> function includes partition constraints from all the levels so in your >> function you don't have to walk up the partition tree. It includes >> constraints from the level above the table that was named in the >> command, but that might be fine. We will catch the error earlier and >> may be provide a better error message. > > I realized that this will implement the third option in your original > proposal, not the second one. I suppose that's fine too? It might be that that is actually easier to do. Instead of trying to figure out which columns have changed, in the face of different column ordering and general expressions, just check after a trigger whether the column still fits into the partition. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Mar-11, Ashutosh Bapat wrote: > On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > * The new function I added, ReportTriggerPartkeyChange(), contains one > > serious bug (namely: it doesn't map attribute numbers properly if > > partitions are differently defined). > > IIUC the code in your patch, it seems you are just looking at > partnatts. But partition key can contain expressions also which are > stored in partexprs. So, I think the code won't catch change in the > partition key values when it contains expression. Using > RelationGetPartitionQual() will fix this problem and also problem of > attribute remapping across the partition hierarchy. Oh, of course. In fact, I don't need to deal with PartitionQual directly; I can just use ExecPartitionCheck(), since in ExecBRInsertTriggers et al we already have all we need. v2 attached. Here's some example output. With my previous patch, this was the error we reported: insert into parted values (1, 1, 'uno uno v2'); -- fail ERROR: changing partitioning columns in a before trigger is not supported DETAIL: Column "a" was changed by trigger "t". Now, passing emitError=true to ExecPartitionCheck, I get this: insert into parted values (1, 1, 'uno uno v2'); -- fail ERROR: new row for relation "parted_1_1" violates partition constraint DETAIL: Failing row contains (2, 1, uno uno v2). Note the discrepancy in the table named in the INSERT vs. the one in the error message. This is a low-quality error IMO. So I'd instead pass emitError=false, and produce my own error. It's useful to report the trigger name and original partition name: insert into parted values (1, 1, 'uno uno v2'); -- fail ERROR: moving row to another partition during a BEFORE trigger is not supported DETAIL: Before trigger "t", row was to be in partition "public.parted_1_1" Note that in this implementation I no longer know which column is the problematic one, but I suppose users have clue enough. Wording suggestions welcome. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Mar-11, Ashutosh Bapat wrote:
> On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > * The new function I added, ReportTriggerPartkeyChange(), contains one
> > serious bug (namely: it doesn't map attribute numbers properly if
> > partitions are differently defined).
>
> IIUC the code in your patch, it seems you are just looking at
> partnatts. But partition key can contain expressions also which are
> stored in partexprs. So, I think the code won't catch change in the
> partition key values when it contains expression. Using
> RelationGetPartitionQual() will fix this problem and also problem of
> attribute remapping across the partition hierarchy.
Oh, of course.
In fact, I don't need to deal with PartitionQual directly; I can just
use ExecPartitionCheck(), since in ExecBRInsertTriggers et al we already
have all we need. v2 attached.
Thanks.
insert into parted values (1, 1, 'uno uno v2'); -- fail
ERROR: moving row to another partition during a BEFORE trigger is not supported
DETAIL: Before trigger "t", row was to be in partition "public.parted_1_1"
Note that in this implementation I no longer know which column is the
problematic one, but I suppose users have clue enough. Wording
suggestions welcome.
When we have expression as a partition key, there may not be one particular column which causes the row to move out of partition. So, this should be fine.
A slight wording suggestion below.
- */
- if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
- elog(ERROR, "unexpected trigger \"%s\" found",
- NameStr(trigForm->tgname));
!AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF triggers
as well?
- */
- if (stmt->timing != TRIGGER_TYPE_AFTER)
Same comment as the above?
+ /*
+ * After a tuple in a partition goes through a trigger, the user
+ * could have changed the partition key enough that the tuple
+ * no longer fits the partition. Verify that.
+ */
+ if (trigger->tgisclone &&
Why do we want to restrict this check only for triggers which are cloned from
the ancestors?
+ !ExecPartitionCheck(relinfo, slot, estate, false))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("moving row to another partition during a BEFORE trigger is not supported"),
+ errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",
In the error message you removed above, we are mentioning BEFORE FOR EACH ROW
trigger. Should we continue to use the same terminology?
I was wondering whether it would be good to check the partition constraint only
once i.e. after all before row triggers have been executed. This would avoid
throwing an error in case multiple triggers together worked to keep the tuple
in the same partition when individual trigger/s caused it to move out of that
partition. But then we would loose the opportunity to point out the before row
trigger which actually caused the row to move out of the partition. Anyway,
wanted to bring that for the discussion here.
@@ -307,7 +307,7 @@ CreatePartitionDirectory(MemoryContext mcxt)
*
* The purpose of this function is to ensure that we get the same
* PartitionDesc for each relation every time we look it up. In the
- * face of current DDL, different PartitionDescs may be constructed with
+ * face of concurrent DDL, different PartitionDescs may be constructed with
Thanks for catching it. Looks unrelated though.
+-- Before triggers and partitions
The test looks good. Should we add a test for partitioned table with partition
key as expression?
The approach looks good to me.
Best Wishes,
Ashutosh
I was expecting that documentation somewhere covered the fact that BR triggers are not supported on a partitioned table. And this patch would remove/improve that portion of the code. But I don't see any documentation changes in this patch. On Tue, Mar 17, 2020 at 10:11 PM Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> wrote: > > > > On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> >> On 2020-Mar-11, Ashutosh Bapat wrote: >> >> > On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera >> > <alvherre@2ndquadrant.com> wrote: >> >> > > * The new function I added, ReportTriggerPartkeyChange(), contains one >> > > serious bug (namely: it doesn't map attribute numbers properly if >> > > partitions are differently defined). >> > >> > IIUC the code in your patch, it seems you are just looking at >> > partnatts. But partition key can contain expressions also which are >> > stored in partexprs. So, I think the code won't catch change in the >> > partition key values when it contains expression. Using >> > RelationGetPartitionQual() will fix this problem and also problem of >> > attribute remapping across the partition hierarchy. >> >> Oh, of course. >> >> In fact, I don't need to deal with PartitionQual directly; I can just >> use ExecPartitionCheck(), since in ExecBRInsertTriggers et al we already >> have all we need. v2 attached. > > > Thanks. > >> >> insert into parted values (1, 1, 'uno uno v2'); -- fail >> ERROR: moving row to another partition during a BEFORE trigger is not supported >> DETAIL: Before trigger "t", row was to be in partition "public.parted_1_1" >> >> Note that in this implementation I no longer know which column is the >> problematic one, but I suppose users have clue enough. Wording >> suggestions welcome. > > > When we have expression as a partition key, there may not be one particular column which causes the row to move out ofpartition. So, this should be fine. > A slight wording suggestion below. > > - * Complain if we find an unexpected trigger type. > - */ > - if (!TRIGGER_FOR_AFTER(trigForm->tgtype)) > - elog(ERROR, "unexpected trigger \"%s\" found", > - NameStr(trigForm->tgname)); > > !AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF triggers > as well? > - */ > - if (stmt->timing != TRIGGER_TYPE_AFTER) > > Same comment as the above? > > + /* > + * After a tuple in a partition goes through a trigger, the user > + * could have changed the partition key enough that the tuple > + * no longer fits the partition. Verify that. > + */ > + if (trigger->tgisclone && > > Why do we want to restrict this check only for triggers which are cloned from > the ancestors? > > + !ExecPartitionCheck(relinfo, slot, estate, false)) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("moving row to another partition during a BEFORE trigger is not supported"), > + errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"", > > In the error message you removed above, we are mentioning BEFORE FOR EACH ROW > trigger. Should we continue to use the same terminology? > > I was wondering whether it would be good to check the partition constraint only > once i.e. after all before row triggers have been executed. This would avoid > throwing an error in case multiple triggers together worked to keep the tuple > in the same partition when individual trigger/s caused it to move out of that > partition. But then we would loose the opportunity to point out the before row > trigger which actually caused the row to move out of the partition. Anyway, > wanted to bring that for the discussion here. > > @@ -307,7 +307,7 @@ CreatePartitionDirectory(MemoryContext mcxt) > * > * The purpose of this function is to ensure that we get the same > * PartitionDesc for each relation every time we look it up. In the > - * face of current DDL, different PartitionDescs may be constructed with > + * face of concurrent DDL, different PartitionDescs may be constructed with > > Thanks for catching it. Looks unrelated though. > > +-- Before triggers and partitions > > The test looks good. Should we add a test for partitioned table with partition > key as expression? > > The approach looks good to me. > > -- > Best Wishes, > Ashutosh -- Best Wishes, Ashutosh Bapat
On 2020-Mar-17, Ashutosh Bapat wrote: > On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: > > Note that in this implementation I no longer know which column is the > > problematic one, but I suppose users have clue enough. Wording > > suggestions welcome. > > When we have expression as a partition key, there may not be one particular > column which causes the row to move out of partition. So, this should be > fine. True. > A slight wording suggestion below. > > - * Complain if we find an unexpected trigger type. > - */ > - if (!TRIGGER_FOR_AFTER(trigForm->tgtype)) > - elog(ERROR, "unexpected trigger \"%s\" found", > - NameStr(trigForm->tgname)); > > !AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF > triggers as well? Hmm, yeah, this should check both types; I'll put it back. Note that this is just a cross-check that the catalogs we're going to copy don't contain bogus info; the real backstop for that at the user level is in the other one you complain about: > - */ > - if (stmt->timing != TRIGGER_TYPE_AFTER) > > Same comment as the above? Note that in this one we have a check for INSTEAD before we enter the FOR EACH ROW block, so this case is already covered -- AFAICS the code is correct. > + /* > + * After a tuple in a partition goes through a trigger, the user > + * could have changed the partition key enough that the tuple > + * no longer fits the partition. Verify that. > + */ > + if (trigger->tgisclone && > > Why do we want to restrict this check only for triggers which are > cloned from the ancestors? Because it's not our business in the other case. When the trigger is defined directly in the partition, it's the user's problem if something going amiss. > + !ExecPartitionCheck(relinfo, slot, estate, false)) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("moving row to another partition during a BEFORE trigger is not > supported"), > + errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"", > > In the error message you removed above, we are mentioning BEFORE FOR EACH > ROW trigger. Should we continue to use the same terminology? Sounds good, I'll change that. I also changed the errdetail slightly: errdetail("Before executing trigger \"%s\", the row was to be in partition \"%s.%s\"", > I was wondering whether it would be good to check the partition > constraint only once i.e. after all before row triggers have been > executed. This would avoid throwing an error in case multiple triggers > together worked to keep the tuple in the same partition when > individual trigger/s caused it to move out of that partition. But then > we would loose the opportunity to point out the before row trigger > which actually caused the row to move out of the partition. Anyway, > wanted to bring that for the discussion here. Yeah, I too thought about a combination of triggers that move the tuple elsewhere and back. Frankly, I don't think we need to support that. It sounds devious and likely we'll miss some odd corner case -- anything involving the weird cross-partition UPDATE mechanism sounds easy to get wrong. > +-- Before triggers and partitions > > The test looks good. Should we add a test for partitioned table with > partition > key as expression? Will do. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for the reviews; I have pushed it now. (I made the doc edits you mentioned too.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi commiter,
Many of our customers expect to use BR triggers in partitioned tables.
After I followed your discussion, I also checked your patch.
Here are two questions confusing me:
1. Your modification removes the check BR triggers against partitioned table,
and a more friendly error message is added to the ExecInsert and ExecUpdate.
You are correct. ExecInsert does not reroute partitions.
However, when ExecUpdate modifies partition keys,
it is almost equivalent to ExecDelete and ExecInsert,
and it is re-routed(ExecPrepareTupleRouting) once before ExecInsert. Therefore,
why should an error be thrown in ExecUpdate?
Let's look at a case :
```
postgres=# create table parted (a int, b int, c text) partition by list (a);
CREATE TABLE
postgres=# create table parted_1 partition of parted for values in (1);
CREATE TABLE
postgres=# create table parted_2 partition of parted for values in (2);
CREATE TABLE
postgres=# create function parted_trigfunc() returns trigger language plpgsql as $$
begin
new.a = new.a + 1;
return new;
end;
$$;
CREATE FUNCTION
postgres=# insert into parted values (1, 1, 'uno uno v1');
INSERT 0 1
postgres=# create trigger t before update on parted
for each row execute function parted_trigfunc();
CREATE TRIGGER
postgres=# update parted set c = c || 'v3';
```
If there is no check in the ExecUpdate,
the above update SQL will be executed successfully.
However, in your code, this will fail.
So, what is the reason for your consideration?
the partitioned table, but did not fundamentally solve the problem caused
by modifying partition keys in the BR trigger. What are the difficulties in
solving this problem fundamentally? We plan to implement it.
Can you give us some suggestions?
------------------------------------------------------------------发件人:Alvaro Herrera <alvherre@2ndquadrant.com>发送时间:2021年1月18日(星期一) 20:36收件人:Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>抄 送:Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>; Pg Hackers <pgsql-hackers@lists.postgresql.org>主 题:Re: BEFORE ROW triggers for partitioned tablesThanks for the reviews; I have pushed it now.
(I made the doc edits you mentioned too.)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi commiter,
Many of our customers expect to use BR triggers in partitioned tables.
After I followed your discussion, I also checked your patch.
Here are two questions confusing me:
1. Your modification removes the check BR triggers against partitioned table,
and a more friendly error message is added to the ExecInsert and ExecUpdate.
You are correct. ExecInsert does not reroute partitions.
However, when ExecUpdate modifies partition keys,
it is almost equivalent to ExecDelete and ExecInsert,
and it is re-routed(ExecPrepareTupleRouting) once before ExecInsert. Therefore,
why should an error be thrown in ExecUpdate?
Let's look at a case :
```
postgres=# create table parted (a int, b int, c text) partition by list (a);
CREATE TABLE
postgres=# create table parted_1 partition of parted for values in (1);
CREATE TABLE
postgres=# create table parted_2 partition of parted for values in (2);
CREATE TABLE
postgres=# create function parted_trigfunc() returns trigger language plpgsql as $$
begin
new.a = new.a + 1;
return new;
end;
$$;
CREATE FUNCTION
postgres=# insert into parted values (1, 1, 'uno uno v1');
INSERT 0 1
postgres=# create trigger t before update on parted
for each row execute function parted_trigfunc();
CREATE TRIGGER
postgres=# update parted set c = c || 'v3';
```
If there is no check in the ExecUpdate,
the above update SQL will be executed successfully.
However, in your code, this will fail.
So, what is the reason for your consideration?
the partitioned table, but did not fundamentally solve the problem caused
by modifying partition keys in the BR trigger. What are the difficulties in
solving this problem fundamentally? We plan to implement it.
Can you give us some suggestions?
We look forward to your reply.
Thank you very much,
Regards, Adger
------------------------------------------------------------------发件人:Alvaro Herrera <alvherre@2ndquadrant.com>发送时间:2021年1月18日(星期一) 20:36收件人:Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>抄 送:Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>; Pg Hackers <pgsql-hackers@lists.postgresql.org>主 题:Re: BEFORE ROW triggers for partitioned tablesThanks for the reviews; I have pushed it now.
(I made the doc edits you mentioned too.)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services