Thread: Referential Integrity Checks with Statement-level Triggers
Back when Pg added statement-level triggers, I was interested in the promise of moving referential integrity checks to statement-level triggers.
The initial conversation included Kevin Grittner's POC script (in SQL), which showed the potential for a 98% reduction in time spent doing RI checks. The original thread is here:
I dug around in the code, and was rather surprised at how close we already are to implementing this. The function RI_Initial_Check() already does a left-join query via SPI to look for any invalid data, so if we could just replace the near table with the transition table for inserted rows, we'd be home free. The function SPI_register_trigger_data() makes the transition tables visible to SPI, so I started to wonder why this hadn't been done already.
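For illustration, here is a sketch of the kind of validation query I mean, with the transition table standing in for the referencing table. The names are hypothetical (pk_table and fk_col aren't from the real code), and new_rows is the transition-table alias a statement-level AFTER INSERT trigger would register:

-- Any row returned is an FK violation among the newly inserted rows.
SELECT n.fk_col
  FROM new_rows n
  LEFT JOIN pk_table p ON p.pk_col = n.fk_col
 WHERE n.fk_col IS NOT NULL
   AND p.pk_col IS NULL
 LIMIT 1;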
I approached Kevin and Thomas Munro seeking feedback on my approach. The idea also made it into a session at the PgConf.ASIA un-conference, and later into a conversation with Michael Paquier at that same conference, and the coalesced feedback was this:
- the overhead of registering the transition tables probably makes it unprofitable for single row inserts
- the single row overhead is itself significant, so maybe the transition tables aren't worse
- there has been talk of replacing transition tables with an in-memory data structure that would be closer to "free" from a startup perspective and might even coalesce the transition tables of multiple statements in the same transaction
- because no declarative code changes, it's trivial to switch from row-level to statement-level triggering via pg_upgrade
- assuming that transition tables are an overhead that only pays off when > N rows have been updated, does it make sense to enforce RI with something that isn't actually a trigger?
- there was also some mention that parallel query uses a queue mechanism that might be leveraged to do row-level triggers for updates of <= N rows and statement level for > N
That's what I have so far. I'm going to be working on a POC patch so that I can benchmark a pure statement-level solution, which, if nothing else, will let us know the approximate value of N.
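To make the shape of that concrete, here is roughly what the statement-level check could look like if it were declared at the SQL level. This is only a sketch with made-up names; the real RI triggers are created internally in C, not via DDL:

CREATE TRIGGER fk_table_ri_ins
    AFTER INSERT ON fk_table
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION ri_check_new_rows();  -- would run the left-join sketch above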
All suggestions are appreciated.
On Mon, Dec 17, 2018 at 15:32 Corey Huinker <corey.huinker@gmail.com> wrote:
> All suggestions are appreciated.
It is great. I thought about it a little bit, just theoretically.
A ROW trigger calls the RI check too often, and a statement trigger too rarely. I think the ideal design would call the RI check every 10K rows or so. It can be unfriendly if somebody runs a very long import and it only fails at the end. I don't think there should be any performance difference once the RI check is called per 1000 or more rows.
Regards
Pavel
On 2018-Dec-17, Pavel Stehule wrote:

> A ROW trigger calls the RI check too often, and a statement trigger too
> rarely. I think the ideal design would call the RI check every 10K rows
> or so. It can be unfriendly if somebody runs a very long import and it
> only fails at the end. I don't think there should be any performance
> difference once the RI check is called per 1000 or more rows.

This is a good point, but I'm not sure if it's possible to implement using statement-level triggers. I think the way transition tables work is that you get the full results at the end of the command; there's no way to pass control to the RI stuff at arbitrary points during the execution of the command.

Is there any guidance in the SQL standard about this? I don't think the timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on this. Or do they?

Maybe there is a solution for this. I think it's worth thinking about, even if it's just to say that we won't do it.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 17, 2018 at 18:27 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2018-Dec-17, Pavel Stehule wrote:
>
> > A ROW trigger calls the RI check too often, and a statement trigger too
> > rarely. I think the ideal design would call the RI check every 10K rows
> > or so. It can be unfriendly if somebody runs a very long import and it
> > only fails at the end. I don't think there should be any performance
> > difference once the RI check is called per 1000 or more rows.
> This is a good point, but I'm not sure if it's possible to implement
> using statement-level triggers. I think the way transition tables work
> is that you get the full results at the end of the command; there's no
> way to pass control to the RI stuff at arbitrary points during the
> execution of the command.
It was just an idea. When I think about it, I am not sure that an RI check from a statement trigger isn't worse when the statement's changes are very big. On the other hand, maybe that is premature optimization. We can check performance on a prototype.
> Is there any guidance in the SQL standard about this? I don't think the
> timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
> this. Or do they?
>
> Maybe there is a solution for this. I think it's worth thinking about,
> even if it's just to say that we won't do it.
It's something I know I am interested in. For me, I don't really care if my statement doesn't cancel until the very end when there is an RI violation. The benefit of deletes no longer being slow on tables that are referenced by foreign keys lacking their own index is huge IMO. I have a good number of logging tables like that, where an index is not useful 99% of the time, but every once in a while a bulk delete needs to happen.
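To illustrate the pattern with a sketch (hypothetical schema, not from my actual system):

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE event_log (
    logged_at timestamptz,
    parent_id int REFERENCES parent (id)  -- no index on parent_id
);
-- Each row deleted from parent fires a row-level RI trigger that looks for
-- referencing rows in event_log; with no index, that's one full scan per row.
DELETE FROM parent WHERE id < 1000;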
It is far from a premature optimization IMO, it is super useful and something I was hoping would happen ever since I heard about transition tables being worked on.
Just my $0.02.
-Adam
On Mon, Dec 17, 2018 at 19:19 Adam Brusselback <adambrusselback@gmail.com> wrote:
> It is far from a premature optimization IMO, it is super useful and something
> I was hoping would happen ever since I heard about transition tables being
> worked on.
Note: my sentence about premature optimization was related to my idea of dividing the RI check per 10K rows.
It would be great if the RI check were faster.
On Mon, Dec 17, 2018 at 11:27 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2018-Dec-17, Pavel Stehule wrote:
>
> > A ROW trigger calls the RI check too often, and a statement trigger too
> > rarely. I think the ideal design would call the RI check every 10K rows
> > or so. It can be unfriendly if somebody runs a very long import and it
> > only fails at the end. I don't think there should be any performance
> > difference once the RI check is called per 1000 or more rows.
>
> This is a good point, but I'm not sure if it's possible to implement
> using statement-level triggers. I think the way transition tables work
> is that you get the full results at the end of the command; there's no
> way to pass control to the RI stuff at arbitrary points during the
> execution of the command.
>
> Is there any guidance in the SQL standard about this? I don't think the
> timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
> this. Or do they?

Yes, they do. *ALL* AFTER triggers fire after the statement completes; it's a question of whether a particular trigger fires once for the whole statement or once for each row. Observe:

test=# CREATE TABLE t1 (t1id int PRIMARY KEY, t1desc text);
CREATE TABLE
test=# CREATE TABLE t2 (t2id int PRIMARY KEY, t1id int NOT NULL, t2desc text,
test(#   FOREIGN KEY (t1id) REFERENCES t1);
CREATE TABLE
test=# CREATE FUNCTION t2_insert_func()
test-#   RETURNS TRIGGER
test-#   LANGUAGE plpgsql
test-# AS $$
test$# BEGIN
test$#   RAISE NOTICE '%', new;
test$#   RETURN new;
test$# END;
test$# $$;
CREATE FUNCTION
test=# CREATE TRIGGER t2_insert_trig
test-#   BEFORE INSERT ON t2
test-#   FOR EACH ROW
test-#   EXECUTE FUNCTION t2_insert_func();
CREATE TRIGGER
test=# INSERT INTO t1 VALUES (1), (2), (3);
INSERT 0 3
test=# INSERT INTO t2 VALUES (10, 1), (20, 2), (30, 3), (40, 4), (50, 5);
NOTICE:  (10,1,)
NOTICE:  (20,2,)
NOTICE:  (30,3,)
NOTICE:  (40,4,)
NOTICE:  (50,5,)
ERROR:  insert or update on table "t2" violates foreign key constraint "t2_t1id_fkey"
DETAIL:  Key (t1id)=(4) is not present in table "t1".

All inserts occur before the statement fails, per the standard.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
On Mon, Dec 17, 2018 at 8:32 AM Corey Huinker <corey.huinker@gmail.com> wrote:
> All suggestions are appreciated.

As I recall, the main issue is how to handle concurrency. The existing RI triggers do some very special handling of the multi-xact ID to manage concurrent modifications. Has anyone looked at the issues with using those techniques with set-oriented statements for the transition table approach?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
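For context, the existing row-level check Kevin refers to is a query of roughly this shape (simplified from what ri_triggers.c actually generates, with the hypothetical pk_table name from earlier), and the row lock it takes is part of that concurrency handling:

-- Fired once per inserted/updated referencing row; KEY SHARE locks the
-- matched PK row so a concurrent transaction can't delete it underneath us.
SELECT 1 FROM ONLY pk_table x
 WHERE pk_col = $1
   FOR KEY SHARE OF x;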
> It is far from a premature optimization IMO, it is super useful and something
> I was hoping would happen ever since I heard about transition tables being
> worked on.

Me too. Never-ending DELETEs are a common pain point, especially for people migrated from MySQL, which creates indexes for foreign keys automatically.
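With Kevin's t1/t2 example from upthread, the manual workaround is a single statement:

CREATE INDEX ON t2 (t1id);  -- Postgres does not create this for you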
Attached is a patch that refactors DELETE triggers to fire at the statement level.
I chose delete triggers partly out of simplicity, and partly because there is some before/after row linkage in the ON UPDATE CASCADE cases where statement-level triggers might not be feasible as we have currently implemented them.
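As a sketch of the set-oriented idea (hypothetical names; the real query is built in C inside ri_triggers.c), the statement-level NO ACTION check for deletes boils down to a single join against the OLD transition table:

-- old_rows = transition table of rows deleted from the PK table.
-- Any match means a referencing row survives, so the DELETE must fail.
SELECT 1
  FROM old_rows o
  JOIN fk_table f ON f.fk_col = o.pk_col
 LIMIT 1;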
After having done the work, I think INSERT triggers would be similarly straightforward, but I wanted to limit scope.
Also, after having stripped the delete cases out of the update-or-delete functions, it became obvious that the on-update-set-null and on-update-set-default cases differed by only 3-4 lines, so those functions were combined.
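For comparison, the set-oriented form of a SET NULL action on the delete path is likewise a single statement (again a sketch with hypothetical names; the SET DEFAULT case differs only in the SET clause):

UPDATE fk_table f
   SET fk_col = NULL               -- SET DEFAULT would use DEFAULT here
  FROM old_rows o
 WHERE f.fk_col = o.pk_col;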
On a vagrant VM running on my desktop machine, I'm seeing a speed-up of about 25% in the benchmark provided. I think that figure is cloudy and below my expectations. Perhaps we'd get a much better picture of whether or not this is worth it on a bare metal machine, or at least a VM better suited to benchmarking.
Currently four make-check tests are failing: two appear to be false positives (the tests make assumptions about triggers that are no longer true), and the other two are outside the scope of this benchmark, so I'll revisit them if we go forward.
ri-set-logic.sql is an edited benchmark script adapted from the benchmark that Kevin Grittner ran against hand-rolled triggers and posted on 2016-11-02.
ri_test.out is a copy-paste of two runs of the benchmark script.
Many thanks to everyone who helped, often despite their own objections to the overall reasoning behind the endeavor. I'm aware that a large contingent of highly experienced people would very much like to replace our entire trigger architecture, or at least divorce RI checks from triggers. Maybe this patch spurs on that change. Even if nothing comes of it, it's been a great learning experience.
On Sat, Dec 22, 2018 at 11:28 AM Emre Hasegeli <emre@hasegeli.com> wrote:
> > It is far from a premature optimization IMO, it is super useful and something
> > I was hoping would happen ever since I heard about transition tables being
> > worked on.
>
> Me too. Never-ending DELETEs are a common pain point, especially for
> people migrated from MySQL, which creates indexes for foreign keys
> automatically.
Corey Huinker <corey.huinker@gmail.com> wrote:

> Attached is a patch that refactors DELETE triggers to fire at the statement level.
>
> I chose delete triggers partly out of simplicity, and partly because there is
> some before/after row linkage in the ON UPDATE CASCADE cases where
> statement-level triggers might not be feasible as we have currently
> implemented them.

I tried to review this patch, also with the intention to learn more about AFTER trigger internals. As for the code, I understood your idea and it really seems like low-hanging fruit. However, in trigger.c I noticed a comment that the transition table is not supported for deferred constraints. Of course I tried to use this information to test your patch:

CREATE TABLE t_pk(i int PRIMARY KEY);
CREATE TABLE t_fk(i int REFERENCES t_pk ON DELETE NO ACTION DEFERRABLE);
INSERT INTO t_pk(i) VALUES (1);
INSERT INTO t_fk(i) VALUES (1);
BEGIN;
SET CONSTRAINTS t_fk_i_fkey DEFERRED;
DELETE FROM t_pk;
COMMIT;

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The missing transition table really appears to be the problem:

#0  0x0000000000a8c2a8 in tuplestore_tuple_count (state=0x7f7f7f7f7f7f7f7f) at tuplestore.c:548
#1  0x00000000009b0e58 in ri_on_delete_restrict (trigdata=0x7ffe800c2890, is_no_action=true) at ri_triggers.c:720
#2  0x00000000009b0d3f in RI_FKey_noaction_del (fcinfo=0x7ffe800c27a0) at ri_triggers.c:607
#3  0x00000000006b8768 in ExecCallTriggerFunc (trigdata=0x7ffe800c2890, tgindx=0, finfo=0x1f08c10, instr=0x0, per_tuple_context=0x1f30690) at trigger.c:2412
#4  0x00000000006bbb86 in AfterTriggerExecute (event=0x1f0ab10, rel=0x7fc71306ea70, trigdesc=0x1f089f8, finfo=0x1f08c10, instr=0x0, per_tuple_context=0x1f30690, trig_tuple_slot1=0x0, trig_tuple_slot2=0x0) at trigger.c:4367
#5  0x00000000006bbf9e in afterTriggerInvokeEvents (events=0xf97950 <afterTriggers+16>, firing_id=1, estate=0x1f086c8, delete_ok=true) at trigger.c:4560
#6  0x00000000006bc844 in AfterTriggerFireDeferred () at trigger.c:4996
#7  0x000000000055c989 in CommitTransaction () at xact.c:1987
#8  0x000000000055d6a4 in CommitTransactionCommand () at xact.c:2832

While the idea to use the transition table is good, this approach probably requires the trigger engine (trigger.c) to be adjusted, and that in a non-trivial way.

I'm also not sure if it's o.k. that a performance-related patch potentially makes performance worse in some cases. If FK violations are checked at the statement boundary, the wasted effort / time can (at least in theory) be high if early rows violate the FK.

Have you considered bulk processing of individual rows by a row-level trigger? For IMMEDIATE constraints we'd have to ensure that the trigger is notified that the current row is the last one from the current query, but that might not be difficult.

--
Antonin Houska
https://www.cybertec-postgresql.com
> While the idea to use the transition table is good, this approach probably
> requires the trigger engine (trigger.c) to be adjusted, and that in a
> non-trivial way.
It probably does. Several people with advanced knowledge of trigger.c expressed a desire to rebuild trigger.c from the ground up, and with it create case-specific tuplestores for handling referential integrity constraints, which would be lighter than either the transition tables or the per-row invocation of a trigger. After all, we need an RI check to happen; we don't need it to happen through a trigger function.
> I'm also not sure if it's o.k. that a performance-related patch potentially
> makes performance worse in some cases. If FK violations are checked at the
> statement boundary, the wasted effort / time can (at least in theory) be high
> if early rows violate the FK.
That concern was also expressed with varying levels of alarm in their voices.
> Have you considered bulk processing of individual rows by a row-level trigger?
> For IMMEDIATE constraints we'd have to ensure that the trigger is notified
> that the current row is the last one from the current query, but that might
> not be difficult.
I'm not sure I understand what you're suggesting, but if it keeps the overhead of one trigger firing per row deleted, then it doesn't seem like much of a win.
Given that this patch has been punted to v13, I'd like to instead look at how we might go about building up the transition tuplestores for the specific purpose of doing the RI checks, not just deletes, and executing those at the appropriate time, rather than trying to make our needs fit into trigger form.
Corey Huinker <corey.huinker@gmail.com> wrote:

> > Have you considered bulk processing of individual rows by a row-level trigger?
> > For IMMEDIATE constraints we'd have to ensure that the trigger is notified
> > that the current row is the last one from the current query, but that might
> > not be difficult.
>
> I'm not sure I understand what you're suggesting, but if it keeps the
> overhead of one trigger firing per row deleted, then it doesn't seem like
> much of a win.

I thought of a trigger that still fires for each row, but most of the time it only stores the deleted row into a tuplestore of the appropriate lifespan. The queries that you proposed would only be executed if the tuplestore contains a given amount of tuples or if the last row of the current statement has been stored.

> Given that this patch has been punted to v13, I'd like to instead look at
> how we might go about building up the transition tuplestores for the
> specific purpose of doing the RI checks, not just deletes, and executing
> those at the appropriate time, rather than trying to make our needs fit into
> trigger form.

A constraint-specific tuplestore can make some things easier, but if a table has both constraints and (non-constraint) triggers which use the transition tables, then the tuples will have to be stored in both tuplestores. On the other hand, using the same tuplestore for both constraint and non-constraint triggers is difficult because deferred constraint triggers need to see rows added by all statements while the non-constraint triggers should only see rows of the current statement.

In order to avoid per-row calls of the constraint trigger functions, we could try to "aggregate" the constraint-specific events somehow, but I think a separate queue would be needed for the constraint-specific events.

In general, the (after) triggers and constraints have too much in common, so separation of these without seeing code changes is beyond my imagination.

--
Antonin Houska
https://www.cybertec-postgresql.com
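To show the shape of Antonin's batching idea at the SQL level, here is a hand-rolled sketch using his t_pk/t_fk example. The function and table names are hypothetical, and the real proposal would live inside the C trigger engine rather than in plpgsql: a row-level trigger queues each deleted key, and a statement-level trigger runs one set-oriented check over the queue and empties it.

CREATE TABLE pending_deletes(i int);

CREATE FUNCTION queue_deleted_pk() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO pending_deletes VALUES (OLD.i);  -- just queue; no check yet
    RETURN OLD;
END $$;

CREATE FUNCTION flush_deleted_pks() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM t_fk f JOIN pending_deletes p ON f.i = p.i) THEN
        RAISE foreign_key_violation;
    END IF;
    DELETE FROM pending_deletes;
    RETURN NULL;
END $$;

-- AFTER ROW triggers are queued per row; the AFTER STATEMENT trigger
-- then fires once after all of them.
CREATE TRIGGER t_pk_queue AFTER DELETE ON t_pk
    FOR EACH ROW EXECUTE FUNCTION queue_deleted_pk();
CREATE TRIGGER t_pk_flush AFTER DELETE ON t_pk
    FOR EACH STATEMENT EXECUTE FUNCTION flush_deleted_pks();

A real implementation would also flush every N rows rather than only at statement end, which is where the "last row of the current query" notification comes in.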
> In order to avoid per-row calls of the constraint trigger functions, we could
> try to "aggregate" the constraint-specific events somehow, but I think a
> separate queue would be needed for the constraint-specific events.
>
> In general, the (after) triggers and constraints have too much in common, so
> separation of these without seeing code changes is beyond my imagination.
Yeah, there's a lot of potential for overlap where a trigger could "borrow" an RI tuplestore or vice versa.
The people who expressed opinions on nuking triggers from orbit (it's the only way to be sure) have yet to offer up any guidance on how to proceed from here, and I suspect it's because they're all very busy getting things ready for v12. I definitely have an interest in working on this for 13, but I don't feel good about striking out on my own without their input.
On Tue, Feb 26, 2019 at 5:41 AM Corey Huinker <corey.huinker@gmail.com> wrote:
> > In order to avoid per-row calls of the constraint trigger functions, we could
> > try to "aggregate" the constraint-specific events somehow, but I think a
> > separate queue would be needed for the constraint-specific events.
> >
> > In general, the (after) triggers and constraints have too much in common, so
> > separation of these without seeing code changes is beyond my imagination.
>
> Yeah, there's a lot of potential for overlap where a trigger could "borrow"
> an RI tuplestore or vice versa.
>
> The people who expressed opinions on nuking triggers from orbit (it's the
> only way to be sure) have yet to offer up any guidance on how to proceed from
> here, and I suspect it's because they're all very busy getting things ready
> for v12. I definitely have an interest in working on this for 13, but I don't
> feel good about striking out on my own without their input.

Very interesting thread, but the current patch has been through two CFs without comments or new patches, so I'm going to mark it "Returned with feedback". I hope all this discussion will trigger more research in this space.

--
Thomas Munro
https://enterprisedb.com
> > The people who expressed opinions on nuking triggers from orbit (it's the
> > only way to be sure) have yet to offer up any guidance on how to proceed
> > from here, and I suspect it's because they're all very busy getting things
> > ready for v12. I definitely have an interest in working on this for 13, but
> > I don't feel good about striking out on my own without their input.
>
> Very interesting thread, but the current patch has been through two
> CFs without comments or new patches, so I'm going to mark it "Returned
> with feedback". I hope all this discussion will trigger more research
> in this space.
I've noticed that the zedstore efforts ran into the same problem that refactoring triggers has: we cannot determine which columns in a table will be affected by a trigger, so we have to assume that all of them will be. This causes a lot of unnecessary overhead with triggers. If we had a compilation step for triggers (which ultimately means a compilation step for procedures) that kept a dependency tree of which tables and columns were touched, then we would have that insight. It's true that one dynamic statement or SELECT * would force us right back to keep-everything, but if procedures that did not do such things had performance benefits, that would be an incentive to code them more fastidiously.
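To illustrate the distinction with a sketch (hypothetical functions, plus an assumed audit_log table): the first trigger body references only one column and could be analyzed at compile time; the second uses a dynamic statement and would force the keep-everything assumption.

CREATE TABLE audit_log(val text);

-- Analyzable: only NEW.t1id is referenced, so under the proposed dependency
-- tracking only that column would need to be captured for this trigger.
CREATE FUNCTION audit_one_col() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO audit_log(val) VALUES (NEW.t1id::text);
    RETURN NEW;
END $$;

-- Not analyzable: the referenced column is only known at run time, so every
-- column has to be assumed touched.
CREATE FUNCTION audit_any_col() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    EXECUTE format('INSERT INTO audit_log(val) SELECT (($1).%I)::text', TG_ARGV[0])
        USING NEW;
    RETURN NEW;
END $$;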