Thread: Do foreign key triggers get ran even if the key's value doesn't change?
I came across http://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/ which seems to indicate so. When I run the following test script, having 50 foreign keys takes about twice as long to do the update. Is there a reason for that? Seems like the RI triggers wouldn't have to run on updates if the value doesn't change. begin; set client_min_messages='warning'; CREATE OR REPLACE FUNCTION fnc_check_fk_overhead(key_count INT) RETURNS INTERVAL AS $$ DECLARE i INT; start_time TIMESTAMP; end_time TIMESTAMP; BEGIN DROP TABLE if exists test_fk CASCADE; CREATE TABLE test_fk ( id BIGINT PRIMARY KEY, junk VARCHAR ); INSERT INTO test_fk SELECT generate_series(1, 100000), repeat(' ', 20); CLUSTER test_fk_pkey ON test_fk; FOR i IN 1..key_count LOOP EXECUTE 'CREATE TABLE test_fk_ref_' || i || ' (test_fk_id BIGINT REFERENCES test_fk (id) ON UPDATE NO ACTION)'; END LOOP; start_time = clock_timestamp(); FOR i IN 1..100000 LOOP UPDATE test_fk SET junk = ' ' WHERE id = i; END LOOP; end_time = clock_timestamp(); FOR i IN 1..key_count LOOP EXECUTE 'DROP TABLE test_fk_ref_' || i; END LOOP; RETURN end_time - start_time; END; $$ LANGUAGE plpgsql VOLATILE; SELECT fnc_check_fk_overhead(1); SELECT fnc_check_fk_overhead(50);
On Wed, May 21, 2014 at 1:11 PM, Joe Van Dyk <joe@tanga.com> wrote:
I came across http://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/
which seems to indicate so.
When I run the following test script, having 50 foreign keys takes
about twice as long to do the update. Is there a reason for that?
Seems like the RI triggers wouldn't have to run on updates if the
value doesn't change.
That's kind of a question of definitions. Perhaps the trigger itself doesn't need to run, but the code that decides whether the trigger needs to run does need to run. Where do you draw the line around what is the trigger proper and what is just infrastructure?
However you wish to define it, change your function so that it actually does change the key field, and see how much slower that is than the behavior where you update the row without updating the key.
Cheers,
Jeff
On Wednesday, May 21, 2014, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, May 21, 2014 at 1:11 PM, Joe Van Dyk <joe@tanga.com> wrote:I came across http://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/
which seems to indicate so.
When I run the following test script, having 50 foreign keys takes
about twice as long to do the update. Is there a reason for that?
Seems like the RI triggers wouldn't have to run on updates if the
value doesn't change.That's kind of a question of definitions. Perhaps the trigger itself doesn't need to run, but the code that decides whether the trigger needs to run does need to run. Where do you draw the line around what is the trigger proper and what is just infrastructure?However you wish to define it, change your function so that it actually does change the key field, and see how much slower that is than the behavior where you update the row without updating the key.
I was expecting that the RI update triggers would have a "when (new.key is distinct from old.key)" condition on them, which would mean that the number of referencing tables wouldn't matter.
Cheers,Jeff
On Wed, May 21, 2014 at 7:48 PM, Joe Van Dyk <joe@tanga.com> wrote:
On Wednesday, May 21, 2014, Jeff Janes <jeff.janes@gmail.com> wrote:On Wed, May 21, 2014 at 1:11 PM, Joe Van Dyk <joe@tanga.com> wrote:I came across http://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/
which seems to indicate so.
When I run the following test script, having 50 foreign keys takes
about twice as long to do the update. Is there a reason for that?
Seems like the RI triggers wouldn't have to run on updates if the
value doesn't change.That's kind of a question of definitions. Perhaps the trigger itself doesn't need to run, but the code that decides whether the trigger needs to run does need to run. Where do you draw the line around what is the trigger proper and what is just infrastructure?However you wish to define it, change your function so that it actually does change the key field, and see how much slower that is than the behavior where you update the row without updating the key.I was expecting that the RI update triggers would have a "when (new.key is distinct from old.key)" condition on them, which would mean that the number of referencing tables wouldn't matter.
But that condition is checked for each constraint individually, not for all constraints simultaneously. A table can be referenced on multiple combinations of columns, so just one check may not suffice. I guess the triggers could be organized into groups of identical firing criteria and then checked only once per group, but that seems like a pretty obscure optimization to make. I don't know how you would reorganize such groupings in a concurrency safe way when constraints were added or removed.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > On Wed, May 21, 2014 at 7:48 PM, Joe Van Dyk <joe@tanga.com> wrote: >> I was expecting that the RI update triggers would have a "when (new.key is >> distinct from old.key)" condition on them, which would mean that the number >> of referencing tables wouldn't matter. > But that condition is checked for each constraint individually, not for all > constraints simultaneously. A table can be referenced on multiple > combinations of columns, so just one check may not suffice. I guess the > triggers could be organized into groups of identical firing criteria and > then checked only once per group, but that seems like a pretty obscure > optimization to make. I don't know how you would reorganize such groupings > in a concurrency safe way when constraints were added or removed. FWIW, I profiled this example (after cranking it up to 500 target tables just because). AFAICT the primary component of the runtime increase is query startup overhead associated with the increased number of target tables. If the UPDATE were touching more than one tuple then it might get to the point where per-tuple costs dominate, but it's not there in this example. If we tried to do something like what Jeff suggests to improve the per-tuple costs, it could actually make this example slower by adding more startup overhead. regards, tom lane
On Thu, May 22, 2014 at 10:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Janes <jeff.janes@gmail.com> writes: >> On Wed, May 21, 2014 at 7:48 PM, Joe Van Dyk <joe@tanga.com> wrote: >>> I was expecting that the RI update triggers would have a "when (new.key is >>> distinct from old.key)" condition on them, which would mean that the number >>> of referencing tables wouldn't matter. > >> But that condition is checked for each constraint individually, not for all >> constraints simultaneously. A table can be referenced on multiple >> combinations of columns, so just one check may not suffice. I guess the >> triggers could be organized into groups of identical firing criteria and >> then checked only once per group, but that seems like a pretty obscure >> optimization to make. I don't know how you would reorganize such groupings >> in a concurrency safe way when constraints were added or removed. > > FWIW, I profiled this example (after cranking it up to 500 target tables > just because). AFAICT the primary component of the runtime increase is > query startup overhead associated with the increased number of target > tables. I must be missing something, there's only one table being updated? start_time = clock_timestamp(); FOR i IN 1..100000 LOOP UPDATE test_fk SET junk = ' ' WHERE id = i; END LOOP; end_time = clock_timestamp(); Joe > If the UPDATE were touching more than one tuple then it might > get to the point where per-tuple costs dominate, but it's not there in > this example. If we tried to do something like what Jeff suggests to > improve the per-tuple costs, it could actually make this example slower > by adding more startup overhead. > > regards, tom lane
Joe Van Dyk <joe@tanga.com> writes: > On Thu, May 22, 2014 at 10:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> FWIW, I profiled this example (after cranking it up to 500 target tables >> just because). AFAICT the primary component of the runtime increase is >> query startup overhead associated with the increased number of target >> tables. > I must be missing something, there's only one table being updated? Sorry, I wrote that a bit too hastily. The startup cost is actually mainly driven by the number of triggers attached to the UPDATE's target table. Specifically, what I'm seeing in oprofile is samples % symbol name 32694 26.0730 MemoryContextStrdup 22608 18.0296 hash_search_with_hash_value 9979 7.9581 AllocSetAlloc 6919 5.5178 ri_KeysEqual 3910 3.1182 AllocSetCheck 3641 2.9036 AfterTriggerSaveEvent 2852 2.2744 MemoryContextAlloc 2778 2.2154 hash_any 2410 1.9219 ri_FetchConstraintInfo 2236 1.7832 hash_search 1906 1.5200 CopyTriggerDesc 1833 1.4618 hash_uint32 1817 1.4490 ri_NullCheck 1663 1.3262 RI_FKey_pk_upd_check_required 1338 1.0670 sentinel_ok (nothing else above 1%) The MemoryContextStrdup calls are all coming from CopyTriggerDesc, which has to copy 1000 triggers' names in my test case; those calls presumably also account for a good chunk of the AllocSetAlloc cycles. (It's possible we could get rid of the TriggerDesc copy step in InitResultRelInfo, but it seems rather nervous-making to do so, and it wouldn't really be worth doing for examples with more realistic numbers of triggers.) The hash_search calls are coming from a bunch of places, mostly relation lookup and buffer cache lookup. There are some coming from ri_KeysEqual's lookup of which comparison function to use to perform the IS NOT DISTINCT FROM tests, but that doesn't seem to be a large component of that profile item. Anyway, it's hard to see how to improve this much, short of a major redesign to avoid treating independent foreign key constraints as independent triggers. regards, tom lane