Thread: Do foreign key triggers get ran even if the key's value doesn't change?

Do foreign key triggers get ran even if the key's value doesn't change?

From
Joe Van Dyk
Date:
I came across http://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/
which seems to indicate so.

When I run the following test script, having 50 foreign keys takes
about twice as long to do the update. Is there a reason for that?
Seems like the RI triggers wouldn't have to run on updates if the
value doesn't change.

begin;

set client_min_messages='warning';

CREATE OR REPLACE FUNCTION fnc_check_fk_overhead(key_count INT)
RETURNS INTERVAL AS
$$
DECLARE
  i INT;
  start_time TIMESTAMP;
  end_time TIMESTAMP;
BEGIN
  DROP TABLE if exists test_fk CASCADE;

  CREATE TABLE test_fk
  (
    id   BIGINT PRIMARY KEY,
    junk VARCHAR
  );

  INSERT INTO test_fk
  SELECT generate_series(1, 100000), repeat(' ', 20);

  CLUSTER test_fk_pkey ON test_fk;

  FOR i IN 1..key_count LOOP
    EXECUTE 'CREATE TABLE test_fk_ref_' || i ||
            ' (test_fk_id BIGINT REFERENCES test_fk (id) ON UPDATE NO ACTION)';
  END LOOP;

  start_time = clock_timestamp();

  FOR i IN 1..100000 LOOP
    UPDATE test_fk SET junk = '                    '
     WHERE id = i;
  END LOOP;

  end_time = clock_timestamp();

  FOR i IN 1..key_count LOOP
    EXECUTE 'DROP TABLE test_fk_ref_' || i;
  END LOOP;

  RETURN end_time - start_time;

END;
$$ LANGUAGE plpgsql VOLATILE;

SELECT fnc_check_fk_overhead(1);
SELECT fnc_check_fk_overhead(50);



On Wed, May 21, 2014 at 1:11 PM, Joe Van Dyk <joe@tanga.com> wrote:
I came across http://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/
which seems to indicate so.

When I run the following test script, having 50 foreign keys takes
about twice as long to do the update. Is there a reason for that?
Seems like the RI triggers wouldn't have to run on updates if the
value doesn't change.

That's kind of a question of definitions.  Perhaps the trigger itself doesn't need to run, but the code that decides whether the trigger needs to run does need to run.  Where do you draw the line around what is the trigger proper and what is just infrastructure?  

However you wish to define it, change your function so that it actually does change the key field, and see how much slower that is than the behavior where you update the row without updating the key.  
 
Cheers,

Jeff
On Wednesday, May 21, 2014, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, May 21, 2014 at 1:11 PM, Joe Van Dyk <joe@tanga.com> wrote:
I came across http://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/
which seems to indicate so.

When I run the following test script, having 50 foreign keys takes
about twice as long to do the update. Is there a reason for that?
Seems like the RI triggers wouldn't have to run on updates if the
value doesn't change.

That's kind of a question of definitions.  Perhaps the trigger itself doesn't need to run, but the code that decides whether the trigger needs to run does need to run.  Where do you draw the line around what is the trigger proper and what is just infrastructure?  

However you wish to define it, change your function so that it actually does change the key field, and see how much slower that is than the behavior where you update the row without updating the key.  
 

I was expecting that the RI update triggers would have a "when (new.key is distinct from old.key)" condition on them, which would mean that the number of referencing tables wouldn't matter. 

 
Cheers,

Jeff
On Wed, May 21, 2014 at 7:48 PM, Joe Van Dyk <joe@tanga.com> wrote:
On Wednesday, May 21, 2014, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, May 21, 2014 at 1:11 PM, Joe Van Dyk <joe@tanga.com> wrote:
I came across http://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/
which seems to indicate so.

When I run the following test script, having 50 foreign keys takes
about twice as long to do the update. Is there a reason for that?
Seems like the RI triggers wouldn't have to run on updates if the
value doesn't change.

That's kind of a question of definitions.  Perhaps the trigger itself doesn't need to run, but the code that decides whether the trigger needs to run does need to run.  Where do you draw the line around what is the trigger proper and what is just infrastructure?  

However you wish to define it, change your function so that it actually does change the key field, and see how much slower that is than the behavior where you update the row without updating the key.  
 

I was expecting that the RI update triggers would have a "when (new.key is distinct from old.key)" condition on them, which would mean that the number of referencing tables wouldn't matter. 

But that condition is checked for each constraint individually, not for all constraints simultaneously.  A table can be referenced on multiple combinations of columns, so just one check may not suffice.  I guess the triggers could be organized into groups of identical firing criteria and then checked only once per group, but that seems like a pretty obscure optimization to make.  I don't know how you would reorganize such groupings in a concurrency safe way when constraints were added or removed.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Wed, May 21, 2014 at 7:48 PM, Joe Van Dyk <joe@tanga.com> wrote:
>> I was expecting that the RI update triggers would have a "when (new.key is
>> distinct from old.key)" condition on them, which would mean that the number
>> of referencing tables wouldn't matter.

> But that condition is checked for each constraint individually, not for all
> constraints simultaneously.  A table can be referenced on multiple
> combinations of columns, so just one check may not suffice.  I guess the
> triggers could be organized into groups of identical firing criteria and
> then checked only once per group, but that seems like a pretty obscure
> optimization to make.  I don't know how you would reorganize such groupings
> in a concurrency safe way when constraints were added or removed.

FWIW, I profiled this example (after cranking it up to 500 target tables
just because).  AFAICT the primary component of the runtime increase is
query startup overhead associated with the increased number of target
tables.  If the UPDATE were touching more than one tuple then it might
get to the point where per-tuple costs dominate, but it's not there in
this example.  If we tried to do something like what Jeff suggests to
improve the per-tuple costs, it could actually make this example slower
by adding more startup overhead.

            regards, tom lane


On Thu, May 22, 2014 at 10:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Wed, May 21, 2014 at 7:48 PM, Joe Van Dyk <joe@tanga.com> wrote:
>>> I was expecting that the RI update triggers would have a "when (new.key is
>>> distinct from old.key)" condition on them, which would mean that the number
>>> of referencing tables wouldn't matter.
>
>> But that condition is checked for each constraint individually, not for all
>> constraints simultaneously.  A table can be referenced on multiple
>> combinations of columns, so just one check may not suffice.  I guess the
>> triggers could be organized into groups of identical firing criteria and
>> then checked only once per group, but that seems like a pretty obscure
>> optimization to make.  I don't know how you would reorganize such groupings
>> in a concurrency safe way when constraints were added or removed.
>
> FWIW, I profiled this example (after cranking it up to 500 target tables
> just because).  AFAICT the primary component of the runtime increase is
> query startup overhead associated with the increased number of target
> tables.

I must be missing something, there's only one table being updated?

  start_time = clock_timestamp();

  FOR i IN 1..100000 LOOP
    UPDATE test_fk SET junk = '                    '
     WHERE id = i;
  END LOOP;

  end_time = clock_timestamp();

Joe


> If the UPDATE were touching more than one tuple then it might
> get to the point where per-tuple costs dominate, but it's not there in
> this example.  If we tried to do something like what Jeff suggests to
> improve the per-tuple costs, it could actually make this example slower
> by adding more startup overhead.
>
>                         regards, tom lane


Joe Van Dyk <joe@tanga.com> writes:
> On Thu, May 22, 2014 at 10:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I profiled this example (after cranking it up to 500 target tables
>> just because).  AFAICT the primary component of the runtime increase is
>> query startup overhead associated with the increased number of target
>> tables.

> I must be missing something, there's only one table being updated?

Sorry, I wrote that a bit too hastily.  The startup cost is actually
mainly driven by the number of triggers attached to the UPDATE's target
table.  Specifically, what I'm seeing in oprofile is

samples  %        symbol name
32694    26.0730  MemoryContextStrdup
22608    18.0296  hash_search_with_hash_value
9979      7.9581  AllocSetAlloc
6919      5.5178  ri_KeysEqual
3910      3.1182  AllocSetCheck
3641      2.9036  AfterTriggerSaveEvent
2852      2.2744  MemoryContextAlloc
2778      2.2154  hash_any
2410      1.9219  ri_FetchConstraintInfo
2236      1.7832  hash_search
1906      1.5200  CopyTriggerDesc
1833      1.4618  hash_uint32
1817      1.4490  ri_NullCheck
1663      1.3262  RI_FKey_pk_upd_check_required
1338      1.0670  sentinel_ok
(nothing else above 1%)

The MemoryContextStrdup calls are all coming from CopyTriggerDesc, which
has to copy 1000 triggers' names in my test case; those calls presumably
also account for a good chunk of the AllocSetAlloc cycles.  (It's possible
we could get rid of the TriggerDesc copy step in InitResultRelInfo, but it
seems rather nervous-making to do so, and it wouldn't really be worth
doing for examples with more realistic numbers of triggers.)

The hash_search calls are coming from a bunch of places, mostly relation
lookup and buffer cache lookup.  There are some coming from ri_KeysEqual's
lookup of which comparison function to use to perform the IS NOT DISTINCT
FROM tests, but that doesn't seem to be a large component of that profile
item.

Anyway, it's hard to see how to improve this much, short of a major
redesign to avoid treating independent foreign key constraints as
independent triggers.

            regards, tom lane