Thread: foreign keys and RI triggers

foreign keys and RI triggers

From
Neil Conway
Date:
I spent a little while looking into a performance issue with a large 
UPDATE on a table with foreign keys. A few questions:

(1) When a PK table is updated, we skip firing the per-row UPDATE RI 
triggers if none of the referenced columns in the PK table have been 
modified. However, AFAICS we do not apply a similar optimization for 
updates of foreign key tables: if a user does not modify the foreign key 
column, we needn't check for the presence of the FK column value in the 
primary key table. Is there a reason we don't implement this?

(2) For per-row RI triggers of all kinds, we save the trigger under 
CurTransactionContext and invoke it at the end of the current query. 
There is not even overflow to disk (the report that prompted me to look 
into this was someone's database crashing because they kept running OOM 
when doing an UPDATE of a large table with FKs on a pretty lowend 
machine). While avoiding consuming a lot of memory for queued trigger 
execution is worth doing anyway, ISTM we needn't queue RI triggers in 
the first place. Is there a reason we can't just invoke after-row RI 
triggers immediately?

(Hmm, I suppose we would need to defer firing the trigger until the 
command ID is incremented if the foreign key references its own table. 
But even so, this should not be an issue for non-self-referential 
foreign keys.)

(3) This is minor, but AFAICS RI_FKey_check_upd() is not used -- 
RI_FKey_check_ins() is used to validate both inserts and updates on 
tables with foreign keys (see tablecmds.c circa 4423). Both functions 
are just wrappers over RI_FKey_check() anyway. This is rather confusing; 
would anyone object if I removed both functions and made RI_FKey_check() 
public?

-Neil



Re: foreign keys and RI triggers

From
Stephan Szabo
Date:
On Thu, 26 May 2005, Neil Conway wrote:

> I spent a little while looking into a performance issue with a large
> UPDATE on a table with foreign keys. A few questions:
>
> (1) When a PK table is updated, we skip firing the per-row UPDATE RI
> triggers if none of the referenced columns in the PK table have been
> modified. However, AFAICS we do not apply a similar optimization for
> updates of foreign key tables: if a user does not modify the foreign key
> column, we needn't check for the presence of the FK column value in the
> primary key table. Is there a reason we don't implement this?

Are you sure? RI_FKey_Check seems to have a section on
TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
old row wasn't part of this transaction.  I'm not sure why it's doing the
transaction id check, but it looks like it will do an equals check at
least some of the time.

> (2) For per-row RI triggers of all kinds, we save the trigger under
> CurTransactionContext and invoke it at the end of the current query.
> There is not even overflow to disk (the report that prompted me to look
> into this was someone's database crashing because they kept running OOM
> when doing an UPDATE of a large table with FKs on a pretty lowend
> machine). While avoiding consuming a lot of memory for queued trigger
> execution is worth doing anyway, ISTM we needn't queue RI triggers in
> the first place. Is there a reason we can't just invoke after-row RI
> triggers immediately?

If I'm understanding the question, there's two things.  First is deferred
constraints and the second is that constraints happen after the entire
statement.

In a case like:
insert into pk values(2);
insert into pk values(1);
insert into fk values(2);
update pk set key=key+1;

In no action, that's not an error AFAIK because the constraint is
satisfied at end of statement. If the order of updates happened such that
the key=2 row were updated first we couldn't know whether or not the
constraint would be satisfied by later updates to the same table.


Re: foreign keys and RI triggers

From
Stephan Szabo
Date:
On Thu, 26 May 2005, Stephan Szabo wrote:

> On Thu, 26 May 2005, Neil Conway wrote:
>
> > (2) For per-row RI triggers of all kinds, we save the trigger under
> > CurTransactionContext and invoke it at the end of the current query.
> > There is not even overflow to disk (the report that prompted me to look
> > into this was someone's database crashing because they kept running OOM
> > when doing an UPDATE of a large table with FKs on a pretty lowend
> > machine). While avoiding consuming a lot of memory for queued trigger
> > execution is worth doing anyway, ISTM we needn't queue RI triggers in
> > the first place. Is there a reason we can't just invoke after-row RI
> > triggers immediately?
>
> If I'm understanding the question, there's two things.  First is deferred
> constraints and the second is that constraints happen after the entire
> statement.
>
> In a case like:
> insert into pk values(2);
> insert into pk values(1);
> insert into fk values(2);
> update pk set key=key+1;
>
> In no action, that's not an error AFAIK because the constraint is
> satisfied at end of statement. If the order of updates happened such that
> the key=2 row were updated first we couldn't know whether or not the
> constraint would be satisfied by later updates to the same table.

Although, we could almost run referential actions that way.  The almost
comes from some behavior for set default cases to make sure the default
still exists which I think would still have to happen at end of statement
to be spec complient (I don't have easy access to my copy of SQL99 from
here).  I think we're still a little short of entirely complient on timing
in any case because unless I'm misremembering constraint checks happen
after user defined triggers and referential actions happen before which
would be difficult if not impossible to do right now with the way we fire
triggers.


Re: foreign keys and RI triggers

From
Neil Conway
Date:
Stephan Szabo wrote:
> Are you sure? RI_FKey_Check seems to have a section on
> TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
> old row wasn't part of this transaction.

Well, regardless of how RI_FKey_Check() itself works, ISTM there is no 
need to enqueue the RI trigger in the first place. That's when the 
update-on-PK-table optimization is applied -- see trigger.c circa 3005. 
The specific case I was looking at resulted in the postgres backend 
allocating a few hundred MB just to store all the pending RI triggers, 
even though the UPDATE in question didn't change the foreign key field, 
so it didn't matter a great deal how quickly RI_FKey_Check() was able to 
bail out.

> If I'm understanding the question, there's two things.  First is deferred
> constraints

Right -- obviously we can't fire RI triggers for deferred constraints 
immediately. Immediate constraints are the common case, though.

> constraints happen after the entire statement.
> In a case like:
> insert into pk values(2);
> insert into pk values(1);
> insert into fk values(2);
> update pk set key=key+1;

Hmm, good point. But ISTM there are still some circumstances in which we 
can safely check the RI trigger immediately, rather than at end of 
statement. For example, updating the FK table, inserting into the FK 
table, or deleting from the PK table.

-Neil


Re: foreign keys and RI triggers

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Are you sure? RI_FKey_Check seems to have a section on
> TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
> old row wasn't part of this transaction.  I'm not sure why it's doing the
> transaction id check, but it looks like it will do an equals check at
> least some of the time.

I think the reason for the xact check is that if we have deferred
triggers and we do
begin;insert new FK value;update new row;commit;

then when the trigger for the INSERT event fires, it will do nothing
because the tuple it's triggered on is now dead.  So the trigger for
the UPDATE event had better make the check.  It's possible we could
skip the UPDATE event if we could be certain the INSERT trigger had
already fired, but I'm not sure how to be certain about that.

>> While avoiding consuming a lot of memory for queued trigger
>> execution is worth doing anyway, ISTM we needn't queue RI triggers in
>> the first place. Is there a reason we can't just invoke after-row RI
>> triggers immediately?

> If I'm understanding the question, there's two things.  First is deferred
> constraints and the second is that constraints happen after the entire
> satement.

Right.  RI constraints are actually the only kind we do "right" in
terms of enforcing the check when the SQL spec says we should.

The thoughts I've had about special-casing RI events to save memory
have to do with the idea of lossy storage.  As you accumulate more
per-row events, at some point it becomes more efficient to forget
the individual rows and just reapply the original full-table check
query when it's time to check the constraint.  So if we could recognize
RI events as being associated with the same constraint, and keep track
of how many are pending for each constraint, we could make a decision to
discard the queue and instead register one event to apply a full-table
check.  It's not clear how to do that efficiently though.
        regards, tom lane


Re: foreign keys and RI triggers

From
Stephan Szabo
Date:
On Fri, 27 May 2005, Neil Conway wrote:

> Stephan Szabo wrote:
> > Are you sure? RI_FKey_Check seems to have a section on
> > TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
> > old row wasn't part of this transaction.
>
> Well, regardless of how RI_FKey_Check() itself works, ISTM there is no
> need to enqueue the RI trigger in the first place. That's when the
> update-on-PK-table optimization is applied -- see trigger.c circa 3005.
> The specific case I was looking at resulted in the postgres backend
> allocating a few hundred MB just to store all the pending RI triggers,
> even though the UPDATE in question didn't change the foreign key field,
> so it didn't matter a great deal how quickly RI_FKey_Check() was able to
> bail out.

Okay, I can't think of cases even with triggers and the like where
removing the check on equal valued rows would give appreciably different
results, but I haven't thought too hard about it.

> > If I'm understanding the question, there's two things.  First is deferred
> > constraints
>
> Right -- obviously we can't fire RI triggers for deferred constraints
> immediately. Immediate constraints are the common case, though.
>
> > constraints happen after the entire statement.
> > In a case like:
> > insert into pk values(2);
> > insert into pk values(1);
> > insert into fk values(2);
> > update pk set key=key+1;
>
> Hmm, good point. But ISTM there are still some circumstances in which we
> can safely check the RI trigger immediately, rather than at end of
> statement. For example, updating the FK table, inserting into the FK
> table, or deleting from the PK table.

Unfortunately, I don't think so, if my assumption that user triggers are
supposed to happen before constraint checks is true.  In that case, we
must wait until not only the action but all triggers fired by that action
happen in order to run the constraint check because a trigger could make
an otherwise invalid row valid.


Re: foreign keys and RI triggers

From
Stephan Szabo
Date:
On Thu, 26 May 2005, Stephan Szabo wrote:

> On Fri, 27 May 2005, Neil Conway wrote:
>
> > Stephan Szabo wrote:
> > > Are you sure? RI_FKey_Check seems to have a section on
> > > TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
> > > old row wasn't part of this transaction.
> >
> > Well, regardless of how RI_FKey_Check() itself works, ISTM there is no
> > need to enqueue the RI trigger in the first place. That's when the
> > update-on-PK-table optimization is applied -- see trigger.c circa 3005.
> > The specific case I was looking at resulted in the postgres backend
> > allocating a few hundred MB just to store all the pending RI triggers,
> > even though the UPDATE in question didn't change the foreign key field,
> > so it didn't matter a great deal how quickly RI_FKey_Check() was able to
> > bail out.
>
> Okay, I can't think of cases even with triggers and the like where
> removing the check on equal valued rows would give appreciably different
> results, but I haven't thought too hard about it.

Err, except the case that Tom mentions in his message.



Re: foreign keys and RI triggers

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>> Okay, I can't think of cases even with triggers and the like where
>> removing the check on equal valued rows would give appreciably different
>> results, but I haven't thought too hard about it.

> Err, except the case that Tom mentions in his message.

But the check could incorporate the same transaction ID test already
in use.  I think Neil is right that it'd be a win to apply the test
before enqueueing the trigger instead of after.
        regards, tom lane


Re: foreign keys and RI triggers

From
Stephan Szabo
Date:
On Thu, 26 May 2005, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> >> Okay, I can't think of cases even with triggers and the like where
> >> removing the check on equal valued rows would give appreciably different
> >> results, but I haven't thought too hard about it.
>
> > Err, except the case that Tom mentions in his message.
>
> But the check could incorporate the same transaction ID test already
> in use.  I think Neil is right that it'd be a win to apply the test
> before enqueueing the trigger instead of after.

Good point.  That would help in many cases anyway.



Re: foreign keys and RI triggers

From
Stephan Szabo
Date:
On Thu, 26 May 2005, Tom Lane wrote:

> The thoughts I've had about special-casing RI events to save memory
> have to do with the idea of lossy storage.  As you accumulate more
> per-row events, at some point it becomes more efficient to forget
> the individual rows and just reapply the original full-table check
> query when it's time to check the constraint.  So if we could recognize

One problem with that is that it works for the constraint check but not
for referential actions, although if we instead fired the referential
actions truly immediately rather than queued to statement end that'd
prevent those from being an issue. The only thing there is that we'd have
to also have a constraint check for at least set default.

> RI events as being associated with the same constraint, and keep track
> of how many are pending for each constraint, we could make a decision to
> discard the queue and instead register one event to apply a full-table
> check.  It's not clear how to do that efficiently though.

Yeah, I was thinking we could keep a separate structure for (foreign key
trigger oid, action) where we could keep track of a current count and
whether or not we've consolidated already and scan the queue when we do
the consolidation removing items for that oid.  That's still not very good
though.


Re: foreign keys and RI triggers

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Thu, 26 May 2005, Tom Lane wrote:
>> The thoughts I've had about special-casing RI events to save memory
>> have to do with the idea of lossy storage.

> One problem with that is that it works for the constraint check but not
> for referential actions,

True.  But even fixing it only for constraint checks would be a win.

Sooner or later, we will have to implement spill-to-disk for the trigger
event queue anyway ...
        regards, tom lane


Re: foreign keys and RI triggers

From
Stephan Szabo
Date:
On Thu, 26 May 2005, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > On Thu, 26 May 2005, Tom Lane wrote:
> >> The thoughts I've had about special-casing RI events to save memory
> >> have to do with the idea of lossy storage.
>
> > One problem with that is that it works for the constraint check but not
> > for referential actions,
>
> True.  But even fixing it only for constraint checks would be a win.

Yeah, I'm just wondering if going the extra step and forcing really really
immediate referential actions (even if that sometimes means adding a no
action trigger on the event as well) would be worth doing as that could
remove the queued pk actions for cascade and set null as well and at least
turn the queued pk action for set default into one that could be
consolidated.



Re: foreign keys and RI triggers

From
Neil Conway
Date:
Tom Lane wrote:
> But the check could incorporate the same transaction ID test already
> in use.  I think Neil is right that it'd be a win to apply the test
> before enqueueing the trigger instead of after.

Speaking of which, does anyone see a reason why RI_FKey_keyequal_upd() 
is implemented as a pseudo-trigger function -- e.g. taking a pointer to 
a TriggerData? It seems a kludge.

-Neil


Re: foreign keys and RI triggers

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Speaking of which, does anyone see a reason why RI_FKey_keyequal_upd() 
> is implemented as a pseudo-trigger function -- e.g. taking a pointer to 
> a TriggerData?

Because (a) it needs all the same arguments and (b) it can share
infrastructure with the other RI triggers.
        regards, tom lane


Re: foreign keys and RI triggers

From
Neil Conway
Date:
Tom Lane wrote:
> Because (a) it needs all the same arguments

Well, it needs the Trigger that we're in the process of queueing, the 
old tuple, the new tuple, and the updated relation. It doesn't need the 
rest of the content of TriggerData. trigger.c has to manually construct 
a TriggerData to pass to it, so it's not like it's a notational convenience.

> (b) it can share infrastructure with the other RI triggers.

Such as? I don't see anything it allows us to share.

-Neil


Re: foreign keys and RI triggers

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
>> (b) it can share infrastructure with the other RI triggers.

> Such as? I don't see anything it allows us to share.

Note the ri_BuildQueryKeyFull call, and the arguments thereto.

As to the notational convenience issue, I think it's good that
RI_FKey_keyequal_upd looks almost exactly like the other RI trigger
functions.  Having it look different in order to save a couple lines
in trigger.c doesn't seem like a win to me.
        regards, tom lane