Re: Optimising Foreign Key checks - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Optimising Foreign Key checks
Date
Msg-id CA+U5nMKLOauuC8jXzQUH=oHgyAGEU76s7j4NoBLhv0nK5AwOZA@mail.gmail.com
Whole thread Raw
In response to Re: Optimising Foreign Key checks  (Noah Misch <noah@leadboat.com>)
Responses Re: Optimising Foreign Key checks
Re: Optimising Foreign Key checks
List pgsql-hackers
On 9 June 2013 02:12, Noah Misch <noah@leadboat.com> wrote:
> On Sat, Jun 08, 2013 at 08:20:42PM -0400, Robert Haas wrote:
>> On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch <noah@leadboat.com> wrote:
>> > Likewise; I don't see why we couldn't perform an optimistic check ASAP and
>> > schedule a final after-statement check when an early check fails.  That
>> > changes performance characteristics without changing semantics.
>>
>> ...this seems like it might have some promise; but what if the action
>> we're performing isn't idempotent?  And how do we know?
>
> The action discussed so far is RI_FKey_check_ins().  It acquires a KEY SHARE
> lock (idempotent by nature) on a row that it finds using B-tree equality
> (presumed IMMUTABLE, thus idempotent).  RI_FKey_check_upd() is nearly the same
> action, so the same argument holds.  Before treating any other operation in
> the same way, one would need to conduct similar analysis.

As long as we are talking about FKs only, then this approach can work.
All we are doing is applying the locks slightly earlier than before.
Once locked they will prevent any later violations, so we are safe
from anybody except *ourselves* from making changes that would
invalidate the earlier check.  Trouble is, there are various ways I
can see that as possible, so making a check early doesn't allow you to
avoid making the check later as well.

AFAICS there are weird cases where changing the way FKs execute will
change the way complex trigger applications will execute. I don't see
a way to avoid that other than "do nothing". Currently, we execute the
checks following the normal order of execution rules for triggers.
Every idea we've had so far changes that in some way; variously in
major or minor ways, but changed nonetheless.

Even the approach of deferring checks to allow them to be applied in a
batch mean we might change the way applications execute in detail.
However, since the only possible change there would be to decrease the
number of self-induced failures that seems OK.

So the question is how much change do we want to introduce? I'll guess
"not much", rather than "lots" or "none".

Proposal: Have a WHEN clause that accumulates values to be checked in
a hash table up to work_mem in size, allowing us to eliminate the most
common duplicates (just not *all* duplicates). If the value isn't a
duplicate (or at least the first seen tuple with that value), we will
queue up a check for later. That approach gives us *exactly* what we
have now and works with the two common cases: i) few, mostly
duplicated values, ii) many values, but clustered together. Then apply
changes in batches at end of statement.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [PATCH] add --throttle to pgbench (submission 3)
Next
From: Dean Rasheed
Date:
Subject: WITH CHECK OPTION for auto-updatable views