Thread: Tweaking Foreign Keys for larger tables

Tweaking Foreign Keys for larger tables

From

Simon Riggs

Date:

31 October 2014, 10:19:53

Various ways of tweaking Foreign Keys are suggested that are helpful
for larger databases.

* Deferrable Enforcement Timing Clause

* NOT DEFERRABLE - immediate execution
* DEFERRABLE
    * INITIALLY IMMEDIATE - existing
    * INITIALLY DEFERRED - existing
    * INITIALLY NOT ENFORCED
      FK created, but is not enforced during DML.
      Will be/Must be marked NOT VALID when first created.
      We can run a VALIDATE on the constraint at any time; if it passes the
      check it is marked VALID and presumed to stay that way until the next
      VALIDATE run. If it fails that check, the FK is marked NOT VALID, so
      it is no longer useful for optimization.
      This allows FKs to be checked in bulk, rather than during the
      front-end code path, while still being available for optimization and
      documentation (or visible to tools, etc.).

There is no corresponding SET CONSTRAINTS call for the NOT ENFORCED
case, since that would require us to mark the constraint as not valid.
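A minimal Python sketch of the INITIALLY NOT ENFORCED lifecycle described above. It assumes a toy model in which the referenced table is a set of keys and the referencing table is a list of key values. `NotEnforcedFK` and its methods are hypothetical illustrations, not PostgreSQL code. As proposed, a bad insert does not clear the VALID flag; only the next VALIDATE run does.

```python
from dataclasses import dataclass, field


@dataclass
class NotEnforcedFK:
    """Hypothetical toy model of an INITIALLY NOT ENFORCED foreign key."""

    parent_keys: set = field(default_factory=set)   # keys in the referenced table
    child_refs: list = field(default_factory=list)  # FK values in the referencing table
    valid: bool = False  # must start NOT VALID when first created

    def insert_child(self, ref):
        # Not enforced during DML: the row is accepted even with no matching
        # parent, and the VALID flag is left untouched.
        self.child_refs.append(ref)

    def validate(self):
        # Bulk check: every non-NULL reference must match a referenced key.
        self.valid = all(r is None or r in self.parent_keys for r in self.child_refs)
        return self.valid

    def usable_for_optimization(self):
        # Only a constraint currently marked VALID may be relied upon.
        return self.valid
```

For example, suppose `validate()` has succeeded and then `insert_child(99)` adds a row with no parent 99. `usable_for_optimization()` still returns `True` until `validate()` runs again.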

* Referenced Table actions

ON DELETE IGNORE
ON UPDATE IGNORE
If we allow this specification, then the FK is "one way": we check the
existence of a row in the referenced table, but there is no need for a
trigger on the referenced table to enforce an action on delete or
update, so there is no need to lock the referenced table when adding FKs.
This is very useful for very highly referenced tables, or for larger
tables where we aren't planning to delete or update rows in the
referenced table without also deleting or updating the referencing
table.
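A toy Python model of the "one way" behaviour, using the same simplified tables. `OneWayFK` is a hypothetical name. The referencing side is still checked on insert. Deleting from the referenced table fires nothing, so skipping that trigger means orphans can appear.

```python
class OneWayFK:
    """Hypothetical model of an FK declared ON DELETE IGNORE ON UPDATE IGNORE."""

    def __init__(self, parent_keys):
        self.parent_keys = set(parent_keys)
        self.child_refs = []

    def insert_child(self, ref):
        # The referencing side is still checked: the parent row must exist.
        if ref is not None and ref not in self.parent_keys:
            raise ValueError(f"key {ref!r} not present in referenced table")
        self.child_refs.append(ref)

    def delete_parent(self, key):
        # No trigger on the referenced table: nothing is checked or cascaded,
        # so existing child rows pointing at `key` are left as orphans.
        self.parent_keys.discard(key)

    def orphans(self):
        return [r for r in self.child_refs if r is not None and r not in self.parent_keys]
```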

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Tweaking Foreign Keys for larger tables

From

Peter Eisentraut

Date:

05 November 2014, 21:15:48

On 10/31/14 6:19 AM, Simon Riggs wrote:
> Various ways of tweaking Foreign Keys are suggested that are helpful
> for larger databases.

> *    INITIALLY NOT ENFORCED
> FK created, but is not enforced during DML.
> Will be/Must be marked NOT VALID when first created.
> We can run a VALIDATE on the constraint at any time; if it passes the
> check it is marked VALID and presumed to stay that way until the next
> VALIDATE run.

Does that mean the FK would become invalid after every DML operation,
until you explicitly revalidate it?  Is that practical?

> ON DELETE IGNORE
> ON UPDATE IGNORE
> If we allow this specification then the FK is "one way" - we check the
> existence of a row in the referenced table, but there is no need for a
> trigger on the referenced table to enforce an action on delete or
> update, so no need to lock the referenced table when adding FKs.

Are you worried about locking the table at all, or about having to lock
many rows?

Re: Tweaking Foreign Keys for larger tables

From

David G Johnston

Date:

06 November 2014, 08:11:31

Peter Eisentraut wrote:
> On 10/31/14 6:19 AM, Simon Riggs wrote:
>> Various ways of tweaking Foreign Keys are suggested that are helpful
>> for larger databases.
> 
>> *    INITIALLY NOT ENFORCED
>> FK created, but is not enforced during DML.
>> Will be/Must be marked NOT VALID when first created.
>> We can run a VALIDATE on the constraint at any time; if it passes the
>> check it is marked VALID and presumed to stay that way until the next
>> VALIDATE run.
> 
> Does that mean the FK would become invalid after every DML operation,
> until you explicitly revalidate it?  Is that practical?

My read is that it means that you can insert invalid data but the system
will pretend it is valid unless someone asks it for confirmation.  If
validation then finds a discrepancy, the FK becomes invalid until the
discrepancy is fixed and another validation is performed.

>> ON DELETE IGNORE
>> ON UPDATE IGNORE
>> If we allow this specification then the FK is "one way" - we check the
>> existence of a row in the referenced table, but there is no need for a
>> trigger on the referenced table to enforce an action on delete or
>> update, so no need to lock the referenced table when adding FKs.
> 
> Are you worried about locking the table at all, or about having to lock
> many rows?

Wouldn't you at least need some kind of trigger to make the constraint
invalid as soon as any record is updated or removed from the referenced
table since in all likelihood the FK relationship has just been broken?

How expensive is validation going to be?  Especially, can validation occur
incrementally or does every record need to be validated each time?

Is this useful for master-detail setups, record-category, or both (others?)?

Will optimizations over invalid data give incorrect answers and in what
specific scenarios can that be expected?

I get the idea of having a system that lets you skip constant data
validation, since in all likelihood, once in production, some scenarios
would be extremely resistant to the introduction of errors, and errors can
be dealt with on the fly.  Trust only, since verification is expensive, but
keep the option open and the model faithfully represented.

I don't know that I would ever think to use this in my world, since the
additional admin effort is obvious but the cost of the thing I'd be avoiding
is vague.  As it is now, someone could simply drop their FK constraints and
periodically run a validation query to see whether the data being inserted
is correct.  That doesn't allow optimizations to take place, though, so
this proposal is an improvement; but the documentation and support aspects
of a keep/drop decision could be fleshed out first, since that would be
valuable in its own right.  Then we can go about figuring out how to make a
hybrid implementation work.
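The periodic validation query David mentions is usually an anti-join. The sketch below uses Python's built-in `sqlite3` module so that it is self-contained and runnable. The table names `parent` and `child` are hypothetical, and the same `LEFT JOIN ... IS NULL` query should also run unchanged in PostgreSQL. As written, it rechecks every referencing row on each run, which is the cost David raises when asking whether validation can be incremental.

```python
import sqlite3


def find_orphans(conn):
    """Return referencing rows whose parent_id has no match (anti-join)."""
    sql = """
        SELECT c.id, c.parent_id
        FROM child AS c
        LEFT JOIN parent AS p ON p.id = c.parent_id
        WHERE c.parent_id IS NOT NULL AND p.id IS NULL
        ORDER BY c.id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    -- Deliberately no REFERENCES clause: the FK has been dropped.
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO parent (id) VALUES (1), (2);
    INSERT INTO child (id, parent_id) VALUES (10, 1), (11, 3), (12, NULL);
""")
```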

Put another way: at what point does the cost of the FK constraint outweigh
the optimization savings?  While size is an obvious factor, both schema and
read/write patterns likely have a significant influence.

David J.


Re: Tweaking Foreign Keys for larger tables

From

Simon Riggs

Date:

06 November 2014, 08:50:05

On 5 November 2014 21:15, Peter Eisentraut <peter_e@gmx.net> wrote:

>> ON DELETE IGNORE
>> ON UPDATE IGNORE
>> If we allow this specification then the FK is "one way" - we check the
>> existence of a row in the referenced table, but there is no need for a
>> trigger on the referenced table to enforce an action on delete or
>> update, so no need to lock the referenced table when adding FKs.
>
> Are you worried about locking the table at all, or about having to lock
> many rows?

This is useful for smaller, highly referenced tables that don't change
much, if ever.

In that case the need for correctness through locking is minimal. If we
do lock, it will cause very high multixact traffic, which alone is worth
avoiding.

The main issue is referencing a table many times. Getting a full table
lock can halt all FK checks, so skipping adding the trigger altogether
avoids freezing up everything just for a trigger that doesn't actually
do much.


Re: Tweaking Foreign Keys for larger tables

From

Simon Riggs

Date:

06 November 2014, 08:58:44

On 5 November 2014 21:15, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 10/31/14 6:19 AM, Simon Riggs wrote:
>> Various ways of tweaking Foreign Keys are suggested that are helpful
>> for larger databases.
>
>> *    INITIALLY NOT ENFORCED
>> FK created, but is not enforced during DML.
>> Will be/Must be marked NOT VALID when first created.
>> We can run a VALIDATE on the constraint at any time; if it passes the
>> check it is marked VALID and presumed to stay that way until the next
>> VALIDATE run.
>
> Does that mean the FK would become invalid after every DML operation,
> until you explicitly revalidate it?  Is that practical?

I think so.

We store the validity on the relcache entry.

The constraint would add a statement-level AFTER trigger for INSERT,
UPDATE, DELETE and TRUNCATE, which issues a relcache invalidation if
the state was marked valid. The trigger would be DEFERRABLE INITIALLY DEFERRED.
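A rough Python sketch of the invalidation trigger outlined above. It assumes a hypothetical `RelcacheEntry` that holds one validity flag and counts invalidation messages. It treats the fourth event as TRUNCATE, the only statement-level trigger event PostgreSQL supports besides INSERT, UPDATE and DELETE. The deferred firing at commit is left out. The sketch illustrates one point: only the first write after a successful VALIDATE pays for an invalidation.

```python
from dataclasses import dataclass

# Statement-level trigger events (assumption: "trigger" in the text means TRUNCATE).
EVENTS = {"INSERT", "UPDATE", "DELETE", "TRUNCATE"}


@dataclass
class RelcacheEntry:
    """Hypothetical stand-in for the cached validity of one FK."""

    fk_valid: bool = True
    invalidations_sent: int = 0


def after_statement_trigger(entry: RelcacheEntry, event: str) -> None:
    """Hypothetical body of the statement-level AFTER trigger."""
    if event not in EVENTS:
        raise ValueError(f"unexpected event {event!r}")
    # Fired once per statement, not per row. Later statements find the flag
    # already cleared and do no further work.
    if entry.fk_valid:
        entry.fk_valid = False
        entry.invalidations_sent += 1
```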


Re: Tweaking Foreign Keys for larger tables

From

Jim Nasby

Date:

06 November 2014, 17:28:25

On 11/6/14, 2:58 AM, Simon Riggs wrote:
> On 5 November 2014 21:15, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On 10/31/14 6:19 AM, Simon Riggs wrote:
>>> Various ways of tweaking Foreign Keys are suggested that are helpful
>>> for larger databases.
>>
>>> *    INITIALLY NOT ENFORCED
>>> FK created, but is not enforced during DML.
>>> Will be/Must be marked NOT VALID when first created.
>>> We can run a VALIDATE on the constraint at any time; if it passes the
>>> check it is marked VALID and presumed to stay that way until the next
>>> VALIDATE run.
>>
>> Does that mean the FK would become invalid after every DML operation,
>> until you explicitly revalidate it?  Is that practical?
>
> I think so.
>
> We store the validity on the relcache entry.
>
> Constraint would add a statement-level after trigger for insert,
> update, delete and trigger, which issues a relcache invalidation if
> the state was marked valid. Marked as deferrable initially deferred.

I don't think you'd need to invalidate on insert, or on an update that didn't touch a referenced key.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Tweaking Foreign Keys for larger tables

From

David G Johnston

Date:

06 November 2014, 17:49:42

On Thu, Nov 6, 2014 at 10:29 AM, Jim Nasby wrote:

> On 11/6/14, 2:58 AM, Simon Riggs wrote:
> > On 5 November 2014 21:15, Peter Eisentraut <peter_e@gmx.net> wrote:
> >> On 10/31/14 6:19 AM, Simon Riggs wrote:
> >>> Various ways of tweaking Foreign Keys are suggested that are helpful
> >>> for larger databases.
> >>
> >>> * INITIALLY NOT ENFORCED
> >>> FK created, but is not enforced during DML.
> >>> Will be/Must be marked NOT VALID when first created.
> >>> We can run a VALIDATE on the constraint at any time; if it passes the
> >>> check it is marked VALID and presumed to stay that way until the next
> >>> VALIDATE run.
> >>
> >> Does that mean the FK would become invalid after every DML operation,
> >> until you explicitly revalidate it? Is that practical?
> >
> > I think so.
> >
> > We store the validity on the relcache entry.
> >
> > Constraint would add a statement-level after trigger for insert,
> > update, delete and trigger, which issues a relcache invalidation if
> > the state was marked valid. Marked as deferrable initially deferred.
> I don't think you'd need to invalidate on insert,

Why? Since the FK is not enforced, there is no guarantee that what you just
inserted is valid.

> or on an update that didn't touch a referenced key.

OK, but you would still need the trigger on the FK columns.

DELETE is OK as well, since you cannot invalidate the constraint by simply
removing the referencing row.

David J.


Re: Tweaking Foreign Keys for larger tables

From

Alvaro Herrera

Date:

06 November 2014, 20:55:58

Simon Riggs wrote:
> On 5 November 2014 21:15, Peter Eisentraut <peter_e@gmx.net> wrote:
> 
> >> ON DELETE IGNORE
> >> ON UPDATE IGNORE
> >> If we allow this specification then the FK is "one way" - we check the
> >> existence of a row in the referenced table, but there is no need for a
> >> trigger on the referenced table to enforce an action on delete or
> >> update, so no need to lock the referenced table when adding FKs.
> >
> > Are you worried about locking the table at all, or about having to lock
> > many rows?
> 
> This is useful for smaller, highly referenced tables that don't change
> much, if ever.
> 
> In that case the need for correctness thru locking is minimal. If we
> do lock it will cause very high multixact traffic, so that is worth
> avoiding alone.

This seems like a can of worms to me.  How about the ability to mark a
table READ ONLY, so that insert/update/delete operations on it raise an
error?  For such tables, you can just assume that tuples never go away,
which can help optimize some ri_triggers.c queries by doing plain
SELECT, not SELECT FOR KEY SHARE.

If you later need to add rows to the table, you set it READ WRITE, and
then ri_triggers.c automatically starts using FOR KEY SHARE; add/modify
to your liking, then set READ ONLY again.  So you incur the cost of
tuple locking only while you have the table open for writes.

This way we don't get into the mess of reasoning about foreign keys that
might be violated some of the time.
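A hypothetical Python helper showing the switch described above: the referenced-key lookup drops its row lock when the referenced table is marked READ ONLY. The query text is a simplified stand-in loosely modelled on the checks in ri_triggers.c, not the exact string PostgreSQL generates. `ri_check_query` is an illustrative name.

```python
def ri_check_query(pk_table: str, pk_col: str, read_only: bool) -> str:
    """Hypothetical: build the existence check run for a new referencing row."""
    query = f"SELECT 1 FROM ONLY {pk_table} x WHERE {pk_col} = $1"
    if read_only:
        # READ ONLY referenced table: tuples cannot go away, so a plain
        # SELECT suffices and no tuple lock (or multixact) is created.
        return query
    # READ WRITE referenced table: lock the key so it cannot be deleted or
    # have its key changed before the referencing transaction commits.
    return query + " FOR KEY SHARE OF x"
```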

There's a side effect of tables being READ ONLY which is that tuple
freezing can be optimized as well.  I vaguely recall we have discussed
this.  It's something like SET READ ONLY, then freeze it, which sets its
relfrozenxid to 0 or maybe FrozenXid; vacuum knows it can ignore the
table for freezing purposes.  When SET READ WRITE, relfrozenxid jumps to
RecentXmin.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Tweaking Foreign Keys for larger tables

From

Kevin Grittner

Date:

06 November 2014, 21:17:34

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Simon Riggs wrote:
>> On 5 November 2014 21:15, Peter Eisentraut <peter_e@gmx.net> wrote:
>>
>>>> ON DELETE IGNORE
>>>> ON UPDATE IGNORE
>>>> If we allow this specification then the FK is "one way" - we check the
>>>> existence of a row in the referenced table, but there is no need for a
>>>> trigger on the referenced table to enforce an action on delete or
>>>> update, so no need to lock the referenced table when adding FKs.
>>>
>>> Are you worried about locking the table at all, or about having to lock
>>> many rows?
>>
>> This is useful for smaller, highly referenced tables that don't change
>> much, if ever.
>>
>> In that case the need for correctness thru locking is minimal. If we
>> do lock it will cause very high multixact traffic, so that is worth
>> avoiding alone.
>
> This seems like a can of worms to me.  How about the ability to mark a
> table READ ONLY, so that insert/update/delete operations on it raise an
> error?  For such tables, you can just assume that tuples never go away,
> which can help optimize some ri_triggers.c queries by doing plain
> SELECT, not SELECT FOR KEY SHARE.
>
> If you later need to add rows to the table, you set it READ WRITE, and
> then ri_triggers.c automatically start using FOR KEY SHARE; add/modify
> to your liking, then set READ ONLY again.  So you incur the cost of
> tuple locking only while you have the table open for writes.
>
> This way we don't get into the mess of reasoning about foreign keys that
> might be violated some of the time.

On its face, that sounds more promising to me.

> There's a side effect of tables being READ ONLY which is that tuple
> freezing can be optimized as well.  I vaguely recall we have discussed
> this.  It's something like SET READ ONLY, then freeze it, which sets its
> relfrozenxid to 0 or maybe FrozenXid; vacuum knows it can ignore the
> table for freezing purposes.  When SET READ WRITE, relfrozenxid jumps to
> RecentXmin.

It could also allow a (potentially large) optimization to 
serializable transactions -- there is no need to take any predicate 
locks on a table or its indexes if it is read only.  To safely 
transition a table from read only to read write you would need at 
least two flags (similar in some ways to indisvalid and indisready) 
-- one to say whether any of these read only optimizations are 
allowed, and another flag that would only be set after all 
transactions which might have seen the read only state have 
completed which actually allows writes.  Or that could be done with 
a "char" column with three states.  So on transition to read only 
you would flag it as non-writable, and after all transactions which 
might have seen it in a writable state complete you flag it as 
allowing read only optimizations.  To transition to read write you 
disable the optimizations first and wait before actually flagging 
it as read write.
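The transition protocol above can be modelled as a small state machine. The Python sketch assumes transactions are identified by integer xids and that the caller supplies the set of xids still running. `RelMode` and `ReadOnlyFlag` are hypothetical names. The intermediate `NO_WRITES` state stands in for the "non-writable, optimizations not yet allowed" phase.

```python
from enum import Enum


class RelMode(Enum):
    READ_WRITE = "w"  # writes allowed, read-only optimizations off
    NO_WRITES = "n"   # transitional: writes refused, optimizations still off
    READ_ONLY = "r"   # writes refused, optimizations (e.g. no predicate locks) on


class ReadOnlyFlag:
    """Hypothetical model of a three-state "char" column on a table."""

    def __init__(self):
        self.mode = RelMode.READ_WRITE
        self.target = RelMode.READ_WRITE
        self.wait_for = frozenset()

    def request(self, target, running_xids):
        if target is RelMode.NO_WRITES:
            raise ValueError("NO_WRITES is only a transitional state")
        if target is self.mode:
            return
        # Either direction first passes through NO_WRITES: going read only
        # stops writes, going read write first turns the optimizations off.
        self.mode = RelMode.NO_WRITES
        self.target = target
        # Every transaction running now might have seen the old state.
        self.wait_for = frozenset(running_xids)

    def advance(self, running_xids):
        # Finish the transition once all of those transactions have completed.
        if self.mode is RelMode.NO_WRITES and not self.wait_for & set(running_xids):
            self.mode = self.target
        return self.mode

    def writes_allowed(self):
        return self.mode is RelMode.READ_WRITE

    def optimizations_allowed(self):
        return self.mode is RelMode.READ_ONLY
```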

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Tweaking Foreign Keys for larger tables

From

Jim Nasby

Date:

06 November 2014, 22:08:42

On 11/6/14, 11:49 AM, David G Johnston wrote:
>     > Constraint would add a statement-level after trigger for insert,
>     > update, delete and trigger, which issues a relcache invalidation if
>     > the state was marked valid. Marked as deferrable initially deferred.
>     I don't think you'd need to invalidate on insert,
>
>
> Why?  Since the FK is not enforced there is no guarantee that what you just inserted is valid

I'm talking about the referenced (aka 'parent') table, not the referring table.
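Taken together, Jim's point about the referenced table and David's earlier replies about the referencing table give a small decision table. It shows which statements could turn a VALID, unenforced FK invalid. A hypothetical Python rendering follows. It assumes UPDATEs are classified only by whether they touch the key columns.

```python
def may_break_fk(table, op, touches_key_cols=True):
    """Hypothetical: could this statement invalidate a VALID, unenforced FK?

    table is "referenced" (parent) or "referencing" (child);
    op is "INSERT", "UPDATE" or "DELETE".
    """
    if op not in {"INSERT", "UPDATE", "DELETE"}:
        raise ValueError(f"unsupported operation {op!r}")
    key_update = op == "UPDATE" and touches_key_cols
    if table == "referenced":
        # New parent rows cannot orphan anything; removing or changing keys can.
        return op == "DELETE" or key_update
    if table == "referencing":
        # Removing a child row cannot create an orphan, but an unenforced
        # INSERT or an UPDATE of the FK columns can.
        return op == "INSERT" or key_update
    raise ValueError(f"unknown table role {table!r}")
```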

Re: Tweaking Foreign Keys for larger tables

From

Simon Riggs

Date:

07 November 2014, 07:15:26

On 6 November 2014 20:47, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Simon Riggs wrote:
...
>> In that case the need for correctness thru locking is minimal. If we
>> do lock it will cause very high multixact traffic, so that is worth
>> avoiding alone.
>
> This seems like a can of worms to me.  How about the ability to mark a
> table READ ONLY, so that insert/update/delete operations on it raise an
> error?  For such tables, you can just assume that tuples never go away,
> which can help optimize some ri_triggers.c queries by doing plain
> SELECT, not SELECT FOR KEY SHARE.
>
> If you later need to add rows to the table, you set it READ WRITE, and
> then ri_triggers.c automatically start using FOR KEY SHARE; add/modify
> to your liking, then set READ ONLY again.  So you incur the cost of
> tuple locking only while you have the table open for writes.

How about we set the lock level on each Foreign Key like this:

[USING LOCK [lock level]]

level is one of
KEY - [FOR KEY SHARE] - default
ROW -  [FOR SHARE]
TABLE SHARE - [ ]
TABLE EXCLUSIVE - [FOR TABLE EXCLUSIVE]

which introduces these new level descriptions:
TABLE SHARE - the default behavior of SELECT
TABLE EXCLUSIVE - we lock the referenced table against all writes;
this allows the table to be fully cached for use in speeding up checks.
[FOR TABLE EXCLUSIVE] - uses ShareRowExclusiveLock

The last level is like "Read Only tables" apart from the fact that
they can be written to when needed, but we optimize things on the
assumption that such writes are very rare.
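A hypothetical Python table summarising the proposed `USING LOCK` levels. The lock-mode names are real PostgreSQL table lock modes: `SELECT ... FOR KEY SHARE` and `FOR SHARE` take RowShareLock on the table, and a plain `SELECT` takes AccessShareLock. The `USING LOCK` syntax and the mapping itself are only the proposal above, not existing behaviour.

```python
from typing import NamedTuple, Optional


class RiLockPlan(NamedTuple):
    table_lock: str            # lock mode taken on the referenced table
    row_clause: Optional[str]  # row-locking clause on the existence check, if any


# Hypothetical mapping of the proposed USING LOCK levels.
USING_LOCK = {
    "KEY": RiLockPlan("RowShareLock", "FOR KEY SHARE"),  # current behaviour
    "ROW": RiLockPlan("RowShareLock", "FOR SHARE"),
    "TABLE SHARE": RiLockPlan("AccessShareLock", None),  # plain SELECT
    "TABLE EXCLUSIVE": RiLockPlan("ShareRowExclusiveLock", None),
}


def lock_plan(level: Optional[str] = None) -> RiLockPlan:
    # Omitting USING LOCK keeps the default level, KEY.
    return USING_LOCK[level or "KEY"]
```

ShareRowExclusiveLock conflicts with the RowExclusiveLock taken by INSERT, UPDATE and DELETE. That conflict is what blocks writes to the referenced table at the TABLE EXCLUSIVE level.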

We could also add Read Only tables as well, but I don't see as much
use for them. Sounds like you'd spend a lot of time with ALTER TABLE
as you turn it on and off. I'd like to be able to do that
automatically as needed.


Re: Tweaking Foreign Keys for larger tables

From

Andreas Karlsson

Date:

07 November 2014, 10:47:19

On 11/07/2014 08:15 AM, Simon Riggs wrote:
> How about we set lock level on each Foreign Key like this
>
> [USING LOCK [lock level]]
>
> level is one of
> KEY - [FOR KEY SHARE] - default
> ROW -  [FOR SHARE]
> TABLE SHARE - [ ]
> TABLE EXCLUSIVE - [FOR TABLE EXCLUSIVE]

I like the idea and think it solves the problem in a pretty neat way,
but I do not see any practical need for levels other than the highest
and the lowest of those.

Andreas