Thread: Order of enforcement of CHECK constraints?
My Salesforce colleagues noticed some tests flapping as a result of table CHECK constraints not always being enforced in the same order; ie, if a tuple insertion/update violates more than one CHECK constraint, it's not deterministic which one is reported. This is evidently because relcache.c's CheckConstraintFetch() just happily loads up the constraints in whatever order it happens to find them in pg_constraint. There's at least one regression test case where this can happen, so we've been lucky so far that this hasn't caused buildfarm noise. We could fix it by, say, having CheckConstraintFetch() sort the constraints by name after loading them. In principle the same problem could occur for domain CHECK constraints, though the odds of multiple CHECKs failing are probably a lot lower. Do people think this is worth fixing? regards, tom lane
On Fri, Mar 20, 2015 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > We could fix it by, say, having CheckConstraintFetch() sort the > constraints by name after loading them. What not by OID, as with indexes? Are you suggesting that this would become documented behavior? -- Peter Geoghegan
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > My Salesforce colleagues noticed some tests flapping as a result of table > CHECK constraints not always being enforced in the same order; ie, if a > tuple insertion/update violates more than one CHECK constraint, it's not > deterministic which one is reported. This is evidently because > relcache.c's CheckConstraintFetch() just happily loads up the constraints > in whatever order it happens to find them in pg_constraint. > > There's at least one regression test case where this can happen, so we've > been lucky so far that this hasn't caused buildfarm noise. > > We could fix it by, say, having CheckConstraintFetch() sort the > constraints by name after loading them. > > In principle the same problem could occur for domain CHECK constraints, > though the odds of multiple CHECKs failing are probably a lot lower. > > Do people think this is worth fixing? Yes... I had thought they were sorted and enforced in alphabetical order similar to how triggers are fired. Having non-deterministic check constraint firing seems bad to me. Thanks, Stephen
<div dir="ltr"><br />On Fri, Mar 20, 2015 at 4:19 PM, Peter Geoghegan <<a href="mailto:pg@heroku.com">pg@heroku.com</a>>wrote:<br />><br />> On Fri, Mar 20, 2015 at 12:15 PM, Tom Lane <<ahref="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br />> > We could fix it by, say, having CheckConstraintFetch()sort the<br />> > constraints by name after loading them.<br />><br />><br />> Whatnot by OID, as with indexes? Are you suggesting that this would<br />> become documented behavior?<br />><br /> <br/>I think they should be executed in alphabetical order like triggers.<br /><br />Regards,<br /><br />--<br />Fabríziode Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>> Timbira: <a href="http://www.timbira.com.br">http://www.timbira.com.br</a><br/>>> Blog: <a href="http://fabriziomello.github.io">http://fabriziomello.github.io</a><br/>>> Linkedin: <a href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a><br/>>> Github: <a href="http://github.com/fabriziomello">http://github.com/fabriziomello</a></div>
Fabrízio de Royes Mello <fabriziomello@gmail.com> writes: > On Fri, Mar 20, 2015 at 4:19 PM, Peter Geoghegan <pg@heroku.com> wrote: >> On Fri, Mar 20, 2015 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> We could fix it by, say, having CheckConstraintFetch() sort the >>> constraints by name after loading them. >> What not by OID, as with indexes? Are you suggesting that this would >> become documented behavior? > I think they should be executed in alphabetical order like triggers. Yeah. We already have a comparable, and documented, behavior for triggers, so if we're going to do anything about this I'd vote for sorting by name (or more specifically, by strcmp()). regards, tom lane
On 21/03/15 08:15, Tom Lane wrote: > My Salesforce colleagues noticed some tests flapping as a result of table > CHECK constraints not always being enforced in the same order; ie, if a > tuple insertion/update violates more than one CHECK constraint, it's not > deterministic which one is reported. This is evidently because > relcache.c's CheckConstraintFetch() just happily loads up the constraints > in whatever order it happens to find them in pg_constraint. > > There's at least one regression test case where this can happen, so we've > been lucky so far that this hasn't caused buildfarm noise. > > We could fix it by, say, having CheckConstraintFetch() sort the > constraints by name after loading them. > > In principle the same problem could occur for domain CHECK constraints, > though the odds of multiple CHECKs failing are probably a lot lower. > > Do people think this is worth fixing? > > regards, tom lane > > I think that this is a good idea, I would have implicitly assumed that it was deterministic. Additionally, people could then name CHECK constraints in a way to get whatever order they wanted. The documentation of CREATE TRIGGER says (reading Fabrizio's post inspired me to look it up): "If multiple triggers of the same kind are defined for the same event, they will be fired in alphabetical order by name." So I would have implicitly assumed the same for CHECK constraints, had I recently read that. So I think the current situation is a violation of POLA. Cheers, Gavin
On 3/20/15 3:37 PM, Tom Lane wrote: > Fabrízio de Royes Mello <fabriziomello@gmail.com> writes: >> On Fri, Mar 20, 2015 at 4:19 PM, Peter Geoghegan <pg@heroku.com> wrote: >>> On Fri, Mar 20, 2015 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> We could fix it by, say, having CheckConstraintFetch() sort the >>>> constraints by name after loading them. > >>> What not by OID, as with indexes? Are you suggesting that this would >>> become documented behavior? > >> I think they should be executed in alphabetical order like triggers. > > Yeah. We already have a comparable, and documented, behavior for > triggers, so if we're going to do anything about this I'd vote for > sorting by name (or more specifically, by strcmp()). > > regards, tom lane +1 for strcmp() ordering. Not only is this logical and consistent with the way triggers are fired, names can be manipulated by the user to force order when desired. Not sure if that's as important for check constraints as it is for triggers but it might be useful, even if only for things like unit tests. -- - David Steele david@pgmasters.net
On Sunday, March 22, 2015, David Steele <david@pgmasters.net> wrote:
On 3/20/15 3:37 PM, Tom Lane wrote:
> Fabrízio de Royes Mello <fabriziomello@gmail.com> writes:
>> On Fri, Mar 20, 2015 at 4:19 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> On Fri, Mar 20, 2015 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> We could fix it by, say, having CheckConstraintFetch() sort the
>>>> constraints by name after loading them.
>
>>> What not by OID, as with indexes? Are you suggesting that this would
>>> become documented behavior?
>
>> I think they should be executed in alphabetical order like triggers.
>
> Yeah. We already have a comparable, and documented, behavior for
> triggers, so if we're going to do anything about this I'd vote for
> sorting by name (or more specifically, by strcmp()).
>
> regards, tom lane
+1 for strcmp() ordering. Not only is this logical and consistent with
the way triggers are fired, names can be manipulated by the user to
force order when desired. Not sure if that's as important for check
constraints as it is for triggers but it might be useful, even if only
for things like unit tests.
--
- David Steele
david@pgmasters.net
It would be nice to know that, at scale, the added comparisons are negligible since it almost all cases all checks are run and pass and the order is irrelevant. Would it be possible for all check constraints to always be run and the resultant error outputs stored in an array which could then be sorted before it is printed? You get better and stable output for errors while minimally impacting good inserts.
David J.
I might be only one objecting here but ...
--
On Sat, Mar 21, 2015 at 12:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My Salesforce colleagues noticed some tests flapping as a result of table
CHECK constraints not always being enforced in the same order; ie, if a
tuple insertion/update violates more than one CHECK constraint, it's not
deterministic which one is reported. This is evidently because
relcache.c's CheckConstraintFetch() just happily loads up the constraints
in whatever order it happens to find them in pg_constraint.
Why is it important to report in deterministic manner? If it really matters, we should probably report all the failing constraints. A comparable example would be compiler throwing errors.
There's at least one regression test case where this can happen, so we've
been lucky so far that this hasn't caused buildfarm noise.
We could fix it by, say, having CheckConstraintFetch() sort the
constraints by name after loading them.
In principle the same problem could occur for domain CHECK constraints,
though the odds of multiple CHECKs failing are probably a lot lower.
Do people think this is worth fixing?
Downthread, parallels are being drawn with triggers. The order in which triggers are fired matters since the triggers can affect each other's results but constraint don't do that. Do they? So, why should we waste some cycles in sorting the constraints?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > I might be only one objecting here but ... > On Sat, Mar 21, 2015 at 12:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> My Salesforce colleagues noticed some tests flapping as a result of table >> CHECK constraints not always being enforced in the same order; ie, if a >> tuple insertion/update violates more than one CHECK constraint, it's not >> deterministic which one is reported. This is evidently because >> relcache.c's CheckConstraintFetch() just happily loads up the constraints >> in whatever order it happens to find them in pg_constraint. > Why is it important to report in deterministic manner? If nothing else, so as not to have regression-test failures. > If it really > matters, we should probably report all the failing constraints. That wouldn't in itself make the output deterministic (you'd still have to sort); and in any case that's not going to happen because it would require running each CHECK constraint in its own subtransaction. Catching errors that way is *expensive*. And there's been zero field demand for such a behavior, so I don't see us adding cycles for something no one's asked for. Sorting the check constraints during relcache load, on the other hand, is a negligible burden compared to the cost of reading pg_constraint. regards, tom lane
<div dir="ltr"><div class="gmail_extra"><br />On Fri, Mar 20, 2015 at 4:37 PM, Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>wrote:<br />><br />> Fabrízio de Royes Mello <<a href="mailto:fabriziomello@gmail.com">fabriziomello@gmail.com</a>>writes:<br />> > On Fri, Mar 20, 2015 at 4:19PM, Peter Geoghegan <<a href="mailto:pg@heroku.com">pg@heroku.com</a>> wrote:<br />> >> On Fri, Mar 20,2015 at 12:15 PM, Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br />> >>>We could fix it by, say, having CheckConstraintFetch() sort the<br />> >>> constraints by name afterloading them.<br />><br />> >> What not by OID, as with indexes? Are you suggesting that this would<br />>>> become documented behavior?<br />><br />> > I think they should be executed in alphabetical orderlike triggers.<br />><br />> Yeah. We already have a comparable, and documented, behavior for<br />> triggers,so if we're going to do anything about this I'd vote for<br />> sorting by name (or more specifically, by strcmp()).<br/>><br /><br /></div><div class="gmail_extra">Isn't better do this to read pg_constraint in name order?<br/><br />- conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,<br />+ conscan = systable_beginscan(conrel,ConstraintNameNspIndexId, true,<br /><br /></div><div class="gmail_extra">Regards,<br /></div><divclass="gmail_extra"><br />--<br />Fabrízio de Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>>Timbira: <a href="http://www.timbira.com.br">http://www.timbira.com.br</a><br />>> Blog: <a href="http://fabriziomello.github.io">http://fabriziomello.github.io</a><br/>>> Linkedin: <a href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a><br/>>> Github: <a href="http://github.com/fabriziomello">http://github.com/fabriziomello</a></div></div>
Fabrízio de Royes Mello <fabriziomello@gmail.com> writes: > On Fri, Mar 20, 2015 at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> We could fix it by, say, having CheckConstraintFetch() sort the >>>> constraints by name after loading them. > Isn't better do this to read pg_constraint in name order? > - conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true, > + conscan = systable_beginscan(conrel, ConstraintNameNspIndexId, true, Surely not. That would end up having to read *all* of pg_constraint, not only the rows applicable to the current relation. We could get the index to do the work for us if we changed it from an index on conrelid to one on conrelid, conname. However, seeing that that would bloat the index by a factor of sixteen, it hardly sounds like a free fix either. I really think that a quick application of qsort is the best-performing way to do this. regards, tom lane
On Mon, Mar 23, 2015 at 7:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> I might be only one objecting here but ...
> On Sat, Mar 21, 2015 at 12:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My Salesforce colleagues noticed some tests flapping as a result of table
>> CHECK constraints not always being enforced in the same order; ie, if a
>> tuple insertion/update violates more than one CHECK constraint, it's not
>> deterministic which one is reported. This is evidently because
>> relcache.c's CheckConstraintFetch() just happily loads up the constraints
>> in whatever order it happens to find them in pg_constraint.
> Why is it important to report in deterministic manner?
If nothing else, so as not to have regression-test failures.
May be then we are writing tests which are not doing the intended thing. Unless the test is explicitly testing the behaviour in case of more than one failing constraint, it looks like the test is not doing the right thing. If it's testing that case explicitly, by imposing an order, we are still testing a single constraint failure case.
In case of million inserts or bulk load with constraints on, these few cycles spent in ordering the constraints might be problematic, unless the ordering happens only once for a series of inserts.
> If it really
> matters, we should probably report all the failing constraints.
That wouldn't in itself make the output deterministic (you'd still have to
sort); and in any case that's not going to happen because it would require
running each CHECK constraint in its own subtransaction. Catching errors
that way is *expensive*. And there's been zero field demand for such a
behavior, so I don't see us adding cycles for something no one's asked
for. Sorting the check constraints during relcache load, on the other
hand, is a negligible burden compared to the cost of reading
pg_constraint.
regards, tom lane
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 24 March 2015 at 17:51, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
In case of million inserts or bulk load with constraints on, these few cycles spent in ordering the constraints might be problematic, unless the ordering happens only once for a series of inserts.
As far as I can see this sort only occurs when a new relation is loaded into the relcache. Relations are cached in the relcache when they're accessed for the first time in each backend, and the cache will be invalidated when the current backend or another backend perform DDL on that relation.
Without relcache we'd need to lookup the system catalogs for just about every SQL statement... This would be very slow indeed!
Regards
David Rowley
On Mon, Mar 23, 2015 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Fabrízio de Royes Mello <fabriziomello@gmail.com> writes:
> > On Fri, Mar 20, 2015 at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>> We could fix it by, say, having CheckConstraintFetch() sort the
> >>>> constraints by name after loading them.
>
> > Isn't better do this to read pg_constraint in name order?
>
> > - conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
> > + conscan = systable_beginscan(conrel, ConstraintNameNspIndexId, true,
>
> Surely not. That would end up having to read *all* of pg_constraint, not
> only the rows applicable to the current relation.
>
> We could get the index to do the work for us if we changed it from an
> index on conrelid to one on conrelid, conname. However, seeing that that
> would bloat the index by a factor of sixteen, it hardly sounds like a
> free fix either.
>
But in this way we can save some cicles as Ashutosh complains... or am I missing something?
> I really think that a quick application of qsort is the best-performing
> way to do this.
>
Something like the attached?
With current master:
fabrizio=# create table foo(a integer, b integer);
CREATE TABLE
fabrizio=# alter table foo add constraint aa check(a>0);
ALTER TABLE
fabrizio=# alter table foo add constraint bb check(b>0);
ALTER TABLE
fabrizio=# insert into foo values (0,0);
ERROR: new row for relation "foo" violates check constraint "bb"
DETAIL: Failing row contains (0, 0).
fabrizio=# create table foo(a integer, b integer);
CREATE TABLE
fabrizio=# alter table foo add constraint aa check(a>0);
ALTER TABLE
fabrizio=# alter table foo add constraint bb check(b>0);
ALTER TABLE
fabrizio=# insert into foo values (0,0);
ERROR: new row for relation "foo" violates check constraint "bb"
DETAIL: Failing row contains (0, 0).
With the attached patch:
fabrizio=# create table foo(a integer, b integer);
CREATE TABLE
fabrizio=# alter table foo add constraint aa check(a>0);
ALTER TABLE
fabrizio=# alter table foo add constraint bb check(b>0);
ALTER TABLE
fabrizio=# insert into foo values (0,0);
ERROR: new row for relation "foo" violates check constraint "aa"
DETAIL: Failing row contains (0, 0).
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Attachment
On Tue, Mar 24, 2015 at 4:28 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
>
>
>
> On Mon, Mar 23, 2015 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Fabrízio de Royes Mello <fabriziomello@gmail.com> writes:
> > > On Fri, Mar 20, 2015 at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >>>> We could fix it by, say, having CheckConstraintFetch() sort the
> > >>>> constraints by name after loading them.
> >
> > > Isn't better do this to read pg_constraint in name order?
> >
> > > - conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
> > > + conscan = systable_beginscan(conrel, ConstraintNameNspIndexId, true,
> >
> > Surely not. That would end up having to read *all* of pg_constraint, not
> > only the rows applicable to the current relation.
> >
>
> Yeah... you're correct... we need the oid in the index.
>
>
> > We could get the index to do the work for us if we changed it from an
> > index on conrelid to one on conrelid, conname. However, seeing that that
> > would bloat the index by a factor of sixteen, it hardly sounds like a
> > free fix either.
> >
>
> But in this way we can save some cicles as Ashutosh complains... or am I missing something?
>
>
> > I really think that a quick application of qsort is the best-performing
> > way to do this.
> >
>
> Something like the attached?
>
> With current master:
>
> fabrizio=# create table foo(a integer, b integer);
> CREATE TABLE
> fabrizio=# alter table foo add constraint aa check(a>0);
> ALTER TABLE
> fabrizio=# alter table foo add constraint bb check(b>0);
> ALTER TABLE
> fabrizio=# insert into foo values (0,0);
> ERROR: new row for relation "foo" violates check constraint "bb"
> DETAIL: Failing row contains (0, 0).
>
>
> With the attached patch:
>
> fabrizio=# create table foo(a integer, b integer);
> CREATE TABLE
> fabrizio=# alter table foo add constraint aa check(a>0);
> ALTER TABLE
> fabrizio=# alter table foo add constraint bb check(b>0);
> ALTER TABLE
> fabrizio=# insert into foo values (0,0);
> ERROR: new row for relation "foo" violates check constraint "aa"
> DETAIL: Failing row contains (0, 0).
>
Regards,
[1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e5f455f59fed0632371cddacddd79895b148dc07
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello