Thread: Implementing SQL ASSERTION

Implementing SQL ASSERTION

From
Joe Wildish
Date:
Hi all,

I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?

I’ve recently spent a bit of time looking to implement the execution models described in “Applied Mathematics for
Database Professionals” by Toon Koppelaars and Lex de Haan.  I’ve gotten as far as execution model 3 and am now looking
at deriving polarity of involved tables to do EM4 (described in some detail in “Deriving Production Rules for Constraint
Maintenance”, Ceri & Widom, VLDB Conference 1990, p555-577). EM5 & EM6 look rather more difficult but I’m intending to
try and implement those, too.

If there are other people working on this stuff it would be great to collaborate.

Regards.
-Joe


Re: Implementing SQL ASSERTION

From
Robert Haas
Date:
On Thu, Apr 30, 2015 at 6:36 PM, Joe Wildish
<joe-postgresql.com@elusive.cx> wrote:
> I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?
>
> I’ve recently spent a bit of time looking to implement the execution models described in “Applied Mathematics for
> Database Professionals” by Toon Koppelaars and Lex de Haan.  I’ve gotten as far as execution model 3 and am now looking
> at deriving polarity of involved tables to do EM4 (described in some detail in “Deriving Production Rules for Constraint
> Maintenance”, Ceri & Widom, VLDB Conference 1990, p555-577). EM5 & EM6 look rather more difficult but I’m intending to
> try and implement those, too.
>
> If there are other people working on this stuff it would be great to collaborate.

I don't know of anyone working on this.  It sounds very difficult.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
> On 1 May 2015, at 19:51, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 30, 2015 at 6:36 PM, Joe Wildish
> <joe-postgresql.com@elusive.cx> wrote:
>> I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?
>>
>> I’ve recently spent a bit of time looking to implement the execution models described in “Applied Mathematics for
>> Database Professionals” by Toon Koppelaars and Lex de Haan.  I’ve gotten as far as execution model 3 and am now looking
>> at deriving polarity of involved tables to do EM4 (described in some detail in “Deriving Production Rules for Constraint
>> Maintenance”, Ceri & Widom, VLDB Conference 1990, p555-577). EM5 & EM6 look rather more difficult but I’m intending to
>> try and implement those, too.
>>
>> If there are other people working on this stuff it would be great to collaborate.
>
> I don't know of anyone working on this.  It sounds very difficult.

The book I mention details a series of execution models, where each successive model aims to validate the assertion in
a more efficient manner than the last. This is achieved by performing static analysis of the assertion's expression to
determine under what circumstances the assertion need be (re)checked. Briefly:

EM1: after all DML statements;
EM2: only after DML statements involving tables mentioned in the assertion expression;
EM3: only after DML statements involving the columns mentioned in the assertion expression;
EM4: only after DML statements involving the columns, plus if the statement has a “polarity” that may affect the
assertion expression.

“Polarity” here means that one is able to (statically) determine if only INSERTS and not DELETES can affect an
expression or vice-versa.
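
As a rough illustration of how EM1 to EM4 narrow down the rechecks, here is a small Python sketch. It is not taken from
the book or from any patch: the names Statement, AssertionFacts and needs_recheck are hypothetical, and the facts about
the assertion are assumed to have been produced already by static analysis. The example assertion is assumed to be
something like NOT EXISTS (SELECT 1 FROM t WHERE a < 0), for which a DELETE on t can never be invalidating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A DML statement, reduced to what the execution models look at."""
    op: str                            # "INSERT", "UPDATE" or "DELETE"
    table: str
    columns: frozenset = frozenset()   # columns written (only relevant for UPDATE)


@dataclass(frozen=True)
class AssertionFacts:
    """Hypothetical result of statically analysing one assertion expression."""
    columns: dict            # table name -> set of columns the expression reads
    invalidating_ops: dict   # table name -> operations that may invalidate it


def needs_recheck(model, stmt, facts):
    """Return True if execution model `model` (1..4) rechecks after `stmt`."""
    if model == 1:
        return True                      # EM1: after every DML statement
    if stmt.table not in facts.columns:
        return False                     # EM2+: table not mentioned at all
    if model == 2:
        return True
    # EM3: INSERT and DELETE affect whole rows; an UPDATE only matters if it
    # writes at least one column that the expression reads.
    touches = stmt.op != "UPDATE" or bool(stmt.columns & facts.columns[stmt.table])
    if model == 3:
        return touches
    if model == 4:
        # EM4: additionally require that the operation has the right polarity.
        return touches and stmt.op in facts.invalidating_ops[stmt.table]
    raise ValueError(f"unknown execution model: {model}")


# Assumed assertion: NOT EXISTS (SELECT 1 FROM t WHERE a < 0)
facts = AssertionFacts(columns={"t": {"a"}},
                       invalidating_ops={"t": {"INSERT", "UPDATE"}})

delete_t = Statement("DELETE", "t")
update_b = Statement("UPDATE", "t", frozenset({"b"}))
```

Under these assumptions EM3 still rechecks after delete_t, while EM4 skips it; update_b is already skipped by EM3
because column b is not read by the expression.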

EMs 5 and 6 are further enhancements that make use of querying the “transition effect” data of what actually changed in
a statement, to determine if the assertion expression need be validated. I’ve not done as much reading around this topic
yet so am concentrating on EMs 1-4.

I agree it is a difficult problem but there are a fair number of published academic papers relating to this topic. The
AM4DP book draws a lot of this research together and presents the execution models.

I may start writing up on a blog of where I get to, and then post further to this list, if there is interest.

Regards.
-Joe




Re: Implementing SQL ASSERTION

From
David Fetter
Date:
On Sat, May 02, 2015 at 10:42:24PM +0100, Joe Wildish wrote:
> 
> > On 1 May 2015, at 19:51, Robert Haas <robertmhaas@gmail.com> wrote:
> > 
> > On Thu, Apr 30, 2015 at 6:36 PM, Joe Wildish
> > <joe-postgresql.com@elusive.cx> wrote:
> >> I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?
> >> 
> >> I’ve recently spent a bit of time looking to implement the execution models described in “Applied Mathematics for
> >> Database Professionals” by Toon Koppelaars and Lex de Haan.  I’ve gotten as far as execution model 3 and am now looking
> >> at deriving polarity of involved tables to do EM4 (described in some detail in “Deriving Production Rules for Constraint
> >> Maintenance”, Ceri & Widom, VLDB Conference 1990, p555-577). EM5 & EM6 look rather more difficult but I’m intending to
> >> try and implement those, too.
> >> 
> >> If there are other people working on this stuff it would be great to collaborate.
> > 
> > I don't know of anyone working on this.  It sounds very difficult.
> 
> The book I mention details a series of execution models, where each successive model aims to validate the assertion
> in a more efficient manner than the last. This is achieved by performing static analysis of the assertion's expression
> to determine under what circumstances the assertion need be (re)checked. Briefly:
> 
> EM1: after all DML statements;
> EM2: only after DML statements involving tables mentioned in the assertion expression;
> EM3: only after DML statements involving the columns mentioned in the assertion expression;
> EM4: only after DML statements involving the columns, plus if the statement has a “polarity” that may affect the
> assertion expression.
> 
> “Polarity” here means that one is able to (statically) determine if only INSERTS and not DELETES can affect an
> expression or vice-versa.
> 
> EMs 5 and 6 are further enhancements that make use of querying the “transition effect” data of what actually changed
> in a statement, to determine if the assertion expression need be validated. I’ve not done as much reading around this
> topic yet so am concentrating on EMs 1-4.
> 
> I agree it is a difficult problem but there are a fair number of published academic papers relating to this topic.
> The AM4DP book draws a lot of this research together and presents the execution models.
> 
> I may start writing up on a blog of where I get to, and then post further to this list, if there is interest.

I suspect that you would get a lot further with a PoC patch including
the needed documentation.  Remember to include how this would work at
all the transaction isolation levels and combinations of same that we
support.  Recall also to include the lock strength needed.  Just about
anything can be done with a database-wide lock :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
> On 3 May 2015, at 02:42, David Fetter <david@fetter.org> wrote:
>
> On Sat, May 02, 2015 at 10:42:24PM +0100, Joe Wildish wrote:
>>
>> I may start writing up on a blog of where I get to, and then post further to this list, if there is interest.
>
> I suspect that you would get a lot further with a PoC patch including
> the needed documentation.  Remember to include how this would work at
> all the transaction isolation levels and combinations of same that we
> support.  Recall also to include the lock strength needed.  Just about
> anything can be done with a database-wide lock :)

Thanks David. I’m obviously new here so I’m not that familiar with how one starts contributing.

Once I get to a decent level with the EM4 PoC I will post the details to this list.  The general idea is that upon
assertion creation, the expression is analysed to determine when it needs to be validated; corresponding internal
“after statement” triggers are then created.  There will definitely need to be some serialisation taking place on the
basis of when an assertion has been validated, but I’ve not got that far yet.  I’ll be sure to include the details when
I post though.

Regards.
-Joe





Re: Implementing SQL ASSERTION

From
Peter Eisentraut
Date:
On 4/30/15 6:36 PM, Joe Wildish wrote:
> I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?

I was the last one, probably:
<http://www.postgresql.org/message-id/1384486216.5008.17.camel@vanquo.pezone.net>.
I intend to pick up that work sometime, but feel free to review the
thread for a start.  The main question was how to manage transaction
isolation.





Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
Hackers,

Attached is a WIP patch for SQL assertion. I am posting it for anyone who might be interested in seeing it, for
comments/feedback, and to see if others are keen to collaborate on taking it further. It is not near production-ready
(see thoughts on that below).

The patch builds on the work posted by Peter back in 2013. I've taken his code and updated it to conform to some
general changes made to the codebase since then. The bulk of the new work I have done is around when an assertion needs
to be checked. Essentially it is an implementation of the algorithm described by Ceri & Widom in "Deriving Production
Rules for Constraint Maintenance": http://infolab.stanford.edu/pub/papers/constraint-maintenance.ps

The general idea is to traverse the expression tree and derive the set of potentially invalidating operations. These
operations are used to determine when the constraint trigger fires and causes a re-check. The detail is in the paper but
some examples are:

* insertion into the subject of an exists cannot be invalidating;
* deletion from the subject of a not exists cannot be invalidating;
* update of columns in the target list of an exists cannot be invalidating;
* certain combinations of aggregates with comparison operations cannot be invalidating.

As an example of the last point, the expression "CHECK (10 > (SELECT COUNT(*) FROM t))" cannot be invalidated by a
delete or an update but can be invalidated by an insert.
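
To make the aggregate case concrete, here is a minimal Python sketch of the reasoning behind the COUNT(*) example. It
is not the code from the patch: the tables CAN_INCREASE and CAN_DECREASE and the function invalidating_ops are
hypothetical, and the sketch assumes an unfiltered, ungrouped aggregate over a single table that is compared against a
constant.

```python
# Which operations can move an unfiltered aggregate up or down.
# Assumption: no WHERE clause and no GROUP BY, so an UPDATE cannot change
# COUNT(*), but it can change MIN/MAX by rewriting the aggregated column.
CAN_INCREASE = {
    "count": {"INSERT"},
    "max": {"INSERT", "UPDATE"},
    "min": {"DELETE", "UPDATE"},
}
CAN_DECREASE = {
    "count": {"DELETE"},
    "max": {"DELETE", "UPDATE"},
    "min": {"INSERT", "UPDATE"},
}


def invalidating_ops(agg, bound):
    """Operations that may turn `const <cmp> agg` from true to false.

    bound == "upper" models `const > agg`: only growth of the aggregate hurts.
    bound == "lower" models `const < agg`: only shrinkage of the aggregate hurts.
    """
    if bound == "upper":
        return frozenset(CAN_INCREASE[agg])
    if bound == "lower":
        return frozenset(CAN_DECREASE[agg])
    raise ValueError(f"unknown bound: {bound}")


# CHECK (10 > (SELECT COUNT(*) FROM t)) is an upper bound on COUNT(*).
count_upper = invalidating_ops("count", "upper")
```

With these assumptions count_upper contains only INSERT, which matches the statement above; flipping the comparison
to a lower bound would make DELETE the only invalidating operation instead.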

I have implemented most of the optimisations mentioned in the paper. There are one or two that I am unsure about,
specifically how to deal with set-operations that are the subject of an exists. According to the paper, these are
optimisable when they're the subject of an exists, but I think it is only applicable for union and not intersect or
except, so I have skipped that particular optimisation for the time being.

The algorithm works under the assumption that when a recheck occurs the previous check result was true (the research
report by Ceri & Widom does acknowledge this assumption). However, unfortunately the SQL specification requires that
both true and unknown be valid results for an assertion's check expression. This doesn't play too well with the
algorithm so for the time being I have disallowed null. I think the solution here may be that when a null result for a
check occurs, the assertion is changed to trigger on all operations against the involved tables; once it returns to
true, the triggers can be returned to fire only on the derived invalidating operations. More thought required though.
(Note: having just written this paragraph, I've realised I can't right now think of a concrete example illustrating the
point, so it may be that I'm wrong on this.)

The paper does mention a set of optimisations that I have not yet attempted to implement. These are essentially the
technique of evaluating the expression against the deltas of a change rather than the full tables. Clearly there is a
large overlap with incremental maintenance of views and actually the two authors of the paper have a similarly named
paper called "Deriving Production Rules for Incremental View Maintenance". Although I have yet to finish reviewing all
the literature on the subject, I suspect that realistically for this to make it into production, we'd need some
implementation of these techniques to make the performance palatable.

Cheers,
-Joe


Attachment

Re: Implementing SQL ASSERTION

From
Fabien COELHO
Date:
Hello Joe,

Just a reaction to the example, which is maybe addressed in the patch 
which I have not investigated.

> * certain combinations of aggregates with comparison operations cannot 
> be invalidating.
>
> As an example of the last point, the expression "CHECK (10 > (SELECT 
> COUNT(*) FROM t))" cannot be invalidated by a delete or an update but 
> can be invalidated by an insert.

I'm wondering about the effect of MVCC on this: if the check is performed 
when the INSERT is done, concurrent inserting transactions would count the 
current status which would be ok, but on commit all concurrent inserts 
would be there and the count could not be ok anymore?

Maybe if the check was deferred, but this is not currently possible with 
pg (eg the select can simply be put in a function), and there might be 
race conditions. ISTM that such a check would imply non trivial locking to 
be okay, it is not just a matter of deciding whether to invoke the check 
or not.
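
The scenario can be sketched with a toy model in Python. This is only an illustration of the concern, not of
PostgreSQL's actual snapshot machinery: snapshots are modelled as plain list copies, and the names check, snap_a,
snap_b and the starting row count of 8 are all made up for the example.

```python
LIMIT = 10


def check(visible_rows):
    """The assertion 10 > COUNT(*) evaluated over the rows a transaction can see."""
    return LIMIT > len(visible_rows)


# Committed state before either transaction starts: 8 rows.
committed = [f"row{i}" for i in range(8)]

# Two concurrent transactions each take a snapshot of the committed rows.
snap_a = list(committed)
snap_b = list(committed)

# Each inserts one row and checks the assertion against its own snapshot
# plus its own uncommitted write; neither can see the other's insert.
view_a = snap_a + ["a_new"]
view_b = snap_b + ["b_new"]
ok_a = check(view_a)   # 10 > 9, passes
ok_b = check(view_b)   # 10 > 9, passes

# Both commit. The combined state now holds 10 rows.
committed += ["a_new", "b_new"]
ok_after = check(committed)   # 10 > 10 is false
```

Both per-transaction checks pass, yet the committed state afterwards violates the assertion, which is why deciding
when to invoke the check is not sufficient on its own.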

-- 
Fabien.


Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
Hi Fabien,

>> * certain combinations of aggregates with comparison operations cannot be invalidating.
>>
>> As an example of the last point, the expression "CHECK (10 > (SELECT COUNT(*) FROM t))" cannot be invalidated by a delete or an update but can be invalidated by an insert.
>
> I'm wondering about the effect of MVCC on this: if the check is performed when the INSERT is done, concurrent inserting transactions would count the current status which would be ok, but on commit all concurrent inserts would be there and the count could not be ok anymore?

Yes, there was quite a bit of discussion in the original thread about concurrency. See here:


The patch doesn’t attempt to address concurrency (beyond the obvious benefit of reducing the circumstances under which the assertion is checked). I am working under the assumption that we will find some acceptable way for that to be resolved :-) And at the moment, working in serialisable mode addresses this issue. I think that is suggested in the thread actually (essentially, if you want to use assertions, you require that transactions be performed at serialisable isolation level). 

> Maybe if the check was deferred, but this is not currently possible with pg (eg the select can simply be put in a function), and there might be race conditions. ISTM that such a check would imply non trivial locking to be okay, it is not just a matter of deciding whether to invoke the check or not.

I traverse into SQL functions so that the analysis can capture invalidating operations from the expression inside the function. Only internal and SQL functions are considered legal. Other languages are rejected.

-Joe


Re: Implementing SQL ASSERTION

From
Fabien COELHO
Date:
>> I'm wondering about the effect of MVCC on this: if the check is 
>> performed when the INSERT is done, concurrent inserting transactions 
>> would count the current status which would be ok, but on commit all 
>> concurrent inserts would be there and the count could not be ok 
>> anymore?

> The patch doesn’t attempt to address concurrency (beyond the obvious 
> benefit of reducing the circumstances under which the assertion is 
> checked). I am working under the assumption that we will find some 
> acceptable way for that to be resolved :-) And at the moment, working in 
> serialisable mode addresses this issue. I think that is suggested in the 
> thread actually (essentially, if you want to use assertions, you require 
> that transactions be performed at serialisable isolation level).

Thanks for the pointers. The "serializable" isolation level restriction 
sounds reasonable.

-- 
Fabien.

Re: Implementing SQL ASSERTION

From
David Fetter
Date:
On Mon, Jan 15, 2018 at 03:40:57PM +0100, Fabien COELHO wrote:
> 
> >>I'm wondering about the effect of MVCC on this: if the check is
> >>performed when the INSERT is done, concurrent inserting transactions
> >>would count the current status which would be ok, but on commit all
> >>concurrent inserts would be there and the count could not be ok anymore?
> 
> >The patch doesn’t attempt to address concurrency (beyond the obvious
> >benefit of reducing the circumstances under which the assertion is
> >checked). I am working under the assumption that we will find some
> >acceptable way for that to be resolved :-) And at the moment, working in
> >serialisable mode addresses this issue. I think that is suggested in the
> >thread actually (essentially, if you want to use assertions, you require
> >that transactions be performed at serialisable isolation level).
> 
> Thanks for the pointers. The "serializable" isolation level restriction
> sounds reasonable.

It sounds reasonable enough that I'd like to make a couple of Modest
Proposals™, to wit:

- We follow the SQL standard and make SERIALIZABLE the default
  transaction isolation level, and

- We disallow writes at isolation levels other than SERIALIZABLE when
  any ASSERTION could be in play.

That latter could range in implementation from crashingly unsubtle to
very precise.  

Crashingly Unsubtle:

    Disallow writes at any isolation level other than SERIALIZABLE.

Very Precise:

    Disallow writes at any other isolation level when the ASSERTION
    could come into play using the same machinery that enforces the
    ASSERTION in the first place.

What say?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
Hi David,

> On 15 Jan 2018, at 16:35, David Fetter <david@fetter.org> wrote:
>
> It sounds reasonable enough that I'd like to make a couple of Modest
> Proposals™, to wit:
>
> - We follow the SQL standard and make SERIALIZABLE the default
>  transaction isolation level, and
>
> - We disallow writes at isolation levels other than SERIALIZABLE when
>  any ASSERTION could be in play.

Certainly it would be easy to put a test into the assertion check function to require the isolation level be serialisable. I didn’t realise that that was also the default level as per the standard. That need not necessarily be changed, of course; it would be obvious to the user that it was a requirement as the creation of an assertion would fail without it, as would any subsequent attempts to modify the involved tables.

-Joe

Re: Implementing SQL ASSERTION

From
David Fetter
Date:
On Mon, Jan 15, 2018 at 09:14:02PM +0000, Joe Wildish wrote:
> Hi David,
> 
> > On 15 Jan 2018, at 16:35, David Fetter <david@fetter.org> wrote:
> > 
> > It sounds reasonable enough that I'd like to make a couple of Modest
> > Proposals™, to wit:
> > 
> > - We follow the SQL standard and make SERIALIZABLE the default
> >  transaction isolation level, and
> > 
> > - We disallow writes at isolation levels other than SERIALIZABLE when
> >  any ASSERTION could be in play.
> 
> Certainly it would be easy to put a test into the assertion check
> function to require the isolation level be serialisable. I didn’t
> realise that that was also the default level as per the standard.
> That need not necessarily be changed, of course; it would be obvious
> to the user that it was a requirement as the creation of an
> assertion would fail without it, as would any subsequent attempts to
> modify the involved tables.

This patch no longer applies.  Any chance of a rebase?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
Hi David,

>
> This patch no longer applies.  Any chance of a rebase?
>



Of course. I’ll look at it this weekend.

Cheers,
-Joe



Re: Implementing SQL ASSERTION

From
David Fetter
Date:
On Thu, Mar 08, 2018 at 09:11:58PM +0000, Joe Wildish wrote:
> Hi David,
> 
> > 
> > This patch no longer applies.  Any chance of a rebase?
> 
> Of course. I’ll look at it this weekend.

Much appreciated!

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Implementing SQL ASSERTION

From
Robert Haas
Date:
On Mon, Jan 15, 2018 at 11:35 AM, David Fetter <david@fetter.org> wrote:
> - We follow the SQL standard and make SERIALIZABLE the default
>   transaction isolation level, and

The consequences of such a decision would include:

- pgbench -S would run up to 10x slower, at least if these old
benchmark results are still valid:

https://www.postgresql.org/message-id/CA+TgmoZog1wFbyrqzJUkiLSXw5sDUjJGUeY0c2BqSG-tciSB7w@mail.gmail.com

- pgbench without -S would fail outright, because it doesn't have
provision to retry failed transactions.

https://commitfest.postgresql.org/16/1419/

- Many user applications would probably also experience similar difficulties.

- Parallel query would no longer work by default, unless this patch
gets committed:

https://commitfest.postgresql.org/17/1004/

I think a good deal of work to improve the performance of serializable
would need to be done before we could even think about making it the
default -- and even then, the fact that it really requires the
application to be retry-capable seems like a pretty major obstacle.
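
For readers unfamiliar with what "retry-capable" means in practice, here is a minimal Python sketch of the loop an
application would need around each transaction. It is an assumption-laden illustration, not any driver's API: the
exception class SerializationFailure is a hypothetical stand-in for however a given driver reports SQLSTATE 40001
(serialization_failure), and run_with_retry and flaky_transfer are made-up names.

```python
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""


def run_with_retry(txn, max_attempts=5, backoff_seconds=0.0):
    """Run the callable `txn`, re-running it from the start on serialization failure.

    `txn` is assumed to begin, execute and commit one whole transaction, so
    that calling it again is safe after a rollback.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise                      # give up and surface the error
            time.sleep(backoff_seconds * attempt)


# A fake transaction that fails twice before succeeding.
attempts = []


def flaky_transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


result = run_with_retry(flaky_transfer)
```

The important design point is that the whole transaction is retried, not just the failing statement; applications
that interleave transactional work with non-repeatable side effects cannot adopt this pattern without restructuring.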

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Implementing SQL ASSERTION

From
Thomas Munro
Date:
On Sat, Mar 10, 2018 at 6:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 15, 2018 at 11:35 AM, David Fetter <david@fetter.org> wrote:
>> - We follow the SQL standard and make SERIALIZABLE the default
>>   transaction isolation level, and
>
> The consequences of such a decision would include:
>
> - pgbench -S would run up to 10x slower, at least if these old
> benchmark results are still valid:
>
> https://www.postgresql.org/message-id/CA+TgmoZog1wFbyrqzJUkiLSXw5sDUjJGUeY0c2BqSG-tciSB7w@mail.gmail.com
>
> - pgbench without -S would fail outright, because it doesn't have
> provision to retry failed transactions.
>
> https://commitfest.postgresql.org/16/1419/
>
> - Many user applications would probably also experience similar difficulties.
>
> - Parallel query would no longer work by default, unless this patch
> gets committed:
>
> https://commitfest.postgresql.org/17/1004/
>
> I think a good deal of work to improve the performance of serializable
> would need to be done before we could even think about making it the
> default -- and even then, the fact that it really requires the
> application to be retry-capable seems like a pretty major obstacle.

Also:

- It's not available on hot standbys.  Experimental patches have been
developed based on the read only safe snapshot concept, but some
tricky problems remain unsolved.

- Performance is terrible (conflicts are maximised) if you use any
index type except btree, unless some of these get committed:

https://commitfest.postgresql.org/17/1172/
https://commitfest.postgresql.org/17/1183/
https://commitfest.postgresql.org/17/1466/

-- 
Thomas Munro
http://www.enterprisedb.com


Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
>
>>
>> This patch no longer applies.  Any chance of a rebase?
>>
>


Attached is a rebased version of this patch. It takes into account the ACL checking changes and a few other minor
amendments.

Cheers,
-Joe




Attachment

Re: Implementing SQL ASSERTION

From
David Fetter
Date:
On Sun, Mar 18, 2018 at 12:29:50PM +0000, Joe Wildish wrote:
> > 
> >> 
> >> This patch no longer applies.  Any chance of a rebase?
> >> 
> > 
> 
> 
> Attached is a rebased version of this patch. It takes into account the ACL checking changes and a few other minor
> amendments.

Thanks!

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Implementing SQL ASSERTION

From
David Fetter
Date:
On Sun, Mar 18, 2018 at 12:29:50PM +0000, Joe Wildish wrote:
> > 
> >> 
> >> This patch no longer applies.  Any chance of a rebase?
> 
> Attached is a rebased version of this patch. It takes into account
> the ACL checking changes and a few other minor amendments.

Sorry to bother you again, but this now doesn't compile atop master.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
On 28 Mar 2018, at 16:13, David Fetter <david@fetter.org> wrote:
> 
> Sorry to bother you again, but this now doesn't compile atop master.

Attached is a rebased patch for the prototype.

Cheers,
-Joe





Attachment

Re: Implementing SQL ASSERTION

From
David Fetter
Date:
On Sun, Apr 29, 2018 at 07:18:00PM +0100, Joe Wildish wrote:
> On 28 Mar 2018, at 16:13, David Fetter <david@fetter.org> wrote:
> > 
> > Sorry to bother you again, but this now doesn't compile atop master.
> 
> Attached is a rebased patch for the prototype.

Thanks!

This is great timing for the 12 cycle :)

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Implementing SQL ASSERTION

From
Peter Eisentraut
Date:
On 29/04/2018 20:18, Joe Wildish wrote:
> On 28 Mar 2018, at 16:13, David Fetter <david@fetter.org> wrote:
>>
>> Sorry to bother you again, but this now doesn't compile atop master.
> 
> Attached is a rebased patch for the prototype.

I took a look at this.

This has been lying around for a few months, so it will need to be
rebased again.  I applied this patch on top of
68e7e973d22274a089ce95200b3782f514f6d2f8, which was the HEAD around the
time this patch was created, and it applies cleanly there.

Please check your patch for whitespace errors:

warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.

Also, reduce the amount of useless whitespace changes in the patch.

There are some compiler warnings:

constraint.c: In function 'CreateAssertion':
constraint.c:1211:2: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]

constraint.c: In function 'oppositeDmlOp':
constraint.c:458:1: error: control reaches end of non-void function
[-Werror=return-type]

The version check in psql's describeAssertions() needs to be updated.
Also, you should use formatPGVersionNumber() to cope with two-part and
one-part version numbers.

All this new code in constraint.c that checks the assertion expression
needs more comments and documentation.

Stuff like this isn't going to work:

static int
funcMaskForFuncOid(Oid funcOid)
{
    char *name = get_func_name(funcOid);

    if (name == NULL)
        return OTHER_FUNC;
    else if (strncmp(name, "min", strlen("min")) == 0)
        return MIN_AGG_FUNC;
    else if (strncmp(name, "max", strlen("max")) == 0)
        return MAX_AGG_FUNC;

You can't assume from the name of a function what it's going to do.
Solving this properly might be hard.
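
To see why the name test is fragile, here is a Python mirror of the prefix comparison in the excerpt above. The
function func_mask_by_name is hypothetical, the final fallback to OTHER_FUNC is an assumption (the quoted C excerpt is
cut off before that point), and the function names minute_of_day and maximum_discount are invented examples of
functions a user might plausibly define.

```python
def func_mask_by_name(name):
    """Classify a function purely by name prefix, like strncmp(name, "min", 3) == 0."""
    if name is None:
        return "OTHER_FUNC"
    if name.startswith("min"):
        return "MIN_AGG_FUNC"
    if name.startswith("max"):
        return "MAX_AGG_FUNC"
    return "OTHER_FUNC"   # assumed fallback, not shown in the quoted excerpt


builtin = func_mask_by_name("min")                   # the intended case
false_hit = func_mask_by_name("minute_of_day")       # not a minimum at all
false_hit_2 = func_mask_by_name("maximum_discount")  # not the max aggregate
```

Both invented names are classified as MIN/MAX aggregates. Tightening the test to an exact match would not fully help
either, since a user can define a function of the same name with different behaviour in another schema.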

The regression test crashes for me around

    frame #4: 0x000000010d3a4cdc postgres`castNodeImpl(type=T_SubLink,
ptr=0x00007ff27006d230) at nodes.h:582
    frame #5: 0x000000010d3a61c6
postgres`visitSubLink(node=0x00007ff270034040, info=0x00007ffee2a23930)
at constraint.c:843

This ought to be reproducible for you if you build with assertions.


My feeling is that if we want to move forward on this topic, we need to
solve the concurrency question first.  All these optimizations for when
we don't need to check the assertion are cool, but they are just
optimizations that we can apply later on, once we have solved the
critical problems.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
Hi Peter,

> On 24 Sep 2018, at 15:06, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 29/04/2018 20:18, Joe Wildish wrote:
>>
>> Attached is a rebased patch for the prototype.
>
> I took a look at this.

Thank you for reviewing.

> This has been lying around for a few months, so it will need to be
> rebased again.
>
> 8< - - - snipped for brevity - - - 8<
>
> All this new code in constraint.c that checks the assertion expression
> needs more comments and documentation.

All agreed.  I’ll give the patch some TLC and get a new version that
addresses the above.

> Stuff like this isn't going to work:
>
> static int
> funcMaskForFuncOid(Oid funcOid)
> {
>    char *name = get_func_name(funcOid);
>
>    if (name == NULL)
>        return OTHER_FUNC;
>    else if (strncmp(name, "min", strlen("min")) == 0)
>        return MIN_AGG_FUNC;
>    else if (strncmp(name, "max", strlen("max")) == 0)
>        return MAX_AGG_FUNC;
>
> You can't assume from the name of a function what it's going to do.
> Solving this properly might be hard.

Agreed. My assumption was that we would record in the data dictionary the
behaviour (or “polarity”) of each aggregate function with respect to the
various operators. Column in pg_aggregate? I don’t know how we’d record it
exactly. A bitmask would be a possibility. Also, I don’t know what we’d do
with custom aggregate functions (or indeed custom operators). Allowing end
users to determine the value would potentially lead to assertion checks
being incorrectly skipped. Maybe we’d say that custom aggregates always
have a neutral polarity and are therefore not subject to this
optimisation.

> This ought to be reproducible for you if you build with assertions.

Yes. I shall correct this when I do the aforementioned rebase and
application of TLC.

> My feeling is that if we want to move forward on this topic, we need to
> solve the concurrency question first.  All these optimizations for when
> we don't need to check the assertion are cool, but they are just
> optimizations that we can apply later on, once we have solved the
> critical problems.

I obviously agree that the concurrency issue needs solving. But I don’t
see that at all as a separate matter from the algos. Far from being merely
optimisations, the research indicates we can go a lot further toward
reducing the need for rechecks and, therefore, reducing the chance of
concurrency conflicts from occurring in the first place. This is true
regardless of whatever mechanism we use to enforce correct behaviour under
concurrent modifications -- e.g. a lock on the ASSERTION object itself,
enforced use of SERIALIZABLE, etc.

By way of example (lifted directly from the AM4DP book):

CREATE TABLE employee (
  id INTEGER PRIMARY KEY,
  dept INTEGER NOT NULL,
  job TEXT NOT NULL
);

CREATE ASSERTION department_managers_need_administrators CHECK
  (NOT EXISTS
    (SELECT dept
       FROM employee a
      WHERE EXISTS (SELECT * FROM employee b
                     WHERE a.dept = b.dept
                       AND b.job IN ('Manager', 'Senior Manager'))
        AND NOT EXISTS (SELECT * FROM employee b
                         WHERE a.dept = b.dept
                           AND b.job = 'Administrator')));

The current implementation derives "DELETE(employee), INSERT(employee) and
UPDATE(employee.dept, employee.job)" as the set of invalidating operations
and triggers accordingly. However, in this case, we can supplement the
triggers by having them inspect the transition tables to see if the actual
data from the triggering DML statement could in fact affect the truth of
the expression: specifically, only do the recheck on DELETE of an
"Administrator", INSERT of a "Manager" or "Senior Manager", or UPDATE when
the new job is a "Manager" or "Senior Manager" or the old job was an
"Administrator".
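
A minimal sketch of the INSERT half of that idea, emulated with an ordinary statement-level trigger and a transition table (available since PostgreSQL 10). This is not the patch's implementation: the function, trigger, and transition table names (employee_insert_recheck, new_rows) are hypothetical, and it does nothing about the concurrency problem discussed in this thread.

```sql
-- Table from the example above, repeated so the sketch is self-contained.
CREATE TABLE employee (
  id INTEGER PRIMARY KEY,
  dept INTEGER NOT NULL,
  job TEXT NOT NULL
);

-- Hypothetical trigger function: recheck only when an inserted row could
-- actually make the assertion false.
CREATE FUNCTION employee_insert_recheck() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- Cheap filter on the transition table: only a newly inserted manager
  -- can invalidate the assertion.
  IF EXISTS (SELECT 1 FROM new_rows
              WHERE job IN ('Manager', 'Senior Manager')) THEN
    -- Full recheck of the assertion expression.
    IF EXISTS (SELECT 1
                 FROM employee a
                WHERE EXISTS (SELECT 1 FROM employee b
                               WHERE a.dept = b.dept
                                 AND b.job IN ('Manager', 'Senior Manager'))
                  AND NOT EXISTS (SELECT 1 FROM employee b
                                   WHERE a.dept = b.dept
                                     AND b.job = 'Administrator')) THEN
      RAISE EXCEPTION 'department_managers_need_administrators violated';
    END IF;
  END IF;
  RETURN NULL;  -- return value is ignored for AFTER statement triggers
END;
$$;

CREATE TRIGGER employee_insert_recheck
  AFTER INSERT ON employee
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT
  EXECUTE PROCEDURE employee_insert_recheck();
```

With this in place, inserting a 'Clerk' never runs the expensive query, whereas inserting a 'Manager' into a department without an 'Administrator' raises the exception. The DELETE and UPDATE cases would need analogous filters on the old and new transition tables.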

Now, if this is a company with 10,000 employees, which would presumably
require only a handful of managers (right? ;-)), then the potential for a
concurrency conflict is massively reduced compared to rechecking every
time the employee table is touched.

(This optimisation has some caveats and is reliant upon being able to
derive the key of an expression from the underlying base tables plus some
stuff about functional dependencies. I have started work on it but sadly
not had time to progress it in recent months).

Having said all that: there are obviously going to be some expressions
that cannot be proven to have no potential for invalidating the assertion
truth. I guess this is the prime concern from a concurrency PoV? Example:

CREATE TABLE t (
  b BOOLEAN NOT NULL,
  n INTEGER NOT NULL,
  PRIMARY KEY (b, n)
);

CREATE ASSERTION sum_per_b_less_than_10 CHECK
  (NOT EXISTS
    (SELECT FROM (SELECT b, SUM(n)
                    FROM t
                   GROUP BY b) AS v(b, sum_n)
      WHERE sum_n > 10));

Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)". I guess the
interesting case, from a concurrency perspective, is how we avoid an
INSERT WHERE b IS TRUE blocking an INSERT WHERE b IS FALSE. I don’t
have an answer to that, unfortunately. My understanding was that
SSI could help in these sorts of cases, but I really haven't read or
looked into the detail (yet). Thoughts?

-Joe




Re: Implementing SQL ASSERTION

From
Andrew Gierth
Date:
>>>>> "Joe" == Joe Wildish <joe-postgresql.org@elusive.cx> writes:

 Joe> Agreed. My assumption was that we would record in the data
 Joe> dictionary the behaviour (or “polarity”) of each aggregate
 Joe> function with respect to the various operators. Column in
 Joe> pg_aggregate? I don’t know how we’d record it exactly.

I haven't looked at the background of this, but if what you want to know
is whether the aggregate function has the semantics of min() or max()
(and if so, which) then the place to look is pg_aggregate.aggsortop.

(For a given aggregate foo(x), the presence of an operator oid in
aggsortop means something like "foo(x) is equivalent to (select x from
... order by x using OP limit 1)", and the planner will replace the
aggregate by the applicable subquery if it thinks it'd be faster.)
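
For reference, a catalog query along these lines lists the aggregates that carry a sort operator. It uses only the pg_aggregate columns named above; the output column aliases are arbitrary, and the exact rows returned depend on the server version and installed extensions.

```sql
-- List aggregates with a non-zero aggsortop, together with that operator.
SELECT a.aggfnoid::oid::regprocedure AS aggregate,
       a.aggsortop::regoperator      AS sort_operator
  FROM pg_aggregate a
 WHERE a.aggsortop <> 0
 ORDER BY a.aggfnoid::oid::regprocedure::text;
```

On a stock installation the min() and max() variants should show up here, paired with the "<" and ">" operators of their argument types.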

As for operators, you can only make assumptions about their meaning if
the operator is a member of some opfamily that assigns it some
semantics. For example, the planner can assume that WHERE x=y AND x=1
implies that y=1 (assuming x and y are of appropriate types) not because
it assumes that "=" is the name of a transitive operator, but because
the operators actually selected for (x=1) and (x=y) are both "equality"
members of the same btree operator family. Likewise proving that (a>2)
implies (a>1) requires knowing that > is a btree comparison op.
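
As an illustration of where those semantics are recorded, the following query lists the btree operator family members for integer-to-integer comparisons. It is only a sketch for inspecting the catalogs; the choice of the integer type is an arbitrary example.

```sql
-- Operators that a btree opfamily assigns a meaning to, for integer vs integer.
-- Btree strategy numbers: 1 is <, 2 is <=, 3 is =, 4 is >=, 5 is >.
SELECT amop.amopopr::regoperator AS operator,
       amop.amopstrategy         AS strategy,
       opf.opfname               AS opfamily
  FROM pg_amop amop
  JOIN pg_opfamily opf ON opf.oid = amop.amopfamily
  JOIN pg_am am        ON am.oid = amop.amopmethod
 WHERE am.amname = 'btree'
   AND amop.amoplefttype  = 'integer'::regtype
   AND amop.amoprighttype = 'integer'::regtype
 ORDER BY amop.amopstrategy;
```

Two operators can be reasoned about together (for example "x = y" with "x = 1") only when they appear in the same family.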

--
Andrew (irc:RhodiumToad)


Re: Implementing SQL ASSERTION

From
Peter Eisentraut
Date:
On 25/09/2018 01:04, Joe Wildish wrote:
> Having said all that: there are obviously going to be some expressions
> that cannot be proven to have no potential for invalidating the assertion
> truth. I guess this is the prime concern from a concurrency PoV?

Before we spend more time on this, I think we need to have at least a
plan for that.  Perhaps we should disallow cases that we can't
handle otherwise.  But even that would need some analysis of which
practical cases we can and cannot handle, how we could extend support in
the future, etc.

In the meantime, I have committed parts of your gram.y changes that seem
to come up every time someone dusts off an assertions patch.  Keep that
in mind when you rebase.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Implementing SQL ASSERTION

From
David Fetter
Date:
On Tue, Sep 25, 2018 at 12:04:12AM +0100, Joe Wildish wrote:
> Hi Peter,
> 
> > My feeling is that if we want to move forward on this topic, we need to
> > solve the concurrency question first.  All these optimizations for when
> > we don't need to check the assertion are cool, but they are just
> > optimizations that we can apply later on, once we have solved the
> > critical problems.
> 
> Having said all that: there are obviously going to be some expressions
> that cannot be proven to have no potential for invalidating the assertion
> truth. I guess this is the prime concern from a concurrency PoV? Example:
> 
> CREATE TABLE t (
>   b BOOLEAN NOT NULL,
>   n INTEGER NOT NULL,
>   PRIMARY KEY (b, n)
> );
> 
> CREATE ASSERTION sum_per_b_less_than_10 CHECK
>   (NOT EXISTS
>     (SELECT FROM (SELECT b, SUM(n)
>                     FROM t
>                    GROUP BY b) AS v(b, sum_n)
>       WHERE sum_n > 10));

> 
> Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)".

So would DELETE(t), assuming n can be negative.
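
A small worked case, using the table definition quoted above with made-up data, shows how a DELETE alone can push a group over the limit:

```sql
CREATE TABLE t (
  b BOOLEAN NOT NULL,
  n INTEGER NOT NULL,
  PRIMARY KEY (b, n)
);

INSERT INTO t VALUES (TRUE, 15), (TRUE, -6);  -- SUM(n) for b = TRUE is 9
DELETE FROM t WHERE b AND n = -6;             -- SUM(n) for b = TRUE is now 15

-- The assertion's inner query now finds a violating group.
SELECT b, SUM(n) AS sum_n
  FROM t
 GROUP BY b
HAVING SUM(n) > 10;
```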

Is there some interesting and fairly easily documented subset of
ASSERTIONs that wouldn't have the "can't prove" property?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
On 26 Sep 2018, at 12:36, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 25/09/2018 01:04, Joe Wildish wrote:
>> Having said all that: there are obviously going to be some expressions
>> that cannot be proven to have no potential for invalidating the assertion
>> truth. I guess this is the prime concern from a concurrency PoV?
>
> Before we spend more time on this, I think we need to have at least a
> plan for that.

Having thought about this some more: the answer could lie in using predicate
locks, and enforcing that the transaction be SERIALIZABLE whenever an ASSERTION
is triggered.

To make use of the predicate locks we'd do a transformation on the ASSERTION
expression. I believe that there is a derivation, similar to the one mentioned
up-thread re: "managers and administrators", that would essentially push
predicates into the expression on the basis of the changed data. The semantics
of the expression would remain unchanged, but it would mean that when the
expression is rechecked, the minimal set of data is read and would therefore not
conflict with other DML statements that had triggered the same ASSERTION but had
modified unrelated data. Example:

CREATE TABLE t
 (n INTEGER NOT NULL,
  m INTEGER NOT NULL,
  k INTEGER NOT NULL,
 PRIMARY KEY (n, m));

CREATE ASSERTION sum_k_at_most_10 CHECK
  (NOT EXISTS
    (SELECT * FROM
      (SELECT n, sum(k)
         FROM t
        GROUP BY n)
         AS r(n, ks)
      WHERE ks > 10));

On an INSERT/DELETE/UPDATE of "t", we would transform the inner-most expression
of the ASSERTION to have a predicate of "WHERE n = NEW.n". In my experiments I
can see that doing so allows concurrent transactions to COMMIT that have
modified unrelated segments of "t" (assuming the planner uses Index Scan). The
efficacy of this would be dictated by the granularity of the SIREAD locks; my
understanding is that this can be as low as tuple-level in the case where Index
Scans are used (and this is borne out in my experiments - ie. you don't want a
SeqScan).
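
A rough sketch of what the narrowed recheck could look like, written as a prepared statement so it can be tried by hand. The statement name recheck_sum_k and the sample rows are hypothetical, and the real transformation would happen inside the server rather than in user SQL. Note that a DELETE has only an old row, so the parameter would presumably come from OLD.n there, and an UPDATE that changes n would need both groups checked.

```sql
CREATE TABLE t
 (n INTEGER NOT NULL,
  m INTEGER NOT NULL,
  k INTEGER NOT NULL,
 PRIMARY KEY (n, m));

-- Recheck restricted to the single group touched by the DML statement.
PREPARE recheck_sum_k(integer) AS
  SELECT n, sum(k) AS ks
    FROM t
   WHERE n = $1
   GROUP BY n
  HAVING sum(k) > 10;

-- Sample data for illustration only; the n = 2 group breaks the limit.
INSERT INTO t VALUES (1, 1, 4), (1, 2, 5), (2, 1, 11);

EXECUTE recheck_sum_k(1);  -- group n = 1 sums to 9: no rows
EXECUTE recheck_sum_k(2);  -- group n = 2 sums to 11: one row
```

Because the predicate matches the leading column of the primary key, the planner has the option of an index scan on the key, which is what keeps the rows read (and hence the SIREAD locks taken) confined to one value of n.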

> Perhaps we should disallow cases that we can't
> handle otherwise.  But even that would need some analysis of which
> practical cases we can and cannot handle, how we could extend support in
> the future, etc.


The optimisation I mentioned up-thread, plus the one hypothesised here, both
rely on being able to derive the key of an expression from the underlying base
tables/other expressions. We could perhaps disallow ASSERTIONS that don't have
such properties?

Beyond that I think it starts to get difficult (impossible?) to know which
expressions are likely to be costly on the basis of static analysis. It could be
legitimate to have an ASSERTION defined over what turns out to be a small subset
of a very large table, for example.

-Joe





Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
Hi Andrew,

On 25 Sep 2018, at 01:51, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> I haven't looked at the background of this, but if what you want to know
> is whether the aggregate function has the semantics of min() or max()
> (and if so, which) then the place to look is pg_aggregate.aggsortop.

Thanks for the pointer. I've had a quick look at pg_aggregate, and back
at my code, but I think there is more to it than just the sorting property.
Specifically we need to know about the aggregate function when combined with
connectors <, <=, < ANY, <= ANY, < ALL and <= ALL (and their equivalents
with ">" and ">="). Also, it looks like COUNT and SUM don't have a sortop
(the other aggregates I've catered for do though).

When I come to do the rework of the patch I'll take a more in-depth look
though, and see if this can be utilised.

> As for operators, you can only make assumptions about their meaning if
> the operator is a member of some opfamily that assigns it some
> semantics. 

I had clocked the BT semantics stuff when doing the PoC patch. I have used
the "get_op_btree_interpretation" function for determining operator meaning.

-Joe



Re: Implementing SQL ASSERTION

From
Joe Wildish
Date:
Hi David,

> On 26 Sep 2018, at 19:47, David Fetter <david@fetter.org> wrote:
> 
>> Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)".
> 
> So would DELETE(t), assuming n can be negative.

Oops, right you are. Bug in my implementation :-) 

> Is there some interesting and fairly easily documented subset of
> ASSERTIONs that wouldn't have the "can't prove" property?

We can certainly know at the time the ASSERTION is created if we
can use the transition table optimisation, as that relies upon
the expression being written in such a way that a key can be
derived for each expression.

We could warn or disallow the creation on that basis. Ceri & Widom
mention this actually in their papers, and their view is that most
real-world use cases do indeed allow themselves to be optimised
using the transition tables.

-Joe




Re: Implementing SQL ASSERTION

From
Andrew Gierth
Date:
>>>>> "Joe" == Joe Wildish <joe-postgresql.org@elusive.cx> writes:

 >> I haven't looked at the background of this, but if what you want to
 >> know is whether the aggregate function has the semantics of min() or
 >> max() (and if so, which) then the place to look is
 >> pg_aggregate.aggsortop.

 Joe> Thanks for the pointer. I've had a quick look at pg_aggregate, and
 Joe> back at my code, but I think there is more to it than just the
 Joe> sorting property. Specifically we need to know about the aggregate
 Joe> function when combined with connectors <, <=, < ANY, <= ANY, < ALL
 Joe> and <= ALL (and their equivalents with ">" and ">=").

The presence of an aggsortop means "this aggregate function is
interchangeable with (select x from ... order by x using OP limit 1)",
with all of the semantic consequences that implies. Since OP must be the
"<" or ">" member of a btree index opclass, the semantics of its
relationships with other members of the same opfamily can be deduced
from that.
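
To make the equivalence concrete, here is a small example on a throwaway table (the table name readings and its contents are invented for illustration). The explicit IS NOT NULL filter is needed in the hand-written form because max() ignores NULLs whereas a sort does not.

```sql
CREATE TABLE readings (x INTEGER);
INSERT INTO readings VALUES (3), (7), (NULL), (5);

-- Aggregate form.
SELECT max(x) FROM readings;            -- 7

-- Equivalent "order by x using OP limit 1" form, with OP being ">".
SELECT x
  FROM readings
 WHERE x IS NOT NULL
 ORDER BY x USING >
 LIMIT 1;                               -- 7
```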

 Joe> Also, it looks like COUNT and SUM don't have a sortop

Right, because those currently have no semantics that PG needs to know
about or describe.

-- 
Andrew (irc:RhodiumToad)


Re: Implementing SQL ASSERTION

From
Dmitry Dolgov
Date:
> On Tue, Sep 25, 2018 at 1:04 AM Joe Wildish <joe-postgresql.org@elusive.cx>
> wrote:
>
> All agreed.  I’ll give the patch some TLC and get a new version that
> addresses the above.

Hi,

Just a reminder that the patch still needs to be rebased; could you please do
this? I'm moving the item to the next CF.


Re: Implementing SQL ASSERTION

From
Andres Freund
Date:
Hi,

On 2018-11-29 16:54:14 +0100, Dmitry Dolgov wrote:
> > On Tue, Sep 25, 2018 at 1:04 AM Joe Wildish <joe-postgresql.org@elusive.cx>
> > wrote:
> >
> > All agreed.  I’ll give the patch some TLC and get a new version that
> > addresses the above.
> 
> Hi,
> 
> Just a reminder that the patch still needs to be rebased; could you please do
> this? I'm moving the item to the next CF.

As nothing has happened, I'm marking this patch as returned with feedback.

Greetings,

Andres Freund