Re: Implementing SQL ASSERTION - Mailing list pgsql-hackers

From Joe Wildish
Subject Re: Implementing SQL ASSERTION
Date
Msg-id 985632EC-3E39-4C51-B47A-ED0ABF63D64F@elusive.cx
Whole thread Raw
In response to Re: Implementing SQL ASSERTION  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: Implementing SQL ASSERTION
List pgsql-hackers
Hackers,

Attached is a WIP patch for SQL assertion. I am posting it for anyone who might be interested in seeing it, for
comments/feedback,and to see if others are keen to collaborate on taking it further. It is not near production-ready
(seethoughts on that below). 

The patch builds on the work posted by Peter back in 2013. I've taken his code and updated it to conform to some
generalchanges made to the codebase since then. The bulk of the new work I have done is around when an assertion needs
tobe checked. Essentially it is an implementation of the algorithm described by Ceri & Widom in "Deriving Production
Rulesfor Constraint Maintenance” — http://infolab.stanford.edu/pub/papers/constraint-maintenance.ps 

The general idea is to traverse the expression tree and derive the set of potentially invalidating operations. These
operationsare used to determine when the constraint trigger fires and causes a re-check. The detail is in the paper but
someexamples are: 

* insertion into the subject of an exists cannot be invalidating;
* deletion from the subject of a not exists cannot be invalidating;
* update of columns in the target list of an exists cannot be invalidating;
* certain combinations of aggregates with comparison operations cannot be invalidating.

As an example of the last point, the expression "CHECK (10 > (SELECT COUNT(*) FROM t))" cannot be invalidated by a
deleteor an update but can be invalidated by an insert. 

I have implemented most of the optimisations mentioned in the paper. There are one or two that I am unsure about,
specificallyhow to deal with set-operations that are the subject of an exists. According to the paper, these are
optimisablewhen they're the subject of an exists, but I think it is only applicable for union and not intersect or
except,so I have skipped that particular optimisation for the time being. 

The algorithm works under the assumption that when a recheck occurs the previous check result was true (the research
reportby Ceri & Widom does acknolwedge this assumption). However, unfortunately the SQL specification requires that
bothtrue and unknown be valid results for an assertion's check expression. This doesn't play too well with the
algorithmso for the time being I have disallowed null. I think the solution here may be that when a null result for a
checkoccurs, the assertion is changed to trigger on all operations against the involved tables; once it returns to
true,the triggers can be returned to fire only on the derived invalidating operations. More thought required though.
(note:having just written this paragraph, I've realised I can't right now think of a concrete example illustrating the
point,so it may be that I'm wrong on this). 

The paper does mention a set of optimisations that I have not yet attempted to implement. These are essentially the
techniqueof evaluating the expression against the deltas of a change rather than the full tables. Clearly there is a
largeoverlap with incremental maintainence of views and actually the two authors of the paper have a similiarly named
papercalled "Deriving Production Rules for Incremental View Maintanence". Although I have yet to finish reviewing all
theliterature on the subject, I suspect that realistically for this to make it into production, we'd need some
implementationof these techniques to make the performance palatable. 

Cheers,
-Joe


Attachment

pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: Logical decoding fast-forward and slot advance
Next
From: Edmund Horner
Date:
Subject: Re: PATCH: psql tab completion for SELECT