Thread: WIP: generalized index constraints

WIP: generalized index constraints

From
Jeff Davis
Date:
This is a follow up to my old proposal here:

http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php

Top pointed out a few problems here:

http://archives.postgresql.org/pgsql-hackers/2008-06/msg00427.php

Here are my updated answers:

1. Not a problem with the new design, which checks the constraints from
ExecInsertIndexTuples().

2. Not a problem for similar reasons.

3. I don't have an answer here yet, but I have a few thoughts. I see it
as a separate proposal. My hand-waving answer is that it should be just
as possible as before to append index constraint failures to a big list,
and loop through it as long as we're making progress. If I need a more
solid proposal for this problem before my generalized constraints
proposal is considered, let me know.

To try out my patch:

(1) Apply patch to 8.5-devel and Init DB

(2) Install contrib/btree_gist (only necessary for this example, patch
works with Btree and GIN, too).

(3)
  => create table test(i int, c circle);
  => create index test_idx on test using gist(i, c);
  => UPDATE pg_index SET indconstrats = '3 3'
     WHERE indexrelid='test_idx'::regclass;

  In the above query, 3 is the equality strategy number for the GiST
opclass for integers, and 3 is also the "overlaps" strategy number for
the GiST opclass for circles, so we put a 3 for each attribute. What
this will mean is that it will reject any new tuple when there is
already another tuple in the table with an equal value of i AND an
overlapping value of c. Concurrency should behave identically to UNIQUE
on a btree.

(4) Now, try some inserts (concurrent or otherwise) and see what
happens.

Ultimately, I think the language for this might shape up something like:

CREATE INDEX test_idx ON test USING gist
  (i CONSTRAINT =, c CONSTRAINT &&);

which would avoid the need for updating the catalog, of course.

Limitations:

 * Still not deferrable, even 'til the end of the command.
 * Your constraint must be symmetric (if tuple A conflicts with tuple B,
tuple B must conflict with tuple A).
 * The types have to match between the left and right arguments in the
operator class and the type of the column in the table. This is normally
true, but the GIN Array opclass works on type "anyarray", but the table
has a normal type, which causes a problem. Maybe it's possible to be
smarter about this, but the workaround is to just create more opclasses
(I believe).

Any input is appreciated (design problems, implementation, language
ideas, or anything else). I'd like to get it into shape for the July 15
commitfest if no major problems are found.

Regards,
    Jeff Davis

Attachment

Re: WIP: generalized index constraints

From
Simon Riggs
Date:
On Sun, 2009-07-05 at 17:28 -0700, Jeff Davis wrote:
> This is a follow up to my old proposal here:
> 
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php
> 

> Any input is appreciated (design problems, implementation, language
> ideas, or anything else). I'd like to get it into shape for the July
> 15 commitfest if no major problems are found.

I was concerned that your definition of concurrently inserted didn't
seem to match the size of the shared memory array required.

How will you cope with a large COPY? Surely there can be more than one
concurrent insert from any backend?


It would be useful to see a real example of what this can be used for.


I think it will be useful to separate the concepts of a constraint from
the concept of an index. It seems possible to have a UNIQUE constraint
that doesn't help at all in locating rows, just in proving that the rows
are unique.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: WIP: generalized index constraints

From
Greg Stark
Date:
On Mon, Jul 6, 2009 at 11:56 AM, Simon Riggs<simon@2ndquadrant.com> wrote:
> How will you cope with a large COPY? Surely there can be more than one
> concurrent insert from any backend?

He only needs to handle inserts for the period they're actively being
inserted into the index. Once they're in the index he'll find them
using the index scan. In other words this is all a proxy for the way
btree locks index pages while it looks for a unique key violation.

I'm a bit concerned about the use of tid. You might have to look at a
lot of heap pages to check for conflicts. I suppose they're almost
certainly all in shared memory though. Also, it sounds like you're
anticipating the possibility of dead entries in the array but if you
do then you need to store the xmin also to protect against a tuple
that's been vacuumed and had its line pointer reused since. But I
don't see the necessity for that anyways since you can just clean up
the entry on abort.


-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: WIP: generalized index constraints

From
David Fetter
Date:
On Mon, Jul 06, 2009 at 11:56:41AM +0100, Simon Riggs wrote:
> On Sun, 2009-07-05 at 17:28 -0700, Jeff Davis wrote:
> > This is a follow up to my old proposal here:
> > 
> > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php
> > 
> 
> > Any input is appreciated (design problems, implementation,
> > language ideas, or anything else). I'd like to get it into shape
> > for the July 15 commitfest if no major problems are found.
> 
> I was concerned that your definition of concurrently inserted didn't
> seem to match the size of the shared memory array required.
> 
> How will you cope with a large COPY? Surely there can be more than
> one concurrent insert from any backend?
> 
> It would be useful to see a real example of what this can be used
> for.

Constraints like "these intervals can't overlap" would be one.  It's
handy in calendaring applications, for example.

> I think it will be useful to separate the concepts of a constraint
> from the concept of an index.  It seems possible to have a UNIQUE
> constraint that doesn't help at all in locating rows, just in
> proving that the rows are unique.

Interesting idea.  Are you thinking of this in terms of things the
planner can do once it knows a set is all distinct values, or...?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Mon, 2009-07-06 at 12:28 +0100, Greg Stark wrote:
> He only needs to handle inserts for the period they're actively being
> inserted into the index. Once they're in the index he'll find them
> using the index scan. In other words this is all a proxy for the way
> btree locks index pages while it looks for a unique key violation.

Exactly, that was my design:

/** We have to find all tuples, even those not visible* yet. Other transactions may have inserted many tuples (or* the
transactionmight be a prepared transaction), so there* may be some tuples that are not in the shared memory* structure
andnot visible.*/
 

> I'm a bit concerned about the use of tid. You might have to look at a
> lot of heap pages to check for conflicts. I suppose they're almost
> certainly all in shared memory though.

That was my hope.

The 8.4 bulk insert code might defeat that to some degree, however.
Maybe that could be disabled when inserting into an index with
constraints? I didn't think about it before, but the bulk insert buffer
ring would affect unique btrees, too, right?

> Also, it sounds like you're
> anticipating the possibility of dead entries in the array but if you
> do then you need to store the xmin also to protect against a tuple
> that's been vacuumed and had its line pointer reused since. But I
> don't see the necessity for that anyways since you can just clean up
> the entry on abort.

Can you tell me a little more specifically the problem you're worried
about? If the tuple has been VACUUMed and removed, then the TID search
will either find a tuple, and do a spurious constraint check against it;
or not find a tuple, and just move on.

I could have the commit and abort paths clear the entry, which might
optimize away some of the TransactionIdIsInProgress() calls for
transactions that ended normally. But that didn't strike me as a big
cost compared to the index scan.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Mon, 2009-07-06 at 11:56 +0100, Simon Riggs wrote:
> I think it will be useful to separate the concepts of a constraint from
> the concept of an index. It seems possible to have a UNIQUE constraint
> that doesn't help at all in locating rows, just in proving that the rows
> are unique.

That would be interesting. Do you have a use case? Checking the
constraint would surely be slower in a lot of cases.

I could imagine different constraint-checking schemes that could be fast
against a heap. For instance, if it's greater than the max or less than
the min value, that would be cheap to check. That might be an
interesting way to handle the constraint for a sequence-generated
column, or timestamp column that is always ascending.

However, the problem is I don't see a lot of room for a practical use
case. In the above situations, you'd almost certainly want indexes
anyway: what's the point of a sequence number unless you're going to do
lookups? And if you have an ascending timestamp column, I would think
that you might do range lookups occasionally (which will be even better
because the heap will be clustered).

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Mon, 2009-07-06 at 07:30 -0700, David Fetter wrote:
> > It would be useful to see a real example of what this can be used
> > for.
> 
> Constraints like "these intervals can't overlap" would be one.  It's
> handy in calendaring applications, for example.

Exactly, you already know my use case ;) My goal is a "temporal key",
where you can't have overlapping intervals of time, e.g. the constraint
"nobody can be two places at the same time".

> > I think it will be useful to separate the concepts of a constraint
> > from the concept of an index.  It seems possible to have a UNIQUE
> > constraint that doesn't help at all in locating rows, just in
> > proving that the rows are unique.
> 
> Interesting idea.  Are you thinking of this in terms of things the
> planner can do once it knows a set is all distinct values, or...?

I think that's an orthogonal idea.

It's a good idea though, I would like the planner to be smarter about
those kinds of things. A simple example is that if a table has a
non-partial unique constraint anywhere, then "select * from foo union
select * from foo" can be transformed into "select * from
foo" (eliminating the expensive union).

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Greg Stark
Date:
On Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis<pgsql@j-davis.com> wrote:
>
> Exactly, you already know my use case ;) My goal is a "temporal key",
> where you can't have overlapping intervals of time, e.g. the constraint
> "nobody can be two places at the same time".

Incidentally to handle non-overlapping ranges you don't need GIST, you
can actually use a plain btree. Since there are no overlapping ranges
the ranges have a complete ordering and you can get that by just
sorting by either endpoint. To enforce the constraint you only have to
compare with the previous and following element in the btree.

-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Mon, 2009-07-06 at 17:02 +0100, Greg Stark wrote:
> On Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis<pgsql@j-davis.com> wrote:
> >
> > Exactly, you already know my use case ;) My goal is a "temporal key",
> > where you can't have overlapping intervals of time, e.g. the constraint
> > "nobody can be two places at the same time".
> 
> Incidentally to handle non-overlapping ranges you don't need GIST, you
> can actually use a plain btree. Since there are no overlapping ranges
> the ranges have a complete ordering and you can get that by just
> sorting by either endpoint. To enforce the constraint you only have to
> compare with the previous and following element in the btree.

What if you have an entire index full of overlapping dead tuples, and a
few live ones? How would search work?

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Simon Riggs
Date:
On Mon, 2009-07-06 at 08:50 -0700, Jeff Davis wrote:
> On Mon, 2009-07-06 at 11:56 +0100, Simon Riggs wrote:
> > I think it will be useful to separate the concepts of a constraint from
> > the concept of an index. It seems possible to have a UNIQUE constraint
> > that doesn't help at all in locating rows, just in proving that the rows
> > are unique.
> 
> That would be interesting. Do you have a use case? Checking the
> constraint would surely be slower in a lot of cases.
> 
> I could imagine different constraint-checking schemes that could be fast
> against a heap. For instance, if it's greater than the max or less than
> the min value, that would be cheap to check. That might be an
> interesting way to handle the constraint for a sequence-generated
> column, or timestamp column that is always ascending.

Yes.

> However, the problem is I don't see a lot of room for a practical use
> case. In the above situations, you'd almost certainly want indexes
> anyway: what's the point of a sequence number unless you're going to do
> lookups? And if you have an ascending timestamp column, I would think
> that you might do range lookups occasionally (which will be even better
> because the heap will be clustered).

In many cases, people add unique indexes solely to allow replication to
work correctly. The index itself may never be used, especially in high
volume applications.

How do you handle uniqueness within a stream? Presumably it is possible
and useful to have a stream of data that can be guaranteed unique, yet a
stream would never be uniquely targeted for lookups because of the
volume of data involved.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: WIP: generalized index constraints

From
Greg Stark
Date:
On Mon, Jul 6, 2009 at 6:20 PM, Jeff Davis<pgsql@j-davis.com> wrote:
> On Mon, 2009-07-06 at 17:02 +0100, Greg Stark wrote:
>> On Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis<pgsql@j-davis.com> wrote:
>> >
>> > Exactly, you already know my use case ;) My goal is a "temporal key",
>> > where you can't have overlapping intervals of time, e.g. the constraint
>> > "nobody can be two places at the same time".
>>
>> Incidentally to handle non-overlapping ranges you don't need GIST, you
>> can actually use a plain btree. Since there are no overlapping ranges
>> the ranges have a complete ordering and you can get that by just
>> sorting by either endpoint. To enforce the constraint you only have to
>> compare with the previous and following element in the btree.
>
> What if you have an entire index full of overlapping dead tuples, and a
> few live ones? How would search work?

I should clarify I didn't mean you could implement it in SQL using
Postgres btrees. I just meant that a tree data structure was
sufficient, you don't need the power of GIST. It's probably easier to
implement it in GIST in Postgres since it's there though.

So it would work just like regular btrees, you only consider it a
conflict if there's a live value that conflicts.



-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote:
> In many cases, people add unique indexes solely to allow replication to
> work correctly. The index itself may never be used, especially in high
> volume applications.

Interesting. Maybe we should at least try to leave room for this feature
to be added later. I agree that, from a theoretical perspective,
requiring a UNIQUE constraint to use an index is wrong. For one thing,
you can't ensure the uniqueness without defining some total order
(although you can define an arbitrary total order for cases with no
meaningful total order).

> How do you handle uniqueness within a stream? Presumably it is possible
> and useful to have a stream of data that can be guaranteed unique, yet a
> stream would never be uniquely targeted for lookups because of the
> volume of data involved.

[ Simon is asking me because I work for Truviso, but my response is not
officially from Truviso ]

There are a few cases worth mentioning here. First, if you have a stream
that's backed by a table, you can use a table constraint. Second, you
might choose to have an "in-order" constraint (not necessary, the system
can fix out-of-order data), which could be a unique constraint that's
very cheap to test.

Additionally, this is not strictly a constraint, but if you have
downstream operators, like COUNT(DISTINCT...), that can be seen as being
similar to a constraint. These will often be over a limited span of
time, say, a minute or an hour, and we can keep the necessary state. If
there are a huge number of distinct values there, then it's a challenge
to avoid keeping a lot of state.

There are a few other specialized methods that we can use for specific
use-cases.

Regards,Jeff Davis






Re: WIP: generalized index constraints

From
Teodor Sigaev
Date:
> CREATE INDEX test_idx ON test USING gist
>   (i CONSTRAINT =, c CONSTRAINT &&);
> 
> which would avoid the need for updating the catalog, of course.
Hmm, looks like "index"-fied table's constrains

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote:
>> In many cases, people add unique indexes solely to allow replication to
>> work correctly. The index itself may never be used, especially in high
>> volume applications.

> Interesting. Maybe we should at least try to leave room for this feature
> to be added later. I agree that, from a theoretical perspective,
> requiring a UNIQUE constraint to use an index is wrong. For one thing,
> you can't ensure the uniqueness without defining some total order
> (although you can define an arbitrary total order for cases with no
> meaningful total order).

This seems a bit pointless.  There is certainly not any use case for a
constraint without an enforcement mechanism (or at least none the PG
community is likely to consider legitimate ;-)).  And it's not very
realistic to suppose that you'd check a constraint by doing a seqscan
every time.  Therefore there has to be an index underlying the
constraint somehow.  Jeff's complaint about total order is not an
argument against having an index, it's just pointing out that btree is
not the only possible type of index.  It's perfectly legitimate to
imagine using a hash index to enforce uniqueness, for example.  If hash
indexes had better performance we'd probably already have been looking
for a way to do that, and wanting some outside-the-AM mechanism for it
so we didn't have to duplicate code from btree.

Also, if hash indexes were a realistic alternative to btree for this,
we'd already have come up against the problem that the CONSTRAINT syntax
doesn't provide any way to specify what kind of index you want to use
underneath the constraint.  I think it might be interesting to turn
around Jeff's syntax sketch and provide a way to say that a CONSTRAINT
declaration should depend on some previously added index, eg
something like
ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index

Not sure how that squares exactly with the question of variant
definitions of uniqueness semantics (as opposed to purely implementation
decisions like hash vs btree).  But it's a different way to come at it.
        regards, tom lane


Re: WIP: generalized index constraints

From
Simon Riggs
Date:
On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
> ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index

This would be very useful, though perhaps only because we do not have
REINDEX CONCURRENTLY.

It is likely to be useful in the future to allow an index with N
columns, yet which can provide uniqueness with < N of those columns.
This capability is known as covered indexes and will be important if
Heikki writes his index-only scan code.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: WIP: generalized index constraints

From
Simon Riggs
Date:
On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > On Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote:
> >> In many cases, people add unique indexes solely to allow replication to
> >> work correctly. The index itself may never be used, especially in high
> >> volume applications.
> 
> > Interesting. Maybe we should at least try to leave room for this feature
> > to be added later. I agree that, from a theoretical perspective,
> > requiring a UNIQUE constraint to use an index is wrong. For one thing,
> > you can't ensure the uniqueness without defining some total order
> > (although you can define an arbitrary total order for cases with no
> > meaningful total order).
> 
> This seems a bit pointless.  There is certainly not any use case for a
> constraint without an enforcement mechanism (or at least none the PG
> community is likely to consider legitimate ;-)).  And it's not very
> realistic to suppose that you'd check a constraint by doing a seqscan
> every time.  Therefore there has to be an index underlying the
> constraint somehow.  

I think the idea has been misconstrued.

Obviously a constraint requires an enforcement mechanism. That doesn't
imply that the enforcement mechanism must be fully usable as an index.

The example being discussed was enforcing uniqueness on monotonically
increasing columns. If we knew that a column value was GENERATED ALWAYS
using a sequence, then we could simply skip the uniqueness check
altogether. No index, yet an enforced unique constraint.

Yes, we would need to understand the relationship between the sequence
and the table and throw an error in certain sequence update cases (and
we may need to check those with a seq scan). But that seems a small
price to pay for the avoidance of a potentially very large index that
may have no purpose.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: WIP: generalized index constraints

From
Greg Stark
Date:
On Tue, Jul 7, 2009 at 6:22 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>
> This seems a bit pointless.  There is certainly not any use case for a
> constraint without an enforcement mechanism (or at least none the PG
> community is likely to consider legitimate ;-)).  And it's not very
> realistic to suppose that you'd check a constraint by doing a seqscan
> every time.  Therefore there has to be an index underlying the
> constraint somehow.

I'm not entirely convinced that running a full scan to enforce
constraints is necessarily such a crazy idea. It may well be the most
efficient approach after a major bulk load. And consider a read-only
database where the only purpose of the constraint is to inform the
optimizer that it can trust the property to hold.

That said this seems like an orthogonal issue to me.

> Jeff's complaint about total order is not an
> argument against having an index, it's just pointing out that btree is
> not the only possible type of index.  It's perfectly legitimate to
> imagine using a hash index to enforce uniqueness, for example.  If hash
> indexes had better performance we'd probably already have been looking
> for a way to do that, and wanting some outside-the-AM mechanism for it
> so we didn't have to duplicate code from btree.

I'm a bit at a loss why we need this extra data structure though. The
whole duplicated code issue seems to me to be one largely of code
structure. If we hoisted the heap-value rechecking code out of the
btree AM then the hash AM could reuse it just fine.

Both the hash and btree AMs would have to implement some kind of
"insert-unique-key" operation which would hold some kind of lock
preventing duplicate unique keys from being inserted but both btree
and hash could implement that efficiently by locking one page or one
hash value.

GIST would need something like this "store the key value or tid in
shared memory" mechanism. But that could be implemented as an external
facility which GIST then made use of -- just the way every part of the
system makes use of other parts. It doesn't mean we have to make
"prevent concurrent unique inserts" not the responsibility of the AM
which knows best how to handle that.

--
greg
http://mit.edu/~gsstark/resume.pdf


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-07-07 at 18:36 +0100, Simon Riggs wrote:
> On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
> It is likely to be useful in the future to allow an index with N
> columns, yet which can provide uniqueness with < N of those columns.
> This capability is known as covered indexes and will be important if
> Heikki writes his index-only scan code.

My patch offers this capability, and the language I suggested would
support it.

In the current version of the patch, just use InvalidStrategy (0)
instead of, say, BTEqualStrategyNumber (3) for the attributes that you
don't want to be a part of the constraint. Some of the proper error
checking is not done yet, but it will work.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2009-07-07 at 18:36 +0100, Simon Riggs wrote:
>> On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
>> It is likely to be useful in the future to allow an index with N
>> columns, yet which can provide uniqueness with < N of those columns.
>> This capability is known as covered indexes and will be important if
>> Heikki writes his index-only scan code.

> My patch offers this capability, and the language I suggested would
> support it.

> In the current version of the patch, just use InvalidStrategy (0)
> instead of, say, BTEqualStrategyNumber (3) for the attributes that you
> don't want to be a part of the constraint. Some of the proper error
> checking is not done yet, but it will work.

I don't think this even approximates the need --- in particular it's not
clear what the semantics of combination across different index columns
are.  I assume you've hot-wired it so that several BTEqualStrategyNumber
columns will work like a normal multicolumn uniqueness constraint (IOW
it's okay as long as at least one column is NULL or isn't equal).  But
I'm not at all sure that's what I'd want for some other operator type.

Also, what happens if you want to use the same index to support more
than one logical constraint?  This is impossible if you put the
information into pg_index, I think.
        regards, tom lane


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-07-07 at 14:57 -0400, Tom Lane wrote:
> I don't think this even approximates the need --- in particular it's not
> clear what the semantics of combination across different index columns
> are.  I assume you've hot-wired it so that several BTEqualStrategyNumber
> columns will work like a normal multicolumn uniqueness constraint (IOW
> it's okay as long as at least one column is NULL or isn't equal).  But
> I'm not at all sure that's what I'd want for some other operator type.

If any input attributes are NULL, the constraint automatically succeeds
(I think that follows the SQL philosophy). Perhaps we should be able to
override this behavior somehow?

Now, for each attribute where a constraint is defined, it becomes a new
scan key used in the index lookup to enforce the constraint. So, the
more attributes in the constraint, the more permissive the constraint
(just like UNIQUE). 

I can't think of another good interpretation of the constraint. If a
constraint on (x,y) meant "x is unique, and y is unique", it would be
hard to scan the index for y's uniqueness. If you want to say that both
are independently unique, I believe that requires two indexes, and so it
would probably be best to just require the constraints to be specified
separately. Thoughts?

> Also, what happens if you want to use the same index to support more
> than one logical constraint?  This is impossible if you put the
> information into pg_index, I think.
> 

I like the idea to store the information outside of pg_index. It means
that I don't have to build the index and check the constraint at the
same time. It would be more like adding a CHECK constraint.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
> Also, if hash indexes were a realistic alternative to btree for this,
> we'd already have come up against the problem that the CONSTRAINT syntax
> doesn't provide any way to specify what kind of index you want to use
> underneath the constraint.  I think it might be interesting to turn
> around Jeff's syntax sketch and provide a way to say that a CONSTRAINT
> declaration should depend on some previously added index, eg
> something like
> 
>     ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index
> 

How about:
ALTER TABLE tab ADD INDEX CONSTRAINT [name]    (col1 [op], col2 [op]) USING index

And then just validate the constraint at creation time, and store the
information in pg_constraint.

Regards,Jeff Davis




Re: WIP: generalized index constraints

From
Dean Rasheed
Date:
Tom Lane wrote:
> ...  I think it might be interesting to turn
> around Jeff's syntax sketch and provide a way to say that a CONSTRAINT
> declaration should depend on some previously added index, eg
> something like
> 
>     ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index
> 

Is there any reason to limit UNIQUE constraints to lists of table
columns? If you can build a unique index on an expression, why not a
unique constraint?

A quick test defining an index and manually adding catalog entries for
the constraint and depends showed that it appears to work fine (and it's
compatible with my deferrable unique constraints patch :-) )
- Dean



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
Right now this patch does not support GIN because GIN doesn't support
amgettuple.

It could be made to support GIN by doing a bitmap index scan, manually
fetching the next tuple (or, if it's lossy, the next one on the page),
checking it against the snapshot, and then rechecking it to make sure it
still matches.

The API I'm looking for is essentially the same as index_getnext(),
which makes the most sense for finding constraint violations.

Is it possible to re-add amgettuple to GIN, and just set the cost high
so it's not chosen by the planner? Or is there some reason this is
fundamentally a bad idea (or won't work at all)?

I know we removed it in 8.4:
http://archives.postgresql.org/pgsql-hackers/2009-02/msg01123.php
http://archives.postgresql.org/pgsql-hackers/2009-02/msg00532.php

but the reasoning seemed mostly because:
1. GIN could not support amgettuple efficiently (lots of rechecking)
2. Normal index scans did not fit a common use case for GIN, anyway

However, if my feature needs to perform this check anyway (to support
GIN, that is), it seems like it could be re-added. There was also some
resistance to removing it in the first place (e.g. for anti-joins), so
perhaps it can be made to be efficient again during the 8.5 cycle.

Regards,Jeff Davis





Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Is it possible to re-add amgettuple to GIN, and just set the cost high
> so it's not chosen by the planner? Or is there some reason this is
> fundamentally a bad idea (or won't work at all)?

We wouldn't have deleted it if it were practical to make it work.
        regards, tom lane


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sat, 2009-07-11 at 19:06 -0400, Tom Lane wrote:
> > Is it possible to re-add amgettuple to GIN, and just set the cost high

> We wouldn't have deleted it if it were practical to make it work.

Can you elaborate a little?

Following the thread, I see:

http://archives.postgresql.org/pgsql-hackers/2009-02/msg00532.php

"What would be wrong with letting it degrade to lossy?  I suppose the
reason it's trying to avoid that is to avoid having to recheck all the
rows on that page when it comes time to do the index insertion; but
surely having to do that is better than having arbitrary, unpredictable
failure conditions."

And the next message refers to:

http://archives.postgresql.org/message-id/4974B002.3040202@sigaev.ru

"amgettuple interface hasn't possibility to work with page-wide result
instead of exact ItemPointer. amgettuple can not return just a block
number as amgetbitmap can."

I see why it's impractical to make it efficient, but the way I see it we
can make gingettuple just a wrapper around gingetbitmap, which just
iterates through the bitmap. It would not have any performance benefit
over gingetbitmap, obviously. But if my index constraints patch is going
to support GIN (which seems like an interesting application), I would
need to implement a function which does this anyway (efficiently or
not).

Regards,Jeff Davis





Re: WIP: generalized index constraints

From
Jeff Davis
Date:
Updated patch attached.

Changes:
 * Added syntax support:
     CREATE INDEX foo_idx ON foo ... (a CONSTRAINT =, b CONSTRAINT &&);
 * More aggressively clear the shared memory entries to avoid
   unnecessary checks
 * Code cleanup

TODO:
 * When adding constraint to table with data already in it, verify that
   existing data satisfies constraint.
 * Clean up error messages a little
 * Docs

The following are possible TODO items, but I'd like to get some feedback
first:
 * It seems like an alternative language would be better:
     ALTER TABLE foo ADD INDEX CONSTRAINT optional_name (a =, b &&)
       USING foo_idx;
   This language would be more like a table constraint that happens to
   use an index. I think it's better because it allows multiple
   constraints to be enforced by the same index.
 * Right now it only supports index AMs that offer amgettuple, which
   excludes GIN. Consider adding a crude implementation of gingettuple
   that just calls gingetbitmap internally (obviously providing no
   performance advantage over gingetbitmap).

Regards,
    Jeff Davis

Attachment

Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/7/15 Jeff Davis <pgsql@j-davis.com>:
> Updated patch attached.
>
> Changes:
>  * Added syntax support:
>     CREATE INDEX foo_idx ON foo ... (a CONSTRAINT =, b CONSTRAINT &&);
>  * More aggressively clear the shared memory entries to avoid
>   unnecessary checks
>  * Code cleanup
>
> TODO:
>  * When adding constraint to table with data already in it, verify that
>   existing data satisfies constraint.
>  * Clean up error messages a little
>  * Docs

Hi Jeff,

I've been assigned to do an initial review of your patch.  Because
this patch is still WIP, there's not a lot for me to do.  I see from
the thread that there are still some design questions that remain
open, but I won't be addressing those.  The internals of indexes and
constraints is not an area of the code I have any particular insight
about.

The patch applies cleanly, builds successfully and passes regression
tests.  I read through the code changes, and didn't notice any code
convention violations.

I had a play around with the feature in psql.  I think the syntax is
okay, but using "ALTER TABLE ... ADD" as you mentioned upthread could
be a better option.

I noticed that there's no change to the output of \d in psql to show
the constraint, so when I do a \d on my test table, I can see that
there's a gist index there, but I can't tell that there is also a
constraint on it.  This seems like a pretty significant shortcoming.
Essentially once you've created one of these index constraints, it
vanishes into the catalogs and becomes invisible to the user.  This
might call for a modification of pg_get_indexdef()?

You've already noted this as a TODO, but clearly the error messages
need some serious help.  If I deliberately violate an index constraint
I get:

ERROR:  conflict detected 3

At minimum the message should use the term "constraint" and give the
name of the index that has detected the conflict.  I think something
like this would be okay:

ERROR:  new record violates constraint on index "circle_idx"

I also noticed that the patch does not include any changes or
additions to the regression test suite.  Perhaps it ought to?

That's all the feedback I have for the moment.  I hope you find my
comments constructive.

Cheers,
BJ


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Thu, 2009-07-16 at 15:22 +1000, Brendan Jurd wrote:
> I had a play around with the feature in psql.  I think the syntax is
> okay, but using "ALTER TABLE ... ADD" as you mentioned upthread could
> be a better option.

Ok, I think we're pretty much settled on that option then.

Another idea that I thought about is that:
  ALTER TABLE foo ADD UNIQUE (a, b) USING foo_idx;

could be a shorthand for:
  ALTER TABLE foo ADD INDEX CONSTRAINT (a =, b =) USING foo_idx;

The benefit is that it could go over GiST indexes or hash indexes, not
just btrees. The syntax could also be useful to turn an existing btree
into a unique btree.

> I noticed that there's no change to the output of \d in psql to show
> the constraint, so when I do a \d on my test table, I can see that
> there's a gist index there, but I can't tell that there is also a
> constraint on it.  This seems like a pretty significant shortcoming.
> Essentially once you've created one of these index constraints, it
> vanishes into the catalogs and becomes invisible to the user.  This
> might call for a modification of pg_get_indexdef()?

I agree, that's important. Psql support, regression tests, and docs are
all intertwined somewhat with the syntax, so I held off on that work
until I got a little feedback. I will get to work and see if I can put
together a more complete version in the next few days.

If you happen to have time, you can see if you can break my current
patch. I expect the basic algorithm to remain about the same for my next
version, so if you see any problems with that, please let me know. Also,
if you see any possible improvements that could make it useful for more
situations, that would be helpful, too.

But I think I have enough information to move forward, so if you want to
move on to a more complete patch, feel free.

Thanks for the review!

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/7/17 Jeff Davis <pgsql@j-davis.com>:
> Another idea that I thought about is that:
>
>   ALTER TABLE foo ADD UNIQUE (a, b) USING foo_idx;
>
> could be a shorthand for:
>
>   ALTER TABLE foo ADD INDEX CONSTRAINT (a =, b =) USING foo_idx;
>
> The benefit is that it could go over GiST indexes or hash indexes, not
> just btrees. The syntax could also be useful to turn an existing btree
> into a unique btree.

I like that idea ... although how would this interact (if at all) with
the existing pg_index.isunique flag?  Would it become deprecated in
favour of using indconstrats, or would you actually look at switching
isunique to TRUE if somebody applies a constraint which is made up
entirely of equality ops?

Cheers,
BJ


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Fri, 2009-07-17 at 09:51 +1000, Brendan Jurd wrote:
> I like that idea ... although how would this interact (if at all) with
> the existing pg_index.isunique flag?  Would it become deprecated in
> favour of using indconstrats, or would you actually look at switching
> isunique to TRUE if somebody applies a constraint which is made up
> entirely of equality ops?

If this ALTER TABLE ADD UNIQUE ... USING syntax is really shorthand for
my special index constraints, it would probably have to use the general
mechanism. Otherwise there would be no way to use the general mechanism
over a btree, which I think should be possible (if nothing else it would
be good to allow apples-to-apples performance testing of my patch).

But I guess it doesn't have to be directly shorthand, ALTER TABLE ADD
UNIQUE ... USING could choose to turn an existing index unique when
possible (e.g. btree), otherwise use the general mechanism.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
I'm going to try to get this patch ready for the 9-15 commitfest. Here
are a few design updates:

(1) Language:

I think that the new language should be a table constraint, and I think
there's a consensus on that. The specific language I have in mind is:
 CREATE TABLE (   ...,   INDEX CONSTRAINT (<attname> <op>, ...) USING INDEX <indexname> );
 ALTER TABLE ADD INDEX CONSTRAINT (<attname> <op>, ...)   USING INDEX <indexname>;

Where <op> is the constraint operator. For example, if all <op>s are
"=" (or whatever the operator for BTEqualStragey is for that type), that
would be semantically identical to a UNIQUE constraint.

(2) Enforce while creating constraint:

To enforce the constraint while adding it to a table with existing data,
I'd treat it somewhat similar to a CHECK constraint: exclusive lock on
the table, and check every tuple. 

Note that this is going to be O( n * log n ), assuming a search time of
O( log n ) on the index. A CHECK constraint is obviously just O(n).

(3) Catalog change:

I'll need to add a column pg_constraint.constrategies of type
int2vector. This will hold the strategy numbers of the constraint
operators. For instance, if there's a 2-column index constraint on a
BTree index, and both constraint operators are "=", it will be a vector
containing [BTEqualStrategy, BTEqualStrategy].

I will add a column pg_class.relindconstraints which will be similar to
relchecks (see below).

Also, introduce a new "contype 'i'" meaning that it's an index
constraint.

The patch relies on the existence of pg_constraint.conindid, which is
already committed for 8.5.

(4) Enforce on insert procedure:

This is mostly the same as the previous patch. However, I think I need
to avoid catalog lookups in the index insert path, which is a problem
Tom pointed out in the deferrable unique constraints discussion.

My plan is to make it a part of ResultRelInfo and initialize it in a way
similar to a CHECK constraint (in ExecRelCheck() if it's not already
initialized).

I would use relindconstraints to prevent going into that path during the
common case where there is no index constraint (in particular, when
bootstrapping).

Comments, suggestions, ideas?

Regards,Jeff Davis







Re: WIP: generalized index constraints

From
Heikki Linnakangas
Date:
Jeff Davis wrote:
> I'm going to try to get this patch ready for the 9-15 commitfest. Here
> are a few design updates:
> 
> (1) Language:
> 
> I think that the new language should be a table constraint, and I think
> there's a consensus on that. The specific language I have in mind is:
> 
>   CREATE TABLE (
>     ...,
>     INDEX CONSTRAINT (<attname> <op>, ...) USING INDEX <indexname>
>   );

That sounds like the constraint is based on an existing index, but there
can't be any existing indexes on a table that hasn't been created yet.
If this creates the index, then the syntax needs to support specifying
index access method and an opclass for all the columns.

>   ALTER TABLE ADD INDEX CONSTRAINT (<attname> <op>, ...)
>     USING INDEX <indexname>;
> 
> Where <op> is the constraint operator. For example, if all <op>s are
> "=" (or whatever the operator for BTEqualStragey is for that type), that
> would be semantically identical to a UNIQUE constraint.

This makes more sense.

It would be nice to have syntax to create the index and constraint in
one command, so that the constraint can be checked while the index is
being created. Otherwise you need an extra heap scan to check it.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Thu, 2009-08-20 at 11:47 +0300, Heikki Linnakangas wrote:
> That sounds like the constraint is based on an existing index, but there
> can't be any existing indexes on a table that hasn't been created yet.
> If this creates the index, then the syntax needs to support specifying
> index access method and an opclass for all the columns.

Of course, thanks for pointing that out. To make it work at CREATE TABLE
time, the language would have to specify the index access method, and
the index name should be optional. Do you think it's worthwhile adjust
the syntax for that, or would it just bloat the CREATE TABLE syntax for
no reason?

I'm leaning toward not allowing it at CREATE TABLE time.

> It would be nice to have syntax to create the index and constraint in
> one command, so that the constraint can be checked while the index is
> being created. Otherwise you need an extra heap scan to check it.

I can leave the old syntax in:

CREATE INDEX <indexname> ON <tablename> USING <method> (a CONSTRAINT <op>, b CONSTRAINT <op>) ...;

and allow both.

However, I'm not sure if it's very easy to provide support for
concurrent index building. Should I block it, or is it worth
investigating further?

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Heikki Linnakangas
Date:
Jeff Davis wrote:
> On Thu, 2009-08-20 at 11:47 +0300, Heikki Linnakangas wrote:
>> That sounds like the constraint is based on an existing index, but there
>> can't be any existing indexes on a table that hasn't been created yet.
>> If this creates the index, then the syntax needs to support specifying
>> index access method and an opclass for all the columns.
> 
> Of course, thanks for pointing that out. To make it work at CREATE TABLE
> time, the language would have to specify the index access method, and
> the index name should be optional. Do you think it's worthwhile adjust
> the syntax for that, or would it just bloat the CREATE TABLE syntax for
> no reason?
> 
> I'm leaning toward not allowing it at CREATE TABLE time.

Seems reasonable to me too.

> However, I'm not sure if it's very easy to provide support for
> concurrent index building. Should I block it, or is it worth
> investigating further?

Dunno. It sure would be nice, but it's not a showstopper.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/8/21 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> Jeff Davis wrote:
>> I'm leaning toward not allowing it at CREATE TABLE time.
>
> Seems reasonable to me too.
>

+1

There are plenty of other things to do with tables that you can't mix
directly into a CREATE TABLE statement (grant permissions, create
triggers, change owner, to name a few) so this would not be a surprise
-- or a hardship -- for users IMO.

As an aside, Jeff, have you considered how this feature would interact
with CREATE TABLE ... LIKE parent_table [ { INCLUDING | EXCLUDING } {
DEFAULTS | CONSTRAINTS | INDEXES } ] ... }?  What if someone asks to
include indexes but not constraints?  Vice-versa?  Will these cases be
handled gracefully?

Cheers,
BJ


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Fri, 2009-08-21 at 11:14 +1000, Brendan Jurd wrote:
> As an aside, Jeff, have you considered how this feature would interact
> with CREATE TABLE ... LIKE parent_table [ { INCLUDING | EXCLUDING } {
> DEFAULTS | CONSTRAINTS | INDEXES } ] ... }?  What if someone asks to
> include indexes but not constraints?  Vice-versa?  Will these cases be
> handled gracefully?

I hadn't considered that yet, thanks for bringing it to my attention.

>From the docs on CREATE TABLE (... LIKE ...):

"Not-null constraints are always copied to the new table. CHECK
constraints will only be copied if INCLUDING CONSTRAINTS is specified;
other types of constraints will never be copied."

If they include constraints and not indexes, nothing special.

If they include indexes and not constraints, I think we should follow
the same policy as unique constraints, and create the index and the
constraint.

The behavior seems a little strange to me, but that's the current
behavior for unique indexes.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/8/21 Jeff Davis <pgsql@j-davis.com>:
> If they include indexes and not constraints, I think we should follow
> the same policy as unique constraints, and create the index and the
> constraint.
>
> The behavior seems a little strange to me, but that's the current
> behavior for unique indexes.

This may be an opportunity to fix it.

The current behaviour seems to be predicated on the unique constraint
being an integral part of the index itself.  While this might be true
from a system catalog point of view (pg_index.indisunique), if a user
says that they want to copy a table's structure INCLUDING INDEXES
EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
clear.  They'd expect it to create an index sans the unique
constraint.  Ignoring the user's intention and copying the index as-is
(including the unique constraint) would be unfriendly.

Unless the SQL spec demands that we do so?

Cheers,
BJ


Re: WIP: generalized index constraints

From
Greg Stark
Date:
On Fri, Aug 21, 2009 at 3:23 AM, Brendan Jurd<direvus@gmail.com> wrote:
> They'd expect it to create an index sans the unique
> constraint.  Ignoring the user's intention and copying the index as-is
> (including the unique constraint) would be unfriendly.
>
> Unless the SQL spec demands that we do so?

There are no indexes in the SQL spec.

--
greg
http://mit.edu/~gsstark/resume.pdf


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Fri, 2009-08-21 at 12:23 +1000, Brendan Jurd wrote:
> This may be an opportunity to fix it.
> 
> The current behaviour seems to be predicated on the unique constraint
> being an integral part of the index itself.  While this might be true
> from a system catalog point of view (pg_index.indisunique), if a user
> says that they want to copy a table's structure INCLUDING INDEXES
> EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
> clear.  They'd expect it to create an index sans the unique
> constraint.  Ignoring the user's intention and copying the index as-is
> (including the unique constraint) would be unfriendly.

I don't have strong feelings either way. I think that's probably a
separate patch, and a fairly small patch.

Using the principle of least surprise, if a user specified one of
INDEXES or CONSTRAINTS, but not both, and there is a unique index, we
should raise an ERROR (or at least a WARNING).

There is not much of a problem with backwards compatibility. LIKE is
shorthand (not stored in catalogs), so it doesn't affect
pg_dump/restore. And hopefully there aren't a lot of apps out there
creating tables dynamically using the LIKE syntax.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/8/21 Jeff Davis <pgsql@j-davis.com>:
> On Fri, 2009-08-21 at 12:23 +1000, Brendan Jurd wrote:
>> The current behaviour seems to be predicated on the unique constraint
>> being an integral part of the index itself.  While this might be true
>> from a system catalog point of view (pg_index.indisunique), if a user
>> says that they want to copy a table's structure INCLUDING INDEXES
>> EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
>> clear.  They'd expect it to create an index sans the unique
>> constraint.  Ignoring the user's intention and copying the index as-is
>> (including the unique constraint) would be unfriendly.
>
> I don't have strong feelings either way. I think that's probably a
> separate patch, and a fairly small patch.
>

Yeah, as I was writing the above I was thinking that it might end up a
separate patch.  However I was also concerned that it might be less
disruptive if we implement your patch with the less-astonishing
behaviour and fix the unique index case in passing, than to commit
your patch with the bad behavior and then fix both.

Up to you.

> Using the principle of least surprise, if a user specified one of
> INDEXES or CONSTRAINTS, but not both, and there is a unique index, we
> should raise an ERROR (or at least a WARNING).

Actually for what it's worth I would expect INCLUDING INDEXES (with no
mention of what to do about constraints) to go ahead and include
constraints on indexes.  I only have strong feelings where the user
has gone to the trouble of explicitly stating that they want indexes
but *not* constraints and we include the constraints anyway.

I would be fine with a NOTICE in the former case, so something like
this would be cool

# CREATE TABLE foo (LIKE bar INCLUDING INDEXES);
NOTICE: INCLUDING INDEXES will also include any constraints on those indexes.
HINT: Specify EXCLUDING CONSTRAINTS to omit them.

To my mind the severity is similar to such notices as "NOTICE: CREATE
TABLE / UNIQUE will create implicit index ...".

i.e., "this is probably what you wanted us to do, but just in case you
weren't expecting this side-effect, we're letting you know about it".

>
> There is not much of a problem with backwards compatibility. LIKE is
> shorthand (not stored in catalogs), so it doesn't affect
> pg_dump/restore. And hopefully there aren't a lot of apps out there
> creating tables dynamically using the LIKE syntax.
>

Well, it wouldn't surprise me if people are using LIKE to produce
tables for partitioning arrangements.

Cheers,
BJ


Re: WIP: generalized index constraints

From
Dimitri Fontaine
Date:
Hi,

Le 21 août 09 à 06:04, Jeff Davis a écrit :
> There is not much of a problem with backwards compatibility. LIKE is
> shorthand (not stored in catalogs), so it doesn't affect
> pg_dump/restore. And hopefully there aren't a lot of apps out there
> creating tables dynamically using the LIKE syntax.

I for one use this a lot, every time I'm doing partitioning. What I do
is a plpgsql function creating partitions for a given period
(create_parts(date, date) and default interval with
create_parts(date)), and the function will EXECUTE something like this:
  CREATE TABLE schema.partition_YYYYMM (    LIKE schema.parent INCLUDING DEFAULTS INCLUDING INDEXES INCLUDING
CONSTRAINTS,    CHECK ( partition check expression )  )  INHERITS( schema.parent );

The reason to do this is that inherits won't care at all about the
indexes, defaults and constraints. The drawback to doing it this way
is the cheer number of NOTICEs you get back at inherits time when PG
is so verbose about finding that child already has all the parents
columns. From 8.3 onwards it's possible to trick the system though:
  CREATE FUNCTION ... ()   RETURNS ...   LANGUAGE plpgsql   SET client_min_messages TO warning  AS $$  $$;

Regards,
--
dim



Re: WIP: generalized index constraints

From
Alvaro Herrera
Date:
Brendan Jurd escribió:

> I would be fine with a NOTICE in the former case, so something like
> this would be cool
> 
> # CREATE TABLE foo (LIKE bar INCLUDING INDEXES);
> NOTICE: INCLUDING INDEXES will also include any constraints on those indexes.
> HINT: Specify EXCLUDING CONSTRAINTS to omit them.
> 
> To my mind the severity is similar to such notices as "NOTICE: CREATE
> TABLE / UNIQUE will create implicit index ...".
> 
> i.e., "this is probably what you wanted us to do, but just in case you
> weren't expecting this side-effect, we're letting you know about it".

NOTICEs is what we do with index creation on primary key, unique
indexes, and sequences on serial columns, and I think they are seen as
just noise by everyone except novices.  Do we want to add more?

Maybe they should be INFO, so that they are shown to the client but not
sent to the server log.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: WIP: generalized index constraints

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> NOTICEs is what we do with index creation on primary key, unique
> indexes, and sequences on serial columns, and I think they are seen as
> just noise by everyone except novices.  Do we want to add more?

> Maybe they should be INFO, so that they are shown to the client but not
> sent to the server log.

There was some discussion awhile back of creating a new NOVICE message
level.  INFO strikes me as a completely bad idea here, because it would
actually make it harder for non-novices to suppress the messages.
        regards, tom lane


Re: WIP: generalized index constraints

From
David Fetter
Date:
On Fri, Aug 21, 2009 at 12:23:15PM +1000, Brendan Jurd wrote:
> 2009/8/21 Jeff Davis <pgsql@j-davis.com>:
> > If they include indexes and not constraints, I think we should
> > follow the same policy as unique constraints, and create the index
> > and the constraint.
> >
> > The behavior seems a little strange to me, but that's the current
> > behavior for unique indexes.
> 
> This may be an opportunity to fix it.
> 
> The current behaviour seems to be predicated on the unique
> constraint being an integral part of the index itself.  While this
> might be true from a system catalog point of view
> (pg_index.indisunique), if a user says that they want to copy a
> table's structure INCLUDING INDEXES EXCLUDING CONSTRAINTS then IMO
> they've made their intention perfectly clear.  They'd expect it to
> create an index sans the unique constraint.  Ignoring the user's
> intention and copying the index as-is (including the unique
> constraint) would be unfriendly.
> 
> Unless the SQL spec demands that we do so?

SQL:2008, like its predecessors, does not mention indexing at all.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/8/21 Brendan Jurd <direvus@gmail.com>:
> 2009/8/21 Jeff Davis <pgsql@j-davis.com>:
>> On Fri, 2009-08-21 at 12:23 +1000, Brendan Jurd wrote:
>>> The current behaviour seems to be predicated on the unique constraint
>>> being an integral part of the index itself.  While this might be true
>>> from a system catalog point of view (pg_index.indisunique), if a user
>>> says that they want to copy a table's structure INCLUDING INDEXES
>>> EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
>>> clear.  They'd expect it to create an index sans the unique
>>> constraint.  Ignoring the user's intention and copying the index as-is
>>> (including the unique constraint) would be unfriendly.
>>
>> I don't have strong feelings either way. I think that's probably a
>> separate patch, and a fairly small patch.
>>
>
> Yeah, as I was writing the above I was thinking that it might end up a
> separate patch.  However I was also concerned that it might be less
> disruptive if we implement your patch with the less-astonishing
> behaviour and fix the unique index case in passing, than to commit
> your patch with the bad behavior and then fix both.
>
> Up to you.
>

Hi Jeff,

Any update on this patch?  The discussion appeared to trail off around
21 Aug with some inconclusive thoughts about handling the corner cases
in CREATE TABLE LIKE.

The September CF starts in a couple of days, so this patch is in
danger of missing the boat.

The unresolved points seem to be:
* What to do about INCLUDING INDEXES EXCLUDING CONSTRAINTS --
Postgres gets this wrong for unique indexes currently.  Should we
persist with the existing behaviour or fix it as part of this patch?
My personal feeling was +1 for fixing it in this patch.
* Should we emit some sort of message when the user specifies
INCLUDING INDEXES or INCLUDING CONSTRAINTS but not both?  I didn't
have strong feelings about this one but there was some differing
thoughts about what log level to use.  I thought NOTICE but Alvaro
reckons we've got too many of those already.  Tom mentioned the
suggested (but unimplemented) NOVICE level, which seems like a good
move but doesn't resolve the problem of what to do in this patch.  One
option would be to add a message at the NOTICE level with a TODO to
downgrade it to NOVICE if/when that becomes available.

Cheers,
BJ


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sun, 2009-09-13 at 19:08 +1000, Brendan Jurd wrote:
> The September CF starts in a couple of days, so this patch is in
> danger of missing the boat.

Thanks for keeping track. I accomplished a significant amount today, so
there's still hope for 9/15.

I will most likely just focus on the core functionality so that I have
something complete and reviewable.

> The unresolved points seem to be:
> 
>  * What to do about INCLUDING INDEXES EXCLUDING CONSTRAINTS --
> Postgres gets this wrong for unique indexes currently.  Should we
> persist with the existing behaviour or fix it as part of this patch?
> My personal feeling was +1 for fixing it in this patch.

I don't think that it should make a difference whether "EXCLUDING
CONSTRAINTS" is specified or omitted. There is no "[INCLUDING|EXCLUDING]
CONSTRAINTS" option in the standard, but for the other LIKE options,
EXCLUDING is implied when INCLUDING is not specified.

So, I think we have to make a decision:1. If INCLUDING CONSTRAINTS is specified, but not INCLUDING INDEXES,   do we:
copythe indexes silently; or emit a nice message; or throw    an ERROR?2. What if INCLUDING INDEXES is specified, but
notINCLUDING    CONSTRAINTS?
 

>  * Should we emit some sort of message when the user specifies
> INCLUDING INDEXES or INCLUDING CONSTRAINTS but not both?  I didn't
> have strong feelings about this one but there was some differing
> thoughts about what log level to use.  I thought NOTICE but Alvaro
> reckons we've got too many of those already.  Tom mentioned the
> suggested (but unimplemented) NOVICE level, which seems like a good
> move but doesn't resolve the problem of what to do in this patch.  One
> option would be to add a message at the NOTICE level with a TODO to
> downgrade it to NOVICE if/when that becomes available.

I don't think either of these things are a huge amount of work; they are
mostly just decisions that need to be made. I'll start off implementing
whatever is easiest/cleanest, and we'll continue the discussion.

Regards,Jeff Davis





Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sun, 2009-09-13 at 19:08 +1000, Brendan Jurd wrote:
> Any update on this patch?

Attached is the latest version.

Changes:

 * Merged with HEAD
 * Changed from storing the information in pg_index to pg_constraint.
   This required rewriting a large portion of the patch, so it's not a
   clean diff from the previous patch.
 * Implemented the language using ALTER TABLE to add the constraint, as
   discussed (with a slight change to avoid the extra "INDEX" keyword).
 * Added docs
 * Added tests
 * All relations with relindconstraints == 0 do not pass through the
   index constraints enforcement code at all. I did this a little
   differently than what I laid out in the design, but the idea is the
   same and it should avoid any problems.

That's the good news. The bad news:

 * No pg_dump support yet (shouldn't be too hard)
 * Creating a new constraint does not check the existing data for
   conflicts.
 * Doesn't work like the new deferrable unique constraints yet (also
   shouldn't be too hard).
 * I didn't make any changes to the behavior of LIKE (also not hard).
 * Can't be specified at CREATE INDEX time. I don't think this is a
   showstopper, and it will take some significant effort. The advantage
   of allowing this is that the constraints can be checked more quickly
   (in theory) while building the index, and it also might be handy
   shorthand. However, it suffers from a number of problems:
     1. Extra syntax that is almost entirely redundant.
     2. More work.
     3. The performance gains will probably be pretty marginal. We have
        to do N index scans anyway; the savings would only be due to
        the better caching impact and the fact that the index in the
        process of being built is smaller than an already-built index.
   So, right now I'm not in a hurry to fix this last point.

I realize that some of the things missing make the patch uncomittable in
its current form. However, I would still appreciate a review of what I
have ready.

Regards,
    Jeff Davis

Attachment

Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/9/15 Jeff Davis <pgsql@j-davis.com>:
> Attached is the latest version.

Hi Jeff,

I'm just getting started reviewing this version now.  I noticed that
your patch seems to have been generated by git.  Are you hosting this
work on a public repo somewhere that I can pull from?  Also I think
the committers generally prefer context diffs (pipe it through
"filterdiff --format=context --strip=1") in submissions.

Regarding the documentation updates, I think you might want to add
some commentary to Chapter 11: Indexes -- perhaps add a new section
after 11.6 Unique Indexes to talk about general index constraints,
and/or update the wording of 11.6 to reflect your changes.

The paragraph explaining index_constraints in ALTER TABLE may be a
little bit confusing.

<quote>
ADD index_constraint
   This form adds a new index constraint to the table which is
enforced by the given index. The operator provided must be usable as a
search strategy for the index, and it also must detect conflicts
symmetrically. The semantics are similar to a unique index but it
opens up new possibilities. A unique index can only detect conflicts
when the two values are equal, and that behavior is equivalent to an
index constraint where all operators are the equality operator.
   However, an index constraint allows you to use other operators as
well, such as the overlaps operator provided for the circle data type.
The index constraint will ensure that no two circles overlap. See
example below.
</quote>

My eyes started to cross in the second sentence.  "Detect conflicts
symmetrically"?  I have actually *used* this feature successfully in
testing the patch, and I still don't know quite what to make of that
phrase.  You might need to dumb it down.

It might also be good to be a bit more explicit about the way the
choice of operators works.  It is the inverse of the logic used to
express an ordinary value constraint.  E.g., when you use the equality
operator in an index constraint you are in effect saying that new rows
MUST NOT satisfy this operator for any existing rows.

I'll continue reviewing and post more comments on the code itself shortly.

Cheers,
BJ


Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/9/15 Jeff Davis <pgsql@j-davis.com>:
> Attached is the latest version.
>

The new error message for a conflict is:

ERROR:  index constraint violation detected
DETAIL:  tuple conflicts with existing data

How about also including the name of the constraint (or index) that
was violated?  I could imagine this error message being frustrating
for someone who had a table with multiple index constraints, as they
wouldn't know which one had raised the conflict.

Also, the DETAIL part should be written as a full sentence with
leading capital and full stop [1], see

I deliberately tried to create an index constraint using a bogus
operator, to see what would happen:

postgres=# alter table circles add constraint circles_overlap (c <->)
using index circle_idx;
ERROR:  no strategy found for operator 1520 in operator family 2595

The error message is pretty unfriendly, but I'm ambivalent about
whether it's worth doing anything about this particular case.

One of the comments I made in my original review [2] was that "\d" in
psql should show the constraint.  I don't think you've addressed this
in the current version.

Cheers,
BJ

[1] http://www.postgresql.org/docs/current/static/error-style-guide.html
[2] http://archives.postgresql.org/message-id/37ed240d0907152222w7ccfc13i8ce8d11a0c517e8@mail.gmail.com


Re: WIP: generalized index constraints

From
Joshua Tolley
Date:
On Tue, Sep 15, 2009 at 11:21:14PM +1000, Brendan Jurd wrote:
> 2009/9/15 Jeff Davis <pgsql@j-davis.com>:
> > Attached is the latest version.
> >
>
> The new error message for a conflict is:
>
> ERROR:  index constraint violation detected
> DETAIL:  tuple conflicts with existing data
>
> How about also including the name of the constraint (or index) that
> was violated?  I could imagine this error message being frustrating
> for someone who had a table with multiple index constraints, as they
> wouldn't know which one had raised the conflict.

Perhaps the tuple that caused the violation as well, like UNIQUE index
violations already do? Even if we know what constraint has been tripped, we
might not know what value did it.

josh@josh# create table a (a integer);
josh@josh*# create unique index a_unique on a (a);
josh@josh*# insert into a values (1), (2), (3);
josh@josh*# insert into a values (8), (3), (4);
ERROR:  duplicate key value violates unique constraint "a_unique"
DETAIL:  Key (a)=(3) already exists.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 22:52 +1000, Brendan Jurd wrote:
> I'm just getting started reviewing this version now.  I noticed that
> your patch seems to have been generated by git.  Are you hosting this
> work on a public repo somewhere that I can pull from?

I just requested a public repo. I will publish there as soon as its
approved.

> Also I think
> the committers generally prefer context diffs (pipe it through
> "filterdiff --format=context --strip=1") in submissions.

Thanks, I will do that for my future patch submissions.

> Regarding the documentation updates, I think you might want to add
> some commentary to Chapter 11: Indexes -- perhaps add a new section
> after 11.6 Unique Indexes to talk about general index constraints,
> and/or update the wording of 11.6 to reflect your changes.

Will do.

> My eyes started to cross in the second sentence.  "Detect conflicts
> symmetrically"?  I have actually *used* this feature successfully in
> testing the patch, and I still don't know quite what to make of that
> phrase.  You might need to dumb it down.

Will do.

> It might also be good to be a bit more explicit about the way the
> choice of operators works.  It is the inverse of the logic used to
> express an ordinary value constraint.  E.g., when you use the equality
> operator in an index constraint you are in effect saying that new rows
> MUST NOT satisfy this operator for any existing rows.

I'll include that, thanks.

I appreciate the quick feedback; I'll make these changes tonight.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 23:21 +1000, Brendan Jurd wrote:
> How about also including the name of the constraint (or index) that
> was violated?  I could imagine this error message being frustrating
> for someone who had a table with multiple index constraints, as they
> wouldn't know which one had raised the conflict.

Yes, that makes sense. As Joshua Tolley mentions, I'll also include the
tuples that caused the conflict.

> Also, the DETAIL part should be written as a full sentence with
> leading capital and full stop [1], see

Oh, I haven't seen that document before. Thanks.

> postgres=# alter table circles add constraint circles_overlap (c <->)
> using index circle_idx;
> ERROR:  no strategy found for operator 1520 in operator family 2595
> 
> The error message is pretty unfriendly, but I'm ambivalent about
> whether it's worth doing anything about this particular case.

I think I could make that error a little better by providing a detail
message explaining what the operator should be able to do.

> One of the comments I made in my original review [2] was that "\d" in
> psql should show the constraint.  I don't think you've addressed this
> in the current version.

I have psql on my list along with pg_dump, but unfortunately I haven't
done either yet. I don't think it will take too much work, so I'll fix
this as soon as I can.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 08:08 -0600, Joshua Tolley wrote:
> Perhaps the tuple that caused the violation as well, like UNIQUE index
> violations already do? Even if we know what constraint has been tripped, we
> might not know what value did it.

Or, even better, include both tuples. With these new constraints the
conflicting tuples aren't necessarily equal.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Robert Haas
Date:
On Tue, Sep 15, 2009 at 12:18 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2009-09-15 at 22:52 +1000, Brendan Jurd wrote:
>> I'm just getting started reviewing this version now.  I noticed that
>> your patch seems to have been generated by git.  Are you hosting this
>> work on a public repo somewhere that I can pull from?
>
> I just requested a public repo. I will publish there as soon as its
> approved.
>
>> Also I think
>> the committers generally prefer context diffs (pipe it through
>> "filterdiff --format=context --strip=1") in submissions.
>
> Thanks, I will do that for my future patch submissions.
>
>> Regarding the documentation updates, I think you might want to add
>> some commentary to Chapter 11: Indexes -- perhaps add a new section
>> after 11.6 Unique Indexes to talk about general index constraints,
>> and/or update the wording of 11.6 to reflect your changes.
>
> Will do.
>
>> My eyes started to cross in the second sentence.  "Detect conflicts
>> symmetrically"?  I have actually *used* this feature successfully in
>> testing the patch, and I still don't know quite what to make of that
>> phrase.  You might need to dumb it down.
>
> Will do.
>
>> It might also be good to be a bit more explicit about the way the
>> choice of operators works.  It is the inverse of the logic used to
>> express an ordinary value constraint.  E.g., when you use the equality
>> operator in an index constraint you are in effect saying that new rows
>> MUST NOT satisfy this operator for any existing rows.
>
> I'll include that, thanks.
>
> I appreciate the quick feedback; I'll make these changes tonight.

Instead of calling these generalized index constraints, I wonder if we
oughtn't to be calling them something like "don't-overlap constraints"
(that's a bad name, but something along those lines).  They're not
really general at all, except compared to uniqueness constraints (and
they aren't called generalized unique-index constraints, just
generalized index constraints).

I didn't realize understand what this was all for until I read Brendan's review.

...Robert


Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/9/16 Robert Haas <robertmhaas@gmail.com>:
> Instead of calling these generalized index constraints, I wonder if we
> oughtn't to be calling them something like "don't-overlap constraints"
> (that's a bad name, but something along those lines).  They're not
> really general at all, except compared to uniqueness constraints (and
> they aren't called generalized unique-index constraints, just
> generalized index constraints).

Well "generalized index constraints" is what we're calling the patch,
but I don't think they are called by that name anywhere in the
proposed documentation changes.  In the extension to ALTER TABLE
syntax, they are simply referred to as "index_constraint".

Cheers,
BJ


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
> Instead of calling these generalized index constraints, I wonder if we
> oughtn't to be calling them something like "don't-overlap constraints"
> (that's a bad name, but something along those lines).  They're not
> really general at all, except compared to uniqueness constraints (and
> they aren't called generalized unique-index constraints, just
> generalized index constraints).

What would you like to be able to enforce using an index that can't be
solved by this patch? It only works for constraints entirely within a
single table, can you think of a way to express that better?

In the code/docs, mostly I call them just "index constraints" or some
variation thereof. But for the lists, I think that might be too vague.

I don't want to call them "don't overlap constraints", because it's not
limited to a non-overlapping constraint. I also don't think "generalized
unique-index constraints" is a good name: it's confusing and it makes it
sound like it is some new way to use a unique index.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Robert Haas
Date:
On Tue, Sep 15, 2009 at 12:54 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
>> Instead of calling these generalized index constraints, I wonder if we
>> oughtn't to be calling them something like "don't-overlap constraints"
>> (that's a bad name, but something along those lines).  They're not
>> really general at all, except compared to uniqueness constraints (and
>> they aren't called generalized unique-index constraints, just
>> generalized index constraints).
>
> What would you like to be able to enforce using an index that can't be
> solved by this patch? It only works for constraints entirely within a
> single table, can you think of a way to express that better?
>
> In the code/docs, mostly I call them just "index constraints" or some
> variation thereof. But for the lists, I think that might be too vague.
>
> I don't want to call them "don't overlap constraints", because it's not
> limited to a non-overlapping constraint.

Oh.  What else can you do with it?

> I also don't think "generalized
> unique-index constraints" is a good name: it's confusing and it makes it
> sound like it is some new way to use a unique index.

I agree.

...Robert


Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/9/16 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Sep 15, 2009 at 12:54 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> I don't want to call them "don't overlap constraints", because it's not
>> limited to a non-overlapping constraint.
>
> Oh.  What else can you do with it?

Anything that there is an operator for.

Cheers,
BJ


Re: WIP: generalized index constraints

From
Robert Haas
Date:
On Tue, Sep 15, 2009 at 1:14 PM, Brendan Jurd <direvus@gmail.com> wrote:
> 2009/9/16 Robert Haas <robertmhaas@gmail.com>:
>> On Tue, Sep 15, 2009 at 12:54 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>>> I don't want to call them "don't overlap constraints", because it's not
>>> limited to a non-overlapping constraint.
>>
>> Oh.  What else can you do with it?
>
> Anything that there is an operator for.

Uhh.... so what happens if I create an index constraint using the
+(integer, integer) operator?

...Robert


Re: WIP: generalized index constraints

From
Brendan Jurd
Date:
2009/9/16 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Sep 15, 2009 at 1:14 PM, Brendan Jurd <direvus@gmail.com> wrote:
>> 2009/9/16 Robert Haas <robertmhaas@gmail.com>:
>>> On Tue, Sep 15, 2009 at 12:54 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>>>> I don't want to call them "don't overlap constraints", because it's not
>>>> limited to a non-overlapping constraint.
>>>
>>> Oh.  What else can you do with it?
>>
>> Anything that there is an operator for.
>
> Uhh.... so what happens if I create an index constraint using the
> +(integer, integer) operator?

Okay, so my first answer was a simplification.  You can use any
operator that has an appropriate index strategy entry.


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 13:16 -0400, Robert Haas wrote:
> Uhh.... so what happens if I create an index constraint using the
> +(integer, integer) operator?

You can use any operator that has an index search strategy. Overlaps is
probably the most useful, but you could imagine other operators, like a
bi-directional containment operator (either LHS is contained in RHS, or
vice-versa).

You can also get creative and have a "similarity" operator that
determines whether two tuples are "too similar". As long as it is
symmetric, the feature will work.

Or just use wrap random() in an operator and see what happens ;)

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Robert Haas
Date:
On Tue, Sep 15, 2009 at 1:28 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2009-09-15 at 13:16 -0400, Robert Haas wrote:
>> Uhh.... so what happens if I create an index constraint using the
>> +(integer, integer) operator?
>
> You can use any operator that has an index search strategy. Overlaps is
> probably the most useful, but you could imagine other operators, like a
> bi-directional containment operator (either LHS is contained in RHS, or
> vice-versa).
>
> You can also get creative and have a "similarity" operator that
> determines whether two tuples are "too similar". As long as it is
> symmetric, the feature will work.

So it allows us to create constraints of the following form?

For all A in the index, there exists no B in the index such that the
given operator (which must be a binary operator returning boolean)
holds of A and B.

If that's correct, I think we should definitely at least mention the
word "overlap" somewhere in the documentation, because that's what
people are going to want to use it for, and it's hard to conceptualize
without examples, at least for me.  You may already be doing this, I
haven't read the patch.

Also, there are certainly other things you could want to do that can't
be handled by this approach.  Perhaps you'd like to create a
constraint that a given value can appear at most twice, or a two
column index (A, B) such that for any A the smallest value of B is
less than A.  These are certainly less common requirements than what
you're talking about here, and I don't think it's important to try to
support them - at least not at this point - but the word "generalized"
doesn't give me a clue that I won't be able to do those things but I
will be able to make an index that prevents my users from handing out
duplicate IP blocks.

...Robert


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 13:48 -0400, Robert Haas wrote:
> So it allows us to create constraints of the following form?
> 
> For all A in the index, there exists no B in the index such that the
> given operator (which must be a binary operator returning boolean)
> holds of A and B.

Yes. And it's slightly more complicated for multi-column constraints:

For all tuples A in the index with attributes 1 to N, there exists no
tuple B such that:  A1 op1 B1 AND  A2 op2 B2 AND  ...  AN op2 BN

If all operators are "=", and the index implements searching on
equality, it's semantically equivalent to a unique index.

> 
> If that's correct, I think we should definitely at least mention the
> word "overlap" somewhere in the documentation, because that's what
> people are going to want to use it for, and it's hard to conceptualize
> without examples, at least for me.  You may already be doing this, I
> haven't read the patch.

My current example uses "overlaps", but I will expand the documentation
to provide more clarity.

> Also, there are certainly other things you could want to do that can't
> be handled by this approach.  Perhaps you'd like to create a
> constraint that a given value can appear at most twice, or a two
> column index (A, B) such that for any A the smallest value of B is
> less than A.

The first is a good example, and actually I think that could be an
add-on to my patch without much difficulty.

The second can't be enforced with an index in nearly the same way
because deleting a tuple could violate the constraint. Also, it seems
like it would be hard to express that kind of constraint. But I agree
that, in principle, it is an index-enforceable constraint.

> These are certainly less common requirements than what
> you're talking about here, and I don't think it's important to try to
> support them - at least not at this point - but the word "generalized"
> doesn't give me a clue that I won't be able to do those things but I
> will be able to make an index that prevents my users from handing out
> duplicate IP blocks.

As far as the name goes, the best I've got so far are "index
constraints" and "generalized index constraints". I'm happy to change
the name if you have a reasonable suggestion.

I don't think the word "generalized" implies that it can do absolutely
anything possible. For the list discussion, I think it's appropriate to
use the term "generalized", because my patch generalizes index
constraints. However, I agree that we shouldn't use that too much in the
code/docs because someone might think of something more general later.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2009-09-15 at 13:16 -0400, Robert Haas wrote:
>> Uhh.... so what happens if I create an index constraint using the
>> +(integer, integer) operator?

> You can use any operator that has an index search strategy. Overlaps is
> probably the most useful, but you could imagine other operators, like a
> bi-directional containment operator (either LHS is contained in RHS, or
> vice-versa).

Does it behave sanely for operators that are non-commutative, such
as '>'?  (I'm not even very sure that I know what "sanely" would be
in such a case.)
        regards, tom lane


Re: WIP: generalized index constraints

From
David Fetter
Date:
On Tue, Sep 15, 2009 at 11:31:48AM -0700, Jeff Davis wrote:
> On Tue, 2009-09-15 at 13:48 -0400, Robert Haas wrote:
> > So it allows us to create constraints of the following form?
> > 
> > For all A in the index, there exists no B in the index such that the
> > given operator (which must be a binary operator returning boolean)
> > holds of A and B.
> 
> Yes. And it's slightly more complicated for multi-column constraints:
> 
> For all tuples A in the index with attributes 1 to N, there exists no
> tuple B such that:
>    A1 op1 B1 AND
>    A2 op2 B2 AND
>    ...
>    AN op2 BN
> 
> If all operators are "=", and the index implements searching on
> equality, it's semantically equivalent to a unique index.

Interesting :)  I take it op1..opN (it's opN, not op2, right?) need to
commute?

> > These are certainly less common requirements than what you're
> > talking about here, and I don't think it's important to try to
> > support them - at least not at this point - but the word
> > "generalized" doesn't give me a clue that I won't be able to do
> > those things but I will be able to make an index that prevents my
> > users from handing out duplicate IP blocks.
> 
> As far as the name goes, the best I've got so far are "index
> constraints" and "generalized index constraints". I'm happy to change
> the name if you have a reasonable suggestion.

Here's a couple:

* "generalized-uniqueness constraints"   the hyphen disambiguates

* "operator-based constraints"   A little math-ier, but talks about the API rather than details of   the server
implementation.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: generalized index constraints

From
Robert Haas
Date:
On Tue, Sep 15, 2009 at 3:03 PM, David Fetter <david@fetter.org> wrote:
> * "operator-based constraints"
>    A little math-ier, but talks about the API rather than details of
>    the server implementation.

Or operator-exclusion constraints?  Operator-based exclusion constraints?

I'm feeling exclusive.

...Robert


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 14:49 -0400, Tom Lane wrote:
> Does it behave sanely for operators that are non-commutative, such
> as '>'?  (I'm not even very sure that I know what "sanely" would be
> in such a case.)

One of the requirements is commutativity (I called it "symmetry" in the
docs, for some reason, I will change that).

I haven't explored in too much detail, but using "x >" (or maybe its
"<" ?) would basically mean that you can only insert increasing values
of x. There most likely be some serious problems there, for instance, if
you HOT update an old tuple's "y" attribute, everything is fine; if you
cold update it you would get an error.

Not exactly intuitive, but if you have lots of other requirements about
how things are updated, then I suppose it might be useful to someone.

If you try it, my current patch won't stop you. Maybe I should detect
the fact that the commutator of an operator is not the operator itself,
and throw an ERROR? Probably would be a good idea.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 12:03 -0700, David Fetter wrote:
> Interesting :)  I take it op1..opN (it's opN, not op2, right?) need to
> commute?

Yeah, it's opN.

And they should commute, but my current patch won't stop you. I think I
should stop that though, it's pretty difficult to think of a good
use-case for that and there is all kinds of danger.

> * "generalized-uniqueness constraints"
>     the hyphen disambiguates

I don't like using the word "unique" in the description, I think it only
adds to the confusion.

> * "operator-based constraints"
>     A little math-ier, but talks about the API rather than details of
>     the server implementation.

I like this much better. Maybe "index operator constraints" or "operator
index constraints"?

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2009-09-15 at 14:49 -0400, Tom Lane wrote:
>> Does it behave sanely for operators that are non-commutative, such
>> as '>'?  (I'm not even very sure that I know what "sanely" would be
>> in such a case.)

> If you try it, my current patch won't stop you. Maybe I should detect
> the fact that the commutator of an operator is not the operator itself,
> and throw an ERROR? Probably would be a good idea.

+1.  Otherwise people *will* try it, and then send us bug reports when
it doesn't behave sanely.
        regards, tom lane


Re: WIP: generalized index constraints

From
David Fetter
Date:
On Tue, Sep 15, 2009 at 12:22:46PM -0700, Jeff Davis wrote:
> On Tue, 2009-09-15 at 12:03 -0700, David Fetter wrote:
> > * "operator-based constraints"
> >     A little math-ier, but talks about the API rather than details of
> >     the server implementation.
> 
> I like this much better. Maybe "index operator constraints" or "operator
> index constraints"?

The word, "index" goes to implementation details, which may change.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 12:49 -0700, David Fetter wrote:
> > I like this much better. Maybe "index operator constraints" or "operator
> > index constraints"?
> 
> The word, "index" goes to implementation details, which may change.

Ok, let's vote on a name then:

operator constraints
operator exclusion constraints
operator conflict constraints
conflict operator constraints
operator index constraints
index constraints
generalized index constraints
something else?

Right now, I like "conflict operator constraints" for the long name
(e.g. feature title, long description in docs), and "operator
constraints" for short (e.g. in the code and some places in the docs).

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 14:42 -0700, Jeff Davis wrote:
> operator constraints
> operator exclusion constraints
> operator conflict constraints
> conflict operator constraints
> operator index constraints
> index constraints
> generalized index constraints
> something else?

Just to add a couple more permutations of Robert Haas's suggestions:
exclusion operator constraintsexclusive operator constraints

I also like those.

I think that using the word "operator" first makes it sound like the
operator is the thing being excluded, and adding "-based" makes it more
clear but it is too verbose.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2009-09-15 at 14:42 -0700, Jeff Davis wrote:
>> operator constraints
>> operator exclusion constraints
>> operator conflict constraints
>> conflict operator constraints
>> operator index constraints
>> index constraints
>> generalized index constraints
>> something else?

> Just to add a couple more permutations of Robert Haas's suggestions:

>  exclusion operator constraints
>  exclusive operator constraints

To my ear, "operator exclusion constraints" or "exclusive operator
constraints" seem reasonable; the other permutations of that phrase
simply aren't good English.

I'm not tremendously happy with any of them though...
        regards, tom lane


Re: WIP: generalized index constraints

From
Joshua Tolley
Date:
On Tue, Sep 15, 2009 at 05:52:35PM -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > On Tue, 2009-09-15 at 14:42 -0700, Jeff Davis wrote:
> >> operator constraints
> >> operator exclusion constraints
> >> operator conflict constraints
> >> conflict operator constraints
> >> operator index constraints
> >> index constraints
> >> generalized index constraints
> >> something else?
>
> > Just to add a couple more permutations of Robert Haas's suggestions:
>
> >  exclusion operator constraints
> >  exclusive operator constraints
>
> To my ear, "operator exclusion constraints" or "exclusive operator
> constraints" seem reasonable; the other permutations of that phrase
> simply aren't good English.

I was having a hard time coming up with a name that was adequately
short-and-sweet, and still conveyed the idea of both "operator" and "index",
which seems important so as to designate between these and the constraints
we've had all along. Perhaps "indexed operator constraints"?

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: WIP: generalized index constraints

From
Peter Eisentraut
Date:
On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
> Instead of calling these generalized index constraints, I wonder if we
> oughtn't to be calling them something like "don't-overlap constraints"
> (that's a bad name, but something along those lines).  They're not
> really general at all, except compared to uniqueness constraints (and
> they aren't called generalized unique-index constraints, just
> generalized index constraints).

What they should be called is generalized unique constraints, without
reference to "index".  Because what they generalize is the operator by
which uniqueness is determined.

Don't all of these have the same effect at the end?

CREATE TABLE data (a int UNIQUE);

CREATE TABLE data (a int);
CREATE UNIQUE INDEX data_a_idx ON data USING btree (a);

CREATE TABLE data (a int);
CREATE INDEX data_a_idx ON data USING btree (a);
ALTER TABLE data ADD CONSTRAINT a_unique (a =) USING INDEX data_a_idx;

Which brings me to two concerns about the syntax.

First, we have so far been fairly consistent to document that unique
indexes are an implementation detail of unique constraints and should
usually not be used directly.  This new approach basically reverses that
and forces you to define your constraints by means of implementation
details rather than a logical description.  There is nothing in this
feature that makes it strikingly different from the existing constraint
types in a way that would prevent a reasonable syntax for defining the
constraint at table definition time.  Another problem this would lead to
is that a say dump of a table definition wouldn't actually contain all
the constraints that apply to the table anymore, because there might be
additional stuff such as this that can't be expressed that way.

If you look at the example from the documentation,

CREATE TABLE circles(c circle);
CREATE INDEX circles_idx ON circles USING gist (c);
ALTER TABLE circles ADD CONSTRAINT circles_idx_constr (c &&) USING INDEX
circles_idx;

the only variable pieces of information that need to be provided to the
table are the index type and the operator.  So why not just write it
like this

CREATE TABLE circles (c circle UNIQUE ON gist &&);

and let that create the required index.

And then traditional unique constraints would fit into this as well:

CREATE TABLE data (a int UNIQUE ON btree =);

The other problem I see with the syntax is that

ALTER TABLE circles ADD CONSTRAINT circles_idx_constr (c &&) USING INDEX
circles_idx;

doesn't seem very intuitive about what is actually being constrained.
For a while I was thinking that it was constraining the table to values
that are in the index or something.  So using a word such as UNIQUE
would help explain what is going on.



Re: WIP: generalized index constraints

From
Peter Eisentraut
Date:
On Tue, 2009-09-15 at 12:22 -0700, Jeff Davis wrote:
> I don't like using the word "unique" in the description, I think it
> only
> adds to the confusion.

It would emphasize that a unique constraint is a common special case of
the feature.



Re: WIP: generalized index constraints

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Sep 15, 2009 at 10:28:28AM -0700, Jeff Davis wrote:
> On Tue, 2009-09-15 at 13:16 -0400, Robert Haas wrote:
> > Uhh.... so what happens if I create an index constraint using the
> > +(integer, integer) operator?
> 
> You can use any operator that has an index search strategy. Overlaps is
> probably the most useful, but you could imagine other operators, like a
> bi-directional containment operator (either LHS is contained in RHS, or
> vice-versa).
> 
> You can also get creative and have a "similarity" operator that
> determines whether two tuples are "too similar". As long as it is
> symmetric, the feature will work.

One question: does the operator have to be reflexive? I.e. "A op A holds
for all A"?

I am thinking "proximity" or as you state above "similarity". May be
this is a good metaphor, leading to a good name.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKsOPyBcgs9XrR2kYRAhsUAJkBICYUMK0tDrycPbctiGF7YKI/9gCeLQzq
DPyAAkMgZJFn8BZ7P8119/g=
=lVz1
-----END PGP SIGNATURE-----


Re: WIP: generalized index constraints

From
Robert Haas
Date:
On Tue, Sep 15, 2009 at 7:02 PM, Joshua Tolley <eggyknap@gmail.com> wrote:
> On Tue, Sep 15, 2009 at 05:52:35PM -0400, Tom Lane wrote:
>> Jeff Davis <pgsql@j-davis.com> writes:
>> > On Tue, 2009-09-15 at 14:42 -0700, Jeff Davis wrote:
>> >> operator constraints
>> >> operator exclusion constraints
>> >> operator conflict constraints
>> >> conflict operator constraints
>> >> operator index constraints
>> >> index constraints
>> >> generalized index constraints
>> >> something else?
>>
>> > Just to add a couple more permutations of Robert Haas's suggestions:
>>
>> >  exclusion operator constraints
>> >  exclusive operator constraints
>>
>> To my ear, "operator exclusion constraints" or "exclusive operator
>> constraints" seem reasonable; the other permutations of that phrase
>> simply aren't good English.
>
> I was having a hard time coming up with a name that was adequately
> short-and-sweet, and still conveyed the idea of both "operator" and "index",
> which seems important so as to designate between these and the constraints
> we've had all along. Perhaps "indexed operator constraints"?

Really I suppose they are indexed operator exclusion constraints, but
that may be too wordy.

...Robert


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Wed, 2009-09-16 at 15:11 +0200, tomas@tuxteam.de wrote:
> One question: does the operator have to be reflexive? I.e. "A op A holds
> for all A"?

I don't think that reflexivity is a strict requirement. You could make
this a constraint over a boolean attribute such that false conflicts
with true and true conflicts with false. That would mean that your table
would have to consist of either all false or all true.

> I am thinking "proximity" or as you state above "similarity". May be
> this is a good metaphor, leading to a good name.

That's an interesting idea: "proximity constraint". I like it because
(a) "proximity" might reasonably be considered a more general form of
the word "unique", which might satisfy Peter's argument; (b) it conveys
the use case; and (c) it sounds good.

There are a couple bizarre cases where "proximity" doesn't quite fit,
like my boolean example above, but I'm OK with that.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Wed, 2009-09-16 at 10:14 +0300, Peter Eisentraut wrote:
> What they should be called is generalized unique constraints, without
> reference to "index".  Because what they generalize is the operator by
> which uniqueness is determined.

How about GUC, for short?  ;-)

Do you think that Tomás's suggestion of "proximity constraints" would
satisfy your requirement on the basis that "proximity" is a more general
form of the word "unique"?

> First, we have so far been fairly consistent to document that unique
> indexes are an implementation detail of unique constraints and should
> usually not be used directly.  This new approach basically reverses that
> and forces you to define your constraints by means of implementation
> details rather than a logical description.  There is nothing in this
> feature that makes it strikingly different from the existing constraint
> types in a way that would prevent a reasonable syntax for defining the
> constraint at table definition time.  Another problem this would lead to
> is that a say dump of a table definition wouldn't actually contain all
> the constraints that apply to the table anymore, because there might be
> additional stuff such as this that can't be expressed that way.

Those are all good points. I think ultimately we need to support this at
table creation time and make index specification optional.

We do need to allow the user to specify the index, however. An important
use case of this feature is defining an index on (A, B, C) and using it
to enforce UNIQUE(A,B) and UNIQUE(A,C).

> CREATE TABLE circles (c circle UNIQUE ON gist &&);

Should we use USING instead of ON to be consistent with CREATE INDEX?

Also, right now a UNIQUE by itself creates a btree. Let's say the user
has btree_gist installed and they declare (a =, b =) without specifying
the AM. Which one do we choose?

I suppose we can come up with a default order, like btree, hash, gist,
gin; and just choose the first one with a matching opclass.

Also, we should decide whether UNIQUE(a,b) should be shorthand for (a =,
b =), or whether they should be treated differently somehow. If so,
we'll need to come up with some kind of rules for how it determines that
UNIQUE chooses to use a btree with indisunique enforcement; and we need
to come up with a way to force it to choose the new enforcement
mechanism in my patch if the user wants to.

> CREATE TABLE data (a int UNIQUE ON btree =);

I think we should provide some way for the user to specify what
enforcement mechanism is used when multiple options are possible. In
this example should it use the existing mechanism or the new mechanism?

> ALTER TABLE circles ADD CONSTRAINT circles_idx_constr (c &&) USING INDEX
> circles_idx;
> 
> doesn't seem very intuitive about what is actually being constrained.
> For a while I was thinking that it was constraining the table to values
> that are in the index or something.  So using a word such as UNIQUE
> would help explain what is going on.

I'm still uncomfortable using the word UNIQUE. I see that you are taking
an approach similar to ORDER BY ... USING. However, I'm concerned
because we're using a special case word to describe the general feature,
and I think that's confusing.

On the other hand, it's hard to argue that (a &&, b =) is intuitive, so
I'll acquiesce if you get some agreement from someone else.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Tue, 2009-09-15 at 22:52 +1000, Brendan Jurd wrote:
> I'm just getting started reviewing this version now.  I noticed that
> your patch seems to have been generated by git.

Ok, I now have a public git repo on git.postgresql.org, and I rebased my
patch before I pushed it.

See updates in my "generalized-constraints" branch. I have psql support
now, so it might be slightly easier to review.

The language and name of the feature are going through a little turmoil
right now, as you can see, so I'm trying to keep up with that. As that
settles down I'll improve the docs.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Robert Haas
Date:
On Wed, Sep 16, 2009 at 3:14 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
>> Instead of calling these generalized index constraints, I wonder if we
>> oughtn't to be calling them something like "don't-overlap constraints"
>> (that's a bad name, but something along those lines).  They're not
>> really general at all, except compared to uniqueness constraints (and
>> they aren't called generalized unique-index constraints, just
>> generalized index constraints).
>
> What they should be called is generalized unique constraints, without
> reference to "index".  Because what they generalize is the operator by
> which uniqueness is determined.

Well, it should eventually be possible to use this feature to create
an index which excludes overlapping ranges  in fact, unless I
misunderstand, that's the principle likely use case.  Which is not
unique-ness at all.

...Robert


Re: WIP: generalized index constraints

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Sep 16, 2009 at 09:45:52AM -0700, Jeff Davis wrote:
> On Wed, 2009-09-16 at 15:11 +0200, tomas@tuxteam.de wrote:
> > One question: does the operator have to be reflexive? I.e. "A op A holds
> > for all A"?
> 
> I don't think that reflexivity is a strict requirement. You could make
> this a constraint over a boolean attribute such that false conflicts
> with true and true conflicts with false. That would mean that your table
> would have to consist of either all false or all true.

Let me see whether I've understood this: more in general, op could be
"not equal" -- i.e. <>. Then, once one value was inserted into the
column, all other values would have to be equal.

[...]

> That's an interesting idea: "proximity constraint". I like it because
> (a) "proximity" might reasonably be considered a more general form of
> the word "unique", which might satisfy Peter's argument; (b) it conveys
> the use case; and (c) it sounds good.

Yes, "riding" on this geometric metaphor (distance), I was trying to
visualize relations which are symmetric but not refexive and came up
with things like
 "point X is at least d1 far, at most d2 far from Y"

i.e. Y are all points which are whithin a ring centered on Y, with inner
radius d1 and outer radius d2. Special cases would be d1=0, d2>0 (that's
conventional proximity) -- but there are weirder cases, as your example
above (d2 possibly infinite, d1 small, giving a "punctured space").

> There are a couple bizarre cases where "proximity" doesn't quite fit,
> like my boolean example above, but I'm OK with that.

Hmmm.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKsT88Bcgs9XrR2kYRAlmcAJ9+rP7AkimXRPoKGaBoJkthX2LzggCfTWst
KF/XMRouhlEbQcORaeoIbc0=
=BBEF
-----END PGP SIGNATURE-----


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
I think we have a reasonable consensus around the name "operator
exclusion constraints", Robert Haas's suggestion. I am OK with that
name, and it got support from David Fetter and Tom Lane. As David Fetter
said, it's useful for the name to hint at the API.

Peter had some reasonable objections to that name, but the word "unique"
just doesn't cut it for this feature. My feature allows constraints
which are more restrictive than a unique constraint; but the final straw
was after a discussion with Tomás in which we determined that you can
also define constraints which are the opposite of unique: all values
must be the same (by using <> as the operator*).

I agree with Peter that we should support creating these constraints at
table creation time. This can be supported with the following syntax:
 CONSTRAINT foo_constr (a <op>, ...)   { USING INDEX foo_idx | USING method }

and it's also a more declarative syntax for the ALTER TABLE case, and
prevents a series of other problems that Peter pointed out.

There's an important unresolved question with this patch that I need to
address, which just came to light: what about functional/expression
indexes?

Say you have a table foo(a text, b text) and an index on: ((a||b)::circle)

You could define an operator constraint like: ((a||b)::circle &&)

and that would be sane. But I suppose I should also allow any expression
with the same signature, like: ((b||a)::circle &&)

[ not a very realistic example, but it seems like it may be useful ]

Does that make sense? Does someone have a better idea? Am I missing
other issues here?

How do I test if two functions/expressions: a. are identical? b. have matching signatures?

Regards,Jeff Davis

*: Understandably, there is no strategy for <> for most data types.
However, if your constraint is that all values must be the same, it's
quite reasonable to add one and be able to use an index to quickly find
values that are different.



Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> There's an important unresolved question with this patch that I need to
> address, which just came to light: what about functional/expression
> indexes?

What about them?  It's not clear why you think this requires anything
special.
        regards, tom lane


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sat, 2009-09-19 at 14:05 -0400, Tom Lane wrote:
> What about them?  It's not clear why you think this requires anything
> special.

>From a syntax standpoint, I need to represent one operator for every
index column involved in the constraint. So, if there's a functional
index on ((a||b)::circle), I clearly can't have an exclusion constraint
like (a =, b =).

I see two options:
1. (<expr> <op>), where <expr> is an expression over table attributes    that must have the exact signature as the
expressionfor the index.2. (<index_col> <op>), and then read the expression from the index
 

and in either case, use that expression for the extra checking that I
need to do: I need to check whether the input heap tuple conflicts with
concurrently inserting heap tuples, and I also need to do a recheck
step.

#1 seems like extra work and complexity, because I need to test for the
correct signature (maybe that's not difficult), and that extra
flexibility is pretty marginal -- I can't think of an obvious case where
you'd want different expressions. Also, it complicates the simple case
of wanting the expressions to match.

#2 is awkward because the expression columns of an index have generated
names, and you would have to write things like (pg_expression_1 &&).
Also, it makes the constraint too tied to the index, which is a valid
complaint Peter had.

Perhaps you can point me in the right direction to see if two
expressions/functions have matching signatures? Or, if that is too much
of a pain, perhaps I should just test for equal expressions?

Regards,Jeff Davis




Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Sat, 2009-09-19 at 14:05 -0400, Tom Lane wrote:
>> What about them?  It's not clear why you think this requires anything
>> special.

>> From a syntax standpoint, I need to represent one operator for every
> index column involved in the constraint. So, if there's a functional
> index on ((a||b)::circle), I clearly can't have an exclusion constraint
> like (a =, b =).

> I see two options:

>  1. (<expr> <op>), where <expr> is an expression over table attributes 
>     that must have the exact signature as the expression for the index.
>  2. (<index_col> <op>), and then read the expression from the index

You need to do (1), I think, because (2) seems to require using the
index column name.  We have generally felt that the names assigned
to index columns are implementation artifacts that the user ought not
rely on in SQL commands.

> and in either case, use that expression for the extra checking that I
> need to do: I need to check whether the input heap tuple conflicts with
> concurrently inserting heap tuples, and I also need to do a recheck
> step.

I haven't read the patch, but this whole discussion sounds to me like
it means you're trying to plug things in at the wrong level.  Indexes
generally don't care where the values they are storing came from ---
whether it's a simple column or a expression result, it's all the same
to the index.  I don't see why that shouldn't be true for exclusion
constraints too.

BTW, further betraying that I've not read the patch: what exactly are
you doing about the information_schema views?  If we are treating these
things as SQL constraints, one would expect them to show up in
information_schema; but I don't see how to represent them there in any
adequate fashion, even without the expression-index angle.  On the whole
I think we'd be a lot better off to NOT consider them to be constraints,
but just another option for CREATE INDEX.
        regards, tom lane


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sat, 2009-09-19 at 15:26 -0400, Tom Lane wrote:
> I haven't read the patch, but this whole discussion sounds to me like
> it means you're trying to plug things in at the wrong level.  Indexes
> generally don't care where the values they are storing came from ---
> whether it's a simple column or a expression result, it's all the same
> to the index.  I don't see why that shouldn't be true for exclusion
> constraints too.

The design is that one backend needs to be able to see values being
inserted by other backends before commit. There are two ways I can see
to do this:

(a) have all concurrent inserters serialize doing something like: 1. acquire exclusive LWLock 2. search index for
conflictswith dirty snapshot and recheck if    necessary 3. insert into index 4. release exclusive LWLock
 

(b) do what I do now, which is to: 1. acquire exlusive LWLock 2. put self in table of concurrent inserters, along with
TIDof heap     tuple I'm inserting 3. release exclusive LWLock 4. acquire shared LWLock 5. copy potential conflicts to
localmemory 6. release shared LWLock 7. test for real conflicts between my heap tuple and the potentially
conflictingheap tuple (which can be found by TID). 8. search index with dirty snapshot for conflicts and recheck if
necessary9. insert tuple into index10. acquire exclusive LWLock11. remove self from table of concurrent inserters12.
releaseexclusive LWLock
 

Design (b) offers better concurrency because all conflict testing, index
searching, and index insertion take place without a lock at all. So, I
chose design (b). This has been out there for quite a long time[1][2],
and if it is an unacceptable design I need to know soon in order for
this feature to make it.

However, the consequence of (b) is that ExecInsertIndexTuples needs to
know about the translation from a heap tuple to an index tuple so that
the conflicts can be checked.

> BTW, further betraying that I've not read the patch: what exactly are
> you doing about the information_schema views?  If we are treating these
> things as SQL constraints, one would expect them to show up in
> information_schema; but I don't see how to represent them there in any
> adequate fashion, even without the expression-index angle.

Nothing right now. I think they should just be omitted from
information_schema, which can only (almost by definition) represent the
lowest common denominator features.

> On the whole
> I think we'd be a lot better off to NOT consider them to be constraints,
> but just another option for CREATE INDEX.

You suggested allowing an ALTER TABLE representation[3] and that design
has floated around for quite some time as well.

ALTER TABLE also has a major advantage: multiple constraints can use the
same index. For instance, an index on (a, b, c) can be used to enforce
both (a =, b =) and (a =, c =). You can't do that with btree, and it
could be a powerful feature that might cause some people to choose my
mechanism for a regular UNIQUE constraint over btree's existing
uniqueness mechanism.

So, I actually switched over the ALTER TABLE as my primary syntactic
representation, and dropped the CREATE INDEX variant (I think that would
be worthwhile to bring back as an extra option, but I haven't yet). If I
need to drop ALTER TABLE, I need to know soon.

Regards,Jeff Davis

[1] http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php
[2] http://archives.postgresql.org/pgsql-hackers/2009-07/msg00302.php
[3] http://archives.postgresql.org/pgsql-hackers/2009-07/msg00406.php





operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sat, 2009-09-19 at 10:48 -0700, Jeff Davis wrote:
>   CONSTRAINT foo_constr (a <op>, ...)
>     { USING INDEX foo_idx | USING method }

I am updating the syntax to be:
 CONSTRAINT foo_constr   EXCLUSION (a <op>, ...) { USING method | INDEX foo_idx };

First, I think EXCLUSION makes a perfect noun to fit in that place (like
"FOREIGN KEY").

Second, this makes it possible to avoid specifying the index, and the
system can create one for you by knowing the access method. That makes
the feature a little more declarative.

However, it still doesn't provide a way to express two constraints using
one index all within CREATE TABLE, because the index would need to be
defined before the constraints in that case. I don't see that as a
problem, but Peter had the following concern:

"Another problem this would lead to is that a say dump of a table
definition wouldn't actually contain all the constraints that apply to
the table anymore, because there might be additional stuff such as this
that can't be expressed that way." [1]

I don't think that's a serious problem, I just need to ensure that
indexes referenced by a constraint are dumped before the constraint
itself. Then, I can dump the operator exclusion constraints (OXCs) as
ALTER TABLEs. The "-t" option to pg_dump appears to already dump
constraints as separate ALTER TABLEs. Is there something that I'm
missing?

Regards,Jeff Davis

[1] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01018.php  



Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> The design is that one backend needs to be able to see values being
> inserted by other backends before commit.

I don't understand why this isn't handled exactly the way unique
constraints are done now.  Frankly, the amount of added complexity you
propose below is enough to make me want to reject the patch forthwith;
given that it's going to be a relatively little-used feature, the bugs
are never going to be out of it completely if we do it like this.
        regards, tom lane


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sat, 2009-09-19 at 16:43 -0400, Tom Lane wrote:
> I don't understand why this isn't handled exactly the way unique
> constraints are done now.  Frankly, the amount of added complexity you
> propose below is enough to make me want to reject the patch forthwith;
> given that it's going to be a relatively little-used feature, the bugs
> are never going to be out of it completely if we do it like this.

Unique constraints lock the index page while the insert is happening.
How am I supposed to do that, when the conflicting values might be
anywhere in the index (circles have no total order)?

It may sound complex, but it basically boils down to a two stage
process:1. test for conflicts with concurrently-inserting backends2. test for conflicts that already exist in the index
(dirtyor not)
 

I don't think that it's ridiculously complex. In fact, I think there are
relatively few scenarios that will make any real difference, and those
scenarios can be tested with gdb pretty thoroughly.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Sat, 2009-09-19 at 16:43 -0400, Tom Lane wrote:
>> I don't understand why this isn't handled exactly the way unique
>> constraints are done now.  Frankly, the amount of added complexity you
>> propose below is enough to make me want to reject the patch forthwith;
>> given that it's going to be a relatively little-used feature, the bugs
>> are never going to be out of it completely if we do it like this.

> Unique constraints lock the index page while the insert is happening.
> How am I supposed to do that, when the conflicting values might be
> anywhere in the index (circles have no total order)?

Well, you can't do it *exactly* the same way btree does, but what
I would envision is first insert the index tuple and then do a
dirty-snapshot search for conflicting tuples.  The interlock against
conflicting concurrent inserts doesn't need all this new infrastructure
you propose; just wait to see if conflicting transactions commit, same
as we do now.  And I do maintain that that sort of code has a high risk
of undetected bugs.
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I am updating the syntax to be:

>   CONSTRAINT foo_constr
>     EXCLUSION (a <op>, ...) { USING method | INDEX foo_idx };

I'm still acutely uncomfortable with using CONSTRAINT syntax for this.
It is not a constraint per standard, because it's not going to be
displayable in information_schema.  Furthermore, by extending
standardized syntax you run the risk of being blindsided by future
additions to the standard.

> ... Peter had the following concern:

> "Another problem this would lead to is that a say dump of a table
> definition wouldn't actually contain all the constraints that apply to
> the table anymore, because there might be additional stuff such as this
> that can't be expressed that way." [1]

> I don't think that's a serious problem,

That objection is completely bogus.  pg_dump does not, and AFAIR never
has, promised to emit everything in the CREATE TABLE command.  It's far
more efficient and practical to emit indexes and constraints as separate
commands later, after the data has been loaded.  In the case of say
foreign key constraints, it's absolutely necessary to do it separately,
else you can't implement circular constraint references.  Besides, we
already have many cases where indexes have to be emitted separately
because they don't fit into the CONSTRAINT syntax: expression indexes
and nondefault opclasses to name two.

The point about being able to support multiple constraints with one
index is kind of interesting, but I don't actually think that that's
so useful that it should override all other considerations about what
syntax we should pick.  I think we should drop the whole thing and
just treat this as an extension to the CREATE INDEX syntax.
        regards, tom lane


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sat, 2009-09-19 at 18:00 -0400, Tom Lane wrote:
> Well, you can't do it *exactly* the same way btree does, but what
> I would envision is first insert the index tuple and then do a
> dirty-snapshot search for conflicting tuples.  The interlock against
> conflicting concurrent inserts doesn't need all this new infrastructure
> you propose; just wait to see if conflicting transactions commit, same
> as we do now.  And I do maintain that that sort of code has a high risk
> of undetected bugs.

How do you prevent deadlocks in the following case?

T1: inserts into index
T2: inserts into index
T1: checks index for conflicts, finds T2
T2: checks index for conflicts, finds T1

We can't say "only wait if your xid is higher" because xid 200 may both
insert and check the index before xid 100 even inserts.

The way I solve this in my current patch is by assigning a sequence
number in a shared memory table for each insert. The sequence number
works because a higher sequence number will always be able to see a
lower sequence number's tuple, so we can safely say "only wait if you
have a higher sequence number".

I can tack the same solution onto your idea, but I would need to keep my
shared memory table and probably some other infrastructure. It may be
less complex than it is currently, however. Simpler ideas welcome.

And to clarify the syntax issue, I assume this means that: ((a||b)::circle &&)

would look for the column in the index that matches that expression, and
then use that attribute number when scanning the index? I'm OK with
that; I don't see a lot of obvious value in having separate expressions
for the constraint and the index (even if it did have value, it would
take some real creativity to find it ;)

Regards,Jeff Davis



Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sat, 2009-09-19 at 18:35 -0400, Tom Lane wrote:
> I'm still acutely uncomfortable with using CONSTRAINT syntax for this.
> It is not a constraint per standard, because it's not going to be
> displayable in information_schema.  Furthermore, by extending
> standardized syntax you run the risk of being blindsided by future
> additions to the standard.

Ok.

> The point about being able to support multiple constraints with one
> index is kind of interesting, but I don't actually think that that's
> so useful that it should override all other considerations about what
> syntax we should pick.  I think we should drop the whole thing and
> just treat this as an extension to the CREATE INDEX syntax.

Perhaps ALTER INDEX ADD EXCLUSION CONSTRAINT or some other command? And
CREATE INDEX can offer the ability as a shorthand?

I would still really like to decouple this from CREATE INDEX because of
two reasons: 1. Cannot support multiple constraints per index very easily. I think     this is a significant feature.
2.Must decide to make constraint at the same time as making the     index, and once it's there, you can't remove it
withoutdropping     the index.
 

I think either of these still tie the concept to implementation, because
creating the index is always explicit. Peter seemed concerned about
that, and I think that concern is valid, but I can live with it. If we
really want them to be declarative, we could invent a new command.

Regards,Jeff Davis





Re: operator exclusion constraints [was: generalized index constraints]

From
David Fetter
Date:
On Sat, Sep 19, 2009 at 04:40:19PM -0700, Jeff Davis wrote:
> On Sat, 2009-09-19 at 18:35 -0400, Tom Lane wrote:
> > I'm still acutely uncomfortable with using CONSTRAINT syntax for this.
> > It is not a constraint per standard, because it's not going to be
> > displayable in information_schema.  Furthermore, by extending
> > standardized syntax you run the risk of being blindsided by future
> > additions to the standard.
> 
> Ok.

It just occurred to me that SQL:2008 ASSERTION might already fit this
feature. :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: generalized index constraints

From
Robert Haas
Date:
On Sat, Sep 19, 2009 at 2:51 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sat, 2009-09-19 at 14:05 -0400, Tom Lane wrote:
>> What about them?  It's not clear why you think this requires anything
>> special.
>
> >From a syntax standpoint, I need to represent one operator for every
> index column involved in the constraint. So, if there's a functional
> index on ((a||b)::circle), I clearly can't have an exclusion constraint
> like (a =, b =).
>
> I see two options:
>
>  1. (<expr> <op>), where <expr> is an expression over table attributes
>    that must have the exact signature as the expression for the index.
>  2. (<index_col> <op>), and then read the expression from the index

I was wondering if we couldn't introduce a dummy tuple name similar to
OLD and NEW, called, say, OTHER.  Then instead of writing a =, you
could write a = OTHER.a ... or perhaps a = OTHER.b ... although that
might also open the door to more things than you want to support at
this point.

...Robert


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sat, 2009-09-19 at 23:15 -0400, Robert Haas wrote:
> I was wondering if we couldn't introduce a dummy tuple name similar to
> OLD and NEW, called, say, OTHER.  Then instead of writing a =, you
> could write a = OTHER.a ... or perhaps a = OTHER.b ... although that
> might also open the door to more things than you want to support at
> this point.

Interesting idea. At this point though, there is enough disagreement
over the language that I just want to take the least-invasive route that
has the lowest chance of causing a problem. It looks like ALTER INDEX
might be the path of least resistance.

Regards,Jeff Davis



Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sat, 2009-09-19 at 19:23 -0700, David Fetter wrote:
> It just occurred to me that SQL:2008 ASSERTION might already fit this
> feature. :)

I think I would only be able to enforce very specific types of
assertions that match the template. As I said to Robert, I think I'm
going to use ALTER INDEX for the syntax because it appears to be the
path of least resistance.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Sat, 2009-09-19 at 18:00 -0400, Tom Lane wrote:
>> Well, you can't do it *exactly* the same way btree does, but what
>> I would envision is first insert the index tuple and then do a
>> dirty-snapshot search for conflicting tuples.  The interlock against
>> conflicting concurrent inserts doesn't need all this new infrastructure
>> you propose; just wait to see if conflicting transactions commit, same
>> as we do now.  And I do maintain that that sort of code has a high risk
>> of undetected bugs.

> How do you prevent deadlocks in the following case?

> T1: inserts into index
> T2: inserts into index
> T1: checks index for conflicts, finds T2
> T2: checks index for conflicts, finds T1

You get a deadlock failure, because both transactions will wait for each
other.  So what?  It's an error in any case, and you can get a reported
deadlock in constraint-enforcement scenarios today (conflicting FK
changes, for instance).
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I would still really like to decouple this from CREATE INDEX because of
> two reasons:
>   1. Cannot support multiple constraints per index very easily. I think 
>      this is a significant feature.
>   2. Must decide to make constraint at the same time as making the 
>      index, and once it's there, you can't remove it without dropping 
>      the index.

I don't actually find either of those arguments to be credible in the
least.  I don't think that people will find it useful to enforce
multiple constraints with one index, and I don't believe that they'll
design an index without knowledge of the constraint they will enforce
with it.  The closest precedent we have is the UNIQUE constraint.
How often have we had requests to add or drop UNIQUE in an existing
index?  Maybe there were more than zero, but not by a lot.

As an example of why I don't believe the first item, consider something
likecreate index ... (a = , b = )
(or whatever the syntax is to exclude equality on each column
separately).  Yeah, it will work, but have you considered the efficiency
implications?  Searching such an index for b, independently of a, is
going to suck to such an extent that you'd be *far* better off building
two separate indexes.  We do not have, and are unlikely ever to have,
index types in which a search that doesn't constrain the first index
column is efficient.
        regards, tom lane


Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 12:31 -0400, Tom Lane wrote:
> > T1: inserts into index
> > T2: inserts into index
> > T1: checks index for conflicts, finds T2
> > T2: checks index for conflicts, finds T1
> 
> You get a deadlock failure, because both transactions will wait for each
> other.  So what?  It's an error in any case, and you can get a reported
> deadlock in constraint-enforcement scenarios today (conflicting FK
> changes, for instance).

Well, in theory, one of the transactions may have been destined to be
aborted later anyway, and the deadlock detector might kill the wrong
one. But I agree, perhaps I'm over-thinking this one.

Aside: I just realized that my solution to the deadlock problem won't
work with your simpler idea anyway. When reading the index we've long
since lost the information about the specific insert of the specific
command of the other transaction.

I'll make the change.

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Tom Lane
Date:
BTW, I just thought of an issue that might change some of these
conclusions: what about supporting deferred constraint checking,
as we just recently managed to do for straight UNIQUE constraints?
I don't say that this has to be in the first cut, but it's something
we ought to try to leave room for down the road.

The current infrastructure for deferred uniqueness requires that the
thing actually be a constraint, with an entry in pg_constraint that
can carry the deferrability options.  So unless we want to rethink
that, this might be a sufficient reason to override my arguments
about not wanting to use CONSTRAINT syntax.

As far as implementation goes, I think there would be very little
choice but to use the insert-the-index-entry-first, check-later
approach; so your ideas involving extra state in shared memory
seem to fall to the ground anyhow.
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 12:45 -0400, Tom Lane wrote:
> How often have we had requests to add or drop UNIQUE in an existing
> index?  Maybe there were more than zero, but not by a lot.

Ok.

> As an example of why I don't believe the first item, consider something
> like
>     create index ... (a = , b = )
> (or whatever the syntax is to exclude equality on each column
> separately).  Yeah, it will work, but have you considered the efficiency
> implications?  Searching such an index for b, independently of a, is
> going to suck to such an extent that you'd be *far* better off building
> two separate indexes.  We do not have, and are unlikely ever to have,
> index types in which a search that doesn't constrain the first index
> column is efficient.

My use case was something else:

An index on (a, b, c) enforcing the constraints UNIQUE(a, b) and
UNIQUE(a, c).

UNIQUE(a, b) can be enforced efficiently. UNIQUE(a, c) might be less
efficient depending on the selectivity of "a", but as long as "a" is
selective I think it's useful. The alternative is updating two indices
on every insert.

You may still think this use case is too marginal to bother supporting,
but I never made an argument for the use case you described above.

If we move away from multiple constraints per index, are you suggesting
that I also move the constraints out of pg_constraint and back into
pg_index?

Regards,Jeff Davis



Re: WIP: generalized index constraints

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 13:01 -0400, Tom Lane wrote:
> The current infrastructure for deferred uniqueness requires that the
> thing actually be a constraint, with an entry in pg_constraint that
> can carry the deferrability options.  So unless we want to rethink
> that, this might be a sufficient reason to override my arguments
> about not wanting to use CONSTRAINT syntax.

Ok. Using the word EXCLUSION would hopefully guard us against future
changes to SQL, but you know more about the subtle dangers of language
changes than I do.

So, do I still omit it from information_schema?

> As far as implementation goes, I think there would be very little
> choice but to use the insert-the-index-entry-first, check-later
> approach; so your ideas involving extra state in shared memory
> seem to fall to the ground anyhow.

True.

Regards,Jeff Davis



Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> My use case was something else:

> An index on (a, b, c) enforcing the constraints UNIQUE(a, b) and
> UNIQUE(a, c).

> UNIQUE(a, b) can be enforced efficiently. UNIQUE(a, c) might be less
> efficient depending on the selectivity of "a", but as long as "a" is
> selective I think it's useful. The alternative is updating two indices
> on every insert.

> You may still think this use case is too marginal to bother supporting,
> but I never made an argument for the use case you described above.

You're right, it still seems remarkably marginal.  I'm rethinking
my position on use of CONSTRAINT syntax because of the deferrability
issue, but I'm still unconvinced that we need to allow the constraints
to be decoupled from the indexes.
        regards, tom lane


Re: WIP: generalized index constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> So, do I still omit it from information_schema?

My thought is yes --- any representation of it within information_schema
would be so inaccurate/incomplete as to be worse than useless, IMO.
Peter might have a different idea though ...
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 13:13 -0400, Tom Lane wrote:
> You're right, it still seems remarkably marginal.  I'm rethinking
> my position on use of CONSTRAINT syntax because of the deferrability
> issue, but I'm still unconvinced that we need to allow the constraints
> to be decoupled from the indexes.

Ok, should I explicitly disallow multiple constraints on one index then?

Regards,Jeff Davis



Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Sun, 2009-09-20 at 13:13 -0400, Tom Lane wrote:
>> You're right, it still seems remarkably marginal.  I'm rethinking
>> my position on use of CONSTRAINT syntax because of the deferrability
>> issue, but I'm still unconvinced that we need to allow the constraints
>> to be decoupled from the indexes.

> Ok, should I explicitly disallow multiple constraints on one index then?

What I'm arguing for is a syntax in which the question doesn't even
arise, ie, a CONSTRAINT doesn't reference an existing index at all.
If that's not possible for whatever reason, then I think that
disallowing multiple references isn't going to buy any simplicity.
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 13:28 -0400, Tom Lane wrote:
> What I'm arguing for is a syntax in which the question doesn't even
> arise, ie, a CONSTRAINT doesn't reference an existing index at all.
> If that's not possible for whatever reason, then I think that
> disallowing multiple references isn't going to buy any simplicity.

I believe that syntax is possible by specifying the index access method,
e.g.:
 CONSTRAINT <name> EXCLUSION (a =, b &&) USING gist;

versus:
 CONSTRAINT <name> EXCLUSION (a =, b &&) INDEX <indexname>;

And the former could build the index implicitly. I haven't written the
code yet, but I don't see any major problems.

So, should I eliminate the latter syntax and only support the former, or
should I support both?

Regards,Jeff Davis





Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I believe that syntax is possible by specifying the index access method,
> e.g.:

>   CONSTRAINT <name> EXCLUSION (a =, b &&) USING gist;

> versus:

>   CONSTRAINT <name> EXCLUSION (a =, b &&) INDEX <indexname>;

> And the former could build the index implicitly. I haven't written the
> code yet, but I don't see any major problems.

> So, should I eliminate the latter syntax and only support the former, or
> should I support both?

I'd vote for only supporting the former.

What worries me more about that syntax is the postfix-operator ambiguity
--- I think it'll be hard to expand it to expressions.  It might be
better to put the operator at the front; or maybe you need an extra
keyword in there.
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 13:49 -0400, Tom Lane wrote:
> I'd vote for only supporting the former.

Ok.

I just did some brief non-scientific in-memory benchmarks. I think it
has promise, but for now I think it can safely be set aside.

Results appended.

> What worries me more about that syntax is the postfix-operator ambiguity
> --- I think it'll be hard to expand it to expressions.  It might be
> better to put the operator at the front; or maybe you need an extra
> keyword in there.

How about "OPERATOR", like:
 CONSTRAINT <name>   EXCLUSION (<expr> OPERATOR <op>, ...)   USING <method>;

I like it because it connects back to the name "operator exclusion
constraint".

Regards,Jeff Davis

---------------------------
Results (oversimplified benchmark):

As a control, two unique btrees (using old uniqueness mechanism) takes
37s.

DDL (old syntax, haven't changed it yet):
 create table one(a int, b int, c int); create index one_a_b_c_idx on one(a,b,c); alter table one add constraint
one_a_b_constr  exclusion (a =, b =) using one_a_b_c_idx; alter table one add constraint one_a_c_constr   exclusion (a
=,c =) index one_a_b_c_idx;
 
 create table two(a int, b int, c int); create index two_a_b_idx on two(a,b); create index two_a_c_idx on two(a,c);
altertable two add constraint two_a_c_constr   exclusion (a =, c =) index two_a_c_idx; alter table two add constraint
two_a_b_constr  exclusion (a =, b =) index two_a_b_idx;
 

Tests are of the form:
 -- test inserting into table with one big index with 10 "b"  -- values per "a" value insert into one select g1, g2, g2
  from generate_series(1,100000) g1, generate_series(1,10) g2;
 

n: number of "a" values per "b" value
t1: results for one-index solution
t2: results for two-index solution
   n     t1     t2
-------+------+------- 1000 | 105s | 57s  100 |  47s | 54s   10 |  44s | 53s    1 |  42s | 56s

So, the one-index solution shows about 10-20% benefit over the two-index
solution when the number of "b" values per "a" value drops to around
100. Not bad, but nothing to write home about, because it's still
outperformed by the existing btree enforcement mechinism. I think it has
promise for some situations though; such as larger key size, leaf pages
not in memory, etc.



operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
Update on operator exclusion constraints (OXC for short):

After a lot of discussion, I think a lot of progress has been made. Here
is my summary, please let me know if I've left anything out or not
addressed some concern.

1. Constraint syntax, part of CREATE/ALTER TABLE:
 [CONSTRAINT <name>] EXCLUSION (<expr> OPERATOR <op>, ...)   USING <method>   [ DEFERRABLE | NOT DEFERRABLE ]   [
INITIALLYDEFERRED | INITIALLY IMMEDIATE ];
 

Table constraint syntax was chosen because of the ability to support
DEFERRABLE[1] and the interest in a more declarative syntax[2].

We omitted the [INDEX <indexname>] clause because the usefulness of
defining multiple constraints using one index or defining the constraint
separately from the index was judged to be too marginal[3][4][5]. Some
brief benchmarks showed some promise[6], perhaps interesting to explore
later.

Also, we introduce the OPERATOR keyword in between the expression and
the operator to disambiguate the syntax[5]. Nobody has affirmed the use
of OPERATOR for the disambiguation, but it seems like the obvious choice
to me.

2. information_schema

We omit operator exclusion constraints from the information schema, on
the grounds that there is no way to represent them usefully there[7][8].

3. Simplify the constraint checking procedure itself

Tom suggested a simpler constraint-checking procedure[9]. It introduces
the rare possibility of deadlocks[10], but that possibility exists for
other constraints anyway[11]. My scheme for avoiding deadlocks was
significantly more complex, and would become even more complex for
deferrable constraints.

4. <expr> is an expression over the table's attributes and will be used
to generate a functional index with the same expression to enforce the
constraint.

5. We reject non-symmetric operators[12], like >, but allow
non-reflexive operators[13] like <>.

6. Semantics of constraint[14] are such that for any two tuples A and B,
and for a constraint:
 EXCLUSION (e1 OPERATOR <op1>, ..., eN OPERATOR <opN>)

the constraint is violated if:
 A.e1 <op1> B.e1 AND ...             AND A.eN <opN> B.eN

7. LIKE is still unresolved. I don't have a strong opinion here.

When INCLUDING CONSTRAINTS and INCLUDING INDEXES are both specified: a. copy all OXCs and indexes b. copy no OXCs or
indexes
When INCLUDING CONSTRAINTS is specified but not INCLUDING INDEXES: a. copy all OXCs and indexes b. copy no OXCs or
indexes
When INCLUDING INDEXES is specified but not INCLUDING CONSTRAINTS: a. copy all OXCs, including indexes b. copy all
indexescreated implicitly for OXCs, but not the     constraints themselves c. copy no OXCs or indexes
 

We can also emit various types of messages if we think the user is
making a mistake.

UNIQUE behavior here doesn't provide a good cue, because the constraint
is implemented inside the index, so copying the index does copy the
constraint. Brendan made a strong argument[15] that the behavior of LIKE
with UNIQUE is wrong, but I don't know if we want to try to fix that
now. I'd like some more input before I actually take care of this item.



The rest of the issues were mostly non-controversial. I will start
making some of these changes and post an updated patch and TODO list.

Regards,Jeff Davis




[1] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01352.php
[2] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01018.php
[3] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01348.php
[4] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php
[5] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01360.php
[6] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01369.php
[7] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01310.php
[8] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01356.php
[9] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01315.php
[10] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01317.php
[11] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01347.php
[12] http://archives.postgresql.org/pgsql-hackers/2009-09/msg00977.php
[13] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01039.php
[14] http://archives.postgresql.org/pgsql-hackers/2009-09/msg00971.php
[15] http://archives.postgresql.org/pgsql-hackers/2009-09/msg00755.php



Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> 1. Constraint syntax, part of CREATE/ALTER TABLE:

>   [CONSTRAINT <name>] EXCLUSION (<expr> OPERATOR <op>, ...)

Have you actually built this grammar?  I don't think it avoids the
problem, because OPERATOR is possible within a_expr.

Also, don't forget the possibility of wanting a nondefault opclass.
(I'm wondering a bit if anyone will want a WHERE clause, too, though
adding that later shouldn't pose any big syntactic obstacles.)

> Brendan made a strong argument[15] that the behavior of LIKE
> with UNIQUE is wrong, but I don't know if we want to try to fix that
> now. I'd like some more input before I actually take care of this item.

That's really a separate issue, but I think we need to do something to
make it more consistent.  My first thought is that anything made
via CONSTRAINT syntax ought to be copied by LIKE INCLUDING CONSTRAINTS,
while LIKE INCLUDING INDEXES should copy anything you made via CREATE
INDEX.  But note this assumes that there is a clear distinction between
the two.  The constraint-depending-on-index design that you started
with would not permit such a rule, or at least it would mean that
INCLUDING CONSTRAINTS EXCLUDING INDEXES would have failure cases.
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 17:54 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > 1. Constraint syntax, part of CREATE/ALTER TABLE:
> 
> >   [CONSTRAINT <name>] EXCLUSION (<expr> OPERATOR <op>, ...)
> 
> Have you actually built this grammar?  I don't think it avoids the
> problem, because OPERATOR is possible within a_expr.
> 
> Also, don't forget the possibility of wanting a nondefault opclass.
> (I'm wondering a bit if anyone will want a WHERE clause, too, though
> adding that later shouldn't pose any big syntactic obstacles.)

I suppose I should just allow any index_elem. The only way I was able to
make the grammar for that work is by using a reserved keyword. The
possibilities that make the most sense to me are:
 index_elem WITH any_operator index_elem WITH OPERATOR any_operator index_elem CHECK any_operator index_elem CHECK
OPERATORany_operator
 

Do any of these look acceptable?

Also, I should allow for a tablespace, as well. Because it's specified
with UNIQUE as "USING INDEX TABLESPACE foo", to be consistent I need to
move the "USING method" ahead like so:
 CONSTRAINT <name> EXCLUSION [USING method]   (<index_elem> CHECK <op>, ...)   [USING INDEX TABLESPACE
<tablespacename>]  [DEFERRABLE | NOT DEFERRABLE ]   [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 

Having the method before the attribute list makes it more consistent
with CREATE INDEX, as well.

> That's really a separate issue, but I think we need to do something to
> make it more consistent.  My first thought is that anything made
> via CONSTRAINT syntax ought to be copied by LIKE INCLUDING CONSTRAINTS,
> while LIKE INCLUDING INDEXES should copy anything you made via CREATE
> INDEX.

Works for me.

> But note this assumes that there is a clear distinction between
> the two.  The constraint-depending-on-index design that you started
> with would not permit such a rule, or at least it would mean that
> INCLUDING CONSTRAINTS EXCLUDING INDEXES would have failure cases.

Sounds reasonable. If we decide to support that kind of thing in the
future, we can handle that case somehow (an error seems reasonable to
me).

Regards,Jeff Davis



Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I suppose I should just allow any index_elem. The only way I was able to
> make the grammar for that work is by using a reserved keyword. The
> possibilities that make the most sense to me are:

>   index_elem WITH any_operator
>   index_elem WITH OPERATOR any_operator
>   index_elem CHECK any_operator
>   index_elem CHECK OPERATOR any_operator

> Do any of these look acceptable?

I'd vote for "CHECK", out of that list.  WITH has no mnemonic value
whatever.

I'm not that thrilled with CHECK either, mainly because it seems like
it ought to check that the operator condition *does* hold, whereas
you're going to check that it *doesn't* hold.  But perhaps the EXCLUSION
up front will be enough to set people straight.

BTW, are you sure EXCLUSION doesn't have to become a reserved word for
this?  I notice that FOREIGN, CHECK, and UNIQUE all are, which makes me
suspicious ...
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 19:42 -0400, Tom Lane wrote:
> BTW, are you sure EXCLUSION doesn't have to become a reserved word for
> this?  I notice that FOREIGN, CHECK, and UNIQUE all are, which makes me
> suspicious ...

All of those (except FOREIGN) can be used as a column constraint as
well, and that might be necessary for a reason similar to the reason I
need to use a reserved word (i.e. they can come after a DEFAULT
expression). Is it possible that FOREIGN doesn't really have to be a
reserved word, but was just included because the others were?

I'm not an expert on the matter, but it does appear to compile and
recognize the grammar with EXCLUSION as an unreserved keyword. I'm in
the middle of changing a lot of things around, so I can't say that it
works beyond that.

Regards,Jeff Davis





Re: WIP: generalized index constraints

From
Peter Eisentraut
Date:
On Sun, 2009-09-20 at 10:08 -0700, Jeff Davis wrote:
> On Sun, 2009-09-20 at 13:01 -0400, Tom Lane wrote:
> > The current infrastructure for deferred uniqueness requires that the
> > thing actually be a constraint, with an entry in pg_constraint that
> > can carry the deferrability options.  So unless we want to rethink
> > that, this might be a sufficient reason to override my arguments
> > about not wanting to use CONSTRAINT syntax.
> 
> Ok. Using the word EXCLUSION would hopefully guard us against future
> changes to SQL, but you know more about the subtle dangers of language
> changes than I do.
> 
> So, do I still omit it from information_schema?

I would say yes.

Overall, I think this terminology is pretty good now.  We could say,
PostgreSQL has a new constraint type, exclusion constraint.  It shares
common properties with other constraint types, e.g., deferrability (in
the future), ADD/DROP CONSTRAINT, etc.  But because the standard does
not describe exclusion constraints, they are not listed in the
information schema.



Re: operator exclusion constraints [was: generalized index constraints]

From
Peter Eisentraut
Date:
On Sun, 2009-09-20 at 19:42 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > I suppose I should just allow any index_elem. The only way I was able to
> > make the grammar for that work is by using a reserved keyword. The
> > possibilities that make the most sense to me are:
> 
> >   index_elem WITH any_operator
> >   index_elem WITH OPERATOR any_operator
> >   index_elem CHECK any_operator
> >   index_elem CHECK OPERATOR any_operator
> 
> > Do any of these look acceptable?
> 
> I'd vote for "CHECK", out of that list.  WITH has no mnemonic value
> whatever.

Using CHECK as part of the syntax of an EXCLUSION constraint will surely
confuse the whole thing with CHECK constraints.

USING OPERATOR is available, I think.




Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Wed, 2009-09-23 at 15:10 +0300, Peter Eisentraut wrote:
> Using CHECK as part of the syntax of an EXCLUSION constraint will surely
> confuse the whole thing with CHECK constraints.
> 
> USING OPERATOR is available, I think.

USING won't work because one of the ways to specify the opclass in an
index_elem is something like:
 CREATE INDEX foo_idx on foo (i USING int4_ops);

which appears to be undocumented, and it's not obvious to me why that is
useful. The normal way is just:
 CREATE INDEX foo_idx on foo (i int4_ops);

Because I am allowing any index_elem for exclusion constraints, that
conflicts with the word USING.

We can either eliminate the USING variant from opt_class (unless it's
necessary for some reason or I missed it in the documentation), or we
can use another word (e.g. WITH or WITH OPERATOR) if you don't like
CHECK.

Regards,Jeff Davis



Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> We can either eliminate the USING variant from opt_class (unless it's
> necessary for some reason or I missed it in the documentation), or we
> can use another word (e.g. WITH or WITH OPERATOR) if you don't like
> CHECK.

Hmm ... we don't seem to have documented the USING noise-word, so it
probably would be safe to remove it; but why take a chance?  I don't
particularly agree with Peter's objection to CHECK.  There are plenty
of examples in SQL of the same keyword being used for different purposes
in nearby places.  Indeed you could make about the same argument to
object to USING, since it'd still be there in "USING access_method"
elsewhere in the same command.

I think that USING is just about as content-free as WITH in this
particular example --- it doesn't give you any hint about what the
purpose of the operator is.
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Robert Haas
Date:
On Wed, Sep 23, 2009 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> We can either eliminate the USING variant from opt_class (unless it's
>> necessary for some reason or I missed it in the documentation), or we
>> can use another word (e.g. WITH or WITH OPERATOR) if you don't like
>> CHECK.
>
> Hmm ... we don't seem to have documented the USING noise-word, so it
> probably would be safe to remove it; but why take a chance?  I don't
> particularly agree with Peter's objection to CHECK.  There are plenty
> of examples in SQL of the same keyword being used for different purposes
> in nearby places.  Indeed you could make about the same argument to
> object to USING, since it'd still be there in "USING access_method"
> elsewhere in the same command.
>
> I think that USING is just about as content-free as WITH in this
> particular example --- it doesn't give you any hint about what the
> purpose of the operator is.

USING might be just as content-free as WITH, but USING OPERATOR seems
clearly better, at least IMO.

Also, this patch has not been updated in a week, and the clock is
ticking: if we don't have an updated version RSN, we need to move this
to Returned with Feedback and wait until next CommitFest.  That would
be too bad; this is an awesome feature.

Thanks,

...Robert


Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 23, 2009 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think that USING is just about as content-free as WITH in this
>> particular example --- it doesn't give you any hint about what the
>> purpose of the operator is.

> USING might be just as content-free as WITH, but USING OPERATOR seems
> clearly better, at least IMO.

It's not enough better to justify the conflict with USING opclass, IMO.

An idea that just struck me is CHECK WITH, ie
EXCLUSION (expr CHECK WITH operator)
        regards, tom lane


Re: operator exclusion constraints [was: generalized index constraints]

From
Robert Haas
Date:
On Sun, Sep 27, 2009 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Sep 23, 2009 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think that USING is just about as content-free as WITH in this
>>> particular example --- it doesn't give you any hint about what the
>>> purpose of the operator is.
>
>> USING might be just as content-free as WITH, but USING OPERATOR seems
>> clearly better, at least IMO.
>
> It's not enough better to justify the conflict with USING opclass, IMO.
>
> An idea that just struck me is CHECK WITH, ie
>
>        EXCLUSION (expr CHECK WITH operator)

I don't like that as well as USING OPERATOR, but I like it far better
than any of the single-word choices, so maybe it's a reasonable
compromise.

...Robert


Re: operator exclusion constraints

From
Jeff Davis
Date:
Attached is a new patch. I ran it through filterdiff, but in case that
didn't work for some reason, I attached a gzipped version of the
original unified diff produced by git.

 * Changed underlying algorithm to match Tom's suggestion: do the second
index lookup after already inserting into the first.
 * Language changes, including the latest "<expression> CHECK WITH <op>"
idea from Tom, seconded by Robert Haas.
 * automatically builds index for you, no need to create it separately,
just specify the index AM (or let it default to btree)
 * Only one constraint per index is allowed, and the index is treated
entirely as an internal implementation detail.
 * Support for predicates (partial constraints/partial index)
 * Support for expressions
 * Support for other index options, like WITH list and USING INDEX
TABLESPACE
 * Docs updated and improved
 * Tests updated
 * Full recheck support (the previous recheck mechanism didn't work for
expressions)
 * Make information_schema ignore operator exclusion constraints
 * error message improvements

When testing/reviewing, use the documentation from CREATE TABLE, but use
the ALTER TABLE variant instead. Right now the CREATE TABLE variant
doesn't work (see below).

There is still a significant TODO list:

 * CREATE TABLE -- right now, it only works for ALTER TABLE, and the
docs are lying. Coming soon.

 * psql - haven't updated it to keep up with the language changes

 * pg_dump

 * LIKE

 * Inheritance

 * Enforce on existing tuples when the constraint is created -- This is
intertwined with inheritance, I think, and I am still working on that.
Obviously, this is an important TODO item to get the patch ready for
commit.

 * Deferrability (optional for now) -- I need the trigger to be able to
perform the check as well. It looks like it has most of the information
necessary, but I'm trying to determine where would be the cleanest place
to export the constraint checking function so that it can be called by
the trigger as well as ExecInsertIndexTuples and the bulk checker (that
checks existing tuples at the time the constraint is added).

 * GIN support (optional for now) -- I need to create a gingettuple
method. It would have to be a wrapper around gingetbitmap, and would not
be any more efficient than gingetbitmap, but it would allow my patch to
work for GIN indexes.

I think I've made some progress this commitfest, both in terms of
decisions made (thanks to everyone who provided input and direction),
and also in terms of code. I would still appreciate more review during
this commitfest, but realistically, it will still be at least another
week before I can say that I'm done with all open items.

Regards,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Robert Haas
Date:
On Sun, Sep 27, 2009 at 5:47 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Attached is a new patch. I ran it through filterdiff, but in case that
> didn't work for some reason, I attached a gzipped version of the
> original unified diff produced by git.
>
>  * Changed underlying algorithm to match Tom's suggestion: do the second
> index lookup after already inserting into the first.
>  * Language changes, including the latest "<expression> CHECK WITH <op>"
> idea from Tom, seconded by Robert Haas.
>  * automatically builds index for you, no need to create it separately,
> just specify the index AM (or let it default to btree)
>  * Only one constraint per index is allowed, and the index is treated
> entirely as an internal implementation detail.
>  * Support for predicates (partial constraints/partial index)
>  * Support for expressions
>  * Support for other index options, like WITH list and USING INDEX
> TABLESPACE
>  * Docs updated and improved
>  * Tests updated
>  * Full recheck support (the previous recheck mechanism didn't work for
> expressions)
>  * Make information_schema ignore operator exclusion constraints
>  * error message improvements
>
> When testing/reviewing, use the documentation from CREATE TABLE, but use
> the ALTER TABLE variant instead. Right now the CREATE TABLE variant
> doesn't work (see below).
>
> There is still a significant TODO list:
>
>  * CREATE TABLE -- right now, it only works for ALTER TABLE, and the
> docs are lying. Coming soon.
>
>  * psql - haven't updated it to keep up with the language changes
>
>  * pg_dump
>
>  * LIKE
>
>  * Inheritance
>
>  * Enforce on existing tuples when the constraint is created -- This is
> intertwined with inheritance, I think, and I am still working on that.
> Obviously, this is an important TODO item to get the patch ready for
> commit.
>
>  * Deferrability (optional for now) -- I need the trigger to be able to
> perform the check as well. It looks like it has most of the information
> necessary, but I'm trying to determine where would be the cleanest place
> to export the constraint checking function so that it can be called by
> the trigger as well as ExecInsertIndexTuples and the bulk checker (that
> checks existing tuples at the time the constraint is added).
>
>  * GIN support (optional for now) -- I need to create a gingettuple
> method. It would have to be a wrapper around gingetbitmap, and would not
> be any more efficient than gingetbitmap, but it would allow my patch to
> work for GIN indexes.
>
> I think I've made some progress this commitfest, both in terms of
> decisions made (thanks to everyone who provided input and direction),
> and also in terms of code. I would still appreciate more review during
> this commitfest, but realistically, it will still be at least another
> week before I can say that I'm done with all open items.

In that case, I think we should target this for the next CommitFest.
Especially given the number and complexity of the patches remaining
for this CommitFest, I feel very uncomfortable with the idea of
waiting another week for a new patch version, and then possibly still
needing further changes before it is finally committed.   While we
allow patches to be resubmitted for the same CommitFest, this is
intended to be for minor adjustments, not significant rewrites.

So I'm going to mark this Returned with Feedback.

...Robert


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sun, 2009-09-27 at 21:38 -0400, Robert Haas wrote:
> In that case, I think we should target this for the next CommitFest.
> Especially given the number and complexity of the patches remaining
> for this CommitFest, I feel very uncomfortable with the idea of
> waiting another week for a new patch version, and then possibly still
> needing further changes before it is finally committed.   While we
> allow patches to be resubmitted for the same CommitFest, this is
> intended to be for minor adjustments, not significant rewrites.

OK, I expected that to be the case. I got significant feedback at the
beginning of this commitfest that required some substantial language
changes. I did find this commitfest extremely productive for my feature.

Right now I'm trying to provide some useful feedback to Paval for his
patch.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Robert Haas
Date:
On Sun, Sep 27, 2009 at 10:13 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sun, 2009-09-27 at 21:38 -0400, Robert Haas wrote:
>> In that case, I think we should target this for the next CommitFest.
>> Especially given the number and complexity of the patches remaining
>> for this CommitFest, I feel very uncomfortable with the idea of
>> waiting another week for a new patch version, and then possibly still
>> needing further changes before it is finally committed.   While we
>> allow patches to be resubmitted for the same CommitFest, this is
>> intended to be for minor adjustments, not significant rewrites.
>
> OK, I expected that to be the case. I got significant feedback at the
> beginning of this commitfest that required some substantial language
> changes. I did find this commitfest extremely productive for my feature.

Excellent, glad to hear it.

> Right now I'm trying to provide some useful feedback to Paval for his
> patch.

Thanks, I deeply appreciate that.  I believe that there are 29 people
who submitted patches for this CommitFest, and that 4 of them are
reviewing, yourself included.  Furthermore, patches and feature
proposals from people who are not themselves helping with the
CommitFest have continued to roll in during this CommitFest.
Personally, I find this quite objectionable.  Apparently, CommitFest
no longer means a time when people put aside their own patches to
review those of others; it seems now to mean a time when 87% of the
patch authors either continue development or ignore the CommitFest
completely.

Fortunately, a number of very competent people who did NOT submit
patches nevertheless volunteered to help review, so we may be OK.  But
I am not sure this is a very sustainable solution.  If everyone who
submitted a pach for this CF had also reviewed one, every patch would
now have a review and there would even be enough reviewers for major
patches to have two each.  Instead, we are still struggling to get
every patch looked at once.

...Robert


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sun, 2009-09-27 at 22:40 -0400, Robert Haas wrote:
> Apparently, CommitFest
> no longer means a time when people put aside their own patches to
> review those of others; it seems now to mean a time when 87% of the
> patch authors either continue development or ignore the CommitFest
> completely.

Well, I'm not completely innocent here, either. I also spent time making
progress on my patch, both in terms of code and discussion so that I
would at least have enough information to get it ready during the next
development cycle.

We don't have a clear "design review" stage that allows developers an
opportunity to get thorough feedback during the development cycle, so
the commitfest is also the design review stage by default. I got some
comments when I posted my design on 8/16, but it didn't really get
hashed out until a month later when the commitfest was underway.

The ideal is to propose, design, implement, and then submit for the
commitfest. The reality is that the commitfest is often the first time
the developer gets thorough feedback on the design. So, as a developer,
I'm hesitant to polish a patch before the commitfest because I know
significant changes will be required. Hopefully that didn't waste too
much of Brendan's time.

That's just an observation from my experience with my patch. I know it's
easy to point at little inefficiencies from afar, so I'm not suggesting
we change our process. Overall, real progress is being made for the
project in general and my patch in particular, so I'm more than willing
to set my minor frustrations aside as long as that continues.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Robert Haas
Date:
On Sun, Sep 27, 2009 at 11:31 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sun, 2009-09-27 at 22:40 -0400, Robert Haas wrote:
>> Apparently, CommitFest
>> no longer means a time when people put aside their own patches to
>> review those of others; it seems now to mean a time when 87% of the
>> patch authors either continue development or ignore the CommitFest
>> completely.
>
> Well, I'm not completely innocent here, either. I also spent time making
> progress on my patch, both in terms of code and discussion so that I
> would at least have enough information to get it ready during the next
> development cycle.

I don't see any problem with that.  As long as everyone is willing to
spend SOME time on their own patch and SOME time reviewing the work of
others, the system works.  After all, the time to review a patch is,
IME, far shorter than the time to develop one.  But right now we have
a large majority of patch authors who are not contributing ANYTHING to
the CommitFest, and that's not so good.

...Robert


Re: operator exclusion constraints [was: generalized index constraints]

From
Jeff Davis
Date:
On Sun, 2009-09-20 at 17:54 -0400, Tom Lane wrote:
> (I'm wondering a bit if anyone will want a WHERE clause, too, though
> adding that later shouldn't pose any big syntactic obstacles.)

Where should I put the WHERE clause? My current syntax (with patch) is:

[ CONSTRAINT constraint_name ] EXCLUSION [USING index_method] (expression CHECK WITH operator [, ...]) index_parameters
}
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
[ WHERE predicate ]

That's a little awkward to document, because WHERE is only supported for
operator exclusion constraints, so it doesn't just fit along side CHECK
and FOREIGN KEY. My only concern is that it would make the CREATE TABLE
syntax slightly harder to read.

We could allow the WHERE clause to be syntactically correct for all the
other constraints, but throw a "not implemented" error if they try to
use it. I'm not sure if that fits nicely with the spec or not.

I tried to move the WHERE clause right before or after the
index_parameters, but that resulted in shift/reduce conflicts.

Thoughts?

Regards,Jeff Davis



Re: operator exclusion constraints [was: generalized index constraints]

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I tried to move the WHERE clause right before or after the
> index_parameters, but that resulted in shift/reduce conflicts.

You could avoid the conflicts in at least two ways:

* require parens around the WHERE expression

* stick the WHERE inside the EXCLUSION ( ... ) bit, so that its
expression is terminated by the outer right paren.

Not sure if either of these is less ugly than putting it at the end,
though :-(.  They both seem a bit surprising.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
Here's another WIP patch for operator exclusion constraints (both
patches are the same, but one is a context diff made with filterdiff).

It's almost done, and the only reason I'm posting this now is because I
see additional work being done for scalable deferred unique checks, and
I'd like to make sure that we're not interfering with eachother.

Completed Items:

 * support for deferred constraints
 * enforce constraint when added to a table with existing data
 * change predicate to be before DEFERRABLE clause, and require
perentheses around predicate expression
 * doc and test updates
 * make LIKE and INHERITS ignore operator exclusion constraints (there
was some discussion that we should make LIKE behave more consistently,
but I think that's for a different patch)

Open Items:

 * psql support
 * pg_dump support
 * prevent altering a column that's part of an exclusion constraint in a
way that might cause the exclusion constraint to be violated
 * self-review before RRR

Regards,
    Jeff Davis


Attachment

Re: operator exclusion constraints

From
Jeff Davis
Date:
New patch attached (again, both attachments are the same, but one was
passed through filterdiff to get a context diff).

Done:
 * fixed the problem Dean Rasheed found using his test script
 * some cleanup

Items remaining:
 * psql
 * pg_dump
 * self-review before RRR
 * performance testing

Regards,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Jeff Davis
Date:
New patch attached. Again, both attached patches are the same, but one
is the original patch from "git diff", compressed; and the other is the
same patch passed through filterdiff to make it a context diff. Also
available from my git repo (branch "operator-exclusion-constraints"):

http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=shortlog;h=refs/heads/operator-exclusion-constraints


All open issues are closed; ready for review. Comments welcome.


Completed:

 * pg_dump support
 * psql support
 * clean up after brief review

Simple Performance Tests:

I used Dean Rasheed's python script from this email:

  http://archives.postgresql.org/pgsql-hackers/2009-10/msg01527.php

modifying it slightly for the various combinations listed below.

All tests with 10 threads and 1000 loops. The constraint is always
uniqueness (with the operator exclusion constraints that means "CHECK
WITH ="). All results are the approximate average of 3 consecutive runs.
All of these are high-contention tests because they are all competing
over 100 unique values.

[ "btree" means the existing constraint enforcement mechanism in btree,
although the exclusion constraints use a btree as well in this test ]

btree/immediate:       14s
exclusion/immediate:   90s

btree/deferrable:     155s
exclusion/deferrable: 150s

The next tests I did were simple no-contention insert tests of 1M
integer values. I only tested with immediate.

btree/immediate:       15s
exclusion/immediate:   19s

So, the performance is marginally worse for no-contention, about the
same for high-contention deferrable, and significantly worse for
high-contention immediate.

I think that the performance for high-contention immediate could be
improved by being a little more clever to avoid restarting the index
scan on every conflict. However, I've already made a mistake in that
area (as Dean pointed out), so I'll leave it simple for now.

Right as the November commitfest starts, I'll be in Japan for about a
week, so I may be slow to respond to comments during that time.

Regards,
    Jeff Davis


Attachment

Re: operator exclusion constraints

From
Simon Riggs
Date:
On Sun, 2009-11-01 at 10:53 -0800, Jeff Davis wrote:
> New patch attached. 

After reading the docs in the patch I don't believe you're going to all
this trouble to ensure two circles don't overlap. Can you give some
better examples of what you're trying to achieve and why anyone else
would care? (I'm busy, so are others).

I can probably guess, but my feeling is I shouldn't have to. I feel like
this might be a truly great feature, but I'm worried that either it
isn't at all or it is and yet will be overlooked. Does this project link
in with other planned developments in various plugins? 

The current patch writes the syntax like this EXCLUSION USING gist (c CHECK WITH &&)
makes it look like a table constraint, yet it clearly refers to a single
column. That looks very clumsy to read, to my eyes.

The syntax be easier to read if it was stated as a comparison
e.g. in the circle example CHECK ( NOT (NEW.c && c)) USING GIST
where NEW is the incoming row.
This is similar to the way I would write the constraint if I wanted to
ensure the values in two columns did not match/overlap etc CHECK ( NOT (col1 && col2))
and is also not such a radical departure from existing SQL Standard
syntax.

We only need the NOT when there isn't a clear negator defined, so in
most cases I would hope to read something like this CHECK (NEW.pkey != pkey) USING btree
which should be equivalent to the UNIQUE constraint

I don't think its too late to change the syntax.

-- Simon Riggs           www.2ndQuadrant.com



Re: operator exclusion constraints

From
Nathan Boley
Date:
> After reading the docs in the patch I don't believe you're going to all
> this trouble to ensure two circles don't overlap. Can you give some
> better examples of what you're trying to achieve and why anyone else
> would care? (I'm busy, so are others).
>

Non overlapping time intervals is one use case - think about room
scheduling. I personally want to use it to ensure the consistency of
genomic annotations.

-Nathan


Re: operator exclusion constraints

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> The syntax be easier to read if it was stated as a comparison
> e.g. in the circle example
>   CHECK ( NOT (NEW.c && c)) USING GIST

I don't think this is a good idea at all.  NEW is a nonstandard
Postgres-ism, and introducing it into this syntax doesn't seem very
future-proof to me.  What's more, the above is not in the least
analogous to a regular CHECK constraint, because there's some implicit
notion of "c" ranging over all other rows, which is not what is meant
by the same column reference in a CHECK constraint.

I agree that the proposed syntax is a bit awkward, but this isn't
better.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sun, 2009-11-01 at 22:42 +0000, Simon Riggs wrote:
> After reading the docs in the patch I don't believe you're going to all
> this trouble to ensure two circles don't overlap. Can you give some
> better examples of what you're trying to achieve and why anyone else
> would care? (I'm busy, so are others).

Non-overlapping periods of time. I couldn't document that, because the
PERIOD type doesn't exist in core (yet).

> I can probably guess, but my feeling is I shouldn't have to. I feel like
> this might be a truly great feature, but I'm worried that either it
> isn't at all or it is and yet will be overlooked. Does this project link
> in with other planned developments in various plugins? 

Absolutely:
http://archives.postgresql.org/pgsql-hackers/2009-10/msg01813.php

> The current patch writes the syntax like this
>   EXCLUSION USING gist (c CHECK WITH &&)
> makes it look like a table constraint, yet it clearly refers to a single
> column. That looks very clumsy to read, to my eyes.

It is a table constraint, and you can specify multiple columns. I don't
see much point in allowing this as a column constraint, because that's
not the typical case.

Most of the time, there will be two columns like: EXCLUSION(room_number CHECK WITH =, during CHECK WITH &&)

In other words, usually there is both a resource and a period of time
for the reservation. It is of course possible to use it for a column
constraint, and I'll add syntax if there's demand for it.

> The syntax be easier to read if it was stated as a comparison
> e.g. in the circle example
>   CHECK ( NOT (NEW.c && c)) USING GIST
> where NEW is the incoming row.
> This is similar to the way I would write the constraint if I wanted to
> ensure the values in two columns did not match/overlap etc
>   CHECK ( NOT (col1 && col2))
> and is also not such a radical departure from existing SQL Standard
> syntax.

We've already had very extensive discussion about the syntax. Your idea
is interesting, but I agree with Tom that it's not ideal, either. NEW
might be OK, but Tom's observation about the new meaning of "c" (ranging
over the entire table) is a compelling problem.

Consider: CHECK ( NOT (NEW.c && c OR c && d))

The right side of the OR could either mean "c overlaps d" or "forall c,
d: c overlaps d". I can't come up with a way to treat "c" consistently
between the left and right side of the OR (put another way, is "c" free
or bound?).

We could allow subselects in CHECK, but it's difficult to infer from
arbitrary queries what I can enforce with an operator exclusion
constraint, and what I can't.

If you want to re-open the syntax discussion, we can (right is better
than soon). However, it is late in the cycle, so I'll need something
very clear quite soon if this is going to make it into 8.5.

Personally I think the current syntax is pretty good.

Regards,Jeff Davis




Re: operator exclusion constraints

From
Simon Riggs
Date:
On Sun, 2009-11-01 at 15:42 -0800, Jeff Davis wrote:

> Most of the time, there will be two columns like:
>   EXCLUSION(room_number CHECK WITH =, during CHECK WITH &&)

Now that's a great example.

Looks like the classic "don't allow the same room to be booked more than
once at the same time".

It bothers me that we would have completely separate syntax for this
feature as opposed to normal SQL. It also doesn't make it easy to
interpret from the business statement to the implementation. Notice that
the "," above means "AND". How would we use an OR conditional? How would
we express the wish to use a partial index? 

How would I express a bidding rule: "Only allow bids that are better
than the highest bid so far"

EXCLUSION (item CHECK WITH =, bid_price CHECK WITH <)

Did I get the ">" the right way around?

How would I specify a tree that has only 2 down branches at any node,
'left' and 'right'?

> If you want to re-open the syntax discussion, we can 

I don't *want* to and I don't want to derail a good feature. But we'll
be looking at this for years and believe me if you introduce even a
minor bump you'll hear it repeated endlessly.

> Personally I think the current syntax is pretty good.

The feature sounds great, regrettably the syntax doesn't seem very
clean, as an objective observer. I apologise if this causes you trouble,
I have no axe to grind here. Not sure that if we submitted this to SQL
Standard committee that it would be accepted as is.

(You made a few other points which I regrettably skimmed over in my
reply).

-- Simon Riggs           www.2ndQuadrant.com



Re: operator exclusion constraints

From
Peter Eisentraut
Date:
On Sun, 2009-11-01 at 22:42 +0000, Simon Riggs wrote:
> The current patch writes the syntax like this
>   EXCLUSION USING gist (c CHECK WITH &&)
> makes it look like a table constraint, yet it clearly refers to a
> single
> column. That looks very clumsy to read, to my eyes.

I think the word CHECK should be avoided completely in this syntax, to
avoid confusion with CHECK constraints.



Re: operator exclusion constraints

From
Simon Riggs
Date:
On Sun, 2009-11-01 at 18:07 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > The syntax be easier to read if it was stated as a comparison
> > e.g. in the circle example
> >   CHECK ( NOT (NEW.c && c)) USING GIST
> 
> I don't think this is a good idea at all.  NEW is a nonstandard
> Postgres-ism, and introducing it into this syntax doesn't seem very
> future-proof to me.  What's more, the above is not in the least
> analogous to a regular CHECK constraint, because there's some implicit
> notion of "c" ranging over all other rows, which is not what is meant
> by the same column reference in a CHECK constraint.
> 
> I agree that the proposed syntax is a bit awkward, but this isn't
> better.

Agreed. Just looking for readable, future-proof syntax.

-- Simon Riggs           www.2ndQuadrant.com



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Mon, 2009-11-02 at 07:38 +0000, Simon Riggs wrote:
> It bothers me that we would have completely separate syntax for this
> feature as opposed to normal SQL. It also doesn't make it easy to
> interpret from the business statement to the implementation. Notice that
> the "," above means "AND".

Yes, in that way, it's similar to a UNIQUE constraint, e.g.
UNIQUE (a, b). The more columns you add, the more permissive the
constraint.

> How would we use an OR conditional?

Specify multiple constraints.

> How would
> we express the wish to use a partial index? 

EXCLUSION (...) WHERE (...)

The perens are actually required around the predicate in this case, due
to syntactic problems.

> How would I express a bidding rule: "Only allow bids that are better
> than the highest bid so far"
> 
> EXCLUSION (item CHECK WITH =, bid_price CHECK WITH <)

That would be a cool feature, unfortunately it won't work in the current
form. This constraint is only enforced on index insert -- imagine what
confusion would be caused when:

UPDATE foo SET third_column = 7 ...

If that's a HOT update, it wouldn't re-check the bid_price. If it turns
into a cold update, it would reject the update because the bid_price is
no longer the highest.

> Did I get the ">" the right way around?

The above problem essentially means we only allow commutative operators,
which avoids this source of confusion.

Interestingly, reflexive operators aren't required. So, if <> is
searchable, you can have the opposite of unique: all values must be the
same. That might be interesting for something like:
 EXCLUSION(room CHECK WITH =,           during CHECK WITH &&,           student_grade CHECK WITH <>)

To ensure that a shared room isn't shared between students of different
grades. Not the most compelling use case, but I could imagine something
along these lines being useful.

Maybe a better example would involve sheep and wolves ;)

> How would I specify a tree that has only 2 down branches at any node,
> 'left' and 'right'?

I'm not sure I understand this exactly. If the left or right is
explcitly a part of the tuple, I think it can be done with unique.

If not, and you're looking for a maximum of two tuples, you can see my
ill-fated extension to this feature here:

http://archives.postgresql.org/pgsql-hackers/2009-11/msg00016.php

As Tom pointed out, most use cases would not have a constant limit
throughout the table. If you do have such a use case, you can revive the
proposal.

> Not sure that if we submitted this to SQL
> Standard committee that it would be accepted as is.

There are implementation details bubbling up to the user-visible
behavior, and I share your concern. The SQL committee would never care
about these implementation details, and I wish we didn't have to,
either.

The machinism that I've employed searches (using a dirty snapshot) for
all of the physical tuples that cause a logical conflict with the
physical tuple currently being added. If found, it uses physical
information from those tuples, like visibility information, to determine
whether to wait, and on whom to wait. After waiting it may either
proceed or abort.

If we move closer to a nice, clean query to express the constraint, it
gets very difficult to tell which physical tuple is responsible for the
conflict. If we don't know what physical tuple is causing the conflict,
we have to serialize all writes.

Additionally, the more it looks like a query, the more we have to tell
users "follow this template" -- which will just lead to confusion and
disappointment for users who think we've implemented SQL ASSERT (which
we haven't).

Although the current syntax isn't great, it is declarative, and it does
allow a variety of constraints.

I certainly welcome ideas that will make a better trade-off here. At the
end, I just want a feature that can implement temporal keys.

Regards,Jeff Davis




Re: operator exclusion constraints

From
Jeff Davis
Date:
On Mon, 2009-11-02 at 08:25 +0000, Peter Eisentraut wrote:
> I think the word CHECK should be avoided completely in this syntax, to
> avoid confusion with CHECK constraints.

This is an easy change. I don't have a strong opinion, so the only thing
I can think to do is ask for a vote.

Do you have a specific alternative in mind? How about just "WITH"?

Regards,Jeff Davis




Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2009-11-02 at 08:25 +0000, Peter Eisentraut wrote:
>> I think the word CHECK should be avoided completely in this syntax, to
>> avoid confusion with CHECK constraints.

> This is an easy change. I don't have a strong opinion, so the only thing
> I can think to do is ask for a vote.

> Do you have a specific alternative in mind? How about just "WITH"?

I think we had that discussion already, and rejected using "WITH" by
itself because it was so totally devoid of suggestion of what it was
the system would do "with" the expression or operator.

If we don't want to introduce a new reserved word it's difficult to
find alternatives :-(.  One thing that just came to mind is that we
might be able to do something like
EXCLUSION (expr CHECK NOT operator)
orEXCLUSION (expr CONSTRAIN NOT operator)

I like the "NOT" here because "CHECK NOT =" seems to convey pretty
clearly what it is you are checking for.  Because NOT is reserved and
can't appear as a connective, I think that this approach might allow
a non-reserved leading word, thus possibly the second variant would
work without reserving CONSTRAIN.  I have not tested whether bison
agrees with me though ;-).  In any case I think "CHECK NOT =" reads
pretty well, and don't feel a strong urge to use some other word there.
        regards, tom lane


Re: operator exclusion constraints

From
Simon Riggs
Date:
On Mon, 2009-11-02 at 13:12 -0500, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > On Mon, 2009-11-02 at 08:25 +0000, Peter Eisentraut wrote:
> >> I think the word CHECK should be avoided completely in this syntax, to
> >> avoid confusion with CHECK constraints.
> 
> > This is an easy change. I don't have a strong opinion, so the only thing
> > I can think to do is ask for a vote.
> 
> > Do you have a specific alternative in mind? How about just "WITH"?
> 
> I think we had that discussion already, and rejected using "WITH" by
> itself because it was so totally devoid of suggestion of what it was
> the system would do "with" the expression or operator.
> 
> If we don't want to introduce a new reserved word it's difficult to
> find alternatives :-(.  One thing that just came to mind is that we
> might be able to do something like
> 
>     EXCLUSION (expr CHECK NOT operator)
> or
>     EXCLUSION (expr CONSTRAIN NOT operator)
> 
> I like the "NOT" here because "CHECK NOT =" seems to convey pretty
> clearly what it is you are checking for.  Because NOT is reserved and
> can't appear as a connective, I think that this approach might allow
> a non-reserved leading word, thus possibly the second variant would
> work without reserving CONSTRAIN.  I have not tested whether bison
> agrees with me though ;-).  In any case I think "CHECK NOT =" reads
> pretty well, and don't feel a strong urge to use some other word there.

Yep, like the NOT.

Other ideasEXCLUSION (expr NOT operator)
CONSTRAINT (expr NOT operator ALL ROWS)

-- Simon Riggs           www.2ndQuadrant.com



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Mon, 2009-11-02 at 18:28 +0000, Simon Riggs wrote:
> > I like the "NOT" here because "CHECK NOT =" seems to convey pretty
> > clearly what it is you are checking for.  Because NOT is reserved and
> > can't appear as a connective, I think that this approach might allow
> > a non-reserved leading word, thus possibly the second variant would
> > work without reserving CONSTRAIN.  I have not tested whether bison
> > agrees with me though ;-).  In any case I think "CHECK NOT =" reads
> > pretty well, and don't feel a strong urge to use some other word there.
> 

Peter, do any of these ideas work for you? It looks like this opens the
door to using a word other than CHECK. CONSTRAIN NOT is a little
awkward, is there another word that might work better?

I'm not excited about using NOT, because I think it has a hint of a
double-negative when combined with EXCLUSION. The original idea was to
specify the way to find tuples mutually exclusive with the new tuple;
and NOT makes that a little less clear, in my opinion. But I'm fine with
it if that's what everyone else thinks is best.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I'm not excited about using NOT, because I think it has a hint of a
> double-negative when combined with EXCLUSION.

Well, the choice of EXCLUSION isn't set in stone either ...
        regards, tom lane


Re: operator exclusion constraints

From
Dean Rasheed
Date:
2009/11/3 Tom Lane <tgl@sss.pgh.pa.us>:
> Jeff Davis <pgsql@j-davis.com> writes:
>> I'm not excited about using NOT, because I think it has a hint of a
>> double-negative when combined with EXCLUSION.
>
> Well, the choice of EXCLUSION isn't set in stone either ...
>

Is this really a generalized uniqueness constraint, extended to
support operators other than = ?
Perhaps sticking with the word UNIQUE might be more suggestive of this:
 UNIQUE (room_number WITH = , during WITH &&)

or:
 UNIQUE (room_number , during USING && )

- Dean


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Tue, 2009-11-03 at 21:31 +0000, Dean Rasheed wrote:
> Is this really a generalized uniqueness constraint, extended to
> support operators other than = ?

That has been discussed in the past:

http://archives.postgresql.org/message-id/1253119552.24770.203.camel@jdavis
http://archives.postgresql.org/message-id/1253122946.24770.250.camel@jdavis

However, some constraints allowed by this feature are the *opposite* of
unique: consider "<>".

Personally, I don't like to use the word UNIQUE to describe a constraint
that may reject unique values or permit duplicates.

We already have some reasonable agreement around EXCLUSION ... CHECK
WITH. We should stick with the current syntax unless there's a good
consensus around some other specific proposal.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Robert Haas
Date:
On Tue, Nov 3, 2009 at 5:05 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> We already have some reasonable agreement around EXCLUSION ... CHECK
> WITH. We should stick with the current syntax unless there's a good
> consensus around some other specific proposal.

Yeah.  I don't like the inflexibility of the current syntax, but
that's mostly because I wish the feature itself could be made more
general.  It would be nice to be able to write constraints of the
form:

forall <vars> : <expression>

For example, a uniqueness constraint on a column c is: forall x,y : x.c != y.c
And a does-not overlap constraint might look like this: forall x,y :
NOT (x.c && y.c)

Note that an ordinary check constraint is a special case of this where
there is only one rowvar and it is implicit.

HOWEVER, this is probably a lot more work than what you've already
done, and what you've already done is really good, and we shouldn't
hesitate to commit on the grounds that it won't cure diabetes and
balance the budget.  I don't think there is any really beautiful
syntax for the feature as it stands, but considering how useful it is
I am not inclined to stand on ceremony...

...Robert


Re: operator exclusion constraints

From
Peter Eisentraut
Date:
On Tue, 2009-11-03 at 08:51 -0800, Jeff Davis wrote:
> Peter, do any of these ideas work for you? It looks like this opens the
> door to using a word other than CHECK. CONSTRAIN NOT is a little
> awkward, is there another word that might work better?
> 
> I'm not excited about using NOT, because I think it has a hint of a
> double-negative when combined with EXCLUSION. The original idea was to
> specify the way to find tuples mutually exclusive with the new tuple;
> and NOT makes that a little less clear, in my opinion. But I'm fine with
> it if that's what everyone else thinks is best.

I've been thinking how the other constraint types "read", e.g.,

a CHECK (a > 0) means "check that a is > 0"
b PRIMARY KEY means "b is the primary key"
c UNIQUE means "c is unique [in this table]"

That's easy.  Whereas

EXCLUSION (a CHECK NOT =) means, er, "check that a is not an exclusion
of =" or something.  Huh?

A more readable alternative might be going into the direction of
(written as a column constraint):

a EXCLUSIVE BY =

meaning "a is exclusive [in this table] [as measured] by =".  Or as
table constraint

EXCLUSIVE (a, b) BY =

And then you could think of UNIQUE as "EXCLUSIVE BY default-equals-op".

EXCLUSIVE is already a key word, by the way.



Re: operator exclusion constraints

From
Robert Haas
Date:
On Thu, Nov 5, 2009 at 9:01 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Tue, 2009-11-03 at 08:51 -0800, Jeff Davis wrote:
>> Peter, do any of these ideas work for you? It looks like this opens the
>> door to using a word other than CHECK. CONSTRAIN NOT is a little
>> awkward, is there another word that might work better?
>>
>> I'm not excited about using NOT, because I think it has a hint of a
>> double-negative when combined with EXCLUSION. The original idea was to
>> specify the way to find tuples mutually exclusive with the new tuple;
>> and NOT makes that a little less clear, in my opinion. But I'm fine with
>> it if that's what everyone else thinks is best.
>
> I've been thinking how the other constraint types "read", e.g.,
>
> a CHECK (a > 0) means "check that a is > 0"
> b PRIMARY KEY means "b is the primary key"
> c UNIQUE means "c is unique [in this table]"
>
> That's easy.  Whereas
>
> EXCLUSION (a CHECK NOT =) means, er, "check that a is not an exclusion
> of =" or something.  Huh?
>
> A more readable alternative might be going into the direction of
> (written as a column constraint):
>
> a EXCLUSIVE BY =
>
> meaning "a is exclusive [in this table] [as measured] by =".  Or as
> table constraint
>
> EXCLUSIVE (a, b) BY =
>
> And then you could think of UNIQUE as "EXCLUSIVE BY default-equals-op".
>
> EXCLUSIVE is already a key word, by the way.

Ooh, that's kind of neat.  But I think you'd need EXCLUSIVE (a, b) BY
(=, =), since it could equally well be EXCLUSIVE (a, b) BY (=, &&).

...Robert


Re: operator exclusion constraints

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Thu, Nov 5, 2009 at 9:01 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

> > Or as table constraint
> >
> > EXCLUSIVE (a, b) BY =
> >
> > And then you could think of UNIQUE as "EXCLUSIVE BY default-equals-op".
> >
> > EXCLUSIVE is already a key word, by the way.
> 
> Ooh, that's kind of neat.  But I think you'd need EXCLUSIVE (a, b) BY
> (=, =), since it could equally well be EXCLUSIVE (a, b) BY (=, &&).

Perhaps EXCLUSIVE (a BY =, b BY =)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: operator exclusion constraints

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Ooh, that's kind of neat.  But I think you'd need EXCLUSIVE (a, b) BY
> (=, =), since it could equally well be EXCLUSIVE (a, b) BY (=, &&).

Yeah, we definitely want some parentheses delimiting the expression.
EXCLUSIVE still feels like the wrong part-of-speech though.  How
about EXCLUDING (...) BY ... instead?
        regards, tom lane


Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Nov 5, 2009, at 6:56 AM, Tom Lane wrote:

> Yeah, we definitely want some parentheses delimiting the expression.
> EXCLUSIVE still feels like the wrong part-of-speech though.  How
> about EXCLUDING (...) BY ... instead?

CHECK is a verb; so what about a verb here?
    EXCLUDE (...) BY ...

Best,

David


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-11-05 at 09:25 -0800, David E. Wheeler wrote:
> CHECK is a verb; so what about a verb here?

CHECK -- verb or noun
UNIQUE -- adjective
PRIMARY KEY -- noun
FOREIGN KEY -- noun

Any of them can be called an adjective if you put it before the word
"constraint". But that's kind of circular: if we made a new type of
constraint and called it XYZ, we could say XYZ CONSTRAINT, and call XYZ
an adjective.

I'm having trouble finding a clear pattern to follow.

Regards,Jeff Davis





Re: operator exclusion constraints

From
Alvaro Herrera
Date:
Jeff Davis escribió:

> I'm having trouble finding a clear pattern to follow.

Rewrite SQL in Latin?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Nov 5, 2009, at 10:13 AM, Jeff Davis wrote:

> CHECK -- verb or noun
> UNIQUE -- adjective
> PRIMARY KEY -- noun
> FOREIGN KEY -- noun

Put like that, I'm thinking:

EXCLUSION - noun or adjective
     EXCLUSION (...) BY ...

But that doesn't read as well to my eye as:
    EXCLUDE (...) BY ...

Or
    EXCLUDING (...) BY ...

I think either of those last two work well enough.

Best,

David


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-11-05 at 09:56 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Ooh, that's kind of neat.  But I think you'd need EXCLUSIVE (a, b) BY
> > (=, =), since it could equally well be EXCLUSIVE (a, b) BY (=, &&).
> 
> Yeah, we definitely want some parentheses delimiting the expression.
> EXCLUSIVE still feels like the wrong part-of-speech though.  How
> about EXCLUDING (...) BY ... instead?

I think EXCLUDING conflicts with the EXCLUDING in LIKE. Also, it becomes
a little more difficult to place the access method clause, because
"EXCLUDING USING gist" doesn't sound great.

Regards,Jeff Davis



Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Nov 5, 2009, at 11:09 AM, Jeff Davis wrote:

> I think EXCLUDING conflicts with the EXCLUDING in LIKE. Also, it  
> becomes
> a little more difficult to place the access method clause, because
> "EXCLUDING USING gist" doesn't sound great.

Well that's clearly a verb. So perhaps "EXCLUDE USING  
gist" ("EXCLUDING USING gist" is a little weirder).

Best,

David


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-11-05 at 10:30 -0800, David E. Wheeler wrote:
> But that doesn't read as well to my eye as:
> 
>      EXCLUDE (...) BY ...

I think EXCLUDE might be a little *too* specific. It sounds like
whatever is on the right hand side will be excluded, but that's not
really what happens.

EXCLUSION is vague about what is doing the excluding and what is being
excluded. I think that's good in this case, because the actual meaning
can't easily be expressed with a couple keywords, so suggesting the
behavior is about as close as we can get (unless someone comes up with a
new idea).

>      EXCLUDING (...) BY ...

I think that's better, but still sounds a little wrong to me.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-11-05 at 11:16 -0800, David E. Wheeler wrote:
> Well that's clearly a verb. So perhaps "EXCLUDE USING  
> gist" ("EXCLUDING USING gist" is a little weirder).

That's not bad.

As I just said in my other email, I think the word EXCLUDE is a little
bit too specific, but the other ideas out there aren't perfect, either.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-11-05 at 09:56 -0500, Tom Lane wrote:
> Yeah, we definitely want some parentheses delimiting the expression.
> EXCLUSIVE still feels like the wrong part-of-speech though.  How
> about EXCLUDING (...) BY ... instead?

If I put EXCLUSION in the type_func_name keyword list, it works fine.
But I'm having a little trouble trying to use EXCLUDING or EXCLUSIVE,
because if I move them from unreserved to any other keyword list, I get
reduce/reduce conflicts.

Am I doing something wrong? I would assume that making words more
reserved would usually not lead to conflicts.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Jeff Davis
Date:
Out of the ideas in this thread, here are the best and most complete so
far:

Form 1: WORD1 [ USING index_method ]   ( expression WORD2 operator [, ... ] ) index_parameters   [ WHERE ( predicate )
]

Form 2: WORD1 [ USING index_method ]   ( expression [, ... ] ) WORD2 ( operator [, ...] ) index_parameters   [ WHERE (
predicate) ]
 

Word 1: EXCLUSION EXCLUDE

Word 2: CHECK WITH BY (only works with Form 2)

The current syntax is Form 1 using EXCLUSION/CHECK WITH.

I had trouble making the grammar work using EXCLUDING or EXCLUSIVE.
However, even if we get past that problem, we would want to move the
"USING index_method" clause afterward (otherwise it's too awkward), but
that causes conflicts with the index_parameters (because of the "USING
INDEX TABLESPACE tablespace" clause).

CHECK WITH has a little more meaning than BY, but as Peter says, it
might cause confusion with a regular CHECK constraint. EXCLUDE implies
that the thing on the right hand side is being excluded, when it's
actually specifying what pairs of tuples are mutally exclusive; so I
think EXCLUDE is a little bit too specific.

Right now, I am leaning toward Form 2, and EXCLUSION/BY. So the typical
case would read like:
 EXCLUSION USING gist (room, during) BY (=, &&)

Comments, ideas or votes welcome. However, if anyone has any new ideas,
please make an attempt to ensure that the grammar can work and that it's
just as expressive. We've had several productive brainstorming sessions
so far, and I certainly appreciate the input, but I think it's time to
move toward a consensus.

Regards,Jeff Davis



Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Nov 6, 2009, at 10:42 AM, Jeff Davis wrote:

> Right now, I am leaning toward Form 2, and EXCLUSION/BY. So the  
> typical
> case would read like:
>
>  EXCLUSION USING gist (room, during) BY (=, &&)

I like this, but like EXCLUDE better
    EXCLUDE USING gist (room, during) BY (=, &&)

Is your objection to EXCLUDE for cases when there is no USING clause?
    EXLUDE (room, during) BY (=, &&)

Yes, I can see how that'd be a bit more confusing. So EXCLUSION  
probably is best.

BTW, is it the case that room maps to = and during maps to && in this  
example? If so, wouldn't it make more sense to combine them?
    EXCLUSION (room WITH =, during WITH &&)

Or am I misunderstanding how this works (quite likely, I'm sure).

Best,

David



Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> If I put EXCLUSION in the type_func_name keyword list, it works fine.
> But I'm having a little trouble trying to use EXCLUDING or EXCLUSIVE,
> because if I move them from unreserved to any other keyword list, I get
> reduce/reduce conflicts.

> Am I doing something wrong? I would assume that making words more
> reserved would usually not lead to conflicts.

Putting any of these at a higher level than unreserved is problematic
because they are not permitted to be reserved words according to the
SQL spec.  While we do have some nonstandard reserved words, I think
the bar for adding new ones has to be pretty high, because it will break
applications that (a) worked before and (b) are not violating either the
letter or the spirit of the standard.

I'd be less worried about making a new col_name_keyword, but if it has
to be type_func_name_keyword then it's nearly as bad as fully reserved.

The main advantage of the CHECK WITH syntax in my eyes was that it
avoided the need to create a new reserved word.
        regards, tom lane


Re: operator exclusion constraints

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> BTW, is it the case that room maps to = and during maps to && in this  
> example? If so, wouldn't it make more sense to combine them?

>      EXCLUSION (room WITH =, during WITH &&)

I think so too.  Keeping the expression and the associated operator
together seems more readable and less error-prone than having them
separated by other columns.

BTW, where is the optional opclass name going to fit in?  ("There
isn't one" is not an acceptable answer.)
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-11-06 at 10:50 -0800, David E. Wheeler wrote:
> Is your objection to EXCLUDE for cases when there is no USING clause?
> 
>      EXLUDE (room, during) BY (=, &&)
> 

"Objection" is too strong a word. EXCLUDE is a transitive verb, so it's
slightly confusing in the above case.

> BTW, is it the case that room maps to = and during maps to && in this  
> example? If so, wouldn't it make more sense to combine them?
> 
>      EXCLUSION (room WITH =, during WITH &&)

That's (close to) the current syntax, which I'm perfectly fine with.
Form 1 with EXCLUSION/CHECK WITH is the current syntax.

It seemed like the winds were shifting towards separating them, but I'm
happy leaving it alone.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-11-06 at 14:05 -0500, Tom Lane wrote:
> BTW, where is the optional opclass name going to fit in?  ("There
> isn't one" is not an acceptable answer.)

"expression" is actually an index_elem, which includes all of the
options including the opclass. I'll make that more clear.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Robert Haas
Date:
On Fri, Nov 6, 2009 at 2:11 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2009-11-06 at 10:50 -0800, David E. Wheeler wrote:
>> Is your objection to EXCLUDE for cases when there is no USING clause?
>>
>>      EXLUDE (room, during) BY (=, &&)
>>
>
> "Objection" is too strong a word. EXCLUDE is a transitive verb, so it's
> slightly confusing in the above case.
>
>> BTW, is it the case that room maps to = and during maps to && in this
>> example? If so, wouldn't it make more sense to combine them?
>>
>>      EXCLUSION (room WITH =, during WITH &&)
>
> That's (close to) the current syntax, which I'm perfectly fine with.
> Form 1 with EXCLUSION/CHECK WITH is the current syntax.
>
> It seemed like the winds were shifting towards separating them, but I'm
> happy leaving it alone.

I don't favor separating them.  Locality of reference is good.

...Robert


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-11-06 at 14:00 -0500, Tom Lane wrote:
> The main advantage of the CHECK WITH syntax in my eyes was that it
> avoided the need to create a new reserved word.

It still needs the EXCLUSION keyword, though, and where does that fit
in? If I include it as unreserved, I get shift/reduce conflicts. If I
include it as a type_func_name keyword, it works.

CHECK, FOREIGN, PRIMARY, and UNIQUE are all reserved as well, which
makes sense because it looks like they conflict directly with column
names in the table definition.

Do you see a way to avoid that problem?

Regards,Jeff Davis



Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Fri, 2009-11-06 at 14:00 -0500, Tom Lane wrote:
>> The main advantage of the CHECK WITH syntax in my eyes was that it
>> avoided the need to create a new reserved word.

> It still needs the EXCLUSION keyword, though, and where does that fit
> in? If I include it as unreserved, I get shift/reduce conflicts. If I
> include it as a type_func_name keyword, it works.

If you could get it down to col_name_keyword, I wouldn't complain.

Most of the problems we've had with having to reserve keywords in CREATE
TABLE stem from the fact that they can follow a DEFAULT expression.
If we restrict this thing to being a table constraint, not a column
constraint, it seems like the issue might go away (and in fact I think
you might not even need col_name_keyword).  As long as we are explicitly
specifying column names in the exclusion expressions, I don't think it's
very sensible to write it as a column constraint anyway.  Have you
tried that approach?
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-11-06 at 14:59 -0500, Tom Lane wrote:
> If we restrict this thing to being a table constraint, not a column
> constraint, it seems like the issue might go away (and in fact I think
> you might not even need col_name_keyword).

I never enabled the syntax for column constraints, so it was always a
part of ConstraintElem.

To make sure I understand what you're saying, you think that:

CREATE TABLE foo
( exclusion int, EXCLUSION (exclusion CHECK WITH =)
);

should work? It's not ambiguous, but I'm not an expert on the grammar,
so I don't immediately know a non-invasive way to express that.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> To make sure I understand what you're saying, you think that:

> CREATE TABLE foo
> (
>   exclusion int,
>   EXCLUSION (exclusion CHECK WITH =)
> );

> should work?

Well, it looks like it should be able to work, because left-paren
can't immediately follow a column name AFAIR.  Maybe I'm missing
something.  What's your grammar patch exactly, and what does
bison -v finger as being the problem?
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-11-06 at 19:05 -0500, Tom Lane wrote:
> > CREATE TABLE foo
> > (
> >   exclusion int,
> >   EXCLUSION (exclusion CHECK WITH =)
> > );
>
> Well, it looks like it should be able to work, because left-paren
> can't immediately follow a column name AFAIR.

I agree; I don't think it's ambiguous. The other possibility is the
optional "USING index_method" clause in between, but USING is already
reserved, so I don't see a problem there either.

> Maybe I'm missing
> something.  What's your grammar patch exactly, and what does
> bison -v finger as being the problem?
>

bison -v doesn't show anything useful beyond saying that there is one
shift/reduce conflict. The gram.output is 10MB, which doesn't help me
much (I'm still trying to make sense of it). I'd offer to send it along,
but I'm sure bison would produce the same thing for you.

Patch attached with EXCLUSION as a col_name_keyword and one shift/reduce
conflict.

Regards,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Fri, 2009-11-06 at 19:05 -0500, Tom Lane wrote:
>> Maybe I'm missing
>> something.  What's your grammar patch exactly, and what does
>> bison -v finger as being the problem?

> bison -v doesn't show anything useful beyond saying that there is one
> shift/reduce conflict. The gram.output is 10MB, which doesn't help me
> much (I'm still trying to make sense of it).

Well, you need to learn a bit more about bison I think.  The complaint
is

State 1135 conflicts: 1 shift/reduce

so we look at state 1135, which says:

state 1135

  241 alter_table_cmd: ADD_P . opt_column columnDef
  251                | ADD_P . TableConstraint

    CHECK       shift, and go to state 1698
    COLUMN      shift, and go to state 1742
    CONSTRAINT  shift, and go to state 1699
    EXCLUSION   shift, and go to state 1700
    FOREIGN     shift, and go to state 1701
    PRIMARY     shift, and go to state 1702
    UNIQUE      shift, and go to state 1703

    EXCLUSION  [reduce using rule 887 (opt_column)]
    $default   reduce using rule 887 (opt_column)

    TableConstraint  go to state 1743
    ConstraintElem   go to state 1705
    opt_column       go to state 1744

This is the state immediately after scanning "ADD" in an ALTER TABLE
command, and what it's unhappy about is that it has two different things
to do if the next token is EXCLUSION.  (The dot in the productions
indicates "where we are", and the square brackets mark the unreachable
action.)  If you check the other mentioned states it becomes clear that
the state-1700 path leads to deciding that EXCLUSION begins a
TableConstraint, while rule 887 is the "empty" alternative for
opt_column, and is what would have to be done next if EXCLUSION is a
column name beginning a ColumnDef.  So the difficulty is that it can't
be sure whether EXCLUSION is a column name without looking one token
past EXCLUSION, but it has to decide whether to eliminate COLUMN before
it can look ahead past EXCLUSION.

This is a pretty common difficulty with empty-producing productions.
The usual way around it is to eliminate the empty production by making
the calling production a bit more redundant.  In this case, we can fix
it by replacing

alter_table_cmd:
            ADD_P opt_column columnDef

with two productions

alter_table_cmd:
            ADD_P columnDef
            | ADD_P COLUMN columnDef

The reason this fixes it is that now the parser does not have to make
a shift-reduce decision while EXCLUSION is the next token: it's just
going to shift all the time, and it only has to reduce once EXCLUSION
is the current token and it can see the next one as lookahead.  (In
which case, it will reduce EXCLUSION to ColId and proceed with the
its-a-ColumnDef path, only if the next token isn't "(" or "USING".)

Another way to think about is is that we are forcing bison to split
this one state into two, but I find it easier to understand how to
fix the problem by looking for ways to postpone the reduce decision.

So IOW, you need the attached patch to the patch.

            regards, tom lane

*** gram.y~    Fri Nov  6 20:51:57 2009
--- gram.y    Fri Nov  6 20:58:39 2009
***************
*** 1595,1602 ****
          ;

  alter_table_cmd:
!             /* ALTER TABLE <name> ADD [COLUMN] <coldef> */
!             ADD_P opt_column columnDef
                  {
                      AlterTableCmd *n = makeNode(AlterTableCmd);
                      n->subtype = AT_AddColumn;
--- 1595,1610 ----
          ;

  alter_table_cmd:
!             /* ALTER TABLE <name> ADD <coldef> */
!             ADD_P columnDef
!                 {
!                     AlterTableCmd *n = makeNode(AlterTableCmd);
!                     n->subtype = AT_AddColumn;
!                     n->def = $2;
!                     $$ = (Node *)n;
!                 }
!             /* ALTER TABLE <name> ADD COLUMN <coldef> */
!             | ADD_P COLUMN columnDef
                  {
                      AlterTableCmd *n = makeNode(AlterTableCmd);
                      n->subtype = AT_AddColumn;

Re: operator exclusion constraints

From
Tom Lane
Date:
I wrote:
> So IOW, you need the attached patch to the patch.

... plus, of course, move EXCLUSION to the unreserved_keyword list.
Or maybe forget about it and go to EXCLUDE or EXCLUDING?
        regards, tom lane


Re: operator exclusion constraints

From
Andrew Dunstan
Date:

Tom Lane wrote:
> This is a pretty common difficulty with empty-producing productions.
> The usual way around it is to eliminate the empty production by making
> the calling production a bit more redundant.  In this case, we can fix
> it by replacing
>
> alter_table_cmd:
>             ADD_P opt_column columnDef
>
> with two productions
>
> alter_table_cmd:
>             ADD_P columnDef
>             | ADD_P COLUMN columnDef
>
> The reason this fixes it is that now the parser does not have to make
> a shift-reduce decision while EXCLUSION is the next token: it's just
> going to shift all the time, and it only has to reduce once EXCLUSION
> is the current token and it can see the next one as lookahead.  (In
> which case, it will reduce EXCLUSION to ColId and proceed with the
> its-a-ColumnDef path, only if the next token isn't "(" or "USING".)
>
> Another way to think about is is that we are forcing bison to split
> this one state into two, but I find it easier to understand how to
> fix the problem by looking for ways to postpone the reduce decision.
>
>   

This is a pretty good short explanation of how to deal with shift/reduce 
problems in bison. With your permission I'm going to copy it to the Wiki 
- it's amazing given the ubiquity of bison, yacc and friends how little 
the mechanics of LALR(1) parsers are understood. We've had many people 
puzzling over it over the years.

cheers

andrew


Re: operator exclusion constraints

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> This is a pretty good short explanation of how to deal with shift/reduce 
> problems in bison. With your permission I'm going to copy it to the Wiki 

If you like, but I think the part about figuring out which production
is the problem seemed to be at least as important for Jeff ...
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-11-06 at 21:20 -0500, Tom Lane wrote:
> > bison -v doesn't show anything useful beyond saying that there is one
> > shift/reduce conflict. The gram.output is 10MB, which doesn't help me
> > much (I'm still trying to make sense of it).
>
> Well, you need to learn a bit more about bison I think.

Yes, I do. Thank you very much for the detailed explanation; it was very
informative.

I have made the suggested changes, and now it works easily with
EXCLUSION, EXCLUDING, EXCLUSIVE, or EXCLUDE. I have also merged with the
latest changes, and I did another cleanup pass on the patch.

Regards,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-11-06 at 21:23 -0500, Tom Lane wrote:
> Or maybe forget about it and go to EXCLUDE or EXCLUDING?

I left it as EXCLUSION for now. "EXCLUDING USING ..." and "EXCLUSIVE
USING ..." both sound a little awkward to me. Either could be improved
by moving the USING clause around, but that just creates more grammar
headaches.

EXCLUDE probably flows most nicely with the optional USING clause or
without. My only complaint was that it's a transitive verb, so it seems
to impart more meaning than it actually can. I doubt anyone would
actually be more confused in practice, though. If a couple of people
agree, I'll change it to EXCLUDE.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> EXCLUDE probably flows most nicely with the optional USING clause or
> without. My only complaint was that it's a transitive verb, so it seems
> to impart more meaning than it actually can. I doubt anyone would
> actually be more confused in practice, though. If a couple of people
> agree, I'll change it to EXCLUDE.

EXCLUDE sounds good to me.
        regards, tom lane


Re: operator exclusion constraints

From
Robert Haas
Date:
On Sat, Nov 7, 2009 at 1:56 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2009-11-06 at 21:23 -0500, Tom Lane wrote:
>> Or maybe forget about it and go to EXCLUDE or EXCLUDING?
>
> I left it as EXCLUSION for now. "EXCLUDING USING ..." and "EXCLUSIVE
> USING ..." both sound a little awkward to me. Either could be improved
> by moving the USING clause around, but that just creates more grammar
> headaches.
>
> EXCLUDE probably flows most nicely with the optional USING clause or
> without. My only complaint was that it's a transitive verb, so it seems
> to impart more meaning than it actually can. I doubt anyone would
> actually be more confused in practice, though. If a couple of people
> agree, I'll change it to EXCLUDE.

Personally, I think that this is all rather a matter of opinion, and
of course bikeshedding.  CHECK is a verb, which might suggest that
EXCLUDE is the best choice, and it has a nice declarative sound to it.But the other example is FOREIGN KEY, which is
nota verb at all,
 
which seems to me to more closely parallel EXCLUSION or perhaps
EXCLUDING.  I think I like EXCLUSIVE the least of the four, but at the
end of the day, I don't think we can really go far wrong.

I also don't think there's anything wrong with EXCLUDING USING, nor
anything more wrong EXCLUSIVE USING than there is with EXCLUSIVE
alone.  Nor do I think there's any problem with EXCLUDE being
transitive because, of course, we're going to follow it with a
description of what we want to exclude, which may be thought of as its
direct object.  Once again, I don't think we can go far wrong.

Honestly, I'd probably be in favor of breaking the virtual tie in
favor of whichever word is already a keyword, rather than trying to
decide on (IMHO extremely tenuous) grammatical grounds.  But I can't
get worked up about that one way or the other either.

...Robert


Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Nov 7, 2009, at 11:08 AM, Tom Lane wrote:

>> EXCLUDE probably flows most nicely with the optional USING clause or
>> without. My only complaint was that it's a transitive verb, so it  
>> seems
>> to impart more meaning than it actually can. I doubt anyone would
>> actually be more confused in practice, though. If a couple of people
>> agree, I'll change it to EXCLUDE.
>
> EXCLUDE sounds good to me.

+1

David


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sat, 2009-11-07 at 14:11 -0500, Robert Haas wrote:
> Honestly, I'd probably be in favor of breaking the virtual tie in
> favor of whichever word is already a keyword

The ones that are already keywords are EXCLUSIVE and EXCLUDING, which
are also the least desirable, so that rule doesn't work as a
tie-breaker.

I think that EXCLUSION and EXCLUDE are the options still in the running
here.

Regards,Jeff Davis




Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Sat, 2009-11-07 at 14:11 -0500, Robert Haas wrote:
>> Honestly, I'd probably be in favor of breaking the virtual tie in
>> favor of whichever word is already a keyword

> The ones that are already keywords are EXCLUSIVE and EXCLUDING, which
> are also the least desirable, so that rule doesn't work as a
> tie-breaker.

I think it doesn't really matter now that we've succeeded in making the
keyword unreserved.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sat, 2009-11-07 at 10:56 -0800, Jeff Davis wrote:
> EXCLUDE probably flows most nicely with the optional USING clause or
> without. My only complaint was that it's a transitive verb, so it seems
> to impart more meaning than it actually can. I doubt anyone would
> actually be more confused in practice, though. If a couple of people
> agree, I'll change it to EXCLUDE.

It looks like EXCLUDE is the winner. Updated patch attached.

The feature is still called "operator exclusion constraints", and the
docs still make reference to that name, but the syntax specification has
been updated.

Regards,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Simon Riggs
Date:
On Sun, 2009-11-08 at 13:41 -0800, Jeff Davis wrote:
> On Sat, 2009-11-07 at 10:56 -0800, Jeff Davis wrote:
> > EXCLUDE probably flows most nicely with the optional USING clause or
> > without. My only complaint was that it's a transitive verb, so it seems
> > to impart more meaning than it actually can. I doubt anyone would
> > actually be more confused in practice, though. If a couple of people
> > agree, I'll change it to EXCLUDE.
> 
> It looks like EXCLUDE is the winner. Updated patch attached.
> 
> The feature is still called "operator exclusion constraints", and the
> docs still make reference to that name, but the syntax specification has
> been updated.

Don't think that name is very useful either... sounds like you want to
exclude operators, which is why I got lost in the first place. I'd call
them "generic exclusion constraints" or "user-defined exclusion
constraints". Sorry for this.

-- Simon Riggs           www.2ndQuadrant.com



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sun, 2009-11-08 at 22:03 +0000, Simon Riggs wrote:
> Don't think that name is very useful either... sounds like you want to
> exclude operators, which is why I got lost in the first place. I'd call
> them "generic exclusion constraints" or "user-defined exclusion
> constraints". Sorry for this.

Either of those names are fine with me, too. The current name is a
somewhat shortened version of the name "operator-based exclusion
constraints", so we can also just use that name. Or, just "exclusion
constraints".

I'll leave the current patch as-is for now, and wait for some reviewer
comments. This is purely a documentation issue, so there are bound to be
a few of these things that I can clarify at once.

Regards,Jeff Davis




Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Nov 8, 2009, at 7:43 PM, Jeff Davis wrote:

> Either of those names are fine with me, too. The current name is a
> somewhat shortened version of the name "operator-based exclusion
> constraints", so we can also just use that name. Or, just "exclusion
> constraints".

(exclusion constraints)++

David


Re: operator exclusion constraints

From
Greg Stark
Date:
On Mon, Nov 9, 2009 at 5:12 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Nov 8, 2009, at 7:43 PM, Jeff Davis wrote:
>
>> Either of those names are fine with me, too. The current name is a
>> somewhat shortened version of the name "operator-based exclusion
>> constraints", so we can also just use that name. Or, just "exclusion
>> constraints".
>
> (exclusion constraints)++


Out of curiosity, is this feature at all similar to SQL assertions?
What would we be missing to turn this into them?


-- 
greg


Re: operator exclusion constraints

From
Alvaro Herrera
Date:
Greg Stark escribió:

> Out of curiosity, is this feature at all similar to SQL assertions?
> What would we be missing to turn this into them?

I see no relationship to assertions.  Those are not tied to any
particular table, and are defined with any random expression you care to
think of.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: operator exclusion constraints

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > This is a pretty good short explanation of how to deal with shift/reduce 
> > problems in bison. With your permission I'm going to copy it to the Wiki 
> 
> If you like, but I think the part about figuring out which production
> is the problem seemed to be at least as important for Jeff ...

I agree that it would be worth an entry here
http://wiki.postgresql.org/wiki/Developer_FAQ

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Mon, 2009-11-09 at 18:03 +0000, Greg Stark wrote:
> Out of curiosity, is this feature at all similar to SQL assertions?
> What would we be missing to turn this into them?

I addressed that here:

http://archives.postgresql.org/pgsql-hackers/2009-11/msg00049.php

The exclusion constraint mechanism can enforce a subset of the
constraints that ASSERT can express; although the same goes for all
other constraints, because ASSERT is very general.

The exclusion constraint mechanism requires finding the physical tuples
that cause a conflict, so that we know when to wait and on which
transaction to wait. Otherwise, we have to wait on all transactions;
i.e. serialize.

The problem with ASSERT is that it expresses a constraint based on a
query, which can return arbitrary logical records after an arbitrary
amount of manipulation. So there's no way to work backwards. If we try,
we'll end up either:(a) supporting only a tiny subset, and throwing bizarre errors that
users don't understand when they try to work outside the template; or(b) deciding to serialize when we can't do better,
andagain, users
 
will be confused about the performance and locking characteristics.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Robert Haas
Date:
On Sun, Nov 8, 2009 at 4:41 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sat, 2009-11-07 at 10:56 -0800, Jeff Davis wrote:
>> EXCLUDE probably flows most nicely with the optional USING clause or
>> without. My only complaint was that it's a transitive verb, so it seems
>> to impart more meaning than it actually can. I doubt anyone would
>> actually be more confused in practice, though. If a couple of people
>> agree, I'll change it to EXCLUDE.
>
> It looks like EXCLUDE is the winner. Updated patch attached.
>
> The feature is still called "operator exclusion constraints", and the
> docs still make reference to that name, but the syntax specification has
> been updated.

[ reviewing ]

First thoughts:

I think the create_table documentation gets into a little too much
gorey detail.  I'm willing to take a pass at improving it, if you'd
like, but generally I think it should avoid discussion of
implementation details.  For example, saying that it's not as fast as
a UNIQUE constraint is good; the fact that an extra index lookup is
involved is probably overkill.   Incidentally, the wording in the
first paragraph fails to take into account the possibility that there
are multiple operators.

In index_create(), the elog() where relopxconstraints < 0 should just
complain about the value being negative, I think, rather than listing
the value.  If you just say the value is -3, it doesn't give the user
a clue why that's bad.

There is a spurious diff hunk for reindex_relation().

In ATRewriteTable() you reindent a bunch of variable declarations;
pg_indent will undo this, so you should nix this part.

In ATAddOperatorExclusionConstraint(), the message "method %s does not
support gettuple" seems a bit user-unfriendly.  Can we explain the
problem by referring to the functionality of getttuple(), rather than
the name of it?

I don't really like this message, for a number of reasons.

alter table foo add constraint bar exclude (a check with =, b check with =);
ERROR:  operator exclusion constraint violation detected: "foo_a_exclusion"
DETAIL:  Tuple "(1, 1, 2)" conflicts with existing tuple "(1, 1, 3)".

The corresponding error for a UNIQUE index is: could not create unique
index "bar", which I like better.  Only the relevant columns from the
tuples are dumped, and the tuple is not surrounded by double quotes;
any reason not to parallel that here?  Also, the message is all
lower-case.  Similarly, for an insert/update situation, it seems that
a message like "key value violates exclusion constraint \"%s\"" would
be better than the existing message.

alter table X add constraint Y exclude (...) seems to ignore the value
of Y and create both the constraint and the index with a name of its
own choosing.

As a quick performance test, I inserted a million 3-integer tuples
into a 3-column table with a unique constraint or an operator
exclusion constraint over all three columns.  The former took ~ 15 s,
the latter ~ 21.5 s.  That seems acceptable.

I think preprocessOpExConstraints should be called
transformOpxConstraints - opx rather than opex because that's what you
most frequently use elsewhere in the patch, and transform rather than
preprocess for consistency with other, similar functions.

In ruleutils.c, the prototype for pg_get_opxdef_worker() has a small
whitespace inconsistency relative to the surrounding declarations.

I haven't really grokked the substantive things that this patch is
doing yet - these are just preliminary comments based on a quick
read-through.  I'll write more after I have a chance to look at it in
more detail.

...Robert


Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Nov 13, 2009, at 8:39 PM, Robert Haas wrote:

> alter table foo add constraint bar exclude (a check with =, b check with =);

I've been meaning to comment on this syntax one more time; apologies for the bike-shedding. But I'm wondering if the
"CHECK"is strictly necessary there, since the WITH seems adequate, and there was some discussion before about the CHECK
keywordpossibly causing confusion with check constraints. 

Best,

David



Re: operator exclusion constraints

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Nov 13, 2009, at 8:39 PM, Robert Haas wrote:
>> alter table foo add constraint bar exclude (a check with =, b check with =);

> I've been meaning to comment on this syntax one more time; apologies for the bike-shedding. But I'm wondering if the
"CHECK"is strictly necessary there, since the WITH seems adequate, and there was some discussion before about the CHECK
keywordpossibly causing confusion with check constraints.
 

I had been manfully restraining myself from re-opening this discussion,
but yeah I was thinking the same thing.  The original objection to using
just WITH was that it wasn't very clear what you were doing "with" the
operator; but that was back when we had a different initial keyword for
the construct.  EXCLUDE ... WITH ... seems to match up pretty naturally.
        regards, tom lane


Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Nov 14, 2009, at 8:55 AM, Tom Lane wrote:

>> I've been meaning to comment on this syntax one more time; apologies for the bike-shedding. But I'm wondering if the
"CHECK"is strictly necessary there, since the WITH seems adequate, and there was some discussion before about the CHECK
keywordpossibly causing confusion with check constraints. 
>
> I had been manfully restraining myself from re-opening this discussion,
> but yeah I was thinking the same thing.  The original objection to using
> just WITH was that it wasn't very clear what you were doing "with" the
> operator; but that was back when we had a different initial keyword for
> the construct.  EXCLUDE ... WITH ... seems to match up pretty naturally.

You're more man than I, Tom, but yeah, with EXCLUDE, WITH works well on its own, methinks.

Best,

David

Re: operator exclusion constraints

From
Robert Haas
Date:
On Sat, Nov 14, 2009 at 12:11 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Nov 14, 2009, at 8:55 AM, Tom Lane wrote:
>
>>> I've been meaning to comment on this syntax one more time; apologies for the bike-shedding. But I'm wondering if
the"CHECK" is strictly necessary there, since the WITH seems adequate, and there was some discussion before about the
CHECKkeyword possibly causing confusion with check constraints. 
>>
>> I had been manfully restraining myself from re-opening this discussion,
>> but yeah I was thinking the same thing.  The original objection to using
>> just WITH was that it wasn't very clear what you were doing "with" the
>> operator; but that was back when we had a different initial keyword for
>> the construct.  EXCLUDE ... WITH ... seems to match up pretty naturally.
>
> You're more man than I, Tom, but yeah, with EXCLUDE, WITH works well on its own, methinks.

I haven't thought about this too deeply, but could we allow the "with
=" part to be optional?  And would it be a good idea?  Seems like you
would commonly have one or more keys that exclude on equality and then
the last one would use an overlap-type operator.

...Robert


Re: operator exclusion constraints

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I haven't thought about this too deeply, but could we allow the "with
> =" part to be optional?  And would it be a good idea?

I don't think so.  We generally do not believe in defaulting operators
based on name.  If there were a way to select the "standard" exclusion
operator based on opclass membership it might make sense, but almost by
definition this facility is going to be working with unusual opclasses
that might not even have an equality slot.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-11-13 at 23:39 -0500, Robert Haas wrote:
> [ reviewing ]

Thank you for the comments so far.

> In index_create(), the elog() where relopxconstraints < 0 should just
> complain about the value being negative, I think, rather than listing
> the value.  If you just say the value is -3, it doesn't give the user
> a clue why that's bad.

Hopefully the user never sees that message -- it's almost an Assert.
PostgreSQL uses elog(ERROR,...) in many places that should be
unreachable, but might happen due to bugs in distant places or
corruption. I'm not sure the exact convention there, but I figure that
some details are appropriate.

I tried to make all user-visible errors into ereport()s.

> In ATAddOperatorExclusionConstraint(), the message "method %s does not
> support gettuple" seems a bit user-unfriendly.  Can we explain the
> problem by referring to the functionality of getttuple(), rather than
> the name of it?

How about "operator exclusion constraints don't support method X"? Then
perhaps have a detail-level message to explain that the access method
needs to support the gettuple interface.

Trying to describe what gettuple does doesn't help a humble user much.
All they care about is "can't use gin".

> I don't really like this message, for a number of reasons.
> 
> alter table foo add constraint bar exclude (a check with =, b check with =);
> ERROR:  operator exclusion constraint violation detected: "foo_a_exclusion"
> DETAIL:  Tuple "(1, 1, 2)" conflicts with existing tuple "(1, 1, 3)".
> 
> The corresponding error for a UNIQUE index is: could not create unique
> index "bar", which I like better.  Only the relevant columns from the
> tuples are dumped, and the tuple is not surrounded by double quotes;
> any reason not to parallel that here?

By "relevant columns" I assume you mean the entire index tuple. That
means we need to have column names represented somewhere, because we
don't want the user to have to match up ordinal index columns.

Also, with exclusion constraints, both values are always relevant, not
just the one being inserted. What if the user just wants to adjust their
request slightly to avoid an overlap -- how do they know how far to go?
I know this could be accomplished with extra queries, as well, but that
doesn't always work for someone looking through the logs after the fact,
when the original values may be gone.

So, the kind of error message you're suggesting starts to get awkward: (a: 1 = 1, b: 1 = 1)

or something? And then with more complex type output functions, and
expression indexes, it starts to look very complex.

What do you think is the cleanest approach?

> Also, the message is all
> lower-case.

I know the error conventions are documented somewhere, but I completely
forgot where. Can you please point me to the right place? I thought most
error messages were supposed to be lower case, and detail messages were
supposed to read like sentences.

> Similarly, for an insert/update situation, it seems that
> a message like "key value violates exclusion constraint \"%s\"" would
> be better than the existing message.

I can certainly simplify it, but I was trying to match the usefulness of
UNIQUE constraint error messages.

> As a quick performance test, I inserted a million 3-integer tuples
> into a 3-column table with a unique constraint or an operator
> exclusion constraint over all three columns.  The former took ~ 15 s,
> the latter ~ 21.5 s.  That seems acceptable.

Great news. I had similar results, though they depend on the conflict
percentage as well (I assume you had zero conflicts).

Regards,Jeff Davis



Re: operator exclusion constraints

From
Brendan Jurd
Date:
2009/11/15 Jeff Davis <pgsql@j-davis.com>:
> I know the error conventions are documented somewhere, but I completely
> forgot where. Can you please point me to the right place? I thought most
> error messages were supposed to be lower case, and detail messages were
> supposed to read like sentences.

http://www.postgresql.org/docs/current/static/error-style-guide.html

And you are correct.

Cheers,
BJ


Re: operator exclusion constraints

From
Greg Stark
Date:
On Sat, Nov 14, 2009 at 6:00 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Hopefully the user never sees that message -- it's almost an Assert.
> PostgreSQL uses elog(ERROR,...) in many places that should be
> unreachable, but might happen due to bugs in distant places or
> corruption. I'm not sure the exact convention there, but I figure that
> some details are appropriate.

Yeah, I think that's right. I think part of the rationale is that if
the admin mucks around with catalog tables or does some DDL with
inconsistent definitions (like an operator class that isn't internally
consistent for example) then we don't treat those errors as
user-visible errors that need to be translated but they shouldn't
cause a database crash either.

If it's possible for the case to arrive through users doing things
through entirely supported means then they might need to be real
ereports(). But I guess there are grey areas.

-- 
greg


Re: operator exclusion constraints

From
Jeff Davis
Date:
New patches attached. You may find it easiest to follow the changes I'm
making through my git repo:

http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=shortlog;h=refs/heads/operator-exclusion-constraints

Note, the attached patches also changed CHECK WITH to just WITH, as
concluded in this discussion:

http://archives.postgresql.org/pgsql-hackers/2009-11/msg00785.php

On Fri, 2009-11-13 at 23:39 -0500, Robert Haas wrote:
> I think the create_table documentation gets into a little too much
> gorey detail.  I'm willing to take a pass at improving it, if you'd
> like, but generally I think it should avoid discussion of
> implementation details.  For example, saying that it's not as fast as
> a UNIQUE constraint is good; the fact that an extra index lookup is
> involved is probably overkill.   Incidentally, the wording in the
> first paragraph fails to take into account the possibility that there
> are multiple operators.

Fixed. Of course, I welcome any further revisions you have.

> There is a spurious diff hunk for reindex_relation().

Fixed.

> In ATRewriteTable() you reindent a bunch of variable declarations;
> pg_indent will undo this, so you should nix this part.

Fixed.

> In ATAddOperatorExclusionConstraint(), the message "method %s does not
> support gettuple" seems a bit user-unfriendly.  Can we explain the
> problem by referring to the functionality of getttuple(), rather than
> the name of it?

Now it looks like:

ERROR:  method "gin" does not support operator exclusion constraints
DETAIL:  The index access method must support the gettuple() interface
to be used with an operator exclusion constraint.

Hopefully that is an improvement.

> alter table X add constraint Y exclude (...) seems to ignore the value
> of Y and create both the constraint and the index with a name of its
> own choosing.

Bug, and fixed.

> I think preprocessOpExConstraints should be called
> transformOpxConstraints - opx rather than opex because that's what you
> most frequently use elsewhere in the patch, and transform rather than
> preprocess for consistency with other, similar functions.

Fixed.

> In ruleutils.c, the prototype for pg_get_opxdef_worker() has a small
> whitespace inconsistency relative to the surrounding declarations.

Fixed.

Thanks,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Robert Haas
Date:
On Sat, Nov 14, 2009 at 1:58 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Sat, Nov 14, 2009 at 6:00 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> Hopefully the user never sees that message -- it's almost an Assert.
>> PostgreSQL uses elog(ERROR,...) in many places that should be
>> unreachable, but might happen due to bugs in distant places or
>> corruption. I'm not sure the exact convention there, but I figure that
>> some details are appropriate.
>
> Yeah, I think that's right. I think part of the rationale is that if
> the admin mucks around with catalog tables or does some DDL with
> inconsistent definitions (like an operator class that isn't internally
> consistent for example) then we don't treat those errors as
> user-visible errors that need to be translated but they shouldn't
> cause a database crash either.
>
> If it's possible for the case to arrive through users doing things
> through entirely supported means then they might need to be real
> ereports(). But I guess there are grey areas.

I guess my point wasn't that the message was likely to be exercised,
but rather that it isn't obvious that it's describing an error
condition at all.  If you get this message:

relation "whatever" has relopxconstraints = -3

...you can't even tell that it's an error message unless it happens to
be preceded by "ERROR: ".  If you get something like:

relation "whatever" has invalid relopxconstraints value (-3)

...it's much more clear that this is not just informational.

...Robert


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sat, 2009-11-14 at 09:11 -0800, David E. Wheeler wrote:
> On Nov 14, 2009, at 8:55 AM, Tom Lane wrote:
> > I had been manfully restraining myself from re-opening this discussion,
> > but yeah I was thinking the same thing.  The original objection to using
> > just WITH was that it wasn't very clear what you were doing "with" the
> > operator; but that was back when we had a different initial keyword for
> > the construct.  EXCLUDE ... WITH ... seems to match up pretty naturally.
> 
> You're more man than I, Tom, but yeah, with EXCLUDE, WITH works well on its own, methinks.

Changed in new patch here:

http://archives.postgresql.org/message-id/1258226849.708.97.camel@jdavis

Regards,Jeff Davis



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sat, 2009-11-14 at 14:36 -0500, Robert Haas wrote:
> I guess my point wasn't that the message was likely to be exercised,
> but rather that it isn't obvious that it's describing an error
> condition at all.  If you get this message:
> 
> relation "whatever" has relopxconstraints = -3
> 

I pretty much copied similar code for relchecks, see
pg_constraint.c:570.

If you have a suggestion, I'll make the change. It doesn't sound too
urgent though, to me.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Mon, 2009-11-09 at 09:12 -0800, David E. Wheeler wrote:
> On Nov 8, 2009, at 7:43 PM, Jeff Davis wrote:
> 
> > Either of those names are fine with me, too. The current name is a
> > somewhat shortened version of the name "operator-based exclusion
> > constraints", so we can also just use that name. Or, just "exclusion
> > constraints".
> 
> (exclusion constraints)++

Ok, I guess this is another issue that requires consensus.

Note: this is purely for documentation, release notes, and user-visible
error messages. This does not have any impact on the syntax, I think
we've got a strong consensus on that already and I would prefer not to
break that discussion open again.

1. Operator Exclusion Constraints (current)
2. Generic/Generalized/General Exclusion Constraints
3. Exclusion Constraints (has the potential to cause confusion with
constraint_exclusion)

Regards,Jeff Davis



Re: operator exclusion constraints

From
Robert Haas
Date:
On Sat, Nov 14, 2009 at 2:39 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> If you have a suggestion, I'll make the change. It doesn't sound too
> urgent though, to me.

Yeah, probably not.

...Robert


Re: operator exclusion constraints

From
Robert Haas
Date:
On Sat, Nov 14, 2009 at 2:27 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> New patches attached.

Forgive me if this is discussed before, but why does this store the
strategy numbers of the relevant operators instead of the operators
themselves?  It seems like this could lead to surprising behavior if
the user modifies the definition of the operator class.

I'm wondering if we can't use the existing
BuildIndexValueDescription() rather than the new function
tuple_as_string().  I realize there are two tuples, but maybe it makes
sense to just call it twice?

I'm attaching a revised doc patch for your consideration.

...Robert

Attachment

Re: operator exclusion constraints

From
Jeff Davis
Date:
I'm in Tokyo right now, so please excuse my abbreviated reply.

On Tue, 2009-11-17 at 23:13 -0500, Robert Haas wrote:
> Forgive me if this is discussed before, but why does this store the
> strategy numbers of the relevant operators instead of the operators
> themselves?

At constraint definition time, I need to make sure that the strategy
numbers can be identified anyway, so it wouldn't save any work in
ATAddOperatorExclusionConstraint. At the time it seemed slightly more
direct to use strategy numbers in index_check_constraint, but it's
probably about the same.

> It seems like this could lead to surprising behavior if
> the user modifies the definition of the operator class.

Right now, operator classes can't be modified in any meaningful way. Am
I missing something?

> I'm wondering if we can't use the existing
> BuildIndexValueDescription() rather than the new function
> tuple_as_string().  I realize there are two tuples, but maybe it makes
> sense to just call it twice?

Are you suggesting I change the error output, or reorganize the code to
try to reuse BuildIndexValueDescription, or both?

Regards,Jeff Davis



Re: operator exclusion constraints

From
Dimitri Fontaine
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Forgive me if this is discussed before, but why does this store the
> strategy numbers of the relevant operators instead of the operators
> themselves?  It seems like this could lead to surprising behavior if
> the user modifies the definition of the operator class.

Wild guess:
 http://www.postgresql.org/docs/8.4/static/xindex.html
 34.14.2. Index Method Strategies
 The operators associated with an operator class are identified by "strategy numbers", which serve to identify the
semanticsof each operator within the context of its operator class. For example, B-trees impose a strict ordering on
keys,lesser to greater, and so operators like "less than" and "greater than or equal to" are interesting with respect
toa B-tree. Because PostgreSQL allows the user to define operators, PostgreSQL cannot look at the name of an operator
(e.g.,< or >=) and tell what kind of comparison it is. Instead, the index method defines a set of "strategies", which
canbe thought of as generalized operators. Each operator class specifies which actual operator corresponds to each
strategyfor a particular data type and interpretation of the index semantics.
 

Regards,
-- 
dim


Re: operator exclusion constraints

From
Josh Berkus
Date:
All,

FWIW, I'm doing a redesign of a client's production web application
right now.  I was able, by combining OEC and the Period type from
pgfoundry, to make a set of constraints for declaratively asserting in a
sports database:

That the same player couldn't belong to two different teams at the same
time;
That the same player couldn't belong to the same team in two different
positions with overlapping time periods.

This worked as spec'd, and would be extremely useful for this real-world
app if it was ready to use in production now.

However, I do have an issue with the SQLSTATE returned from the OEC
violation.  Currently it returns constraint violation, which, from the
perspective of an application developer, is not useful.  OECs are, in
application terms, materially identical to UNIQUE constraints and serve
the same purpose.  As such, I'd far rather see OECs return unique key
violation instead, as any existing application error-trapping code would
handle the violation more intelligently if it did.

--Josh Berkus



Re: operator exclusion constraints

From
Robert Haas
Date:
On Wed, Nov 18, 2009 at 9:00 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> I'm in Tokyo right now, so please excuse my abbreviated reply.
>
> On Tue, 2009-11-17 at 23:13 -0500, Robert Haas wrote:
>> Forgive me if this is discussed before, but why does this store the
>> strategy numbers of the relevant operators instead of the operators
>> themselves?
>
> At constraint definition time, I need to make sure that the strategy
> numbers can be identified anyway, so it wouldn't save any work in
> ATAddOperatorExclusionConstraint. At the time it seemed slightly more
> direct to use strategy numbers in index_check_constraint, but it's
> probably about the same.

It sets off a red flag for me any time I see code that asks for A from
the user and then actually stores B under the hood, for fear that the
relationship that A and B might change.  However...

>> It seems like this could lead to surprising behavior if
>> the user modifies the definition of the operator class.
>
> Right now, operator classes can't be modified in any meaningful way. Am
> I missing something?

...poking at it, I have to agree that at least as things stand right
now, I can't find a way to break it.  Not sure if it's future-proof.

>> I'm wondering if we can't use the existing
>> BuildIndexValueDescription() rather than the new function
>> tuple_as_string().  I realize there are two tuples, but maybe it makes
>> sense to just call it twice?
>
> Are you suggesting I change the error output, or reorganize the code to
> try to reuse BuildIndexValueDescription, or both?

I was thinking maybe you call BuildIndexValueDescription twice and
make the error message say something like <output of first call>
conflicts with <output of second call>.

One other thing I noticed tonight while poking at this.  If I install
contrib/citext, I can do this:

create table test (a citext, exclude using hash (a with =));

But if I install contrib/intarray, I can't do this:

create table test (a int4[], exclude using gist (a with =));
ERROR:  operator does not exist: integer[] = integer[]

Not sure if I'm doing something wrong, or if this is a limitation of
the design, or if it's a bug, but it seems strange.  I'm guessing it's
because intarray uses the anyarray operator rather than a dedicated
operator for int[], but it seems like that ought to work.

...Robert


Re: operator exclusion constraints

From
Robert Haas
Date:
On Wed, Nov 18, 2009 at 9:21 AM, Josh Berkus <josh@agliodbs.com> wrote:
> All,
>
> FWIW, I'm doing a redesign of a client's production web application
> right now.  I was able, by combining OEC and the Period type from
> pgfoundry, to make a set of constraints for declaratively asserting in a
> sports database:
>
> That the same player couldn't belong to two different teams at the same
> time;
> That the same player couldn't belong to the same team in two different
> positions with overlapping time periods.
>
> This worked as spec'd, and would be extremely useful for this real-world
> app if it was ready to use in production now.
>
> However, I do have an issue with the SQLSTATE returned from the OEC
> violation.  Currently it returns constraint violation, which, from the
> perspective of an application developer, is not useful.  OECs are, in
> application terms, materially identical to UNIQUE constraints and serve
> the same purpose.  As such, I'd far rather see OECs return unique key
> violation instead, as any existing application error-trapping code would
> handle the violation more intelligently if it did.

I guess I'm going to have to vote -1 on this proposal.  I code see
inventing a pgsql-specific SQLSTATE value for exclusion constraints,
since they will be a pgsql-specific extension, but reusing the unique
key violation value seems misleading.  I admit it may help in a
limited number of cases, but IMHO it's not worth the confusion.

That's just my $0.02, though.

...Robert


Re: operator exclusion constraints

From
Josh Berkus
Date:
RObert,

> I guess I'm going to have to vote -1 on this proposal.  I code see
> inventing a pgsql-specific SQLSTATE value for exclusion constraints,
> since they will be a pgsql-specific extension, but reusing the unique
> key violation value seems misleading.  I admit it may help in a
> limited number of cases, but IMHO it's not worth the confusion.

I'd rather have a new one than just using "contstraint violation" which
is terribly non-specific, and generally makes the application developer
think that a value is too large.

--Josh BErkus



Re: operator exclusion constraints

From
David Fetter
Date:
On Fri, Nov 20, 2009 at 01:36:59PM +0900, Josh Berkus wrote:
> RObert,
> 
> > I guess I'm going to have to vote -1 on this proposal.  I code see
> > inventing a pgsql-specific SQLSTATE value for exclusion constraints,
> > since they will be a pgsql-specific extension, but reusing the unique
> > key violation value seems misleading.  I admit it may help in a
> > limited number of cases, but IMHO it's not worth the confusion.
> 
> I'd rather have a new one than just using "contstraint violation" which
> is terribly non-specific, and generally makes the application developer
> think that a value is too large.

What, if anything, does the standard have to say about violations of
ASSERTIONs?  I know these aren't ASSERTIONs, but they much more
closely resemble them than they do UNIQUE constraints.  For example,
if the operator is <> instead of =, a violation is actually the
opposite of a UNIQUE violation.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: operator exclusion constraints

From
Peter Eisentraut
Date:
On sön, 2009-11-22 at 16:03 -0800, David Fetter wrote:
> What, if anything, does the standard have to say about violations of
> ASSERTIONs?  I know these aren't ASSERTIONs, but they much more
> closely resemble them than they do UNIQUE constraints.

An assertion is by definition a constraint that is a schema component
independent of a table.  Which an exclusion constraint may or may not
be, but it's an independent issue.  (To clarify: It currently can't be,
because assertions are not implemented, but when they are, it could be.)
For the same reason, assertions don't have separate error codes, because
they are just constraints after all.




Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-11-19 at 22:55 -0500, Robert Haas wrote:
> > At constraint definition time, I need to make sure that the strategy
> > numbers can be identified anyway, so it wouldn't save any work in
> > ATAddOperatorExclusionConstraint. At the time it seemed slightly more
> > direct to use strategy numbers in index_check_constraint, but it's
> > probably about the same.
>
> It sets off a red flag for me any time I see code that asks for A from
> the user and then actually stores B under the hood, for fear that the
> relationship that A and B might change.  However...

Still not addressed because it touches a bit more code (including the
catalogs). I basically agree, however, so I'll take care of this soon.

> I was thinking maybe you call BuildIndexValueDescription twice and
> make the error message say something like <output of first call>
> conflicts with <output of second call>.

Do you really think that's a better error message, or are you just
trying to re-use similar code?

Let's start from how the error message should read, and then see if we
can re-use some code to make it look that way. It's one of the most
visible aspects of the feature, and it needs to be reasonably concise
and understandable in the simple case, but contain all of the necessary
information.

I think it's better to avoid the "=" when describing the conflict. I
tend to read it as "equals" even though it's just punctuation in this
case, so it would be distracting. I could change it to a colon, I
suppose.

> create table test (a int4[], exclude using gist (a with =));
> ERROR:  operator does not exist: integer[] = integer[]

Thanks, fixed. I am now using compatible_oper_opid(), which will find
any operators that don't require binary-incompatible coercion of the
operands.

Do you think there's any reason to support binary-incompatible coercion
of the operands? I can't think of a single use case, and if you really
need to, you can coerce the types explicitly in the expression.

Changes in attached patch:

  * use compatible_oper_opid() to find operator from name
  * Add new SQLSTATE 23P01 for the operator exclusion constraint error,
    so that applications can more easily distinguish it from other
    constraint violations.

Regards,
    Jeff Davis


Attachment

Re: operator exclusion constraints

From
Robert Haas
Date:
On Wed, Nov 25, 2009 at 3:23 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> I was thinking maybe you call BuildIndexValueDescription twice and
>> make the error message say something like <output of first call>
>> conflicts with <output of second call>.
>
> Do you really think that's a better error message, or are you just
> trying to re-use similar code?
>
> Let's start from how the error message should read, and then see if we
> can re-use some code to make it look that way. It's one of the most
> visible aspects of the feature, and it needs to be reasonably concise
> and understandable in the simple case, but contain all of the necessary
> information.
>
> I think it's better to avoid the "=" when describing the conflict. I
> tend to read it as "equals" even though it's just punctuation in this
> case, so it would be distracting. I could change it to a colon, I
> suppose.

I disagree wholeheartedly.  :-)  My ideal error message is something like:

DETAIL: (a, b, c)=(1, 2, 3) conflicts with (a, b, c)=(4, 5, 6)

In particular, I think it's very important that we only emit the
columns which are part of the operator exclusion constraints, and NOT
all the columns of the tuple.  The entire tuple could be very large -
one of the columns not involved in the constraint could be a 4K block
of text, for example, and spitting that out only obscures the real
source of the problem.  You could argue that one of the columns
involved in the constraint could be a 4K block text, too, but in
practice I think the constraint columns are likely to be short nine
times out of ten.  The equals sign is equating the column names to the
column values, not the values to each other, so I don't see that as
confusing.

>> create table test (a int4[], exclude using gist (a with =));
>> ERROR:  operator does not exist: integer[] = integer[]
>
> Thanks, fixed. I am now using compatible_oper_opid(), which will find
> any operators that don't require binary-incompatible coercion of the
> operands.
>
> Do you think there's any reason to support binary-incompatible coercion
> of the operands? I can't think of a single use case, and if you really
> need to, you can coerce the types explicitly in the expression.

My operator-class-fu is insufficient to render judgment on this point.I think the thing to do is look at a bunch of
non-built-inopclasses 
and check for POLA violations.

...Robert


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Wed, 2009-11-25 at 09:02 -0500, Robert Haas wrote:
> I disagree wholeheartedly.  :-)  My ideal error message is something like:
> 
> DETAIL: (a, b, c)=(1, 2, 3) conflicts with (a, b, c)=(4, 5, 6)
> 
> In particular, I think it's very important that we only emit the
> columns which are part of the operator exclusion constraints, and NOT
> all the columns of the tuple.  The entire tuple could be very large -
> one of the columns not involved in the constraint could be a 4K block
> of text, for example, and spitting that out only obscures the real
> source of the problem.  You could argue that one of the columns
> involved in the constraint could be a 4K block text, too, but in
> practice I think the constraint columns are likely to be short nine
> times out of ten.  The equals sign is equating the column names to the
> column values, not the values to each other, so I don't see that as
> confusing.

Ok, fair enough. But how do you feel about: (a: 1, b: 2, c: 3)
as a tuple representation instead? I think it's closer to the way people
expect a tuple to be represented. I can change it in one place so that
the unique violations are reported that way, as well (as long as there
are no objections).

It still doesn't contain the operators, but they can look at the
constraint definition for that.

> My operator-class-fu is insufficient to render judgment on this point.
>  I think the thing to do is look at a bunch of non-built-in opclasses
> and check for POLA violations.

Ok, I'll consider this more.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Wed, 2009-11-25 at 09:02 -0500, Robert Haas wrote:
>> I disagree wholeheartedly.  :-)  My ideal error message is something like:
>> 
>> DETAIL: (a, b, c)=(1, 2, 3) conflicts with (a, b, c)=(4, 5, 6)

> Ok, fair enough. But how do you feel about:
>   (a: 1, b: 2, c: 3)
> as a tuple representation instead?

This seems like change for the sake of change.  We've been reporting
this type of error (in the context of foreign keys) using the first
syntax for a very long time.  I don't feel a need to rearrange it.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Wed, 2009-11-25 at 15:59 -0800, Jeff Davis wrote:
> > My operator-class-fu is insufficient to render judgment on this point.
> >  I think the thing to do is look at a bunch of non-built-in opclasses
> > and check for POLA violations.
> 
> Ok, I'll consider this more.

In cases where the operator class type is different from the search
operator's operands' types, one of the following is usually true:
* There is a binary cast from the opclass type (same as expression
type) to the search operator's operands' types, or it is otherwise
compatible (e.g. ANYARRAY).* There is a candidate function that's a better match (e.g. there may
be an =(int8, int8) on int2_ops, but there's also =(int2, int2)).* The left and right operand types are different, and
thereforedo not
 
work with operator exclusion constraints anyway (e.g. full text search
@@).

After installing all contrib modules, plus my period module, and
postgis, there appear to be no instances that violate these assumptions
(according to a join query and some manual testing). In theory there
could be, however.

It's kind of ugly to make it work, and a challenge to test it, so for
now I'll only accept operators returned by compatible_oper(). If you
disagree, I can make it work.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Tue, 2009-11-17 at 23:13 -0500, Robert Haas wrote:
> Forgive me if this is discussed before, but why does this store the
> strategy numbers of the relevant operators instead of the operators
> themselves?  It seems like this could lead to surprising behavior if
> the user modifies the definition of the operator class.

Still open.

> I'm wondering if we can't use the existing
> BuildIndexValueDescription() rather than the new function
> tuple_as_string().  I realize there are two tuples, but maybe it makes
> sense to just call it twice?

Changed.

> I'm attaching a revised doc patch for your consideration.

Thanks, I applied it. The only significant thing I changed was that I
did not inline the "index_elem" because it made it fairly hard to read.
Instead, I renamed it "exclude_elem" to make it a little more
meaningful, which I assume may have been your motivation for inlining
it.

Changes this patch:
 * doc changes
 * changed constraint violation message to be more like btree unique
   violation
 * improved error message when an operator is specified that doesn't
   have a search strategy

Remaining issues:
 * represent operator IDs in catalog, rather than strategy numbers
 * if someone thinks it's an issue, support search strategies that
   require binary-incompatible casts of the inputs

Regards,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Robert Haas
Date:
On Thu, Nov 26, 2009 at 4:33 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> Thanks, I applied it. The only significant thing I changed was that I
> did not inline the "index_elem" because it made it fairly hard to read.
> Instead, I renamed it "exclude_elem" to make it a little more
> meaningful, which I assume may have been your motivation for inlining
> it.

No, it was really just that I didn't see any point in defining a token
that was only used in one place.  It seems like it just represents a
jumble of tokens without any real connection between them, so I didn't
really see why we should break out that piece as opposed to anything
else.

> Changes this patch:
>  * doc changes
>  * changed constraint violation message to be more like btree unique
>   violation
>  * improved error message when an operator is specified that doesn't
>   have a search strategy
>
> Remaining issues:
>  * represent operator IDs in catalog, rather than strategy numbers
>  * if someone thinks it's an issue, support search strategies that
>   require binary-incompatible casts of the inputs

I'm (still) not an expert on this topic, but here's one more thought:
maybe we should try to set this up so that it works in any situation
in which an opclass itself would work.  IOW, if you manage to define
an opclass, you should probably be able to define an
operator-exclusion constraint against it, rather than having some
opclasses work and others arbitrarily fail.

Just my $0.02,

...Robert


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-11-26 at 16:31 -0500, Robert Haas wrote:
> On Thu, Nov 26, 2009 at 4:33 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> > Thanks, I applied it. The only significant thing I changed was that I
> > did not inline the "index_elem" because it made it fairly hard to read.
> > Instead, I renamed it "exclude_elem" to make it a little more
> > meaningful, which I assume may have been your motivation for inlining
> > it.
> 
> No, it was really just that I didn't see any point in defining a token
> that was only used in one place.  It seems like it just represents a
> jumble of tokens without any real connection between them, so I didn't
> really see why we should break out that piece as opposed to anything
> else.

"table_constraint" and "column_constraint" are only used one place. I
found it convenient to do so because, in the common case, exclude_elem
is just a column name. So the user clearly sees:
 exclude_elem WITH operator

and the pairing isn't obscured by a bunch of optional stuff.

> I'm (still) not an expert on this topic, but here's one more thought:
> maybe we should try to set this up so that it works in any situation
> in which an opclass itself would work.  IOW, if you manage to define
> an opclass, you should probably be able to define an
> operator-exclusion constraint against it, rather than having some
> opclasses work and others arbitrarily fail.

That's what I was aiming for, but it's theoretically possible for that
case to require casts. I will do a little more investigation and write a
test case. If it looks reasonable, I'll refactor a little and just do
it. It is a pretty obscure case (seeing as I have yet to observe it,
including all contrib modules plus postgis), but I might as well do it
right as long as it's reasonable to do. If it gets even messier to
implement, maybe I'll reconsider.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-11-26 at 01:33 -0800, Jeff Davis wrote:
> Remaining issues:
>  * represent operator IDs in catalog, rather than strategy numbers

Done, attached.

>  * if someone thinks it's an issue, support search strategies that
>    require binary-incompatible casts of the inputs

I've already solved the original problem involving ANYARRAY. If someone
comes up with a good use case, or provides me with a little more
guidance, I'll reconsider this problem again. Otherwise, I feel like I'm
solving a problem that doesn't exist (after all, none of the contrib
modules seem to have a problem with the current assumptions, nor does
postgis, nor does my PERIOD module). So, I'm considering this a
"non-issue" until further notice.

To summarize, the problem as I understand it is this:

You have two types, T1 and T2, and there's an implicit cast (with
function or with inout) from T1 to T2. And you have an opclass like:

CREATE OPERATOR CLASS t1_ops FOR TYPE t1
...
 OPERATOR 17 %%(t2, t2)
...

And then you have an exclusion constraint like:
CREATE TABLE foo
(
  a t1,
  EXCLUDE (a t1_ops WITH %%)
);

What should the behavior be in the following two cases?
 1. Only operator %%(t2, t2) exists.
 2. Operator %%(t1, t1) and %%(t2, t2) exist.

If left unsolved, #1 results in an error because the operator requires
binary-incompatible coercion. #2 results in an error because %%(t1, t1)
is not in the opclass.

Arguably either one of them could succeed by finding %%(t2, t2) and
performing the appropriate conversion; but I think it's fair to just say
"the opclass is poorly defined".

Note that if the opclass is on type t2, you can simply cast "a" to t2
explicitly in the expression, like so:
  EXCLUDE((a::t2) t2_ops WITH %%)

Regards,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Robert Haas
Date:
On Fri, Nov 27, 2009 at 10:18 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2009-11-26 at 01:33 -0800, Jeff Davis wrote:
>> Remaining issues:
>>  * represent operator IDs in catalog, rather than strategy numbers
>
> Done, attached.
>
>>  * if someone thinks it's an issue, support search strategies that
>>    require binary-incompatible casts of the inputs
>
> I've already solved the original problem involving ANYARRAY. If someone
> comes up with a good use case, or provides me with a little more
> guidance, I'll reconsider this problem again. Otherwise, I feel like I'm
> solving a problem that doesn't exist (after all, none of the contrib
> modules seem to have a problem with the current assumptions, nor does
> postgis, nor does my PERIOD module). So, I'm considering this a
> "non-issue" until further notice.
>
> To summarize, the problem as I understand it is this:
>
> You have two types, T1 and T2, and there's an implicit cast (with
> function or with inout) from T1 to T2. And you have an opclass like:
>
> CREATE OPERATOR CLASS t1_ops FOR TYPE t1
> ...
>  OPERATOR 17 %%(t2, t2)
> ...
>
> And then you have an exclusion constraint like:
> CREATE TABLE foo
> (
>  a t1,
>  EXCLUDE (a t1_ops WITH %%)
> );
>
> What should the behavior be in the following two cases?
>  1. Only operator %%(t2, t2) exists.
>  2. Operator %%(t1, t1) and %%(t2, t2) exist.
>
> If left unsolved, #1 results in an error because the operator requires
> binary-incompatible coercion. #2 results in an error because %%(t1, t1)
> is not in the opclass.
>
> Arguably either one of them could succeed by finding %%(t2, t2) and
> performing the appropriate conversion; but I think it's fair to just say
> "the opclass is poorly defined".
>
> Note that if the opclass is on type t2, you can simply cast "a" to t2
> explicitly in the expression, like so:
>  EXCLUDE((a::t2) t2_ops WITH %%)

For parity with unique constraints, I think that the message:

operator exclusion constraint violation detected: %s

should be changed to:

conflicting key value violates operator exclusion constraint "%s"

In ATAddOperatorExclusionConstraint, "streatagy" is misspelled.

Other than that, it looks good to me.

...Robert


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Tue, 2009-12-01 at 23:19 -0500, Robert Haas wrote:
> For parity with unique constraints, I think that the message:
>
> operator exclusion constraint violation detected: %s
>
> should be changed to:
>
> conflicting key value violates operator exclusion constraint "%s"

Done, and updated tests.

> In ATAddOperatorExclusionConstraint, "streatagy" is misspelled.

Fixed.

> Other than that, it looks good to me.

Great, thanks for the detailed review!

Regards,
    Jeff Davis

Attachment

Re: operator exclusion constraints

From
Robert Haas
Date:
On Wed, Dec 2, 2009 at 12:18 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2009-12-01 at 23:19 -0500, Robert Haas wrote:
>> For parity with unique constraints, I think that the message:
>>
>> operator exclusion constraint violation detected: %s
>>
>> should be changed to:
>>
>> conflicting key value violates operator exclusion constraint "%s"
>
> Done, and updated tests.
>
>> In ATAddOperatorExclusionConstraint, "streatagy" is misspelled.
>
> Fixed.
>
>> Other than that, it looks good to me.
>
> Great, thanks for the detailed review!

Marked as Ready for Committer.

...Robert


Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2009-12-01 at 23:19 -0500, Robert Haas wrote:
>> For parity with unique constraints, I think that the message:
>> 
>> operator exclusion constraint violation detected: %s
>> 
>> should be changed to:
>> 
>> conflicting key value violates operator exclusion constraint "%s"

> Done, and updated tests.

I'm starting to go through this patch now.  I thought the consensus
was to refer to them as just "exclusion constraints"?  I'm not seeing
that the word "operator" really adds anything.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2009-12-03 at 19:00 -0500, Tom Lane wrote:
> I'm starting to go through this patch now.  I thought the consensus
> was to refer to them as just "exclusion constraints"?  I'm not seeing
> that the word "operator" really adds anything.

I assume you're referring to the name used in documentation and error
messages. I didn't see a clear consensus, but the relevant thread is
here:

http://archives.postgresql.org/message-id/1258227283.708.108.camel@jdavis

"Exclusion Constraints" is fine with me, as are the other options listed
in that email.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Robert Haas
Date:
On Thu, Dec 3, 2009 at 7:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2009-12-03 at 19:00 -0500, Tom Lane wrote:
>> I'm starting to go through this patch now.  I thought the consensus
>> was to refer to them as just "exclusion constraints"?  I'm not seeing
>> that the word "operator" really adds anything.
>
> I assume you're referring to the name used in documentation and error
> messages. I didn't see a clear consensus, but the relevant thread is
> here:
>
> http://archives.postgresql.org/message-id/1258227283.708.108.camel@jdavis
>
> "Exclusion Constraints" is fine with me, as are the other options listed
> in that email.

Yeah, I don't remember any such consensus either, but it's not a dumb
name.  I have been idly wondering throughout this process whether we
should try to pick a name that conveys the fact that these constraints
are inextricably tied to the opclass/index machinery - but I'm not
sure it's possible to really give that flavor in a short phrase, or
that it's actually important to do so.  IOW... "whatever".  :-)

...Robert


Re: operator exclusion constraints

From
"David E. Wheeler"
Date:
On Dec 3, 2009, at 6:26 PM, Robert Haas wrote:

> Yeah, I don't remember any such consensus either, but it's not a dumb
> name.  I have been idly wondering throughout this process whether we
> should try to pick a name that conveys the fact that these constraints
> are inextricably tied to the opclass/index machinery - but I'm not
> sure it's possible to really give that flavor in a short phrase, or
> that it's actually important to do so.  IOW... "whatever".  :-)

"Whatever constraints"? "Operator Whatevers"? "WhatEVER"s? I like it.

David


Re: operator exclusion constraints

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Dec 03, 2009 at 08:38:06PM -0800, David E. Wheeler wrote:

[...]

> "Whatever constraints"? "Operator Whatevers"? "WhatEVER"s? I like it.

drigting serioulsy off-topic: there's precedent for that in the most
venerable piece of free software; TeX has a "whatsit node" (basically an
extension mechanism).

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFLGKM7Bcgs9XrR2kYRAuHFAJ0ZZYzlXHJEgwEbsraAlKVI58yLAgCfU4Cz
n+0vobY0HxROigSHUGog7QI=
=MWH+
-----END PGP SIGNATURE-----


Re: operator exclusion constraints

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 3, 2009 at 7:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Thu, 2009-12-03 at 19:00 -0500, Tom Lane wrote:
>>> I'm starting to go through this patch now. �I thought the consensus
>>> was to refer to them as just "exclusion constraints"? �I'm not seeing
>>> that the word "operator" really adds anything.
>> 
>> I assume you're referring to the name used in documentation and error
>> messages. I didn't see a clear consensus, but the relevant thread is
>> here:
>> 
>> http://archives.postgresql.org/message-id/1258227283.708.108.camel@jdavis
>> 
>> "Exclusion Constraints" is fine with me, as are the other options listed
>> in that email.

> Yeah, I don't remember any such consensus either, but it's not a dumb
> name.  I have been idly wondering throughout this process whether we
> should try to pick a name that conveys the fact that these constraints
> are inextricably tied to the opclass/index machinery - but I'm not
> sure it's possible to really give that flavor in a short phrase, or
> that it's actually important to do so.  IOW... "whatever".  :-)

Well, unique constraints are tied to the opclass/index machinery too.

Unless there's loud squawks I'm going to exercise committer's
prerogative and make all the docs and messages just say "exclusion
constraint".
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Fri, 2009-12-04 at 11:35 -0500, Tom Lane wrote:
> Unless there's loud squawks I'm going to exercise committer's
> prerogative and make all the docs and messages just say "exclusion
> constraint".

Sounds fine to me.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Robert Haas
Date:
On Dec 4, 2009, at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Dec 3, 2009 at 7:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>>> On Thu, 2009-12-03 at 19:00 -0500, Tom Lane wrote:
>>>> I'm starting to go through this patch now.  I thought the consensus
>>>> was to refer to them as just "exclusion constraints"?  I'm not  
>>>> seeing
>>>> that the word "operator" really adds anything.
>>>
>>> I assume you're referring to the name used in documentation and  
>>> error
>>> messages. I didn't see a clear consensus, but the relevant thread is
>>> here:
>>>
>>> http://archives.postgresql.org/message-id/1258227283.708.108.camel@jdavis
>>>
>>> "Exclusion Constraints" is fine with me, as are the other options  
>>> listed
>>> in that email.
>
>> Yeah, I don't remember any such consensus either, but it's not a dumb
>> name.  I have been idly wondering throughout this process whether we
>> should try to pick a name that conveys the fact that these  
>> constraints
>> are inextricably tied to the opclass/index machinery - but I'm not
>> sure it's possible to really give that flavor in a short phrase, or
>> that it's actually important to do so.  IOW... "whatever".  :-)
>
> Well, unique constraints are tied to the opclass/index machinery too.
>
> Unless there's loud squawks I'm going to exercise committer's
> prerogative and make all the docs and messages just say "exclusion
> constraint".

Go for it. Membership has its privileges.  :-)

...Robert


Re: operator exclusion constraints

From
David Fetter
Date:
On Fri, Dec 04, 2009 at 11:35:52AM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Dec 3, 2009 at 7:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> >> On Thu, 2009-12-03 at 19:00 -0500, Tom Lane wrote:
> >>> I'm starting to go through this patch now. �I thought the
> >>> consensus was to refer to them as just "exclusion constraints"?
> >>> �I'm not seeing that the word "operator" really adds anything.
> >> 
> >> I assume you're referring to the name used in documentation and
> >> error messages. I didn't see a clear consensus, but the relevant
> >> thread is here:
> >> 
> >> http://archives.postgresql.org/message-id/1258227283.708.108.camel@jdavis
> >> 
> >> "Exclusion Constraints" is fine with me, as are the other options
> >> listed in that email.
> 
> > Yeah, I don't remember any such consensus either, but it's not a
> > dumb name.  I have been idly wondering throughout this process
> > whether we should try to pick a name that conveys the fact that
> > these constraints are inextricably tied to the opclass/index
> > machinery - but I'm not sure it's possible to really give that
> > flavor in a short phrase, or that it's actually important to do
> > so.  IOW... "whatever".  :-)
> 
> Well, unique constraints are tied to the opclass/index machinery
> too.
> 
> Unless there's loud squawks I'm going to exercise committer's
> prerogative and make all the docs and messages just say "exclusion
> constraint".

We have "constraint exclusion" already, which could make this
confusing.  How about either the original "operator exclusion
constraint" or the slightly easier "whatever constraint" names?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> [ exclusion constraints patch ]

Still working on this patch.  I ran into something I didn't like at all:

> +                 if (newrel != NULL)
> +                     ereport(ERROR,
> +                             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                              errmsg("cannot rewrite table while adding "
> +                                     "operator exclusion constraint")));

This would be bad enough if the restriction were what the message
alleges, ie, you can't write an ALTER TABLE that both rewrites the heap
and adds an exclusion constraint.  However, actually the error also
occurs if you issue a rewriting ALTER TABLE against a table that already
has an exclusion constraint.  That raises it from annoyance to
showstopper IMO.

There is not a whole lot that can be done to fix this given the design
that exclusion checking is supposed to be done in ATRewriteTable ---
when that runs, we of course have not built the new index yet, so
there's no way to use it to check the constraint.

I think the right way to go at it is to drop that part of the patch
and instead have index_build execute a separate pass to check the
constraint after it's built an exclusion-constraint index.  In the
typical case where you're just doing ALTER TABLE ADD CONSTRAINT EXCLUDE,
this ends up being the same amount of work since there's no need to
run an ATRewriteTable scan at all.  It would be extra work if someone
tried to add multiple exclusion constraints in one ALTER; but I have a
hard time getting excited about that, especially in view of the fact
that the cost of the index searches is going to swamp the heapscan
anyway.

This approach would also mean that the constraint is re-verified
if someone does a REINDEX.  I think this is appropriate behavior
since reindexing a regular unique index also causes re-verification.
However I can imagine someone objecting on grounds of cost ---
it would probably about double the cost of reindexing compared
to assuming the constraint is still good.  We could probably teach
index_build to skip the check pass during REINDEX, but I don't
really want to.

Comments?
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sun, 2009-12-06 at 10:46 -0500, Tom Lane wrote:
> This would be bad enough if the restriction were what the message
> alleges, ie, you can't write an ALTER TABLE that both rewrites the heap
> and adds an exclusion constraint.  However, actually the error also
> occurs if you issue a rewriting ALTER TABLE against a table that already
> has an exclusion constraint.  That raises it from annoyance to
> showstopper IMO.

The following works as expected for me:
 create table foo(i int, j int, k int);
 -- the following causes the error in question: alter table foo add exclude(i with =),   alter column k type text;
 alter table foo add exclude(i with =); -- works alter table foo alter column k type text; -- works

So the workaround is simply to break the ALTER into two statements.

(Aside: the error message should probably have a DETAIL component
telling the user to break up the ALTER commands into separate actions.)

Aha -- I think I see the problem you're having: if you try to rewrite
one of the columns contained in the exclusion constraint, you get that
error:
 create table bar_exclude(i int, exclude (i with =));
 -- raises same error, which is confusing alter table bar_exclude alter column i type text;

Compared with UNIQUE: create table bar_unique(i int unique); alter table bar_unique alter column i type text; -- works

However, I think we _want_ exclusion constraints to fail in that case.
Unique constraints can succeed because both types support UNIQUE, and
the semantics are the same. But in the case of exclusion constraints,
it's quite likely that the semantics will be different or the named
operator won't exist at all. I think it's more fair to compare with a
unique index on an expression:
 create table bar_unique2(i int); create unique index bar_unique2_idx on bar_unique2 ((i + 1));
 alter table bar_unique2 alter column i type text; ERROR:  operator does not exist: text + integer

You could make the argument that we should do the same thing: try to
re-apply the expression on top of the new column. The only situation
where I can imagine that would be useful is if you are using exclusion
constraints in place of UNIQUE (even then, it's different, because it
uses operator names, not BTEqualStrategyNumber for the default btree
opclass).

If the user alters the type of a column that's part of an exclusion
constraint, the semantics are complex enough that I think we should
inform them that they need to drop the constraint, change the column,
and re-add it. So, my personal opinion is that we need to change the
error message (and probably have two distinct error messages for the two
cases) rather than changing the algorithm.

Comments?

Regards,Jeff Davis




Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Aha -- I think I see the problem you're having: if you try to rewrite
> one of the columns contained in the exclusion constraint, you get that
> error:

It fails for me regardless of which column is actually modified.
It could be this is a consequence of other changes I've been making,
but given the way ALTER TABLE works it's hard to see why the specific
column being modified would matter.  Anything that forces a table
rewrite would lead to running through this code path.

I don't really agree with your argument that it's okay to reject the
case of altering the underlying column type, anyhow.  There's enough
commonality of operator names that it's sensible to try to preserve
the constraint across a change.  Consider replacing a circle with a
polygon, say.  A no-overlap restriction is still sensible.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Sun, 2009-12-06 at 14:06 -0500, Tom Lane wrote:
> It fails for me regardless of which column is actually modified.
> It could be this is a consequence of other changes I've been making,
> but given the way ALTER TABLE works it's hard to see why the specific
> column being modified would matter.  Anything that forces a table
> rewrite would lead to running through this code path.

In ATAddOperatorExclusionConstraint():
 tab->constraints = lappend(tab->constraints, newcon);

if that isn't done, it doesn't go into the code path with that error
message at all.

> I don't really agree with your argument that it's okay to reject the
> case of altering the underlying column type, anyhow.  There's enough
> commonality of operator names that it's sensible to try to preserve
> the constraint across a change.  Consider replacing a circle with a
> polygon, say.  A no-overlap restriction is still sensible.

Let's say someone is changing from box to a postgis geometry
representing a box. For boxes, "=" actually means "equal area" (aside:
this is insane); but for polygons, "=" mean "equal". Changing in the
other direction is bad, as well. I know operators mostly follow
convention most of the time, but I'm concerned with the subtle (and
surprising) differences.

But if someone is changing a column type, which causes a table rewrite,
hopefully they've planned it. I'll look into doing it as you suggest.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> ... I'll look into doing it as you suggest.

I'm already working with a pretty-heavily-editorialized version.
Don't worry about it.
        regards, tom lane


Re: operator exclusion constraints

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> [ exclusion constraint patch ]

Applied after quite a lot of editorialization.  For future reference,
here is a summary of what I did:

* Reworded and renamed stuff to try to be consistent about calling these
things "exclusion constraints".  The original code and docs bore quite
a few traces of the various other terminologies, which is not too
surprising given the history but would have been confusing to future
readers of the code.

* Moved the verification of new exclusion constraints into index_build
processing as per discussion.

* Unified the EXCLUDE parsing path with the existing unique/pkey path
by the expedient of adding an excludeOpNames list to IndexStmt.  This
got rid of quite a lot of duplicated code, and fixed a number of bizarre
corner cases like the bogus wording of the index creation NOTICE messages.

* Cleaned up some things that didn't really meet project practices.
To mention a couple: one aspect of the "try to make the patch look
like it had always been there" rule is to insert new stuff in unsurprising
places.  Adding code at the end of a list or file very often doesn't meet
this test.  I tried to put the EXCLUDE constraint stuff beside
UNIQUE/PRIMARY KEY handling where possible.  Another pet peeve that was
triggered a lot in this patch is that you seemed to be intent on fitting
ereport() calls into 80 columns no matter what, and would break the error
message texts in random places to make it fit.  There's a good practical
reason not to do that: it makes it hard to grep the source code for an
error message.  You can break at phrase boundaries if you must, but
in general I prefer to just let the text go past the right margin.

There are a couple of loose ends yet:

* I made CREATE...LIKE processing handle exclusion constraints the same
as unique/pkey constraints, ie, they're processed by INCLUDING INDEXES.
There was some discussion of rearranging things to make these be processed
by INCLUDING CONSTRAINTS instead, but no consensus that I recall.  In
any case, failing to copy them at all is clearly no good.

* I'm not too satisfied with the behavior of psql's \d:

regression=# create table foo (f1 int primary key using index tablespace ts1,
regression(# f2 int, EXCLUDE USING btree (f2 WITH =) using index tablespace ts1,
regression(# f3 int, EXCLUDE USING btree (f3 WITH =) DEFERRABLE INITIALLY DEFERRED);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index "foo_f2_exclusion" for table "foo"
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index "foo_f3_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo     Table "public.foo"Column |  Type   | Modifiers 
--------+---------+-----------f1     | integer | not nullf2     | integer | f3     | integer | 
Indexes:   "foo_pkey" PRIMARY KEY, btree (f1), tablespace "ts1"   "foo_f2_exclusion" btree (f2), tablespace "ts1"
"foo_f3_exclusion"btree (f3) DEFERRABLE INITIALLY DEFERRED
 
Exclusion constraints:   "foo_f2_exclusion" EXCLUDE USING btree (f2 WITH =)   "foo_f3_exclusion" EXCLUDE USING btree
(f3WITH =) DEFERRABLE INITIALLY DEFERRED
 

regression=# 

This might have been defensible back when the idea was to keep constraints
decoupled from indexes, but now it just looks bizarre.  We should either
get rid of the "Exclusion constraints:" display and attach the info to
the index entries, or hide indexes that are attached to exclusion
constraints.  I lean to the former on the grounds of the precedent for
unique/pkey indexes --- which is not totally arbitrary, since an index
is usable as a query index regardless of its function as a constraint.
It's probably a debatable point though.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Mon, 2009-12-07 at 00:26 -0500, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > [ exclusion constraint patch ]
> 
> Applied after quite a lot of editorialization.  For future reference,
> here is a summary of what I did:

Thank you for the suggestions, and the other constructive criticism
during development.

Regards,Jeff Davis



Re: operator exclusion constraints

From
Tom Lane
Date:
Awhile back I wrote:
> * I'm not too satisfied with the behavior of psql's \d:

> regression=# create table foo (f1 int primary key using index tablespace ts1,
> regression(# f2 int, EXCLUDE USING btree (f2 WITH =) using index tablespace ts1,
> regression(# f3 int, EXCLUDE USING btree (f3 WITH =) DEFERRABLE INITIALLY DEFERRED);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
> NOTICE:  CREATE TABLE / EXCLUDE will create implicit index "foo_f2_exclusion" for table "foo"
> NOTICE:  CREATE TABLE / EXCLUDE will create implicit index "foo_f3_exclusion" for table "foo"
> CREATE TABLE
> regression=# \d foo
>       Table "public.foo"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  f1     | integer | not null
>  f2     | integer |
>  f3     | integer |
> Indexes:
>     "foo_pkey" PRIMARY KEY, btree (f1), tablespace "ts1"
>     "foo_f2_exclusion" btree (f2), tablespace "ts1"
>     "foo_f3_exclusion" btree (f3) DEFERRABLE INITIALLY DEFERRED
> Exclusion constraints:
>     "foo_f2_exclusion" EXCLUDE USING btree (f2 WITH =)
>     "foo_f3_exclusion" EXCLUDE USING btree (f3 WITH =) DEFERRABLE INITIALLY DEFERRED

> regression=#

> This might have been defensible back when the idea was to keep constraints
> decoupled from indexes, but now it just looks bizarre.  We should either
> get rid of the "Exclusion constraints:" display and attach the info to
> the index entries, or hide indexes that are attached to exclusion
> constraints.  I lean to the former on the grounds of the precedent for
> unique/pkey indexes --- which is not totally arbitrary, since an index
> is usable as a query index regardless of its function as a constraint.
> It's probably a debatable point though.

Attached is a patch against HEAD that folds exclusion constraints into
\d's regular indexes list.  With this, the above example produces

      Table "public.foo"
 Column |  Type   | Modifiers
--------+---------+-----------
 f1     | integer | not null
 f2     | integer |
 f3     | integer |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (f1), tablespace "ts1"
    "foo_f2_exclusion" EXCLUDE USING btree (f2 WITH =), tablespace "ts1"
    "foo_f3_exclusion" EXCLUDE USING btree (f3 WITH =) DEFERRABLE INITIALLY DEFERRED

Any objections?

            regards, tom lane

? psql
Index: describe.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/psql/describe.c,v
retrieving revision 1.240
diff -c -r1.240 describe.c
*** describe.c    11 Mar 2010 04:36:43 -0000    1.240
--- describe.c    11 Mar 2010 05:18:28 -0000
***************
*** 1105,1111 ****
          bool        hasrules;
          bool        hastriggers;
          bool        hasoids;
-         bool        hasexclusion;
          Oid            tablespace;
          char       *reloptions;
          char       *reloftype;
--- 1105,1110 ----
***************
*** 1128,1135 ****
          printfPQExpBuffer(&buf,
                "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
                            "c.relhastriggers, c.relhasoids, "
!                           "%s, c.reltablespace, c.relhasexclusion, "
!                           "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::text END\n"
                            "FROM pg_catalog.pg_class c\n "
             "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
                            "WHERE c.oid = '%s'\n",
--- 1127,1134 ----
          printfPQExpBuffer(&buf,
                "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
                            "c.relhastriggers, c.relhasoids, "
!                           "%s, c.reltablespace, "
!                           "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text
END\n"
                            "FROM pg_catalog.pg_class c\n "
             "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
                            "WHERE c.oid = '%s'\n",
***************
*** 1207,1216 ****
          strdup(PQgetvalue(res, 0, 6)) : 0;
      tableinfo.tablespace = (pset.sversion >= 80000) ?
          atooid(PQgetvalue(res, 0, 7)) : 0;
!     tableinfo.hasexclusion = (pset.sversion >= 90000) ?
!         strcmp(PQgetvalue(res, 0, 8), "t") == 0 : false;
!     tableinfo.reloftype = (pset.sversion >= 90000 && strcmp(PQgetvalue(res, 0, 9), "") != 0) ?
!         strdup(PQgetvalue(res, 0, 9)) : 0;
      PQclear(res);
      res = NULL;

--- 1206,1213 ----
          strdup(PQgetvalue(res, 0, 6)) : 0;
      tableinfo.tablespace = (pset.sversion >= 80000) ?
          atooid(PQgetvalue(res, 0, 7)) : 0;
!     tableinfo.reloftype = (pset.sversion >= 90000 && strcmp(PQgetvalue(res, 0, 8), "") != 0) ?
!         strdup(PQgetvalue(res, 0, 8)) : 0;
      PQclear(res);
      res = NULL;

***************
*** 1545,1571 ****
                  appendPQExpBuffer(&buf, "i.indisvalid, ");
              else
                  appendPQExpBuffer(&buf, "true as indisvalid, ");
!             appendPQExpBuffer(&buf, "pg_catalog.pg_get_indexdef(i.indexrelid, 0, true)");
              if (pset.sversion >= 90000)
                  appendPQExpBuffer(&buf,
!                                   ",\n  (NOT i.indimmediate) AND "
!                                   "EXISTS (SELECT 1 FROM pg_catalog.pg_constraint "
!                                   "WHERE conrelid = i.indrelid AND "
!                                   "conindid = i.indexrelid AND "
!                                   "contype IN ('p','u','x') AND "
!                                   "condeferrable) AS condeferrable"
!                                   ",\n  (NOT i.indimmediate) AND "
!                                   "EXISTS (SELECT 1 FROM pg_catalog.pg_constraint "
!                                   "WHERE conrelid = i.indrelid AND "
!                                   "conindid = i.indexrelid AND "
!                                   "contype IN ('p','u','x') AND "
!                                   "condeferred) AS condeferred");
              else
!                 appendPQExpBuffer(&buf, ", false AS condeferrable, false AS condeferred");
              if (pset.sversion >= 80000)
                  appendPQExpBuffer(&buf, ", c2.reltablespace");
              appendPQExpBuffer(&buf,
!                               "\nFROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i\n"
                                "WHERE c.oid = '%s' AND c.oid = i.indrelid AND i.indexrelid = c2.oid\n"
                "ORDER BY i.indisprimary DESC, i.indisunique DESC, c2.relname",
                                oid);
--- 1542,1564 ----
                  appendPQExpBuffer(&buf, "i.indisvalid, ");
              else
                  appendPQExpBuffer(&buf, "true as indisvalid, ");
!             appendPQExpBuffer(&buf, "pg_catalog.pg_get_indexdef(i.indexrelid, 0, true),\n  ");
              if (pset.sversion >= 90000)
                  appendPQExpBuffer(&buf,
!                                   "pg_catalog.pg_get_constraintdef(con.oid, true), "
!                                   "contype, condeferrable, condeferred");
              else
!                 appendPQExpBuffer(&buf,
!                                   "null AS constraintdef, null AS contype, "
!                                   "false AS condeferrable, false AS condeferred");
              if (pset.sversion >= 80000)
                  appendPQExpBuffer(&buf, ", c2.reltablespace");
              appendPQExpBuffer(&buf,
!                               "\nFROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i\n");
!             if (pset.sversion >= 90000)
!                 appendPQExpBuffer(&buf,
!                                   "  LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND conindid =
i.indexrelidAND contype IN ('p','u','x'))\n"); 
!             appendPQExpBuffer(&buf,
                                "WHERE c.oid = '%s' AND c.oid = i.indrelid AND i.indexrelid = c2.oid\n"
                "ORDER BY i.indisprimary DESC, i.indisunique DESC, c2.relname",
                                oid);
***************
*** 1580,1625 ****
                  printTableAddFooter(&cont, _("Indexes:"));
                  for (i = 0; i < tuples; i++)
                  {
-                     const char *indexdef;
-                     const char *usingpos;
-
                      /* untranslated index name */
                      printfPQExpBuffer(&buf, "    \"%s\"",
                                        PQgetvalue(result, i, 0));

!                     /* Label as primary key or unique (but not both) */
!                     appendPQExpBuffer(&buf,
!                                    strcmp(PQgetvalue(result, i, 1), "t") == 0
!                                       ? " PRIMARY KEY," :
!                                   (strcmp(PQgetvalue(result, i, 2), "t") == 0
!                                    ? " UNIQUE,"
!                                    : ""));
!                     /* Everything after "USING" is echoed verbatim */
!                     indexdef = PQgetvalue(result, i, 5);
!                     usingpos = strstr(indexdef, " USING ");
!                     if (usingpos)
!                         indexdef = usingpos + 7;

!                     appendPQExpBuffer(&buf, " %s", indexdef);

                      if (strcmp(PQgetvalue(result, i, 3), "t") == 0)
                          appendPQExpBuffer(&buf, " CLUSTER");

                      if (strcmp(PQgetvalue(result, i, 4), "t") != 0)
                          appendPQExpBuffer(&buf, " INVALID");

-                     if (strcmp(PQgetvalue(result, i, 6), "t") == 0)
-                         appendPQExpBuffer(&buf, " DEFERRABLE");
-
-                     if (strcmp(PQgetvalue(result, i, 7), "t") == 0)
-                         appendPQExpBuffer(&buf, " INITIALLY DEFERRED");
-
                      printTableAddFooter(&cont, buf.data);

                      /* Print tablespace of the index on the same line */
                      if (pset.sversion >= 80000)
                          add_tablespace_footer(&cont, 'i',
!                                             atooid(PQgetvalue(result, i, 8)),
                                                false);
                  }
              }
--- 1573,1627 ----
                  printTableAddFooter(&cont, _("Indexes:"));
                  for (i = 0; i < tuples; i++)
                  {
                      /* untranslated index name */
                      printfPQExpBuffer(&buf, "    \"%s\"",
                                        PQgetvalue(result, i, 0));

!                     /* If exclusion constraint, print the constraintdef */
!                     if (strcmp(PQgetvalue(result, i, 7), "x") == 0)
!                     {
!                         appendPQExpBuffer(&buf, " %s",
!                                           PQgetvalue(result, i, 6));
!                     }
!                     else
!                     {
!                         const char *indexdef;
!                         const char *usingpos;
!
!                         /* Label as primary key or unique (but not both) */
!                         if (strcmp(PQgetvalue(result, i, 1), "t") == 0)
!                             appendPQExpBuffer(&buf, " PRIMARY KEY,");
!                         else if (strcmp(PQgetvalue(result, i, 2), "t") == 0)
!                             appendPQExpBuffer(&buf, " UNIQUE,");
!
!                         /* Everything after "USING" is echoed verbatim */
!                         indexdef = PQgetvalue(result, i, 5);
!                         usingpos = strstr(indexdef, " USING ");
!                         if (usingpos)
!                             indexdef = usingpos + 7;
!                         appendPQExpBuffer(&buf, " %s", indexdef);

!                         /* Need these for deferrable PK/UNIQUE indexes */
!                         if (strcmp(PQgetvalue(result, i, 8), "t") == 0)
!                             appendPQExpBuffer(&buf, " DEFERRABLE");

+                         if (strcmp(PQgetvalue(result, i, 9), "t") == 0)
+                             appendPQExpBuffer(&buf, " INITIALLY DEFERRED");
+                     }
+
+                     /* Add these for all cases */
                      if (strcmp(PQgetvalue(result, i, 3), "t") == 0)
                          appendPQExpBuffer(&buf, " CLUSTER");

                      if (strcmp(PQgetvalue(result, i, 4), "t") != 0)
                          appendPQExpBuffer(&buf, " INVALID");

                      printTableAddFooter(&cont, buf.data);

                      /* Print tablespace of the index on the same line */
                      if (pset.sversion >= 80000)
                          add_tablespace_footer(&cont, 'i',
!                                             atooid(PQgetvalue(result, i, 10)),
                                                false);
                  }
              }
***************
*** 1657,1694 ****
              PQclear(result);
          }

-         /* print exclusion constraints */
-         if (tableinfo.hasexclusion)
-         {
-             printfPQExpBuffer(&buf,
-                               "SELECT r.conname, "
-                               "pg_catalog.pg_get_constraintdef(r.oid, true)\n"
-                               "FROM pg_catalog.pg_constraint r\n"
-                               "WHERE r.conrelid = '%s' AND r.contype = 'x'\n"
-                               "ORDER BY 1",
-                               oid);
-             result = PSQLexec(buf.data, false);
-             if (!result)
-                 goto error_return;
-             else
-                 tuples = PQntuples(result);
-
-             if (tuples > 0)
-             {
-                 printTableAddFooter(&cont, _("Exclusion constraints:"));
-                 for (i = 0; i < tuples; i++)
-                 {
-                     /* untranslated contraint name and def */
-                     printfPQExpBuffer(&buf, "    \"%s\" %s",
-                                       PQgetvalue(result, i, 0),
-                                       PQgetvalue(result, i, 1));
-
-                     printTableAddFooter(&cont, buf.data);
-                 }
-             }
-             PQclear(result);
-         }
-
          /* print foreign-key constraints (there are none if no triggers) */
          if (tableinfo.hastriggers)
          {
--- 1659,1664 ----

Re: operator exclusion constraints

From
Greg Stark
Date:
On Thu, Mar 11, 2010 at 5:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Indexes:
>>     "foo_pkey" PRIMARY KEY, btree (f1), tablespace "ts1"
>>     "foo_f2_exclusion" btree (f2), tablespace "ts1"
>>     "foo_f3_exclusion" btree (f3) DEFERRABLE INITIALLY DEFERRED
>> Exclusion constraints:
>>     "foo_f2_exclusion" EXCLUDE USING btree (f2 WITH =)
>>     "foo_f3_exclusion" EXCLUDE USING btree (f3 WITH =) DEFERRABLE INITIALLY DEFERRED
>
>> This might have been defensible back when the idea was to keep constraints
>> decoupled from indexes, but now it just looks bizarre.

The only really bizarre part is the "DEFERRABLE INITIALLY DEFERRED" on
the index.

>>  We should either
>> get rid of the "Exclusion constraints:" display and attach the info to
>> the index entries, or hide indexes that are attached to exclusion
>> constraints.  I lean to the former on the grounds of the precedent for
>> unique/pkey indexes --- which is not totally arbitrary, since an index
>> is usable as a query index regardless of its function as a constraint.
>> It's probably a debatable point though.

There is a third option -- print PRIMARY keys twice, once as a btree
index and again as a constraint where it says somehting like "USING
index foo_pkey"
I think in the long term that would be best -- especially if we
combine it with a patch to be able to create a new primary key
constraint using an existing index. That's something people have been
asking for anyways and I think it's a somewhat important property that
these lines can be copy pasted and run nearly as-is to recreate the
objects.

I definitely agree that your other proposed way to go is worse. I
think people need a list of indexes in one place.

So given the current syntax for creating these I think your proposed
change is the least worst alternative.

--
greg


Re: operator exclusion constraints

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> There is a third option -- print PRIMARY keys twice, once as a btree
> index and again as a constraint where it says somehting like "USING
> index foo_pkey"

No, that's exactly what I hate about the current behavior for exclusion
constraints, and I'd like it even less for more-common options like
primary or unique constraints.  \d is too d*mn verbose already; there is
no percentage in making it even longer by printing redundant entries for
many indexes.

One thing I did think about was converting PK/UNIQUE indexes to be
printed by pg_get_constraintdef() too, rather than assembling an ad-hoc
output for them as we do now.  This would make the code a bit simpler
but would involve some small changes in the output --- in particular,
you wouldn't see any indication that they were btrees, since there's
no place for that in standard constraint syntax.  On balance it didn't
seem like an improvement, although it would partially respond to your
desire to have the output be cut-and-pasteable.
        regards, tom lane


Re: operator exclusion constraints

From
Jeff Davis
Date:
On Thu, 2010-03-11 at 00:29 -0500, Tom Lane wrote:

Patch changes:

> > Indexes:
> >     "foo_pkey" PRIMARY KEY, btree (f1), tablespace "ts1"
> >     "foo_f2_exclusion" btree (f2), tablespace "ts1"
> >     "foo_f3_exclusion" btree (f3) DEFERRABLE INITIALLY DEFERRED
> > Exclusion constraints:
> >     "foo_f2_exclusion" EXCLUDE USING btree (f2 WITH =)
> >     "foo_f3_exclusion" EXCLUDE USING btree (f3 WITH =) DEFERRABLE INITIALLY DEFERRED

To:

> Indexes:
>     "foo_pkey" PRIMARY KEY, btree (f1), tablespace "ts1"
>     "foo_f2_exclusion" EXCLUDE USING btree (f2 WITH =), tablespace "ts1"
>     "foo_f3_exclusion" EXCLUDE USING btree (f3 WITH =) DEFERRABLE INITIALLY DEFERRED
> 
> Any objections?

Looks good to me.

Regards,Jeff Davis