Thread: CHECK constraints and optimizations

CHECK constraints and optimizations

From
Edmund Dengler
Date:
Greetings!

Just trying some tests out, and wanted to know about some optimizations.
If I do a CHECK constraint on a table, is this used to optimize a SELECT
or does PostgreSQL rely mostly on normal index searches?

For example, I want to create some tables to manage different data in a
kind of <object, relationship, object2> manner, but where object2 could be
an IP address, text, a number, etc. So I thought of doing the following:

----------

create table tmp (
  luid bigserial,
  object_luid bigint,
  relationship ltree
);

create table tmp1
(
  child_luid bigint,
  check (relationship <@ 'Object')
)
inherits (tmp);

create table tmp2 (
  ip inet,
  check (relationship <@ 'IP')
)
inherits (tmp);

insert into tmp1 (object_luid, relationship, child_luid) values (1, 'Object', 2);
insert into tmp2 (object_luid, relationship, ip) values (1, 'IP.Packet.Source', '10.1.1.2');
insert into tmp2 (object_luid, relationship, ip) values (2, 'IP.Packet.Source', '10.11.0.1');

create view tmp_view as
select luid, object_luid, relationship, child_luid, null as ip
from tmp1
union
select luid, object_luid, relationship, null, ip
from tmp2
;

explain analyze select * from tmp_view where object_luid = 2;
explain analyze select * from tmp_view where relationship <@ 'IP.Packet';

explain analyze select * from (
select luid, object_luid, relationship, child_luid, null as ip
from tmp1
union all
select luid, object_luid, relationship, null, ip
from tmp2
) as foo where relationship <@ 'IP.Packet';

-----------------------------------------

When I run the EXPLAIN ANALYZEs above, the table <tmp1> is still scanned,
even though no row could satisfy both the WHERE clause and the CHECK
constraint. Now, this is a fairly marginal case for optimization, so I just
wanted to confirm that the constraint indeed will not be used. Or would it
only be used once I have lots of rows in the table, enough to justify the
check? Is there ever a time when constraints are used to optimize a SELECT?

Alternatively, is there another way of accomplishing what I want without
the ugly VIEW (as each new table type I add will necessitate rebuilding
the VIEW with a new column)?

Regards!
Ed

Re: CHECK constraints and optimizations

From
Tom Lane
Date:
Edmund Dengler <edmundd@eSentire.com> writes:
> Just trying some tests out, and wanted to know about some optimizations.
> If I do a CHECK constraint on a table, is this used to optimize a SELECT

It is not.

            regards, tom lane

Re: CHECK constraints and optimizations

From
"scott.marlowe"
Date:
On Wed, 5 May 2004, Edmund Dengler wrote:

> Greetings!
>
> Just trying some tests out, and wanted to know about some optimizations.
> If I do a CHECK constraint on a table, is this used to optimize a SELECT
> or does PostgreSQL rely mostly on normal index searches?

I think the only kind of constraint that incidentally improves performance
is a unique constraint, which creates a unique index.

A check constraint is run on a record when it is changed to make sure it
still meets the requirements of the constraint.  There is no separate file
that says "this row meets the constraint".  Deferred constraints mean the
check is to be done at the commit time of the transaction.

Note that unique constraints are not necessarily deferrable due to issues
caused by using an immediately-acting unique index.  I don't think this is
easily fixable either.

So, a check constraint is of no use during a read from the table, and
is a performance penalty when writing to it.
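
For example, a minimal sketch (table and values invented for illustration):

create table accounts (
  balance numeric check (balance >= 0)
);

insert into accounts values (-5);   -- rejected: the check fires on the write
insert into accounts values (100);  -- accepted

-- the check plays no part here; the read is planned and executed exactly
-- as if no constraint existed
select * from accounts where balance < 0;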


Re: CHECK constraints and optimizations

From
Troels Arvin
Date:
On Wed, 05 May 2004 21:09:25 -0400, Tom Lane wrote:

>> Just trying some tests out, and wanted to know about some optimizations.
>> If I do a CHECK constraint on a table, is this used to optimize a SELECT
>
> It is not.

If one were to try to add some constraint-based optimizations ("semantic
query optimizations"), what parts of the code would be most relevant to
study?

In particular, I'm interested in the case of join eliminations, based on
foreign key constraints. E.g. having a SUPPLIER(s_id,...) and a
SUPPLIER_PART(s_id,p_id) table where SUPPLIER_PART.s_id references
SUPPLIER.s_id. Then, a "SELECT p_id FROM SUPPLIER_PART NATURAL JOIN
SUPPLIER" could skip the join and just look in SUPPLIER_PART.

Another thing:

Oracle and PostgreSQL use I/O to respond to
SELECT * FROM person WHERE age < 30 AND age > 30.
DB2 and MySQL see that the result is the empty set, without wasting any I/O.
- So here's another place for potential optimizations, although the area
gets rather hairy as soon as one moves beyond the simplest cases.
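
To make that concrete (the person table is sketched here just for
illustration):

create table person (age integer);  -- other columns omitted

-- as far as I can tell this is planned as an ordinary scan with a filter
-- such as Filter: ((age < 30) AND (age > 30)); every row is read and rejected
explain analyze select * from person where age < 30 and age > 30;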


By the way, in "An Introduction to Database Systems", Date writes about
semantic optimizations:

  "... such optimization could provide significant
  performance improvements - much greater improvements, very likely,
  than are obtained by any of today's more traditional optimization
  techniques".

--
Greetings from Troels Arvin, Copenhagen, Denmark


Re: CHECK constraints and optimizations

From
"scott.marlowe"
Date:
On Thu, 6 May 2004, Troels Arvin wrote:

> On Wed, 05 May 2004 21:09:25 -0400, Tom Lane wrote:
>
> >> Just trying some tests out, and wanted to know about some optimizations.
> >> If I do a CHECK constraint on a table, is this used to optimize a SELECT
> >
> > It is not.
>
> If one were to try to add some constraint-based optimizations ("semantic
> query optimizations"), what parts of the code would be most relevant to
> study?

I'll leave the answer to such a question to someone who knows the
internals of pgsql a bit better than me.

> Oracle and PostgreSQL use I/O to respond to
> SELECT * FROM person WHERE age < 30 AND age > 30.
> DB2 and MySQL see that the result is the empty set, without wasting any I/O.
> - So here's another place for potential optimizations, although the area
> gets rather hairy as soon as one moves beyond the simplest cases.

The PostgreSQL team considers the load on their plates to be great enough
without bothering to optimize highly non-optimal, poorly thought-out
queries.

I.e., if you're asking for things like that WHERE clause, nobody's gonna
optimize it.  There's already too much to do around here without
focusing on that.  Now, if someone gets an itch and wants to try and code
it in their spare time...


Re: CHECK constraints and optimizations

From
Martijn van Oosterhout
Date:
On Thu, May 06, 2004 at 09:29:42AM -0600, scott.marlowe wrote:
> A check constraint is run on a record when it is changed to make sure it
> still meets the requirements of the constraint.  There is no separate file
> that says "this row meets the constraint".  Deferred constraints mean the
> check is to be done at the commit time of the transaction.
>
> Note that unique constraints are not necessarily deferrable due to issues
> caused by using an immediate acting unique index.  I don't think this is
> easily fixable either.
>
> So, a check constraint is of no use during a read from the table, and
> is a performance penalty when writing to it.

I have been thinking, though. Imagine a table with the constraint:

x < 1000

If I have a query that has WHERE x > 2000, can't that be optimised to
WHERE FALSE? Or WHERE x < 1200 optimised to x < 1000?

Obviously not if the constraint is deferred, but otherwise?
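
For concreteness, a minimal sketch of the two rewrites (table name invented):

create table t (
  x integer check (x < 1000)
);

-- the constraint guarantees no row can match, so this could in principle
-- become a scan of nothing at all:
select * from t where x > 2000;

-- and here the condition is already implied by the constraint, so it
-- could simply be dropped:
select * from t where x < 1200;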

The other person is correct in that (x < 1000 and x > 2000) is not
optimised away by PostgreSQL. Odd, because the capability is there, as
very similar tests are used by partial indexes and the index code in
general. If that worked, the system could just add the (simple) CHECK
constraints to the WHERE clause of a query, do the optimisation phase
and then remove any that remain.

I can't see why this wouldn't work.

Any thoughts?
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: CHECK constraints and optimizations

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I can't see why this wouldn't work.

Doubtless you could do it.  The problem with the idea is that those
inference tests are pretty expensive.  I think that any such thing would
waste significant numbers of cycles on ordinary queries while only being
a win on a few poorly-written queries.

We do have to make a tradeoff between planning time and execution time,
and I fear that this idea is not going to be a win in those terms.
If you feel like experimenting, though, go for it ...

            regards, tom lane

Re: CHECK constraints and optimizations

From
Martijn van Oosterhout
Date:
On Thu, May 06, 2004 at 09:02:21PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I can't see why this wouldn't work.
>
> Doubtless you could do it.  The problem with the idea is that those
> inference tests are pretty expensive.  I think that any such thing would
> waste significant numbers of cycles on ordinary queries while only being
> a win on a few poorly-written queries.

Is it really that expensive? From the index code I remember playing
with way back when I was fiddling with the partial index stuff, there
is a table where it takes:

X OPa Val1
X OPb Val2

and it has a lookup table on (OPa,OPb) to provide an OPc that can be
applied to (Val1,Val2) to determine if one implies the other. I was
very impressed actually, quite a neat idea. Quite simple I thought.

I wasn't really considering anything more complicated than this. No
subclauses, only ANDs.
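
That same implication machinery is what lets a partial index be used when
its predicate is implied by the query's WHERE clause, e.g. (names invented):

create table orders (amount integer);
create index orders_small_idx on orders (amount) where amount < 100;

-- the (OPa,OPb) lookup described above is what lets the planner prove
-- that "amount < 50" implies "amount < 100", so the partial index can
-- be considered for this query
explain select * from orders where amount < 50;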

> We do have to make a tradeoff between planning time and execution time,
> and I fear that this idea is not going to be a win in those terms.
> If you feel like experimenting, though, go for it ...

Of course, my ulterior motive is that I want table partitioning based on
values within the tuple, and then have queries avoid scanning subtables
that cannot contain matching rows, as inferred from parts of the WHERE
clause. Think phone calls with a different subtable for each year,
automatically.
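
Something along these lines, written out by hand with inheritance plus
CHECK constraints (all names invented):

create table calls (
  call_time timestamptz,
  callee text
);

create table calls_2003 (
  check (call_time >= '2003-01-01' and call_time < '2004-01-01')
) inherits (calls);

create table calls_2004 (
  check (call_time >= '2004-01-01' and call_time < '2005-01-01')
) inherits (calls);

-- with the optimisation under discussion, this could skip calls_2003
-- entirely, since its CHECK contradicts the WHERE clause
select * from calls where call_time >= '2004-03-01';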

I toyed with creating a script that would generate the RULEs necessary
to implement it in the current system, but splitting a table into four
pieces would require around 50+ RULEs (4 subtables x 4 conditions x 3
query types), obviously massively more inefficient than what's being
suggested here.
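
One of those rules would look roughly like this (using the invented calls
tables from the sketch above):

-- redirect inserts for 2004 into the right subtable; similar rules are
-- needed per subtable and per statement type, hence the blow-up in count
create rule calls_insert_2004 as
  on insert to calls
  where new.call_time >= '2004-01-01' and new.call_time < '2005-01-01'
  do instead
    insert into calls_2004 (call_time, callee)
    values (new.call_time, new.callee);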

The solution is to build it right into the storage manager, but I
haven't tried that yet.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: CHECK constraints and optimizations

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, May 06, 2004 at 09:02:21PM -0400, Tom Lane wrote:
>> Doubtless you could do it.  The problem with the idea is that those
>> inference tests are pretty expensive.

> Is it really that expensive?

I'm afraid it would be.  You're correct that the basic test in the
pred_test routine is simple enough, but there are a few issues:

1. You gotta root through all the index opclasses to see if you can find
one involving the operators at hand.  (I'm pretty sure this is the most
expensive part of pred_test_simple_clause() as it stands.)

2. Using this to detect redundant/contradictory clauses implies
comparing every WHERE clause to every other WHERE clause, hence O(N^2)
effort.

3. You'll be paying the price even on pretty simple queries.  I usually
figure that extra planner effort is worthwhile if it only fires on
complex queries, or queries where there's particular reason to think a
win might be had.  (Eg, 7.4 goes out of its way when it sees an
IN-subselect clause, but I don't think anyone has a problem with that.)
But with this idea, I don't see any way to avoid expending a lot of
effort on queries where no win will actually result.


BTW, there actually is code in the btree index stuff to detect
contradictory index quals, so "x > 10 AND x < 10" will in fact result in
no I/O if the chosen plan is an indexscan on x.  (This path doesn't have
the same problems mentioned above, because by the time control gets
there, we've already determined that the operators are indeed in the
same index opclass and that the same variable is involved.)  So that's
another hole in the scope of usefulness of a planning-time test.
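
That is, given something like (names invented):

create table foo (x integer);
create index foo_x_idx on foo (x);

-- if the chosen plan is an indexscan on foo_x_idx, the btree code notices
-- the quals are mutually contradictory and returns without reading any
-- index or heap pages
explain analyze select * from foo where x > 10 and x < 10;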

I was a tad surprised by the assertion up at the top of this thread
that MySQL has a test for this case.  From what I know of their design
philosophy, they'd have even less interest than us in optimizing
badly-written queries at the cost of slowing down the normal path.
Am I right to guess that what they actually have is a short-circuit case
similar to ours for contradictory index quals, and not a blanket check
for contradictory WHERE conditions in general?

            regards, tom lane