Re: Constraint Exclusion (Partitioning) - Initial Review - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Constraint Exclusion (Partitioning) - Initial Review
Date
Msg-id 1120431087.3940.103.camel@localhost.localdomain
In response to Re: Constraint Exclusion (Partitioning) - Initial Review requested  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Constraint Exclusion (Partitioning) - Initial Review
List pgsql-patches
On Sat, 2005-07-02 at 15:56 -0400, Bruce Momjian wrote:
> Seems you have managed to combine inheritance, check constraints, and
> partial index into table partitioning.  It is nice it requires no new
> syntax.  Here is an example from your tests:
>
>     DROP TABLE pm cascade;
>     CREATE TABLE pm
>     ( dkey   INT NOT NULL
>     );
>
>     CREATE TABLE p1 ( CHECK (dkey BETWEEN 10000 AND 19999)) INHERITS (pm);
>     CREATE TABLE p2 ( CHECK (dkey BETWEEN 20000 AND 29999)) INHERITS (pm);
>     CREATE TABLE p3 ( CHECK (dkey BETWEEN 30000 AND 39999)) INHERITS (pm);
>
> So, in this case, a SELECT from pm would pull from the three base
> tables, and those tables can be located on different tablespaces, and
> the backend will only look in child tables that might contain rows based
> on the check constraints.  It is that last phrase that is the new
> functionality here.

Yes, dead on. Thank you for this elegant summary. The main idea was
originally Hannu Krosing's, I believe, with a suggestion from Tom to
enhance the partial index machinery to this end.

So a query such as

    select * from pm where dkey = 25000

will have an EXPLAIN that completely ignores p1 and p3, since these can
be provably excluded from the plan without affecting the result.
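
As a rough sketch of how one might check this, assuming the pm, p1, p2
and p3 tables from the test quoted above and the patch's
enable_constraint_exclusion parameter (the name used in this thread):

```sql
-- Sketch only: assumes pm, p1, p2, p3 exist as defined in the quoted test
-- and that the enable_constraint_exclusion parameter from the patch is present.
SET enable_constraint_exclusion = true;

-- The CHECK ranges on p1 (10000-19999) and p3 (30000-39999) cannot
-- contain dkey = 25000, so the plan should not scan either of them.
EXPLAIN SELECT * FROM pm WHERE dkey = 25000;

-- For comparison, with exclusion off every child should appear in the plan.
SET enable_constraint_exclusion = false;
EXPLAIN SELECT * FROM pm WHERE dkey = 25000;
```

Note that pm itself carries no CHECK constraint, so it would still be
scanned in both cases.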

I see the "no syntax" version as the first step towards additional
functionality that would require additional syntax.

> Oh, why would someone want to set enable_constraint_exclusion to false?

The included functionality performs the exclusion at plan time. If a
query was prepared for later execution, it *could* return the wrong
answer when the plan was executed at a later time since plans are not
invalidated when constraints change. So, in general, this should be set
to false except for circumstances where the user can guarantee no such
mistake would be made.
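
A minimal sketch of the kind of sequence that could go wrong, again
using the pm/p1 tables from the test. The constraint name
p1_dkey_check is an assumption; the real name depends on how the
server named the unnamed CHECK constraint.

```sql
-- Sketch only: illustrates the stale-plan risk described above.
SET enable_constraint_exclusion = true;

-- Planned once: p1 and p3 are excluded based on their current constraints.
PREPARE q AS SELECT * FROM pm WHERE dkey = 25000;

-- Later, the constraint on p1 is dropped (name is hypothetical) and a
-- row is added that the old constraint would have rejected.
ALTER TABLE p1 DROP CONSTRAINT p1_dkey_check;
INSERT INTO p1 VALUES (25000);

-- If the saved plan is reused, it could still skip p1 and miss the new row.
EXECUTE q;
```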

> You had a few questions:
>
> > Main questions:
> > 1. How should we handle the case where *all* inherited relations are
> > excluded? (This is not currently covered in the code).
>
> I assume this means we don't return any rows.  Why is it an issue?

A code question only. No issue, just how should the code look?

> > 2. Should this feature be available for all queries or just inherited
> > relations?
>
> I don't see why other queries should not use this.  Our TODO already
> has:
>
>     * Use CHECK constraints to influence optimizer decisions
>
>       CHECK constraints contain information about the distribution of values
>       within the table. This is also useful for implementing subtables where
>       a table's content is distributed across several subtables.
>
> and this looks like what you are doing.  However, again, I see the
> constraint as just informing whether there might be any rows in the
> table.  Am I missing something?  Are you thinking views with UNION could
> benefit from this?

In general, it seems you might want this. In normal use check
constraints tend to be on minor columns, not key columns. Queries that
would be provably able to exclude tables based upon this would be
strange queries.

e.g.

    select count(distinct item_pk) from warehouse where quantity < 0

is not a very common query. So we would pay the overhead of checking for
exclusion for all queries when only a few weird ones would ever take
advantage of it. Sounds like a poor trade-off to me.
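
To make the example concrete, here is a sketch of the sort of table I
have in mind. The column types and the CHECK constraint are
assumptions for illustration only.

```sql
-- Sketch only: a hypothetical warehouse table with a typical
-- "minor column" CHECK constraint.
CREATE TABLE warehouse
( item_pk   INT NOT NULL
, quantity  INT NOT NULL CHECK (quantity >= 0)
);

-- The WHERE clause contradicts the CHECK constraint, so the table could
-- be provably excluded, but few real queries look like this.
SELECT count(DISTINCT item_pk) FROM warehouse WHERE quantity < 0;
```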

IMHO, the only time you might expect to see benefit is when you have
many similar tables that are partitioned by design into pieces that lend
themselves to exclusion. If you specifically designed a set of tables
and used UNION to bring them together, then I can see that you would
want it then also... but is there any benefit in supporting two
different ways of achieving the same basic design, a partitioned table?
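
For reference, the UNION-style design might look something like the
sketch below. All table, column and view names are hypothetical, and I
use UNION ALL here since the pieces do not overlap.

```sql
-- Sketch only: sales_2004, sales_2005 and all_sales are hypothetical names.
CREATE TABLE sales_2004
( sale_year INT NOT NULL CHECK (sale_year = 2004)
, amount    NUMERIC
);
CREATE TABLE sales_2005
( sale_year INT NOT NULL CHECK (sale_year = 2005)
, amount    NUMERIC
);

CREATE VIEW all_sales AS
    SELECT * FROM sales_2004
    UNION ALL
    SELECT * FROM sales_2005;

-- If exclusion were applied to UNION ALL branches as well as to inherited
-- children, this query could in principle skip sales_2004.
SELECT sum(amount) FROM all_sales WHERE sale_year = 2005;
```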

> > 3. And should we extend RelOptInfo to include constraint information?
>
> Is the problem that you have to do multiple lookups without it?

> > 4. Do we want to integrate the test suite also?
>
> No, once this thing works, I don't see it getting broken frequently.  We
> usually don't test the optimizer, but we could add a single test for one
> row in each table.
>
> > 5. Presumably a section under Performance tips would be appropriate to
> > document this feature? (As well as section in run-time parameters).
>
> Yep.
>
> I am surprised no one else has commented on it, which I think means your
> code is ready for the queue.  Do you want to adjust it based on this
> feedback or should I apply and you can adjust it later?

I think that it means I did not explain myself well enough. :-)

Best Regards, Simon Riggs

