Re: Constraint Exclusion (Partitioning) - Initial Review requested - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: Constraint Exclusion (Partitioning) - Initial Review requested
Date
Msg-id 200507021956.j62JumG04898@candle.pha.pa.us
Whole thread Raw
In response to Constraint Exclusion (Partitioning) - Initial Review requested  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Constraint Exclusion (Partitioning) - Initial Review requested  (Alvaro Herrera <alvherre@surnet.cl>)
Re: Constraint Exclusion (Partitioning) - Initial Review  (Simon Riggs <simon@2ndquadrant.com>)
Re: Constraint Exclusion (Partitioning) - Initial Review  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
Seems you have managed to combine inheritance, check constraints, and
partial index into table paritioning.  It is nice it requires no new
syntax.  Here is an example from your tests:

    DROP TABLE pm cascade;
    CREATE TABLE pm
    ( dkey   INT NOT NULL
    );

    CREATE TABLE p1 ( CHECK (dkey BETWEEN 10000 AND 19999)) INHERITS (pm);
    CREATE TABLE p2 ( CHECK (dkey BETWEEN 20000 AND 29999)) INHERITS (pm);
    CREATE TABLE p3 ( CHECK (dkey BETWEEN 30000 AND 39999)) INHERITS (pm);

So, in this case, a SELECT from pm would pull from the three base
tables, and those tables can be located on different tablespaces, and
the backend will only look in child tables that might contain rows basd
on the check constraints.  It is that last phrase that is the new
functionality here.

Oh, why would someone want to set enable_constraint_exclusion to false?

You had a few questions:

> Main questions:
> 1. How should we handle the case where *all* inherited relations are
> excluded? (This is not currently covered in the code).

I assume this means we don't return any rows.  Why it is an issue?

> 2. Should this feature be available for all queries or just inherited
> relations?

I don't see why other queries should not use this.  Our TODO already
has:

    * Use CHECK constraints to influence optimizer decisions

      CHECK constraints contain information about the distribution of values
      within the table. This is also useful for implementing subtables where
      a tables content is distributed across several subtables.

and this looks like what you are doing.  However, again, I see the
constraint as just informing whether there might be any rows in the
table.  Am I missing something?  Are you thinking views with UNION could
benefit from this?

> 3. And should we extend RelOptInfo to include constraint information?

Is the problem that you have to do multiple lookups without it?

> 4. Do we want to integrate the test suite also?

No, once this thing works, I don't see it getting broken frequently.  We
usually don't test the optimizer, but we could add a single test for one
row in each table.

> 5. Presumably a section under Performance tips would be appropriate to
> document this feature? (As well as section in run-time parameters).

Yep.

I am surprised no one else has commented on it, which I think means your
code is ready for the queue.  Do you want to adjust it based on this
feedback or should I apply and you can adjust it later.

---------------------------------------------------------------------------

Simon Riggs wrote:
>
> I enclose a fully working implementation of Constraint Exclusion, a very
> basic form of Partitioning. Initial review is requested, to allow us all
> to assess what further work is required on this prior to Beta freeze.
>
> Patch against current cvstip; passes make check and all special tests.
>
> The main purpose of this feature is to reduce access time against large
> tables that have been split into partitions by using the PostgreSQL
> inheritance facility. It has been written in a very generic way allowing
> a whole range of applications.
>
> If
>     a) a table is part of an inheritance set
>     b) the table has check constraints defined upon it
>     c) enable_constraint_exclusion = true
>
> then the planner will attempt to use the definition of the Constraints
> to see if that relation could ever have rows in it that the query might
> see. *No* additional SQL DDL syntax is required to define this.
>
> Only query clauses of the form ATTR OP CONSTANT will be considered, in a
> very similar way to the way partial indexes work already.
>
> The code changes effect only the planner, building upon the partial
> index logic to allow refutation as well as implication.
>
> There are clearly many questions to be answered by me and I'm happy to
> do so, so please fire away. My hope is to get a more polished form of
> this functionality into 8.1. Further developments on Partitioning are
> foreseen, though the feature submitted today is the main building block
> for any further work/optimization in this area and so additional
> features will be discussed at a later time.
>
> A full test suite has been specially written for this feature. This is
> included here also, though no attempt has been made as yet to integrate
> that with the main regression test suite (as yet). Required files are
> included in a single tar file with this email. Extract these to the
> PostgreSQL installation directory and run using ./testprange.sh
> The test suite executes around 100 queries against 7 different database
> designs, comparing results with/without the new enable option. Full and
> pruned EXPLAINs are also derived during execution to allow easier
> analysis of the success of the exclusion process (view the
> testprange_t*e.out files).
>
> There are no cases where any of the test queries returns a logically
> incorrect answer; hence fully working. There are a few cases where
> queries have not been optimised as far as possible; in those cases
> checks on my propositional logic are requested... This is extremely
> complex and my expectation is that testers/reviewers will find at least
> of couple of logic improvements. The most frequent queries are believed
> to work optimally.
>
> There is no documentation at this time.
>
> Main questions:
> 1. How should we handle the case where *all* inherited relations are
> excluded? (This is not currently covered in the code).
> 2. Should this feature be available for all queries or just inherited
> relations?
> 3. And should we extend RelOptInfo to include constraint information?
> 4. Do we want to integrate the test suite also?
> 5. Presumably a section under Performance tips would be appropriate to
> document this feature? (As well as section in run-time parameters).
>
> Additional thoughts:
> 1. We should be able to optimise the case where there is only a single
> non-excluded relation by removing the Append node.
>
> Best Regards, Simon Riggs

[ Attachment, skipping... ]

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: More E'' changes
Next
From: Bruce Momjian
Date:
Subject: Re: per user/database connections limit again