Thread: A Guide to Constraint Exclusion (Partitioning)

A Guide to Constraint Exclusion (Partitioning)

From

Simon Riggs

Date:

13 July 2005, 07:53:29

A Guide to the Constraint Exclusion Feature
===========================================

Simon Riggs    2ndQuadrant    simon@2ndquadrant.com

INTRODUCTION

Constraint Exclusion (CE) is an optimizer patch submitted for PostgreSQL
8.1. CE aims to greatly improve the performance for certain types of
common queries against large tables. CE extends the logic originally
developed for Partial Indexes to allow the optimizer to avoid scanning
particular tables for certain queries. CE will exclude a table from the
query plan when the query's WHERE clause is proven not to "overlap" with
any of the Constraints defined on those tables.

No new syntax is required to take advantage of this feature, which is
designed to work in conjunction with existing Inheritance features. Some
syntax *is* required to declare Constraints, which are already 
ANSI/ISO SQL:2003 compliant.

In other RDBMS these features are often referred to as Partitioning. The
PostgreSQL CE feature is much more widely applicable than that name
might suggest and will be of benefit to many users, not just for
Business Intelligence applications.

The aim of this guide is to help prospective users understand how the
feature will work, to allow improvement during the 8.1 beta test cycle.

All feedback is welcome.

TABLE PARTITIONING OVERVIEW

Let's look at a practical example of how this works and the performance
benefits it provides:

In many BI workloads there are a small number of very large tables,
often referred to as Fact tables. In order to improve load and query
performance against those tables it is often useful to split these
larger tables into pieces, known as partitions.

PostgreSQL 8.0 could support pseudo-partitioning using the Inheritance
feature. Here's a BI Fact table for a Retail DW as an example of that:

CREATE TABLE Sales_DateItemOutlet
( DateKey    Integer
, OutletKey    Integer
, ItemKey    Integer
, SoldQty    Integer
);

This table could be accessed using a query likeSELECT sum(soldqty) FROM Sales_DateItemOutletWHERE DateKey between
20050101and 20050101
 

This can then be split up so that we have a number of sub-tables:

CREATE TABLE Sales_Jan_DateItemOutlet 
() INHERITS (Sales_DateItemOutlet);

CREATE TABLE Sales_Feb_DateItemOutlet 
() INHERITS (Sales_DateItemOutlet);

CREATE TABLE Sales_Mar_DateItemOutlet 
() INHERITS (Sales_DateItemOutlet);

When we load this we put January's data in the Jan table etc, though put
no rows at all in the "parent" table.

Now, if we run our example query againSELECT sum(soldqty) FROM Sales_DateItemOutletWHERE DateKey between 20050101 and
20050101
we find that the query willScan all rows in Sales_DateItemOutlet (which is empty)Scan all rows in
Sales_Jan_DateItemOutletScanall rows in Sales_Feb_DateItemOutletScan all rows in Sales_Mar_DateItemOutlet
 
and return the correct answer. But we know that the query did not really
need to have scanned the Feb and Mar tables, since these do not contain
any data that would satisfy the query.

(The full EXPLAIN output is not shown above, for clarity only.) 

The new CE functionality aims to improve the performance of such
queries. Ideally we would like the query to Scan all rows in Sales_Jan_DateItemOutlet
and ignore the parent table, and all other child tables.

To do allow this, we must provide more declarative information to allow
the optimizer to understand as much as we do. To do this, we add
constraints on to each table, so that it is clear what rows they can
contain.

CREATE TABLE Sales_Jan_DateItemOutlet 
( CHECK (DateKey BETWEEN 20050101 AND 20050131) ) 
INHERITS (Sales_DateItemOutlet);

CREATE TABLE Sales_Feb_DateItemOutlet 
( CHECK (DateKey BETWEEN 20050201 AND 20050229) ) 
INHERITS (Sales_DateItemOutlet);

CREATE TABLE Sales_Mar_DateItemOutlet 
( CHECK (DateKey BETWEEN 20050301 AND 20050331) ) 
INHERITS (Sales_DateItemOutlet);

Now, when we execute our test query, the optimizer can understand that
the Feb and Mar tables can *never* hold rows that would match our query.
As a result it is able to provably exclude them from the query plan.

Thus, with Constraint Exclusion enabled our test querySELECT sum(soldqty) FROM Sales_DateItemOutletWHERE DateKey
between20050101 and 20050101
 
will perform the following scansScan all rows in Sales_DateItemOutlet (which is empty)Scan all rows in
Sales_Jan_DateItemOutlet

Running an EXPLAIN or EXPLAIN ANALYZE will allow you to see which tables
have been included in the query. There is no explicit message to say
that a table has been excluded.

We aren't able to exclude the parent table from the above query because
no Constraint was defined upon it. Since, in our example, the parent is
empty there will be little effect on the query performance. It would be
a mistake to attempt to get around this by placing a Constraint on the
parent, since that would then automatically be created on all child
tables also. So we can never exclude the parent without automatically
excluding *all* of the children also.

Overall then, if the partitions are all the same size, we will have
reduced execution time of the query to around one-third of its previous
elapsed time. If we had 10 or 100 child tables, certain queries could be
10 or 100 times faster than without the CE feature.

In summary, the CE feature will be a huge performance gain for
qualifying queries against large tables in PostgreSQL databases. Since
no new syntax is required, existing PostgreSQL databases designed to
take advantage of inheritance will automatically benefit from the
performance enhancements.

CONSTRAINT EXCLUSION

If you wish to enable this feature you must set the server parameterenable_constraint_exclusion = true
This is a user-settable parameter, so could be enabled for specific
users or queries.

Constraint exclusion is not enabled by default. This is because
PostgreSQL does not yet have full plan invalidation when DDL changes are
made. If a change were made to any of the tables, the query would not
automatically re-optimize, so it would be possible to get an incorrect
answer returned from a CE plan. This is not a fault of CE, but simply a
side-effect of the current lack of full plan invalidation in PostgreSQL.

CE will work only when the query contains a direct and simple
restriction of the table, such as Attribute > Constant

If Operators are available for that datatype, CE will use =    <    <=    >    >=    <>
These operators must be defined as IMMUTABLE.

CE will work for most simple Constraints. Constraints are already
limited to IMMUTABLE predicates, such as Attribute > Constant.

Constraints likeAttribute BETWEEN Const1 and Const2
can be used to produce ranges of values, and would be generally
described as RANGE PARTITIONING.

Constraints likeAttribute IN (Const1, Const2, Const3 ...)
can be used to produce lists of values, and would generally be described
as LIST PARTITIONING.

All of these Constraint types could be mixed, to allow Constraints likeAttribute1 IN (Const1, Const2) 
AND     Attribute2 >= Const3 AND Attribute2 < Const4

Thus, the CE feature allows a very flexible partitioning scheme to be
developed that could mix LIST, RANGE style partitioning clauses.

Currently, there is no restriction that all constraints *must* be
mutually exclusive, nor even that the constraints may be similar on each
table. This can be useful for some designs where the inheritance
hierarchy is not "disjoint", as UML would term this situation.

CE does not prevent direct access to one of the child tables in an
inheritance hierarchy. In this case, no exclusion test would be
performed. Exclusion tests are performed *only* when the parent table in
an inheritance hierarchy is accessed. Exclusion tests are performed even
if the inheritance hierarchy is many levels deep (e.g. parent-child-
grandchild). CE also supports multiple inheritance.

PostgreSQL makes no restriction upon what indices are defined on the
various child tables. CE is not dependent upon the existence or absence
of indices, only upon the Query's WHERE clause and the Constraints
defined upon the tables.

CE can be very useful for large historical databases. As an example,
PostgreSQL would allow a historical data table split into sections like
this:
- Last 3 months: one table per week, each table defined with 3 indices
- 3-12 months: one table per month, no indices defined
- 12-36 months: tablespaces defined on hierarchical storage (near-line),
with no indices, so de-archived when required for use.

The CE feature would allow the optimizer to avoid wasteful de-archiving
of the older data, with an appropriate database/query design.

The current patch also includes a suite of 700 tests that successfully
exclude child tables in all common query types (whilst returning the
correct answer!). These include examples of List, Range and mixed
partitioning scenarios.

CURRENT RESTRICTIONS

It is not yet possible to specify that Constraints on child tables will
be mutually exclusive of each other. Currently, it would be up to the
designer to ensure that, if desired.

It is not yet possible to specify that an inheritance parent has no
rows, and, if so, should always be excluded from the query.

If a parent table has a Constraint defined upon it, then this will be
automatically copied to all child tables. Currently, there is no way to
tell which Constraints have been inherited from the parent, so exclusion
tests will be re-executed against all child tables. This will cause
additional optimization time.

Currently, all child tables will be considered. It may be possible in
the future to pre-sort the list of child tables, so that optimization
time can be reduced for parent tables with large numbers of partitions.

Currently, there is no index on the pg_inherits system table. As a
result, parents with more than 1000 child tables are likely to
experience longer than desirable planning times for their queries.

CE checks will not currently recognise STABLE functions within a query.
So WHERE clauses such asDateKey > CURRENT DATE
will not cause exclusion because CURRENT DATE is a STABLE function.

CE checks are not made when the parent table is involved in a join.

Other existing restrictions on Inherited tables continue to apply.

Further enhancements to the CE feature can be expected in the future.

COMPARISONS WITH OTHER RDBMS

In brief:

Teradata and Oracle already have table Partitioning. Oracle's
partitioning allows either LIST or RANGE partitioning. The new
PostgreSQL CE feature is considerably more flexible than Oracle's
declarative Partitioning syntax, and is achieved without adding 
non-standard SQL extensions.

Both Sybase and MS SQLServer 2005 have new Partitioning features
released this year. The Microsoft feature does allow joins between
tables to exclude partitions. However, it also requires that all
partitions should have identical indexes, which is a major blocking
factor to the use of very large tables.

DB2 has partitioning using UNION ALL views, which includes somewhat
similar functionality to the PostgreSQL CE feature. DB2 uses the
PARTITION keyword to signify a different type of feature, so do not be
confused that they do have this feature with declarative syntax.

=================================================================

Re: A Guide to Constraint Exclusion (Partitioning)

From

Richard Huxton

Date:

13 July 2005, 10:32:40

Sounds very useful - even for my small systems. Does it/would it work on 
an ordinary table (for those cases currently using UNION ALL)?

--   Richard Huxton  Archonet Ltd

Re: A Guide to Constraint Exclusion (Partitioning)

From

Simon Riggs

Date:

13 July 2005, 12:10:48

On Wed, 2005-07-13 at 14:02 +0100, Richard Huxton wrote:
> Sounds very useful - even for my small systems. Does it/would it work on 
> an ordinary table (for those cases currently using UNION ALL)?

I'm looking into that aspect right now.

I see no reason why it shouldn't work for UNION ALL views/queries in the
same manner as it does for inheritance.

Best Regards, Simon Riggs

Re: [Bizgres-general] Re: A Guide to Constraint

From

Hannu Krosing

Date:

14 July 2005, 05:38:02

On K, 2005-07-13 at 16:10 +0100, Simon Riggs wrote:
> On Wed, 2005-07-13 at 14:02 +0100, Richard Huxton wrote:
> > Sounds very useful - even for my small systems. Does it/would it work on 
> > an ordinary table (for those cases currently using UNION ALL)?
> 
> I'm looking into that aspect right now.
> 
> I see no reason why it shouldn't work for UNION ALL views/queries in the
> same manner as it does for inheritance.

Why does the CE speed depend on having index on pg_inherits ?

Can't you just check the constraints on individual tables ?

Or are the constraints not actually copied to child tables, and you must
look them up from parent tabless ?

-- 
Hannu Krosing <hannu@tm.ee>

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Hannu Krosing

Date:

14 July 2005, 05:38:02

On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:

> 
> We aren't able to exclude the parent table from the above query because
> no Constraint was defined upon it. Since, in our example, the parent is
> empty there will be little effect on the query performance. It would be
> a mistake to attempt to get around this by placing a Constraint on the
> parent, since that would then automatically be created on all child
> tables also. So we can never exclude the parent without automatically
> excluding *all* of the children also.

At least in 8.0 you can drop the inherited constraint from child table:

hannu=# create table empty_master_table (
hannu(#   id serial,
hannu(#   data text,
hannu(#   constraint table_must_be_empty check(false)
hannu(# );
NOTICE:  CREATE TABLE will create implicit sequence
"empty_master_table_id_seq" for "serial" column "empty_master_table.id"
CREATE TABLE
hannu=# insert into empty_master_table (data) values (NULL);
ERROR:  new row for relation "empty_master_table" violates check
constraint "tab le_must_be_empty"
hannu=# create table first_partition() inherits (empty_master_table);
CREATE TABLE
hannu=# \d first_partition                           Table "public.first_partition"Column |  Type   |
         Modifiers
 
--------+---------
+--------------------------------------------------------------------id     | integer | not null default nextval
('public.empty_master_table_id_seq'::text)data   | text    |
Check constraints:   "table_must_be_empty" CHECK (false)
Inherits: empty_master_table

hannu=# alter table first_partition drop constraint table_must_be_empty;
ALTER TABLE
hannu=# \d first_partition                           Table "public.first_partition"Column |  Type   |
         Modifiers
 
--------+---------
+--------------------------------------------------------------------id     | integer | not null default nextval
('public.empty_master_table_id_seq'::text)data   | text    |
Inherits: empty_master_table

hannu=# \d empty_master_table                          Table "public.empty_master_table"Column |  Type   |
              Modifiers
 
--------+---------
+--------------------------------------------------------------------id     | integer | not null default nextval
('public.empty_master_table_id_seq'::text)data   | text    |
Check constraints:   "table_must_be_empty" CHECK (false)

hannu=# insert into first_partition(data) values ('first_partition');
INSERT 19501405 1
hannu=#

I imagine that this kind of thing does not work well with pg_dump, but
it is at least possible.


> Currently, there is no restriction that all constraints *must* be
> mutually exclusive, nor even that the constraints may be similar on each
> table. This can be useful for some designs where the inheritance
> hierarchy is not "disjoint", as UML would term this situation.

actually this is GOOD, as this way I can have a constraint on both 
insert_timestamp and primary_key fields, which are mostly but not
absolutely in the same order. And also to add extra IN (X,Y,Z)
constraints for some other fields. 

> CE does not prevent direct access to one of the child tables in an
> inheritance hierarchy. In this case, no exclusion test would be
> performed. Exclusion tests are performed *only* when the parent table in
> an inheritance hierarchy is accessed. Exclusion tests are performed even
> if the inheritance hierarchy is many levels deep (e.g. parent-child-
> grandchild). CE also supports multiple inheritance.

I'd like to see an option to ALWAYS do CE, inheritance or union (all) or
even simple queries.

> CURRENT RESTRICTIONS
> 
> It is not yet possible to specify that Constraints on child tables will
> be mutually exclusive of each other. Currently, it would be up to the
> designer to ensure that, if desired.
> 
> It is not yet possible to specify that an inheritance parent has no
> rows, and, if so, should always be excluded from the query.

I think that a simple "CHECK(false)" constraint should be enough for
this.

> If a parent table has a Constraint defined upon it, then this will be
> automatically copied to all child tables. 

But they can be removed later if desired.

> Currently, there is no way to
> tell which Constraints have been inherited from the parent, so exclusion
> tests will be re-executed against all child tables. This will cause
> additional optimization time.

Have you done any performance testing, i.e. what is the actual impact of
CE on planning time ?

> Currently, all child tables will be considered. It may be possible in
> the future to pre-sort the list of child tables, so that optimization
> time can be reduced for parent tables with large numbers of partitions.
>
> Currently, there is no index on the pg_inherits system table. As a
> result, parents with more than 1000 child tables are likely to
> experience longer than desirable planning times for their queries.

Am I right that this is a general postgresql issue and has nothing to do
with CE ?

> CE checks will not currently recognise STABLE functions within a query.
> So WHERE clauses such as
>     DateKey > CURRENT DATE
> will not cause exclusion because CURRENT DATE is a STABLE function.
> 
> CE checks are not made when the parent table is involved in a join.

Is this also the case where parent table is inside subquery and that
subquery is involved in a join?

Or do I have to make sure that it is not lifted out of that subquery
using something like pl/pgsql function ?

> Other existing restrictions on Inherited tables continue to apply.

WHat happens for multiple inheritance ? 

Is it detected and then also excluded from CE ?

Or is that just a "don't do it" item ?

> Further enhancements to the CE feature can be expected in the future.

Great! :D

And a big Thank You!

-- 
Hannu Krosing <hannu@tm.ee>

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Hannu Krosing

Date:

14 July 2005, 05:48:41

On N, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:
> On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:
> 
> > 
> > We aren't able to exclude the parent table from the above query because
> > no Constraint was defined upon it. Since, in our example, the parent is
> > empty there will be little effect on the query performance. It would be
> > a mistake to attempt to get around this by placing a Constraint on the
> > parent, since that would then automatically be created on all child
> > tables also. So we can never exclude the parent without automatically
> > excluding *all* of the children also.

hannu=# create table ttt(id serial, t text);
NOTICE:  CREATE TABLE will create implicit sequence "ttt_id_seq" for
"serial" co lumn "ttt.id"
CREATE TABLE
hannu=# create table ttt2() inherits (ttt);
CREATE TABLE
hannu=# alter table ONLY ttt add constraint ccc check(false);
ALTER TABLE

this also works, but new inherited tables will still inherit the
constraint.

Perhaps we just need a way to store the "ONLY" status of the constraint,
and then not inherit these.

-- 
Hannu Krosing <hannu@skype.net>

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Greg Stark

Date:

14 July 2005, 16:30:41

Hannu Krosing <hannu@tm.ee> writes:

> > It is not yet possible to specify that an inheritance parent has no
> > rows, and, if so, should always be excluded from the query.
> 
> I think that a simple "CHECK(false)" constraint should be enough for
> this.

huh, that's clever.

> > CE checks are not made when the parent table is involved in a join.
> 
> Is this also the case where parent table is inside subquery and that
> subquery is involved in a join?

I assume this is a TODO and just not in your first batch of work? It seems
like a pretty important piece eventually. Are there any fundamental
difficulties with handling joins eventually? 

-- 
greg

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Simon Riggs

Date:

14 July 2005, 17:29:55

On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:
> On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:

> > We aren't able to exclude the parent table from the above query because
> > no Constraint was defined upon it. Since, in our example, the parent is
> > empty there will be little effect on the query performance. It would be
> > a mistake to attempt to get around this by placing a Constraint on the
> > parent, since that would then automatically be created on all child
> > tables also. So we can never exclude the parent without automatically
> > excluding *all* of the children also.
> 
> At least in 8.0 you can drop the inherited constraint from child table:

Yes, you can. But in discussion on -hackers on 20 May there was
agreement (OK, me agreeing with Tom) that being able to drop inherited
constraints was a bug (or at least a deprecated feature...)

> > Currently, there is no restriction that all constraints *must* be
> > mutually exclusive, nor even that the constraints may be similar on each
> > table. This can be useful for some designs where the inheritance
> > hierarchy is not "disjoint", as UML would term this situation.
> 
> actually this is GOOD, as this way I can have a constraint on both 
> insert_timestamp and primary_key fields, which are mostly but not
> absolutely in the same order. And also to add extra IN (X,Y,Z)
> constraints for some other fields. 

Yes, understood. My description was not very good. I did not mean that
constraints on any one table should be mutually exclusive. I meant that
a set of constraints across a set of child tables should be able to be
defined mutually exclusive (e.g. just as Oracle partitions are).

> > CE does not prevent direct access to one of the child tables in an
> > inheritance hierarchy. In this case, no exclusion test would be
> > performed. Exclusion tests are performed *only* when the parent table in
> > an inheritance hierarchy is accessed. Exclusion tests are performed even
> > if the inheritance hierarchy is many levels deep (e.g. parent-child-
> > grandchild). CE also supports multiple inheritance.
> 
> I'd like to see an option to ALWAYS do CE, inheritance or union (all) or
> even simple queries.

I've argued against that, but I guess if there is enough opinion I can
be persuaded.

My argument is that most constraints look something likeCHECK ( salary > 0 and salary < 1000000)
If you ask a query likeselect count(*) from emp where salary < -10000
then CE would work great. But you don't find many people asking such
nonsensical questions, or at least very often. Those queries do get
asked, but they seem either naive or "data profiling" type queries. So
my conclusion is that for most constraints and most queries, CE is
literally just a waste of planning time.

When a designer deliberately creates a scenario where constraints have
meaning, as in a mutually exclusive inheritance hierarchy or classic
partitioned table design, then you have a high chance of CE being
effective. 

The indicator for the value of CE, IMHO, is the "and/or" aspect of
multiple related tables.

Thats my argument, but I'm willing to hear counter arguments. I just
wish to avoid overselling CE and slowing most people's queries down for
no good reason.

> > CURRENT RESTRICTIONS
> > 
> > It is not yet possible to specify that Constraints on child tables will
> > be mutually exclusive of each other. Currently, it would be up to the
> > designer to ensure that, if desired.
> > 
> > It is not yet possible to specify that an inheritance parent has no
> > rows, and, if so, should always be excluded from the query.
> 
> I think that a simple "CHECK(false)" constraint should be enough for
> this.

That works, but as I said....

> > If a parent table has a Constraint defined upon it, then this will be
> > automatically copied to all child tables. 
> 
> But they can be removed later if desired.

But should not be...

> > Currently, there is no way to
> > tell which Constraints have been inherited from the parent, so exclusion
> > tests will be re-executed against all child tables. This will cause
> > additional optimization time.
> 
> Have you done any performance testing, i.e. what is the actual impact of
> CE on planning time ?

I think "bad" would be my assessment. But not "very bad". But this must
be offset by the extraordinary time saving in execution time.

Planning time depends upon the complexity of the query, the number and
complexity of the constraints and the number of child tables.

I think I would guess currently at child tables of around 1 GB, with
sizes as small as 100 MB being reasonable. Trying to keep table size
relatively evenly distributed would be useful also. With those
recommendations it will always be worth 0.1 sec investment in trying to
avoid 100+ sec sequential scan times. But again, you need to consider
your workload.

> > Currently, all child tables will be considered. It may be possible in
> > the future to pre-sort the list of child tables, so that optimization
> > time can be reduced for parent tables with large numbers of partitions.
> >
> > Currently, there is no index on the pg_inherits system table. As a
> > result, parents with more than 1000 child tables are likely to
> > experience longer than desirable planning times for their queries.
> 
> Am I right that this is a general postgresql issue and has nothing to do
> with CE ?

Yep.

> > CE checks will not currently recognise STABLE functions within a query.
> > So WHERE clauses such as
> >     DateKey > CURRENT DATE
> > will not cause exclusion because CURRENT DATE is a STABLE function.
> > 
> > CE checks are not made when the parent table is involved in a join.
> 
> Is this also the case where parent table is inside subquery and that
> subquery is involved in a join?

My comment was too terse. What I meant was that you can't do dynamic
exclusion based upon the results of a join. i.e. PPUC2

> Or do I have to make sure that it is not lifted out of that subquery
> using something like pl/pgsql function ?
> 
> > Other existing restrictions on Inherited tables continue to apply.
> 
> WHat happens for multiple inheritance ? 

CE works.

> Is it detected and then also excluded from CE ?

No

> Or is that just a "don't do it" item ?

Thats a religious debate.... I'll leave that one.

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Simon Riggs

Date:

14 July 2005, 17:34:19

On Thu, 2005-07-14 at 15:30 -0400, Greg Stark wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> 
> > > It is not yet possible to specify that an inheritance parent has no
> > > rows, and, if so, should always be excluded from the query.
> > 
> > I think that a simple "CHECK(false)" constraint should be enough for
> > this.
> 
> huh, that's clever.

But not clever enough... see my reply to Hannu. Sorry.

> > > CE checks are not made when the parent table is involved in a join.
> > 
> > Is this also the case where parent table is inside subquery and that
> > subquery is involved in a join?
> 
> I assume this is a TODO and just not in your first batch of work? It seems
> like a pretty important piece eventually. Are there any fundamental
> difficulties with handling joins eventually? 

Its a reasonable size piece of work, and could not be reworked in time
for 8.1.

Some other mental leaps may also be required.

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion (

From

"Luke Lonergan"

Date:

14 July 2005, 18:13:43

>>> CE checks will not currently recognise STABLE functions within a query.
>>> So WHERE clauses such as
>>> DateKey > CURRENT DATE
>>> will not cause exclusion because CURRENT DATE is a STABLE function.
>>> 
>>> CE checks are not made when the parent table is involved in a join.
>> 
>> Is this also the case where parent table is inside subquery and that
>> subquery is involved in a join?
> 
> My comment was too terse. What I meant was that you can't do dynamic
> exclusion based upon the results of a join. i.e. PPUC2

Phew!  Correlated subqueries won't CE then, but the more common complex
queries will.  We'll test with some common ones soon.

- Luke

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Hannu Krosing

Date:

14 July 2005, 18:24:50

On N, 2005-07-14 at 21:29 +0100, Simon Riggs wrote:
> On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:
> > On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:
> 
> > > We aren't able to exclude the parent table from the above query because
> > > no Constraint was defined upon it. Since, in our example, the parent is
> > > empty there will be little effect on the query performance. It would be
> > > a mistake to attempt to get around this by placing a Constraint on the
> > > parent, since that would then automatically be created on all child
> > > tables also. So we can never exclude the parent without automatically
> > > excluding *all* of the children also.
> > 
> > At least in 8.0 you can drop the inherited constraint from child table:
> 
> Yes, you can. But in discussion on -hackers on 20 May there was
> agreement (OK, me agreeing with Tom) that being able to drop inherited
> constraints was a bug (or at least a deprecated feature...)

Why is it a bug ?

Tom wrote on May 20:
> I think a good argument can be made that the above behavior is a bug,
> and that the ALTER command should have been rejected.  We've gone to
> great lengths to make sure you can't ALTER a child table to make it
> incompatible with the parent in terms of the column names and types;
> shouldn't this be true of check constraints as well?

There is a good fundamental reason why we dont let people drop columns
from children or to add them to parent ONLY: if we did not, then there
would be no way to query from the hierarchy.

I can't see any such reason for forbidding dropping constraints from
child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least
not for CHECK constraints. 

And even disallowing it can probably be circumvented by a clever
functional CHECK constraint, which checks for also the table it is
defined on.

OTOH, disallowing all this for child FOREIGN KEY's seems logical though.

OTOOH, to be symmetrical with previous, we should also have UNIQUE and
PK constraints that span all the inheritance hierarchy, but I don't want
to go there now :) 

> > > CE does not prevent direct access to one of the child tables in an
> > > inheritance hierarchy. In this case, no exclusion test would be
> > > performed. Exclusion tests are performed *only* when the parent table in
> > > an inheritance hierarchy is accessed. Exclusion tests are performed even
> > > if the inheritance hierarchy is many levels deep (e.g. parent-child-
> > > grandchild). CE also supports multiple inheritance.
> > 
> > I'd like to see an option to ALWAYS do CE, inheritance or union (all) or
> > even simple queries.
> 
> I've argued against that, but I guess if there is enough opinion I can
> be persuaded.
> 
> My argument is that most constraints look something like
>     CHECK ( salary > 0 and salary < 1000000)
> If you ask a query like
>     select count(*) from emp where salary < -10000
> then CE would work great. But you don't find many people asking such
> nonsensical questions, or at least very often. Those queries do get
> asked, but they seem either naive or "data profiling" type queries. So
> my conclusion is that for most constraints and most queries, CE is
> literally just a waste of planning time.

CE may have some use for automatically generated queries, but in general
I agree with you.

Btw, not just UNION ALL, but also simple UNION, INTERSECT and EXCEPT
could probably be taught to use CE at some stage.

Also CE could at some stage be used to drop the UNIQUEifying nodes from
UNION (without ALL), if it can prove that the UNION is already UNIQUE.

> > > If a parent table has a Constraint defined upon it, then this will be
> > > automatically copied to all child tables. 
> > 
> > But they can be removed later if desired.
> 
> But should not be...

FOREIGN KEY constraints should not, but I think that simple CHECK's
could.

> > Have you done any performance testing, i.e. what is the actual impact of
> > CE on planning time ?
> 
> I think "bad" would be my assessment. But not "very bad". But this must
> be offset by the extraordinary time saving in execution time.
> 
> Planning time depends upon the complexity of the query, the number and
> complexity of the constraints and the number of child tables.
> 
> I think I would guess currently at child tables of around 1 GB, with
> sizes as small as 100 MB being reasonable. Trying to keep table size
> relatively evenly distributed would be useful also. With those
> recommendations it will always be worth 0.1 sec investment in trying to
> avoid 100+ sec sequential scan times. But again, you need to consider
> your workload.

That's why I asked for GUC, not a default behaviour ;)

> > > CE checks will not currently recognise STABLE functions within a query.
> > > So WHERE clauses such as
> > >     DateKey > CURRENT DATE
> > > will not cause exclusion because CURRENT DATE is a STABLE function.
> > > 
> > > CE checks are not made when the parent table is involved in a join.
> > 
> > Is this also the case where parent table is inside subquery and that
> > subquery is involved in a join?
> 
> My comment was too terse. What I meant was that you can't do dynamic
> exclusion based upon the results of a join. i.e. PPUC2

but what about _static_ exlusion based on constraints ?

I mean if there is a left side table with say a single partition having  CHECK(id_order BETWEEN 1 AND 1000)
(either originally or left after eliminating other by other constraints)

and 3 right side partition with  CHECK(key_order BETWEEN 1 AND 1000)  CHECK(key_order BETWEEN 1001 AND 2000)
CHECK(key_orderBETWEEN 2001 AND 3000)
 

then the 3rd one could be eliminated statically from a join on
id_order=key_order

...

> > WHat happens for multiple inheritance ? 
> 
> CE works.
>
> > Is it detected and then also excluded from CE ?
> 
> No
> 
> > Or is that just a "don't do it" item ?
> 
> Thats a religious debate.... I'll leave that one.

:)

> Best Regards, Simon Riggs

And thanks for the good work so far!

-- 
Hannu Krosing <hannu@skype.net>

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Simon Riggs

Date:

14 July 2005, 18:41:20

On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:
> On N, 2005-07-14 at 21:29 +0100, Simon Riggs wrote:
> > On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:
> > > On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:
> > 
> > > > We aren't able to exclude the parent table from the above query because
> > > > no Constraint was defined upon it. Since, in our example, the parent is
> > > > empty there will be little effect on the query performance. It would be
> > > > a mistake to attempt to get around this by placing a Constraint on the
> > > > parent, since that would then automatically be created on all child
> > > > tables also. So we can never exclude the parent without automatically
> > > > excluding *all* of the children also.
> > > 
> > > At least in 8.0 you can drop the inherited constraint from child table:
> > 
> > Yes, you can. But in discussion on -hackers on 20 May there was
> > agreement (OK, me agreeing with Tom) that being able to drop inherited
> > constraints was a bug (or at least a deprecated feature...)
> 
> Why is it a bug ?
> 
> Tom wrote on May 20:
> > I think a good argument can be made that the above behavior is a bug,
> > and that the ALTER command should have been rejected.  We've gone to
> > great lengths to make sure you can't ALTER a child table to make it
> > incompatible with the parent in terms of the column names and types;
> > shouldn't this be true of check constraints as well?
> 
> There is a good fundamental reason why we dont let people drop columns
> from children or to add them to parent ONLY: if we did not, then there
> would be no way to query from the hierarchy.
> 
> I can't see any such reason for forbidding dropping constraints from
> child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least
> not for CHECK constraints. 

If we allow DROPing them, why allow them to be inherited in the first
place? One or the other, not both.

I do still agree with Tom on that, but as I said at the time, I don't
see it as a big enough problem to spend time removing that feature. But
I personally wouldn't grow to rely on its existence either.

There are other ways...

> And even disallowing it can probably be circumvented by a clever
> functional CHECK constraint, which checks for also the table it is
> defined on.
> 
> OTOH, disallowing all this for child FOREIGN KEY's seems logical though.
> 
> OTOOH, to be symmetrical with previous, we should also have UNIQUE and
> PK constraints that span all the inheritance hierarchy, but I don't want
> to go there now :) 

Well, I am working towards that.

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Simon Riggs

Date:

14 July 2005, 18:51:57

On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:
> Btw, not just UNION ALL, but also simple UNION, INTERSECT and EXCEPT
> could probably be taught to use CE at some stage.

It turns out that to solve this problem you very nearly have to solve
the "any table" problem. Thats an extra argument in favour of making
this work for any table.

> That's why I asked for GUC, not a default behaviour ;)

> but what about _static_ exlusion based on constraints ?
> 
> I mean if there is a left side table with say a single partition having
>    CHECK(id_order BETWEEN 1 AND 1000)
> (either originally or left after eliminating other by other constraints)
> 
> and 3 right side partition with
>    CHECK(key_order BETWEEN 1 AND 1000)
>    CHECK(key_order BETWEEN 1001 AND 2000)
>    CHECK(key_order BETWEEN 2001 AND 3000)
> 
> then the 3rd one could be eliminated statically from a join on
> id_order=key_order

Well, SQL allows you to express almost any query, but that doesn't mean
it is all 3 of: frequently occcuring, meaningful and interesting.

Have you ever seen such a construct?

I think we might be able to use equivalence to show that a restriction
on one table could be translated into a restriction on the Fact table.

e.g.

SELECT
FROM Fact, OtherTable
WHERE Fact.Key = OtherTable.Key
AND OtherTable.Key > 28000

But the harder and yet more common problem is where there is no direct
restriction on the equivalent join column. Thats the one I would
eventually seek to solve

e.g.

SELECT
FROM Fact, Dimension
WHERE Fact.Key = Dimension.Key
AND Dimension.DescriptiveField = 'Blah'

where there is a relationship between DescriptiveField and Key enshrined
within the Dimension table.

> and thanks for the good work so far!

Thank you,

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

"Luke Lonergan"

Date:

14 July 2005, 19:02:41

>> I assume this is a TODO and just not in your first batch of work? It seems
>> like a pretty important piece eventually. Are there any fundamental
>> difficulties with handling joins eventually?
> 
> Its a reasonable size piece of work, and could not be reworked in time
> for 8.1.

As you've said previously, when you refer to joins not benefiting from CE,
you mean data-driven joins, or joins whose predicates are dynamically
determined, right?

- Luke

Re: [Bizgres-general] A Guide to Constraint Exclusion (

From

"Luke Lonergan"

Date:

14 July 2005, 19:07:37

Hannu,

>> My comment was too terse. What I meant was that you can't do dynamic
>> exclusion based upon the results of a join. i.e. PPUC2
> 
> but what about _static_ exlusion based on constraints ?
> 
> I mean if there is a left side table with say a single partition having
>    CHECK(id_order BETWEEN 1 AND 1000)
> (either originally or left after eliminating other by other constraints)
> 
> and 3 right side partition with
>    CHECK(key_order BETWEEN 1 AND 1000)
>    CHECK(key_order BETWEEN 1001 AND 2000)
>    CHECK(key_order BETWEEN 2001 AND 3000)
> 
> then the 3rd one could be eliminated statically from a join on
> id_order=key_order

I would expect that the 2nd and 3rd partitions would be CE'ed if the basic
support is there.  Does your current implementation include this capability
Simon?  Or is it limited to use with constant predicates?

- Luke

Re: [Bizgres-general] A Guide to Constraint Exclusion (

From

"Luke Lonergan"

Date:

14 July 2005, 19:16:33

Simon,

> SELECT
> FROM Fact, Dimension
> WHERE Fact.Key = Dimension.Key
> AND Dimension.DescriptiveField = 'Blah'

So, what happens with this:

SELECT
FROM Fact, Dimension
WHERE Fact.Key = Dimension.Key
AND Fact.part = 100;

With Fact defined with 3 partitions:   CHECK(Part BETWEEN    1 AND 1000)   CHECK(Part BETWEEN 1001 AND 2000)
CHECK(PartBETWEEN 2001 AND 3000)
 

Will the 2nd and 3rd partitions be eliminated?

- Luke

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Hannu Krosing

Date:

14 July 2005, 19:21:19

On R, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:

> 
> but what about _static_ exlusion based on constraints ?
> 
> I mean if there is a left side table with say a single partition having
>    CHECK(id_order BETWEEN 1 AND 1000)
> (either originally or left after eliminating other by other constraints)
> 
> and 3 right side partition with
>    CHECK(key_order BETWEEN 1 AND 1000)
>    CHECK(key_order BETWEEN 1001 AND 2000)
>    CHECK(key_order BETWEEN 2001 AND 3000)
> 
> then the 3rd one could be eliminated statically from a join on
> id_order=key_order

the simplest form of this seems to be carrying checks to both ends of
joins before CE.

so for two partitioned tables "main" and "detail", and query

select *  from main m,       detail d where m.id_main = d.key_main   and m.id_main in (1,7,42)

CE is done based on main.id_main in (1,7,42) and detail.key_mainin
(1,7,42)

Or perhaps this carrying over is already done automatically by postgres
planner before CE ?

-- 
Hannu Krosing <hannu@skype.net>

Re: [Bizgres-general] A Guide to Constraint Exclusion (

From

Simon Riggs

Date:

14 July 2005, 19:22:23

On Thu, 2005-07-14 at 15:16 -0700, Luke Lonergan wrote:
> Simon,
> 
> > SELECT
> > FROM Fact, Dimension
> > WHERE Fact.Key = Dimension.Key
> > AND Dimension.DescriptiveField = 'Blah'
> 
> So, what happens with this:
> 
> SELECT
> FROM Fact, Dimension
> WHERE Fact.Key = Dimension.Key
> AND Fact.part = 100;
> 
> With Fact defined with 3 partitions:
>     CHECK(Part BETWEEN    1 AND 1000)
>     CHECK(Part BETWEEN 1001 AND 2000)
>     CHECK(Part BETWEEN 2001 AND 3000)
> 
> Will the 2nd and 3rd partitions be eliminated?

Yes.

That is because the clause "part = 100" is a direct restriction on the
partitioned table, using an immutable operator and a constant.

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion (

From

Simon Riggs

Date:

14 July 2005, 19:27:49

On Thu, 2005-07-14 at 15:07 -0700, Luke Lonergan wrote:
> Hannu,
> 
> >> My comment was too terse. What I meant was that you can't do dynamic
> >> exclusion based upon the results of a join. i.e. PPUC2
> > 
> > but what about _static_ exlusion based on constraints ?
> > 
> > I mean if there is a left side table with say a single partition having
> >    CHECK(id_order BETWEEN 1 AND 1000)
> > (either originally or left after eliminating other by other constraints)
> > 
> > and 3 right side partition with
> >    CHECK(key_order BETWEEN 1 AND 1000)
> >    CHECK(key_order BETWEEN 1001 AND 2000)
> >    CHECK(key_order BETWEEN 2001 AND 3000)
> > 
> > then the 3rd one could be eliminated statically from a join on
> > id_order=key_order
> 
> I would expect that the 2nd and 3rd partitions would be CE'ed if the basic
> support is there.  Does your current implementation include this capability
> Simon?  

No

> Or is it limited to use with constant predicates?

Yes.

It's the first time I've thought to compare the constraint predicates on
joined tables based upon the join restriction. That's possible, but
would take some time to work out.

I've argued that such a construct is not common. I'm open to suggestions
about what *is* common...

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Simon Riggs

Date:

14 July 2005, 19:30:29

On Fri, 2005-07-15 at 01:20 +0300, Hannu Krosing wrote:
> On R, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:
> 
> > 
> > but what about _static_ exlusion based on constraints ?
> > 
> > I mean if there is a left side table with say a single partition having
> >    CHECK(id_order BETWEEN 1 AND 1000)
> > (either originally or left after eliminating other by other constraints)
> > 
> > and 3 right side partition with
> >    CHECK(key_order BETWEEN 1 AND 1000)
> >    CHECK(key_order BETWEEN 1001 AND 2000)
> >    CHECK(key_order BETWEEN 2001 AND 3000)
> > 
> > then the 3rd one could be eliminated statically from a join on
> > id_order=key_order
> 
> the simplest form of this seems to be carrying checks to both ends of
> joins before CE.
> 
> so for two partitioned tables "main" and "detail", and query
> 
> select * 
>   from main m, 
>        detail d 
>  where m.id_main = d.key_main 
>    and m.id_main in (1,7,42)
> 
> CE is done based on main.id_main in (1,7,42) and detail.key_mainin
> (1,7,42)
> 
> Or perhaps this carrying over is already done automatically by postgres
> planner before CE ?

Not sure... will check. I don't do it explicitly, but could do, given
time.

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion (

From

"Luke Lonergan"

Date:

14 July 2005, 19:45:12

Simon,

> It's the first time I've thought to compare the constraint predicates on
> joined tables based upon the join restriction. That's possible, but
> would take some time to work out.
> 
> I've argued that such a construct is not common. I'm open to suggestions
> about what *is* common...

I also don't believe it is commonly used.  There are occasions where fact
tables are joined on a common key, but for CE to be effective, the key would
have to be used in a similar way in constructing the partitions.

There are many other ways to accomplish useful patterns, like partitioning
fact tables based on a date based key, then using a constant predicate in
the join, e.g:

SELECT * FROM Fact AND Dimension WHERE Fact.a = Dimension.a   AND to_char(T1.key,"MONTH") = 'JUNE'   AND
to_char(T1.key,"YYYY")= '2003';
 

With Fact defined with 12 partitions   CHECK(to_char(key,"MONTH") EQUAL 'JANUARY')
...   CHECK(to_char(key,"MONTH") EQUAL 'DECEMBER')

- Luke

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Simon Riggs

Date:

14 July 2005, 19:49:48

On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:
> And thanks for the good work so far!

I think I should add: thanks for the good ideas so far.

Your ideas and challenges have been essential to progress to date, as
has been all the discussions and feedback.

This is the beginning of a journey, not the end. CE for 8.1 is really
just a taster of what could be for 8.2. I hope that doesn't dissuade
anybody from thinking we should "wait for the full functionality".

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Hannu Krosing

Date:

15 July 2005, 04:04:18

On R, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:

> There is a good fundamental reason why we dont let people drop columns
> from children or to add them to parent ONLY: if we did not, then there
> would be no way to query from the hierarchy.
> 
> I can't see any such reason for forbidding dropping constraints from
> child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least
> not for CHECK constraints. 
> 
> And even disallowing it can probably be circumvented by a clever
> functional CHECK constraint, which checks for also the table it is
> defined on.
> 
> OTOH, disallowing all this for child FOREIGN KEY's seems logical though.

Well, having thought more about it, I don't think that row-level
constraints (CHECK, FK and NULL) should be restricted at all by
inheritance hierarchy.

Table-level constraints (UNIQUE and PK) should, but they need to be
fixed first.

What should be done with CHECK, FK and NULL is making a way for marking
them as being defined on some table 'ONLY' if created as such. It should
also be marked with ONLY when any of the constraints inherited from this
one is dropped from child. Then all direct ancestor copies should get
'ONLY' status so thet they can be dumped properly. 

Or perhaps just disallow dropping constraints from children, but still
allow creating constraints on ONLY parent table, which seems a cleaner
solution.

Why I also think that allowing this for FK's is good, is that it allows 
partitioning tables on both sides of FK relation even without global
UNIQUE/PK constraints, by partitioning both of them on same or
compatible boundaries. By compatible I mean here that we could partition
the PK table on say PK values with a step of 1M but FK table on values
with step of 100K, so that they both will have approximately the same
number of rows per partition and there will be 10 tables with FK
constraints pointing to the same PK partition.

What comes to Toms's May 20 argument that people would be surprised when
they select form a table whith check(i>0) constraint and get out i<0
then I think this is a question of education. 
If they don't know better, they will be equally surprised when changing
a row in parent table actually changes a row in child table, or when a
row inserted into child shows up in parent.

The ONLY status of constraints should be accounted for in pg_dump so it
will work right and also shown in psql's \d output so it confuses users
less :)

-- 
Hannu Krosing <hannu@tm.ee>

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Hannu Krosing

Date:

15 July 2005, 05:12:43

Bruce: could you change this TODO item
 o Prevent child tables from altering constraints like CHECK that were   inherited from the parent table

to
 o Prevent new child tables from inheriting constraints defined on   parents with keyword ONLY:     alter table ONLY t
addconstraint c check(x=y);   Prevent child tables from altering constraints that were   inherited from the parent
table(i.e were defined without    ONLY/with ALL)
 

This is a less restrictive solution to the same problem.

On N, 2005-07-14 at 22:41 +0100, Simon Riggs wrote:
> On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:
> > On N, 2005-07-14 at 21:29 +0100, Simon Riggs wrote:
> > > On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:
> > > > On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:
> > > 
> > > > > We aren't able to exclude the parent table from the above query because
> > > > > no Constraint was defined upon it. Since, in our example, the parent is
> > > > > empty there will be little effect on the query performance. It would be
> > > > > a mistake to attempt to get around this by placing a Constraint on the
> > > > > parent, since that would then automatically be created on all child
> > > > > tables also. So we can never exclude the parent without automatically
> > > > > excluding *all* of the children also.
> > > > 
> > > > At least in 8.0 you can drop the inherited constraint from child table:
> > > 
> > > Yes, you can. But in discussion on -hackers on 20 May there was
> > > agreement (OK, me agreeing with Tom) that being able to drop inherited
> > > constraints was a bug (or at least a deprecated feature...)
> > 
> > Why is it a bug ?
> > 
> > Tom wrote on May 20:
> > > I think a good argument can be made that the above behavior is a bug,
> > > and that the ALTER command should have been rejected.  We've gone to
> > > great lengths to make sure you can't ALTER a child table to make it
> > > incompatible with the parent in terms of the column names and types;
> > > shouldn't this be true of check constraints as well?
> > 
> > There is a good fundamental reason why we dont let people drop columns
> > from children or to add them to parent ONLY: if we did not, then there
> > would be no way to query from the hierarchy.
> > 
> > I can't see any such reason for forbidding dropping constraints from
> > child tables or disallowing CREATE CONSTRAINT ON parent ONLY , at least
> > not for CHECK constraints. 
> 
> If we allow DROPing them, why allow them to be inherited in the first
> place? One or the other, not both.
> 
> I do still agree with Tom on that, but as I said at the time, I don't
> see it as a big enough problem to spend time removing that feature. But
> I personally wouldn't grow to rely on its existence either.
> 
> There are other ways...
> 
> > And even disallowing it can probably be circumvented by a clever
> > functional CHECK constraint, which checks for also the table it is
> > defined on.
> > 
> > OTOH, disallowing all this for child FOREIGN KEY's seems logical though.
> > 
> > OTOOH, to be symmetrical with previous, we should also have UNIQUE and
> > PK constraints that span all the inheritance hierarchy, but I don't want
> > to go there now :) 
> 
> Well, I am working towards that.
> 
> Best Regards, Simon Riggs
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
-- 
Hannu Krosing <hannu@skype.net>

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Simon Riggs

Date:

15 July 2005, 09:35:59

On Fri, 2005-07-15 at 11:10 +0300, Hannu Krosing wrote:
> Bruce: could you change this TODO item
> 
>   o Prevent child tables from altering constraints like CHECK that were
>     inherited from the parent table
> 
> to
> 
>   o Prevent new child tables from inheriting constraints defined on
>     parents with keyword ONLY:
>       alter table ONLY t add constraint c check(x=y);
>     Prevent child tables from altering constraints that were
>     inherited from the parent table (i.e were defined without 
>     ONLY/with ALL)

My thought was to introduce a new type of constraint, ABSTRACT.

This would not be inherited (it is not a check constraint). It would
allow a table to be declared "will never contain rows".

This fits better with the ideas and common understandings of
inheritance.

Best Regards, Simon Riggs

Re: [Bizgres-general] A Guide to Constraint Exclusion

From

Stephan Szabo

Date:

15 July 2005, 11:59:24

On Fri, 15 Jul 2005, Hannu Krosing wrote:
> What comes to Toms's May 20 argument that people would be surprised when
> they select form a table whith check(i>0) constraint and get out i<0
> then I think this is a question of education.

I potentially disagree. What are we trying to model here? Systems which
allow children to have values that are not in the domain of the parent are
useful for some things, but they also potentially lose some other useful
properties like substitutability since a child isn't a parent (or more
specifically f(parent) cannot assume parent's invarients are valid on its
argument which may be a child unless they are applied to the child at call
time).

Re: A Guide to Constraint Exclusion (Partitioning)

From

Tom Lane

Date:

22 July 2005, 19:33:05

Simon Riggs <simon@2ndquadrant.com> writes:
> In summary, the CE feature will be a huge performance gain for
> qualifying queries against large tables in PostgreSQL databases.

BTW, before we spend too much time chasing an emperor that may have no
clothes, it's worth asking whether this feature is really going to buy
anything in the real world.  What is bothering me at the moment is the
thought that the partitioning key would normally be indexed within
each table, and that an indexscan that's off the end of the indexed
range is cheap.

For example, you write

> Now, if we run our example query again
>     SELECT sum(soldqty) FROM Sales_DateItemOutlet
>     WHERE DateKey between 20050101 and 20050101
> we find that the query will
>     Scan all rows in Sales_DateItemOutlet (which is empty)
>     Scan all rows in Sales_Jan_DateItemOutlet
>     Scan all rows in Sales_Feb_DateItemOutlet
>     Scan all rows in Sales_Mar_DateItemOutlet

but the "scan all rows" will only happen if no index is provided on
DateKey in the child tables.  Otherwise the planner will probably
select plans like this:
       ->  Index Scan using i1 on sales_jan_dateitemoutlet sales_dateitemoutlet  (cost=0.00..5.98 rows=1 width=0)
      Index Cond: ((datekey >= 20050101) AND (datekey <= 20050101))

for each child table for which the statistics indicate that no rows are
likely to be selected.  This will fall through quite quickly in
practice, meaning that the "huge performance gain" from not doing it at
all is a bit oversold.  (Note that it's already true that each child
table is planned separately, so the plan for the partition that *is*
targeted by the query may be different.)

AFAICS, CE as proposed is only worth bothering with if the partitioning
key is something you would not want to create indexes on; which does not
strike me as a major use-case.

It'd be more attractive if the methodology allowed an inheritance or
union-all collection to be reduced to one single table (ie, CE gets rid
of all but one collection member) and then that could be planned as if
it were a primitive table entry (ie, no Append plan node).  This doesn't
help much for simple queries on the fact table but it is interesting for
join cases, because if the Append is in the way there's no way to handle
inner indexscan join plans.

Unfortunately the patch as proposed is a long way from being able to do
that, and given the current semantics of inherited constraints it's not
even remotely feasible, since as you note we can't put a constraint on
just the parent table.  We could maybe do it for UNION ALL views, but
the patch doesn't handle that case.

So at the moment I'm feeling a bit dubious about the real value.
        regards, tom lane

Re: A Guide to Constraint Exclusion (Partitioning)

From

"Luke Lonergan"

Date:

23 July 2005, 01:24:14

Tom,

On 7/22/05 3:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> but the "scan all rows" will only happen if no index is provided on
> DateKey in the child tables.  Otherwise the planner will probably
> select plans like this:
> 
>         ->  Index Scan using i1 on sales_jan_dateitemoutlet
> sales_dateitemoutlet  (cost=0.00..5.98 rows=1 width=0)

Good point.

> So at the moment I'm feeling a bit dubious about the real value.

What about the need to run DML against the partition master?  Partitioning
itself is a big deal to BI/DW users, and to make it really usable will
require that DML be automatically applied based on the mapping of partition
master to the partitions.  Does the CE strategy help with the implementation
of automated DML?

- Luke

Re: A Guide to Constraint Exclusion (Partitioning)

From

Simon Riggs

Date:

23 July 2005, 05:02:38

On Fri, 2005-07-22 at 18:32 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > In summary, the CE feature will be a huge performance gain for
> > qualifying queries against large tables in PostgreSQL databases.
> 
> BTW, before we spend too much time chasing an emperor that may have no
> clothes, it's worth asking whether this feature is really going to buy
> anything in the real world.  

Perfectly valid thing to ask...

> What is bothering me at the moment is the
> thought that the partitioning key would normally be indexed within
> each table, and that an indexscan that's off the end of the indexed
> range is cheap.
> 
> For example, you write
> 
> > Now, if we run our example query again
> >     SELECT sum(soldqty) FROM Sales_DateItemOutlet
> >     WHERE DateKey between 20050101 and 20050101
> > we find that the query will
> >     Scan all rows in Sales_DateItemOutlet (which is empty)
> >     Scan all rows in Sales_Jan_DateItemOutlet
> >     Scan all rows in Sales_Feb_DateItemOutlet
> >     Scan all rows in Sales_Mar_DateItemOutlet
> 
> but the "scan all rows" will only happen if no index is provided on
> DateKey in the child tables.  Otherwise the planner will probably
> select plans like this:
> 
>         ->  Index Scan using i1 on sales_jan_dateitemoutlet sales_dateitemoutlet  (cost=0.00..5.98 rows=1 width=0)
>                Index Cond: ((datekey >= 20050101) AND (datekey <= 20050101))
> 
> for each child table for which the statistics indicate that no rows are
> likely to be selected.  This will fall through quite quickly in
> practice, meaning that the "huge performance gain" from not doing it at
> all is a bit oversold.  (Note that it's already true that each child
> table is planned separately, so the plan for the partition that *is*
> targeted by the query may be different.)
> 
> AFAICS, CE as proposed is only worth bothering with if the partitioning
> key is something you would not want to create indexes on; which does not
> strike me as a major use-case.

Yes you can do that now, at cost, if you have range partitioning and you
want indexes on every table on the partitioning key. That is a major use
case but there are important variations on that theme. (I actually want
to build upon that use case also, but more on that later).

It's very common to scan whole ranges of dates on a large table, so in
those cases you are really just maintaining the indexes for partitioning
purposes. On older data it may be desirable not to have lots of indexes,
or at least use their resources on the indexes they really do want.

Also, if you have a List partitioned table where all rows in that table
have a single value, then you maintain an index for no reason other than
partitioning. Thats an expensive waste. 

Simply put, adding a constraint is faster and cheaper than adding an
pointless index. CE gives people that option.

Having constraints also allows (eventually) for us to have mutually
exclusive constraints. That then allows a more optimal scan of
constraints as well as some more advanced possibilities. How would we
implement mutual exclusivity? Use the proving logic for CE, of course.

In my experience, there is benefit from combined Range and List
partitioning and that is a frequent design choice. That makes it harder
to select which indexes to have: a single two-key index or 2 one-key
indexes.

A later objective is to have read-only tables on various kinds of media
to allow occasional access. Partition-wise joins are also an eventual
goal.

The total feature set of partitioning is fairly large. The technique
you've described is possible on most DBMS, yet almost all now or will
shortly support Partitioning also. Many things are possible once the
right declarative structures are created for the optimizer.

CE is the basis on which to hang other forthcoming features; I feel
happy that I got CE done in time for the 8.1 freeze. More things are to
come. The emperor's clothes are all hand-made, one stitch at a time.

> It'd be more attractive if the methodology allowed an inheritance or
> union-all collection to be reduced to one single table (ie, CE gets rid
> of all but one collection member) and then that could be planned as if
> it were a primitive table entry (ie, no Append plan node).  This doesn't
> help much for simple queries on the fact table but it is interesting for
> join cases, because if the Append is in the way there's no way to handle
> inner indexscan join plans.

That's been discussed on -hackers recently.

I have that planned and am half-way through implementing it, but I
wanted to get CE accepted first.

> Unfortunately the patch as proposed is a long way from being able to do
> that

Well, I think its *very* close.

> and given the current semantics of inherited constraints it's not
> even remotely feasible, since as you note we can't put a constraint on
> just the parent table.  

That's the bit that was discussed on -hackers. I suggested the use of
the keyword ABSTRACT to denote a table that could be sub-classed but
that would not allow any rows to be inserted into it. Such a table could
be immediately excluded from any query, then we can do get rid of the
Append...

> We could maybe do it for UNION ALL views, but
> the patch doesn't handle that case.

Yet. I've discussed a simple implementation for that on -hackers based
on feedback from the initial patch.

Best Regards, Simon Riggs

Re: A Guide to Constraint Exclusion (Partitioning)

From

Greg Stark

Date:

23 July 2005, 13:38:18

Simon Riggs <simon@2ndquadrant.com> writes:

> It's very common to scan whole ranges of dates on a large table, so in
> those cases you are really just maintaining the indexes for partitioning
> purposes. On older data it may be desirable not to have lots of indexes,
> or at least use their resources on the indexes they really do want.
> 
> Also, if you have a List partitioned table where all rows in that table
> have a single value, then you maintain an index for no reason other than
> partitioning. Thats an expensive waste. 
> 
> Simply put, adding a constraint is faster and cheaper than adding an
> pointless index. CE gives people that option.

Note also that the index is only useful if the index is *being used*. And
index scans are much slower than sequential scans.

So a query like "select * from invoices where fiscal_year = ?" is best
implemented by doing a sequential scan across invoices_fy05. This is *much*
faster than using indexes even if the indexes manage to speed up the empty
partitions simply because an index scan across the full partition would be so
much slower than a sequential scan.

-- 
greg

Re: A Guide to Constraint Exclusion (Partitioning)

From

Tom Lane

Date:

23 July 2005, 14:32:16

Greg Stark <gsstark@mit.edu> writes:
> Note also that the index is only useful if the index is *being used*. And
> index scans are much slower than sequential scans.

You miss my point entirely: an indexscan that hasn't got to retrieve any
rows (because it has a constraint that points off the end of the index
range) is extremely fast, and the planner will reliably detect that and
use the index scan over a seqscan (assuming it has statistics showing
the range of indexed values).  And this decision is made separately for
each child table, so the fact that a seqscan might be the best bet for
the target partition doesn't stop the planner from using the indexscan
in other partitions.

However, Simon made a fair argument that there are useful cases where
you don't need an index on a partitioning key, so my objection is
answered.
        regards, tom lane

Re: A Guide to Constraint Exclusion (Partitioning)

From

Greg Stark

Date:

23 July 2005, 15:27:43

Tom Lane <tgl@sss.pgh.pa.us> writes:

> And this decision is made separately for each child table, so the fact that
> a seqscan might be the best bet for the target partition doesn't stop the
> planner from using the indexscan in other partitions.

That was the detail I was missing. I'm surprised because I actually tested
this before I sent the message and saw a plan like this with a single
sequential scan node despite the three child tables:

staging=> explain select * from _test where a=1;                     QUERY PLAN                      
------------------------------------------------------Seq Scan on _test  (cost=0.00..22.50 rows=5 width=4)  Filter: (a
=1)
 
(2 rows)


[This is on 7.4, maybe the 8.0 plans are more explicit though I don't recall
any mention of changes in that area]

-- 
greg

Re: A Guide to Constraint Exclusion (Partitioning)

From

Tom Lane

Date:

23 July 2005, 18:13:24

Greg Stark <gsstark@mit.edu> writes:
> That was the detail I was missing. I'm surprised because I actually tested
> this before I sent the message and saw a plan like this with a single
> sequential scan node despite the three child tables:

> staging=> explain select * from _test where a=1;
>                       QUERY PLAN                      
> ------------------------------------------------------
>  Seq Scan on _test  (cost=0.00..22.50 rows=5 width=4)
>    Filter: (a = 1)
> (2 rows)

Uh, maybe you have sql_inheritance turned off?  Every version I can
remember would show you a pretty explicit Append plan for inheritance
scans...
        regards, tom lane

Re: A Guide to Constraint Exclusion (Partitioning)

From

Ron Mayer

Date:

24 July 2005, 10:41:28

Simon Riggs wrote:
> in those cases you are really just maintaining the indexes for partitioning
> purposes. On older data it may be desirable not to have lots of indexes,
> or at least use their resources on the indexes they really do want.
> 
> Also, if you have a List partitioned table where all rows in that table
> have a single value, then you maintain an index for no reason other than
> partitioning. Thats an expensive waste. 
> 
> Simply put, adding a constraint is faster and cheaper than adding an
> pointless index. CE gives people that option.

It seems with a partial index that whose partial index condition
specifies a range outside that of a partition would make the expense
much cheaper.

For example, if I created a couple partial indexes
   ON sales_2005(year) WHERE year<2005   ON sales_2005(year) WHERE year>2005

I would think it would be a very cheap index to maintain
(they'd be very small because they're empty, I'd think)
and give many of the same benefits for excluding tables
as a non-partial index on year would have given.
   Ron

I like the other features Simon mentioned, though, that sound like
they're based on these constraints.

Re: A Guide to Constraint Exclusion (Partitioning)

From

Simon Riggs

Date:

24 July 2005, 12:44:19

On Sun, 2005-07-24 at 06:44 -0700, Ron Mayer wrote:
> Simon Riggs wrote:
> > in those cases you are really just maintaining the indexes for partitioning
> > purposes. On older data it may be desirable not to have lots of indexes,
> > or at least use their resources on the indexes they really do want.
> > 
> > Also, if you have a List partitioned table where all rows in that table
> > have a single value, then you maintain an index for no reason other than
> > partitioning. Thats an expensive waste. 
> > 
> > Simply put, adding a constraint is faster and cheaper than adding an
> > pointless index. CE gives people that option.
> 
> 
> It seems with a partial index that whose partial index condition
> specifies a range outside that of a partition would make the expense
> much cheaper.
> 
> For example, if I created a couple partial indexes
> 
>     ON sales_2005(year) WHERE year<2005
>     ON sales_2005(year) WHERE year>2005
> 
> I would think it would be a very cheap index to maintain
> (they'd be very small because they're empty, I'd think)
> and give many of the same benefits for excluding tables
> as a non-partial index on year would have given.

The indexes would only be empty if you could guarantee that no rows were
inserted that actually did match the index criteria. How would you do
that? Check Constraints. So you'd have a more complex arrangement than
if you used CE.

Internally, CE uses very similar logic to that used by partial indexes
and they actually share sections of code.

The logic is reversed however, so you are looking for tables to provably
exclude and that gives different results from the positive-proof case.
That means that partial indexes wouldn't be able to be used in all cases
and you'd end up with sequential scans: exactly what we are trying to
avoid.

...Check Constraints are very, very cheap to maintain.

Best Regards, Simon Riggs

Re: A Guide to Constraint Exclusion (Partitioning)

From

Greg Stark

Date:

24 July 2005, 16:00:52

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Uh, maybe you have sql_inheritance turned off?  Every version I can
> remember would show you a pretty explicit Append plan for inheritance
> scans...

staging=> show sql_inheritance;sql_inheritance 
-----------------on
(1 row)

Maybe I'm doing something more obvious wrong?
 db=> create table _test (a integer); CREATE TABLE
 db=> create table _test2 (like _test); CREATE TABLE
 db=> create table _test3 (like _test); CREATE TABLE
 db=> create index _i on _test2 (a); CREATE INDEX
 db=> explain select * from _test where a=1;                       QUERY PLAN
------------------------------------------------------ Seq Scan on _test  (cost=0.00..22.50 rows=5 width=4)    Filter:
(a= 1) (2 rows)
 

-- 
greg

Re: A Guide to Constraint Exclusion (Partitioning)

From

Simon Riggs

Date:

24 July 2005, 16:55:54

On Sun, 2005-07-24 at 15:00 -0400, Greg Stark wrote:

> Maybe I'm doing something more obvious wrong?
> 
>   db=> create table _test (a integer);
>   CREATE TABLE
> 
>   db=> create table _test2 (like _test);
>   CREATE TABLE

Yes, unfortunately. You need the phrase "INHERITS (_test)" after the
bracket and before the ; for any tables you want to be children of
_test, such as _test2. The syntax "like..." creates a similar table, but
with no linkage between parent and children.

Best Regards, Simon Riggs