Thread: Partition Constraint Exclusion Limits

Partition Constraint Exclusion Limits

From

GMail

Date:

27 October 2015, 21:29:45

I have partitioned a large table in my PG database (6.7 billion rows!) by a date column and in general constraint
exclusionworks well but only in relatively simple case when the partition key is specified exactly as created in the
CHECKconstraint.  I'm curious if there is a way to get it to work a little more generally though. 

For example my CHECK constraint (see code below) specifying a hard-coded field value works well (#1 and #2).
Specifyinga function that returns a value even though it is the appropriate type scans all of the partitions (#3)
unfortunately. Likewise any join, CTE, or sub-query expression, even for a single row that returns the correct type
alsoresults in a scan of all of the partitions.   

I was curious if there was a way specifically to get #3 to work as the WHERE predicate in this case is stored as an
integerbut the table itself is partitioned by the appropriate date type.  I believe I could work around this issue with
dynamicsql in a function but there are lots of cases of this type of simple conversion and I wanted to avoid the
maintenanceof creating a function per query. 

It's also slightly surprising that queries that join with the appropriate type (#4 & #5) also cause a full partition
scan. Is there a work-around to get constraint_exclusion to work in this case? 

</snip>
-- constraint exclusion tests
-- generate some data
create schema if not exists ptest;
set search_path=ptest;
drop table if exists ptest.tbl cascade;
create table if not exists tbl as select * from (
with a as (
    select
        generate_series('2014-01-01'::date, now(), '1 day'::interval)::date dt
),
b as (
    select
        generate_series(1, 1000) i
)
select
    a.dt,
    b.i,
    md5((random()*4+5)::text) str
from
    a cross join b
) c;

-- create child partitions
create table ptest.tbl_p2014(check (dt >= '2014-01-01'::date and dt < '2015-01-01'::date)) inherits (ptest.tbl);
create table ptest.tbl_p2015(check (dt >= '2015-01-01'::date and dt < '2016-01-01'::date)) inherits (ptest.tbl);

-- populate child partitions
with pd as ( delete from only ptest.tbl where dt >= '2014-01-01'::date and dt < '2015-01-01'::date returning *)
insert into ptest.tbl_p2014 select * from pd;
with pd as ( delete from only ptest.tbl where dt >= '2015-01-01'::date and dt < '2016-01-01'::date returning *)
insert into ptest.tbl_p2015 select * from pd;

-- clean parent of any data
truncate table only ptest.tbl;

-- create dt field indexes
create index i_tbl_dt on ptest.tbl(dt);
create index i_tbl_dt_p2014 on ptest.tbl_p2014(dt);
create index i_tble_dt_p2015 on ptest.tbl_p2015(dt);

-- vacuum
vacuum analyze verbose ptest.tbl;

-- verify parent is empty and partitions have some data (estimated)
select relname, n_live_tup from pg_stat_user_tables where relname like 'tbl%' and schemaname = 'ptest' order by
relname;

-- check that partitions show in parent
\d+ ptest.tbl

-- force constraint_exclusion to partition
set constraint_exclusion = partition;

-- #1: works
explain analyze select count(1) from ptest.tbl where dt = '2014-06-01'::date;

-- #2: works
explain analyze select count(1) from ptest.tbl where dt = DATE '2014-06-01';

-- #3: full scan (no constraint exclusion)
explain analyze select count(1) from ptest.tbl where dt = to_date(201406::text||01::text, 'YYYYMMDD');

-- #4: full scan (no constraint exclusion)
explain analyze select count(1) from ptest.tbl where dt = (select '2014-06-01'::date);

-- #5: full scan (no constraint exclusion)
explain analyze with foo as (select '2014-06-01'::date dt)
select count(1) from ptest.tbl inner join foo on (ptest.tbl.dt = foo.dt);

</snip>

Re: Partition Constraint Exclusion Limits

From

"David G. Johnston"

Date:

27 October 2015, 22:03:10

On Tue, Oct 27, 2015 at 2:29 PM, GMail <mfwilson@gmail.com> wrote:

I have partitioned a large table in my PG database (6.7 billion rows!) by a date column and in general constraint exclusion works well but only in relatively simple case when the partition key is specified exactly as created in the CHECK constraint. I'm curious if there is a way to get it to work a little more generally though.

For example my CHECK constraint (see code below) specifying a hard-coded field value works well (#1 and #2). Specifying a function that returns a value even though it is the appropriate type scans all of the partitions (#3) unfortunately. Likewise any join, CTE, or sub-query expression, even for a single row that returns the correct type also results in a scan of all of the partitions.

I was curious if there was a way specifically to get #3 to work as the WHERE predicate in this case is stored as an integer but the table itself is partitioned by the appropriate date type. I believe I could work around this issue with dynamic sql in a function but there are lots of cases of this type of simple conversion and I wanted to avoid the maintenance of creating a function per query.

Short answer, no.

The planner has the responsibility for performing constraint exclusion and it only has access to constants during its evaluation. It has no clue what kind of transformations a function might do. Various other optimizations are indeed possible but are not presently performed.

So, #3 (

to_date(201406::text||01::text, 'YYYYMMDD');

)

is down-right impossible given the present architecture

; and likely any future architecture.

With #4 (

explain analyze select count(1) from ptest.tbl where dt = (select '2014-06-01'::date);

)

in theory the re-write module could recognize and re-write this remove the sub-select.

But likely real-life is not so simple otherwise the query writer likely would have simply done is directly themself.

In a partitioning scheme the partitioning data has to be injected into the query explicitly so that it is already in place before the planner receives the query. Anything within the query requiring "execution" is handled by the executor and at that point the chance to exclude partitions has come and gone.

David J.

Re: Partition Constraint Exclusion Limits

From

Vitalii Tymchyshyn

Date:

27 October 2015, 23:33:30

BTW: May be it could be feasible in future to perform partition exclusion during the execution? This would be very neat feature.

Regards, Vitalii Tymchyshyn

Вт, 27 жовт. 2015 15:03 David G. Johnston <david.g.johnston@gmail.com> пише:

On Tue, Oct 27, 2015 at 2:29 PM, GMail <mfwilson@gmail.com> wrote:
I have partitioned a large table in my PG database (6.7 billion rows!) by a date column and in general constraint exclusion works well but only in relatively simple case when the partition key is specified exactly as created in the CHECK constraint. I'm curious if there is a way to get it to work a little more generally though.

For example my CHECK constraint (see code below) specifying a hard-coded field value works well (#1 and #2). Specifying a function that returns a value even though it is the appropriate type scans all of the partitions (#3) unfortunately. Likewise any join, CTE, or sub-query expression, even for a single row that returns the correct type also results in a scan of all of the partitions.

I was curious if there was a way specifically to get #3 to work as the WHERE predicate in this case is stored as an integer but the table itself is partitioned by the appropriate date type. I believe I could work around this issue with dynamic sql in a function but there are lots of cases of this type of simple conversion and I wanted to avoid the maintenance of creating a function per query.

Short answer, no.

The planner has the responsibility for performing constraint exclusion and it only has access to constants during its evaluation. It has no clue what kind of transformations a function might do. Various other optimizations are indeed possible but are not presently performed.

So, #3 (
to_date(201406::text||01::text, 'YYYYMMDD');
)
is down-right impossible given the present architecture
; and likely any future architecture.

With #4 (
explain analyze select count(1) from ptest.tbl where dt = (select '2014-06-01'::date);
)
in theory the re-write module could recognize and re-write this remove the sub-select.
But likely real-life is not so simple otherwise the query writer likely would have simply done is directly themself.

In a partitioning scheme the partitioning data has to be injected into the query explicitly so that it is already in place before the planner receives the query. Anything within the query requiring "execution" is handled by the executor and at that point the chance to exclude partitions has come and gone.

David J.

Re: Partition Constraint Exclusion Limits

From

Jim Nasby

Date:

28 October 2015, 05:59:33

On 10/27/15 3:33 PM, Vitalii Tymchyshyn wrote:
> BTW: May be it could be feasible in future to perform partition
> exclusion during the execution? This would be very neat feature.

True exclusion? probably not. The problem is you can't completely
exclude something based on any value that could change during execution.

There has been some work done on declarative partition specification,
where a given value would be fit to the exact partition it belong in.
IIRC that's currently stalled though.

One thing you could try would be to create an index on each partition
that would always be empty. IE, if you have a June 2015 partition, you
could:

CREATE INDEX ... ON( date_field ) WHERE date_field < '2015-6-1'::date OR
date_field >= '2015-7-1'::date;

Because the WHERE clause will never be true, that index will always be
empty, which will make probing it very fast. I suspect that might be
faster than probing a regular index on the date field, but you should
test it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com