Thread: transitive pruning optimization on the right side of a join for partition tables

transitive pruning optimization on the right side of a join for partition tables

From

"Waldo, Ethan"

Date:

30 September 2012, 09:31:59

I have a situation where I cannot explicitly control the queries generated from our BI and I would like to use table partitioning. Unfortunately the queries don't appear to be taking advantage of the table partitions because the key used to limit the query results is the joined foreign key rather than the primary key on the fact table where the check constraint lives.

For example, here you can see the constraint on one of the child tables: (queries and results have been slightly altered for better readability)
     Table "public.myfact_y2004w51"
id                                | bigint             | not null
recorded_on_id          | integer           | not null
Indexes:
    "myfact_y2004w51_pkey" PRIMARY KEY, btree (id)
    "myfact_y2004w51_recorded_on_id" btree (recorded_on_id)
Check constraints:
    "myfact_y2004w51_recorded_on_id_check" CHECK (recorded_on_id >= 1812 AND recorded_on_id <= 1818)
Inherits: myfact

This query does a sequence scan and append across all the partition tables:
select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id" and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639', '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');

Whereas this query correctly uses just the partition tables whose check constraints specify id ranges that match the ids in the IN list: (notice the subtle difference is the "dates"."recorded_on_id" IN vs. "myfact"."recorded_on_id" IN):
select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id" and "myfact"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639', '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');

Once again I reiterate that I don't have control over the query construction and I am currently running postgresql 9.1.5. My question is, does postgresql support transitive pruning optimization on the right side of a join for partition tables? If so, how do I get that to work? If not, are there plans for this and when should a release with this feature be expected?

Re: transitive pruning optimization on the right side of a join for partition tables

From

Alban Hertroys

Date:

30 September 2012, 12:51:32

On 30 Sep 2012, at 8:36, Waldo, Ethan wrote:

> I have a situation where I cannot explicitly control the queries generated from our BI and I would like to use table
partitioning. Unfortunately the queries don't appear to be taking advantage of the table partitions because the key
usedto limit the query results is the joined foreign key rather than the primary key on the fact table where the check
constraintlives. 

Don't be too hasty making assumptions there! ;)

> This query does a sequence scan and append across all the partition tables:
> select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" =
"dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623',
'4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638',
'4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); 
>
> Whereas this query correctly uses just the partition tables whose check constraints specify id ranges that match the
idsin the IN list: (notice the subtle difference is the "dates"."recorded_on_id" IN vs. "myfact"."recorded_on_id" IN): 
> select "dates"."date_description" FROM "myfact" as
> "myfact", "dates" as "dates" where
> "myfact"."recorded_on_id" = "dates"."recorded_on_id" and
> "myfact"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621',
> '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630',
> '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639',
> '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');

What does EXPLAIN ANALYSE for these queries show? (might take a while, it performs the actual query)
My guess is that there will be a large difference in selectivity between both tables for those ID's.

Those id's, seeing that you're partitioning on them and they're in a year/week table, do those numbers have some
meaning?Or is it perhaps just a daily increment that happens to have some sort of correlation to the date? Without more
information,it seems a peculiar column to use for partitioning. 

It's possible that the issue here is just related to planner statistics, but it's also possible that it's necessary to
changeyour partitioning to aid your BI tools. 
Another possibility is submitting a case with the people behind that BI software.

Alban Hertroys

--
The scale of a problem often equals the size of an ego.

Re: transitive pruning optimization on the right side of a join for partition tables

From

"Waldo, Ethan"

Date:

30 September 2012, 13:12:18

I don't need help troubleshooting the problem.  I know exactly what the problem is.  Either you know if postgresql
supportstransitive pruning optimization on the right side of a join via a check constraint defined on a table partition
oryou don't.  If you do know, I'd appreciate knowledgeable input in that regard.  Thank you for your response. 

----- Original Message -----
From: "Alban Hertroys" <haramrae@gmail.com>
Sent: Sun, 9/30/2012 4:51am
To: "Waldo, Ethan" <ewaldo@healthetechs.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] transitive pruning optimization on the right side of a join for partition tables

On 30 Sep 2012, at 8:36, Waldo, Ethan wrote:

> I have a situation where I cannot explicitly control the queries generated from our BI and I would like to use table
partitioning. Unfortunately the queries don't appear to be taking advantage of the table partitions because the key
usedto limit the query results is the joined foreign key rather than the primary key on the fact table where the check
constraintlives.  

Don't be too hasty making assumptions there! ;)

> This query does a sequence scan and append across all the partition tables:
> select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" =
"dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623',
'4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638',
'4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');  
>
> Whereas this query correctly uses just the partition tables whose check constraints specify id ranges that match the
idsin the IN list: (notice the subtle difference is the "dates"."recorded_on_id" IN vs. "myfact"."recorded_on_id" IN):  
> select "dates"."date_description" FROM "myfact" as
> "myfact", "dates" as "dates" where
> "myfact"."recorded_on_id" = "dates"."recorded_on_id" and
> "myfact"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621',
> '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630',
> '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639',
> '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');

What does EXPLAIN ANALYSE for these queries show? (might take a while, it performs the actual query)
My guess is that there will be a large difference in selectivity between both tables for those ID's.

Those id's, seeing that you're partitioning on them and they're in a year/week table, do those numbers have some
meaning?Or is it perhaps just a daily increment that happens to have some sort of correlation to the date? Without more
information,it seems a peculiar column to use for partitioning.  

It's possible that the issue here is just related to planner statistics, but it's also possible that it's necessary to
changeyour partitioning to aid your BI tools.  
Another possibility is submitting a case with the people behind that BI software.

Alban Hertroys

--
The scale of a problem often equals the size of an ego.

Re: transitive pruning optimization on the right side of a join for partition tables

From

Ondrej Ivanič

Date:

30 September 2012, 13:31:55

Hi,

On 30 September 2012 16:36, Waldo, Ethan <ewaldo@healthetechs.com> wrote:
> Once again I reiterate that I don't have control over the query construction
> and I am currently running postgresql 9.1.5.  My question is, does
> postgresql support transitive pruning optimization on the right side of a
> join for partition tables?  If so, how do I get that to work?  If not, are
> there plans for this and when should a release with this feature be
> expected?

No, postgres is not smart enough. You need to use same condition
("dates"."recorded_on_id" = ...) for myfact table
("myfact"."recorded_on_id" = ) but you do not have control over the
query construction...


--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: transitive pruning optimization on the right side of a join for partition tables

From

Ondrej Ivanič

Date:

30 September 2012, 13:40:44

Hi,

On 30 September 2012 16:36, Waldo, Ethan <ewaldo@healthetechs.com> wrote:
> My question is, does
> postgresql support transitive pruning optimization on the right side of a
> join for partition tables?  If so, how do I get that to work?  If not, are
> there plans for this and when should a release with this feature be
> expected?

Few guys implemented this using Postgres (8.3) and published their results:

Join Optimization Techniques for Partitioned Tables

ABSTRACT
Table partitioning splits a table into smaller parts that can be
accessed, stored, and maintained independent of one an- other. The
main use of partitioning used to be in reduc- ing the time to access
large base tables in parallel systems. Partitioning has evolved into a
powerful mechanism to im- prove the overall manageability of both
centralized and par- allel database systems. Partitioning simplifies
administra- tive tasks like data loading, removal, backup, statistics
main- tenance, and storage provisioning. More importantly, SQL
extensions and MapReduce frameworks now enable applica- tions and user
queries to specify how derived tables should be partitioned. However,
query optimization techniques have not kept pace with the rapid
advances in usage and user con- trol of table partitioning. We address
this gap by developing new techniques to generate efficient plans for
SQL queries involving multiway joins over partitioned tables. Our
tech- niques are designed for easy incorporation into bottom-up query
optimizers in centralized and parallel database sys- tems. We have
prototyped these techniques in PostgreSQL and in a parallel database
system composed of PostgreSQL nodes managed by Hadoop. An extensive
evaluation shows that our partition-aware optimization techniques,
with low overhead, generate plans that are often an order of magni-
tude better than plans produced by current optimizers.

8. CONCLUSION
Query optimization technology has not kept pace with the growing usage
and user control over table partitioning. We addressed this gap by
developing new partition-aware optimization techniques to generate
efficient plans for SQL queries. We made the following contributions:
• Our new techniques are designed for easy incorporation into
bottom-up query optimizers for both centralized and parallel systems.
• We have prototyped these techniques in PostgreSQL and in a parallel
shared-nothing database system composed of PostgreSQL nodes managed by
Hadoop.
• An extensive evaluation showed that our optimizer, with low
optimization-time overhead, generates plans that are often an order of
magnitude better than plans produced by current optimizers.

www.cs.duke.edu/~shivnath/tmp/paqo_draft.pdf

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: transitive pruning optimization on the right side of a join for partition tables

From

"Waldo, Ethan"

Date:

30 September 2012, 13:55:17

Yeah, I actually saw that paper but couldn't find a date on it.  Currently their techniques are well outside of the
scopeof my current problem particularly in consideration that I could switch to MySQL which does support the right side
joinpruning.  I figured if MySQL can do it, there might be a good chance Postgres can too or will soon. 

----- Original Message -----
From: "Ondrej Ivanič" <ondrej.ivanic@gmail.com>
Sent: Sun, 9/30/2012 5:40am
To: "Waldo, Ethan" <ewaldo@healthetechs.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] transitive pruning optimization on the right side of a join for partition tables

Hi,

On 30 September 2012 16:36, Waldo, Ethan <ewaldo@healthetechs.com> wrote:
> My question is, does
> postgresql support transitive pruning optimization on the right side of a
> join for partition tables?  If so, how do I get that to work?  If not, are
> there plans for this and when should a release with this feature be
> expected?

Few guys implemented this using Postgres (8.3) and published their results:

Join Optimization Techniques for Partitioned Tables

ABSTRACT
Table partitioning splits a table into smaller parts that can be
accessed, stored, and maintained independent of one an- other. The
main use of partitioning used to be in reduc- ing the time to access
large base tables in parallel systems. Partitioning has evolved into a
powerful mechanism to im- prove the overall manageability of both
centralized and par- allel database systems. Partitioning simplifies
administra- tive tasks like data loading, removal, backup, statistics
main- tenance, and storage provisioning. More importantly, SQL
extensions and MapReduce frameworks now enable applica- tions and user
queries to specify how derived tables should be partitioned. However,
query optimization techniques have not kept pace with the rapid
advances in usage and user con- trol of table partitioning. We address
this gap by developing new techniques to generate efficient plans for
SQL queries involving multiway joins over partitioned tables. Our
tech- niques are designed for easy incorporation into bottom-up query
optimizers in centralized and parallel database sys- tems. We have
prototyped these techniques in PostgreSQL and in a parallel database
system composed of PostgreSQL nodes managed by Hadoop. An extensive
evaluation shows that our partition-aware optimization techniques,
with low overhead, generate plans that are often an order of magni-
tude better than plans produced by current optimizers.

8. CONCLUSION
Query optimization technology has not kept pace with the growing usage
and user control over table partitioning. We addressed this gap by
developing new partition-aware optimization techniques to generate
efficient plans for SQL queries. We made the following contributions:
• Our new techniques are designed for easy incorporation into
bottom-up query optimizers for both centralized and parallel systems.
• We have prototyped these techniques in PostgreSQL and in a parallel
shared-nothing database system composed of PostgreSQL nodes managed by
Hadoop.
• An extensive evaluation showed that our optimizer, with low
optimization-time overhead, generates plans that are often an order of
magnitude better than plans produced by current optimizers.
 www.cs.duke.edu/~shivnath/tmp/paqo_draft.pdf

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: transitive pruning optimization on the right side of a join for partition tables

From

Ondrej Ivanič

Date:

30 September 2012, 14:24:11

Hi,

On 30 September 2012 21:00, Waldo, Ethan <ewaldo@healthetechs.com> wrote:
> Yeah, I actually saw that paper but couldn't find a date on it.  Currently their techniques are well outside
> of the scope of my current problem particularly in consideration that I could switch to MySQL which does support
> the right side join pruning.  I figured if MySQL can do it, there might be a good chance Postgres can too or will
soon.

Google says 2010 (Google Scholar). I was in the same position 2 years
ago but I was able to change / optimise our BI solution. Good to know
that MySQL can do this.

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: transitive pruning optimization on the right side of a join for partition tables

From

Tom Lane

Date:

30 September 2012, 18:14:36

"Waldo, Ethan" <ewaldo@healthetechs.com> writes:
> This query does a sequence scan and append across all the partition tables:
> select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" =
"dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623',
'4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638',
'4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); 

When I try that in 9.1, I get a plan with inner indexscans for each
child table; which, while not exactly what you're asking for, should
perform well enough when the fact table is large enough that
partitioning is actually a useful activity.

I suspect you're committing one of the ten deadly sins of Postgres
optimization, which is to assume that the plan you get on a toy test
case is the same plan you'd get for monster tables.  Planning choices
are nonlinear.

            regards, tom lane

Re: transitive pruning optimization on the right side of a join for partition tables

From

Ondrej Ivanič

Date:

01 October 2012, 04:35:55

Hi,

On 1 October 2012 01:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Waldo, Ethan" <ewaldo@healthetechs.com> writes:
>> This query does a sequence scan and append across all the partition tables:
>> select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" =
"dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623',
'4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638',
'4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); 
>
> When I try that in 9.1, I get a plan with inner indexscans for each
> child table; which, while not exactly what you're asking for, should
> perform well enough when the fact table is large enough that
> partitioning is actually a useful activity.

I do not have 9.1 handy but this is from 9.0.4:

db=# EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact,
dates where myfact.recorded_on_id = dates.recorded_on_id and
dates.recorded_on_id in (1813, 1814);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Hash Join  (cost=33.46..86124.15 rows=2858831 width=16)
   Hash Cond: (public.myfact.recorded_on_id = public.dates.recorded_on_id)
   ->  Append  (cost=0.00..46245.73 rows=3001773 width=8)
         ->  Seq Scan on myfact  (cost=0.00..27.70 rows=1770 width=8)
         ->  Seq Scan on myfact_y2004w51 myfact  (cost=0.00..15406.01
rows=1000001 width=8)
         ->  Seq Scan on myfact_y2004w52 myfact  (cost=0.00..15406.01
rows=1000001 width=8)
         ->  Seq Scan on myfact_y2004w53 myfact  (cost=0.00..15406.01
rows=1000001 width=8)
   ->  Hash  (cost=33.21..33.21 rows=20 width=16)
         ->  Append  (cost=0.00..33.21 rows=20 width=16)
               ->  Seq Scan on dates  (cost=0.00..32.12 rows=18 width=16)
                     Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
               ->  Seq Scan on dates_y2004w51 dates  (cost=0.00..1.09
rows=2 width=16)
                     Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
(13 rows)

Postgres can't infer that myfact.recorded_on_id is in (1813, 1814)
from the join condition (myfact.recorded_on_id = dates.recorded_on_id)
hence all partitons are included (myfact_y2004w51, myfact_y2004w53,
myfact_y2004w53). Adding "myfact.recorded_on_id in (1813, 1814)"
creates much better plan:

db=# EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact,
dates where myfact.recorded_on_id = dates.recorded_on_id and
dates.recorded_on_id in (1813, 1814) and myfact.recorded_on_id in
(1813, 1814);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=33.46..18878.72 rows=296673 width=16)
   Hash Cond: (public.myfact.recorded_on_id = public.dates.recorded_on_id)
   ->  Append  (cost=0.00..14710.38 rows=311507 width=8)
         ->  Seq Scan on myfact  (cost=0.00..32.12 rows=18 width=8)
               Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
         ->  Bitmap Heap Scan on myfact_y2004w51 myfact
(cost=5378.64..14678.25 rows=311489 width=8)
               Recheck Cond: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
               ->  Bitmap Index Scan on myfact_y2004w51_recorded_on_id
 (cost=0.00..5300.77 rows=311489 width=0)
                     Index Cond: (recorded_on_id = ANY
('{1813,1814}'::bigint[]))
   ->  Hash  (cost=33.21..33.21 rows=20 width=16)
         ->  Append  (cost=0.00..33.21 rows=20 width=16)
               ->  Seq Scan on dates  (cost=0.00..32.12 rows=18 width=16)
                     Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
               ->  Seq Scan on dates_y2004w51 dates  (cost=0.00..1.09
rows=2 width=16)
                     Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
(15 rows)


--------------
create table myfact (
    id bigint not null,
    recorded_on_id bigint not null
);

create table myfact_y2004w51 (
    check (recorded_on_id >= 1812 and recorded_on_id <= 1818),
    primary key (id)
) inherits (myfact);
create table myfact_y2004w52 (
    check (recorded_on_id >= 1819 and recorded_on_id <= 1825),
    primary key (id)
) inherits (myfact);
create table myfact_y2004w53 (
    check (recorded_on_id >= 1826 and recorded_on_id <= 1832),
    primary key (id)
) inherits (myfact);

create table dates (
    datetime timestamp without time zone not null,
    recorded_on_id bigint not null
);

create table dates_y2004w51 (
    check (recorded_on_id >= 1812 and recorded_on_id <= 1818),
    primary key (datetime)
) inherits (dates);
create table dates_y2004w52 (
    check (recorded_on_id >= 1819 and recorded_on_id <= 1825),
    primary key (datetime)
) inherits (dates);
create table dates_y2004w53 (
    check (recorded_on_id >= 1826 and recorded_on_id <= 1832),
    primary key (datetime)
) inherits (dates);

insert into myfact_y2004w51 select s.i, (random() * 6 + 1812)::integer
from generate_series(0, 1000000, 1) as s(i);
insert into myfact_y2004w52 select s.i, (random() * 6 + 1819)::integer
from generate_series(0, 1000000, 1) as s(i);
insert into myfact_y2004w53 select s.i, (random() * 6 + 1826)::integer
from generate_series(0, 1000000, 1) as s(i);

insert into dates_y2004w51 select distinct on (recorded_on_id)
to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w51;
insert into dates_y2004w52 select distinct on (recorded_on_id)
to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w52;
insert into dates_y2004w53 select distinct on (recorded_on_id)
to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w53;

create index myfact_y2004w51_recorded_on_id on myfact_y2004w51(recorded_on_id);
create index myfact_y2004w52_recorded_on_id on myfact_y2004w52(recorded_on_id);
create index myfact_y2004w53_recorded_on_id on myfact_y2004w53(recorded_on_id);

create index dates_y2004w51_recorded_on_id on dates_y2004w51(recorded_on_id);
create index dates_y2004w52_recorded_on_id on dates_y2004w52(recorded_on_id);
create index dates_y2004w53_recorded_on_id on dates_y2004w53(recorded_on_id);

ANALYZE;ANALYZE;ANALYZE;ANALYZE;ANALYZE;

EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact,
dates where myfact.recorded_on_id = dates.recorded_on_id and
dates.recorded_on_id in (1813, 1814);
EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact,
dates where myfact.recorded_on_id = dates.recorded_on_id and
dates.recorded_on_id in (1813, 1814) and myfact.recorded_on_id in
(1813, 1814);
----------------------

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: transitive pruning optimization on the right side of a join for partition tables

From

"Waldo, Ethan"

Date:

01 October 2012, 06:25:24

Thank you very much for confirming my query results and taking the time to iterate out a test environment and document
insuch detail.  It would be really awesome if the Postgres query plans could infer check constraints from the where
clause'sforeign key usage in addition to the native key usage.  Hopefully the postgres team will take an interest in
thiscapability. 

----- Original Message -----
From: "Ondrej Ivanič" <ondrej.ivanic@gmail.com>
Sent: Sun, 9/30/2012 8:35pm
To: "Tom Lane" <tgl@sss.pgh.pa.us>
Cc: "Waldo, Ethan" <ewaldo@healthetechs.com> ; pgsql-general@postgresql.org
Subject: Re: [GENERAL] transitive pruning optimization on the right side of a join for partition tables

Hi,

On 1 October 2012 01:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Waldo, Ethan" <ewaldo@healthetechs.com> writes:
>> This query does a sequence scan and append across all the partition tables:
>> select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" =
"dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623',
'4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638',
'4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); 
>
> When I try that in 9.1, I get a plan with inner indexscans for each
> child table; which, while not exactly what you're asking for, should
> perform well enough when the fact table is large enough that
> partitioning is actually a useful activity.

I do not have 9.1 handy but this is from 9.0.4:

db=# EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact,
dates where myfact.recorded_on_id = dates.recorded_on_id and
dates.recorded_on_id in (1813, 1814);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Hash Join  (cost=33.46..86124.15 rows=2858831 width=16)
   Hash Cond: (public.myfact.recorded_on_id = public.dates.recorded_on_id)
   ->  Append  (cost=0.00..46245.73 rows=3001773 width=8)
         ->  Seq Scan on myfact  (cost=0.00..27.70 rows=1770 width=8)
         ->  Seq Scan on myfact_y2004w51 myfact  (cost=0.00..15406.01
rows=1000001 width=8)
         ->  Seq Scan on myfact_y2004w52 myfact  (cost=0.00..15406.01
rows=1000001 width=8)
         ->  Seq Scan on myfact_y2004w53 myfact  (cost=0.00..15406.01
rows=1000001 width=8)
   ->  Hash  (cost=33.21..33.21 rows=20 width=16)
         ->  Append  (cost=0.00..33.21 rows=20 width=16)
               ->  Seq Scan on dates  (cost=0.00..32.12 rows=18 width=16)
                     Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
               ->  Seq Scan on dates_y2004w51 dates  (cost=0.00..1.09
rows=2 width=16)
                     Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
(13 rows)

Postgres can't infer that myfact.recorded_on_id is in (1813, 1814)
from the join condition (myfact.recorded_on_id = dates.recorded_on_id)
hence all partitons are included (myfact_y2004w51, myfact_y2004w53,
myfact_y2004w53). Adding "myfact.recorded_on_id in (1813, 1814)"
creates much better plan:

db=# EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact,
dates where myfact.recorded_on_id = dates.recorded_on_id and
dates.recorded_on_id in (1813, 1814) and myfact.recorded_on_id in
(1813, 1814);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=33.46..18878.72 rows=296673 width=16)
   Hash Cond: (public.myfact.recorded_on_id = public.dates.recorded_on_id)
   ->  Append  (cost=0.00..14710.38 rows=311507 width=8)
         ->  Seq Scan on myfact  (cost=0.00..32.12 rows=18 width=8)
               Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
         ->  Bitmap Heap Scan on myfact_y2004w51 myfact
(cost=5378.64..14678.25 rows=311489 width=8)
               Recheck Cond: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
               ->  Bitmap Index Scan on myfact_y2004w51_recorded_on_id
 (cost=0.00..5300.77 rows=311489 width=0)
                     Index Cond: (recorded_on_id = ANY
('{1813,1814}'::bigint[]))
   ->  Hash  (cost=33.21..33.21 rows=20 width=16)
         ->  Append  (cost=0.00..33.21 rows=20 width=16)
               ->  Seq Scan on dates  (cost=0.00..32.12 rows=18 width=16)
                     Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
               ->  Seq Scan on dates_y2004w51 dates  (cost=0.00..1.09
rows=2 width=16)
                     Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[]))
(15 rows)


--------------
create table myfact (
    id bigint not null,
    recorded_on_id bigint not null
);

create table myfact_y2004w51 (
    check (recorded_on_id >= 1812 and recorded_on_id <= 1818),
    primary key (id)
) inherits (myfact);
create table myfact_y2004w52 (
    check (recorded_on_id >= 1819 and recorded_on_id <= 1825),
    primary key (id)
) inherits (myfact);
create table myfact_y2004w53 (
    check (recorded_on_id >= 1826 and recorded_on_id <= 1832),
    primary key (id)
) inherits (myfact);

create table dates (
    datetime timestamp without time zone not null,
    recorded_on_id bigint not null
);

create table dates_y2004w51 (
    check (recorded_on_id >= 1812 and recorded_on_id <= 1818),
    primary key (datetime)
) inherits (dates);
create table dates_y2004w52 (
    check (recorded_on_id >= 1819 and recorded_on_id <= 1825),
    primary key (datetime)
) inherits (dates);
create table dates_y2004w53 (
    check (recorded_on_id >= 1826 and recorded_on_id <= 1832),
    primary key (datetime)
) inherits (dates);

insert into myfact_y2004w51 select s.i, (random() * 6 + 1812)::integer
from generate_series(0, 1000000, 1) as s(i);
insert into myfact_y2004w52 select s.i, (random() * 6 + 1819)::integer
from generate_series(0, 1000000, 1) as s(i);
insert into myfact_y2004w53 select s.i, (random() * 6 + 1826)::integer
from generate_series(0, 1000000, 1) as s(i);

insert into dates_y2004w51 select distinct on (recorded_on_id)
to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w51;
insert into dates_y2004w52 select distinct on (recorded_on_id)
to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w52;
insert into dates_y2004w53 select distinct on (recorded_on_id)
to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w53;

create index myfact_y2004w51_recorded_on_id on myfact_y2004w51(recorded_on_id);
create index myfact_y2004w52_recorded_on_id on myfact_y2004w52(recorded_on_id);
create index myfact_y2004w53_recorded_on_id on myfact_y2004w53(recorded_on_id);

create index dates_y2004w51_recorded_on_id on dates_y2004w51(recorded_on_id);
create index dates_y2004w52_recorded_on_id on dates_y2004w52(recorded_on_id);
create index dates_y2004w53_recorded_on_id on dates_y2004w53(recorded_on_id);

ANALYZE;ANALYZE;ANALYZE;ANALYZE;ANALYZE;

EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact,
dates where myfact.recorded_on_id = dates.recorded_on_id and
dates.recorded_on_id in (1813, 1814);
EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact,
dates where myfact.recorded_on_id = dates.recorded_on_id and
dates.recorded_on_id in (1813, 1814) and myfact.recorded_on_id in
(1813, 1814);
----------------------

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: transitive pruning optimization on the right side of a join for partition tables

From

madhukiranj

Date:

16 October 2012, 15:56:47

Hi,

I have tried more on this, if you just use an equality constraint on the
foreign key in the right side of the join, it works. Other constraints like
between, in, >,< do not work. Anyone else got to achieve this in postgresql?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/transitive-pruning-optimization-on-the-right-side-of-a-join-for-partition-tables-tp5726019p5728401.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.