Thread: transitive pruning optimization on the right side of a join for partition tables
transitive pruning optimization on the right side of a join for partition tables
From
"Waldo, Ethan"
Date:
I have a situation where I cannot explicitly control the queries generated from our BI and I would like to use table partitioning. Unfortunately the queries don't appear to be taking advantage of the table partitions because the key used to limit the query results is the joined foreign key rather than the primary key on the fact table where the check constraint lives.
For example, here you can see the constraint on one of the child tables: (queries and results have been slightly altered for better readability)
Table "public.myfact_y2004w51"
id | bigint | not null
recorded_on_id | integer | not null
Indexes:
"myfact_y2004w51_pkey" PRIMARY KEY, btree (id)
"myfact_y2004w51_recorded_on_id" btree (recorded_on_id)
Check constraints:
"myfact_y2004w51_recorded_on_id_check" CHECK (recorded_on_id >= 1812 AND recorded_on_id <= 1818)
Inherits: myfact
This query does a sequence scan and append across all the partition tables:
select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id" and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639', '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');
Whereas this query correctly uses just the partition tables whose check constraints specify id ranges that match the ids in the IN list: (notice the subtle difference is the "dates"."recorded_on_id" IN vs. "myfact"."recorded_on_id" IN):
select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id" and "myfact"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639', '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');
Once again I reiterate that I don't have control over the query construction and I am currently running postgresql 9.1.5. My question is, does postgresql support transitive pruning optimization on the right side of a join for partition tables? If so, how do I get that to work? If not, are there plans for this and when should a release with this feature be expected?
For example, here you can see the constraint on one of the child tables: (queries and results have been slightly altered for better readability)
Table "public.myfact_y2004w51"
id | bigint | not null
recorded_on_id | integer | not null
Indexes:
"myfact_y2004w51_pkey" PRIMARY KEY, btree (id)
"myfact_y2004w51_recorded_on_id" btree (recorded_on_id)
Check constraints:
"myfact_y2004w51_recorded_on_id_check" CHECK (recorded_on_id >= 1812 AND recorded_on_id <= 1818)
Inherits: myfact
This query does a sequence scan and append across all the partition tables:
select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id" and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639', '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');
Whereas this query correctly uses just the partition tables whose check constraints specify id ranges that match the ids in the IN list: (notice the subtle difference is the "dates"."recorded_on_id" IN vs. "myfact"."recorded_on_id" IN):
select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id" and "myfact"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639', '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647');
Once again I reiterate that I don't have control over the query construction and I am currently running postgresql 9.1.5. My question is, does postgresql support transitive pruning optimization on the right side of a join for partition tables? If so, how do I get that to work? If not, are there plans for this and when should a release with this feature be expected?
Re: transitive pruning optimization on the right side of a join for partition tables
From
Alban Hertroys
Date:
On 30 Sep 2012, at 8:36, Waldo, Ethan wrote: > I have a situation where I cannot explicitly control the queries generated from our BI and I would like to use table partitioning. Unfortunately the queries don't appear to be taking advantage of the table partitions because the key usedto limit the query results is the joined foreign key rather than the primary key on the fact table where the check constraintlives. Don't be too hasty making assumptions there! ;) > This query does a sequence scan and append across all the partition tables: > select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); > > Whereas this query correctly uses just the partition tables whose check constraints specify id ranges that match the idsin the IN list: (notice the subtle difference is the "dates"."recorded_on_id" IN vs. "myfact"."recorded_on_id" IN): > select "dates"."date_description" FROM "myfact" as > "myfact", "dates" as "dates" where > "myfact"."recorded_on_id" = "dates"."recorded_on_id" and > "myfact"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', > '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630', > '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639', > '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); What does EXPLAIN ANALYSE for these queries show? (might take a while, it performs the actual query) My guess is that there will be a large difference in selectivity between both tables for those ID's. Those id's, seeing that you're partitioning on them and they're in a year/week table, do those numbers have some meaning?Or is it perhaps just a daily increment that happens to have some sort of correlation to the date? Without more information,it seems a peculiar column to use for partitioning. It's possible that the issue here is just related to planner statistics, but it's also possible that it's necessary to changeyour partitioning to aid your BI tools. Another possibility is submitting a case with the people behind that BI software. Alban Hertroys -- The scale of a problem often equals the size of an ego.
Re: transitive pruning optimization on the right side of a join for partition tables
From
"Waldo, Ethan"
Date:
I don't need help troubleshooting the problem. I know exactly what the problem is. Either you know if postgresql supportstransitive pruning optimization on the right side of a join via a check constraint defined on a table partition oryou don't. If you do know, I'd appreciate knowledgeable input in that regard. Thank you for your response. ----- Original Message ----- From: "Alban Hertroys" <haramrae@gmail.com> Sent: Sun, 9/30/2012 4:51am To: "Waldo, Ethan" <ewaldo@healthetechs.com> Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] transitive pruning optimization on the right side of a join for partition tables On 30 Sep 2012, at 8:36, Waldo, Ethan wrote: > I have a situation where I cannot explicitly control the queries generated from our BI and I would like to use table partitioning. Unfortunately the queries don't appear to be taking advantage of the table partitions because the key usedto limit the query results is the joined foreign key rather than the primary key on the fact table where the check constraintlives. Don't be too hasty making assumptions there! ;) > This query does a sequence scan and append across all the partition tables: > select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); > > Whereas this query correctly uses just the partition tables whose check constraints specify id ranges that match the idsin the IN list: (notice the subtle difference is the "dates"."recorded_on_id" IN vs. "myfact"."recorded_on_id" IN): > select "dates"."date_description" FROM "myfact" as > "myfact", "dates" as "dates" where > "myfact"."recorded_on_id" = "dates"."recorded_on_id" and > "myfact"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', > '4622', '4623', '4624', '4625', '4626', '4627', '4628', '4629', '4630', > '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639', > '4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); What does EXPLAIN ANALYSE for these queries show? (might take a while, it performs the actual query) My guess is that there will be a large difference in selectivity between both tables for those ID's. Those id's, seeing that you're partitioning on them and they're in a year/week table, do those numbers have some meaning?Or is it perhaps just a daily increment that happens to have some sort of correlation to the date? Without more information,it seems a peculiar column to use for partitioning. It's possible that the issue here is just related to planner statistics, but it's also possible that it's necessary to changeyour partitioning to aid your BI tools. Another possibility is submitting a case with the people behind that BI software. Alban Hertroys -- The scale of a problem often equals the size of an ego.
Re: transitive pruning optimization on the right side of a join for partition tables
From
Ondrej Ivanič
Date:
Hi, On 30 September 2012 16:36, Waldo, Ethan <ewaldo@healthetechs.com> wrote: > Once again I reiterate that I don't have control over the query construction > and I am currently running postgresql 9.1.5. My question is, does > postgresql support transitive pruning optimization on the right side of a > join for partition tables? If so, how do I get that to work? If not, are > there plans for this and when should a release with this feature be > expected? No, postgres is not smart enough. You need to use same condition ("dates"."recorded_on_id" = ...) for myfact table ("myfact"."recorded_on_id" = ) but you do not have control over the query construction... -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
Re: transitive pruning optimization on the right side of a join for partition tables
From
Ondrej Ivanič
Date:
Hi, On 30 September 2012 16:36, Waldo, Ethan <ewaldo@healthetechs.com> wrote: > My question is, does > postgresql support transitive pruning optimization on the right side of a > join for partition tables? If so, how do I get that to work? If not, are > there plans for this and when should a release with this feature be > expected? Few guys implemented this using Postgres (8.3) and published their results: Join Optimization Techniques for Partitioned Tables ABSTRACT Table partitioning splits a table into smaller parts that can be accessed, stored, and maintained independent of one an- other. The main use of partitioning used to be in reduc- ing the time to access large base tables in parallel systems. Partitioning has evolved into a powerful mechanism to im- prove the overall manageability of both centralized and par- allel database systems. Partitioning simplifies administra- tive tasks like data loading, removal, backup, statistics main- tenance, and storage provisioning. More importantly, SQL extensions and MapReduce frameworks now enable applica- tions and user queries to specify how derived tables should be partitioned. However, query optimization techniques have not kept pace with the rapid advances in usage and user con- trol of table partitioning. We address this gap by developing new techniques to generate efficient plans for SQL queries involving multiway joins over partitioned tables. Our tech- niques are designed for easy incorporation into bottom-up query optimizers in centralized and parallel database sys- tems. We have prototyped these techniques in PostgreSQL and in a parallel database system composed of PostgreSQL nodes managed by Hadoop. An extensive evaluation shows that our partition-aware optimization techniques, with low overhead, generate plans that are often an order of magni- tude better than plans produced by current optimizers. 8. CONCLUSION Query optimization technology has not kept pace with the growing usage and user control over table partitioning. We addressed this gap by developing new partition-aware optimization techniques to generate efficient plans for SQL queries. We made the following contributions: • Our new techniques are designed for easy incorporation into bottom-up query optimizers for both centralized and parallel systems. • We have prototyped these techniques in PostgreSQL and in a parallel shared-nothing database system composed of PostgreSQL nodes managed by Hadoop. • An extensive evaluation showed that our optimizer, with low optimization-time overhead, generates plans that are often an order of magnitude better than plans produced by current optimizers. www.cs.duke.edu/~shivnath/tmp/paqo_draft.pdf -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
Re: transitive pruning optimization on the right side of a join for partition tables
From
"Waldo, Ethan"
Date:
Yeah, I actually saw that paper but couldn't find a date on it. Currently their techniques are well outside of the scopeof my current problem particularly in consideration that I could switch to MySQL which does support the right side joinpruning. I figured if MySQL can do it, there might be a good chance Postgres can too or will soon. ----- Original Message ----- From: "Ondrej Ivanič" <ondrej.ivanic@gmail.com> Sent: Sun, 9/30/2012 5:40am To: "Waldo, Ethan" <ewaldo@healthetechs.com> Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] transitive pruning optimization on the right side of a join for partition tables Hi, On 30 September 2012 16:36, Waldo, Ethan <ewaldo@healthetechs.com> wrote: > My question is, does > postgresql support transitive pruning optimization on the right side of a > join for partition tables? If so, how do I get that to work? If not, are > there plans for this and when should a release with this feature be > expected? Few guys implemented this using Postgres (8.3) and published their results: Join Optimization Techniques for Partitioned Tables ABSTRACT Table partitioning splits a table into smaller parts that can be accessed, stored, and maintained independent of one an- other. The main use of partitioning used to be in reduc- ing the time to access large base tables in parallel systems. Partitioning has evolved into a powerful mechanism to im- prove the overall manageability of both centralized and par- allel database systems. Partitioning simplifies administra- tive tasks like data loading, removal, backup, statistics main- tenance, and storage provisioning. More importantly, SQL extensions and MapReduce frameworks now enable applica- tions and user queries to specify how derived tables should be partitioned. However, query optimization techniques have not kept pace with the rapid advances in usage and user con- trol of table partitioning. We address this gap by developing new techniques to generate efficient plans for SQL queries involving multiway joins over partitioned tables. Our tech- niques are designed for easy incorporation into bottom-up query optimizers in centralized and parallel database sys- tems. We have prototyped these techniques in PostgreSQL and in a parallel database system composed of PostgreSQL nodes managed by Hadoop. An extensive evaluation shows that our partition-aware optimization techniques, with low overhead, generate plans that are often an order of magni- tude better than plans produced by current optimizers. 8. CONCLUSION Query optimization technology has not kept pace with the growing usage and user control over table partitioning. We addressed this gap by developing new partition-aware optimization techniques to generate efficient plans for SQL queries. We made the following contributions: • Our new techniques are designed for easy incorporation into bottom-up query optimizers for both centralized and parallel systems. • We have prototyped these techniques in PostgreSQL and in a parallel shared-nothing database system composed of PostgreSQL nodes managed by Hadoop. • An extensive evaluation showed that our optimizer, with low optimization-time overhead, generates plans that are often an order of magnitude better than plans produced by current optimizers. www.cs.duke.edu/~shivnath/tmp/paqo_draft.pdf -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
Re: transitive pruning optimization on the right side of a join for partition tables
From
Ondrej Ivanič
Date:
Hi, On 30 September 2012 21:00, Waldo, Ethan <ewaldo@healthetechs.com> wrote: > Yeah, I actually saw that paper but couldn't find a date on it. Currently their techniques are well outside > of the scope of my current problem particularly in consideration that I could switch to MySQL which does support > the right side join pruning. I figured if MySQL can do it, there might be a good chance Postgres can too or will soon. Google says 2010 (Google Scholar). I was in the same position 2 years ago but I was able to change / optimise our BI solution. Good to know that MySQL can do this. -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
Re: transitive pruning optimization on the right side of a join for partition tables
From
Tom Lane
Date:
"Waldo, Ethan" <ewaldo@healthetechs.com> writes: > This query does a sequence scan and append across all the partition tables: > select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); When I try that in 9.1, I get a plan with inner indexscans for each child table; which, while not exactly what you're asking for, should perform well enough when the fact table is large enough that partitioning is actually a useful activity. I suspect you're committing one of the ten deadly sins of Postgres optimization, which is to assume that the plan you get on a toy test case is the same plan you'd get for monster tables. Planning choices are nonlinear. regards, tom lane
Re: transitive pruning optimization on the right side of a join for partition tables
From
Ondrej Ivanič
Date:
Hi, On 1 October 2012 01:14, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Waldo, Ethan" <ewaldo@healthetechs.com> writes: >> This query does a sequence scan and append across all the partition tables: >> select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); > > When I try that in 9.1, I get a plan with inner indexscans for each > child table; which, while not exactly what you're asking for, should > perform well enough when the fact table is large enough that > partitioning is actually a useful activity. I do not have 9.1 handy but this is from 9.0.4: db=# EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact, dates where myfact.recorded_on_id = dates.recorded_on_id and dates.recorded_on_id in (1813, 1814); QUERY PLAN -------------------------------------------------------------------------------------------- Hash Join (cost=33.46..86124.15 rows=2858831 width=16) Hash Cond: (public.myfact.recorded_on_id = public.dates.recorded_on_id) -> Append (cost=0.00..46245.73 rows=3001773 width=8) -> Seq Scan on myfact (cost=0.00..27.70 rows=1770 width=8) -> Seq Scan on myfact_y2004w51 myfact (cost=0.00..15406.01 rows=1000001 width=8) -> Seq Scan on myfact_y2004w52 myfact (cost=0.00..15406.01 rows=1000001 width=8) -> Seq Scan on myfact_y2004w53 myfact (cost=0.00..15406.01 rows=1000001 width=8) -> Hash (cost=33.21..33.21 rows=20 width=16) -> Append (cost=0.00..33.21 rows=20 width=16) -> Seq Scan on dates (cost=0.00..32.12 rows=18 width=16) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Seq Scan on dates_y2004w51 dates (cost=0.00..1.09 rows=2 width=16) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) (13 rows) Postgres can't infer that myfact.recorded_on_id is in (1813, 1814) from the join condition (myfact.recorded_on_id = dates.recorded_on_id) hence all partitons are included (myfact_y2004w51, myfact_y2004w53, myfact_y2004w53). Adding "myfact.recorded_on_id in (1813, 1814)" creates much better plan: db=# EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact, dates where myfact.recorded_on_id = dates.recorded_on_id and dates.recorded_on_id in (1813, 1814) and myfact.recorded_on_id in (1813, 1814); QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Hash Join (cost=33.46..18878.72 rows=296673 width=16) Hash Cond: (public.myfact.recorded_on_id = public.dates.recorded_on_id) -> Append (cost=0.00..14710.38 rows=311507 width=8) -> Seq Scan on myfact (cost=0.00..32.12 rows=18 width=8) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Bitmap Heap Scan on myfact_y2004w51 myfact (cost=5378.64..14678.25 rows=311489 width=8) Recheck Cond: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Bitmap Index Scan on myfact_y2004w51_recorded_on_id (cost=0.00..5300.77 rows=311489 width=0) Index Cond: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Hash (cost=33.21..33.21 rows=20 width=16) -> Append (cost=0.00..33.21 rows=20 width=16) -> Seq Scan on dates (cost=0.00..32.12 rows=18 width=16) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Seq Scan on dates_y2004w51 dates (cost=0.00..1.09 rows=2 width=16) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) (15 rows) -------------- create table myfact ( id bigint not null, recorded_on_id bigint not null ); create table myfact_y2004w51 ( check (recorded_on_id >= 1812 and recorded_on_id <= 1818), primary key (id) ) inherits (myfact); create table myfact_y2004w52 ( check (recorded_on_id >= 1819 and recorded_on_id <= 1825), primary key (id) ) inherits (myfact); create table myfact_y2004w53 ( check (recorded_on_id >= 1826 and recorded_on_id <= 1832), primary key (id) ) inherits (myfact); create table dates ( datetime timestamp without time zone not null, recorded_on_id bigint not null ); create table dates_y2004w51 ( check (recorded_on_id >= 1812 and recorded_on_id <= 1818), primary key (datetime) ) inherits (dates); create table dates_y2004w52 ( check (recorded_on_id >= 1819 and recorded_on_id <= 1825), primary key (datetime) ) inherits (dates); create table dates_y2004w53 ( check (recorded_on_id >= 1826 and recorded_on_id <= 1832), primary key (datetime) ) inherits (dates); insert into myfact_y2004w51 select s.i, (random() * 6 + 1812)::integer from generate_series(0, 1000000, 1) as s(i); insert into myfact_y2004w52 select s.i, (random() * 6 + 1819)::integer from generate_series(0, 1000000, 1) as s(i); insert into myfact_y2004w53 select s.i, (random() * 6 + 1826)::integer from generate_series(0, 1000000, 1) as s(i); insert into dates_y2004w51 select distinct on (recorded_on_id) to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w51; insert into dates_y2004w52 select distinct on (recorded_on_id) to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w52; insert into dates_y2004w53 select distinct on (recorded_on_id) to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w53; create index myfact_y2004w51_recorded_on_id on myfact_y2004w51(recorded_on_id); create index myfact_y2004w52_recorded_on_id on myfact_y2004w52(recorded_on_id); create index myfact_y2004w53_recorded_on_id on myfact_y2004w53(recorded_on_id); create index dates_y2004w51_recorded_on_id on dates_y2004w51(recorded_on_id); create index dates_y2004w52_recorded_on_id on dates_y2004w52(recorded_on_id); create index dates_y2004w53_recorded_on_id on dates_y2004w53(recorded_on_id); ANALYZE;ANALYZE;ANALYZE;ANALYZE;ANALYZE; EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact, dates where myfact.recorded_on_id = dates.recorded_on_id and dates.recorded_on_id in (1813, 1814); EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact, dates where myfact.recorded_on_id = dates.recorded_on_id and dates.recorded_on_id in (1813, 1814) and myfact.recorded_on_id in (1813, 1814); ---------------------- -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
Re: transitive pruning optimization on the right side of a join for partition tables
From
"Waldo, Ethan"
Date:
Thank you very much for confirming my query results and taking the time to iterate out a test environment and document insuch detail. It would be really awesome if the Postgres query plans could infer check constraints from the where clause'sforeign key usage in addition to the native key usage. Hopefully the postgres team will take an interest in thiscapability. ----- Original Message ----- From: "Ondrej Ivanič" <ondrej.ivanic@gmail.com> Sent: Sun, 9/30/2012 8:35pm To: "Tom Lane" <tgl@sss.pgh.pa.us> Cc: "Waldo, Ethan" <ewaldo@healthetechs.com> ; pgsql-general@postgresql.org Subject: Re: [GENERAL] transitive pruning optimization on the right side of a join for partition tables Hi, On 1 October 2012 01:14, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Waldo, Ethan" <ewaldo@healthetechs.com> writes: >> This query does a sequence scan and append across all the partition tables: >> select "dates"."date_description" FROM "myfact" as "myfact", "dates" as "dates" where "myfact"."recorded_on_id" = "dates"."recorded_on_id"and "dates"."recorded_on_id" IN ('4617', '4618', '4619', '4620', '4621', '4622', '4623', '4624','4625', '4626', '4627', '4628', '4629', '4630', '4631', '4632', '4633', '4634', '4635', '4636', '4637', '4638', '4639','4640', '4641', '4642', '4643', '4644', '4645', '4646', '4647'); > > When I try that in 9.1, I get a plan with inner indexscans for each > child table; which, while not exactly what you're asking for, should > perform well enough when the fact table is large enough that > partitioning is actually a useful activity. I do not have 9.1 handy but this is from 9.0.4: db=# EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact, dates where myfact.recorded_on_id = dates.recorded_on_id and dates.recorded_on_id in (1813, 1814); QUERY PLAN -------------------------------------------------------------------------------------------- Hash Join (cost=33.46..86124.15 rows=2858831 width=16) Hash Cond: (public.myfact.recorded_on_id = public.dates.recorded_on_id) -> Append (cost=0.00..46245.73 rows=3001773 width=8) -> Seq Scan on myfact (cost=0.00..27.70 rows=1770 width=8) -> Seq Scan on myfact_y2004w51 myfact (cost=0.00..15406.01 rows=1000001 width=8) -> Seq Scan on myfact_y2004w52 myfact (cost=0.00..15406.01 rows=1000001 width=8) -> Seq Scan on myfact_y2004w53 myfact (cost=0.00..15406.01 rows=1000001 width=8) -> Hash (cost=33.21..33.21 rows=20 width=16) -> Append (cost=0.00..33.21 rows=20 width=16) -> Seq Scan on dates (cost=0.00..32.12 rows=18 width=16) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Seq Scan on dates_y2004w51 dates (cost=0.00..1.09 rows=2 width=16) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) (13 rows) Postgres can't infer that myfact.recorded_on_id is in (1813, 1814) from the join condition (myfact.recorded_on_id = dates.recorded_on_id) hence all partitons are included (myfact_y2004w51, myfact_y2004w53, myfact_y2004w53). Adding "myfact.recorded_on_id in (1813, 1814)" creates much better plan: db=# EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact, dates where myfact.recorded_on_id = dates.recorded_on_id and dates.recorded_on_id in (1813, 1814) and myfact.recorded_on_id in (1813, 1814); QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Hash Join (cost=33.46..18878.72 rows=296673 width=16) Hash Cond: (public.myfact.recorded_on_id = public.dates.recorded_on_id) -> Append (cost=0.00..14710.38 rows=311507 width=8) -> Seq Scan on myfact (cost=0.00..32.12 rows=18 width=8) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Bitmap Heap Scan on myfact_y2004w51 myfact (cost=5378.64..14678.25 rows=311489 width=8) Recheck Cond: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Bitmap Index Scan on myfact_y2004w51_recorded_on_id (cost=0.00..5300.77 rows=311489 width=0) Index Cond: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Hash (cost=33.21..33.21 rows=20 width=16) -> Append (cost=0.00..33.21 rows=20 width=16) -> Seq Scan on dates (cost=0.00..32.12 rows=18 width=16) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) -> Seq Scan on dates_y2004w51 dates (cost=0.00..1.09 rows=2 width=16) Filter: (recorded_on_id = ANY ('{1813,1814}'::bigint[])) (15 rows) -------------- create table myfact ( id bigint not null, recorded_on_id bigint not null ); create table myfact_y2004w51 ( check (recorded_on_id >= 1812 and recorded_on_id <= 1818), primary key (id) ) inherits (myfact); create table myfact_y2004w52 ( check (recorded_on_id >= 1819 and recorded_on_id <= 1825), primary key (id) ) inherits (myfact); create table myfact_y2004w53 ( check (recorded_on_id >= 1826 and recorded_on_id <= 1832), primary key (id) ) inherits (myfact); create table dates ( datetime timestamp without time zone not null, recorded_on_id bigint not null ); create table dates_y2004w51 ( check (recorded_on_id >= 1812 and recorded_on_id <= 1818), primary key (datetime) ) inherits (dates); create table dates_y2004w52 ( check (recorded_on_id >= 1819 and recorded_on_id <= 1825), primary key (datetime) ) inherits (dates); create table dates_y2004w53 ( check (recorded_on_id >= 1826 and recorded_on_id <= 1832), primary key (datetime) ) inherits (dates); insert into myfact_y2004w51 select s.i, (random() * 6 + 1812)::integer from generate_series(0, 1000000, 1) as s(i); insert into myfact_y2004w52 select s.i, (random() * 6 + 1819)::integer from generate_series(0, 1000000, 1) as s(i); insert into myfact_y2004w53 select s.i, (random() * 6 + 1826)::integer from generate_series(0, 1000000, 1) as s(i); insert into dates_y2004w51 select distinct on (recorded_on_id) to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w51; insert into dates_y2004w52 select distinct on (recorded_on_id) to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w52; insert into dates_y2004w53 select distinct on (recorded_on_id) to_timestamp(recorded_on_id), recorded_on_id from myfact_y2004w53; create index myfact_y2004w51_recorded_on_id on myfact_y2004w51(recorded_on_id); create index myfact_y2004w52_recorded_on_id on myfact_y2004w52(recorded_on_id); create index myfact_y2004w53_recorded_on_id on myfact_y2004w53(recorded_on_id); create index dates_y2004w51_recorded_on_id on dates_y2004w51(recorded_on_id); create index dates_y2004w52_recorded_on_id on dates_y2004w52(recorded_on_id); create index dates_y2004w53_recorded_on_id on dates_y2004w53(recorded_on_id); ANALYZE;ANALYZE;ANALYZE;ANALYZE;ANALYZE; EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact, dates where myfact.recorded_on_id = dates.recorded_on_id and dates.recorded_on_id in (1813, 1814); EXPLAIN select dates.datetime, myfact.recorded_on_id from myfact, dates where myfact.recorded_on_id = dates.recorded_on_id and dates.recorded_on_id in (1813, 1814) and myfact.recorded_on_id in (1813, 1814); ---------------------- -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
Re: transitive pruning optimization on the right side of a join for partition tables
From
madhukiranj
Date:
Hi, I have tried more on this, if you just use an equality constraint on the foreign key in the right side of the join, it works. Other constraints like between, in, >,< do not work. Anyone else got to achieve this in postgresql? -- View this message in context: http://postgresql.1045698.n5.nabble.com/transitive-pruning-optimization-on-the-right-side-of-a-join-for-partition-tables-tp5726019p5728401.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.