Thread: TPC-R benchmarks
I am running TPC-R benchmarks with a scale factor of 1, which corresponds to approximately 1 GB database size, on PostgreSQL 7.3.4 installed on Cygwin on Windows XP. I dedicated 128 MB of shared memory to my postgres installation.
Most of the queries were able to complete in a matter of minutes, but query 17 was taking hours and hours. The query is shown below. Is there any way to optimize it?
select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and p_brand = 'Brand#11'
    and p_container = 'SM PKG'
    and l_quantity < (
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
    );
Thanks.
Oleg
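For readers following along: the usual first step with a slow query is to look at its plan. Plain EXPLAIN shows the planner's estimates and returns immediately; EXPLAIN ANALYZE runs the query and reports actual row counts and timings, so on a pathological plan it takes as long as the query itself. A minimal sketch against the query above:

explain
select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where p_partkey = l_partkey
  and p_brand = 'Brand#11'
  and p_container = 'SM PKG'
  and l_quantity < (select 0.2 * avg(l_quantity)
                    from lineitem
                    where l_partkey = p_partkey);

-- Substitute EXPLAIN ANALYZE for EXPLAIN once the plan looks sane enough to execute.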
I am running TPC-H with a scale factor of 1 on RedHat 7.2 with kernel 2.5.74. Q17 always finishes in about 7 seconds on my system. The execution plan is:

Aggregate  (cost=780402.43..780402.43 rows=1 width=48)
  ->  Nested Loop  (cost=0.00..780397.50 rows=1973 width=48)
        Join Filter: ("inner".l_quantity < (subplan))
        ->  Seq Scan on part  (cost=0.00..8548.00 rows=197 width=12)
              Filter: ((p_brand = 'Brand#31'::bpchar) AND (p_container = 'LG CASE'::bpchar))
        ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=36)
              Index Cond: ("outer".p_partkey = lineitem.l_partkey)
        SubPlan
          ->  Aggregate  (cost=124.40..124.40 rows=1 width=11)
                ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=11)
                      Index Cond: (l_partkey = $0)
(11 rows)

Hope this helps,
Jenny
Seems like in your case postgres uses an i_l_partkey index on the lineitem table. I have a foreign key constraint defined between the lineitem and part tables, but didn't create any special indexes. Here is my query plan:

Aggregate  (cost=1517604222.32..1517604222.32 rows=1 width=31)
  ->  Hash Join  (cost=8518.49..1517604217.39 rows=1969 width=31)
        Hash Cond: ("outer".l_partkey = "inner".p_partkey)
        Join Filter: ("outer".l_quantity < (subplan))
        ->  Seq Scan on lineitem  (cost=0.00..241889.15 rows=6001215 width=27)
        ->  Hash  (cost=8518.00..8518.00 rows=197 width=4)
              ->  Seq Scan on part  (cost=0.00..8518.00 rows=197 width=4)
                    Filter: ((p_brand = 'Brand#11'::bpchar) AND (p_container = 'SM PKG'::bpchar))
        SubPlan
          ->  Aggregate  (cost=256892.28..256892.28 rows=1 width=11)
                ->  Seq Scan on lineitem  (cost=0.00..256892.19 rows=37 width=11)
                      Filter: (l_partkey = $0)

Oleg
The index is created by:

create index i_l_partkey on lineitem (l_partkey);

I do not have any foreign keys defined. Does the spec require foreign keys?

When you create a foreign key reference, does PG create an index automatically? Can you try with the index?

Jenny
Jenny,

> When you create a foreign key reference, does PG create an index
> automatically?

No. An index is not required to enforce a foreign key, and is sometimes not useful (for example, FK fields with only 3 possible values). So it may be that you need to create an index on that field.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
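To make that concrete, a short sketch against the table names used in this thread (the constraint name is illustrative; only the index statement matters for the plans discussed here):

-- Declaring the foreign key does NOT create an index on l_partkey:
alter table lineitem
    add constraint fk_lineitem_part    -- illustrative name
    foreign key (l_partkey) references part (p_partkey);

-- The index on the referencing column must be created explicitly:
create index i_l_partkey on lineitem (l_partkey);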
Oleg Lebedev <oleg.lebedev@waterford.org> writes:
> Seems like in your case postgres uses an i_l_partkey index on the lineitem
> table. I have a foreign key constraint defined between the lineitem and
> part tables, but didn't create any special indexes. Here is my query plan:

The planner is obviously unhappy with this plan (note the large cost numbers), but it can't find a way to do better. An index on lineitem.l_partkey would help, I think.

The whole query seems like it's written in a very inefficient fashion; couldn't the estimation of '0.2 * avg(l_quantity)' be amortized across multiple join rows? But I dunno whether the TPC rules allow for significant manual rewriting of the given query.

			regards, tom lane
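For illustration only (the TPC rules discussed in the next message disallow this kind of rewrite), the amortization Tom describes might look like the following sketch: compute each part's 20%-of-average threshold once in a derived table, then join to it, instead of re-running the correlated subquery for every joined row.

select
    sum(l.l_extendedprice) / 7.0 as avg_yearly
from
    lineitem l,
    part p,
    (select l_partkey, 0.2 * avg(l_quantity) as qty_threshold
     from lineitem
     group by l_partkey) lq    -- one aggregate per part, computed once
where
    p.p_partkey = l.l_partkey
    and lq.l_partkey = l.l_partkey
    and p.p_brand = 'Brand#11'
    and p.p_container = 'SM PKG'
    and l.l_quantity < lq.qty_threshold;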
The TPC-H/R rules allow only minor changes to the SQL that are necessary due to SQL implementation differences. They do not allow changes made to improve performance. It is their way to test the optimizer's ability to recognize an inefficient SQL statement and do the rewrite.

The rule makes sense for TPC-H, which is supposed to represent ad hoc queries. One might argue that for TPC-R, which is supposed to represent "Reporting" with pre-knowledge of the query, rewrites should be allowed. However, that is currently not the case. Since the RDBMSs represented on the TPC council are competing with TPC-H, their optimizers already do the rewrite, so (IMHO) there is no motivation to relax the rules for TPC-R.

-- 
Mary Edie Meredith <maryedie@osdl.org>
Open Source Development Lab
I left my TPC-R query #17 working over the weekend and it took 3988 mins ~ 10 hours to complete. And this is considering that I am using a TPC-R database created with a scale factor of 1, which corresponds to ~1 GB of data. I am running RedHat 8.0 on a dual 1 GHz processor machine with 512 MB RAM.

Here is an excerpt from my postgresql.conf file (the rest of the settings are commented out):

#
#       Shared Memory Size
#
shared_buffers = 16384          # 2*max_connections, min 16, typically 8KB each

#
#       Non-shared Memory Sizes
#
sort_mem = 32768

#
#       Optimizer Parameters
#
effective_cache_size = 32000    # typically 8KB each

Any suggestions on how to optimize these settings?

I agree with Jenny that declaring additional indexes on the TPC-R tables may alter the validity of the benchmarks. Are there any official TPC benchmarks submitted for PostgreSQL?

Thanks.

Oleg
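For reference, with PostgreSQL's default 8 KB page size (and sort_mem measured in KB), those settings work out to roughly:

    shared_buffers       = 16384 pages * 8 KB = 128 MB of shared buffer cache
    sort_mem             = 32768 KB           = 32 MB per sort operation
    effective_cache_size = 32000 pages * 8 KB = ~256 MB of assumed OS cache

so on a 512 MB machine the shared buffers alone claim a quarter of RAM.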
Oleg Lebedev wrote:
> effective_cache_size = 32000    # typically 8KB each

That is 256 MB. You can raise it to 350+ MB if nothing else is running on the box. Also, if you have fast disk drives, you can reduce random_page_cost to 2 or 1.5.

I don't know how much difference this will make to the benchmark results, but usually it helps when queries are slow.

HTH

Shridhar
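A sketch of trying both suggestions for the current session (the values are illustrative; they can also be made permanent in postgresql.conf):

set effective_cache_size = 45000;  -- ~350 MB at 8 KB per page
set random_page_cost = 2;          -- down from the default of 4; 1.5 is worth trying too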
On Mon, 2003-09-29 at 07:35, Oleg Lebedev wrote:
> I left my TPC-R query #17 working over the weekend and it took 3988 mins
> ~ 10 hours to complete.

Was this run with or without the l_partkey index that Jenny suggested?

> I agree with Jenny that declaring additional indexes on the TPC-R tables
> may alter the validity of the benchmarks. Are there any official TPC
> benchmarks submitted for PostgreSQL?

Actually, for TPC-R you _are_ allowed to declare additional indexes. With TPC-H you are restricted to a specific set listed in the spec (an index on l_partkey is allowed for both). What you cannot do for either TPC-R or TPC-H is rewrite the SQL of the query for the purposes of making the query run faster. Sorry if I was unclear.

Valid TPC-R benchmark results are on the TPC web site:
http://www.tpc.org/tpcr/default.asp
I do not see one for PostgreSQL.

Regards,

Mary
-- 
Mary Edie Meredith <maryedie@osdl.org>
Open Source Development Lab
Mary Edie Meredith <maryedie@osdl.org> writes:
> Valid TPC-R benchmark results are on the TPC web site:
> http://www.tpc.org/tpcr/default.asp
> I do not see one for PostgreSQL.

I'm pretty certain that there are no TPC-certified test results for Postgres, because to date no organization has cared to spend the money needed to perform a certifiable test. From what I understand, you need a pretty significant commitment of people and hardware to jump through all the hoops involved...

			regards, tom lane
It took 10 hours to compute the query without the index on lineitem.l_partkey. Once I created the index on lineitem.l_partkey, it took only 32 secs to run the same query. After VACUUM ANALYZE it took 72 secs to run the query. All the subsequent runs took under 3 seconds! That's quite amazing!
I just checked
Oops, my previous message got cut off. Here is the end of it:

I just checked the restrictions on the TPC-R and TPC-H schemas, and it seems that all indexes are allowed in TPC-R, while only those that index parts of primary or foreign keys are allowed in TPC-H.

Thanks.

Oleg
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> Also, if you have fast disk drives, you can reduce random_page_cost to 2 or 1.5.

Note however that most of the people who have found smaller random_page_cost to be helpful are in situations where most of their data fits in RAM. Reducing the cost towards 1 simply reflects the fact that there's no sequential-fetch advantage when grabbing data that's already in RAM.

When benchmarking with data sets considerably larger than available buffer cache, I rather doubt that small random_page_cost would be a good idea. Still, you might as well experiment to see.

			regards, tom lane
> I'm pretty certain that there are no TPC-certified test results for
> Postgres, because to date no organization has cared to spend the money
> needed to perform a certifiable test.

Anyone have a rough idea of the costs involved?

--
Greg Sabino Mullane greg@turnstep.com
On Mon, Sep 29, 2003 at 05:43:26PM -0000, greg@turnstep.com wrote:
> Anyone have a rough idea of the costs involved?

I did a back-of-an-envelope calculation one day and stopped when I got to $10,000.

A
--
Andrew Sullivan <andrew@libertyrms.info>
Afilias Canada, Toronto, Ontario
Oleg,

> I just checked the restrictions on the TPC-R and TPC-H schemas, and it
> seems that all indexes are allowed in TPC-R, while only those that index
> parts of primary or foreign keys are allowed in TPC-H.

That would be appropriate for this case, though, yes? That column is part of a foreign key, unless I've totally lost track.

As I remarked before, Postgres does *not* automatically create indexes for FKs. Many, but not all, other database products do, so comparing PostgreSQL against those products without the index is unfair.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Yes Josh,
l_partkey is part of the foreign key on the lineitem table, and it was OK to create an index on it according to the TPC-R specs. I just created indices on the rest of the FK columns in the TPC-R database and will continue my evaluations.

Thanks.

Oleg
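A sketch of what that remaining set of foreign-key indexes might look like (index names are illustrative, and the exact list depends on which FK constraints were declared in the schema):

create index i_l_orderkey  on lineitem (l_orderkey);
create index i_l_suppkey   on lineitem (l_suppkey);
create index i_ps_partkey  on partsupp (ps_partkey);
create index i_ps_suppkey  on partsupp (ps_suppkey);
create index i_o_custkey   on orders   (o_custkey);
create index i_c_nationkey on customer (c_nationkey);
create index i_s_nationkey on supplier (s_nationkey);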
I continue struggling with the TPC-R benchmarks and wonder if anyone could help me optimize the query below. ANALYZE statistics indicate that the query should run relatively fast, but it takes hours to complete. I attached the query plan to this posting.
Thanks.

select
    nation,
    o_year,
    sum(amount) as sum_profit
from
    (
        select
            n_name as nation,
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from
            part,
            supplier,
            lineitem,
            partsupp,
            orders,
            nation
        where
            s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%aquamarine%'
    ) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc;
On Tue, 30 Sep 2003, Oleg Lebedev wrote:
> I continue struggling with the TPC-R benchmarks and wonder if anyone
> could help me optimize the query below. ANALYZE statistics indicate that
> the query should run relatively fast, but it takes hours to complete.

What are the differences between estimated and real rows and such in an EXPLAIN ANALYZE of that query? Are there any estimates that are just way off?
Tom Lane wrote:
> When benchmarking with data sets considerably larger than available
> buffer cache, I rather doubt that small random_page_cost would be a
> good idea. Still, you might as well experiment to see.

From experience, I know the difference in response time can be huge when postgres incorrectly chooses a sequential scan over an index scan. In practice, do people experience as great a difference when postgres incorrectly chooses an index scan over a sequential scan? My intuition is that the speed difference is a lot less for incorrectly choosing an index scan. If this is the case, it would be safer to choose a small value for random_page_cost.

George Essig
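One way to measure that penalty in both directions on a given query is to toggle the planner's scan choices per session and compare (a sketch; substitute the query of interest):

set enable_seqscan = false;        -- force index scans where an index exists
-- explain analyze <query of interest>;
reset enable_seqscan;              -- back to the planner's default choice
-- explain analyze <query of interest>;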
Oleg,

> I continue struggling with the TPC-R benchmarks and wonder if anyone
> could help me optimize the query below. ANALYZE statistics indicate that
> the query should run relatively fast, but it takes hours to complete.

Even though it takes hours to complete, I think we need you to run EXPLAIN ANALYZE instead of just EXPLAIN. Without the real-time statistics, we simply can't see what's slowing the query down.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
The output of the query should contain about 200 rows. So, I guess the planner is off in assuming that the query should return 1 row. I will start EXPLAIN ANALYZE now.

Thanks.

Oleg
Oleg,

> The output of the query should contain about 200 rows. So, I guess the
> planner is off in assuming that the query should return 1 row.

Oh, also, did you post the query before? Can you re-post it with the planner results?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Sure, below is the query. I attached the plan to this posting.

select
    nation,
    o_year,
    sum(amount) as sum_profit
from
    (
        select
            n_name as nation,
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from
            part,
            supplier,
            lineitem,
            partsupp,
            orders,
            nation
        where
            s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%green%'
    ) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc;
All right, my query just finished running with EXPLAIN ANALYZE. I show the plan below and also attached it as a file.
Any ideas?

Sort  (cost=54597.49..54597.50 rows=1 width=121) (actual time=6674562.03..6674562.15 rows=175 loops=1)
  Sort Key: nation.n_name, date_part('year'::text, orders.o_orderdate)
  ->  Aggregate  (cost=54597.45..54597.48 rows=1 width=121) (actual time=6668919.41..6674522.48 rows=175 loops=1)
        ->  Group  (cost=54597.45..54597.47 rows=3 width=121) (actual time=6668872.68..6672136.96 rows=348760 loops=1)
              ->  Sort  (cost=54597.45..54597.46 rows=3 width=121) (actual time=6668872.65..6669499.95 rows=348760 loops=1)
                    Sort Key: nation.n_name, date_part('year'::text, orders.o_orderdate)
                    ->  Hash Join  (cost=54596.00..54597.42 rows=3 width=121) (actual time=6632768.89..6650192.67 rows=348760 loops=1)
                          Hash Cond: ("outer".n_nationkey = "inner".s_nationkey)
                          ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=33) (actual time=6.75..7.13 rows=25 loops=1)
                          ->  Hash  (cost=54596.00..54596.00 rows=3 width=88) (actual time=6632671.96..6632671.96 rows=0 loops=1)
                                ->  Nested Loop  (cost=0.00..54596.00 rows=3 width=88) (actual time=482.41..6630601.46 rows=348760 loops=1)
                                      Join Filter: ("inner".s_suppkey = "outer".l_suppkey)
                                      ->  Nested Loop  (cost=0.00..54586.18 rows=3 width=80) (actual time=383.87..6594984.40 rows=348760 loops=1)
                                            ->  Nested Loop  (cost=0.00..54575.47 rows=4 width=68) (actual time=199.95..3580882.07 rows=348760 loops=1)
                                                  Join Filter: ("outer".p_partkey = "inner".ps_partkey)
                                                  ->  Nested Loop  (cost=0.00..22753.33 rows=9343 width=49) (actual time=146.85..3541433.10 rows=348760 loops=1)
                                                        ->  Seq Scan on part  (cost=0.00..7868.00 rows=320 width=4) (actual time=33.64..15651.90 rows=11637 loops=1)
                                                              Filter: (p_name ~~ '%green%'::text)
                                                        ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..46.15 rows=29 width=45) (actual time=10.71..302.67 rows=30 loops=11637)
                                                              Index Cond: ("outer".p_partkey = lineitem.l_partkey)
                                                  ->  Index Scan using pk_partsupp on partsupp  (cost=0.00..3.39 rows=1 width=19) (actual time=0.09..0.09 rows=1 loops=348760)
                                                        Index Cond: ((partsupp.ps_partkey = "outer".l_partkey) AND (partsupp.ps_suppkey = "outer".l_suppkey))
                                            ->  Index Scan using pk_orders on orders  (cost=0.00..3.01 rows=1 width=12) (actual time=8.62..8.62 rows=1 loops=348760)
                                                  Index Cond: (orders.o_orderkey = "outer".l_orderkey)
                                      ->  Index Scan using pk_supplier on supplier  (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.08 rows=1 loops=348760)
                                            Index Cond: ("outer".ps_suppkey = supplier.s_suppkey)
Total runtime: 6674724.23 msec
(28 rows)
For troubleshooting, can you try it with "set enable_nestloop = false", rerun the query, and see how long it takes?

It looks like the estimate of rows returned is WAY off (the estimate is too low compared to what really comes back).

Also, you might try to ALTER the table.column to have a higher statistics target on the columns p_partkey and ps_partkey, and any others where the estimate is so far off from reality.
Oleg Lebedev <oleg.lebedev@waterford.org> writes:
> All right, my query just finished running with EXPLAIN ANALYZE.
> I show the plan below and also attached it as a file.
> Any ideas?

Uh, have you done an ANALYZE (or VACUUM ANALYZE) on this database? It sure looks like the planner thinks the tables are a couple of orders of magnitude smaller than they actually are. Certainly the estimated sizes of the joins are way off :-(

If you did analyze, it might help to increase the statistics target and re-analyze.

			regards, tom lane
Oleg,

> All right, my query just finished running with EXPLAIN ANALYZE.
> I show the plan below and also attached it as a file.
> Any ideas?

Yes. Your problem appears to be right here:

>  ->  Nested Loop  (cost=0.00..54596.00 rows=3 width=88) (actual time=482.41..6630601.46 rows=348760 loops=1)
>        Join Filter: ("inner".s_suppkey = "outer".l_suppkey)
>        ->  Nested Loop  (cost=0.00..54586.18 rows=3 width=80) (actual time=383.87..6594984.40 rows=348760 loops=1)
>              ->  Nested Loop  (cost=0.00..54575.47 rows=4 width=68) (actual time=199.95..3580882.07 rows=348760 loops=1)
>                    Join Filter: ("outer".p_partkey = "inner".ps_partkey)
>                    ->  Nested Loop  (cost=0.00..22753.33 rows=9343 width=49) (actual time=146.85..3541433.10 rows=348760 loops=1)

For some reason, the row estimate on the supplier --> lineitem join is bad, as is the estimate on part --> partsupp. Let me first check two things:

1) You have an index on l_suppkey and on ps_partkey.
2) You have run ANALYZE on your whole database before the query.

If both of those are true, I'd like to see the lines in pg_stats that apply to ps_partkey and l_suppkey; please do a:

SELECT * FROM pg_stats
WHERE attname = 'l_suppkey' OR attname = 'ps_partkey';

-- 
-Josh Berkus
Aglio Database Solutions
San Francisco
I ran VACUUM FULL ANALYZE yesterday and then re-ran the query with EXPLAIN ANALYZE. I got the same query plan and execution time.
As Scott recommended, I did the following:
# set enable_nestloop = false;
# vacuum full analyze;

After this I re-ran the query and its execution time went down from 2 hours to 2 minutes. I attached the new query plan to this posting.
Is there any way to optimize it even further?
What should I do to make this query run fast without hurting the performance of the other queries?
Thanks.

Oleg

-----Original Message-----
From: scott.marlowe [mailto:scott.marlowe@ihs.com]
Sent: Wednesday, October 01, 2003 4:00 PM
To: Oleg Lebedev
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] TPC-R benchmarks

For troubleshooting, can you try it with "set enable_nestloop = false" and rerun the query and see how long it takes?

It looks like the estimates of rows returned are WAY off (the estimate is too low compared to what really comes back).

Also, you might try altering the statistics targets on the columns p_partkey and ps_partkey, and any others where the estimate is so far off of the reality.

On Wed, 1 Oct 2003, Oleg Lebedev wrote:

> All right, my query just finished running with EXPLAIN ANALYZE. I show
> the plan below and also attached it as a file. Any ideas?
>
> ->  Sort  (cost=54597.49..54597.50 rows=1 width=121) (actual time=6674562.03..6674562.15 rows=175 loops=1)
>       Sort Key: nation.n_name, date_part('year'::text, orders.o_orderdate)
>       ->  Aggregate  (cost=54597.45..54597.48 rows=1 width=121) (actual time=6668919.41..6674522.48 rows=175 loops=1)
>             ->  Group  (cost=54597.45..54597.47 rows=3 width=121) (actual time=6668872.68..6672136.96 rows=348760 loops=1)
>                   ->  Sort  (cost=54597.45..54597.46 rows=3 width=121) (actual time=6668872.65..6669499.95 rows=348760 loops=1)
>                         Sort Key: nation.n_name, date_part('year'::text, orders.o_orderdate)
>                         ->  Hash Join  (cost=54596.00..54597.42 rows=3 width=121) (actual time=6632768.89..6650192.67 rows=348760 loops=1)
>                               Hash Cond: ("outer".n_nationkey = "inner".s_nationkey)
>                               ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=33) (actual time=6.75..7.13 rows=25 loops=1)
>                               ->  Hash  (cost=54596.00..54596.00 rows=3 width=88) (actual time=6632671.96..6632671.96 rows=0 loops=1)
>                                     ->  Nested Loop  (cost=0.00..54596.00 rows=3 width=88) (actual time=482.41..6630601.46 rows=348760 loops=1)
>                                           Join Filter: ("inner".s_suppkey = "outer".l_suppkey)
>                                           ->  Nested Loop  (cost=0.00..54586.18 rows=3 width=80) (actual time=383.87..6594984.40 rows=348760 loops=1)
>                                                 ->  Nested Loop  (cost=0.00..54575.47 rows=4 width=68) (actual time=199.95..3580882.07 rows=348760 loops=1)
>                                                       Join Filter: ("outer".p_partkey = "inner".ps_partkey)
>                                                       ->  Nested Loop  (cost=0.00..22753.33 rows=9343 width=49) (actual time=146.85..3541433.10 rows=348760 loops=1)
>                                                             ->  Seq Scan on part  (cost=0.00..7868.00 rows=320 width=4) (actual time=33.64..15651.90 rows=11637 loops=1)
>                                                                   Filter: (p_name ~~ '%green%'::text)
>                                                             ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..46.15 rows=29 width=45) (actual time=10.71..302.67 rows=30 loops=11637)
>                                                                   Index Cond: ("outer".p_partkey = lineitem.l_partkey)
>                                                       ->  Index Scan using pk_partsupp on partsupp  (cost=0.00..3.39 rows=1 width=19) (actual time=0.09..0.09 rows=1 loops=348760)
>                                                             Index Cond: ((partsupp.ps_partkey = "outer".l_partkey) AND (partsupp.ps_suppkey = "outer".l_suppkey))
>                                                 ->  Index Scan using pk_orders on orders  (cost=0.00..3.01 rows=1 width=12) (actual time=8.62..8.62 rows=1 loops=348760)
>                                                       Index Cond: (orders.o_orderkey = "outer".l_orderkey)
>                                           ->  Index Scan using pk_supplier on supplier  (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.08 rows=1 loops=348760)
>                                                 Index Cond: ("outer".ps_suppkey = supplier.s_suppkey)
> Total runtime: 6674724.23 msec
> (28 rows)
>
> -----Original Message-----
> From: Oleg Lebedev
> Sent: Wednesday, October 01, 2003 12:00 PM
> To: Josh Berkus; scott.marlowe
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] TPC-R benchmarks
> Importance: Low
>
> Sure, below is the query. I attached the plan to this posting.
>
> select
>     nation,
>     o_year,
>     sum(amount) as sum_profit
> from
>     (
>         select
>             n_name as nation,
>             extract(year from o_orderdate) as o_year,
>             l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
>         from
>             part,
>             supplier,
>             lineitem,
>             partsupp,
>             orders,
>             nation
>         where
>             s_suppkey = l_suppkey
>             and ps_suppkey = l_suppkey
>             and ps_partkey = l_partkey
>             and p_partkey = l_partkey
>             and o_orderkey = l_orderkey
>             and s_nationkey = n_nationkey
>             and p_name like '%green%'
>     ) as profit
> group by
>     nation,
>     o_year
> order by
>     nation,
>     o_year desc;
>
> -----Original Message-----
> From: Josh Berkus [mailto:josh@agliodbs.com]
> Sent: Wednesday, October 01, 2003 11:42 AM
> To: Oleg Lebedev; scott.marlowe
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] TPC-R benchmarks
>
> Oleg,
>
> > The output of the query should contain about 200 rows. So, I guess
> > the planner is off assuming that the query should return 1 row.
>
> Oh, also did you post the query before? Can you re-post it with the
> planner results?
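One detail worth spelling out (it is implicit in the thread): SET only affects the current session, so the nestloop override need not hurt other queries running against the same server. A minimal sketch, with a count(*) as a stand-in for the real TPC-R query:

SET enable_nestloop = false;                    -- this session only
EXPLAIN ANALYZE SELECT count(*) FROM lineitem;  -- substitute the real query here
SET enable_nestloop = true;                     -- restore the default afterwards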
Oleg,

> I ran VACUUM FULL ANALYZE yesterday and then re-ran the query with
> EXPLAIN ANALYZE.
> I got the same query plan and execution time.

How about my question? Those rows from pg_stats would be really useful in diagnosing the problem.

--
Josh Berkus
Aglio Database Solutions
San Francisco
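The pg_stats rows Josh is after can be pulled with a query along these lines (the table and column lists are an assumption based on the plan above; pg_stats is a standard system view in 7.3):

SELECT tablename, attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE tablename IN ('lineitem', 'part', 'partsupp')
  AND attname IN ('l_partkey', 'p_partkey', 'ps_partkey');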
Have you tried increasing the statistics target for those columns that are getting bad estimates, then turning enable_nestloop back on, rerunning ANALYZE, and seeing how the query does?

The idea being to try and get a good enough estimate of your statistics so the planner stops using nestloops on its own, rather than forcing it to with enable_nestloop = false.

On Thu, 2 Oct 2003, Oleg Lebedev wrote:

> As Scott recommended, I did the following:
> # set enable_nestloop = false;
> # vacuum full analyze;
>
> After this I re-ran the query and its execution time went down from 2
> hours to 2 minutes. I attached the new query plan to this posting.
> Is there any way to optimize it even further?
> What should I do to make this query run fast without hurting the
> performance of the other queries?
> Thanks.
>
> Oleg
>
> [rest of the quoted message snipped; it repeats Scott's earlier reply,
> the EXPLAIN ANALYZE plan, and the Q9 query verbatim from the messages
> above.]
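Scott's procedure is the statistics-target bump sketched earlier, followed by lifting the override and re-checking the plan. A minimal sketch (the stand-in join is illustrative; the point is to compare estimated against actual row counts):

-- After raising the targets and re-running ANALYZE (see the earlier
-- sketch), let the planner choose freely again and re-test:
SET enable_nestloop = true;
EXPLAIN ANALYZE SELECT count(*)
FROM lineitem l, part p
WHERE p.p_partkey = l.l_partkey;   -- stand-in; compare estimated vs. actual rows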
I was trying to get the pg_stats information to Josh and decided to recreate the indexes on all my tables. After that I ran VACUUM FULL ANALYZE, re-enabled nestloop, and ran EXPLAIN ANALYZE on the query. It ran in about 2 minutes.
I attached the new query plan. I am not sure what did the trick, but 2 minutes is much better than 2 hours. But then again, I can't take long lunches anymore :)
Is there any way to make this query run even faster without increasing the memory dedicated to postgres?
Thanks.

Oleg

-----Original Message-----
From: scott.marlowe [mailto:scott.marlowe@ihs.com]
Sent: Thursday, October 02, 2003 10:29 AM
To: Oleg Lebedev
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] TPC-R benchmarks

[Scott's message and the rest of the quoted thread snipped; they repeat
the messages above verbatim.]
On Thu, 2 Oct 2003, Oleg Lebedev wrote:

> I was trying to get the pg_stats information to Josh and decided to
> recreate the indexes on all my tables. After that I ran vacuum full
> analyze, re-enabled nestloop and ran explain analyze on the query. It
> ran in about 2 minutes.
> I attached the new query plan. I am not sure what did the trick, but 2
> minutes is much better than 2 hours. But then again, I can't take long
> lunches anymore :)
> Is there any way to make this query run even faster without increasing
> the memory dedicated to postgres?
> Thanks.

As long as the estimated row counts and the real ones match up, and postgresql seems to be picking the right plan, there's probably not a lot to be done. You might want to look at increasing sort_mem a bit, but don't go crazy, as setting it too high can result in swap storms under load, which are a very bad thing.

I'd check for index growth. You may have been reloading your data over and over and had an index growth problem. Next time, instead of recreating the indexes completely, you might wanna try REINDEX INDEX indexname. Also, 7.4 mostly fixes the index growth issue, especially as it applies to truncating/reloading a table over and over, so moving to 7.4 beta3/4 and testing might be a good idea (if you aren't there already).

What you want to avoid is having postgresql switch back to that nestloop join on you in the middle of the day, and to prevent that you might need higher statistics targets so the planner gets the right number all the time.
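For concreteness, the two knobs Scott mentions look like this on 7.3 (the index name is taken from the plans above; the sort_mem value is an arbitrary example, and the setting is per-sort, in kilobytes):

REINDEX INDEX i_l_partkey;   -- rebuilds the index in place, no drop/create needed

SET sort_mem = 8192;         -- 8 MB per sort; the 7.3 default is 1024 (1 MB)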
Thanks everyone for the help.

I have another question. How do I optimize my indexes for a query that contains a lot of ORed blocks, each of which contains a bunch of ANDed expressions? The structure of each ORed block is the same, except that the right-hand-side values vary. The first expression of each AND-block is a join condition, yet postgres uses a sequential scan on both tables and applies the ORed blocks of ANDed expressions as a join filter, so the estimated cost of the plan is around 700,000,000,000. Here is an example:

select
	sum(l_extendedprice * (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#24'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 4 and l_quantity <= 4 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 18 and l_quantity <= 18 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#33'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-----Original Message-----
From: scott.marlowe [mailto:scott.marlowe@ihs.com]
Sent: Thursday, October 02, 2003 1:44 PM
To: Oleg Lebedev
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] TPC-R benchmarks

[Scott's reply snipped; it repeats the previous message verbatim.]
Oleg,

> I have another question. How do I optimize my indexes for the query that
> contains a lot of ORed blocks, each of which contains a bunch of ANDed
> expressions? The structure of each ORed block is the same except the
> right-hand-side values vary.

Given the example, I'd do a multicolumn index on p_brand, p_container, p_size, and a second multicolumn index on l_partkey, l_quantity, l_shipmode. Hmmm ... or maybe separate indexes, one on l_partkey and one on l_quantity, l_shipmode & l_shipinstruct. Test both configurations.

Mind you, if this is also an OLTP table, then you'd want to test those multi-column indexes to determine the fewest columns you need for the indexes still to be used, since more columns = more index maintenance.

--
Josh Berkus
Aglio Database Solutions
San Francisco
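Spelled out, the two configurations Josh describes would be created roughly like this (the index names are invented for the example):

-- Configuration 1: two multicolumn indexes.
CREATE INDEX idx_part_bcs ON part (p_brand, p_container, p_size);
CREATE INDEX idx_li_pqm   ON lineitem (l_partkey, l_quantity, l_shipmode);

-- Configuration 2: separate lineitem indexes.
CREATE INDEX idx_li_pk  ON lineitem (l_partkey);
CREATE INDEX idx_li_qmi ON lineitem (l_quantity, l_shipmode, l_shipinstruct);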
Josh,

I declared all the indexes that you suggested and ran VACUUM FULL ANALYZE. The query plan has not changed and it's still trying to use a seqscan. I tried to disable seqscan, but the plan didn't change. Any other suggestions?
I started EXPLAIN ANALYZE on the query, but I doubt it will finish any time soon.
Thanks.

Oleg

-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Thursday, October 02, 2003 11:27 PM
To: Oleg Lebedev; scott.marlowe
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] TPC-R benchmarks

[Josh's reply snipped; it repeats the previous message verbatim.]
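A quick sanity check at this point (nobody in the thread asks for it, so treat it as an aside): confirm the new indexes really exist as intended, via the standard pg_indexes view:

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('part', 'lineitem')
ORDER BY tablename, indexname;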
Oleg,

> I declared all the indexes that you suggested and ran vacuum full
> analyze. The query plan has not changed and it's still trying to use
> seqscan. I tried to disable seqscan, but the plan didn't change. Any
> other suggestions?
> I started explain analyze on the query, but I doubt it will finish any
> time soon.

Can I get a copy of the database so that I can tinker? I'm curious now, plus I want our benchmarks to look good.

I have a private FTP if that helps.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh,

My data directory is 3.8 GB. I can send you the flat data files and the scripts to create the indices, but that would still be about 1.3 GB of data. Do you still want me to transfer the data to you? If yes, then just give me your FTP address.
Thanks.

Oleg

-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Friday, October 03, 2003 11:22 AM
To: Oleg Lebedev; scott.marlowe
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] TPC-R benchmarks

[Josh's reply snipped; it repeats the previous message verbatim.]
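As an aside (not raised in the thread), a rough per-relation size breakdown can help decide what is actually worth shipping; relpages counts 8 kB blocks and is only as fresh as the last VACUUM/ANALYZE:

-- Largest tables and indexes by page count (1 page = 8 kB by default).
SELECT relname, relkind, relpages
FROM pg_class
WHERE relkind IN ('r', 'i')
ORDER BY relpages DESC
LIMIT 10;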
On Thu, 2003-09-25 at 14:32, Jenny Zhang wrote:
> I am running TPC-H with scale factor of 1 on RedHat7.2 with the kernel
> 2.5.74. Q17 can always finish in about 7 seconds on my system. The
> execution plan is:

I just want to point out that we at the OSDL are not running official TPC-anything. We have fair-use implementations of these benchmarks, but because of the differences our performance results cannot be compared with official TPC benchmark results.

[Quoted execution plan and the rest of the quoted thread snipped; they
repeat earlier messages verbatim.]

--
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton OR, 97005
(503)-626-2455 x11 (office) (503)-702-2871 (cell) (503)-626-2436 (fax)
Tom,

I've found the problem with TPC-R query #19. And it, unfortunately, appears to be a problem in the PostgreSQL query planner.

To sum up the below: it appears that whenever a set of WHERE conditions exceeds a certain level of complexity, the planner just ignores all applicable indexes and goes for a seq scan. While this may be unavoidable to some degree, it seems to me that we need to raise the threshold of complexity at which it does this.

tpcr=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3 20030226 (prerelease) (SuSE Linux)
(1 row)

I've tested a number of indexes on the query, and found the two most efficient on subsets of the query. Thus:

explain analyze
select sum(l_extendedprice * (1 - l_discount)) as revenue
from lineitem, part
where (
	p_partkey = l_partkey
	and p_brand = 'Brand#33'
	and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
	and l_quantity >= 8 and l_quantity <= 8 + 10
	and p_size between 1 and 5
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
);

Aggregate  (cost=10380.70..10380.70 rows=1 width=30) (actual time=161.61..161.61 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..10380.67 rows=13 width=30) (actual time=81.54..161.47 rows=17 loops=1)
        ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=62 width=4) (actual time=81.21..137.24 rows=98 loops=1)
              Index Cond: (p_brand = 'Brand#33'::bpchar)
              Filter: (((p_container = 'SM CASE'::bpchar) OR (p_container = 'SM BOX'::bpchar) OR (p_container = 'SM PACK'::bpchar) OR (p_container = 'SM PKG'::bpchar)) AND (p_size >= 1) AND (p_size <= 5))
        ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.84 rows=1 width=26) (actual time=0.22..0.24 rows=0 loops=98)
              Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND (lineitem.l_quantity >= 8::numeric) AND (lineitem.l_quantity <= 18::numeric))
              Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
Total runtime: 161.71 msec

This also works for a similar query:

explain analyze
select sum(l_extendedprice * (1 - l_discount)) as revenue
from lineitem, part
where (
	p_partkey = l_partkey
	and p_brand = 'Brand#52'
	and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
	and l_quantity >= 14 and l_quantity <= 14 + 10
	and p_size between 1 and 10
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
);

Aggregate  (cost=11449.36..11449.36 rows=1 width=30) (actual time=195.72..195.72 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..11449.29 rows=28 width=30) (actual time=56.42..195.39 rows=48 loops=1)
        ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=139 width=4) (actual time=56.15..153.17 rows=166 loops=1)
              Index Cond: (p_brand = 'Brand#52'::bpchar)
              Filter: (((p_container = 'MED BAG'::bpchar) OR (p_container = 'MED BOX'::bpchar) OR (p_container = 'MED PKG'::bpchar) OR (p_container = 'MED PACK'::bpchar)) AND (p_size >= 1) AND (p_size <= 10))
        ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.29 rows=1 width=26) (actual time=0.23..0.25 rows=0 loops=166)
              Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND (lineitem.l_quantity >= 14::numeric) AND (lineitem.l_quantity <= 24::numeric))
              Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
Total runtime: 195.82 msec
(9 rows)

If, however, I combine the two WHERE clauses with an OR, the planner gets confused and insists on loading the entire tables into memory (even though I don't have that much memory):

explain
select sum(l_extendedprice * (1 - l_discount)) as revenue
from lineitem, part
where (
	p_partkey = l_partkey
	and p_brand = 'Brand#33'
	and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
	and l_quantity >= 8 and l_quantity <= 8 + 10
	and p_size between 1 and 5
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
) or (
	p_partkey = l_partkey
	and p_brand = 'Brand#52'
	and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
	and l_quantity >= 14 and l_quantity <= 14 + 10
	and p_size between 1 and 10
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
);

Aggregate  (cost=488301096525.25..488301096525.25 rows=1 width=146)
  ->  Nested Loop  (cost=0.00..488301096525.15 rows=42 width=146)
        Join Filter: ((("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'SM CASE'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
          OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM CASE'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
          OR [... 14 more nearly identical AND-blocks, one for each
              remaining combination of l_shipmode ('AIR', 'AIR REG') and
              p_container ('SM BOX', 'SM PACK', 'SM PKG', 'MED BAG',
              'MED BOX', 'MED PKG', 'MED PACK'), with the Brand#52
              blocks using l_quantity 14..24 and p_size 1..10 ...])
        ->  Seq Scan on lineitem  (cost=0.00..235620.15 rows=6001215 width=95)
        ->  Seq Scan on part  (cost=0.00..7367.00 rows=200000 width=51)

You'll pardon me for not doing an ANALYZE, but I didn't want to wait overnight. Manually disabling seqscan and nestloop did nothing to affect this query plan; neither did removing the aggregate. Tomorrow I will test 7.4 Beta 4.

How can we fix this?

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
> To sum up the below: it appears that whenever a set of WHERE conditions
> exceeds a certain level of complexity, the planner just ignores all
> applicable indexes and goes for a seq scan.

It looks to me like the planner is coercing the WHERE clause into canonical OR-of-ANDs form (DNF). Which is often a good heuristic but it seems unhelpful for this query.

> How can we fix this?

Feel free to propose improvements to the heuristics in src/backend/optimizer/prep/prepqual.c ...

			regards, tom lane
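One workaround worth noting, though no one proposes it in the thread (and TPC rules may forbid rewriting the benchmark query, so treat this as a diagnostic sketch only): hand the planner the disjunction as a UNION ALL of the two already-fast AND-blocks. Because the blocks select different p_brand values they are disjoint, so UNION ALL cannot double-count:

-- Each branch is one of the original OR-blocks; the branches are
-- disjoint on p_brand, so summing their per-branch sums is safe.
select sum(revenue) as revenue
from (
	select sum(l_extendedprice * (1 - l_discount)) as revenue
	from lineitem, part
	where p_partkey = l_partkey
	  and p_brand = 'Brand#33'
	  and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
	  and l_quantity between 8 and 8 + 10
	  and p_size between 1 and 5
	  and l_shipmode in ('AIR', 'AIR REG')
	  and l_shipinstruct = 'DELIVER IN PERSON'
	union all
	select sum(l_extendedprice * (1 - l_discount))
	from lineitem, part
	where p_partkey = l_partkey
	  and p_brand = 'Brand#52'
	  and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
	  and l_quantity between 14 and 14 + 10
	  and p_size between 1 and 10
	  and l_shipmode in ('AIR', 'AIR REG')
	  and l_shipinstruct = 'DELIVER IN PERSON'
) as both_blocks;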