Thread: 7.0.2 and 6.5.3 performance
After upgrading to 7.0.2, one of my queries ran much slower on 7.0.2: 7.0.2 took 17 minutes (After 'optimization' by SET ENABLE_SEQSCAN=OFF) 6.5.3 took 2 minutes The following is the query/explain under 6.5.3: super=> select version(); version ------------------------------------------------------------------- PostgreSQL 6.5.3 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66 (1 row) explain select itm_section, itm_brand, itm_itemno, sum(stl_qty) from stl, item where stl_itemno = itm_itemno and stl_discounttype in ('','S','V') and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000' and stl_status='N' and itm_sectiontype='O' and itm_section >='101' and itm_section <='135' group by itm_section, itm_brand, itm_itemno; NOTICE: QUERY PLAN: Aggregate (cost=25780.13 rows=55386 width=72) -> Group (cost=25780.13 rows=55386 width=72) -> Sort (cost=25780.13 rows=55386 width=72) -> Hash Join (cost=25780.13 rows=55386 width=72) -> Index Scan using stl_datetrx on stl (cost=15488.75 rows=210658 width=36) -> Hash (cost=3225.66 rows=2879 width=36) -> Seq Scan on item (cost=3225.66 rows=2879 width=36) Under 7.0.2, there are no Hash Join or Hash Are there any way to optimize this type of query? I have SET ENABLE_SEQSCAN=OFF and increased the buffer size. Without SEQSCAN=OFF the query took even longer than 17 minutes. Thanks, Thomas.
lec <englim@pc.jaring.my> writes: > Under 7.0.2, there are no Hash Join or Hash So, then, what *does* it do? Without the explain for 7.0.2 it's impossible to tell what's happening. > Are there any way to optimize this type of query? Um, have you vacuum analyzed the tables involved? regards, tom lane
Tom Lane wrote: > lec <englim@pc.jaring.my> writes: > > Under 7.0.2, there are no Hash Join or Hash > > So, then, what *does* it do? Without the explain for 7.0.2 it's > impossible to tell what's happening. I'll re-install 7.0.2 later & run the explain. > > > > Are there any way to optimize this type of query? > > Um, have you vacuum analyzed the tables involved? Yes.
Tom Lane wrote: > lec <englim@pc.jaring.my> writes: > > Under 7.0.2, there are no Hash Join or Hash > > So, then, what *does* it do? Without the explain for 7.0.2 it's > impossible to tell what's happening. ======================================================== The following is the query/explain under 7.0.2: super=> select version(); version --------------------------------------------------------------------- PostgreSQL 7.0.2 on i686-pc-linux-gnu, compiled by gcc egcs-2.91.66 (1 row) set enable_seqscan=on; psql:explainsum.sql:48: NOTICE: QUERY PLAN: Aggregate (cost=27770.89..27770.90 rows=0 width=72) -> Group (cost=27770.89..27770.89 rows=1 width=72) -> Sort (cost=27770.89..27770.89 rows=1 width=72) -> Nested Loop (cost=0.00..27770.88 rows=1 width=72) -> Seq Scan on item (cost=0.00..2549.45 rows=1 width=36) -> Index Scan using stl_datetrx on stl (cost=0.00..25157.34 rows=5127 width=36) set enable_seqscan=off; psql:explainsum.sql:62: NOTICE: QUERY PLAN: Aggregate (cost=42806.95..42806.96 rows=0 width=72) -> Group (cost=42806.95..42806.96 rows=1 width=72) -> Sort (cost=42806.95..42806.95 rows=1 width=72) -> Nested Loop (cost=0.00..42806.94 rows=1 width=72) -> Index Scan using item_pkey on item (cost=0.00..17585.52 rows=1 width=36) -> Index Scan using stl_datetrx on stl (cost=0.00..25157.34 rows=5127 width=36) ======================================================== The following is the query/explain under 6.5.3: super=> select version(); version ------------------------------------------------------------------- PostgreSQL 6.5.3 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66 (1 row) explain select itm_section, itm_brand, itm_itemno, sum(stl_qty) from stl, item where stl_itemno = itm_itemno and stl_discounttype in ('','S','V') and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000' and stl_status='N' and itm_sectiontype='O' and itm_section >='101' and itm_section <='135' group by itm_section, itm_brand, itm_itemno; NOTICE: QUERY PLAN: Aggregate (cost=25780.13 rows=55386 width=72) -> Group (cost=25780.13 rows=55386 width=72) -> Sort (cost=25780.13 rows=55386 width=72) -> Hash Join (cost=25780.13 rows=55386 width=72) -> Index Scan using stl_datetrx on stl (cost=15488.75 rows=210658 width=36) -> Hash (cost=3225.66 rows=2879 width=36) -> Seq Scan on item (cost=3225.66 rows=2879 width=36) > > > > Are there any way to optimize this type of query? > > Um, have you vacuum analyzed the tables involved? The tables were vacuumed before explain. The time required for 7.0.2 is exponential to the amount of data being queried whereas for 6.5.3, it is linear to the amount of data. Thanks & regards, Thomas.
On Sat, Sep 16, 2000 at 12:42:30PM +0800, lec wrote: > Tom Lane wrote: > > > > Um, have you vacuum analyzed the tables involved? > > The tables were vacuumed before explain. > Just to clairify: 'VACUUM'ed, or 'VACUUM ANALYZE'd? There's a difference. A plain vacuum just reclaims space, the ANALYZE keyword is needed to update the stats. -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
"Ross J. Reedstrom" wrote: > On Sat, Sep 16, 2000 at 12:42:30PM +0800, lec wrote: > > Tom Lane wrote: > > > > > > Um, have you vacuum analyzed the tables involved? > > > > The tables were vacuumed before explain. > > > > Just to clairify: 'VACUUM'ed, or 'VACUUM ANALYZE'd? There's a > difference. A plain vacuum just reclaims space, the ANALYZE keyword is > needed to update the stats. Yes, the tables were VACUUM ANALYZE'd. Regards, Thomas.
lec <englim@pc.jaring.my> writes: > explain select itm_section, itm_brand, itm_itemno, sum(stl_qty) > from stl, item where stl_itemno = itm_itemno > and stl_discounttype in ('','S','V') > and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000' > and stl_status='N' > and itm_sectiontype='O' > and itm_section >='101' > and itm_section <='135' > group by itm_section, itm_brand, itm_itemno; > The following is the query/explain under 7.0.2: > -> Nested Loop (cost=0.00..27770.88 rows=1 width=72) > -> Seq Scan on item (cost=0.00..2549.45 rows=1 width=36) > -> Index Scan using stl_datetrx on stl (cost=0.00..25157.34 rows=5127 width=36) > > The following is the query/explain under 6.5.3: > -> Hash Join (cost=25780.13 rows=55386 width=72) > -> Index Scan using stl_datetrx on stl (cost=15488.75 rows=210658 width=36) > -> Hash (cost=3225.66 rows=2879 width=36) > -> Seq Scan on item (cost=3225.66 rows=2879 width=36) What's driving the difference in plan choice is evidently the much smaller number of rows that 7.0 estimates will match the query restrictions. Specifically, it's figuring that only one item row will match and itm_sectiontype='O' and itm_section >='101' and itm_section <='135' whereas 6.5 estimates 2879. With only one row to match, it figures it might as well just do the join with a nested loop and not bother to set up a hashtable. Since you're complaining, I suppose this statistical estimate is way off :-(. What do you actually get from select count(*) from stl where stl_discounttype in ('','S','V') and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000' and stl_status='N'; select count(*) from item where itm_sectiontype='O' and itm_section >='101' and itm_section <='135'; ? I suspect the reason for the difference in estimated row counts is that 7.0 recognizes that these are range queries; instead of treating "stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'" as two independent clauses, or "itm_section >='101' and itm_section <='135'" as independent clauses, it combines those clauses and tries to estimate the fraction of the data range being requested. This is usually a win but it's possible to lose rather badly if the requested range covers a spike in the data's frequency distribution. We don't yet have adequate statistics to know that a spike may be involved... regards, tom lane
Tom Lane wrote: > Since you're complaining, I suppose this statistical estimate is way > off :-(. What do you actually get from > > select count(*) from stl where > stl_discounttype in ('','S','V') > and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000' > and stl_status='N'; 7678 > > > select count(*) from item where > itm_sectiontype='O' > and itm_section >='101' > and itm_section <='135'; > ? > 9973 Table stl is quite big. Eg. if I run select count(*) from stl where stl_discounttype in ('','S','V') and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 31, 2000' and stl_status='N'; The count is 157,997. select count(*) from stl; The count is 1,935,135. Thanks & regards, Thomas.