Thread: 7.0.2 and 6.5.3 performance

7.0.2 and 6.5.3 performance

From
lec
Date:
After upgrading to 7.0.2, one of my queries ran much slower on 7.0.2:

7.0.2  took 17 minutes  (After 'optimization' by SET ENABLE_SEQSCAN=OFF)

6.5.3 took 2 minutes

The following is the query/explain under 6.5.3:

super=> select version();
version
-------------------------------------------------------------------
PostgreSQL 6.5.3 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66
(1 row)

explain  select itm_section, itm_brand, itm_itemno, sum(stl_qty)
     from stl, item where stl_itemno = itm_itemno
        and stl_discounttype in ('','S','V')
        and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'
        and stl_status='N'
        and itm_sectiontype='O'
        and itm_section >='101'
        and itm_section <='135'
        group by itm_section, itm_brand, itm_itemno;
NOTICE:  QUERY PLAN:

Aggregate  (cost=25780.13 rows=55386 width=72)
  ->  Group  (cost=25780.13 rows=55386 width=72)
        ->  Sort  (cost=25780.13 rows=55386 width=72)
              ->  Hash Join  (cost=25780.13 rows=55386 width=72)
                    ->  Index Scan using stl_datetrx on stl
(cost=15488.75
rows=210658 width=36)
                    ->  Hash  (cost=3225.66 rows=2879 width=36)
                          ->  Seq Scan on item  (cost=3225.66 rows=2879
width=36)

Under 7.0.2, there are no Hash Join or Hash

Are there any way to optimize this type of query?  I have SET
ENABLE_SEQSCAN=OFF
and increased the buffer size.  Without SEQSCAN=OFF the query took even
longer than
17 minutes.


Thanks,
Thomas.






Re: 7.0.2 and 6.5.3 performance

From
Tom Lane
Date:
lec <englim@pc.jaring.my> writes:
> Under 7.0.2, there are no Hash Join or Hash

So, then, what *does* it do?  Without the explain for 7.0.2 it's
impossible to tell what's happening.

> Are there any way to optimize this type of query?

Um, have you vacuum analyzed the tables involved?

            regards, tom lane

Re: 7.0.2 and 6.5.3 performance

From
lec
Date:
Tom Lane wrote:

> lec <englim@pc.jaring.my> writes:
> > Under 7.0.2, there are no Hash Join or Hash
>
> So, then, what *does* it do?  Without the explain for 7.0.2 it's
> impossible to tell what's happening.

I'll re-install 7.0.2 later & run the explain.

>
>
> > Are there any way to optimize this type of query?
>
> Um, have you vacuum analyzed the tables involved?

Yes.





Re: 7.0.2 and 6.5.3 performance

From
lec
Date:
Tom Lane wrote:

> lec <englim@pc.jaring.my> writes:
> > Under 7.0.2, there are no Hash Join or Hash
>
> So, then, what *does* it do?  Without the explain for 7.0.2 it's
> impossible to tell what's happening.

========================================================
The following is the query/explain under 7.0.2:

super=> select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 7.0.2 on i686-pc-linux-gnu, compiled by gcc egcs-2.91.66
(1 row)


set enable_seqscan=on;

psql:explainsum.sql:48: NOTICE:  QUERY PLAN:

Aggregate  (cost=27770.89..27770.90 rows=0 width=72)
  ->  Group  (cost=27770.89..27770.89 rows=1 width=72)
        ->  Sort  (cost=27770.89..27770.89 rows=1 width=72)
              ->  Nested Loop  (cost=0.00..27770.88 rows=1 width=72)
                    ->  Seq Scan on item  (cost=0.00..2549.45 rows=1
width=36)
                    ->  Index Scan using stl_datetrx on stl
(cost=0.00..25157.34 rows=5127 width=36)


set enable_seqscan=off;

psql:explainsum.sql:62: NOTICE:  QUERY PLAN:

Aggregate  (cost=42806.95..42806.96 rows=0 width=72)
  ->  Group  (cost=42806.95..42806.96 rows=1 width=72)
        ->  Sort  (cost=42806.95..42806.95 rows=1 width=72)
              ->  Nested Loop  (cost=0.00..42806.94 rows=1 width=72)
                    ->  Index Scan using item_pkey on item
(cost=0.00..17585.52 rows=1 width=36)
                    ->  Index Scan using stl_datetrx on stl
(cost=0.00..25157.34 rows=5127 width=36)



========================================================
The following is the query/explain under 6.5.3:

super=> select version();
version
-------------------------------------------------------------------
PostgreSQL 6.5.3 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66
(1 row)

explain  select itm_section, itm_brand, itm_itemno, sum(stl_qty)
     from stl, item where stl_itemno = itm_itemno
        and stl_discounttype in ('','S','V')
        and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'
        and stl_status='N'
        and itm_sectiontype='O'
        and itm_section >='101'
        and itm_section <='135'
        group by itm_section, itm_brand, itm_itemno;
NOTICE:  QUERY PLAN:

Aggregate  (cost=25780.13 rows=55386 width=72)
  ->  Group  (cost=25780.13 rows=55386 width=72)
        ->  Sort  (cost=25780.13 rows=55386 width=72)
              ->  Hash Join  (cost=25780.13 rows=55386 width=72)
                    ->  Index Scan using stl_datetrx on stl
(cost=15488.75
rows=210658 width=36)
                    ->  Hash  (cost=3225.66 rows=2879 width=36)
                          ->  Seq Scan on item  (cost=3225.66 rows=2879
width=36)


>
>
> > Are there any way to optimize this type of query?
>
> Um, have you vacuum analyzed the tables involved?

The tables were vacuumed before explain.

The time required for 7.0.2 is exponential to the amount of data being
queried whereas for 6.5.3, it is linear to the amount of data.


Thanks & regards,
Thomas.


Re: 7.0.2 and 6.5.3 performance

From
"Ross J. Reedstrom"
Date:
On Sat, Sep 16, 2000 at 12:42:30PM +0800, lec wrote:
> Tom Lane wrote:
> >
> > Um, have you vacuum analyzed the tables involved?
>
> The tables were vacuumed before explain.
>

Just to clairify: 'VACUUM'ed, or 'VACUUM ANALYZE'd? There's a
difference. A plain vacuum just reclaims space, the ANALYZE keyword is
needed to update the stats.


--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005

Re: 7.0.2 and 6.5.3 performance

From
lec
Date:
"Ross J. Reedstrom" wrote:

> On Sat, Sep 16, 2000 at 12:42:30PM +0800, lec wrote:
> > Tom Lane wrote:
> > >
> > > Um, have you vacuum analyzed the tables involved?
> >
> > The tables were vacuumed before explain.
> >
>
> Just to clairify: 'VACUUM'ed, or 'VACUUM ANALYZE'd? There's a
> difference. A plain vacuum just reclaims space, the ANALYZE keyword is
> needed to update the stats.

Yes, the tables were VACUUM ANALYZE'd.

Regards,
Thomas.


Re: 7.0.2 and 6.5.3 performance

From
Tom Lane
Date:
lec <englim@pc.jaring.my> writes:
> explain  select itm_section, itm_brand, itm_itemno, sum(stl_qty)
>      from stl, item where stl_itemno = itm_itemno
>         and stl_discounttype in ('','S','V')
>         and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'
>         and stl_status='N'
>         and itm_sectiontype='O'
>         and itm_section >='101'
>         and itm_section <='135'
>         group by itm_section, itm_brand, itm_itemno;

> The following is the query/explain under 7.0.2:
>    ->  Nested Loop  (cost=0.00..27770.88 rows=1 width=72)
>          ->  Seq Scan on item  (cost=0.00..2549.45 rows=1 width=36)
>          ->  Index Scan using stl_datetrx on stl (cost=0.00..25157.34 rows=5127 width=36)
>
> The following is the query/explain under 6.5.3:
>    ->  Hash Join  (cost=25780.13 rows=55386 width=72)
>          ->  Index Scan using stl_datetrx on stl (cost=15488.75 rows=210658 width=36)
>          ->  Hash  (cost=3225.66 rows=2879 width=36)
>                ->  Seq Scan on item  (cost=3225.66 rows=2879 width=36)

What's driving the difference in plan choice is evidently the much
smaller number of rows that 7.0 estimates will match the query
restrictions.  Specifically, it's figuring that only one item row
will match
        and itm_sectiontype='O'
        and itm_section >='101'
        and itm_section <='135'
whereas 6.5 estimates 2879.  With only one row to match, it figures it
might as well just do the join with a nested loop and not bother to set
up a hashtable.

Since you're complaining, I suppose this statistical estimate is way
off :-(.  What do you actually get from

     select count(*) from stl where
            stl_discounttype in ('','S','V')
        and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'
        and stl_status='N';

     select count(*) from item where
            itm_sectiontype='O'
        and itm_section >='101'
        and itm_section <='135';
?

I suspect the reason for the difference in estimated row counts is
that 7.0 recognizes that these are range queries; instead of treating
"stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'" as two
independent clauses, or "itm_section >='101' and itm_section <='135'"
as independent clauses, it combines those clauses and tries to estimate
the fraction of the data range being requested.  This is usually a win
but it's possible to lose rather badly if the requested range covers
a spike in the data's frequency distribution.  We don't yet have
adequate statistics to know that a spike may be involved...

            regards, tom lane

Re: 7.0.2 and 6.5.3 performance

From
lec
Date:
Tom Lane wrote:

> Since you're complaining, I suppose this statistical estimate is way
> off :-(.  What do you actually get from
>
>      select count(*) from stl where
>             stl_discounttype in ('','S','V')
>         and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'
>         and stl_status='N';

7678

>
>
>      select count(*) from item where
>             itm_sectiontype='O'
>         and itm_section >='101'
>         and itm_section <='135';
> ?
>

9973

Table stl is quite big.  Eg.  if I run

     select count(*) from stl where
           stl_discounttype in ('','S','V')
       and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 31, 2000'
       and stl_status='N';

The count is 157,997.

     select count(*) from stl;

The count is 1,935,135.


Thanks & regards,
Thomas.