Re: index problem - Mailing list pgsql-sql

From Stephan Szabo
Subject Re: index problem
Date
Msg-id Pine.BSF.4.21.0110171037560.22689-100000@megazone23.bigpanda.com
Whole thread Raw
In response to index problem  (Szabo Zoltan <col@mportal.hu>)
List pgsql-sql
> > Hmm, does the explain change if you vacuum analyze the other table
> > (prog_data)?  If not, what does explain show if you do a
> > set enable_seqscan='off';
> > before it?

Did you do the vacuum analyze on the other table (prog_data) as well?
It seems to be overestimating the number of joined rows, and I wonder
if it would choose a different plan if it had the correct number.

> The result:
> db=>set enable_seqscan='off';
> db=>explain select distinct prog_id,prog_ftype,prog_fcasthour,prog_date 
> from prog_dgy_xy,prog_data  where pxygy_pid=prog_id;
> NOTICE:  QUERY PLAN:
> 
> Unique  (cost=7606982.10..7854887.48 rows=2479054 width=32)
>    ->  Sort  (cost=7606982.10..7606982.10 rows=24790538 width=32)
>          ->  Merge Join  (cost=0.00..335621.73 rows=24790538 width=32)
>                ->  Index Scan using progdgyxy_idx2 on prog_dgy_xy 
> (cost=0.00..323297.05 rows=921013 width=4)
>                ->  Index Scan using prog_data_pkey on prog_data 
> (cost=0.00..701.12 rows=8872 width=28)
> 
> It "seems" index is used, but the same result :(((, and bigger execution 
> time: real   3m41.830s

Well, that means the plan it chose before was better, so enable_seqscan
isn't a win here.

> And why:
> POSTGRES:
> set enable_seqscan ='off'; select count(*) from prog_dgy_xy where 
> pxygy_pid<13161;
>   count
> --------
>   900029
> real    2m34.340s
> explain:
> Aggregate  (cost=327896.89..327896.89 rows=1 width=0)
>    ->  Index Scan using progdgyxy_idx2 on prog_dgy_xy 
> (cost=0.00..325594.54 rows=920940 width=0)

It's estimating the entire table will be seen (or most of it anyway),
so it would choose Seq Scan as faster, but you've basically disallowed
that with the enable_seqscan='off'.  Is it faster without the explicit
hint (it probably will be).  Index Scans are not always better than
Sequence Scans (especially when traversing most of the table as in the
above) and you don't want to use the enable_* unless it actually
is giving you a performance increase.



pgsql-sql by date:

Previous
From: Joel Burton
Date:
Subject: Re: Multiple Parameters to an Aggregate Function
Next
From: Joel Burton
Date:
Subject: Re: nvl() function