Re: index problem - Mailing list pgsql-sql

From CoL
Subject Re: index problem
Date
Msg-id 3BCD9582.1070804@mportal.hu
Whole thread Raw
In response to Re: index problem  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-sql
Hi,

Stephan Szabo wrote:

> On Tue, 16 Oct 2001, CoL wrote:
> 
> 
>>---------------------------
>>The 2 table query, where prog_data has ~8800 rowsn and index on prog_id:
>>bash-2.04$ time echo "explain select distinct 
>>prog_id,prog_ftype,prog_fcasthour,prog_date from prog_dgy_xy,prog_data 
>>where pxygy_pid=prog_id " | psql -Uuser db
>>NOTICE:  QUERY PLAN:
>>
>>Unique  (cost=7432549.69..7680455.07 rows=2479054 width=32)
>>   ->  Sort  (cost=7432549.69..7432549.69 rows=24790538 width=32)
>>         ->  Merge Join  (cost=148864.65..161189.33 rows=24790538 width=32)
>>               ->  Index Scan using prog_data_pkey on prog_data 
>>(cost=0.00..701.12 rows=8872 width=28)
>>               ->  Sort  (cost=148864.65..148864.65 rows=921013 width=4)
>>                     ->  Seq Scan on prog_dgy_xy  (cost=0.00..30145.13 
>>rows=921013 width=4)
>>
> 
> I'm guessing that the approximately 25 million row estimate on the join
> has to be wrong as well given that prog_data.prog_id should be unique.
> 
> Hmm, does the explain change if you vacuum analyze the other table
> (prog_data)?  If not, what does explain show if you do a
> set enable_seqscan='off';
> before it?

The result:
db=>set enable_seqscan='off';
db=>explain select distinct prog_id,prog_ftype,prog_fcasthour,prog_date 
from prog_dgy_xy,prog_data  where pxygy_pid=prog_id;
NOTICE:  QUERY PLAN:

Unique  (cost=7606982.10..7854887.48 rows=2479054 width=32)  ->  Sort  (cost=7606982.10..7606982.10 rows=24790538
width=32)       ->  Merge Join  (cost=0.00..335621.73 rows=24790538 width=32)              ->  Index Scan using
progdgyxy_idx2on prog_dgy_xy 
 
(cost=0.00..323297.05 rows=921013 width=4)              ->  Index Scan using prog_data_pkey on prog_data 
(cost=0.00..701.12 rows=8872 width=28)

It "seems" index is used, but the same result :(((, and bigger execution 
time: real   3m41.830s

What is in tables?
prog_data contains unique id and other info.
prog_dgy_xy contains that id with x,y coordinates (so many ids from 
prog_data with unique x,y)
#prog_data:
#prog_id, prog_ftype, prog_fcasthour, prog_date
#1 
'type'        6                  2001-10-14 12:00:00
#2 
'type'        12              2001-10-14 12:00:00
#prog_dgy_xy:
#pxygy_pid, pxygy_x, pxygy_y
#1     0.1       0.1
#1          0.1       0.15

How can this query takes real    0m1.755s for mysql, [17 sec for 
oracle], and 2-3 minutes!! for postgres?

And why:
POSTGRES:
set enable_seqscan ='off'; select count(*) from prog_dgy_xy where 
pxygy_pid<13161; count
-------- 900029
real    2m34.340s
explain:
Aggregate  (cost=327896.89..327896.89 rows=1 width=0)  ->  Index Scan using progdgyxy_idx2 on prog_dgy_xy 
(cost=0.00..325594.54 rows=920940 width=0)


MYSQL:
select count(pxygy_pid) from PROG_DGY_XY where pxygy_pid<13161
count(pxygy_pid)
900029
real    0m27.878s
explain:
table   type    possible_keys   key     key_len ref     rows    Extra
PROG_DGY_XY     range   progdgyxy_idx1,progdgyxy_idx2   progdgyxy_idx2 
4       NULL    906856  where used; Using index

The same time difference in case of: = or >, however explain says, cause 
seq scan is off, the index is used.
I did vacuum, and vacuum analyze too before.

PS: I think i have to make a site for that, cause there are many 
questions :), and weird things.
I love postgres but this makes me "hm?". Today i'll make these test 
under 7.1.2.
thx
CoL




pgsql-sql by date:

Previous
From: "Michael Richards"
Date:
Subject: Re: Indexes on functions
Next
From: "Henshall, Stuart - WCP"
Date:
Subject: Re: Deleting obsolete values