Re: sequence scan problem - Mailing list pgsql-performance

From John Beaver
Subject Re: sequence scan problem
Date
Msg-id 4868BC74.6020006@gmail.com
Whole thread Raw
In response to Re: sequence scan problem  (Jeremy Harris <jgh@wizmail.org>)
Responses Re: sequence scan problem
List pgsql-performance
Ok, here's the explain analyze result. Again, this is Postgres 8.3.3 and
I vacuumed-analyzed both tables directly after they were created.


# explain analyze select fls.function_verified, fls.score,
fls.go_category_group_ref, fs1.gene_ref, fs1.function_verified_exactly,
fs2.gene_ref, fs2.function_verified_exactly from
functional_linkage_scores fls, gene_prediction_view fs1,
gene_prediction_view fs2 where fls.gene_ref1 = fs1.gene_ref and
fls.gene_ref2 = fs2.gene_ref and fs1.go_term_ref = 2 and fs2.go_term_ref
= 2;

QUERY
PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1399203593.41..6702491234.74 rows=352770803726
width=22) (actual time=6370194.467..22991303.434 rows=15610535128 loops=1)
   Merge Cond: (fs2.gene_ref = fls.gene_ref2)
   ->  Index Scan using gene_prediction_view_gene_ref on
gene_prediction_view fs2  (cost=0.00..12111899.77 rows=197899 width=5)
(actual time=29.592..469838.583 rows=180629 loops=1)
         Index Cond: (go_term_ref = 2)
   ->  Materialize  (cost=1399069432.20..1483728633.52 rows=6772736105
width=21) (actual time=6370164.864..16623552.417 rows=15610535121 loops=1)
         ->  Sort  (cost=1399069432.20..1416001272.47 rows=6772736105
width=21) (actual time=6370164.860..13081970.248 rows=1897946790 loops=1)
               Sort Key: fls.gene_ref2
               Sort Method:  external merge  Disk: 61192240kB
               ->  Merge Join  (cost=40681244.97..154286110.62
rows=6772736105 width=21) (actual time=592112.778..2043161.851
rows=1897946790 loops=1)
                     Merge Cond: (fs1.gene_ref = fls.gene_ref1)
                     ->  Index Scan using gene_prediction_view_gene_ref
on gene_prediction_view fs1  (cost=0.00..12111899.77 rows=197899
width=5) (actual time=0.015..246613.129 rows=180644 loops=1)
                           Index Cond: (go_term_ref = 2)
                     ->  Materialize  (cost=40586010.10..43490582.70
rows=232365808 width=20) (actual time=592112.755..1121366.375
rows=1897946783 loops=1)
                           ->  Sort  (cost=40586010.10..41166924.62
rows=232365808 width=20) (actual time=592112.721..870349.308
rows=232241678 loops=1)
                                 Sort Key: fls.gene_ref1
                                 Sort Method:  external merge  Disk:
7260856kB
                                 ->  Seq Scan on
functional_linkage_scores fls  (cost=0.00..3928457.08 rows=232365808
width=20) (actual time=14.221..86455.902 rows=232241678 loops=1)
 Total runtime: 24183346.271 ms
(18 rows)





Jeremy Harris wrote:
> John Beaver wrote:
>> I'm having a strange problem with a query. The query is fairly
>> simple, with a few constants and two joins. All relevant columns
>> should be indexed, and I'm pretty sure there aren't any type
>> conversion issues. But the query plan includes a fairly heavy seq
>> scan. The only possible complication is that the tables involved are
>> fairly large - hundreds of millions of rows each.
>>
>> Can anyone explain this? There should only ever be a maximum of about
>> 50 rows returned when the query is executed.
>
> You didn't say when you last vacuumed?
> If there should only be 50 rows returned then the estimates from the
> planner are way out.
>
> If that doesn't help, we'll need version info, and (if you can afford
> the time) an "explain analyze"
>
> Cheers,
>  Jeremy
>

pgsql-performance by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Planner should use index on a LIKE 'foo%' query
Next
From: Moritz Onken
Date:
Subject: Re: Planner should use index on a LIKE 'foo%' query