sequence scan problem - Mailing list pgsql-performance

From John Beaver
Subject sequence scan problem
Date
Msg-id 48680418.7030708@gmail.com
Whole thread Raw
Responses Re: sequence scan problem
Re: sequence scan problem
List pgsql-performance
I'm having a strange problem with a query. The query is fairly simple,
with a few constants and two joins. All relevant columns should be
indexed, and I'm pretty sure there aren't any type conversion issues.
But the query plan includes a fairly heavy seq scan. The only possible
complication is that the tables involved are fairly large - hundreds of
millions of rows each.

Can anyone explain this? There should only ever be a maximum of about 50
rows returned when the query is executed.

Query:

select fls.function_verified, fls.score, fls.go_category_group_ref,
fs1.gene_ref, fs1.function_verified_exactly, fs2.gene_ref,
fs2.function_verified_exactly from functional_linkage_scores fls,
gene_prediction_view fs1, gene_prediction_view fs2 where fls.gene_ref1 =
fs1.gene_ref and fls.gene_ref2 = fs2.gene_ref and fs1.go_term_ref = 2
and fs2.go_term_ref = 2

Explain on query:
Merge Join  (cost=1331863800.16..6629339921.15 rows=352770803726 width=22)
   Merge Cond: (fs2.gene_ref = fls.gene_ref2)
   ->  Index Scan using gene_prediction_view_gene_ref on
gene_prediction_view fs2  (cost=0.00..6235287.98 rows=197899 width=5)
         Index Cond: (go_term_ref = 2)
   ->  Materialize  (cost=1331794730.41..1416453931.72 rows=6772736105
width=21)
         ->  Sort  (cost=1331794730.41..1348726570.67 rows=6772736105
width=21)
               Sort Key: fls.gene_ref2
               ->  Merge Join  (cost=38762951.04..146537410.33
rows=6772736105 width=21)
                     Merge Cond: (fs1.gene_ref = fls.gene_ref1)
                     ->  Index Scan using gene_prediction_view_gene_ref
on gene_prediction_view fs1  (cost=0.00..6235287.98 rows=197899 width=5)
                           Index Cond: (go_term_ref = 2)
                     ->  Materialize  (cost=38713921.60..41618494.20
rows=232365808 width=20)
                           ->  Sort  (cost=38713921.60..39294836.12
rows=232365808 width=20)
                                 Sort Key: fls.gene_ref1
                                 ->  Seq Scan on
functional_linkage_scores fls  (cost=0.00..3928457.08 rows=232365808
width=20)


\d on functional_linkage_scores (232241678 rows):
                                    Table "public.functional_linkage_scores"
        Column         |     Type      |
Modifiers
-----------------------+---------------+------------------------------------------------------------------------
 id                    | integer       | not null default
nextval('functional_linkage_scores_id_seq'::regclass)
 gene_ref1             | integer       | not null
 gene_ref2             | integer       | not null
 function_verified     | boolean       | not null
 score                 | numeric(12,4) | not null
 go_category_group_ref | integer       | not null
 go_term_ref           | integer       |
Indexes:
    "functional_linkage_scores_pkey" PRIMARY KEY, btree (id)
    "functional_linkage_scores_gene_ref1_key" UNIQUE, btree (gene_ref1,
gene_ref2, go_category_group_ref, go_term_ref)
    "ix_functional_linkage_scores_gene_ref2" btree (gene_ref2)
Foreign-key constraints:
    "functional_linkage_scores_gene_ref1_fkey" FOREIGN KEY (gene_ref1)
REFERENCES genes(id)
    "functional_linkage_scores_gene_ref2_fkey" FOREIGN KEY (gene_ref2)
REFERENCES genes(id)
    "functional_linkage_scores_go_category_group_ref_fkey" FOREIGN KEY
(go_category_group_ref) REFERENCES go_category_groups(id)

\d on gene_prediction_view (568654245 rows):
                                              Table
"public.gene_prediction_view"
              Column              |          Type
|                             Modifiers

----------------------------------+------------------------+-------------------------------------------------------------------
 id                               | integer                | not null
default nextval('gene_prediction_view_id_seq'::regclass)
 gene_ref                         | integer                | not null
 go_term_ref                      | integer                | not null
 go_description                   | character varying(200) | not null
 go_category                      | character varying(50)  | not null
 function_verified_exactly        | boolean                | not null
 function_verified_with_parent_go | boolean                | not null
 score                            | numeric(12,4)          | not null
 prediction_method_ref            | integer                |
 functional_score_ref             | integer                |
Indexes:
    "gene_prediction_view_pkey" PRIMARY KEY, btree (id)
    "gene_prediction_view_functional_score_ref_key" UNIQUE, btree
(functional_score_ref)
    "gene_prediction_view_gene_ref" UNIQUE, btree (gene_ref,
go_term_ref, prediction_method_ref)
Foreign-key constraints:
    "gene_prediction_view_functional_score_ref_fkey" FOREIGN KEY
(functional_score_ref) REFERENCES functional_scores(id)
    "gene_prediction_view_gene_ref_fkey" FOREIGN KEY (gene_ref)
REFERENCES genes(id)
    "gene_prediction_view_go_term_ref_fkey" FOREIGN KEY (go_term_ref)
REFERENCES go_terms(term)

...and just in case someone can give advice on more aggressive settings
that might help out the planner for this particular comptuer...
This computer: Mac Pro / 4 gigs ram / software Raid 0 across two hard
drives.
Production computer: Xeon 3ghz / 32 gigs ram / Debian


pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: Sources of information about sizing of hardwares to run PostgreSQL
Next
From: Jeremy Harris
Date:
Subject: Re: sequence scan problem