Re: suggestions on improving a query - Mailing list pgsql-general

From Adam Rich
Subject Re: suggestions on improving a query
Date
Msg-id 00bf01c74fe1$e6eb1370$6400a8c0@dualcore
In response to suggestions on improving a query  (Rajarshi Guha <rguha@indiana.edu>)
Responses Re: suggestions on improving a query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
This line:

Index Scan using plp_total_idx on dockscore_plp
(cost=0.00..16733229.92 rows=4669988 width=80)
(actual time=98.323..322537.605 rows=25197 loops=1)

means the planner chose this plan because it estimated the index scan
would return nearly 5 million rows.  In fact, it returned only about
25,000.

Have these tables been vacuumed & analyzed recently?  Your first step
should be to vacuum & analyze these, and send us the new "explain
analyze".
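
For example (plain ANALYZE is enough if the tables don't need
vacuuming, but VACUUM ANALYZE does both in one pass):

    VACUUM ANALYZE dock;
    VACUUM ANALYZE dockscore_plp;
    VACUUM ANALYZE pubchem_compound;

Then re-run the original query under EXPLAIN ANALYZE and post the new
plan.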



-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rajarshi Guha
Sent: Monday, February 12, 2007 2:24 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] suggestions on improving a query


Hi, I have a query that involves 3 tables.

    select pubchem_compound.openeye_can_smiles,
pubchem_compound.nist_inchi, dock.cid, dockscore_plp.*
    from dock, dockscore_plp, pubchem_compound
    where
    dock.target = '1YC1' and
    dock.dockid = dockscore_plp.id  and
    dock.cid = pubchem_compound.cid
    order by dockscore_plp.total
    limit 10;

The output of explain analyze is

 Limit  (cost=0.00..387.36 rows=10 width=297) (actual
time=242977.644..462748.215 rows=10 loops=1)
   ->  Nested Loop  (cost=0.00..37325186.12 rows=963575 width=297)
(actual time=242977.638..462748.175 rows=10 loops=1)
         ->  Nested Loop  (cost=0.00..31523550.51 rows=963575 width=90)
(actual time=242900.629..461810.902 rows=10 loops=1)
               ->  Index Scan using plp_total_idx on dockscore_plp
(cost=0.00..16733229.92 rows=4669988 width=80) (actual
time=98.323..322537.605 rows=25197 loops=1)
               ->  Index Scan using dock_pkey on dock  (cost=0.00..3.15
rows=1 width=18) (actual time=5.521..5.521 rows=0 loops=25197)
                     Index Cond: (dock.dockid = "outer".id)
                     Filter: (target = '1YC1'::text)
         ->  Index Scan using pubchem_compound_pkey on pubchem_compound
(cost=0.00..6.01 rows=1 width=216) (actual time=93.699..93.704 rows=1
loops=10)
               Index Cond: (("outer".cid)::text =
(pubchem_compound.cid)::text)
 Total runtime: 462748.439 ms
(10 rows)

Now, the tables 'dock' and 'dockscore_plp' have 4.6M rows and
'pubchem_compound' has 10M rows.

However the clause:

    dock.target = '1YC1' and
    dock.dockid = dockscore_plp.id

reduces the number of rows from 4.6M to 96K. I had figured that after
this the query would be very fast. But the explain analyze seems to
indicate that the dockscore_plp table is being scanned in sorted order
(via the index plp_total_idx) in its entirety. This would then be the
bottleneck.

Is this a correct interpretation?

What I expected was that the sort would be applied to the 96K subset of
dockscore_plp, rather than the whole table.

Is it possible to restructure the query such that the sort is done on
96K rows rather than 4.6M rows?
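
For example (just an untested sketch; I don't know whether the planner
would actually apply the filter before the sort here), something along
the lines of:

    select pc.openeye_can_smiles, pc.nist_inchi, sub.*
    from (select dock.cid, dockscore_plp.*
          from dock, dockscore_plp
          where dock.target = '1YC1'
            and dock.dockid = dockscore_plp.id
          order by dockscore_plp.total
          limit 10) sub,
         pubchem_compound pc
    where sub.cid = pc.cid;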

Thanks,


-------------------------------------------------------------------
Rajarshi Guha <rguha@indiana.edu>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
"I'd love to go out with you, but my favorite commercial is on TV."




