Re: suggestions on improving a query - Mailing list pgsql-general
From: Adam Rich
Subject: Re: suggestions on improving a query
Msg-id: 00bf01c74fe1$e6eb1370$6400a8c0@dualcore
In response to: suggestions on improving a query (Rajarshi Guha <rguha@indiana.edu>)
Responses: Re: suggestions on improving a query
List: pgsql-general
This line:

    Index Scan using plp_total_idx on dockscore_plp  (cost=0.00..16733229.92 rows=4669988 width=80) (actual time=98.323..322537.605 rows=25197 loops=1)

means the planner did what it did because it estimated there would be nearly 5 million rows; in fact, there were only about 25,000. Have these tables been vacuumed and analyzed recently? Your first step should be to vacuum and analyze them and then send us the new "explain analyze" output (a sketch of those commands follows after the quoted message).

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rajarshi Guha
Sent: Monday, February 12, 2007 2:24 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] suggestions on improving a query

Hi, I have a query that involves 3 tables. The query is:

    select pubchem_compound.openeye_can_smiles,
           pubchem_compound.nist_inchi,
           dock.cid,
           dockscore_plp.*
    from dock, dockscore_plp, pubchem_compound
    where dock.target = '1YC1'
      and dock.dockid = dockscore_plp.id
      and dock.cid = pubchem_compound.cid
    order by dockscore_plp.total
    limit 10;

The output of explain analyze is:

    Limit  (cost=0.00..387.36 rows=10 width=297) (actual time=242977.644..462748.215 rows=10 loops=1)
      ->  Nested Loop  (cost=0.00..37325186.12 rows=963575 width=297) (actual time=242977.638..462748.175 rows=10 loops=1)
            ->  Nested Loop  (cost=0.00..31523550.51 rows=963575 width=90) (actual time=242900.629..461810.902 rows=10 loops=1)
                  ->  Index Scan using plp_total_idx on dockscore_plp  (cost=0.00..16733229.92 rows=4669988 width=80) (actual time=98.323..322537.605 rows=25197 loops=1)
                  ->  Index Scan using dock_pkey on dock  (cost=0.00..3.15 rows=1 width=18) (actual time=5.521..5.521 rows=0 loops=25197)
                        Index Cond: (dock.dockid = "outer".id)
                        Filter: (target = '1YC1'::text)
            ->  Index Scan using pubchem_compound_pkey on pubchem_compound  (cost=0.00..6.01 rows=1 width=216) (actual time=93.699..93.704 rows=1 loops=10)
                  Index Cond: (("outer".cid)::text = (pubchem_compound.cid)::text)
    Total runtime: 462748.439 ms
    (10 rows)

Now, the tables 'dock' and 'dockscore_plp' have 4.6M rows and 'pubchem_compound' has 10M rows. However, the clause

    dock.target = '1YC1' and dock.dockid = dockscore_plp.id

reduces the number of rows from 4.6M to 96K. I had figured that after this the query would be very fast, but the explain analyze seems to indicate that the dockscore_plp table is being read in its entirety, in sort order, via the index plp_total_idx, and that this is the bottleneck. Is this a correct interpretation?

What I expected was that the sort would be applied to the 96K-row subset of dockscore_plp rather than to the whole table. Is it possible to restructure the query so that the sort is done on 96K rows rather than on 4.6M rows? (A possible rewrite is sketched below, after this message.)

Thanks,

-------------------------------------------------------------------
Rajarshi Guha <rguha@indiana.edu>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
"I'd love to go out with you, but my favorite commercial is on TV."

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
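The vacuum-and-analyze suggestion above is routine maintenance; a minimal sketch, assuming only the three table names from the quoted query, would be:

    -- Refresh planner statistics so row estimates match reality.
    VACUUM ANALYZE dock;
    VACUUM ANALYZE dockscore_plp;
    VACUUM ANALYZE pubchem_compound;

    -- Then capture a fresh plan to post back to the list.
    EXPLAIN ANALYZE
    select pubchem_compound.openeye_can_smiles,
           pubchem_compound.nist_inchi,
           dock.cid,
           dockscore_plp.*
    from dock, dockscore_plp, pubchem_compound
    where dock.target = '1YC1'
      and dock.dockid = dockscore_plp.id
      and dock.cid = pubchem_compound.cid
    order by dockscore_plp.total
    limit 10;

The gap between the estimated ~4.7M rows and the actual 25,197 in the posted plan is exactly the kind of error that fresh statistics usually correct.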
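On the restructuring question at the end of the quoted message: one possible rewrite, not taken from the original thread and using a hypothetical subquery alias top10, is to join dock to dockscore_plp first, sort and limit that small filtered set, and only then look the ten surviving compounds up in pubchem_compound:

    -- Sketch only: sort/limit the filtered join, then fetch compound data.
    select pubchem_compound.openeye_can_smiles,
           pubchem_compound.nist_inchi,
           top10.*
    from (
        select dock.cid, dockscore_plp.*
        from dock, dockscore_plp
        where dock.target = '1YC1'
          and dock.dockid = dockscore_plp.id
        order by dockscore_plp.total
        limit 10
    ) as top10
    join pubchem_compound on pubchem_compound.cid = top10.cid
    order by top10.total;

This assumes every cid returned by the inner query has a matching row in pubchem_compound; otherwise the inner limit can drop rows the original query would have kept. Whether the planner really sorts only the ~96K matching rows still depends on its estimates, so the vacuum/analyze step above comes first. Separately, the ::text casts in the posted plan's Index Cond suggest dock.cid and pubchem_compound.cid may not share a datatype, which is the situation the list TIP at the end of the message warns about.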