Thread: Re: query efficiency - Can I speed it up?

Re: query efficiency - Can I speed it up?

From: ann hedley
Date:
Thanks but no.  I think it's because it's description ~ (a pattern match,
like LIKE) rather than description =, so it has to scan the whole string
in each description.

From the output below it seems to search the 'go' table once (cost 266),
then for each record found it searches the 'gotcha' table (cost 46809).
So it's the gotcha part of the query I need to speed up, yes?

Maybe I just need a clever bit of PHP that checks the estimated cost and,
if it's excessive, asks for a more specific search string?
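That cost-check idea could be sketched like this (in Python rather than PHP, and the helper names are made up for illustration): run EXPLAIN first, pull the planner's upper cost estimate off the top plan node, and only run the real query if it is under some threshold.

```python
import re

# Hypothetical helper, not from the thread: extract the planner's upper
# cost estimate from the first line of EXPLAIN output, e.g.
#   Nested Loop  (cost=0.00..187515.25 rows=203 width=54)
COST_RE = re.compile(r"cost=\d+\.\d+\.\.(\d+\.\d+)")

def top_cost(explain_line: str) -> float:
    """Return the upper cost estimate from the top node of an EXPLAIN plan."""
    m = COST_RE.search(explain_line)
    if m is None:
        raise ValueError("no cost estimate found in EXPLAIN output")
    return float(m.group(1))

def too_expensive(explain_line: str, limit: float = 100_000.0) -> bool:
    # If the estimate exceeds the limit, the app would ask the user
    # for a more specific search string instead of running the query.
    return top_cost(explain_line) > limit

print(too_expensive("Nested Loop  (cost=0.00..187515.25 rows=203 width=54)"))  # True
print(too_expensive("Nested Loop  (cost=0.00..47078.69 rows=51 width=54)"))    # False
```

Note the cost is in arbitrary planner units, not milliseconds, so the threshold would need tuning against plans you know are slow.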

Thanks

nemdb3=# explain analyse select pept_id,description,confidence from
gotcha inner join go on (gotcha.go_term=go.go_term) where go.description
like 'tRNA acetylation' and ( spid='ALP' or spid='ASP' or spid='DIP' or
spid='GPP' );
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..47078.69 rows=51 width=54) (actual time=6.266..6.407 rows=1 loops=1)
   ->  Seq Scan on go  (cost=0.00..266.50 rows=1 width=50) (actual time=6.104..6.164 rows=1 loops=1)
         Filter: (description ~~ 'tRNA acetylation'::text)
   ->  Index Scan using gotcha_go_term_sp_id on gotcha  (cost=0.00..46809.29 rows=232 width=32) (actual time=0.134..0.205 rows=1 loops=1)
         Index Cond: ((gotcha.go_term)::text = ("outer".go_term)::text)
         Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text = 'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text = 'GPP'::text))
 Total runtime: 6.493 ms
(7 rows)

nemdb3=# explain analyse select pept_id,description,confidence from
gotcha inner join go on (gotcha.go_term=go.go_term) where
go.description~'oxygen' and ( spid='ALP' or spid='ASP' or spid='DIP' or
spid='GPP' );

                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..187515.25 rows=203 width=54) (actual time=625.017..360545.979 rows=7401 loops=1)
   ->  Seq Scan on go  (cost=0.00..266.50 rows=4 width=50) (actual time=0.691..79.282 rows=111 loops=1)
         Filter: (description ~ 'oxygen'::text)
   ->  Index Scan using gotcha_go_term_sp_id on gotcha  (cost=0.00..46809.29 rows=232 width=32) (actual time=110.987..3246.825 rows=67 loops=111)
         Index Cond: ((gotcha.go_term)::text = ("outer".go_term)::text)
         Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text = 'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text = 'GPP'::text))
 Total runtime: 360572.482 ms
(7 rows)
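Reading the slow plan: EXPLAIN ANALYZE reports the inner node's actual time averaged per loop, and the index scan on gotcha runs once for each of the 111 rows the seq scan on go returns. A quick sanity check on the numbers above shows that scan accounts for almost the entire runtime:

```python
# "actual time" on the inner node is per loop, so total time in the
# gotcha index scan is per-loop time multiplied by the loop count.
per_loop_ms = 3246.825   # upper actual time of the gotcha index scan
loops = 111              # one loop per matching row found in go
inner_total_ms = per_loop_ms * loops
print(round(inner_total_ms))  # 360398 -- nearly all of the 360572 ms runtime
```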

Duncan Garland wrote:
> Does putting an index on description help?
>
> -----Original Message-----
> From: ann hedley [mailto:ann.hedley@ed.ac.uk]
> Sent: 31 January 2007 17:34
> To: pgsql-novice-owner@postgresql.org
> Subject: query efficiency - Can I speed it up?
>
>
> Dear All
>
> Can anyone tell me if/how I can speed up this query?  You could just
> point me at a good 'guide to building efficient queries' if I'm missing
> something obvious.  I've tried a few online sources and most of my
> queries are much faster now but this one ???
>
> Thanks
>
> nemdb3=# explain select pept_id,description,confidence from gotcha inner
> join go on (gotcha.go_term=go.go_term) where go.description~'oxygen' and
> ( spid='ALP' or spid='ASP' or spid='DIP' or spid='GPP' ) order by
> gotcha.confidence;
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=187523.03..187523.54 rows=203 width=54)
>    Sort Key: gotcha.confidence
>    ->  Nested Loop  (cost=0.00..187515.25 rows=203 width=54)
>          ->  Seq Scan on go  (cost=0.00..266.50 rows=4 width=50)
>                Filter: (description ~ 'oxygen'::text)
>          ->  Index Scan using gotcha_go_term_sp_id on gotcha  (cost=0.00..46809.29 rows=232 width=32)
>                Index Cond: ((gotcha.go_term)::text = ("outer".go_term)::text)
>                Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text = 'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text = 'GPP'::text))
> (8 rows)
>
> nemdb3=# \d go
>                   Table "public.go"
>      Column      |         Type          | Modifiers
> -----------------+-----------------------+-----------
>  go_term         | character varying(14) |
>  go_asp          | character varying(1)  |
>  description     | text                  |
>  hierarchy_level | integer               |
> Indexes:
>     "go_go_term" btree (go_term)
>
> nemdb3=# \d gotcha
>              Table "public.gotcha"
>    Column   |         Type          | Modifiers
> ------------+-----------------------+-----------
>  run_no     | character varying(20) |
>  pept_id    | character varying(15) |
>  contig     | integer               |
>  confidence | integer               |
>  go_term    | character varying(14) |
>  hits       | integer               |
>  spid       | character varying(5)  |
> Indexes:
>     "gotcha_go_term_sp_id" btree (go_term, spid)
>     "gotcha_pept_id" btree (pept_id)
>     "gotcha_spid" btree (spid)
>
> nemdb3=# select relname, relkind, reltuples, relpages from pg_class
> where relname like 'go%';
>         relname         | relkind |  reltuples  | relpages
> ------------------------+---------+-------------+----------
>  gotcha                 | r       | 2.70698e+07 |   695188
>  go                     | r       |       10600 |      134
>  gotcha_pept_id         | i       | 2.70698e+07 |   104266
>  go_go_term             | i       |       10600 |       43
>  gotcha_spid            | i       | 2.70698e+07 |    74225
>  gotcha_go_term_sp_id   | i       | 2.70698e+07 |   134201
>  (11 rows)
>
> nemdb3=#
>
>
>
>
> --
> Ann
>
> "In a world without walls and fences - who needs Windows and Gates ?"
>                                                            (unknown)


--
Ann

"In a world without walls and fences - who needs Windows and Gates ?"
                                                           (unknown)


Re: query efficiency - Can I speed it up?

From: Sean Davis
Date:
On Thursday 01 February 2007 12:01, ann hedley wrote:
> Thanks but no.  I think because it's description~ (like) rather than
> description= therefore it has to search the whole string in each
> description.
>
>  From the output below it seems to search the 'go' table once, cost 266,
> then for each record found searches the 'gotcha' table (cost 46809). So
> it's the gotcha part of the query I need to speed up, yes?
>
> Maybe I just need a clever bit of PHP that checks the cost and if it's
> excessive asks for a more specific search string?
>
> Thanks
>
> nemdb3=# explain analyse select pept_id,description,confidence from
> gotcha inner join go on (gotcha.go_term=go.go_term) where go.description
> like 'tRNA acetylation' and ( spid='ALP' or spid='ASP' or spid='DIP' or
> spid='GPP' );

Ann,

You might think about full-text searching of the descriptions.  It may be
quite a bit faster than a "like" search and worth a look.
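For what it's worth, a rough sketch of what full-text search could look like. This assumes the built-in text search of later PostgreSQL releases (on the 8.x versions current in 2007 it was the contrib tsearch2 module, with different function names); the tsvector column and index names here are made up:

```sql
-- Sketch only: add a precomputed tsvector column and a GIN index on it.
ALTER TABLE go ADD COLUMN description_tsv tsvector;
UPDATE go SET description_tsv = to_tsvector('english', description);
CREATE INDEX go_description_tsv ON go USING gin (description_tsv);

-- The filter then becomes an indexable full-text match:
SELECT pept_id, description, confidence
  FROM gotcha JOIN go USING (go_term)
 WHERE go.description_tsv @@ to_tsquery('english', 'oxygen')
   AND spid IN ('ALP', 'ASP', 'DIP', 'GPP');
```

A trigger would be needed to keep description_tsv current if descriptions ever change.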

Sean

Re: query efficiency - Can I speed it up?

From: Tom Lane
Date:
Sean Davis <sdavis2@mail.nih.gov> writes:
> You might think about full-text searching of the descriptions.  It may be
> quite a bit faster than a "like" search.  It may be worth looking.

It's not the LIKE search that's her problem though --- the time is going
into fetching the matching rows from the other table.

If the "go" table becomes large then it'd be time to worry about that,
but right now it's only costing her 80 msec to seqscan it...
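One thing that might cut that row-fetching time (my assumption, not something suggested in the thread) is physically ordering gotcha by the index the plan uses, so the ~67 matching rows per go_term sit on neighbouring pages instead of 67 scattered ones:

```sql
-- Sketch only: CLUSTER rewrites the table and takes an exclusive lock
-- while it runs, and the ordering decays as new rows are inserted.
CLUSTER gotcha_go_term_sp_id ON gotcha;   -- pre-8.3 syntax
ANALYZE gotcha;                           -- refresh planner statistics
```

On 8.3 and later the syntax is CLUSTER gotcha USING gotcha_go_term_sp_id.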

            regards, tom lane

Re: query efficiency - Can I speed it up?

From: Sean Davis
Date:
On Thursday 01 February 2007 13:28, Tom Lane wrote:
> Sean Davis <sdavis2@mail.nih.gov> writes:
> > You might think about full-text searching of the descriptions.  It may be
> > quite a bit faster than a "like" search.  It may be worth looking.
>
> It's not the LIKE search that's her problem though --- the time is going
> into fetching the matching rows from the other table.
>
> If the "go" table becomes large then it'd be time to worry about that,
> but right now it's only costing her 80 msec to seqscan it...

Oops.  Thanks.

Sean