Thread: Performance regarding LIKE searches

Performance regarding LIKE searches

From
randalls@bioinfo.wsu.edu
Date:
Hi,

I am querying a Postgresql 8.3 database table that has approximately 22 million records.  The (explain analyze) query
islisted below: 

gdr_gbrowse_live=> explain analyze SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand FROM feature as f, name
asn WHERE (n.id=f.id AND lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' AND n.display_name>0); 
                                                                   QUERY PLAN
                        

------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.01..5899.93 rows=734 width=884) (actual time=0.033..0.033 rows=0 loops=1)
   ->  Index Scan using name_name_lower_pattern_ops_idx on name n  (cost=0.01..9.53 rows=734 width=4) (actual
time=0.032..0.032rows=0 loops=1) 
         Index Cond: ((lower((name)::text) ~>=~ 'Scaffold:scaffold'::text) AND (lower((name)::text) ~<~
'Scaffold:scaffole'::text))
         Filter: ((display_name > 0) AND (lower((name)::text) ~~ 'Scaffold:scaffold_163:1000..1199%'::text))
   ->  Index Scan using feature_pkey on feature f  (cost=0.00..8.01 rows=1 width=884) (never executed)
         Index Cond: (f.id = n.id)
 Total runtime: 0.119 ms
(7 rows)

I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting.  This is very fast
andperforms like I would expect.  However, when my application, GBrowse, access the database, I see in my slow query
logthis: 

2010-03-29 09:34:38.083
PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45
PDT,4/118607,0,LOG,00000,"duration:21467.467 ms  execute dbdpg_p25965_9: SELECT
f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
  FROM feature as f, name as n
  WHERE (n.id=f.id AND lower(n.name) LIKE $1)

","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,,

GBrowse is a perl based application.  Looking at the duration for this query is around 21 seconds.  That is a bit long.
Does anyone have any ideas why the query duration is so different? 

Randall Svancara
Systems Administrator/DBA/Developer
Main Bioinformatics Laboratory



Re: Performance regarding LIKE searches

From
Tom Lane
Date:
randalls@bioinfo.wsu.edu writes:
> I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting.  This is very
fastand performs like I would expect.  However, when my application, GBrowse, access the database, I see in my slow
querylog this: 

> 2010-03-29 09:34:38.083
PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45
PDT,4/118607,0,LOG,00000,"duration:21467.467 ms  execute dbdpg_p25965_9: SELECT
f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
>   FROM feature as f, name as n
>   WHERE (n.id=f.id AND lower(n.name) LIKE $1)

> ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,,

> GBrowse is a perl based application.  Looking at the duration for this query is around 21 seconds.  That is a bit
long. Does anyone have any ideas why the query duration is so different? 

You're not going to get an index optimization when the LIKE pattern
isn't a constant (and left-anchored, but this is).

It is possible to get the planner to treat a query parameter as a
constant (implying a re-plan on each execution instead of having a
cached plan).  I believe what you have to do at the moment is use
unnamed rather than named prepared statements.  The practicality of
this would depend a lot on your client-side software stack, which
you didn't mention.

            regards, tom lane

Re: Performance regarding LIKE searches

From
randalls@bioinfo.wsu.edu
Date:
Tom,

We are using perl 5.10 with postgresql DBD.  Can you point me in the right direction in terms of unamed and named
preparedstatements? 

Thanks,

Randall Svancara
Systems Administrator/DBA/Developer
Main Bioinformatics Laboratory



----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: randalls@bioinfo.wsu.edu
Cc: pgsql-performance@postgresql.org
Sent: Monday, March 29, 2010 10:00:03 AM
Subject: Re: [PERFORM] Performance regarding LIKE searches

randalls@bioinfo.wsu.edu writes:
> I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting.  This is very
fastand performs like I would expect.  However, when my application, GBrowse, access the database, I see in my slow
querylog this: 

> 2010-03-29 09:34:38.083
PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45
PDT,4/118607,0,LOG,00000,"duration:21467.467 ms  execute dbdpg_p25965_9: SELECT
f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
>   FROM feature as f, name as n
>   WHERE (n.id=f.id AND lower(n.name) LIKE $1)

> ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,,

> GBrowse is a perl based application.  Looking at the duration for this query is around 21 seconds.  That is a bit
long. Does anyone have any ideas why the query duration is so different? 

You're not going to get an index optimization when the LIKE pattern
isn't a constant (and left-anchored, but this is).

It is possible to get the planner to treat a query parameter as a
constant (implying a re-plan on each execution instead of having a
cached plan).  I believe what you have to do at the moment is use
unnamed rather than named prepared statements.  The practicality of
this would depend a lot on your client-side software stack, which
you didn't mention.

            regards, tom lane

Re: Performance regarding LIKE searches

From
Andy Colson
Date:
On 3/29/2010 12:23 PM, randalls@bioinfo.wsu.edu wrote:
> Tom,
>
> We are using perl 5.10 with postgresql DBD.  Can you point me in the right direction in terms of unamed and named
preparedstatements? 
>
> Thanks,
>
> Randall Svancara
> Systems Administrator/DBA/Developer
> Main Bioinformatics Laboratory
>
>
>
> ----- Original Message -----
> From: "Tom Lane"<tgl@sss.pgh.pa.us>
> To: randalls@bioinfo.wsu.edu
> Cc: pgsql-performance@postgresql.org
> Sent: Monday, March 29, 2010 10:00:03 AM
> Subject: Re: [PERFORM] Performance regarding LIKE searches
>
> randalls@bioinfo.wsu.edu writes:
>> I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting.  This is very
fastand performs like I would expect.  However, when my application, GBrowse, access the database, I see in my slow
querylog this: 
>
>> 2010-03-29 09:34:38.083
PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45
PDT,4/118607,0,LOG,00000,"duration:21467.467 ms  execute dbdpg_p25965_9: SELECT
f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
>>    FROM feature as f, name as n
>>    WHERE (n.id=f.id AND lower(n.name) LIKE $1)
>
>> ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,,
>
>> GBrowse is a perl based application.  Looking at the duration for this query is around 21 seconds.  That is a bit
long. Does anyone have any ideas why the query duration is so different? 
>
> You're not going to get an index optimization when the LIKE pattern
> isn't a constant (and left-anchored, but this is).
>
> It is possible to get the planner to treat a query parameter as a
> constant (implying a re-plan on each execution instead of having a
> cached plan).  I believe what you have to do at the moment is use
> unnamed rather than named prepared statements.  The practicality of
> this would depend a lot on your client-side software stack, which
> you didn't mention.
>
>             regards, tom lane
>

I'm just going to guess, but DBD::Pg can do "real prepare" or "fake
prepare".

It does "real" by default.  Try setting:
$dbh->{pg_server_prepare} = 0;

before you prepare/run that statement and see if it makes a difference.

http://search.cpan.org/dist/DBD-Pg/Pg.pm#prepare


-Andy


Re: Performance regarding LIKE searches

From
Matthew Wakeling
Date:
On Mon, 29 Mar 2010, randalls@bioinfo.wsu.edu wrote:
> WHERE ... lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' ...

I'm sure you noticed that this is never going to return any rows?

Matthew

--
 Me... a skeptic?  I trust you have proof?