Thread: Performance regarding LIKE searches
Hi, I am querying a Postgresql 8.3 database table that has approximately 22 million records. The (explain analyze) query islisted below: gdr_gbrowse_live=> explain analyze SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand FROM feature as f, name asn WHERE (n.id=f.id AND lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' AND n.display_name>0); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.01..5899.93 rows=734 width=884) (actual time=0.033..0.033 rows=0 loops=1) -> Index Scan using name_name_lower_pattern_ops_idx on name n (cost=0.01..9.53 rows=734 width=4) (actual time=0.032..0.032rows=0 loops=1) Index Cond: ((lower((name)::text) ~>=~ 'Scaffold:scaffold'::text) AND (lower((name)::text) ~<~ 'Scaffold:scaffole'::text)) Filter: ((display_name > 0) AND (lower((name)::text) ~~ 'Scaffold:scaffold_163:1000..1199%'::text)) -> Index Scan using feature_pkey on feature f (cost=0.00..8.01 rows=1 width=884) (never executed) Index Cond: (f.id = n.id) Total runtime: 0.119 ms (7 rows) I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting. This is very fast andperforms like I would expect. However, when my application, GBrowse, access the database, I see in my slow query logthis: 2010-03-29 09:34:38.083 PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45 PDT,4/118607,0,LOG,00000,"duration:21467.467 ms execute dbdpg_p25965_9: SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand FROM feature as f, name as n WHERE (n.id=f.id AND lower(n.name) LIKE $1) ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,, GBrowse is a perl based application. Looking at the duration for this query is around 21 seconds. That is a bit long. Does anyone have any ideas why the query duration is so different? Randall Svancara Systems Administrator/DBA/Developer Main Bioinformatics Laboratory
randalls@bioinfo.wsu.edu writes: > I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting. This is very fastand performs like I would expect. However, when my application, GBrowse, access the database, I see in my slow querylog this: > 2010-03-29 09:34:38.083 PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45 PDT,4/118607,0,LOG,00000,"duration:21467.467 ms execute dbdpg_p25965_9: SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand > FROM feature as f, name as n > WHERE (n.id=f.id AND lower(n.name) LIKE $1) > ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,, > GBrowse is a perl based application. Looking at the duration for this query is around 21 seconds. That is a bit long. Does anyone have any ideas why the query duration is so different? You're not going to get an index optimization when the LIKE pattern isn't a constant (and left-anchored, but this is). It is possible to get the planner to treat a query parameter as a constant (implying a re-plan on each execution instead of having a cached plan). I believe what you have to do at the moment is use unnamed rather than named prepared statements. The practicality of this would depend a lot on your client-side software stack, which you didn't mention. regards, tom lane
Tom, We are using perl 5.10 with postgresql DBD. Can you point me in the right direction in terms of unamed and named preparedstatements? Thanks, Randall Svancara Systems Administrator/DBA/Developer Main Bioinformatics Laboratory ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: randalls@bioinfo.wsu.edu Cc: pgsql-performance@postgresql.org Sent: Monday, March 29, 2010 10:00:03 AM Subject: Re: [PERFORM] Performance regarding LIKE searches randalls@bioinfo.wsu.edu writes: > I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting. This is very fastand performs like I would expect. However, when my application, GBrowse, access the database, I see in my slow querylog this: > 2010-03-29 09:34:38.083 PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45 PDT,4/118607,0,LOG,00000,"duration:21467.467 ms execute dbdpg_p25965_9: SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand > FROM feature as f, name as n > WHERE (n.id=f.id AND lower(n.name) LIKE $1) > ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,, > GBrowse is a perl based application. Looking at the duration for this query is around 21 seconds. That is a bit long. Does anyone have any ideas why the query duration is so different? You're not going to get an index optimization when the LIKE pattern isn't a constant (and left-anchored, but this is). It is possible to get the planner to treat a query parameter as a constant (implying a re-plan on each execution instead of having a cached plan). I believe what you have to do at the moment is use unnamed rather than named prepared statements. The practicality of this would depend a lot on your client-side software stack, which you didn't mention. regards, tom lane
On 3/29/2010 12:23 PM, randalls@bioinfo.wsu.edu wrote: > Tom, > > We are using perl 5.10 with postgresql DBD. Can you point me in the right direction in terms of unamed and named preparedstatements? > > Thanks, > > Randall Svancara > Systems Administrator/DBA/Developer > Main Bioinformatics Laboratory > > > > ----- Original Message ----- > From: "Tom Lane"<tgl@sss.pgh.pa.us> > To: randalls@bioinfo.wsu.edu > Cc: pgsql-performance@postgresql.org > Sent: Monday, March 29, 2010 10:00:03 AM > Subject: Re: [PERFORM] Performance regarding LIKE searches > > randalls@bioinfo.wsu.edu writes: >> I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting. This is very fastand performs like I would expect. However, when my application, GBrowse, access the database, I see in my slow querylog this: > >> 2010-03-29 09:34:38.083 PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45 PDT,4/118607,0,LOG,00000,"duration:21467.467 ms execute dbdpg_p25965_9: SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand >> FROM feature as f, name as n >> WHERE (n.id=f.id AND lower(n.name) LIKE $1) > >> ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,, > >> GBrowse is a perl based application. Looking at the duration for this query is around 21 seconds. That is a bit long. Does anyone have any ideas why the query duration is so different? > > You're not going to get an index optimization when the LIKE pattern > isn't a constant (and left-anchored, but this is). > > It is possible to get the planner to treat a query parameter as a > constant (implying a re-plan on each execution instead of having a > cached plan). I believe what you have to do at the moment is use > unnamed rather than named prepared statements. The practicality of > this would depend a lot on your client-side software stack, which > you didn't mention. > > regards, tom lane > I'm just going to guess, but DBD::Pg can do "real prepare" or "fake prepare". It does "real" by default. Try setting: $dbh->{pg_server_prepare} = 0; before you prepare/run that statement and see if it makes a difference. http://search.cpan.org/dist/DBD-Pg/Pg.pm#prepare -Andy
On Mon, 29 Mar 2010, randalls@bioinfo.wsu.edu wrote: > WHERE ... lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' ... I'm sure you noticed that this is never going to return any rows? Matthew -- Me... a skeptic? I trust you have proof?