Thread: Regex query not using index
I'm running a simple query on 8.2. With this syntax, Explain indicate that the index is scanned: select * from eod where name = 'AA' However, when I change the query to use simple regex: select * from eod where name ~ 'AA' now Explain indicates a seq scan: Index Scan using equity_eod_symbol_idx on equity_eod (cost=0.00..8.27 rows=1 width=149) Index Cond: ((symbol)::text = 'AA'::text) Is there any way to 'encourage' Postgres to hit the index when using regex? Do I need to create a functional index or something? Without the index in play, I really can't use regex on any of my larger tables.
On Feb 19, 2008, at 9:32 PM, Postgres User wrote: > I'm running a simple query on 8.2. With this syntax, Explain indicate > that the index is scanned: > select * from eod where name = 'AA' > > However, when I change the query to use simple regex: > select * from eod where name ~ 'AA' > > now Explain indicates a seq scan: > Index Scan using equity_eod_symbol_idx on equity_eod (cost=0.00..8.27 > rows=1 width=149) > Index Cond: ((symbol)::text = 'AA'::text) > > Is there any way to 'encourage' Postgres to hit the index when using > regex? Do I need to create a functional index or something? > Without the index in play, I really can't use regex on any of my > larger tables. You need it to be anchored: select * from eod where name ~ '^AA'; If you're looking to be able to use indexes for searches within a string then, for 8.2, you'll need to check out tsearch2. Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Thanks, my dumb mistake. I need to perform the equivalent of a WHERE clause OR expression using regex to match exact strings. _________________________________________________________________________ this example hits the index: select * from eod where name ~ '^BA$' but when I try to add another possible value to the regex, it does a row scan: select * from eod where name ~ ^BA$|^AA$' both of these statements return the right results, but the 2nd ignores the index even though both values are left-anchored. any workaround- this behavior doesn't seem to make sense On Feb 19, 2008 8:45 PM, Erik Jones <erik@myemma.com> wrote: > > On Feb 19, 2008, at 9:32 PM, Postgres User wrote: > > > I'm running a simple query on 8.2. With this syntax, Explain indicate > > that the index is scanned: > > select * from eod where name = 'AA' > > > > However, when I change the query to use simple regex: > > select * from eod where name ~ 'AA' > > > > now Explain indicates a seq scan: > > Index Scan using equity_eod_symbol_idx on equity_eod (cost=0.00..8.27 > > rows=1 width=149) > > Index Cond: ((symbol)::text = 'AA'::text) > > > > Is there any way to 'encourage' Postgres to hit the index when using > > regex? Do I need to create a functional index or something? > > Without the index in play, I really can't use regex on any of my > > larger tables. > > You need it to be anchored: > > select * from eod where name ~ '^AA'; > > If you're looking to be able to use indexes for searches within a > string then, for 8.2, you'll need to check out tsearch2. > > Erik Jones > > DBA | Emma(R) > erik@myemma.com > 800.595.4401 or 615.292.5888 > 615.292.0777 (fax) > > Emma helps organizations everywhere communicate & market in style. > Visit us online at http://www.myemma.com > > > >
Postgres User wrote: > Thanks, my dumb mistake. > I need to perform the equivalent of a WHERE clause OR expression using > regex to match exact strings. > > _________________________________________________________________________ > > this example hits the index: > select * from eod where name ~ '^BA$' > > but when I try to add another possible value to the regex, it does a row scan: > select * from eod where name ~ ^BA$|^AA$' > > both of these statements return the right results, but the 2nd ignores > the index even though both values are left-anchored. > > any workaround- this behavior doesn't seem to make sense try changing it to select * from eod where (name ~ '^BA$' or name ~ '^AA$') though in this example they should both be name = 'XX' rather than regex'es. -- Postgresql & php tutorials http://www.designmagick.com/
Yes that works, but the whole point of the exercise is replace many OR statements with 1 regex expression. So it's not what I'm looking for. On Feb 19, 2008 9:16 PM, Chris <dmagick@gmail.com> wrote: > Postgres User wrote: > > Thanks, my dumb mistake. > > I need to perform the equivalent of a WHERE clause OR expression using > > regex to match exact strings. > > > > _________________________________________________________________________ > > > > this example hits the index: > > select * from eod where name ~ '^BA$' > > > > but when I try to add another possible value to the regex, it does a row scan: > > select * from eod where name ~ ^BA$|^AA$' > > > > both of these statements return the right results, but the 2nd ignores > > the index even though both values are left-anchored. > > > > any workaround- this behavior doesn't seem to make sense > > try changing it to > > select * from eod where (name ~ '^BA$' or name ~ '^AA$') > > though in this example they should both be name = 'XX' rather than regex'es. > > -- > Postgresql & php tutorials > http://www.designmagick.com/ >
Postgres User wrote: > Yes that works, but the whole point of the exercise is replace many OR > statements with 1 regex expression. So it's not what I'm looking for. Why do you want it done this way? You can build an array of strings to check and use an in clause. Using php : $checks = array('AA', 'BA'); $query = "select * from table where name in ('" . implode("','", $checks) . "')"; and it should use an index (up to a point anyway). -- Postgresql & php tutorials http://www.designmagick.com/
im trying to allow the client to pass a varchar param into my function, and want to avoid any parsing of the parameter inside the function, or code to build a sql string. if the function can use this code, it will be compiled and optimized (unlike a dynamic sql stirng) select * from mytable where fielda ~ p_param (where p_param is the input parameter) On Feb 19, 2008 9:34 PM, Chris <dmagick@gmail.com> wrote: > Postgres User wrote: > > Yes that works, but the whole point of the exercise is replace many OR > > statements with 1 regex expression. So it's not what I'm looking for. > > Why do you want it done this way? > > You can build an array of strings to check and use an in clause. > > Using php : > > $checks = array('AA', 'BA'); > > $query = "select * from table where name in ('" . implode("','", > $checks) . "')"; > > and it should use an index (up to a point anyway). > > -- > > Postgresql & php tutorials > http://www.designmagick.com/ >
"Postgres User" <postgres.developer@gmail.com> writes: > Yes that works, but the whole point of the exercise is replace many OR > statements with 1 regex expression. So it's not what I'm looking for. Unfortunately, Postgres is not as intelligent as you are. There is no mechanism to rewrite a multi-branch regex condition into multiple indexscans. I recommend going back to the OR's. regards, tom lane
doh! tom, let me know if you decide to hack out a fix for this one of these nights ;) thanks for your help. On Feb 19, 2008 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Postgres User" <postgres.developer@gmail.com> writes: > > Yes that works, but the whole point of the exercise is replace many OR > > statements with 1 regex expression. So it's not what I'm looking for. > > Unfortunately, Postgres is not as intelligent as you are. There is > no mechanism to rewrite a multi-branch regex condition into multiple > indexscans. I recommend going back to the OR's. > > regards, tom lane >
by the way, your example works fine unless it's a null value or empty string unfortunately, postgres isn't smart enough to know that the when p_param below is null, that the WHERE condition can be ignored select * from table where name in (Coalesce(p_param, name)) which is the same as: select * from table where name in (name) postgres does a row scan on the above sql. too slow. On Feb 19, 2008 9:34 PM, Chris <dmagick@gmail.com> wrote: > Postgres User wrote: > > Yes that works, but the whole point of the exercise is replace many OR > > statements with 1 regex expression. So it's not what I'm looking for. > > Why do you want it done this way? > > You can build an array of strings to check and use an in clause. > > Using php : > > $checks = array('AA', 'BA'); > > $query = "select * from table where name in ('" . implode("','", > $checks) . "')"; > > and it should use an index (up to a point anyway). > > -- > > Postgresql & php tutorials > http://www.designmagick.com/ >
Postgres User wrote: > by the way, your example works fine unless it's a null value or empty string > unfortunately, postgres isn't smart enough to know that the when > p_param below is null, that the WHERE condition can be ignored > > select * from table where name in (Coalesce(p_param, name)) > > which is the same as: select * from table where name in (name) > > postgres does a row scan on the above sql. too slow. If there's no where condition, postgres has to do a seq-scan anyway so your argument is void :) -- Postgresql & php tutorials http://www.designmagick.com/
a final question: why does this syntax do a seq scan + filter: select * from tablea where fielda = fielda -or- select * from tablea where fielda in (fielda) while this syntax results in no filter, seq scan only select * from tablea where 1 = 1 it seems that both where clauses should be ignored by the optimizer- or am i missing something On Feb 19, 2008 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Postgres User" <postgres.developer@gmail.com> writes: > > Yes that works, but the whole point of the exercise is replace many OR > > statements with 1 regex expression. So it's not what I'm looking for. > > Unfortunately, Postgres is not as intelligent as you are. There is > no mechanism to rewrite a multi-branch regex condition into multiple > indexscans. I recommend going back to the OR's. > > regards, tom lane >
On Wed, Feb 20, 2008 at 12:56:54AM -0800, Postgres User wrote: > a final question: why does this syntax do a seq scan + filter: > > select * from tablea where fielda = fielda -or- select * from > tablea where fielda in (fielda) > > while this syntax results in no filter, seq scan only > > select * from tablea where 1 = 1 > > it seems that both where clauses should be ignored by the optimizer- > or am i missing something WHERE fielda = fielda will only match non-null rows... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
Attachment
Postgres User wrote: > im trying to allow the client to pass a varchar param into my > function, and want to avoid any parsing of the parameter inside the > function, or code to build a sql string. > > if the function can use this code, it will be compiled and optimized > (unlike a dynamic sql stirng) > > select * from mytable where fielda ~ p_param No, you should never let users specify raw regex. at best they can hog down your server. Regex is a state engine and you can create endless loops. Maybe we can see the overall picture of your query? Regards Tino
Tino, My users are developers and the goal was to accept a simple comma-delimited list of string values as a function's input parameter. The function would then parse this input param into a valid regex expression. I was trying to write a function that lets me avoid using Execute <string> and instead write in-line SQL with all the benefits of pre-compilation and optimization. Regex offers such a technique- IF it could understand regex that represented a set of logical ORs and do an index scan (my rule is to avoid seq-scans) An example of regex that allows you to use in-line SQL with a condition equivalent to many OR conditions when using basic comparison operators: select * from table1 where name ~ '.*' '^Smith$' |^Jones$': And this works very well- except for the seq scan instead of an index scan On Feb 20, 2008 2:31 AM, Tino Wildenhain <tino@wildenhain.de> wrote: > Postgres User wrote: > > im trying to allow the client to pass a varchar param into my > > function, and want to avoid any parsing of the parameter inside the > > function, or code to build a sql string. > > > > if the function can use this code, it will be compiled and optimized > > (unlike a dynamic sql stirng) > > > > select * from mytable where fielda ~ p_param > > No, you should never let users specify raw regex. at best they can > hog down your server. Regex is a state engine and you can create > endless loops. > > Maybe we can see the overall picture of your query? > > Regards > Tino >
"Postgres User" <postgres.developer@gmail.com> writes: > My users are developers and the goal was to accept a simple > comma-delimited list of string values as a function's input parameter. > The function would then parse this input param into a valid regex > expression. Why are you fixated on this being a regex? If you aren't actually trying to expose regex capabilities to the users, you'll just be having to suppress a bunch of strange behaviors for special characters. ISTM that the best solution is to use an array-of-text parameter, along the lines of where name = any (array['Smith', 'Jones', ...]) For what you're doing, you'd not actually want the array[] syntax, it would look more like where name = any ('{Smith,Jones}'::text[]) This should optimize into an indexscan in 8.2 or later. regards, tom lane
Tom, I was looking for another approach but didn't come across that array syntax in my searches (perhaps because it's newer. Thanks for a solution. Now to end my fixation, one last item. What about the case of a null or empty param value- is there a way to assign a condition value that Postgres will ignore when processing the query? This syntax results in a seq scan: WHERE fielda = Coalesce(param, fielda) because it applies only to non-nulls Is there another way to write this- perhaps using your array syntax on an empty array? Basically I'd PG to ignore the condition just as it ignores WHERE 1 = 1 On Wed, Feb 20, 2008 at 8:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Postgres User" <postgres.developer@gmail.com> writes: > > > My users are developers and the goal was to accept a simple > > comma-delimited list of string values as a function's input parameter. > > The function would then parse this input param into a valid regex > > expression. > > Why are you fixated on this being a regex? If you aren't actually > trying to expose regex capabilities to the users, you'll just be having > to suppress a bunch of strange behaviors for special characters. > > ISTM that the best solution is to use an array-of-text parameter, > along the lines of > > where name = any (array['Smith', 'Jones', ...]) > > For what you're doing, you'd not actually want the array[] syntax, > it would look more like > > where name = any ('{Smith,Jones}'::text[]) > > This should optimize into an indexscan in 8.2 or later. > > regards, tom lane >
On Feb 20, 2008, at 5:51 PM, Postgres User wrote: > Now to end my fixation, one last item. What about the case of a null > or empty param value- is there a way to assign a condition value that > Postgres will ignore when processing the query? > > This syntax results in a seq scan: WHERE fielda = Coalesce(param, > fielda) > because it applies only to non-nulls > > Is there another way to write this- perhaps using your array syntax on > an empty array? Basically I'd PG to ignore the condition just as it > ignores WHERE 1 = 1 Not sure whether you want no rows returned if param is NULL or all rows (all rows looking at your example), but you could simply check param for NULL before comparing it, so either: WHERE param IS NOT NULL AND fielda = param or WHERE param IS NULL OR fielda = param In the second case, if param IS NULL you will get a sequential scan of course, as that's the most efficient way to return all rows. > > On Wed, Feb 20, 2008 at 8:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Postgres User" <postgres.developer@gmail.com> writes: >> >>> My users are developers and the goal was to accept a simple >>> comma-delimited list of string values as a function's input >>> parameter. >>> The function would then parse this input param into a valid regex >>> expression. >> >> Why are you fixated on this being a regex? If you aren't actually >> trying to expose regex capabilities to the users, you'll just be >> having >> to suppress a bunch of strange behaviors for special characters. >> >> ISTM that the best solution is to use an array-of-text parameter, >> along the lines of >> >> where name = any (array['Smith', 'Jones', ...]) >> >> For what you're doing, you'd not actually want the array[] syntax, >> it would look more like >> >> where name = any ('{Smith,Jones}'::text[]) >> >> This should optimize into an indexscan in 8.2 or later. >> >> regards, tom lane >> > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > > > > Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,47c1522f233091890169212!