Thread: Regex query not using index

Regex query not using index

From
"Postgres User"
Date:
I'm running a simple query on 8.2.  With this syntax, Explain indicate
that the index is scanned:
select * from eod where name = 'AA'

However, when I change the query to use simple regex:
select * from eod where name ~ 'AA'

now Explain indicates a seq scan:
Index Scan using equity_eod_symbol_idx on equity_eod  (cost=0.00..8.27
rows=1 width=149)
Index Cond: ((symbol)::text = 'AA'::text)

Is there any way to 'encourage' Postgres to hit the index when using
regex?  Do I need to create a functional index or something?
Without the index in play, I really can't use regex on any of my larger tables.

Re: Regex query not using index

From
Erik Jones
Date:
On Feb 19, 2008, at 9:32 PM, Postgres User wrote:

> I'm running a simple query on 8.2.  With this syntax, Explain indicate
> that the index is scanned:
> select * from eod where name = 'AA'
>
> However, when I change the query to use simple regex:
> select * from eod where name ~ 'AA'
>
> now Explain indicates a seq scan:
> Index Scan using equity_eod_symbol_idx on equity_eod  (cost=0.00..8.27
> rows=1 width=149)
> Index Cond: ((symbol)::text = 'AA'::text)
>
> Is there any way to 'encourage' Postgres to hit the index when using
> regex?  Do I need to create a functional index or something?
> Without the index in play, I really can't use regex on any of my
> larger tables.

You need it to be anchored:

select * from eod where name ~ '^AA';

If you're looking to be able to use indexes for searches within a
string then, for 8.2, you'll need to check out tsearch2.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: Regex query not using index

From
"Postgres User"
Date:
Thanks, my dumb mistake.
I need to perform the equivalent of a WHERE clause OR expression using
regex to match exact strings.

_________________________________________________________________________

this example hits the index:
select * from eod where name ~ '^BA$'

but when I try to add another possible value to the regex, it does a row scan:
select * from eod where name ~ ^BA$|^AA$'

both of these statements return the right results, but the 2nd ignores
the index even though both values are left-anchored.

any workaround- this behavior doesn't seem to make sense

On Feb 19, 2008 8:45 PM, Erik Jones <erik@myemma.com> wrote:
>
> On Feb 19, 2008, at 9:32 PM, Postgres User wrote:
>
> > I'm running a simple query on 8.2.  With this syntax, Explain indicate
> > that the index is scanned:
> > select * from eod where name = 'AA'
> >
> > However, when I change the query to use simple regex:
> > select * from eod where name ~ 'AA'
> >
> > now Explain indicates a seq scan:
> > Index Scan using equity_eod_symbol_idx on equity_eod  (cost=0.00..8.27
> > rows=1 width=149)
> > Index Cond: ((symbol)::text = 'AA'::text)
> >
> > Is there any way to 'encourage' Postgres to hit the index when using
> > regex?  Do I need to create a functional index or something?
> > Without the index in play, I really can't use regex on any of my
> > larger tables.
>
> You need it to be anchored:
>
> select * from eod where name ~ '^AA';
>
> If you're looking to be able to use indexes for searches within a
> string then, for 8.2, you'll need to check out tsearch2.
>
> Erik Jones
>
> DBA | Emma(R)
> erik@myemma.com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
>
>
>
>

Re: Regex query not using index

From
Chris
Date:
Postgres User wrote:
> Thanks, my dumb mistake.
> I need to perform the equivalent of a WHERE clause OR expression using
> regex to match exact strings.
>
> _________________________________________________________________________
>
> this example hits the index:
> select * from eod where name ~ '^BA$'
>
> but when I try to add another possible value to the regex, it does a row scan:
> select * from eod where name ~ ^BA$|^AA$'
>
> both of these statements return the right results, but the 2nd ignores
> the index even though both values are left-anchored.
>
> any workaround- this behavior doesn't seem to make sense

try changing it to

select * from eod where (name ~ '^BA$' or name ~ '^AA$')

though in this example they should both be name = 'XX' rather than regex'es.

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Regex query not using index

From
"Postgres User"
Date:
Yes that works, but the whole point of the exercise is replace many OR
statements with 1 regex expression.  So it's not what I'm looking for.

On Feb 19, 2008 9:16 PM, Chris <dmagick@gmail.com> wrote:
> Postgres User wrote:
> > Thanks, my dumb mistake.
> > I need to perform the equivalent of a WHERE clause OR expression using
> > regex to match exact strings.
> >
> > _________________________________________________________________________
> >
> > this example hits the index:
> > select * from eod where name ~ '^BA$'
> >
> > but when I try to add another possible value to the regex, it does a row scan:
> > select * from eod where name ~ ^BA$|^AA$'
> >
> > both of these statements return the right results, but the 2nd ignores
> > the index even though both values are left-anchored.
> >
> > any workaround- this behavior doesn't seem to make sense
>
> try changing it to
>
> select * from eod where (name ~ '^BA$' or name ~ '^AA$')
>
> though in this example they should both be name = 'XX' rather than regex'es.
>
> --
> Postgresql & php tutorials
> http://www.designmagick.com/
>

Re: Regex query not using index

From
Chris
Date:
Postgres User wrote:
> Yes that works, but the whole point of the exercise is replace many OR
> statements with 1 regex expression.  So it's not what I'm looking for.

Why do you want it done this way?

You can build an array of strings to check and use an in clause.

Using php :

$checks = array('AA', 'BA');

$query = "select * from table where name in ('" . implode("','",
$checks) . "')";

and it should use an index (up to a point anyway).

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Regex query not using index

From
"Postgres User"
Date:
im trying to allow the client to pass a varchar param into my
function, and want to avoid any parsing of the parameter inside the
function, or code to build a sql string.

if the function can use this code, it will be compiled and optimized
(unlike a dynamic sql stirng)

select * from mytable where fielda ~ p_param

(where p_param is the input parameter)

On Feb 19, 2008 9:34 PM, Chris <dmagick@gmail.com> wrote:
> Postgres User wrote:
> > Yes that works, but the whole point of the exercise is replace many OR
> > statements with 1 regex expression.  So it's not what I'm looking for.
>
> Why do you want it done this way?
>
> You can build an array of strings to check and use an in clause.
>
> Using php :
>
> $checks = array('AA', 'BA');
>
> $query = "select * from table where name in ('" . implode("','",
> $checks) . "')";
>
> and it should use an index (up to a point anyway).
>
> --
>
> Postgresql & php tutorials
> http://www.designmagick.com/
>

Re: Regex query not using index

From
Tom Lane
Date:
"Postgres User" <postgres.developer@gmail.com> writes:
> Yes that works, but the whole point of the exercise is replace many OR
> statements with 1 regex expression.  So it's not what I'm looking for.

Unfortunately, Postgres is not as intelligent as you are.  There is
no mechanism to rewrite a multi-branch regex condition into multiple
indexscans.  I recommend going back to the OR's.

            regards, tom lane

Re: Regex query not using index

From
"Postgres User"
Date:
doh!  tom, let me know if you decide to hack out a fix for this one of
these nights ;)
thanks for your help.

On Feb 19, 2008 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Postgres User" <postgres.developer@gmail.com> writes:
> > Yes that works, but the whole point of the exercise is replace many OR
> > statements with 1 regex expression.  So it's not what I'm looking for.
>
> Unfortunately, Postgres is not as intelligent as you are.  There is
> no mechanism to rewrite a multi-branch regex condition into multiple
> indexscans.  I recommend going back to the OR's.
>
>                         regards, tom lane
>

Re: Regex query not using index

From
"Postgres User"
Date:
by the way, your example works fine unless it's a null value or empty string
unfortunately, postgres isn't smart enough to know that the when
p_param below is null, that the WHERE condition can be ignored

select * from table where name in (Coalesce(p_param, name))

which is the same as:   select * from table where name in (name)

postgres does a row scan on the above sql.  too slow.


On Feb 19, 2008 9:34 PM, Chris <dmagick@gmail.com> wrote:
> Postgres User wrote:
> > Yes that works, but the whole point of the exercise is replace many OR
> > statements with 1 regex expression.  So it's not what I'm looking for.
>
> Why do you want it done this way?
>
> You can build an array of strings to check and use an in clause.
>
> Using php :
>
> $checks = array('AA', 'BA');
>
> $query = "select * from table where name in ('" . implode("','",
> $checks) . "')";
>
> and it should use an index (up to a point anyway).
>
> --
>
> Postgresql & php tutorials
> http://www.designmagick.com/
>

Re: Regex query not using index

From
Chris
Date:
Postgres User wrote:
> by the way, your example works fine unless it's a null value or empty string
> unfortunately, postgres isn't smart enough to know that the when
> p_param below is null, that the WHERE condition can be ignored
>
> select * from table where name in (Coalesce(p_param, name))
>
> which is the same as:   select * from table where name in (name)
>
> postgres does a row scan on the above sql.  too slow.

If there's no where condition, postgres has to do a seq-scan anyway so
your argument is void :)

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Regex query not using index

From
"Postgres User"
Date:
a final question: why does this syntax do a seq scan + filter:

select * from tablea where fielda = fielda   -or-  select * from
tablea where fielda in (fielda)

while this syntax results in no filter, seq scan only

select * from tablea where 1 = 1

it seems that both where clauses should be ignored by the optimizer-
or am i missing something

On Feb 19, 2008 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Postgres User" <postgres.developer@gmail.com> writes:
> > Yes that works, but the whole point of the exercise is replace many OR
> > statements with 1 regex expression.  So it's not what I'm looking for.
>
> Unfortunately, Postgres is not as intelligent as you are.  There is
> no mechanism to rewrite a multi-branch regex condition into multiple
> indexscans.  I recommend going back to the OR's.
>
>                         regards, tom lane
>

Re: Regex query not using index

From
Martijn van Oosterhout
Date:
On Wed, Feb 20, 2008 at 12:56:54AM -0800, Postgres User wrote:
> a final question: why does this syntax do a seq scan + filter:
>
> select * from tablea where fielda = fielda   -or-  select * from
> tablea where fielda in (fielda)
>
> while this syntax results in no filter, seq scan only
>
> select * from tablea where 1 = 1
>
> it seems that both where clauses should be ignored by the optimizer-
> or am i missing something

WHERE fielda = fielda will only match non-null rows...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Attachment

Re: Regex query not using index

From
Tino Wildenhain
Date:
Postgres User wrote:
> im trying to allow the client to pass a varchar param into my
> function, and want to avoid any parsing of the parameter inside the
> function, or code to build a sql string.
>
> if the function can use this code, it will be compiled and optimized
> (unlike a dynamic sql stirng)
>
> select * from mytable where fielda ~ p_param

No, you should never let users specify raw regex. at best they can
hog down your server. Regex is a state engine and you can create
endless loops.

Maybe we can see the overall picture of your query?

Regards
Tino

Re: Regex query not using index

From
"Postgres User"
Date:
Tino,

My users are developers and the goal was to accept a simple
comma-delimited list of string values as a function's input parameter.
 The function would then parse this input param into a valid regex
expression.

I was trying to write a function that lets me avoid using Execute
<string> and instead write in-line SQL with all the benefits of
pre-compilation and optimization.

Regex offers such a technique- IF it could understand regex that
represented a set of logical ORs and do an index scan (my rule is to
avoid seq-scans)

An example of regex that allows you to use in-line SQL with a
condition equivalent to many OR conditions when using basic comparison
operators:

select * from table1
where name ~ '.*' '^Smith$' |^Jones$':

And this works very well- except for the seq scan instead of an index scan



On Feb 20, 2008 2:31 AM, Tino Wildenhain <tino@wildenhain.de> wrote:
> Postgres User wrote:
> > im trying to allow the client to pass a varchar param into my
> > function, and want to avoid any parsing of the parameter inside the
> > function, or code to build a sql string.
> >
> > if the function can use this code, it will be compiled and optimized
> > (unlike a dynamic sql stirng)
> >
> > select * from mytable where fielda ~ p_param
>
> No, you should never let users specify raw regex. at best they can
> hog down your server. Regex is a state engine and you can create
> endless loops.
>
> Maybe we can see the overall picture of your query?
>
> Regards
> Tino
>

Re: Regex query not using index

From
Tom Lane
Date:
"Postgres User" <postgres.developer@gmail.com> writes:
> My users are developers and the goal was to accept a simple
> comma-delimited list of string values as a function's input parameter.
>  The function would then parse this input param into a valid regex
> expression.

Why are you fixated on this being a regex?  If you aren't actually
trying to expose regex capabilities to the users, you'll just be having
to suppress a bunch of strange behaviors for special characters.

ISTM that the best solution is to use an array-of-text parameter,
along the lines of

    where name = any (array['Smith', 'Jones', ...])

For what you're doing, you'd not actually want the array[] syntax,
it would look more like

    where name = any ('{Smith,Jones}'::text[])

This should optimize into an indexscan in 8.2 or later.

            regards, tom lane

Re: Regex query not using index

From
"Postgres User"
Date:
Tom,

I was looking for another approach but didn't come across that array
syntax in my searches (perhaps because it's newer.  Thanks for a
solution.

Now to end my fixation, one last item.  What about the case of a null
or empty param value- is there a way to assign a condition value that
Postgres will ignore when processing the query?

This syntax results in a seq scan:   WHERE fielda = Coalesce(param, fielda)
because it applies only to non-nulls

Is there another way to write this- perhaps using your array syntax on
an empty array?  Basically I'd PG to ignore the condition just as it
ignores   WHERE 1 = 1


On Wed, Feb 20, 2008 at 8:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Postgres User" <postgres.developer@gmail.com> writes:
>
> > My users are developers and the goal was to accept a simple
>  > comma-delimited list of string values as a function's input parameter.
>  >  The function would then parse this input param into a valid regex
>  > expression.
>
>  Why are you fixated on this being a regex?  If you aren't actually
>  trying to expose regex capabilities to the users, you'll just be having
>  to suppress a bunch of strange behaviors for special characters.
>
>  ISTM that the best solution is to use an array-of-text parameter,
>  along the lines of
>
>         where name = any (array['Smith', 'Jones', ...])
>
>  For what you're doing, you'd not actually want the array[] syntax,
>  it would look more like
>
>         where name = any ('{Smith,Jones}'::text[])
>
>  This should optimize into an indexscan in 8.2 or later.
>
>                         regards, tom lane
>

Re: Regex query not using index

From
Alban Hertroys
Date:
On Feb 20, 2008, at 5:51 PM, Postgres User wrote:

> Now to end my fixation, one last item.  What about the case of a null
> or empty param value- is there a way to assign a condition value that
> Postgres will ignore when processing the query?
>
> This syntax results in a seq scan:   WHERE fielda = Coalesce(param,
> fielda)
> because it applies only to non-nulls
>
> Is there another way to write this- perhaps using your array syntax on
> an empty array?  Basically I'd PG to ignore the condition just as it
> ignores   WHERE 1 = 1

Not sure whether you want no rows returned if param is NULL or all
rows (all rows looking at your example), but you could simply check
param for NULL before comparing it, so either:

WHERE param IS NOT NULL AND fielda = param

or

WHERE param IS NULL OR fielda = param

In the second case, if param IS NULL you will get a sequential scan
of course, as that's the most efficient way to return all rows.

>
> On Wed, Feb 20, 2008 at 8:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Postgres User" <postgres.developer@gmail.com> writes:
>>
>>> My users are developers and the goal was to accept a simple
>>> comma-delimited list of string values as a function's input
>>> parameter.
>>>  The function would then parse this input param into a valid regex
>>> expression.
>>
>>  Why are you fixated on this being a regex?  If you aren't actually
>>  trying to expose regex capabilities to the users, you'll just be
>> having
>>  to suppress a bunch of strange behaviors for special characters.
>>
>>  ISTM that the best solution is to use an array-of-text parameter,
>>  along the lines of
>>
>>         where name = any (array['Smith', 'Jones', ...])
>>
>>  For what you're doing, you'd not actually want the array[] syntax,
>>  it would look more like
>>
>>         where name = any ('{Smith,Jones}'::text[])
>>
>>  This should optimize into an indexscan in 8.2 or later.
>>
>>                         regards, tom lane
>>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>
>
>

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,47c1522f233091890169212!