Thread: regular expressions in query

regular expressions in query

From
fiona
Date:
My database table holds phone numbers that may contain characters other
than digits (that's not a problem in itself).

I want to be able to apply a regular expression (to ignore all
characters except digits) to the attribute 'phone' first and then for
the ILIKE to compare
the result to $telephone. I can't find any way of applying the RE to phone.
My current query without the RE is as follows:

SELECT telephone FROM addresses WHERE id = user_id AND phone ILIKE
'%".addslashes($telephone)."%'"
I want to do something like:  AND phone([^[:digit:]]) ILIKE $telephone
But this doesn't work.
Any ideas?
--
Get Thunderbird <http://www.mozilla.org/products/thunderbird/>
<http://www.mozilla.org/products/thunderbird/>

Re: regular expressions in query

From
Jeff Davis
Date:
Try using the "~" regex matching operator instead of ILIKE.

Regards,
    Jeff Davis

On Fri, 2005-02-11 at 22:21 +0000, fiona wrote:
> My database table holds phone numbers that may contain characters other
> than digits (that's not a problem in itself).
>
> I want to be able to apply a regular expression (to ignore all
> characters except digits) to the attribute 'phone' first and then for
> the ILIKE to compare
> the result to $telephone. I can't find any way of applying the RE to phone.
> My current query without the RE is as follows:
>
> SELECT telephone FROM addresses WHERE id = user_id AND phone ILIKE
> '%".addslashes($telephone)."%'"
> I want to do something like:  AND phone([^[:digit:]]) ILIKE $telephone
> But this doesn't work.
> Any ideas?


Re: regular expressions in query

From
"F.Bissett"
Date:

On Fri, 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:
>�Try using the "~" regex matching operator instead of ILIKE.
>
>�Regards,
>�Jeff Davis
>

 

I still need the ILIKE to compare the returned value with $telephone.

 

I have the following PHP to check an input string for non numeric characters:

 

$tel = ereg_replace('[^[:digit:]]', "", $test);   --  $tel then equals only the numbers in test.

 

This is what I want to be able to do inside the query, but without altering the values in the database - to look at the column 'phone', see if there are any non-numeric characters and ignore them then compare the numbers that are left with $telephone.

 

I tried "AND phone ~ '[^[:digit:]]' ILIKE $telephone"

But get the following error:

 

Warning:pg_query(): Query failed: Error: operator does not exist: boolean ~~* "unknown" HINT: No operator matches the given name and argument type(s). You may need to add explicit type casts.

 

cheers,

Fiona

Re: regular expressions in query

From
Scott Marlowe
Date:
On Sat, 2005-02-12 at 10:31, F.Bissett wrote:
> On Fri, 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:
> >Try using the "~" regex matching operator instead of ILIKE.
> >
> >Regards,
> >Jeff Davis
> >
>
>
>
> I still need the ILIKE to compare the returned value with $telephone.
>
>
>
> I have the following PHP to check an input string for non numeric
> characters:
>
>
>
> $tel = ereg_replace('[^[:digit:]]', "", $test);   --  $tel then equals
> only the numbers in test.
>
>
>
> This is what I want to be able to do inside the query, but without
> altering the values in the database - to look at the column 'phone',
> see if there are any non-numeric characters and ignore them then
> compare the numbers that are left with $telephone.
>
>
>
> I tried "AND phone ~ '[^[:digit:]]' ILIKE $telephone"
>
> But get the following error:

Try anding them:

where phone ILIKE $telephone AND phone ~ '[.... etc...

Re: regular expressions in query

From
Tom Lane
Date:
"F.Bissett" <fbissett@blueyonder.co.uk> writes:
> </head><BODY BGCOLOR=3D"#F0F0F0" ><p><SPAN style=3D"font-size:10pt;">On Fri=
> , 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:<br /></SPAN><SPAN style=3D"=
> font-size:10pt;color:navy;">>=A0Try using the "~" regex matching operato=
> r instead of ILIKE.</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPA=
> N style=3D"font-size:10pt;color:navy;">></SPAN><SPAN style=3D"font-size:=
> 10pt;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">>=A0Regar=
> ds,</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPAN style=3D"font-=
> size:10pt;color:navy;">>=A0Jeff Davis</SPAN><SPAN style=3D"font-size:10p=
> t;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">></SPAN></p>

Please don't post HTML email; it's a pain in the neck to quote.

> I have the following PHP to check an input string for non numeric characters:
>
> $tel = ereg_replace('[^[:digit:]]', "", $test); -- tel then equals only the numbers in test.

The closest equivalent we have to that is the regex-extraction version
of the substring() function --- see
http://www.postgresql.org/docs/8.0/static/functions-matching.html
It would go something like

    substring($test from '[0-9]+')

However, what that actually gets you is the first all-numeric substring;
if there are multiple occurrences of digits separated by non-digits this
will not do what you want.

My advice is to write the function you want in one of the PLs that have
good string-mashing facilities --- either plperl or pltcl would
certainly do.  (Probably plpython too, but I'm not very familiar with
Python.)  Plain SQL is not very strong on string manipulation, but
that's why we have extension languages.

            regards, tom lane

Re: regular expressions in query

From
elein@varlena.com (elein)
Date:
No doubt someone more adept at perl can write
this function as a one-liner.

create or replace function just_digits(text)
returns text as
$$
        my $innum = $_[0];
        $innum =~ s/\D//g;
        return $innum;
$$ language 'plperl'

     SELECT telephone FROM addresses
        WHERE user_id = 'bob'
        AND just_digits(telephone) = '1115551212';

--elein


On Sat, Feb 12, 2005 at 12:27:20PM -0500, Tom Lane wrote:
> "F.Bissett" <fbissett@blueyonder.co.uk> writes:
> > </head><BODY BGCOLOR=3D"#F0F0F0" ><p><SPAN style=3D"font-size:10pt;">On Fri=
> > , 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:<br /></SPAN><SPAN style=3D"=
> > font-size:10pt;color:navy;">>=A0Try using the "~" regex matching operato=
> > r instead of ILIKE.</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPA=
> > N style=3D"font-size:10pt;color:navy;">></SPAN><SPAN style=3D"font-size:=
> > 10pt;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">>=A0Regar=
> > ds,</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPAN style=3D"font-=
> > size:10pt;color:navy;">>=A0Jeff Davis</SPAN><SPAN style=3D"font-size:10p=
> > t;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">></SPAN></p>
>
> Please don't post HTML email; it's a pain in the neck to quote.
>
> > I have the following PHP to check an input string for non numeric characters:
> >
> > $tel = ereg_replace('[^[:digit:]]', "", $test); -- tel then equals only the numbers in test.
>
> The closest equivalent we have to that is the regex-extraction version
> of the substring() function --- see
> http://www.postgresql.org/docs/8.0/static/functions-matching.html
> It would go something like
>
>     substring($test from '[0-9]+')
>
> However, what that actually gets you is the first all-numeric substring;
> if there are multiple occurrences of digits separated by non-digits this
> will not do what you want.
>
> My advice is to write the function you want in one of the PLs that have
> good string-mashing facilities --- either plperl or pltcl would
> certainly do.  (Probably plpython too, but I'm not very familiar with
> Python.)  Plain SQL is not very strong on string manipulation, but
> that's why we have extension languages.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

Re: regular expressions in query

From
Russ Brown
Date:
elein wrote:
> No doubt someone more adept at perl can write
> this function as a one-liner.
>
> create or replace function just_digits(text)
> returns text as
> $$
>         my $innum = $_[0];
>         $innum =~ s/\D//g;
>         return $innum;
> $$ language 'plperl'
>
>      SELECT telephone FROM addresses
>         WHERE user_id = 'bob'
>         AND just_digits(telephone) = '1115551212';
>
> --elein
>

I've  thought about things like this in the past, and a thought that
occurred to me was to add a functional index on just_digits(telephone)
to the table. Would this not allow the above query to use an index while
searching?

--

Russ.

Re: regular expressions in query

From
Lincoln Yeoh
Date:
At 09:57 AM 2/13/2005 +0000, Russ Brown wrote:

>I've  thought about things like this in the past, and a thought that
>occurred to me was to add a functional index on just_digits(telephone) to
>the table. Would this not allow the above query to use an index while
>searching?

I think it should. But for phone numbers it may be better to reverse the
digits before indexing - usually whilst the area code changes, the last 4
or 5 digits don't change.

This way you can do a LIKE search on *5678. Where the number ends with 5678.

I'm not sure how to get Postgresql to index from the ending to the start of
a string vs the normal from the start to the end, so in my webapp I
reversed it at the application layer. If you are going to do this sort of
thing at the application layer you might as well do the nondigit removal
there too.

e.g.
$phone=~tr/0-9%_//cd; # I allowed the wildcards % and _
$phone=reverse $phone;

You may still wish to store the phone numbers "as is" for display purposes.

Link.



Re: regular expressions in query

From
"J. Greenlees"
Date:

Lincoln Yeoh wrote:
> At 09:57 AM 2/13/2005 +0000, Russ Brown wrote:
>
>> I've  thought about things like this in the past, and a thought that
>> occurred to me was to add a functional index on just_digits(telephone)
>> to the table. Would this not allow the above query to use an index
>> while searching?
>
>
> I think it should. But for phone numbers it may be better to reverse the
> digits before indexing - usually whilst the area code changes, the last
> 4 or 5 digits don't change.
>
> This way you can do a LIKE search on *5678. Where the number ends with
> 5678.
>
> I'm not sure how to get Postgresql to index from the ending to the start
> of a string vs the normal from the start to the end, so in my webapp I
> reversed it at the application layer. If you are going to do this sort
> of thing at the application layer you might as well do the nondigit
> removal there too.
>
> e.g.
> $phone=~tr/0-9%_//cd; # I allowed the wildcards % and _
> $phone=reverse $phone;
>
> You may still wish to store the phone numbers "as is" for display purposes.
>
> Link.
>
make sure the table stores as text rather than as numeric data.
then you can use the excellent perl string tools to pull the last 4
characters of the number.

$base=((strlen-4,strlen)

$base being the last 4 digits.
then convert to numeric to test against search requirements.

Jaqui




Re: regular expressions in query

From
Lincoln Yeoh
Date:
But that method would be specific for searches for the last 4 digits. It
won't work as well for the general case of the last X digits.

To clarify the method I suggested:

Say a phone number is: 818 9567 1234

You reverse the number and store it as text and index it as

43217659818

Then if someone searches for 5671234 you reverse the query string and do a
search for

select * from phonebook where number like '4321765%' and ....

If they enter just the last 5 digits: 71234

select * from phonebook where number like '43217%' and ....

These sort of searches are indexable on postgresql.

Link.

At 04:33 AM 2/13/2005 -0800, J. Greenlees wrote:

>Lincoln Yeoh wrote:
>I think it should. But for phone numbers it may be better to reverse the
>digits before indexing - usually whilst the area code changes, the last 4
>or 5 digits don't change.
>>This way you can do a LIKE search on *5678. Where the number ends with 5678.
>make sure the table stores as text rather than as numeric data.
>then you can use the excellent perl string tools to pull the last 4
>characters of the number.
>
>$base=((strlen-4,strlen)
>
>$base being the last 4 digits.
>then convert to numeric to test against search requirements.
>
>Jaqui