Thread: Inside the Regex Engine

Inside the Regex Engine

From
david@fetter.org (David Fetter)
Date:
Kind people,

As a perl weenie, I'm used to being able to do things with regexes
like

$text =~ s/(foo|bar|baz)/NO UNIX WEENIES HERE/;
$got_it = $1;

While PL/Perl is great, it's not available everywhere, and I'd like to
be able to grab atoms from a regex match in, say, a SELECT.  Is there
some way to get access to them?

TIA for any pointers on this :)

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100    cell: +1 415 235 3778

Civil government, so far as it is instituted for the security of
property, is in reality instituted for the defense of the rich against
the poor, or of those who have some property against those who have
none at all.                                                   Adam Smith


Re: Inside the Regex Engine

From
Alvaro Herrera
Date:
On Tue, Dec 02, 2003 at 07:52:57PM -0600, David Fetter wrote:

> As a perl weenie, I'm used to being able to do things with regexes
> like
> 
> $text =~ s/(foo|bar|baz)/NO UNIX WEENIES HERE/;
> $got_it = $1;
> 
> While PL/Perl is great, it's not available everywhere, and I'd like to
> be able to grab atoms from a regex match in, say, a SELECT.  Is there
> some way to get access to them?

Huh, the best I am able to do is

alvh=> select substring('bazfoo fubar', 'fu(foo|bar)');
 substring
-----------
 bar
(1 row)

The choice of the name for the function seems weird to me.  Also note
that you are able to use only one set of parentheses (i.e. the first
gets picked up, the rest are ignored).

If you need to be able to extract further things, there's a tip in the
documentation that reads

"If you have pattern matching needs that go beyond this, consider
writing a user-defined function in Perl or Tcl."

It does not appear to be that difficult to add the functionality needed
to extract random atoms, but there's some hacking involved.
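
The "first set of parentheses wins" behaviour described above can be
modelled outside the database. The following is only a sketch in Python,
not PostgreSQL code: substring_model is a hypothetical helper name, and
Python's re module stands in for the POSIX regex engine, so syntax
details may differ from what the backend accepts.

```python
import re

def substring_model(text, pattern):
    """Hypothetical model of the two-argument substring() behaviour
    discussed above; Python's re is an assumption, not the POSIX engine."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # With at least one parenthesized group, only the first is returned;
    # any further groups are ignored. Without groups, return the whole match.
    return m.group(1) if m.re.groups >= 1 else m.group(0)

print(substring_model('bazfoo fubar', 'fu(foo|bar)'))  # bar
print(substring_model('bazfoo fubar', 'fu(b)(a)(r)'))  # b
```

The second call shows the limitation: the pattern has three groups, but
only the first one can be retrieved.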

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Cuando no hay humildad las personas se degradan" (A. Christie)


Re: Inside the Regex Engine

From
Tom Lane
Date:
david@fetter.org (David Fetter) writes:
> While PL/Perl is great, it's not available everywhere, and I'd like to
> be able to grab atoms from a regex match in, say, a SELECT.  Is there
> some way to get access to them?

There's a three-parameter variant of substring() that allows extraction
of a portion of a regex match --- unfortunately it uses SQL99's
brain-dead notion of regex, which will not satisfy any Perl weenie :-(

I think it'd be worth our while to define some comparable functionality
that depends only on the POSIX regex engine ...
        regards, tom lane


Re: Inside the Regex Engine

From
Andrew Dunstan
Date:
Tom Lane wrote:

> david@fetter.org (David Fetter) writes:
>
>> While PL/Perl is great, it's not available everywhere, and I'd like to
>> be able to grab atoms from a regex match in, say, a SELECT.  Is there
>> some way to get access to them?
>
> There's a three-parameter variant of substring() that allows extraction
> of a portion of a regex match --- unfortunately it uses SQL99's
> brain-dead notion of regex, which will not satisfy any Perl weenie :-(
>
> I think it'd be worth our while to define some comparable functionality
> that depends only on the POSIX regex engine ...

substitute should be relatively straightforward, I guess; split and 
match maybe less so - what do you return? An array? Or you could require 
an explicit subscript to get a particular return value as in 
split_part(), which would be potentially inefficient if you want more 
than one (although I guess results could be cached).
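
To make the two return styles above concrete, here is a rough sketch in
Python rather than backend C. The names regex_groups and
regex_group_part are hypothetical, Python's re stands in for the POSIX
engine, and lru_cache is just one possible way to get the caching
mentioned above.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=128)
def regex_groups(text, pattern):
    """Array-style variant (hypothetical): return all groups at once."""
    m = re.search(pattern, text)
    return None if m is None else m.groups()

def regex_group_part(text, pattern, n):
    """split_part()-style variant (hypothetical): return the n-th group,
    counting from 1. Repeated calls reuse the cached match result."""
    groups = regex_groups(text, pattern)
    if groups is None or not 1 <= n <= len(groups):
        return None
    return groups[n - 1]

print(regex_groups('2003-12-02', r'(\d+)-(\d+)-(\d+)'))         # ('2003', '12', '02')
print(regex_group_part('2003-12-02', r'(\d+)-(\d+)-(\d+)', 2))  # 12
```

With the cache in place, asking for several parts of the same match
runs the regex only once, which is the inefficiency concern raised above.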

cheers

andrew



Re: Inside the Regex Engine

From
david@fetter.org (David Fetter)
Date:
Andrew Dunstan <andrew@dunslane.net> wrote:
> Tom Lane wrote:
>
>> david@fetter.org (David Fetter) writes:
>>
>>>While PL/Perl is great, it's not available everywhere, and I'd like
>>>to be able to grab atoms from a regex match in, say, a SELECT.  Is
>>>there some way to get access to them?
>>
>>There's a three-parameter variant of substring() that allows
>>extraction of a portion of a regex match --- unfortunately it uses
>>SQL99's brain-dead notion of regex, which will not satisfy any Perl
>>weenie :-(
>>
>>I think it'd be worth our while to define some comparable
>>functionality that depends only on the POSIX regex engine ...
> 
> substitute should be relatively straightforward, I guess; split and
> match maybe less so - what do you return? An array?

That would be great.

> Or you could require an explicit subscript to get a particular
> return value as in split_part(), which would be potentially
> inefficient if you want more than one (although I guess results
> could be cached).

That'd be good, too.

Cheers
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100    cell: +1 415 235 3778

My definition of a free society is a society where it is safe to be
unpopular.                                                   Adlai Stevenson


Re: Inside the Regex Engine

From
david@fetter.org (David Fetter)
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> david@fetter.org (David Fetter) writes:
>> While PL/Perl is great, it's not available everywhere, and I'd like
>> to be able to grab atoms from a regex match in, say, a SELECT.  Is
>> there some way to get access to them?
> 
> There's a three-parameter variant of substring() that allows
> extraction of a portion of a regex match --- unfortunately it uses
> SQL99's brain-dead notion of regex, which will not satisfy any Perl
> weenie :-(
> 
> I think it'd be worth our while to define some comparable
> functionality that depends only on the POSIX regex engine ...

What pieces of the source code would be involved?

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100    cell: +1 415 235 3778

Transported to a surreal landscape, a young girl kills the first
woman she meets and then teams up with three complete strangers
to kill again.         Marin County newspaper's TV listing for The Wizard of Oz