Thread: why non-greedy modifier for one atom changes greediness of other atoms?

why non-greedy modifier for one atom changes greediness of other atoms?

From
hubert depesz lubaczewski
Date:
Example:
# select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
        x        | substring
-----------------+-----------
 ab.123xxx.46hfd | ab.1
 a.b.c.d.123xx   | a.b.c.d.1
(2 rows)


I found in the docs that this is what happens, but I don't understand the
logic behind forcing a single greediness on the whole expression.
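Here is the rule as PostgreSQL's documentation describes it. A branch takes the greediness of its first quantified atom that has one, so `.*?` makes this whole RE non-greedy. The engine then picks the shortest overall match starting at the anchor. Below is a toy Python model of just that "shortest overall match" step. It is an illustration under that assumption, not PostgreSQL's engine, and `shortest_overall_match` is a hypothetical helper. It reproduces the output above. Because the outer parentheses span the whole pattern, capture group 1 (what substring() returns) equals the whole match.

```python
import re
from typing import Optional


def shortest_overall_match(pattern: str, s: str) -> Optional[str]:
    """Toy model of a non-greedy whole RE: among all matches starting at
    position 0, return the shortest one (None if there is no match)."""
    for end in range(len(s) + 1):
        # Pin the end of the overall match to `end` by asserting that exactly
        # s[end:] remains; the backtracking engine then tries every way the
        # pattern could cover s[:end].
        pinned = "(?:" + pattern + ")(?=" + re.escape(s[end:]) + r"\Z)"
        if re.match(pinned, s):
            return s[:end]
    return None


# depesz's pattern with the E'' escaping removed.
PATTERN = r"^((.*?)(\.[0-9]+))"

for x in ("ab.123xxx.46hfd", "a.b.c.d.123xx"):
    print(x, "->", shortest_overall_match(PATTERN, x))
# ab.123xxx.46hfd -> ab.1
# a.b.c.d.123xx -> a.b.c.d.1
```

The same documentation says that once the overall length is fixed, each subexpression's share of it is decided by that subexpression's own greediness. The model skips this step because group 1 covers the whole match here.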

Also - how can one write a regexp that will match "ab.123" and
"a.b.c.d.123" respectively?

In PL/Perl it's of course trivial, but I can't seem to find a way to do it with substring() regexps.
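For contrast, a Perl-style backtracking engine gives each quantifier its own greediness. Below, Python's `re` module stands in for Perl. That is an assumption, though the two agree on this pattern. With per-atom greediness, the unchanged pattern already returns the desired strings. `perl_style_substring` is a hypothetical helper that mimics substring()'s "return group 1" behaviour.

```python
import re

# Same pattern as in the SQL example; in a backtracking engine only .*? is
# lazy, while [0-9]+ stays greedy.
PATTERN = re.compile(r"^((.*?)(\.[0-9]+))")


def perl_style_substring(x: str):
    """Return capture group 1, as substring(x from pattern) would, or None."""
    m = PATTERN.match(x)
    return m.group(1) if m else None


for x in ("ab.123xxx.46hfd", "a.b.c.d.123xx"):
    print(x, "->", perl_style_substring(x))
# ab.123xxx.46hfd -> ab.123
# a.b.c.d.123xx -> a.b.c.d.123
```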

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: why non-greedy modifier for one atom changes greediness of other atoms?

From
hubert depesz lubaczewski
Date:
On Mon, Jan 04, 2010 at 11:30:51AM +0100, hubert depesz lubaczewski wrote:
> Example:
> # select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
>         x        | substring
> -----------------+-----------
>  ab.123xxx.46hfd | ab.1
>  a.b.c.d.123xx   | a.b.c.d.1
> (2 rows)
>
>
> I found in docs, that this is what happens, but I don't understand the
> logic behind forcing unique greediness in whole expression.
>
> Also - how can one write a regexp that will match "ab.123" and
> "a.b.c.d.123" respectively?


Sorry, that may have been unclear. For the string 'ab123bc.12xx' the
return value should be 'ab123bc.12', i.e. we have to search for the first "."
followed by digits and return everything from the beginning of the string up
to the last of those digits.
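The requirement can also be written without any regex. A plain function such as the hypothetical `expected_substring` below is a convenient oracle for checking candidate patterns. It is only an illustrative sketch of the rule as stated, not code from the thread.

```python
from typing import Optional

DIGITS = "0123456789"


def expected_substring(s: str) -> Optional[str]:
    """Find the first '.' that is followed by a digit and return s from the
    start up to and including the last digit of that run; None if absent."""
    for i, ch in enumerate(s):
        if ch == "." and i + 1 < len(s) and s[i + 1] in DIGITS:
            end = i + 1
            # Extend over the whole run of digits after the '.'.
            while end < len(s) and s[end] in DIGITS:
                end += 1
            return s[:end]
    return None


for x in ("ab123bc.12xx", "ab.123xxx.46hfd", "a.b.c.d.123xx"):
    print(x, "->", expected_substring(x))
# ab123bc.12xx -> ab123bc.12
# ab.123xxx.46hfd -> ab.123
# a.b.c.d.123xx -> a.b.c.d.123
```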

Best regards,

depesz

Re: why non-greedy modifier for one atom changes greediness of other atoms?

From
"Albe Laurenz"
Date:
hubert depesz lubaczewski wrote:
>> Example:
>> # select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
>>         x        | substring
>> -----------------+-----------
>>  ab.123xxx.46hfd | ab.1
>>  a.b.c.d.123xx   | a.b.c.d.1
>> (2 rows)
>> 
>> 
>> I found in docs, that this is what happens, but I don't understand the
>> logic behind forcing unique greediness in whole expression.

Yes, that's odd.

>> Also - how can one write a regexp that will match "ab.123" and
>> "a.b.c.d.123" respectively?
> 
> 
> sorry - it could have be unclear - in case of string 'ab123bc.12xx'
> return value should be 'ab123bc.12' - i.e. we have to search to first .
> followed by digits and return it from beginning of string to the last of
> digits.

You could add a negative lookahead to ensure that no digit follows the end of the match:

... substring(x from E'^(.*?\\.\\d+(?!\\d))') ...
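Why this works can be sketched with the same toy "shortest overall match" model. This is an assumption about how to picture PostgreSQL's non-greedy whole RE, not its actual engine, and `shortest_overall_match` is a hypothetical helper. The whole RE stays non-greedy. However, every shorter candidate ends with another digit still following, so `(?!\d)` rejects it. The shortest surviving match therefore runs to the end of the digit run.

```python
import re
from typing import Optional


def shortest_overall_match(pattern: str, s: str) -> Optional[str]:
    """Toy model of a non-greedy whole RE: the shortest match starting at 0."""
    for end in range(len(s) + 1):
        # Force the overall match to end at `end`; lookarounds inside
        # `pattern` still see the real characters that follow.
        pinned = "(?:" + pattern + ")(?=" + re.escape(s[end:]) + r"\Z)"
        if re.match(pinned, s):
            return s[:end]
    return None


ORIGINAL = r"^((.*?)(\.[0-9]+))"
WITH_LOOKAHEAD = r"^(.*?\.\d+(?!\d))"  # the pattern above, E'' escaping removed

for x in ("ab.123xxx.46hfd", "a.b.c.d.123xx", "ab123bc.12xx"):
    print(x, shortest_overall_match(ORIGINAL, x),
          shortest_overall_match(WITH_LOOKAHEAD, x))
# ab.123xxx.46hfd ab.1 ab.123
# a.b.c.d.123xx a.b.c.d.1 a.b.c.d.123
# ab123bc.12xx ab123bc.1 ab123bc.12
```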

Yours,
Laurenz Albe