Re: Why this regexp matches?! - Mailing list pgsql-general

From	David Johnston
Subject	Re: Why this regexp matches?!
Date	February 4, 2012 12:30:46
Msg-id	8887FCC5-2FC2-4609-B1FC-11EB81F01B86@yahoo.com
In response to	Re: Why this regexp matches?! (hubert depesz lubaczewski <depesz@depesz.com>)
List	pgsql-general

On Feb 4, 2012, at 3:58, hubert depesz lubaczewski <depesz@depesz.com> wrote:

> On Sat, Feb 04, 2012 at 09:54:34AM +0100, Szymon Guz wrote:
>> On 4 February 2012 09:46, hubert depesz lubaczewski <depesz@depesz.com> wrote:
>>
>>> select 'depesz depeszx depesz' ~ E'^(.*)( \\1)+$';
>>>
>>> what's worse:
>>> $ select regexp_replace( 'depesz depeszx depesz', E'^(.*)( \\1)+$', E'\\1'
>>> );
>>> regexp_replace
>>> ────────────────
>>> depesz
>>> (1 row)
>>>
>>> I know that Pg regexps are limited, but even grep's regexps match this
>>> correctly:
>>>
>>> =$ printf 'depesz depesz depesz\ndepesz depeszx depesz\n' | grep -E
>>> '^(.*)( \1)+$';
>>> depesz depesz depesz
>>>
>>> Best regards,
>>>
>>> depesz
>>>
>>>
>> Hi,
>> some time ago I hit the same problem, however the solution was a little bit
>> tricky. I didn't have time to investigate it, but this works:
>>
>> postgres@postgres:5840=#  select regexp_replace( 'depesz depeszx depesz',
>> E'^(.*)( \\\\1)+$', E'\\\\1' );
>>    regexp_replace
>> -----------------------
>> depesz depeszx depesz
>> (1 row)
>
> not sure if I understand your point.
>
> This regexp was meant to find repeated substrings.
>
> Like this one does in perl:
>
> /^(.*)( \1)+$/
>
> We can see how it works with:
> =$ perl -e 'if ( shift =~ m/^(.*)( \1)+$/ ) { print "is repeat of [$1]\n" } else {print "is not repeated\n"}' 'depesz depesz depesz'
> is repeat of [depesz]
>
> =$ perl -e 'if ( shift =~ m/^(.*)( \1)+$/ ) { print "is repeat of [$1]\n" } else {print "is not repeated\n"}' 'depesz depeszx depesz'
> is not repeated
>
> reason why your regexp matches is also a mystery for me.
>
> Best regards,
>
> depesz
>
>

Don't know the answer (if there is one other than 'it's a bug'), but as a workaround you can split the string on
whitespace, then group the resulting words and see if more than one record results...
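
Something along these lines might do it. This is only a sketch: it assumes the words are separated by single spaces, uses regexp_split_to_table (available since 8.3), and lets count(DISTINCT ...) stand in for the grouping step. The is_repeat column name is made up for illustration.

```sql
-- Split the input on spaces, one word per row, then check that there is
-- more than one word and that all of the words are identical.
SELECT count(*) > 1 AND count(DISTINCT word) = 1 AS is_repeat
FROM regexp_split_to_table('depesz depeszx depesz', ' ') AS word;
```

For 'depesz depeszx depesz' there are two distinct words, so is_repeat comes out false; for 'depesz depesz depesz' it would be true.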

David J.
