On Feb 4, 2012, at 3:58, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> On Sat, Feb 04, 2012 at 09:54:34AM +0100, Szymon Guz wrote:
>> On 4 February 2012 09:46, hubert depesz lubaczewski <depesz@depesz.com> wrote:
>>
>>> select 'depesz depeszx depesz' ~ E'^(.*)( \\1)+$';
>>>
>>> what's worse:
>>> $ select regexp_replace( 'depesz depeszx depesz', E'^(.*)( \\1)+$', E'\\1'
>>> );
>>> regexp_replace
>>> ────────────────
>>> depesz
>>> (1 row)
>>>
>>> I know that Pg regexps are limited, but even grep's regexps match this
>>> correctly:
>>>
>>> =$ printf 'depesz depesz depesz\ndepesz depeszx depesz\n' | grep -E '^(.*)( \1)+$';
>>> depesz depesz depesz
>>>
>>> Best regards,
>>>
>>> depesz
>>>
>>>
>> Hi,
>> some time ago I hit the same problem, however the solution was a little bit
>> tricky. I didn't have time to investigate it, but this works:
>>
>> postgres@postgres:5840=# select regexp_replace( 'depesz depeszx depesz',
>> E'^(.*)( \\\\1)+$', E'\\\\1' );
>> regexp_replace
>> -----------------------
>> depesz depeszx depesz
>> (1 row)
>
> not sure if I understand your point.
>
> This regexp was meant to find repeated substrings.
>
> Like this one does in perl:
>
> /^(.*)( \1)+$/
>
> We can see how it works with:
> =$ perl -e 'if ( shift =~ m/^(.*)( \1)+$/ ) { print "is repeat of [$1]\n" } else {print "is not repeated\n"}' 'depesz depesz depesz'
> is repeat of [depesz]
>
> =$ perl -e 'if ( shift =~ m/^(.*)( \1)+$/ ) { print "is repeat of [$1]\n" } else {print "is not repeated\n"}' 'depesz depeszx depesz'
> is not repeated
>
> reason why your regexp matches is also a mystery for me.
>
> Best regards,
>
> depesz
>
>
Don't know the answer (if there is one other than 'it's a bug'), but as a workaround you can split the string on
whitespace, then group the resulting words and see if more than one record results...
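
A minimal sketch of that workaround, assuming the words are separated by single spaces. regexp_split_to_table is a standard PostgreSQL function; the alias "word" and the column name "is_repeat" are only illustrative:

```sql
-- Split the string on spaces, one word per row, then check that
-- there is more than one word and that all of them are identical.
SELECT count(*) > 1 AND count(DISTINCT word) = 1 AS is_repeat
FROM regexp_split_to_table('depesz depeszx depesz', ' ') AS word;
```

For 'depesz depeszx depesz' this gives false (two distinct words), and for 'depesz depesz depesz' it gives true. Note that, unlike the regexp, this only detects repeats of a single word; a repeated unit that itself contains spaces (such as 'a b a b') would not be recognized.
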
David J.