Thread: Regex match not back-referencing in function

From: Thom Brown
Hi,

Could someone explain the following behaviour?

SELECT regexp_replace(E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1');

This returns:

     regexp_replace
------------------------
 Hello &#92;& goodbye
(1 row)

So it matched:

SELECT chr(92);
 chr
-----
 \
(1 row)

But notice that when I append the value it's supposed to have matched (\1)
to the end of the replacement value, the output shows that value is '&'.

Just to confirm:

SELECT ascii('&');
 ascii
-------
    38
(1 row)

So I'd expect the output of the original statement to be:

     regexp_replace
------------------------
 Hello &#38;& goodbye
(1 row)

What am I missing?

--
Thom

Re: Regex match not back-referencing in function

From: Tom Lane
Thom Brown <thom@linux.com> writes:
> What am I missing?

I might be more confused than you, but I think you're supposing that
the result of ascii(E'\\1') has something to do with the match that
the surrounding regexp_replace function will find, later on when it
gets executed.  The actual arguments seen by regexp_replace are

regression=# select E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1';
     ?column?     | ?column? | ?column?
------------------+----------+----------
 Hello & goodbye  | ([&])    | &#92;\1
(1 row)

and given that, the result looks perfectly fine to me.
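Spelled out one step at a time, the original statement reduces as follows. This is a sketch to run in psql; the comments give the values implied by the explanation above, and E'' literals are used throughout so the result does not depend on the standard_conforming_strings setting.

```sql
-- Step 1: the argument expression is evaluated first, once, with no match in scope.
-- E'\\1' is just the two characters backslash and '1'; ascii() reads the backslash.
SELECT ascii(E'\\1');                      -- 92

-- Step 2: the whole third argument therefore folds to a constant template.
SELECT '&#' || ascii(E'\\1') || E';\\1';   -- &#92;\1

-- Step 3: regexp_replace only ever sees that constant. Only the trailing \1 is a
-- back-reference; the '&#92;' prefix is copied literally for every match.
SELECT regexp_replace(E'Hello & goodbye ', E'([&])', E'&#92;\\1');
-- Hello &#92;& goodbye
```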

If there's a bug here, it's that ascii() ignores additional bytes in its
input instead of throwing an error for a string with more than one
character.  But I believe we've discussed that in the past and decided
not to change it.
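The ascii() behaviour mentioned here is easy to check directly: it returns the code of the first character and silently ignores the rest, which is why ascii(E'\\1') yields the backslash's code instead of an error. A minimal check; the sample strings are illustrative, not taken from the thread.

```sql
-- ascii() looks only at the first character of its argument.
SELECT ascii('&');                       -- 38
SELECT ascii('&#38;');                   -- also 38: trailing characters are ignored
SELECT ascii(E'\\1') = ascii(E'\\');     -- true (psql shows t): only the backslash counts
```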

            regards, tom lane

Re: Regex match not back-referencing in function

From: David Johnston
On Feb 12, 2012, at 13:26, Thom Brown <thom@linux.com> wrote:

> What am I missing?

The ascii() function call is evaluated independently of, and before, the regexp_replace function call, so E'\\1'
has no special meaning there.  It only has special meaning inside the regexp_replace function.

Try just evaluating ascii(E'\\1') by itself and confirm you get "92".
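Because the replacement string is fixed before regexp_replace runs, and regexp_replace has no hook for calling a function on each match, a per-match value such as ascii() of the matched text has to be computed another way. One possible sketch: split the string into characters, transform the ones matching the pattern, and reassemble them in order. It assumes PostgreSQL 9.3 or later (so generate_series in FROM can refer to s), reproduces the replacement Thom wrote (entity followed by the matched character), and the names input, chars, pos, ch and encoded are illustrative only.

```sql
-- Hypothetical workaround: apply ascii() to each matching character individually.
SELECT string_agg(
         CASE WHEN ch ~ '[&]'
              THEN '&#' || ascii(ch) || ';' || ch  -- computed per character, after matching
              ELSE ch
         END,
         '' ORDER BY pos) AS encoded
FROM (
  SELECT pos, substr(s, pos, 1) AS ch
  FROM (SELECT E'Hello & goodbye '::text AS s) AS input,
       generate_series(1, length(s)) AS pos
) AS chars;
-- Hello &#38;& goodbye
```

Splitting per character keeps the sketch simple for a one-character pattern; a multi-character pattern would need a different way to locate matches.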

David J.



Re: Regex match not back-referencing in function

From: Thom Brown
On 12 February 2012 18:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> What am I missing?
>
> I might be more confused than you, but I think you're supposing that
> the result of ascii(E'\\1') has something to do with the match that
> the surrounding regexp_replace function will find, later on when it
> gets executed.  The actual arguments seen by regexp_replace are
>
> regression=# select E'Hello & goodbye ',E'([&])','&#' ||
> ascii(E'\\1') || E';\\1';
>     ?column?     | ?column? | ?column?
> ------------------+----------+----------
>  Hello & goodbye  | ([&])    | &#92;\1
> (1 row)
>
> and given that, the result looks perfectly fine to me.
>
> If there's a bug here, it's that ascii() ignores additional bytes in its
> input instead of throwing an error for a string with more than one
> character.  But I believe we've discussed that in the past and decided
> not to change it.

Okay, in that case I made the wrong assumptions about order of resolution.

Thanks

--
Thom