Thread: Regex match not back-referencing in function
Hi,

Could someone explain the following behaviour?

SELECT regexp_replace(E'Hello & goodbye ',E'([&])','' || ascii(E'\\1') || E';\\1');

This returns:

 regexp_replace
------------------------
 Hello \& goodbye
(1 row)

So it matched:

SELECT chr(92);
 chr
-----
 \
(1 row)

But notice that when I append the value it's supposed to have matched to the end of the replacement value, it shows it should be '&'.

Just to confirm:

SELECT ascii('&');
 ascii
-------
    38
(1 row)

So I'd expect the output of the original statement to be:

 regexp_replace
------------------------
 Hello && goodbye
(1 row)

What am I missing?

--
Thom
Thom Brown <thom@linux.com> writes:
> What am I missing?

I might be more confused than you, but I think you're supposing that the result of ascii(E'\\1') has something to do with the match that the surrounding regexp_replace function will find, later on when it gets executed. The actual arguments seen by regexp_replace are

regression=# select E'Hello & goodbye ',E'([&])','' || ascii(E'\\1') || E';\\1';
     ?column?     | ?column? | ?column?
------------------+----------+----------
 Hello & goodbye  | ([&])    | \\1
(1 row)

and given that, the result looks perfectly fine to me.

If there's a bug here, it's that ascii() ignores additional bytes in its input instead of throwing an error for a string with more than one character. But I believe we've discussed that in the past and decided not to change it.

regards, tom lane
On Feb 12, 2012, at 13:26, Thom Brown <thom@linux.com> wrote:
> Hi,
>
> Could someone explain the following behaviour?
>
> SELECT regexp_replace(E'Hello & goodbye ',E'([&])','' ||
> ascii(E'\\1') || E';\\1');
>
> This returns:
>
>  regexp_replace
> ------------------------
>  Hello \& goodbye
> (1 row)
>
> So it matched:
>
> SELECT chr(92);
>  chr
> -----
>  \
> (1 row)
>
> But notice that when I append the value it's supposed to have matched
> to the end of the replacement value, it shows it should be '&'.
>
> Just to confirm:
>
> SELECT ascii('&');
>  ascii
> -------
>     38
> (1 row)
>
> So I'd expect the output of the original statement to be:
>
>  regexp_replace
> ------------------------
>  Hello && goodbye
> (1 row)
>
> What am I missing?
>
> --
> Thom
>

The ascii() function call is evaluated independently of, and before, the regexp_replace function call, so the E'\\1' has no special meaning there. It only has special meaning inside the regexp_replace function.

Try evaluating ascii(E'\\1') by itself and confirm that you get 92.

David J.
On 12 February 2012 18:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> What am I missing?
>
> I might be more confused than you, but I think you're supposing that
> the result of ascii(E'\\1') has something to do with the match that
> the surrounding regexp_replace function will find, later on when it
> gets executed. The actual arguments seen by regexp_replace are
>
> regression=# select E'Hello & goodbye ',E'([&])','' ||
> ascii(E'\\1') || E';\\1';
>      ?column?     | ?column? | ?column?
> ------------------+----------+----------
>  Hello & goodbye  | ([&])    | \\1
> (1 row)
>
> and given that, the result looks perfectly fine to me.
>
> If there's a bug here, it's that ascii() ignores additional bytes in its
> input instead of throwing an error for a string with more than one
> character. But I believe we've discussed that in the past and decided
> not to change it.

Okay, in that case I made the wrong assumptions about order of resolution.

Thanks

--
Thom
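
For anyone reading this thread later, the order of resolution described above can be reproduced one step at a time. What follows is only a minimal sketch, assuming a PostgreSQL session with default string settings. The 'code=' prefix is a hypothetical placeholder picked for illustration and is not part of the original query.

```sql
-- Step 1: ascii() receives the two-character string \1 (backslash, digit one).
-- It only looks at the first character, the backslash.
SELECT ascii(E'\\1');                              -- 92

-- Step 2: the whole third argument is built before regexp_replace runs.
-- 'code=' is an illustrative prefix, not taken from the thread.
SELECT 'code=' || ascii(E'\\1') || E';\\1';        -- code=92;\1

-- Step 3: only now does regexp_replace see \1 and substitute the captured '&'.
SELECT regexp_replace(E'Hello & goodbye ', E'([&])', E'code=92;\\1');
                                                   -- Hello code=92;& goodbye

-- If the character to replace is known in advance, its code can be computed
-- from the literal itself rather than from the back-reference.
SELECT replace(E'Hello & goodbye ', '&', 'code=' || ascii('&') || ';&');
                                                   -- Hello code=38;& goodbye
```

The last statement only covers the case of a single, fixed character. Computing a different value for each match would need the per-match work to happen outside the replacement string, since that string is a plain value by the time regexp_replace receives it.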