Thread: order of (escaped) characters in regex range

order of (escaped) characters in regex range

From
InterRob
Date:
Dear List,

I found this interesting:

SELECT regexp_matches('123-A' , E'(3[A-Z\- ])');
ERROR:  invalid regular expression: invalid character range

whereas:
SELECT regexp_matches('123-A' , E'(3[\- A-Z])');
 regexp_matches
----------------
 {3-}
(1 row)

Notice the order of (escaped) characters and ranges in the last bit of the expression.

Am I missing some key concept of the regular expression?

Regards,
Rob

Re: order of (escaped) characters in regex range

From
Szymon Guz
Date:


On 13 December 2011 14:04, InterRob <rob.marjot@gmail.com> wrote:
> Dear List,
>
> I found this interesting:
>
> SELECT regexp_matches('123-A' , E'(3[A-Z\- ])');
> ERROR:  invalid regular expression: invalid character range
>
> whereas:
> SELECT regexp_matches('123-A' , E'(3[\- A-Z])');
>  regexp_matches
> ----------------
>  {3-}
> (1 row)
>
> Notice the order of (escaped) characters and ranges in the last bit of the expression.
>
> Am I missing some key concept of the regular expression?
>
> Regards,
> Rob

Hi Rob,
try '\\-' instead of '\-'
and it works :)

regards
Szymon
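
A minimal sketch of why the doubled backslash matters, assuming a PostgreSQL session: in an E'' string the server processes backslash escapes before the regex engine sees the pattern, so selecting the literal on its own shows what actually reaches the engine. The column alias is made up for illustration.

```sql
-- E'\-' : the backslash is consumed by the string parser, leaving a bare dash.
SELECT E'(3[A-Z\- ])'  AS pattern_seen_by_regex;   -- (3[A-Z- ])

-- E'\\-' : the string parser yields one backslash, so the regex engine
-- receives an escaped dash.
SELECT E'(3[A-Z\\- ])' AS pattern_seen_by_regex;   -- (3[A-Z\- ])
```

In the first form the dash directly follows the range A-Z, which is presumably what the engine rejects as an invalid character range; in the second form the dash is escaped and treated as a literal.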

Re: order of (escaped) characters in regex range

From
InterRob
Date:
True, but still weird...

And are you sure it does the same thing?

Re: order of (escaped) characters in regex range

From
David Johnston
Date:
On Dec 13, 2011, at 8:09, Szymon Guz <mabewlun@gmail.com> wrote:
> On 13 December 2011 14:04, InterRob <rob.marjot@gmail.com> wrote:
>> Dear List,
>>
>> I found this interesting:
>>
>> SELECT regexp_matches('123-A' , E'(3[A-Z\- ])');
>> ERROR:  invalid regular expression: invalid character range
>>
>> whereas:
>> SELECT regexp_matches('123-A' , E'(3[\- A-Z])');
>>  regexp_matches
>> ----------------
>>  {3-}
>> (1 row)
>>
>> Notice the order of (escaped) characters and ranges in the last bit of the expression.
>>
>> Am I missing some key concept of the regular expression?
>>
>> Regards,
>> Rob
>
> Hi Rob,
> try '\\-' instead of '\-'
> and it works :)
>
> regards


If you don't intend to use PostgreSQL escapes in your string then omit the leading 'E'.

In a character class the - symbol has special meaning if it appears anywhere but the first (or last) character of the group. To avoid that special meaning you have to escape it. If it appears first it always means a literal -. The PostgreSQL documentation does not fully describe regular expressions, but a reference book on them would note this particular behavior.

David J.
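
The dash placement described above can be sketched as follows. This assumes standard_conforming_strings = on (the default since PostgreSQL 9.1), so that a plain '...' literal passes backslashes through to the regex engine unchanged; with the setting off, the third query would need E'...' and a doubled backslash.

```sql
-- Dash as the first character of the bracket expression: literal.
SELECT regexp_matches('123-A', '(3[-A-Z ])');

-- Dash as the last character: also literal.
SELECT regexp_matches('123-A', '(3[A-Z -])');

-- Dash in the middle, escaped for the regex engine: literal.
SELECT regexp_matches('123-A', '(3[A-Z\- ])');
```

All three should capture the same text, the 3 followed by the dash.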

Re: order of (escaped) characters in regex range

From
InterRob
Date:
Thanks guys, I see what you mean.

I do intend to use the PG escaping, in order to avoid that annoying warning... Hence, my expression should indeed be:
SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');

In the above expression I added the parentheses as I wish to match these as well :))

Thanks!

Re: order of (escaped) characters in regex range

From
hubert depesz lubaczewski
Date:
On Tue, Dec 13, 2011 at 02:51:15PM +0100, InterRob wrote:
> Thanks guys, I see what you mean.
>
> I do intend to use the PG escaping, in order to avoid that annoying
> warning... Hence, my expression should indeed be:
> SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');
>
> In the above expression I added the parentheses as I wish to match these
> as well :))

Instead of that much quoting, just do:
SELECT regexp_matches('123-A' , '(3[A-Z() -])');
( and ) don't need to be quoted inside the brackets, and if you move - to the
beginning or end (I prefer end) of the bracket expression, it doesn't need to be quoted either.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/
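
A small sketch of the unescaped form, assuming the same bracket expression and two made-up input strings that each put a parenthesis right after the 3. The pattern contains no backslashes, so it behaves the same with or without the E prefix.

```sql
-- Opening parenthesis matched as an ordinary character inside [...].
SELECT regexp_matches('123(A', '(3[A-Z() -])');

-- Closing parenthesis, same pattern.
SELECT regexp_matches('123)A', '(3[A-Z() -])');
```

The outer ( ) still form the capture group; only the ones inside the brackets are literal.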

Re: order of (escaped) characters in regex range

From
Merlin Moncure
Date:
On Tue, Dec 13, 2011 at 7:51 AM, InterRob <rob.marjot@gmail.com> wrote:
> Thanks guys, I see what you mean.
>
> I do intend to use the PG escaping, in order to avoid that annoying
> warning... Hence, my expression should indeed be:
> SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');
>
> In the above expression I added the parentheses as I wish to match these as
> well :))

I advise dollar quoting when writing complicated regular expressions:

E'(3[A-Z\\-\\(\\) ])'
becomes
$$(3[A-Z\-\(\) ])$$

merlin
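
As a quick check that the two spellings above denote the same pattern, a sketch assuming any reasonably recent PostgreSQL (dollar quoting does no backslash processing, so what is typed is what the regex engine gets). The column alias is made up.

```sql
-- The dollar-quoted and the E'' form should be the same string.
SELECT $$(3[A-Z\-\(\) ])$$ = E'(3[A-Z\\-\\(\\) ])' AS same_pattern;

-- And the dollar-quoted form can be used directly as the pattern.
SELECT regexp_matches('123-A', $$(3[A-Z\-\(\) ])$$);
```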

Re: order of (escaped) characters in regex range

From
"David Johnston"
Date:
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Merlin Moncure
Sent: Tuesday, December 13, 2011 11:39 AM
To: rob@marjot-multisoft.com
Cc: David Johnston; Szymon Guz; pgsql-general
Subject: Re: [GENERAL] order of (escaped) characters in regex range

On Tue, Dec 13, 2011 at 7:51 AM, InterRob <rob.marjot@gmail.com> wrote:
> Thanks guys, I see what you mean.
>
> I do intend to use the PG escaping, in order to avoid that annoying
> warning... Hence, my expression should indeed be:
> SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');
>
> In the above expression I added the parentheses as I wish to match
> these as well :))

I advise dollar quoting when writing complicated regular expressions:

E'(3[A-Z\\-\\(\\) ])'
becomes
$$(3[A-Z\-\(\) ])$$

merlin

---------------------------------------------------------------

Aside from backward compatibility, and the various warnings, is there any
reason to prefer dollar-quoting over a non-SQL-escaped string literal (i.e.,
'3[A-Z\-\(\) ]'   ) ?

David J.



Re: order of (escaped) characters in regex range

From
Merlin Moncure
Date:
On Tue, Dec 13, 2011 at 10:53 AM, David Johnston <polobo@yahoo.com> wrote:
> Aside from backward compatibility, and the various warnings, is there any
> reason to prefer dollar-quoting over a non-SQL-escaped string literal (i.e.,
> '3[A-Z\-\(\) ]'   ) ?

yeah -- because sooner or later you have to stick a single quote in
there (of course, you can double the ', but I personally think that's
awful).

merlin
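
To illustrate the single-quote point, a sketch with a made-up input string and pattern: the same regex written once with doubled quotes and once with dollar quoting.

```sql
-- Ordinary literal: every ' inside the pattern has to be doubled.
SELECT regexp_matches('rock''n''roll', 'k''n');

-- Dollar quoting: the pattern is written exactly as the regex engine sees it.
SELECT regexp_matches('rock''n''roll', $$k'n$$);
```

Both queries use the same pattern, k'n, so they should return the same match.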

Re: order of (escaped) characters in regex range

From
Rob Marjot
Date:
True, but still weird...

And are you sure it does the same thing?
