Re: Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13 - Mailing list pgsql-general

From Alexander Farber
Subject Re: Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13
Msg-id CAADeyWgrVpRsG6baR1_oPGJWRNN3bWiCNofKKG0LTf3PAiH3Tw@mail.gmail.com
In response to Re: Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13
List pgsql-general
Hello,

unfortunately, octal escapes don't seem to work either:

On Tue, Mar 19, 2013 at 7:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Farber <alexander.farber@gmail.com> writes:
>> # select 'АБВГД' ~ '^[\u0410-\u042F]{2,}$';
>> WARNING:  nonstandard use of escape in a string literal
>
> I think Unicode escapes were introduced in 9.0.  In 8.4 you'd probably
> have to write out the UTF8 equivalent as octal escapes :-(

    # select 'АБВГД' ~ '^[\2020-\2057]{2,}$';
    WARNING:  nonstandard use of escape in a string literal
    LINE 1: select 'АБВГД' ~ '^[\2020-\2057]{2,}$';
                             ^
    HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
    ERROR:  invalid byte sequence for encoding "UTF8": 0x82
    HINT:  This error can also happen if the byte sequence does not
    match the encoding expected by the server, which is controlled by
    "client_encoding".
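
A note on the 0x82 in that error: it is consistent with \2020 being
read as the three-digit octal escape \202 (byte 0x82) followed by a
literal "0", i.e. the code point number was written where the UTF8
bytes were needed. The Python sketch below only does the byte
arithmetic; the helper name utf8_octal is made up for illustration,
and the E'' statement it prints is an untested assumption about what
8.4 would accept, not a verified fix.

```python
# Sketch: spell a character's UTF-8 bytes as three-digit octal escapes.
def utf8_octal(ch: str) -> str:
    """Return the UTF-8 bytes of ch as three-digit octal escapes."""
    return "".join("\\%03o" % b for b in ch.encode("utf-8"))

first = utf8_octal("\u0410")  # CYRILLIC CAPITAL LETTER A  (bytes D0 90)
last = utf8_octal("\u042F")   # CYRILLIC CAPITAL LETTER YA (bytes D0 AF)
print(first, last)            # \320\220 \320\257

# Assumption (not run against 8.4): an E'' literal carrying these escapes.
pattern = "E'^[%s-%s]{2,}$'" % (first, last)
print("select 'АБВГД' ~ %s;" % pattern)

# What '\2020' amounts to if only three octal digits are consumed:
# byte 0o202 (= 0x82) followed by the character "0".
misread = bytes([0o202]) + b"0"
print(hex(misread[0]))        # 0x82
```

A lone 0x82 is a UTF-8 continuation byte with no lead byte in front of
it, which would explain why the server rejects the literal.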

But writing the Cyrillic letters directly into the pattern seems to work
(I am trying to detect uppercase Russian letters as per
http://www.unicode.org/charts/PDF/U0400.pdf ):

    # select 'АБВГД' ~ '^[А-Я]{2,}$';
     ?column?
    ----------
       t
    (1 row)

And then I try to solve my 2nd problem (detecting the same letter
3 times in a row, a rare case in the Russian language):

    # select 'ОШИБББКА' ~ '(.)\1\1';
    WARNING:  nonstandard use of escape in a string literal
    LINE 1: select 'ОШИБББКА' ~ '(.)\1\1';
                                ^
    HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
     ?column?
    ----------
     f
    (1 row)


Does anybody please know why this fails in 8.4.13?

According to Table 9-18 in
http://www.postgresql.org/docs/8.4/static/functions-matching.html
it should be OK to use \1 to refer back to the part
captured by parentheses?
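
One possible explanation, judging by the WARNING: the backslashes are
consumed by the string-literal parser, so \1 reaches the regex engine
as a control character (byte 0x01) rather than as a backreference.
Python's ordinary string literals happen to treat \1 as an octal
escape too, so the sketch below is only an analogy for the suspected
behaviour, not a test against PostgreSQL. The variable names are
illustrative.

```python
import re

word = "ОШИБББКА"

# Single backslashes: the literal parser turns each \1 into chr(1),
# so the pattern contains control characters, not backreferences.
unprotected = "(.)\1\1"
print(unprotected == "(.)" + chr(1) + chr(1))  # True

# Doubled backslashes: a real \1 survives and reaches the regex engine.
protected = "(.)\\1\\1"

no_match = re.search(unprotected, word)
match = re.search(protected, word)
print(no_match)        # None
print(match.group(0))  # БББ
```

If the same thing is happening here, the pattern would presumably need
to be written as E'(.)\\1\\1', following the HINT about the escape
string syntax.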

Regards
Alex

