Re: Scadinavian characters in regular expressions - Mailing list pgsql-sql

From Søren Vainio
Subject Re: Scadinavian characters in regular expressions
Date
Msg-id 910513A5A944D5118BE900C04F67CB5A1F82C6@MAIL
In response to Scadinavian characters in regular expressions  (Søren Vainio <sva@Netpointers.com>)
Responses Re: Scadinavian characters in regular expressions
Re: Scadinavian characters in regular expressions
List pgsql-sql
Using \s does produce FALSE for
SELECT 'oneå two three' ~ '^[^\s]+[\s][^\s]+$';
But it also produces FALSE for any two-word string, e.g.
SELECT 'one two' ~ '^[^\s]+[\s][^\s]+$';
where I would expect TRUE.
(I am using PostgreSQL 7.1.3)
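
For comparison outside PostgreSQL, here is a minimal sketch in Python that runs the same checks with the `re` module. It is only a cross-check of what the patterns should do in a character-aware engine, not a description of how the 7.1.3 regex code behaves. The second pattern rests on an assumption: that the backslash in '\s' is consumed by the string-literal parser before the regex engine sees it, so each bracket holds a plain letter "s". If that is what happens, it would account for both FALSE results above, since neither string contains an "s". The names TWO_WORDS, BACKSLASH_DROPPED and matches are made up for this illustration.

```python
import re

# Pattern from the original question: non-spaces, one space, non-spaces.
TWO_WORDS = re.compile(r"^[^ ]+[ ][^ ]+$")

# ASSUMPTION: what the \s pattern turns into if the string-literal parser
# drops the backslash, leaving a plain letter "s" in each bracket.
BACKSLASH_DROPPED = re.compile(r"^[^s]+[s][^s]+$")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in text (like the ~ operator)."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for text in ["one two", "one two three", "onåe two three", "oneå two three"]:
        print(repr(text), matches(TWO_WORDS, text), matches(BACKSLASH_DROPPED, text))
```

With the space-based pattern, only 'one two' matches and every three-word string fails wherever the å sits, which is the result expected in the original question. With the "s" variant, none of the four strings match because none contains the letter "s".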

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org] On behalf of Andreas
> Joseph Krogh
> Sent: 9 April 2002 11:53
> To: 'pgsql-sql@postgresql.org'
> Subject: Re: [SQL] Scadinavian characters in regular expressions
>
>
> On Tuesday 09 April 2002 10:51, Søren Vainio wrote:
> > Can someone please explain the following?
> > I am using a regular expression to find strings containing two words
> > (beginning with one or more non-space characters, followed by a space,
> > followed by one or more non-space characters).
> > But when Scandinavian characters are included it returns different
> > results depending on where the character is positioned.
> > The first two-word example returns TRUE as expected.
> > The second three-word example returns FALSE as expected.
> > But when I let an å (a-ring) traverse through the string it
> > unexpectedly returns TRUE when the character is positioned as the
> > second-last or last character in either of the first two words.
> >
> > SELECT 'one two' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'åone two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'oåne two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'onåe two three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'oneå two three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one åtwo three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one tåwo three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one twåo three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one twoå three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one two åthree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two tåhree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two thåree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two thråee' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two threåe' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two threeå' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> >
> > Thank you for any response.
> >
> > Søren Vainio, Denmark
>
> I just tried the following which returned false as expected:
> andreak=# SELECT 'oneå two three' ~ '^[^\s]+[\s][^\s]+$';
>  ?column?
> ----------
>  f
> (1 row)
>
> andreak=# select version();
>                           version
> -----------------------------------------------------------
>  PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.96
> (1 row)
>
> NOTE: I replaced your [^ ] with the properly formatted pattern
> for whitespace:
> [^\s]
>
> --
> Andreas Joseph Krogh (Senior Software Developer)
> <andreak@officenet.no>
> A hen is an egg's way of making another egg.
>

