Thread: POSIX Regular Expression question

POSIX Regular Expression question

From
Aldor
Date:
Hi,

I want to get out a string only with characters A-Za-z.

I tried really a lot of things with substring and read many POSIX docs,
I'm also familiar with the Perl RegEx but right now, I'm giving up... ;-(

Any idea how to do this in Postgres with POSIX Regex?

Regards,

Aldor


Re: POSIX Regular Expression question

From
"A. Kretschmer"
Date:
am  05.09.2005, um 14:57:06 +0100 mailte Aldor folgendes:
> Hi,
> 
> I want to get out a string only with characters A-Za-z.

like this?

test=# select * from foo;  s
--------123abzabz123
(3 rows)

test=# select * from foo where s ~ '^[a-zA-Z]+$'; s
-----abz
(1 row)


> Any idea how to do this in Postgres with POSIX Regex?

The term '[a-zA-Z]+' means: this chars at least one times, the '^' and
'$' are anchors to the begin and end of the line.

Hope that helps,


Regards, Andreas
-- 
Andreas Kretschmer    (Kontakt: siehe Header)
Heynitz:  035242/47212,      D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net===    Schollglas Unternehmensgruppe    === 


Re: POSIX Regular Expression question

From
Michael Fuhr
Date:
On Mon, Sep 05, 2005 at 02:57:06PM +0100, Aldor wrote:
> 
> I want to get out a string only with characters A-Za-z.
> 
> I tried really a lot of things with substring and read many POSIX docs,
> I'm also familiar with the Perl RegEx but right now, I'm giving up... ;-(
> 
> Any idea how to do this in Postgres with POSIX Regex?

Match the beginning of the string with ^.

Match one or more characters in the set A-Za-z with [A-Za-z]+ (or
with just [A-Z]+ or [a-z]+ if you're doing a case-insensitive match).
Using [[:alpha:]]+ should also work.

Match the end of the string with $.

Examples:

SELECT 'abcd' ~ '^[A-Za-z]+$';?column? 
----------t
(1 row)

SELECT 'ABCD' ~* '^[a-z]+$';?column? 
----------t
(1 row)

SELECT 'ABC123' ~* '^[a-z]+$';?column? 
----------f
(1 row)

-- 
Michael Fuhr


Re: POSIX Regular Expression question

From
Peter Eisentraut
Date:
Am Montag, 5. September 2005 15:57 schrieb Aldor:
> I want to get out a string only with characters A-Za-z.
> Any idea how to do this in Postgres with POSIX Regex?

Presumably,
   colname ~ '^[A-Za-z]*$'

If you want to be independent of locale issues, then you'd have to enumerate 
all the letters instead of using a range specification.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: POSIX Regular Expression question

From
"A. Kretschmer"
Date:
am  05.09.2005, um 16:19:28 +0200 mailte Peter Eisentraut folgendes:
> Am Montag, 5. September 2005 15:57 schrieb Aldor:
> > I want to get out a string only with characters A-Za-z.
> > Any idea how to do this in Postgres with POSIX Regex?
> 
> Presumably,
> 
>     colname ~ '^[A-Za-z]*$'                         ^

This match also a empty string.


Andreas
-- 
Andreas Kretschmer    (Kontakt: siehe Header)
Heynitz:  035242/47212,      D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net===    Schollglas Unternehmensgruppe    === 


Re: POSIX Regular Expression question

From
Alvaro Herrera
Date:
On Mon, Sep 05, 2005 at 04:19:28PM +0200, Peter Eisentraut wrote:
> Am Montag, 5. September 2005 15:57 schrieb Aldor:
> > I want to get out a string only with characters A-Za-z.
> > Any idea how to do this in Postgres with POSIX Regex?
> 
> Presumably,
> 
>     colname ~ '^[A-Za-z]*$'
> 
> If you want to be independent of locale issues, then you'd have to enumerate 
> all the letters instead of using a range specification.

Note that [:alpha:] and such character classes are defined by POSIX to
be locale independent:

alvherre=# select 'á' ~ '[a-z]';?column? 
----------f
(1 fila)

alvherre=# select 'á' ~ '[[:alpha:]]';?column? 
----------t
(1 fila)


-- 
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)


Re: POSIX Regular Expression question

From
Peter Eisentraut
Date:
Am Montag, 5. September 2005 17:10 schrieb A. Kretschmer:
> >     colname ~ '^[A-Za-z]*$'
>
> This match also a empty string.

An empty string also fulfulls the condition "only with characters A-Za-z".  Or 
maybe not. :-)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: POSIX Regular Expression question

From
Harald Fuchs
Date:
In article <200509051619.29210.peter_e@gmx.net>,
Peter Eisentraut <peter_e@gmx.net> writes:

> Presumably,

>     colname ~ '^[A-Za-z]*$'

> If you want to be independent of locale issues, then you'd have to enumerate 
> all the letters instead of using a range specification.

How so?  I thought character ranges are more an encoding than a locale
issue.  I dunno the details of the supported encodings, but is there
any encoding where 'a-z' includes more or less than 26 letters?
Fortunately we don't support EBCDIC ;-)



Re: POSIX Regular Expression question

From
Peter Eisentraut
Date:
Am Montag, 5. September 2005 17:13 schrieb Alvaro Herrera:
> Note that [:alpha:] and such character classes are defined by POSIX to
> be locale independent:
>
> alvherre=# select 'á' ~ '[a-z]';
>  ?column?
> ----------
>  f
> (1 fila)
>
> alvherre=# select 'á' ~ '[[:alpha:]]';
>  ?column?
> ----------
>  t
> (1 fila)

I don't think this addresses the concern I intended to raise.  The first query 
should succeed for all letters between a and z, the second should succeed for 
all letters.  Neither is guaranteed to succeed only for all "normal" Latin 
letters a, b, c, ... z.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: POSIX Regular Expression question

From
Peter Eisentraut
Date:
Am Montag, 5. September 2005 17:40 schrieb Harald Fuchs:
> How so?  I thought character ranges are more an encoding than a locale
> issue.

That is incorrect.

> I dunno the details of the supported encodings, but is there 
> any encoding where 'a-z' includes more or less than 26 letters?

Well, it seems that our regexp library interprets [a-z] as exactly 26 letters, 
but that seems to be a lack of locale support rather than a feature.  There 
are statements in the documentation of other regexp libraries that directly 
contradict this behavior.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: POSIX Regular Expression question

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Am Montag, 5. September 2005 17:40 schrieb Harald Fuchs:
>> I dunno the details of the supported encodings, but is there 
>> any encoding where 'a-z' includes more or less than 26 letters?

> Well, it seems that our regexp library interprets [a-z] as exactly 26 letters, 
> but that seems to be a lack of locale support rather than a feature.

ISTM that the notation should properly mean "every character that sorts
between a and z inclusive".  So it's sort order dependent and thus
locale dependent --- or at least should be.  The locale support in our
regexp code is definitely pretty weak at the moment.
        regards, tom lane


Re: POSIX Regular Expression question

From
Bruno Wolff III
Date:
On Mon, Sep 05, 2005 at 16:19:28 +0200, Peter Eisentraut <peter_e@gmx.net> wrote:
> Am Montag, 5. September 2005 15:57 schrieb Aldor:
> > I want to get out a string only with characters A-Za-z.
> > Any idea how to do this in Postgres with POSIX Regex?
> 
> Presumably,
> 
>     colname ~ '^[A-Za-z]*$'
> 
> If you want to be independent of locale issues, then you'd have to enumerate 
> all the letters instead of using a range specification.

Or use:

colname ~ '^[[:alpha:]]*$'