Thread: POSIX Regular Expression question
Hi, I want to get out a string only with characters A-Za-z. I tried really a lot of things with substring and read many POSIX docs, I'm also familiar with the Perl RegEx but right now, I'm giving up... ;-( Any idea how to do this in Postgres with POSIX Regex? Regards, Aldor
am 05.09.2005, um 14:57:06 +0100 mailte Aldor folgendes: > Hi, > > I want to get out a string only with characters A-Za-z. like this? test=# select * from foo; s --------123abzabz123 (3 rows) test=# select * from foo where s ~ '^[a-zA-Z]+$'; s -----abz (1 row) > Any idea how to do this in Postgres with POSIX Regex? The term '[a-zA-Z]+' means: this chars at least one times, the '^' and '$' are anchors to the begin and end of the line. Hope that helps, Regards, Andreas -- Andreas Kretschmer (Kontakt: siehe Header) Heynitz: 035242/47212, D1: 0160/7141639 GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net=== Schollglas Unternehmensgruppe ===
On Mon, Sep 05, 2005 at 02:57:06PM +0100, Aldor wrote: > > I want to get out a string only with characters A-Za-z. > > I tried really a lot of things with substring and read many POSIX docs, > I'm also familiar with the Perl RegEx but right now, I'm giving up... ;-( > > Any idea how to do this in Postgres with POSIX Regex? Match the beginning of the string with ^. Match one or more characters in the set A-Za-z with [A-Za-z]+ (or with just [A-Z]+ or [a-z]+ if you're doing a case-insensitive match). Using [[:alpha:]]+ should also work. Match the end of the string with $. Examples: SELECT 'abcd' ~ '^[A-Za-z]+$';?column? ----------t (1 row) SELECT 'ABCD' ~* '^[a-z]+$';?column? ----------t (1 row) SELECT 'ABC123' ~* '^[a-z]+$';?column? ----------f (1 row) -- Michael Fuhr
Am Montag, 5. September 2005 15:57 schrieb Aldor: > I want to get out a string only with characters A-Za-z. > Any idea how to do this in Postgres with POSIX Regex? Presumably, colname ~ '^[A-Za-z]*$' If you want to be independent of locale issues, then you'd have to enumerate all the letters instead of using a range specification. -- Peter Eisentraut http://developer.postgresql.org/~petere/
am 05.09.2005, um 16:19:28 +0200 mailte Peter Eisentraut folgendes: > Am Montag, 5. September 2005 15:57 schrieb Aldor: > > I want to get out a string only with characters A-Za-z. > > Any idea how to do this in Postgres with POSIX Regex? > > Presumably, > > colname ~ '^[A-Za-z]*$' ^ This match also a empty string. Andreas -- Andreas Kretschmer (Kontakt: siehe Header) Heynitz: 035242/47212, D1: 0160/7141639 GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net=== Schollglas Unternehmensgruppe ===
On Mon, Sep 05, 2005 at 04:19:28PM +0200, Peter Eisentraut wrote: > Am Montag, 5. September 2005 15:57 schrieb Aldor: > > I want to get out a string only with characters A-Za-z. > > Any idea how to do this in Postgres with POSIX Regex? > > Presumably, > > colname ~ '^[A-Za-z]*$' > > If you want to be independent of locale issues, then you'd have to enumerate > all the letters instead of using a range specification. Note that [:alpha:] and such character classes are defined by POSIX to be locale independent: alvherre=# select 'á' ~ '[a-z]';?column? ----------f (1 fila) alvherre=# select 'á' ~ '[[:alpha:]]';?column? ----------t (1 fila) -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "En las profundidades de nuestro inconsciente hay una obsesiva necesidad de un universo lógico y coherente. Pero el universo real se halla siempre un paso más allá de la lógica" (Irulan)
Am Montag, 5. September 2005 17:10 schrieb A. Kretschmer: > > colname ~ '^[A-Za-z]*$' > > This match also a empty string. An empty string also fulfulls the condition "only with characters A-Za-z". Or maybe not. :-) -- Peter Eisentraut http://developer.postgresql.org/~petere/
In article <200509051619.29210.peter_e@gmx.net>, Peter Eisentraut <peter_e@gmx.net> writes: > Presumably, > colname ~ '^[A-Za-z]*$' > If you want to be independent of locale issues, then you'd have to enumerate > all the letters instead of using a range specification. How so? I thought character ranges are more an encoding than a locale issue. I dunno the details of the supported encodings, but is there any encoding where 'a-z' includes more or less than 26 letters? Fortunately we don't support EBCDIC ;-)
Am Montag, 5. September 2005 17:13 schrieb Alvaro Herrera: > Note that [:alpha:] and such character classes are defined by POSIX to > be locale independent: > > alvherre=# select 'á' ~ '[a-z]'; > ?column? > ---------- > f > (1 fila) > > alvherre=# select 'á' ~ '[[:alpha:]]'; > ?column? > ---------- > t > (1 fila) I don't think this addresses the concern I intended to raise. The first query should succeed for all letters between a and z, the second should succeed for all letters. Neither is guaranteed to succeed only for all "normal" Latin letters a, b, c, ... z. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Am Montag, 5. September 2005 17:40 schrieb Harald Fuchs: > How so? I thought character ranges are more an encoding than a locale > issue. That is incorrect. > I dunno the details of the supported encodings, but is there > any encoding where 'a-z' includes more or less than 26 letters? Well, it seems that our regexp library interprets [a-z] as exactly 26 letters, but that seems to be a lack of locale support rather than a feature. There are statements in the documentation of other regexp libraries that directly contradict this behavior. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > Am Montag, 5. September 2005 17:40 schrieb Harald Fuchs: >> I dunno the details of the supported encodings, but is there >> any encoding where 'a-z' includes more or less than 26 letters? > Well, it seems that our regexp library interprets [a-z] as exactly 26 letters, > but that seems to be a lack of locale support rather than a feature. ISTM that the notation should properly mean "every character that sorts between a and z inclusive". So it's sort order dependent and thus locale dependent --- or at least should be. The locale support in our regexp code is definitely pretty weak at the moment. regards, tom lane
On Mon, Sep 05, 2005 at 16:19:28 +0200, Peter Eisentraut <peter_e@gmx.net> wrote: > Am Montag, 5. September 2005 15:57 schrieb Aldor: > > I want to get out a string only with characters A-Za-z. > > Any idea how to do this in Postgres with POSIX Regex? > > Presumably, > > colname ~ '^[A-Za-z]*$' > > If you want to be independent of locale issues, then you'd have to enumerate > all the letters instead of using a range specification. Or use: colname ~ '^[[:alpha:]]*$'