Thread: Regexp matching

Regexp matching

From
Eduardas Kazakas
Date:
Hello, I have some problems using character class matching (e.g. [:alpha:]).

For example I have a table:

CREATE TABLE re_test (text_column character varying (50) NOT NULL);

Notice, that there are some specific characters.

INSERT INTO re_test VALUES ('AŠDF');
INSERT INTO re_test VALUES ('AŠDF45');
INSERT INTO re_test VALUES ('AŠDF FDŠA');
INSERT INTO re_test VALUES ('ASDF FDŠA');
INSERT INTO re_test VALUES ('58ASDF FDŠA');
INSERT INTO re_test VALUES ('ašDf');
INSERT INTO re_test VALUES ('aŠdf');

SELECT * FROM re_test WHERE text_column ~ '[^[:alpha:]]' and text_column ~ [:upper:];

Goal:
I want to write such statement which returns me only those records which have only one word and those words must be uppercase.
So I expect this statement to return only one record where text_column = AŠDF.

Maybe someone could give me more detail explanation how to use those regexp classes, because the documentation tells very little about this.

Some more information:

PostgreSQL9

OS - Windows x86-32
DB encoding - UTF-8
lc_collate - English_United States.1252
lc_ctype - English_United States.1252
lc_messages - English_United States.1252
lc_monetary - English_United States.1252
lc_numeric - English_United States.1252
lc_time - English_United States.1252

Re: Regexp matching

From
Osvaldo Kussama
Date:
2010/9/28 Eduardas Kazakas <eduardas.kazakas@gmail.com>:
> Hello, I have some problems using character class matching (e.g. [:alpha:]).
>
> For example I have a table:
>
> CREATE TABLE re_test (text_column character varying (50) NOT NULL);
>
> Notice, that there are some specific characters.
>
> INSERT INTO re_test VALUES ('AŠDF');
> INSERT INTO re_test VALUES ('AŠDF45');
> INSERT INTO re_test VALUES ('AŠDF FDŠA');
> INSERT INTO re_test VALUES ('ASDF FDŠA');
> INSERT INTO re_test VALUES ('58ASDF FDŠA');
> INSERT INTO re_test VALUES ('ašDf');
> INSERT INTO re_test VALUES ('aŠdf');
>
> SELECT * FROM re_test WHERE text_column ~ '[^[:alpha:]]' and text_column ~
> [:upper:];
>
> Goal:
> I want to write such statement which returns me only those records which
> have only one word and those words must be uppercase.
> So I expect this statement to return only one record where text_column =
> AŠDF.
>
> Maybe someone could give me more detail explanation how to use those regexp
> classes, because the documentation tells very little about this.
>
> Some more information:
>
> PostgreSQL9
>
> OS - Windows x86-32
> DB encoding - UTF-8
> lc_collate - English_United States.1252
> lc_ctype - English_United States.1252
> lc_messages - English_United States.1252
> lc_monetary - English_United States.1252
> lc_numeric - English_United States.1252
> lc_time - English_United States.1252


I believe that "Š" isn't an alphabetical character in English_United
States (LC_CTYPE).
http://www.postgresql.org/docs/current/interactive/locale.html
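
Separately from the locale question, the query as posted would not express the stated goal. A POSIX class name such as [:upper:] is only valid inside a bracket expression, so it has to be written [[:upper:]] and quoted as a string. Also, '[^[:alpha:]]' matches rows that contain at least one non-alphabetic character, which is the opposite of what is wanted. A minimal sketch, assuming the re_test table from the original post, anchors the pattern so the whole value must consist of uppercase letters:

```sql
-- Sketch only: assumes the re_test table defined in the original post.
-- ^ and $ anchor the match to the whole value, so any digit or space
-- makes the row fail; that also excludes values with more than one word.
SELECT *
FROM re_test
WHERE text_column ~ '^[[:upper:]]+$';
```

Whether 'AŠDF' comes back still depends on whether "Š" is classified as an uppercase letter under the database's lc_ctype setting, so the result may differ between installations.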

Osvaldo