Re: Select all invalid e-mail addresses - Mailing list pgsql-general

From Michael Fuhr
Subject Re: Select all invalid e-mail addresses
Date
Msg-id 20051021184909.GA59288@winnie.fuhr.org
Whole thread Raw
In response to Re: Select all invalid e-mail addresses  ("Andrus" <eetasoft@online.ee>)
List pgsql-general
On Fri, Oct 21, 2005 at 08:15:23PM +0300, Andrus wrote:
> I tried
>
> SELECT   email  FROM customer
>   WHERE  email !~
>
'/^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/'
>
> but got an error
>
> ERROR:  invalid regular expression: invalid character range

Aside from the fact that this regular expression is semantically wrong,
it has a few other problems:

* A hyphen (-) must come first or last in a character class if you want
it interpreted literally instead of as part of a range specification.

  test=> SELECT 'abc' ~ '[a-z0-9-_]';  -- WRONG
  ERROR:  invalid regular expression: invalid character range

  test=> SELECT 'abc' ~ '[a-z0-9_-]';
   ?column?
  ----------
   t
  (1 row)

* Regular expressions in PostgreSQL don't use delimiters like / at
the beginning and end of the expression.

  test=> SELECT 'abc' ~ '/abc/';  -- WRONG
   ?column?
  ----------
   f
  (1 row)

  test=> SELECT 'abc' ~ 'abc';
   ?column?
  ----------
   t
  (1 row)

* If you use single quotes around the regular expression then you
need to escape backslashes that should be part of the regular
expression; otherwise the backslash will be parsed by the string
parser before the string is used as a regular expression and you'll
get unexpected results.  In other words, there's an extra layer of
string parsing that you have to allow for.  In 8.0 and later you
can avoid this by using dollar quotes.

  test=> SELECT 'abc' ~ 'a\.c';  -- WRONG
   ?column?
  ----------
   t
  (1 row)

  test=> SELECT 'abc' ~ 'a\\.c';
   ?column?
  ----------
   f
  (1 row)

  test=> SELECT 'a.c' ~ 'a\\.c';
   ?column?
  ----------
   t
  (1 row)

  test=> SELECT 'abc' ~ $$a\.c$$;
   ?column?
  ----------
   f
  (1 row)

  test=> SELECT 'a.c' ~ $$a\.c$$;
   ?column?
  ----------
   t
  (1 row)

--
Michael Fuhr

pgsql-general by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: scheduled backup
Next
From: Steve Atkins
Date:
Subject: Re: Select all invalid e-mail addresses