Re: Select all invalid e-mail addresses - Mailing list pgsql-general

From Steve Atkins
Subject Re: Select all invalid e-mail addresses
Msg-id 20051025165400.GB21613@gp.word-to-the-wise.com
In response to Re: Select all invalid e-mail addresses  (Michael Fuhr <mike@fuhr.org>)
List pgsql-general
On Tue, Oct 25, 2005 at 09:09:44AM -0600, Michael Fuhr wrote:
> On Tue, Oct 25, 2005 at 11:20:53AM +0300, Andrus wrote:
> > This regex allows email addresses containing two dots without any letters,
> > like eeta..soft@online.ee
> > I haven't seen any email of such kind.
>
> That's because the regular expression is wrong: it simply checks
> the local part for zero or more non-@ characters instead of checking
> against the RFC822/RFC2822 specification.  Use a search engine to
> find a more complete regular expression (beware: it's long).

eeta..soft@online.ee is a perfectly functional email address, despite
not being in dot-atom form and so technically in violation of RFC
2822. There are few constraints on the local part of an email address,
and those constraints are often violated in practice without causing
any problems.
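
To make the dot-atom point concrete, here is a minimal Python sketch
(not part of the original message) of a strict RFC 2822 dot-atom check
on the local part. The function name is_dot_atom is made up for
illustration, and the check deliberately ignores quoted-string local
parts and comments, which RFC 2822 also allows.

```python
import re

# atext characters as listed in RFC 2822 section 3.2.4.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text: one or more atext runs separated by single dots.
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom(local_part: str) -> bool:
    """Return True if local_part is in strict dot-atom form.

    Hypothetical helper: quoted-string local parts are not handled.
    """
    return DOT_ATOM.fullmatch(local_part) is not None


if __name__ == "__main__":
    for candidate in ("eeta.soft", "eeta..soft", ".eeta"):
        print(candidate, is_dot_atom(candidate))
```

A check like this rejects "eeta..soft" because of the empty atom
between the two dots, even though such an address can work fine in
practice.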

I do data analysis on email addresses all day, every day. I'm fully
aware of RFC 2822 constraints, and I'm also aware that the correlation
between them and the real world is high, but not absolute.

If you were using this to validate email software that would be a
different thing, but if you're actually working in the real world with
real world data and are actually concerned about finding email
addresses that are likely to be incorrect (rather than punishing users
with non-RFC 2822 compliant email addresses) then looking at the
local-part in much detail is really not useful.
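
As a rough illustration of that approach, the following Python sketch
(not from the original message) flags addresses that are likely to be
incorrect while staying lenient about the local part. The name
likely_invalid and the exact domain rule are assumptions chosen for
this example: the local part only has to be non-empty and free of "@"
and whitespace, and the domain has to look like dot-separated
letter/digit/hyphen labels ending in an alphabetic label of at least
two letters. That last rule is a simplification and would wrongly flag
things like punycode top-level domains or address literals.

```python
import re

# Assumed domain rule: one or more LDH labels, each followed by a dot,
# then a final alphabetic label of two or more letters.
DOMAIN = re.compile(
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}"
)


def likely_invalid(address: str) -> bool:
    """Return True if address looks likely to be incorrect.

    Hypothetical helper: lenient on the local part, stricter on the domain.
    """
    # Split on the last "@"; sep is empty when there is no "@" at all.
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return True
    # Loose local-part rule: no "@" and no whitespace.
    if "@" in local or any(ch.isspace() for ch in local):
        return True
    return DOMAIN.fullmatch(domain) is None


if __name__ == "__main__":
    for addr in ("eeta..soft@online.ee", "foo@bar", "@online.ee"):
        print(addr, likely_invalid(addr))
```

With these rules eeta..soft@online.ee is accepted, while an address
with no domain dot or an empty local part is flagged.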

Cheers,
  Steve
