Thread: Email Verification Regular Expression

Email Verification Regular Expression

From
Brad Nicholson
Date:
Does anybody have a regular expression handy to verify email addresses?

--
Brad Nicholson
Database Administrator, Afilias Canada Corp.



Re: Email Verification Regular Expression

From
Michael Glaesemann
Date:
On Sep 8, 2005, at 12:17 AM, Brad Nicholson wrote:

> Does anybody have a regular expression handy to verify email addresses?

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

:)

Michael Glaesemann
grzm myrealbox com



Re: Email Verification Regular Expression

From
Markus Rebbert
Date:
On Wednesday, 07.09.2005, at 11:17 -0400, Brad Nicholson wrote:
> Does anybody have a regular expression handy to verify email addresses?

^([a-zA-Z0-9._-]+)\@(([a-zA-Z0-9-]+[.]?){1,}[a-zA-Z0-9-]*+\.){1,}[a-zA-Z]{2,4}$

but I don't think it's really complete.

best regards,
Markus


Re: Email Verification Regular Expression

From
Douglas McNaught
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:

> Does anybody have a regular expression handy to verify email addresses?

It's harder than you think.  For one that handles it in fairly full
generality, see Jeffrey Friedl's book _Mastering Regular Expressions_.
The regex he comes up with is quite a beast.

-Doug

Re: Email Verification Regular Expression

From
"Cristian Prieto"
Date:
Could somebody embed this regex into a pgsql ~ expression (maybe in a
DOMAIN type)?

Thanks a lot!

----- Original Message -----
From: "Michael Glaesemann" <grzm@myrealbox.com>
To: "Brad Nicholson" <bnichols@ca.afilias.info>
Cc: <pgsql-general@postgresql.org>
Sent: Wednesday, September 07, 2005 9:41 AM
Subject: Re: [GENERAL] Email Verfication Regular Expression


>
> On Sep 8, 2005, at 12:17 AM, Brad Nicholson wrote:
>
>> Does anybody have a regular expression handy to verify email addresses?
>
> http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
>
> :)
>
> Michael Glaesemann
> grzm myrealbox com


Re: Email Verification Regular Expression

From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "Markus" == Markus Rebbert <markus.rebbert@freenet.de> writes:

Markus> On Wednesday, 07.09.2005, at 11:17 -0400, Brad Nicholson wrote:
>> Does anybody have a regular expression handy to verify email addresses?

Markus> ^([a-zA-Z0-9._-]+)\@(([a-zA-Z0-9-]+[.]?){1,}[a-zA-Z0-9-]*+\.){1,}[a-zA-Z]{2,4}$

Markus> but I don't think it's really complete.

Absolutely not.  It rejects <fred&barney@stonehenge.com> which is a perfectly
valid email address.  (Try it, you'll get my autoresponder.)

Google for "RFC 822" and "RFC 2822" to see the *real* rules.  An
actual regex for an email address is rather large.
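
The rejection is easy to reproduce. Below is a minimal Python sketch, not anything posted in the thread. It assumes two adaptations of Markus's pattern that should not change the outcome for these samples: the possessive `*+` is written as a plain `*` (older versions of Python's `re` reject it) and `\@` is written as `@`. The helper name `markus_accepts` is hypothetical.

```python
import re

# Markus's pattern, adapted for Python's re module (assumption: "*+" -> "*"
# and "\@" -> "@" leave the result unchanged for the samples used here).
MARKUS_RE = re.compile(
    r"^([a-zA-Z0-9._-]+)@(([a-zA-Z0-9-]+[.]?){1,}[a-zA-Z0-9-]*\.){1,}[a-zA-Z]{2,4}$"
)


def markus_accepts(address: str) -> bool:
    """Return True if the adapted pattern matches the whole address."""
    return MARKUS_RE.match(address) is not None


if __name__ == "__main__":
    for candidate in ("markus@example.com", "fred&barney@stonehenge.com"):
        print(candidate, markus_accepts(candidate))
```

The second address fails because "&" is not in the character class used for the local part, even though it is a legal atom character.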

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Re: Email Verification Regular Expression

From
"Greg Sabino Mullane"
Date:
> Does anybody have a regular expression handy to verify email addresses?

CREATE OR REPLACE FUNCTION goodemail(text) RETURNS BOOL LANGUAGE plperl AS
$$
my $lwsp = "(?:(?:\\r\\n)?[ \\t])";
my $specials = '()<>@,;:\\\\".\\[\\]';
my $controls = '\\000-\\037\\177';
my $dtext = "[^\\[\\]\\r\\\\]";
my $domain_literal = "\\[(?:$dtext|\\\\.)*\\]$lwsp*";
my $quoted_string = "\"(?:[^\\\"\\r\\\\]|\\\\.|$lwsp)*\"$lwsp*";
my $atom = "[^$specials $controls]+(?:$lwsp+|\\Z|(?=[\\[\"$specials]))";
my $word = "(?:$atom|$quoted_string)";
my $localpart = "$word(?:\\.$lwsp*$word)*";
my $sub_domain = "(?:$atom|$domain_literal)";
my $domain = "$sub_domain(?:\\.$lwsp*$sub_domain)*";
my $addr_spec = "$localpart\@$lwsp*$domain";
my $phrase = "$word*";
my $route = "(?:\@$domain(?:,\@$lwsp*$domain)*:$lwsp*)";
my $route_addr = "\\<$lwsp*$route?$addr_spec\\>$lwsp*";
my $mailbox = "(?:$addr_spec|$phrase$route_addr)";
my $group = "$phrase:$lwsp*(?:$mailbox(?:,\\s*$mailbox)*)?;\\s*";
my $address = "(?:$mailbox|$group)";
# Anchor at both ends so the whole input must be an address, not just a substring of it.
my $EMAILRE = qr{\A$lwsp*$address\z};

return $_[0] =~ $EMAILRE ? 1 : 0;

$$;



--
Greg Sabino Mullane greg@turnstep.com



Re: Email Verification Regular Expression

From
Ben
Date:
Not knowing your application, keep in mind that just because somebody
enters a syntactically correct email address doesn't mean they entered
the right one.

Cristian Prieto wrote:

> Could somebody embed this regex into a pgsql ~ expression (maybe
> in a DOMAIN type)?



Re: Email Verification Regular Expression

From
"Welty, Richard"
Date:
Randal L. Schwartz wrote:
>Absolutely not.  It rejects <fred&barney@stonehenge.com> which is a perfectly
>valid email address.  (Try it, you'll get my autoresponder.)

>Google for "RFC 822" and "RFC 2822" to see the *real* rules.  An
>actual regex for an email address is rather large.

There's an extended example in Appendix B of _Mastering Regular Expressions_
from O'Reilly.

The appendix suggests the regex may be available online at Jeffrey Friedl's
home page. Here's the URL, but I've not gone excavating for the regex.

http://dict.regex.info/cgi-bin/j-e/jfriedl.html

richard

Re: Email Verification Regular Expression

From
Roman Neuhauser
Date:
# bnichols@ca.afilias.info / 2005-09-07 11:17:10 -0400:
> Does anybody have a regular expression handy to verify email addresses?

    This is what I have. The comment notes the caveats.

-- CREATE FUNCTION IS_EMAILADDRESS {{{
-- returns TRUE if $1 matches the rules for RFC2822 addr-spec token,
-- ignoring CFWS in atoms, obs- versions of everything, !dot-atom
-- versions of local-part, and quoted-pairs in domain-literal (IOW,
-- this function doesn't allow backslashes after the "@")
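-- FIXME: locale-dependent (relies on ranges [x-y])
-- NOTE: the quadrupled backslashes below assume that backslash escapes are
-- processed in ordinary string literals (standard_conforming_strings = off,
-- the default before PostgreSQL 9.1)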
/*
atext           =       ALPHA / DIGIT / ; Any character except controls,
                        "!" / "#" /     ;  SP, and specials.
                        "$" / "%" /     ;  Used for atoms
                        "&" / "'" /
                        "*" / "+" /
                        "-" / "/" /
                        "=" / "?" /
                        "^" / "_" /
                        "`" / "{" /
                        "|" / "}" /
                        "~"
dot-atom-text   =       1*atext *("." 1*atext)
dot-atom        =       [CFWS] dot-atom-text [CFWS]
addr-spec       =       local-part "@" domain
local-part      =       dot-atom / quoted-string / obs-local-part
domain          =       dot-atom / domain-literal / obs-domain
domain-literal  =       [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
dcontent        =       dtext / quoted-pair
dtext           =       NO-WS-CTL /     ; Non white space controls
                        %d33-90 /       ; The rest of the US-ASCII
                        %d94-126        ;  characters not including "[",
                                        ;  "]", or "\"
NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
                        %d11 /          ;  that do not include the
                        %d12 /          ;  carriage return, line feed,
                        %d14-31 /       ;  and white space characters
                        %d127
*/
CREATE OR REPLACE FUNCTION IS_EMAILADDRESS(VARCHAR)
  RETURNS BOOL
  IMMUTABLE
  RETURNS NULL ON NULL INPUT
  LANGUAGE plpgsql
  AS '
    BEGIN
      RETURN $1 ~ ''(?x) # this is an ARE
                    # local-part dot-atom-text (1*atext)
                    ^[-!#$%&''''*+/=?^_`{|}~[:alnum:]]+
                    # local-part dot-atom-text (*("." 1*atext))
                    (?:[.][-!#$%&''''*+/=?^_`{|}~[:alnum:]]+)*
                    # literal "@"
                    @
                    (?:
                      # domain (dot-atom or domain-literal)
                      # domain dot-atom (1*atext)
                      [-!#$%&''''*+/=?^_`{|}~[:alnum:]]+
                      # domain dot-atom (*("." 1*atext))
                      (?:[.][-!#$%&''''*+/=?^_`{|}~[:alnum:]]+)*
                    |
                      # domain domain-literal ("[")
                      [[]
                      # domain domain-literal (dcontent)
                      # ^A    -    ^H     ^K     ^L     ^N   -  ^_     "!"  -  "Z"    "^"  -  DEL
                      [\\\\x01-\\\\x08\\\\x0B\\\\x0C\\\\x0E-\\\\x1F\\\\x21-\\\\x5A\\\\x5E-\\\\x7F]*
                      # domain domain-literal ("]")
                      []]
                    )
                    $'';
    END;
  ';
-- }}}

-- CREATE DOMAIN emailaddrspec {{{
CREATE DOMAIN emailaddrspec AS VARCHAR
  CONSTRAINT dom_emailaddrspec CHECK (
       VALUE = ''
    OR IS_EMAILADDRESS(VALUE)
  );
-- }}}
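
For experimenting with the same grammar outside the database, here is a rough Python sketch of the dot-atom / domain-literal subset described in the comment above. It is an illustration under assumptions, not a port of the function: `atext` is restricted to ASCII letters and digits instead of the locale-dependent `[:alnum:]`, and the name `is_addr_spec` is hypothetical.

```python
import re

# atext as listed in the grammar comment: ASCII letters, digits and the
# punctuation characters RFC 2822 allows in an atom.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# domain-literal without CFWS or quoted-pairs: "[" *dtext "]", where dtext is
# NO-WS-CTL plus printable ASCII except "[", "]" and "\".
DOMAIN_LITERAL = r"\[[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x5e-\x7f]*\]"

ADDR_SPEC = re.compile(rf"{DOT_ATOM}@(?:{DOT_ATOM}|{DOMAIN_LITERAL})")


def is_addr_spec(candidate: str) -> bool:
    """Return True if the whole string matches the simplified addr-spec."""
    return ADDR_SPEC.fullmatch(candidate) is not None


if __name__ == "__main__":
    for sample in ("fred&barney@stonehenge.com", "user@[192.0.2.1]", "a..b@example.com"):
        print(sample, is_addr_spec(sample))
```

Like the SQL function, this says nothing about whether the address is deliverable; it only checks the shape.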


--
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE.             http://bash.org/?255991

Re: Email Verification Regular Expression

From
Steve Atkins
Date:
On Wed, Sep 07, 2005 at 11:17:10AM -0400, Brad Nicholson wrote:
> Does anybody have a regular expression handy to verify email addresses?

It's not possible to validate an email address with a regex. If
you're prepared to handwave over things like whitespace and
embedded comments you can validate with a scary big regex.
Take a look at Mail::RFC822::Address from CPAN.

But, depending on what you're doing, validation may not be a good
idea. There are email addresses that are syntactically invalid that
are deliverable and in active use. You might want to look at
just doing some basic sanity checking instead, rather than
full validation - something like


/^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/

This'll exclude email addresses like tv@tv, but the owners of such are used
to their being rejected, and it saves you from a lot of the usual miskeyed
addresses.
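
As an illustration of the same kind of sanity check, here is a small Python sketch. It keeps the shape of the regex above but captures the last label and looks it up in a set rather than spelling out one long alternation. The names `KNOWN_TLDS` and `looks_sane` are hypothetical, the TLD set is a deliberately tiny subset chosen for the example, and lower-casing the input before matching is an assumption of this sketch.

```python
import re

# Illustrative subset only (an assumption for this sketch); a real list has to
# be kept in step with the actual root zone.
KNOWN_TLDS = frozenset(
    {"com", "net", "org", "edu", "gov", "mil", "info", "biz", "ca", "de", "uk"}
)

# Same shape as the regex above: anything without "@", an "@", optional
# leading labels, one host label, a dot, and a final alphabetic label.
SHAPE = re.compile(r"[^@]*@(?:[^@]*\.)?[a-z0-9_-]+\.([a-z]+)")


def looks_sane(address: str) -> bool:
    """Cheap plausibility check; not a validator."""
    match = SHAPE.fullmatch(address.lower())
    return match is not None and match.group(1) in KNOWN_TLDS


if __name__ == "__main__":
    for sample in ("steve@blighty.com", "tv@tv", "user@hotmail.con"):
        print(sample, looks_sane(sample))
```

Keeping the list in a set makes it easier to update than the alternation, at the cost of a second step after the regex match.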

Cheers,
  Steve


Re: Email Verification Regular Expression

From
Bruno Wolff III
Date:
On Wed, Sep 07, 2005 at 12:21:45 -0700,
  Steve Atkins <steve@blighty.com> wrote:
>
>
> /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/
>
> This'll exclude email addresses like tv@tv, but the owners of such are used
> to their being rejected, and it saves you from a lot of the usual miskeyed
> addresses.

Hard-coding the top-level domains seems like a bad idea. .xxx might still get
added. It also doesn't take into account that there are non-ICANN roots that
include other TLDs.

Re: Email Verification Regular Expression

From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "Steve" == Steve Atkins <steve@blighty.com> writes:

Steve> But, depending on what you're doing, validation may not be a good
Steve> idea. There are email addresses that are syntactically invalid that
Steve> are deliverable and in active use.

Really?  Name one. Or maybe it's just your idea of syntax that's wrong.


Re: Email Verification Regular Expression

From
Steve Atkins
Date:
On Wed, Sep 07, 2005 at 03:52:11PM -0500, Bruno Wolff III wrote:
> On Wed, Sep 07, 2005 at 12:21:45 -0700,
>   Steve Atkins <steve@blighty.com> wrote:
> >
> >
> > /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/
> >
> > This'll exclude email addresses like tv@tv, but the owners of such are used
> > to their being rejected, and it saves you from a lot of the usual miskeyed
> > addresses.
>
> Hard-coding the top-level domains seems like a bad idea. .xxx might still get
> added.

Not hard-coding them is an even worse idea, if you're actually looking to
exclude bad email addresses.

Yes, it's a maintenance issue, but that's part of the job of handling
large numbers of email addresses.

> It also doesn't take into account that there are non-ICANN roots that
> include other TLDs.

If it's a non-ICANN TLD, it's not a valid internet email address.

Cheers,
  Steve

Re: Email Verification Regular Expression

From
Steve Atkins
Date:
On Wed, Sep 07, 2005 at 01:33:51PM -0700, Randal L. Schwartz wrote:
> >>>>> "Steve" == Steve Atkins <steve@blighty.com> writes:
>
> Steve> But, depending on what you're doing, validation may not be a good
> Steve> idea. There are email addresses that are syntactically invalid that
> Steve> are deliverable and in active use.
>
> Really?  Name one. Or maybe it's just your idea of syntax that's wrong.

Well, my idea of syntax may differ from yours, but it doesn't necessarily
mean that either of us is wrong. If we were talking about the formal grammar
in RFC2822 section 3.4.1 I'd agree with you. But the surrounding
text implies that the spec is tighter than the formal grammar says it is.

2822 syntax allows almost any character in the domain-part (excluding
brackets, whitespace and backslash only, IIRC) but 2822 also describes
the dot-atom form of the domain part as an internet domain name,
either an MX or a hostname, referring to STD3, STD13 and STD14.

While most characters are legal in the 2822 syntax and in DNS, you can
extract from the RFCs that hostnames really should look like
/([A-Za-z0-9-]+\.)*[A-Za-z0-9]+/

So I consider any use of characters outside that set in a hostname or
"domain name" to be invalid. Specifically an underscore is not a valid
character, so any use of an underscore in the domain-part of an
address that is supposedly an internet address is syntactically
invalid.

And yet there are quite a lot of hosts that have underscores in their
names. Mail to them is deliverable. I've seen them in use
occasionally, though I've no idea how reliable they are.
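
To make the strict reading concrete, here is a minimal Python sketch that applies the hostname pattern quoted above to a whole string. The function name `is_strict_hostname` and the sample names are assumptions made for the illustration.

```python
import re

# The hostname shape quoted above: dot-terminated labels of letters, digits
# and hyphens, followed by a final label of letters and digits.
HOSTNAME = re.compile(r"([A-Za-z0-9-]+\.)*[A-Za-z0-9]+")


def is_strict_hostname(name: str) -> bool:
    """Return True if the whole name fits the strict hostname pattern."""
    return HOSTNAME.fullmatch(name) is not None


if __name__ == "__main__":
    for sample in ("mail.example.com", "my_host.example.com"):
        print(sample, is_strict_hostname(sample))
```

Under this reading a name with an underscore is rejected, even though mail to such a host may well be deliverable in practice.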

All of which is a nice bit of RFC-lawyering, but not really that
relevant. The obvious response demonstrating that "steve@foo&bar+baz"
is syntactically valid would be an equally good bit of RFC-lawyering
too. :)

More practically (and this is a pragmatic database list, not an
esoteric rules-lawyering anti-spam list :) ) I've found that the RE I
mentioned earlier - allowing underscore, but excluding the other
invalid hostname characters - is pretty good at spotting the usual
badly formatted email addresses you see, without stumbling over the
ones that many "email address validators" do. It punts on the whole
"what is a reasonable looking local part?" question, of course, but
that's near impossible to answer in a useful, practical sense other
than being nervous about whitespace or anything smacking of source
routing.

Cheers,
  Steve


Re: Email Verification Regular Expression

From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "Steve" == Steve Atkins <steve@blighty.com> writes:

Steve> So I consider any use of characters outside that set in a hostname or
Steve> "domain name" to be invalid. Specifically an underscore is not a valid
Steve> character, so any use of an underscore in the domain-part of an
Steve> address that is supposedly an internet address is syntactically
Steve> invalid.

Really?  I actually went round and round at a $client who wanted underscores
in DNS, and I had to tell them "We can't change the entire world... you'll
have to rename your hosts".

Do you have an example of an underscore host that is publicly addressable?
I'd like to look up their MX. :)


Re: Email Verification Regular Expression

From
Peter Eisentraut
Date:
Brad Nicholson wrote:
> Does anybody have a regular expression handy to verify email addresses?

There are Perl modules on CPAN to verify just about anything.
Email::Valid comes to mind here.  These can of course be plugged into a
PL/Perl function.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Email Verification Regular Expression

From
"Cristian Prieto"
Date:
Well, I guess this could be an expensive way to do it, but I've written this
little stored function. It doesn't use a regular expression (you could run
your address through one first, I guess).

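#include "postgres.h"
#include "fmgr.h"
#include <string.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Required by newer PostgreSQL versions; harmless where it is not defined */
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(digmx);

Datum
digmx(PG_FUNCTION_ARGS)
{
 int res;
 int len;
 char *name;
 unsigned char answer[1024];
 text *arg;

 arg = PG_GETARG_TEXT_P(0);

 res = res_init();
 if(res != 0) {
  /* Report an error here: the resolver could not be initialised */
  elog(ERROR, "digmx: res_init() failed");
 }

 /* text values are not NUL-terminated: copy the payload and terminate it */
 len = VARSIZE(arg) - VARHDRSZ;
 name = (char *) palloc(len + 1);
 memcpy(name, VARDATA(arg), len);
 name[len] = '\0';

 /* Ask the resolver for the MX records of the domain */
 res = res_query(name, C_IN, T_MX, answer, sizeof(answer));

 if(res == -1) {
  PG_RETURN_BOOL(false);
 } else {
  /* The contents of the answer could be printed here */
  PG_RETURN_BOOL(true);
 }
}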

You pass the domain to the function and it uses the resolver to check whether
the domain has an MX entry in the nameserver. I guess it is a little slow
(I wasn't aiming for speed, but I'm open to suggestions!), but I think it is
easy enough and could be useful to somebody.

mydb# SELECT digmx('hotmail.com');
digmx
------
t
(1 row)

mydb# SELECT digmx('hotmail.co');
digmx
------
f
(1 row)

I know this may be a very dumb way to check the domain, but I consider myself
a total newbie as a database/Unix programmer.

Thanks a lot!

P.S.: Suggestions for improving this function are welcome.


Re: Email Verification Regular Expression

From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "Cristian" == Cristian Prieto <cristian@clickdiario.com> writes:

Cristian>  res = res_query(name, C_IN, T_MX, answer, sizeof(answer));

This incorrectly fails if an address has an "A" record but no "MX"
record.  According to RFC 2821 Section 5:

   The lookup first attempts to locate an MX record associated with
   the name.  If a CNAME record is found instead, the resulting name
   is processed as if it were the initial name.  If no MX records are
   found, but an A RR is found, the A RR is treated as if it was
   associated with an implicit MX RR, with a preference of 0, pointing
   to that host.

So, your function will say "no good" if the domain has an A record but
no MX record, even though the RFC says that's OK and deliverable.
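
The fallback rule quoted above can be sketched as a pure function. This is only an illustration in Python: the name `delivery_hosts` and the idea of passing in already-resolved records are assumptions, CNAME handling is left out, and no DNS lookup is performed here.

```python
from typing import List, Tuple


def delivery_hosts(domain: str,
                   mx_records: List[Tuple[int, str]],
                   address_records: List[str]) -> List[str]:
    """Return candidate mail hosts for `domain`, most preferred first.

    mx_records: (preference, host) pairs already looked up for the domain.
    address_records: addresses already looked up for the domain itself.
    """
    if mx_records:
        # Lower preference values are tried first.
        return [host for _, host in sorted(mx_records)]
    if address_records:
        # No MX but an address record: treat the domain itself as an
        # implicit MX with preference 0.
        return [domain]
    # Neither kind of record: nothing to deliver to.
    return []


if __name__ == "__main__":
    print(delivery_hosts("example.org", [], ["192.0.2.1"]))
```

A check that only asks for MX records corresponds to the first branch alone, which is why it wrongly rejects domains that are reachable through the second.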

Man, is there a lot of bogus knowledge and cargo culting around this
subject!


Re: Email Verification Regular Expression

From
Stephane Bortzmeyer
Date:
On Wed, Sep 07, 2005 at 12:21:45PM -0700,
 Steve Atkins <steve@blighty.com> wrote
 a message of 26 lines which said:

>
> /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/

Very bad idea to hardcode the list of TLDs. You are already out of date
(".jobs" and ".travel" are in the ICANN root).

Re: Email Verification Regular Expression

From
Stephane Bortzmeyer
Date:
On Thu, Sep 08, 2005 at 12:16:36PM -0600,
 Cristian Prieto <cristian@clickdiario.com> wrote
 a message of 66 lines which said:

> res = res_query(name, C_IN, T_MX, answer, sizeof(answer));

Besides Randal Schwartz' excellent remark (do not forget the AAAA
records, too), remember that the Internet is not reliable. What do you
do when there is a temporary failure? (The email system works fine
when faced with such failures.)

Re: Email Verification Regular Expression

From
Steve Atkins
Date:
On Tue, Sep 13, 2005 at 12:59:43PM +0200, Stephane Bortzmeyer wrote:
> On Wed, Sep 07, 2005 at 12:21:45PM -0700,
>  Steve Atkins <steve@blighty.com> wrote
>  a message of 26 lines which said:
>
> >
> > /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/
>
> Very bad idea to hardcode the list of TLDs. You are already out of date
> (".jobs" and ".travel" are in the ICANN root).

And are in my production code (despite being unused, as yet).

If you want to validate email addresses you _must_ check the TLD as
part of the sanity checking, as many of the typos that are
theoretically detectable are detectable by that check.

Yes, you need to maintain that list correctly. But the list does not
change often enough that keeping it in a dynamic table with the many
orders of magnitude higher overhead makes any sense at all.

Not hardcoding the list of TLDs would be a bad idea, if you need
both performance and correctness.

Cheers,
  Steve


Re: Email Verification Regular Expression

From
Peter Eisentraut
Date:
Steve Atkins wrote:
> If you want to validate email addresses you _must_ check the TLD as
> part of the sanity checking, as many of the typos that are
> theoretically detectable are detectable by that check.

Your requirements may be different than mine, but I often make up "fake"
TLDs for testing or even internal subnets in production systems, so
having a system that hardcoded the list of "official" TLDs would be
significantly less useful to me.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Email Verification Regular Expression

From
Steve Atkins
Date:
On Tue, Sep 13, 2005 at 09:02:46PM +0200, Peter Eisentraut wrote:
> Steve Atkins wrote:
> > If you want to validate email addresses you _must_ check the TLD as
> > part of the sanity checking, as many of the typos that are
> > theoretically detectable are detectable by that check.
>
> Your requirements may be different than mine, but I often make up "fake"
> TLDs for testing or even internal subnets in production systems, so
> having a system that hardcoded the list of "official" TLDs would be
> significantly less useful to me.

It depends on the needs. For a purely internal application your needs
are defined by your local setup. Those are not "internet email
addresses", though.

If you're accepting email addresses from Joe Public with the
expectation of sending email to them, then you really want to do as
much validation as you can at data capture time, or if not then at
data import time.

It's very, very hard to validate email addresses, but avoiding the
usual typos, mistakes and misunderstandings is a very good idea
and can keep your set of email addresses at least somewhat clean.

(I pity the poor folks at noemail.com and aol.co, though...)

Cheers,
  Steve