Thread: Email Verification Regular Expression
Does anybody have a regular expression handy to verify email addresses? -- Brad Nicholson Database Administrator, Afilias Canada Corp.
On Sep 8, 2005, at 12:17 AM, Brad Nicholson wrote: > Does anybody have a regular expression handy to verify email addresses? http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html :) Michael Glaesemann grzm myrealbox com
On Wednesday, 07.09.2005, at 11:17 -0400, Brad Nicholson wrote: > Does anybody have a regular expression handy to verify email addresses? ^([a-zA-Z0-9._-]+)\@(([a-zA-Z0-9-]+[.]?){1,}[a-zA-Z0-9-]*+\.){1,}[a-zA-Z]{2,4}$ But I don't think it's really complete. Best regards, Markus
Brad Nicholson <bnichols@ca.afilias.info> writes: > Does anybody have a regular expression handy to verify email addresses? It's harder than you think. For one that handles it in fairly full generality, see Jeffrey Friedl's book _Mastering Regular Expressions_. The regex he comes up with is quite a beast. -Doug
Could somebody embed this regex into a pgsql ~ statement (maybe in a DOMAIN type)? Thanks a lot! ----- Original Message ----- From: "Michael Glaesemann" <grzm@myrealbox.com> To: "Brad Nicholson" <bnichols@ca.afilias.info> Cc: <pgsql-general@postgresql.org> Sent: Wednesday, September 07, 2005 9:41 AM Subject: Re: [GENERAL] Email Verification Regular Expression > On Sep 8, 2005, at 12:17 AM, Brad Nicholson wrote: >> Does anybody have a regular expression handy to verify email addresses? > http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
>>>>> "Markus" == Markus Rebbert <markus.rebbert@freenet.de> writes: Markus> On Wednesday, 07.09.2005, at 11:17 -0400, Brad Nicholson wrote: >> Does anybody have a regular expression handy to verify email addresses? Markus> ^([a-zA-Z0-9._-]+)\@(([a-zA-Z0-9-]+[.]?){1,}[a-zA-Z0-9-]*+\.){1,}[a-zA-Z]{2,4}$ Markus> But I don't think it's really complete. Absolutely not. It rejects <fred&barney@stonehenge.com>, which is a perfectly valid email address. (Try it, you'll get my autoresponder.) Google for "RFC 822" and "RFC 2822" to see the *real* rules. An actual regex for an email address is rather large. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
> Does anybody have a regular expression handy to verify email addresses?

CREATE OR REPLACE FUNCTION goodemail(text) RETURNS BOOL LANGUAGE plperl AS $$
my $lwsp = "(?:(?:\\r\\n)?[ \\t])";
my $specials = '()<>@,;:\\\\".\\[\\]';
my $controls = '\\000-\\037\\177';
my $dtext = "[^\\[\\]\\r\\\\]";
my $domain_literal = "\\[(?:$dtext|\\\\.)*\\]$lwsp*";
my $quoted_string = "\"(?:[^\\\"\\r\\\\]|\\\\.|$lwsp)*\"$lwsp*";
my $atom = "[^$specials $controls]+(?:$lwsp+|\\Z|(?=[\\[\"$specials]))";
my $word = "(?:$atom|$quoted_string)";
my $localpart = "$word(?:\\.$lwsp*$word)*";
my $sub_domain = "(?:$atom|$domain_literal)";
my $domain = "$sub_domain(?:\\.$lwsp*$sub_domain)*";
my $addr_spec = "$localpart\@$lwsp*$domain";
my $phrase = "$word*";
my $route = "(?:\@$domain(?:,\@$lwsp*$domain)*:$lwsp*)";
my $route_addr = "\\<$lwsp*$route?$addr_spec\\>$lwsp*";
my $mailbox = "(?:$addr_spec|$phrase$route_addr)";
my $group = "$phrase:$lwsp*(?:$mailbox(?:,\\s*$mailbox)*)?;\\s*";
my $address = "(?:$mailbox|$group)";
my $EMAILRE = qr{$lwsp*$address};
return $_[0] =~ $EMAILRE ? 1 : 0;
$$;

-- Greg Sabino Mullane greg@turnstep.com
Not knowing your application, keep in mind that just because somebody enters a syntactically correct email address doesn't mean they entered the right one. Cristian Prieto wrote: > Could somebody embed this regex into a pgsql ~ statement (maybe in a DOMAIN type)?
Randal L. Schwartz wrote: > Absolutely not. It rejects <fred&barney@stonehenge.com> which is a perfectly > valid email address. (Try it, you'll get my autoresponder.) > Google for "RFC 822" and "RFC 2822" to see the *real* rules. An > actual regex for an email address is rather large. There's an extended example in Appendix B of _Mastering Regular Expressions_ from O'Reilly. The appendix suggests the regex may be available online at Jeffrey Friedl's home page. Here's the URL, but I've not gone excavating for the regex: http://dict.regex.info/cgi-bin/j-e/jfriedl.html richard
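The rejection of fred&barney@stonehenge.com comes down to the local-part character class in the short pattern. The following Python sketch compares that class with one built from the RFC 2822 atext characters. The names SIMPLE_LOCAL, ATEXT_LOCAL and local_part are made up for this illustration, and ATEXT_LOCAL covers only the dot-atom form (no quoted strings, no obsolete syntax), so it is not a full validator either.

```python
import re

# Local-part character class taken from the short pattern quoted in the thread.
SIMPLE_LOCAL = re.compile(r"[a-zA-Z0-9._-]+")

# Dot-atom local part built from the RFC 2822 "atext" characters.
# Assumption: quoted strings and obs- forms are deliberately left out.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
ATEXT_LOCAL = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def local_part(address: str) -> str:
    """Return everything before the last '@' in the address."""
    return address.rpartition("@")[0]


if __name__ == "__main__":
    addr = "fred&barney@stonehenge.com"
    print("simple class accepts:", SIMPLE_LOCAL.fullmatch(local_part(addr)) is not None)
    print("atext class accepts: ", ATEXT_LOCAL.fullmatch(local_part(addr)) is not None)
```

The only difference that matters for this address is that "&" is an atext character but is missing from the short pattern's class.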
# bnichols@ca.afilias.info / 2005-09-07 11:17:10 -0400:
> Does anybody have a regular expression handy to verify email addresses?

This is what I have. The comment notes the caveats.

-- CREATE FUNCTION IS_EMAILADDRESS {{{
-- returns TRUE if $1 matches the rules for RFC2822 addr-spec token,
-- ignoring CFWS in atoms, obs- versions of everything, !dot-atom
-- versions of local-part, and quoted-pairs in domain-literal (IOW,
-- this function doesn't allow backslashes after the "@")
-- FIXME: locale-dependent (relies on ranges [x-y])
/*
atext           = ALPHA / DIGIT / ; Any character except controls,
                  "!" / "#" /     ;  SP, and specials.
                  "$" / "%" /     ;  Used for atoms
                  "&" / "'" /
                  "*" / "+" /
                  "-" / "/" /
                  "=" / "?" /
                  "^" / "_" /
                  "`" / "{" /
                  "|" / "}" /
                  "~"
dot-atom-text   = 1*atext *("." 1*atext)
dot-atom        = [CFWS] dot-atom-text [CFWS]
addr-spec       = local-part "@" domain
local-part      = dot-atom / quoted-string / obs-local-part
domain          = dot-atom / domain-literal / obs-domain
domain-literal  = [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
dcontent        = dtext / quoted-pair
dtext           = NO-WS-CTL /     ; Non white space controls
                  %d33-90 /       ; The rest of the US-ASCII
                  %d94-126        ;  characters not including "[",
                                  ;  "]", or "\"
NO-WS-CTL       = %d1-8 /         ; US-ASCII control characters
                  %d11 /          ;  that do not include the
                  %d12 /          ;  carriage return, line feed,
                  %d14-31 /       ;  and white space characters
                  %d127
*/
CREATE OR REPLACE FUNCTION IS_EMAILADDRESS(VARCHAR) RETURNS BOOL
IMMUTABLE
RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql
AS '
BEGIN
  RETURN $1 ~ ''(?x) # this is an ARE
    # local-part dot-atom-text (1*atext)
    ^[-!#$%&''''*+/=?^_`{|}~[:alnum:]]+
    # local-part dot-atom-text (*("." 1*atext))
    (?:\.[-!#$%&''''*+/=?^_`{|}~[:alnum:]]+)*
    # literal "@"
    @
    (?:
      # domain (dom-atom or domain-literal)
      (?:
        # domain dot-atom (1*atext)
        [-!#$%&''''*+/=?^_`{|}~[:alnum:]]+
        # domain dot-atom (*("." 1*atext))
        \.[-!#$%&''''*+/=?^_`{|}~[:alnum:]]+
      )*
      |
      # domain domain-literal ("[")
      [[]
      # domain domain-literal (dcontent)
      # ^@ - ^H ^K ^L ^N ^_ "!" - "Z" "^" - DEL
      [\\\\x01-\\\\x08\\\\x0B\\\\x0C\\\\x0E-\\\\x1F\\\\x21-\\\\x5A\\\\x5E-\\\\x7F]*
      # domain domain-literal ("]")
      []]
    )
    $'';
END;
';
-- }}}

-- CREATE DOMAIN emailaddrspec {{{
CREATE DOMAIN emailaddrspec AS VARCHAR
  CONSTRAINT dom_emailaddrspec CHECK (
    VALUE = '' OR IS_EMAILADDRESS(VALUE)
  );
-- }}}

-- How many Vietnam vets does it take to screw in a light bulb? You don't know, man. You don't KNOW. Cause you weren't THERE. http://bash.org/?255991
On Wed, Sep 07, 2005 at 11:17:10AM -0400, Brad Nicholson wrote: > Does anybody have regular expression handy to verfiy email addresses? It's not possible to validate an email address with a regex. If you're prepared to handwave over things like whitespace and embedded comments you can validate with a scary big regex. Take a look at Mail::RFC822::Address from CPAN. But, depending on what you're doing, validation may not be a good idea. There are email addresses that are syntactically invalid that are deliverable and in active use. You might want to look at just doing some basic sanity checking instead, rather than full validation - something like /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/ This'll exclude email addresses like tv@tv, but the owners of such are used to their being rejected, and it saves you from a lot of the usual miskeyed addresses. Cheers, Steve
On Wed, Sep 07, 2005 at 12:21:45 -0700, Steve Atkins <steve@blighty.com> wrote: > > /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/ > > This'll exclude email addresses like tv@tv, but the owners of such are used > to their being rejected, and it saves you from a lot of the usual miskeyed > addresses. Hard-coding the top-level domains seems like a bad idea. xxx might still get added. It also doesn't take into account that there are non-ICANN roots that include other TLDs.
>>>>> "Steve" == Steve Atkins <steve@blighty.com> writes: Steve> But, depending on what you're doing, validation may not be a good Steve> idea. There are email addresses that are syntactically invalid that Steve> are deliverable and in active use. Really? Name one. Or maybe it's just your idea of syntax that's wrong. -- Randal L. Schwartz <merlyn@stonehenge.com>
On Wed, Sep 07, 2005 at 03:52:11PM -0500, Bruno Wolff III wrote: > On Wed, Sep 07, 2005 at 12:21:45 -0700, > Steve Atkins <steve@blighty.com> wrote: > > > > /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/ > > > > This'll exclude email addresses like tv@tv, but the owners of such are used > > to their being rejected, and it saves you from a lot of the usual miskeyed > > addresses. > > Hard coding the top level domains seems like a bad idea. xxx might still get > added. Not hard-coding them is an even worse idea, if you're actually looking to exclude bad email addresses. Yes, it's a maintenance issue, but that's part of the job of handling large numbers of email addresses. > It also doesn't take into account there are non-icann roots that > include other tlds. If it's a non-icann TLD, it's not a valid internet email address. Cheers, Steve
On Wed, Sep 07, 2005 at 01:33:51PM -0700, Randal L. Schwartz wrote:
> >>>>> "Steve" == Steve Atkins <steve@blighty.com> writes:
>
> Steve> But, depending on what you're doing, validation may not be a good
> Steve> idea. There are email addresses that are syntactically invalid that
> Steve> are deliverable and in active use.
>
> Really? Name one. Or maybe it's just your idea of syntax that's wrong.

Well, my idea of syntax may differ from yours, but that doesn't necessarily mean that either of us is wrong.

If we were talking about the formal grammar in RFC2822 section 3.4.1 I'd agree with you. But the surrounding text implies that the spec is tighter than the formal grammar says it is.

2822 syntax allows almost any character in the domain-part (excluding brackets, whitespace and backslash only, IIRC), but 2822 also describes the dot-atom form of the domain part as an internet domain name, either an MX or a hostname, referring to STD3, STD13 and STD14.

While most characters are legal in the 2822 syntax and in DNS, you can extract from the RFCs that hostnames really should look like

/([A-Za-z0-9-]+\.)*[A-Za-z0-9]+/

So I consider any use of characters outside that set in a hostname or "domain name" to be invalid. Specifically, an underscore is not a valid character, so any use of an underscore in the domain-part of an address that is supposedly an internet address is syntactically invalid.

And yet there are quite a lot of hosts that have underscores in their names. Mail to them is deliverable. I've seen them in use occasionally, though I've no idea how reliable they are.

All of which is a nice bit of RFC-lawyering, but not really that relevant. The obvious response demonstrating that "steve@foo&bar+baz" is syntactically valid would be an equally good bit of RFC-lawyering too. :)

More practically (and this is a pragmatic database list, not an esoteric rules-lawyering anti-spam list :) ) I've found that the RE I mentioned earlier, which allows underscore but excludes the other invalid hostname characters, is pretty good at spotting the usual badly formatted email addresses you see, without stumbling over the ones that many "email address validators" do.

It punts on the whole "what is a reasonable looking local part?" question, of course, but that's near impossible to answer in a useful, practical sense other than being nervous about whitespace or anything smacking of source routing.

Cheers, Steve
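The hostname shape quoted in the message above is easy to try out. The Python sketch below anchors it at both ends and compares it with an underscore-tolerant variant. STRICT_HOST and LENIENT_HOST are illustrative names, and the lenient pattern is only one guess at how "allow underscore" might be written (here the underscore is accepted in every label except the last); it is not the pattern from the earlier message.

```python
import re

# Hostname shape quoted in the thread: letters, digits and hyphens only.
STRICT_HOST = re.compile(r"([A-Za-z0-9-]+\.)*[A-Za-z0-9]+")

# Assumed variant that also tolerates "_" in the leading labels.
LENIENT_HOST = re.compile(r"([A-Za-z0-9_-]+\.)*[A-Za-z0-9]+")


def classify(host: str) -> str:
    """Say which of the two patterns accept the whole host name."""
    strict = STRICT_HOST.fullmatch(host) is not None
    lenient = LENIENT_HOST.fullmatch(host) is not None
    if strict:
        return "strict"
    return "lenient-only" if lenient else "rejected"


if __name__ == "__main__":
    for host in ("mail.example.com", "my_host.example.com", "foo&bar+baz"):
        print(f"{host}: {classify(host)}")
```

Using fullmatch rather than search matters here: the unanchored pattern would find a matching substring in almost any input.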
>>>>> "Steve" == Steve Atkins <steve@blighty.com> writes: Steve> So I consider any use of characters outside that set in a hostname or Steve> "domain name" to be invalid. Specifically an underscore is not a valid Steve> character, so any use of an underscore in the domain-part of an Steve> address that is supposedly an internet address is syntactically Steve> invalid. Really? I actually went round and round at a $client who wanted underscores in DNS, and I had to tell them "We can't change the entire world... you'll have to rename your hosts". Do you have an example of an underscore host that is publicly addressable? I'd like to look up their MX. :) -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Brad Nicholson wrote: > Does anybody have a regular expression handy to verify email addresses? There are Perl modules on CPAN to verify just about anything. Email::Valid comes to mind here. These can of course be plugged into a PL/Perl function. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Well, I guess this could be an expensive way to do it, but I wrote this little stored function. It doesn't use a regular expression (you could run your email address through one first, I guess).

#include "postgres.h"
#include "fmgr.h"

#include <string.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Required by newer PostgreSQL releases; harmless where it is not defined */
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(digmx);

Datum
digmx(PG_FUNCTION_ARGS)
{
    text          *arg = PG_GETARG_TEXT_P(0);
    int            len = VARSIZE(arg) - VARHDRSZ;
    char          *name;
    unsigned char  answer[1024];
    int            res;

    /* Initialise the resolver and report an error if that fails */
    if (res_init() != 0)
        elog(ERROR, "res_init() failed");

    /* text values are not NUL-terminated, so copy and terminate explicitly */
    name = (char *) palloc(len + 1);
    memcpy(name, VARDATA(arg), len);
    name[len] = '\0';

    /* Ask the resolver for the MX records of the domain */
    res = res_query(name, C_IN, T_MX, answer, sizeof(answer));
    pfree(name);

    /* res_query() returns -1 on failure, otherwise the size of the answer */
    PG_RETURN_BOOL(res != -1);
}

You can pass the domain to the function, and it will use the resolver to check whether the domain has an MX entry in the nameserver. I guess it is a little slow (I wasn't thinking about speed when I wrote it, but I welcome suggestions!), but I think it is easy enough and could be useful to somebody.

mydb# SELECT digmx('hotmail.com');
 digmx
-------
 t
(1 row)

mydb# SELECT digmx('hotmail.co');
 digmx
-------
 f
(1 row)

I know this could be a very dumb way to check the domain, but I consider myself a total newbie at databases/Unix/programming. Thanks a lot!

PS: Please, I welcome suggestions for improving this function.
>>>>> "Cristian" == Cristian Prieto <cristian@clickdiario.com> writes:

Cristian> res = res_query(name, C_IN, T_MX, answer, sizeof(answer));

This incorrectly fails if an address has an "A" record but no "MX" record. According to RFC 2821 Section 5:

    The lookup first attempts to locate an MX record associated with the name. If a CNAME record is found instead, the resulting name is processed as if it were the initial name. If no MX records are found, but an A RR is found, the A RR is treated as if it was associated with an implicit MX RR, with a preference of 0, pointing to that host.

So, your function will say "no good" if the domain has an A record but no MX record, even though the RFC says that's OK and deliverable.

Man, is there a lot of bogus knowledge and cargo culting around this subject!

-- Randal L. Schwartz <merlyn@stonehenge.com>
On Wed, Sep 07, 2005 at 12:21:45PM -0700, Steve Atkins <steve@blighty.com> wrote a message of 26 lines which said: > /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/ Very bad idea to hardcode the list of TLD. You are already late (".jobs" and ".travel" are in the ICANN root).
On Thu, Sep 08, 2005 at 12:16:36PM -0600, Cristian Prieto <cristian@clickdiario.com> wrote a message of 66 lines which said: > res = res_query(name, C_IN, T_MX, answer, sizeof(answer)); Besides Randal Schwartz' excellent remark (do not forget the AAAA records, too), remember that the Internet is not reliable. What do you do when there is a temporary failure? (The email system works fine when faced with such failures.)
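The points raised about the MX check (fall back to address records when there is no MX, include AAAA, and do not treat a temporary DNS failure as a verdict) can be sketched as plain decision logic. The Python below is only an illustration: the lookup callable, TemporaryDNSFailure, Verdict and the fake zone data are all hypothetical, and a real resolver library would have to be adapted to this interface. CNAME chasing is assumed to happen inside the lookup callable.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Verdict(Enum):
    DELIVERABLE = "deliverable"
    NO_MAIL_HOST = "no mail host"
    UNKNOWN = "unknown"  # temporary DNS failure: retry later, do not reject


class TemporaryDNSFailure(Exception):
    """Hypothetical error a lookup raises on timeouts or server failures."""


# Hypothetical interface: lookup(name, rrtype) returns a list of records.
Lookup = Callable[[str, str], List[str]]


def mail_domain_verdict(domain: str, lookup: Lookup) -> Verdict:
    """Apply the MX-then-address-record order described in the thread."""
    try:
        if lookup(domain, "MX"):
            return Verdict.DELIVERABLE
        # No MX: an address record acts as an implicit MX with preference 0.
        for rrtype in ("A", "AAAA"):
            if lookup(domain, rrtype):
                return Verdict.DELIVERABLE
    except TemporaryDNSFailure:
        return Verdict.UNKNOWN
    return Verdict.NO_MAIL_HOST


# Fake zone data so the sketch runs without touching the network.
FAKE_ZONE: Dict[Tuple[str, str], List[str]] = {
    ("mx.example", "MX"): ["10 mail.mx.example"],
    ("a-only.example", "A"): ["192.0.2.1"],
    ("v6-only.example", "AAAA"): ["2001:db8::1"],
}


def fake_lookup(name: str, rrtype: str) -> List[str]:
    return FAKE_ZONE.get((name, rrtype), [])


def failing_lookup(name: str, rrtype: str) -> List[str]:
    raise TemporaryDNSFailure(name)


if __name__ == "__main__":
    for domain in ("mx.example", "a-only.example", "nothing.example"):
        print(domain, mail_domain_verdict(domain, fake_lookup).value)
    print("mx.example", mail_domain_verdict("mx.example", failing_lookup).value)
```

Returning a third state instead of a boolean lets the caller decide whether to accept the address for now and re-check it later.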
On Tue, Sep 13, 2005 at 12:59:43PM +0200, Stephane Bortzmeyer wrote: > On Wed, Sep 07, 2005 at 12:21:45PM -0700, > Steve Atkins <steve@blighty.com> wrote > a message of 26 lines which said: > > > /^[^@]*@(?:[^@]*\.)?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/ > > Very bad idea to hardcode the list of TLD. You are already late > (".jobs" and ".travel" are in the ICANN root). And are in my production code (despite being unused, as yet). If you want to validate email addresses you _must_ check the TLD as part of the sanity checking, as many of the typos that are theoretically detectable are detectable by that check. Yes, you need to maintain that list correctly. But the list does not change often enough that keeping it in a dynamic table, with its many orders of magnitude higher overhead, makes any sense at all. Not hardcoding the list of TLDs would be a bad idea if you need both performance and correctness. Cheers, Steve
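One middle ground between a TLD list baked into the pattern and a database table is to keep the list as in-process data next to a purely structural pattern. The Python sketch below shows that arrangement under several assumptions: KNOWN_TLDS is a tiny illustrative subset rather than a maintained list, SHAPE is a simplified stand-in for the pattern quoted above, and looks_plausible is a made-up name.

```python
import re

# Illustrative subset only; a real list has to track the root zone.
KNOWN_TLDS = frozenset({"com", "net", "org", "info", "jobs", "travel", "ca", "de"})

# Structural check: a non-empty local part, then dot-separated labels,
# capturing the last label so it can be compared against the TLD set.
SHAPE = re.compile(r"[^@]+@(?:[a-z0-9_-]+\.)+([a-z]+)", re.IGNORECASE)


def looks_plausible(address: str, tlds: frozenset = KNOWN_TLDS) -> bool:
    """Return True if the address has the expected shape and a known TLD."""
    match = SHAPE.fullmatch(address)
    return match is not None and match.group(1).lower() in tlds


if __name__ == "__main__":
    for addr in ("bnichols@ca.afilias.info", "someone@example.cmo", "tv@tv"):
        print(addr, looks_plausible(addr))
```

Updating the set does not require touching the pattern, and passing a different set as the second argument would cover setups that use private TLDs.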
Steve Atkins wrote: > If you want to validate email addresses you _must_ check the TLD as > part of the sanity checking, as many of the typos that are > theoretically detectable are detectable by that check. Your requirements may be different than mine, but I often make up "fake" TLDs for testing or even internal subnets in production systems, so having a system that hardcoded the list of "official" TLDs would be significantly less useful to me. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Tue, Sep 13, 2005 at 09:02:46PM +0200, Peter Eisentraut wrote: > Steve Atkins wrote: > > If you want to validate email addresses you _must_ check the TLD as > > part of the sanity checking, as many of the typos that are > > theoretically detectable are detectable by that check. > > Your requirements may be different than mine, but I often make up "fake" > TLDs for testing or even internal subnets in production systems, so > having a system that hardcoded the list of "official" TLDs would be > significantly less useful to me. It depends on the needs. For a purely internal application your needs are defined by your local setup. Those are not "internet email addresses", though. If you're accepting email addresses from Joe Public with the expectation of sending email to them, then you really want to do as much validation as you can at data capture time, or if not then at data import time. It's very, very hard to validate email addresses, but avoiding the usual typos, mistakes and misunderstandings is a very good idea and can keep your set of email addresses at least somewhat clean. (I pity the poor folks at noemail.com and aol.co, though...) Cheers, Steve