Re: Email Verfication Regular Expression - Mailing list pgsql-general

From Steve Atkins
Subject Re: Email Verfication Regular Expression
Msg-id 20050907220147.GA31577@gp.word-to-the-wise.com
In response to Re: Email Verfication Regular Expression  (merlyn@stonehenge.com (Randal L. Schwartz))
Responses Re: Email Verfication Regular Expression
List pgsql-general
On Wed, Sep 07, 2005 at 01:33:51PM -0700, Randal L. Schwartz wrote:
> >>>>> "Steve" == Steve Atkins <steve@blighty.com> writes:
>
> Steve> But, depending on what you're doing, validation may not be a good
> Steve> idea. There are email addresses that are syntactically invalid that
> Steve> are deliverable and in active use.
>
> Really?  Name one. Or maybe it's just your idea of syntax that's wrong.

Well, my idea of syntax may differ from yours, but that doesn't necessarily
mean that either of us is wrong. If we were talking about the formal grammar
in RFC2822 section 3.4.1, I'd agree with you. But the surrounding text
implies that the spec is tighter than the formal grammar says it is.

2822 syntax allows almost any character in the domain-part (excluding
only brackets, whitespace and backslash, IIRC), but 2822 also describes
the dot-atom form of the domain part as an internet domain name,
either an MX or a hostname, referring to STD3, STD13 and STD14.

While most characters are legal in the 2822 syntax and in DNS, you can
infer from the RFCs that hostnames really should look like
/([A-Za-z0-9-]+\.)*[A-Za-z0-9]+/
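As a rough illustration, here is that hostname pattern applied to the
domain part of an address in Python. This is only a sketch: the function
name is hypothetical, and fullmatch() is used to anchor the pattern at
both ends, which the bare /.../ form leaves implicit.

```python
import re

# The hostname pattern from the message: zero or more labels of
# letters, digits and hyphens, each followed by a dot, then a final
# label of letters and digits only.
HOSTNAME_RE = re.compile(r"([A-Za-z0-9-]+\.)*[A-Za-z0-9]+")


def is_strict_hostname(domain: str) -> bool:
    """Return True if the whole string matches the strict hostname pattern."""
    # fullmatch() anchors at both ends, so "foo_bar.example.com"
    # cannot sneak through by matching only a suffix.
    return HOSTNAME_RE.fullmatch(domain) is not None


for d in ["mail.example.com", "foo_bar.example.com", "example..com"]:
    print(d, is_strict_hostname(d))
```

Under this strict reading, "foo_bar.example.com" is rejected because of
the underscore, and "example..com" is rejected because of the empty
label. As written, the pattern still accepts labels that begin with a
hyphen.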

So I consider any use of characters outside that set in a hostname or
"domain name" to be invalid. Specifically, an underscore is not a valid
character, so any use of an underscore in the domain-part of an
address that is supposedly an internet address is syntactically
invalid.

And yet there are quite a lot of hosts that have underscores in their
names. Mail to them is deliverable. I've seen them in use
occasionally, though I've no idea how reliable they are.

All of which is a nice bit of RFC-lawyering, but not really that
relevant. The obvious response demonstrating that "steve@foo&bar+baz"
is syntactically valid would be an equally good bit of RFC-lawyering
too. :)

More practically (and this is a pragmatic database list, not an
esoteric rules-lawyering anti-spam list :) ), I've found that the RE I
mentioned earlier, which allows underscore but excludes the other
invalid hostname characters, is pretty good at spotting the usual
badly formatted email addresses without stumbling over the addresses
that trip up many "email address validators". It punts on the whole
"what is a reasonable-looking local part?" question, of course, but
that's nearly impossible to answer in a useful, practical sense, other
than being nervous about whitespace or anything smacking of source
routing.
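A minimal Python sketch of that kind of pragmatic check follows. It is an
approximation under stated assumptions, not the RE from the earlier
message (which isn't quoted here): the domain part uses the hostname
pattern above with underscore added to the label characters, and the
local part is only checked for whitespace and for an assumed set of
source-routing characters (@ , : % !). The names are hypothetical.

```python
import re

# Assumed approximation: the strict hostname pattern with "_" added to
# the label character class. The final label stays letters/digits only.
DOMAIN_RE = re.compile(r"([A-Za-z0-9_-]+\.)*[A-Za-z0-9]+")

# Assumed "nervous" set for the local part: any whitespace, plus
# characters associated with source routing ("@a,@b:user@c",
# "user%relay@host", "host!user").
SUSPICIOUS_LOCAL_RE = re.compile(r"[\s@,:%!]")


def looks_like_email(address: str) -> bool:
    """Hypothetical pragmatic check: loose local part, hostname-ish domain."""
    # Split at the LAST "@"; anything left over in the local part is
    # caught by the suspicious-character check below.
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return False  # no "@" at all, or an empty local part
    if SUSPICIOUS_LOCAL_RE.search(local):
        return False  # whitespace or source-routing characters
    return DOMAIN_RE.fullmatch(domain) is not None
```

With this check, an underscore in the domain part is accepted, while
"steve@foo&bar+baz" is rejected whatever RFC-lawyering says about it.
Everything else about the local part (plus signs, dots, hyphens) is left
alone, in keeping with punting on the local-part question.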

Cheers,
  Steve

