Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date
Msg-id 20643.1268443116@sss.pgh.pa.us
In response to Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:
> Well, I think the big question is whether we need to honor RFC 5322
> (http://www.rfc-editor.org/rfc/rfc5322.txt). Wikipedia says these are
> all valid characters:
>
>     http://en.wikipedia.org/wiki/E-mail_address
>
>     * Uppercase and lowercase English letters (a-z, A-Z)
>     * Digits 0 to 9
>     * Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
>     * Character . (dot, period, full stop) provided that it is not the
>       first or last character, and provided also that it does not appear two
>       or more times consecutively.

That's an awful lot of special characters.  For the RFC's purposes,
it's not hard to be flexible because in an email message there is
external context telling where to expect an address.  I think if we
tried to allow all of those in email addresses in tsearch, we'd have
"email addresses" gobbling up a whole lot of adjacent text, to nobody's
benefit.

I can see the case for adding "+" because that's fairly common as Alvaro
notes, but I think we should be very circumspect about going farther.
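
To make the "gobbling" concern concrete, here is a minimal sketch in Python.
It is not how ts_parse is implemented; the regular expression, the three
character classes, and the names NARROW, NARROW_PLUS, RFC_ATEXT and
find_emails are all assumptions made up for illustration. It scans free
text with no external context and compares what each local-part character
set picks up.

```python
import re

# Hypothetical local-part character classes (illustration only, not the
# rules used by the tsearch parser).
NARROW = r"[A-Za-z0-9_\-]"                      # letters, digits, "_" and "-"
NARROW_PLUS = r"[A-Za-z0-9_+\-]"                # the same, with "+" added
RFC_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~\-]"  # the full list quoted above


def email_pattern(local_chars):
    """Build an unanchored pattern for the given local-part class.

    Local part: runs of allowed characters separated by single dots.
    Domain: two or more dot-separated labels of letters, digits, hyphens.
    """
    return re.compile(
        rf"{local_chars}+(?:\.{local_chars}+)*"
        r"@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
    )


def find_emails(text, local_chars):
    """Return every substring of text that the pattern treats as an address."""
    return email_pattern(local_chars).findall(text)


if __name__ == "__main__":
    header = "reply-to=foo@example.com"
    print(find_emails(header, NARROW))     # only the address itself
    print(find_emails(header, RFC_ATEXT))  # swallows the "reply-to=" prefix

    tagged = "write to tom+lists@example.org"
    print(find_emails(tagged, NARROW))       # loses the "tom+" part
    print(find_emails(tagged, NARROW_PLUS))  # keeps the whole address
```

With the full RFC set, "=" is a legal local-part character, so the whole of
"reply-to=foo@example.com" is reported as one address. Adding only "+" fixes
the tagged-address case without that side effect.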
        regards, tom lane

