Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date
Msg-id 201003130109.o2D19WG28274@momjian.us
In response to Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores
List pgsql-hackers
Alvaro Herrera wrote:
> 
> Upon seeing this patch I considered that I use addresses such as
> alvherre+stuff@something.org  and wondered how could this thing support
> that.  I don't think we want extra parser stuff just to add whatever
> random junk we want to support in email addresses ...

Well, I think the big question is whether we need to honor RFC 5322
(http://www.rfc-editor.org/rfc/rfc5322.txt). Wikipedia says these are
all valid characters:
   http://en.wikipedia.org/wiki/E-mail_address
   * Uppercase and lowercase English letters (a-z, A-Z)
   * Digits 0 to 9
   * Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
   * Character . (dot, period, full stop) provided that it is not the
     first or last character, and provided also that it does not appear
     two or more times consecutively.
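
For illustration only, here is a minimal Python sketch of the rule that
list describes. It is not how the tsearch parser works; it covers just
the unquoted local part (the text before the @) and ignores quoted
strings and the domain. The names ATEXT, LOCAL_PART, and
is_valid_local_part are hypothetical.

```python
import re

# Characters allowed in an unquoted local part, per the list above.
# The trailing "-" is a literal hyphen, not a range.
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"

# One or more runs of allowed characters separated by single dots.
# This shape rules out a leading dot, a trailing dot, and ".." runs.
LOCAL_PART = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def is_valid_local_part(local: str) -> bool:
    """Return True if `local` satisfies the character rules listed above."""
    return LOCAL_PART.fullmatch(local) is not None
```

Under these rules "alvherre+stuff" and "first_last" are accepted, while
".first", "first.", and "first..last" are rejected.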
 

And we don't currently honor most of the special characters, including
plus:
	test=> select ts_parse('default', ' first+last@yahoo.com ');
	      ts_parse
	--------------------
	 (12," ")
	 (1,first)
	 (12,+)
	 (4,last@yahoo.com)
	 (12," ")
	(5 rows)
 

Where does this leave us?  Do we add the other characters?  Do we
document that we only allow a limited number of characters for email
addresses?  What is the logic in that?  Do any of these characters
conflict with our tsquery operators?  

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do

