Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores - Mailing list pgsql-bugs

From	Euler Taveira de Oliveira
Subject	Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date	October 22, 2009 16:39:53
Msg-id	4AE0B4F8.1010604@timbira.com
In response to	Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores
List	pgsql-bugs

Robert Haas wrote:
> I'm not real familiar with ts_parse(), but I'm thinking that it
> doesn't have any special casing for email addresses and is just
> intended to parse text for full-text-search - in which case splitting
> on _ is a pretty good algorithm.
>
It is a bug. tsearch claims to identify token types, but it doesn't correctly
identify every valid e-mail address. As Dan stated, ts_parse() fails to
recognize some e-mail addresses. For example, foo+bar@baz.com is a valid e-mail
address, but the function fails to report it as one.

It is not that simple to identify an e-mail address in a way that complies with
the RFC. As that code is a state machine, IMHO it decides too early (when it
finds _) that the string is not an e-mail address. AFAIR, that's not a one-line
fix.

euler=# select distinct token as email from ts_parse('default',
'foo.bar@baz.com');
      email
─────────────────
 foo.bar@baz.com
(1 row)

euler=# select distinct token as email from ts_parse('default',
'foo+bar@baz.com');
    email
─────────────
 foo
 +
 bar@baz.com
(3 rows)

euler=# select distinct token as email from ts_parse('default',
'foo_bar@baz.com');
    email
─────────────
 foo
 bar@baz.com
 _
(3 rows)
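
To see which token type the parser assigns to each piece, the ts_parse()
output can be joined against ts_token_type(). This is only a sketch, assuming
the 'default' parser; the output is omitted because it depends on the server
version.

```sql
-- Sketch: list the token type alias next to each token from the default parser.
-- ts_parse() returns (tokid, token);
-- ts_token_type() returns (tokid, alias, description).
SELECT t.alias, p.token
FROM ts_parse('default', 'foo_bar@baz.com') AS p
JOIN ts_token_type('default') AS t USING (tokid);
```

An address that is recognized as a whole should come back as a single row with
the "email" alias; a split address comes back as several rows with other
aliases.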


--
  Euler Taveira de Oliveira
  http://www.timbira.com/
