Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date
Msg-id 20091023014430.GC2240@alvh.no-ip.org
In response to Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores  (Euler Taveira de Oliveira <euler@timbira.com>)
List pgsql-bugs
Euler Taveira de Oliveira wrote:
> Robert Haas wrote:
> > I'm not real familiar with ts_parse(), but I'm thinking that it
> > doesn't have any special casing for email addresses and is just
> > intended to parse text for full-text-search - in which case splitting
> > on _ is a pretty good algorithm.
> >
> It is a bug. tsearch claims to identify token types, but it doesn't
> correctly identify every valid e-mail address. As Dan stated, ts_parse() fails
> to recognize an e-mail address. For example, foo+bar@baz.com is a valid e-mail
> address, but the function fails to report it as one.

It is similarly too simplistic for other cases, such as file names
(particularly where Windows file names are concerned).
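
To make the failure mode concrete, here is a toy sketch in Python. It is
not the tsearch parser and makes no claim about how that parser is
implemented; the two patterns and the helper find_emails() are
hypothetical names invented for illustration. It only shows how a
recognizer whose local-part character class omits '_' and '+' ends up
reporting a truncated address.

```python
import re

# Toy recognizer, NOT the tsearch parser: the local part may contain only
# letters, digits, dots and hyphens.
NARROW_EMAIL = re.compile(r"[A-Za-z0-9.-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Same pattern, but '_' and '+' are also allowed in the local part.
WIDER_EMAIL = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(pattern, text):
    """Return every substring of text that the pattern accepts as an e-mail."""
    return pattern.findall(text)


for address in ("foo_bar@baz.com", "foo+bar@baz.com"):
    print(address,
          find_emails(NARROW_EMAIL, address),
          find_emails(WIDER_EMAIL, address))
```

With the narrow pattern, everything up to and including the '_' or '+'
is dropped and only the tail of the address is recognized; the wider
pattern keeps the address whole. Even the wider pattern is far from
covering every valid address, which is the same "too simplistic" problem
in miniature.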

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
