BUG #6375: tsearch does not recognize all valid emails - Mailing list pgsql-bugs

From valgog@gmail.com
Subject BUG #6375: tsearch does not recognize all valid emails
Date
Msg-id E1Ri8il-0008Ct-9p@wrigleys.postgresql.org
Responses Re: BUG #6375: tsearch does not recognize all valid emails  (Bruce Momjian <bruce@momjian.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      6375
Logged by:          Valentine Gogichashvili
Email address:      valgog@gmail.com
PostgreSQL version: 9.1.1
Operating system:   Debian 4.4.5-8
Description:

Hello,

The default tsearch parser does not recognize all valid email addresses; it
tokenizes some of them as plain text, splitting them into several tokens.

For example:

postgres=# select to_tsquery('simple', 'normal@email.com' );
     to_tsquery
────────────────────
 'normal@email.com'
(1 row)

Here it behaves correctly.

postgres=# select to_tsquery('simple', '-still-normal@email.com' );
        to_tsquery
──────────────────────────
 'still-normal@email.com'
(1 row)

Here it trims the leading '-' from the email address. This is not correct, but
a search will at least still find that address.

postgres=# select to_tsquery('simple', '-not-normal-with-dash-@email.com' );
                                  to_tsquery
───────────────────────────────────────────────────────────────────────────────
 'not-normal-with-dash' & 'not' & 'normal' & 'with' & 'dash' & 'email.com'
(1 row)

This is a real problem: the address is split into separate tokens, so a search
also finds addresses that are not the same but are "super-sets" of that one.
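
The token types that the default parser assigns to each piece can be inspected with ts_debug. A minimal sketch, assuming the same 'simple' configuration as in the queries above; the exact rows returned may differ between PostgreSQL versions:

```sql
-- Sketch: show how the default parser classifies each address.
-- ts_debug returns one row per token; "alias" is the token type
-- (for example email, host, or asciiword) and "token" is the raw text.
SELECT alias, token FROM ts_debug('simple', 'normal@email.com');
SELECT alias, token FROM ts_debug('simple', '-still-normal@email.com');
SELECT alias, token FROM ts_debug('simple', '-not-normal-with-dash-@email.com');
```

An address that is recognized as a whole should come back as a single row with alias "email"; an address that is split should come back as several rows of other token types.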

Other valid email characters that are not handled correctly include at least
'+' and '.'.

With my best regards,

-- Valentine Gogichashvili
