Email parsing in Text Search - Mailing list pgsql-bugs

From Martin Dubé
Subject Email parsing in Text Search
Date
Msg-id CAGny-cMH0s4Q-Ob=Ebn+-yDchLMVEm8bZ9PBP88vEvppsh5BPw@mail.gmail.com
Responses Re: Email parsing in Text Search  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hi,

I'm seeing weird behavior from the email parser and wonder whether it is a bug or a feature.

When I use the default regconfig to parse an email address whose local part (the part before the @) consists only of digits, it is not recognized as an email address.

db=# select * from ts_debug('pg_catalog.english', '000000001@asdf.com');
 alias |   description    |   token   | dictionaries | dictionary |   lexemes   
-------+------------------+-----------+--------------+------------+-------------
 uint  | Unsigned integer | 000000001 | {simple}     | simple     | {000000001}
 blank | Space symbols    | @         | {}           |            | 
 host  | Host             | asdf.com  | {simple}     | simple     | {asdf.com}
(3 rows)


However, if I add a letter to the local part, it is parsed as an email address.

db=# select * from ts_debug('pg_catalog.english', '000000001a@asdf.com');
 alias |  description  |        token        | dictionaries | dictionary |        lexemes        
-------+---------------+---------------------+--------------+------------+-----------------------
 email | Email address | 000000001a@asdf.com | {simple}     | simple     | {000000001a@asdf.com}
(1 row)

According to the RFC and several forums, an email address whose local part consists only of digits is valid.
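For reference, the validity claim can be checked against the dot-atom grammar of RFC 5322, where `atext` includes digits. The Python snippet is a minimal sketch for illustration only; it is not how PostgreSQL's text search parser works. The function name `local_part_is_dot_atom` is hypothetical, and the check covers only the dot-atom form of the local part (no quoted strings, comments, or obsolete syntax, and no domain validation).

```python
import re

# RFC 5322 "atext": ASCII letters, digits, and these printable symbols.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def local_part_is_dot_atom(address: str) -> bool:
    """Return True if the text before the last '@' is an RFC 5322 dot-atom.

    Simplified sketch: ignores quoted-string local parts, comments,
    folding whitespace, and obsolete syntax; the domain is not checked.
    """
    local, sep, _domain = address.rpartition("@")
    if not sep:
        # No '@' at all, so there is no local part to check.
        return False
    return DOT_ATOM.fullmatch(local) is not None


# Prints True, True, False: a digits-only local part is a valid dot-atom,
# while a leading dot is not.
for addr in ("000000001@asdf.com", "000000001a@asdf.com", ".123@asdf.com"):
    print(addr, local_part_is_dot_atom(addr))
```

Under this grammar both `000000001@asdf.com` and `000000001a@asdf.com` have valid local parts, which is the distinction the two `ts_debug` results above do not make.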

Is this expected behavior?

I ran the test on OpenBSD 5.9 with PostgreSQL 9.4.6.

Thanks,


--
Mart
