Thread: Email parsing in Text Search
Hi,
I'm seeing weird behavior from the email parser and wonder whether it is a bug or a feature.
When I use the default regconfig to parse an email address whose local part (the part before the @) consists only of digits, it is not parsed as an email.
db=# select * from ts_debug('pg_catalog.english', '000000001@asdf.com');
 alias |   description    |   token   | dictionaries | dictionary |   lexemes
-------+------------------+-----------+--------------+------------+-------------
 uint  | Unsigned integer | 000000001 | {simple}     | simple     | {000000001}
 blank | Space symbols    | @         | {}           |            |
(3 rows)
However, if I add a letter, it is parsed as an email.
db=# select * from ts_debug('pg_catalog.english', '000000001a@asdf.com');
 alias |  description  |        token        | dictionaries | dictionary |        lexemes
-------+---------------+---------------------+--------------+------------+-----------------------
 email | Email address | 000000001a@asdf.com | {simple}     | simple     | {000000001a@asdf.com}
(1 row)
According to the RFCs and several forums, an email address whose local part consists only of digits is valid.
Is this expected behavior?
I ran the test on OpenBSD 5.9 with PostgreSQL 9.4.6.
Thanks,
Mart
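The validity claim above can be checked against the address grammar itself. The following is a minimal sketch, assuming RFC 5322's dot-atom rule (section 3.2.3) as the reference: `atext` includes DIGIT, so a local part made only of digits is a valid dot-atom. The helper names `is_dot_atom_local_part` and `local_part_of` are hypothetical, the check covers only the unquoted dot-atom form (not quoted strings or obsolete syntax), and it says nothing about how PostgreSQL's text search parser actually tokenizes input.

```python
import re

# RFC 5322, section 3.2.3: atext = ALPHA / DIGIT / a fixed set of symbols.
# A dot-atom is one or more runs of atext separated by single dots.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")


def is_dot_atom_local_part(local_part: str) -> bool:
    """Return True if local_part matches RFC 5322 dot-atom-text.

    Only the unquoted dot-atom form is checked; the quoted-string and
    obsolete forms that the RFC also permits are not handled here.
    """
    return DOT_ATOM.fullmatch(local_part) is not None


def local_part_of(address: str) -> str:
    # Split on the last '@' so everything before it is the local part.
    return address.rsplit("@", 1)[0]


if __name__ == "__main__":
    for addr in ("000000001@asdf.com", "000000001a@asdf.com"):
        # Prints the address followed by True for both inputs.
        print(addr, is_dot_atom_local_part(local_part_of(addr)))
```

Both addresses from the ts_debug tests pass this check, which is consistent with the 9.4 behavior being a parser limitation rather than a sign that `000000001@asdf.com` is malformed.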
Martin Dubé <martin.dube@gmail.com> writes:
> When using the default regconfig and parse an email where the first part is
> numbers only, it is not parsed as an email.

This has been changed for 9.6:

* Fix the default text search parser to allow leading digits in email and host tokens (Artur Zakirov)

regards, tom lane
I should have seen that! Thank you very much!
On Wed, Sep 7, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Martin Dubé <martin.dube@gmail.com> writes:
> > When using the default regconfig and parse an email where the first part is
> > numbers only, it is not parsed as an email.
> This has been changed for 9.6:
> * Fix the default text search parser to allow leading digits in email and host tokens (Artur Zakirov)
> regards, tom lane
Mart