Re: unexpected result from to_tsvector - Mailing list pgsql-hackers
From | Artur Zakirov |
---|---|
Subject | Re: unexpected result from to_tsvector |
Date | |
Msg-id | 56E6CE7C.30409@postgrespro.ru |
In response to | Re: unexpected result from to_tsvector ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>) |
Responses | Re: unexpected result from to_tsvector |
List | pgsql-hackers |
On 14.03.2016 16:22, Shulgin, Oleksandr wrote:
>
> Hm... now that doesn't look all that consistent to me (after applying
> the patch):
>
> =# select ts_debug('simple', 'aaa@123-yyy.zzz');
>                                   ts_debug
> ---------------------------------------------------------------------------
>  (email,"Email address",aaa@123-yyy.zzz,{simple},simple,{aaa@123-yyy.zzz})
> (1 row)
>
> But:
>
> =# select ts_debug('simple', 'aaa@123_yyy.zzz');
>                         ts_debug
> ---------------------------------------------------------
>  (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
>  (blank,"Space symbols",@,{},,)
>  (uint,"Unsigned integer",123,{simple},simple,{123})
>  (blank,"Space symbols",_,{},,)
>  (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (5 rows)
>
> One can also see that if we only keep the domain name, the result is
> similar:
>
> =# select ts_debug('simple', '123-yyy.zzz');
>                        ts_debug
> -------------------------------------------------------
>  (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '123_yyy.zzz');
>                       ts_debug
> -----------------------------------------------------
>  (uint,"Unsigned integer",123,{simple},simple,{123})
>  (blank,"Space symbols",_,{},,)
>  (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (3 rows)
>
> But, this only has to do with 123 being recognized as a number, not with
> the underscore:
>
> =# select ts_debug('simple', 'abc_yyy.zzz');
>                        ts_debug
> -------------------------------------------------------
>  (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '1abc_yyy.zzz');
>                        ts_debug
> -------------------------------------------------------
>  (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
> (1 row)
>
> In fact, the 123-yyy.zzz domain is not valid either according to the RFC
> (subdomain can't start with a digit), but since we already allow it,
> should we not allow 123_yyy.zzz to be recognized as a Host?  Then why
> not recognize aaa@123_yyy.zzz as an email address?
>
> Another option is to prohibit underscore in recognized host names, but
> this has more breakage potential IMO.
>
> --
> Alex
>

It seems reasonable to me. I prefer the first option, but I am not
confident that we should allow 123_yyy.zzz to be recognized as a Host.

By the way, the answer at http://webmasters.stackexchange.com/a/775
shows examples of domain names with numbers (but not subdomains).

If there are no objections from others, I will send a new patch later
today or tomorrow that recognizes 123_yyy.zzz as a Host.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
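
For anyone who wants to compare the cases quoted above in one pass, the following is a minimal sketch and not part of the patch. It feeds the same six inputs to ts_debug and prints only the token type and token text. The aliases s, d, n and input are illustrative names chosen here, and the rows returned will depend on whether the patch under discussion is applied.

```sql
-- Sketch only: run each sample string from the thread through the
-- default parser and list the token types it produces.
-- s, d, n and input are illustrative names, not taken from the thread.
SELECT s.input,
       d.alias,   -- token type, e.g. email, host, uint, blank
       d.token    -- the text matched for that token
FROM (VALUES
        (1, 'aaa@123-yyy.zzz'),
        (2, 'aaa@123_yyy.zzz'),
        (3, '123-yyy.zzz'),
        (4, '123_yyy.zzz'),
        (5, 'abc_yyy.zzz'),
        (6, '1abc_yyy.zzz')
     ) AS s(n, input),
     LATERAL ts_debug('simple', s.input) WITH ORDINALITY AS d
ORDER BY s.n, d.ordinality;
```

An input that is recognized as a single Host or email token yields one row, while an input that the parser splits yields several rows, which makes the inconsistency between the hyphen and underscore cases easy to spot.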