Re: unexpected result from to_tsvector - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Re: unexpected result from to_tsvector
Msg-id CACACo5SMkOU3cYhKHiLcOCkKvkeh9MYqQTbA95apZ38iwPL5qQ@mail.gmail.com
In response to Re: unexpected result from to_tsvector  (Artur Zakirov <a.zakirov@postgrespro.ru>)
Responses Re: unexpected result from to_tsvector
Re: unexpected result from to_tsvector
List pgsql-hackers
On Mon, Mar 7, 2016 at 10:46 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
Hello,

On 07.03.2016 23:55, Dmitrii Golub wrote:


Hello,

Should we add tests for this case?

I think we should. I have added tests for teodor@123-stack.net and 123@stack.net emails.


123_reg.ro is not a valid domain name, because of the
"_" symbol.

https://tools.ietf.org/html/rfc1035 page 8.

Dmitrii Golub

Thank you for the information. Fixed.

Hm...  now that doesn't look all that consistent to me (after applying the patch):

=# select ts_debug('simple', 'aaa@123-yyy.zzz');
                                 ts_debug                                  
---------------------------------------------------------------------------
 (email,"Email address",aaa@123-yyy.zzz,{simple},simple,{aaa@123-yyy.zzz})
(1 row)

But:

=# select ts_debug('simple', 'aaa@123_yyy.zzz');
                        ts_debug                         
---------------------------------------------------------
 (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
 (blank,"Space symbols",@,{},,)
 (uint,"Unsigned integer",123,{simple},simple,{123})
 (blank,"Space symbols",_,{},,)
 (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(5 rows)

One can also see that if we only keep the domain name, the result is similar:

=# select ts_debug('simple', '123-yyy.zzz');
                       ts_debug                        
-------------------------------------------------------
 (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
(1 row)

=# select ts_debug('simple', '123_yyy.zzz');
                      ts_debug                       
-----------------------------------------------------
 (uint,"Unsigned integer",123,{simple},simple,{123})
 (blank,"Space symbols",_,{},,)
 (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(3 rows)

But this only has to do with 123 being recognized as a number, not with the underscore:

=# select ts_debug('simple', 'abc_yyy.zzz');
                       ts_debug                        
-------------------------------------------------------
 (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
(1 row)

=# select ts_debug('simple', '1abc_yyy.zzz');
                       ts_debug                        
-------------------------------------------------------
 (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
(1 row)

In fact, the 123-yyy.zzz domain is not valid either according to the RFC (a label can't start with a digit), but since we already allow it, shouldn't we also allow 123_yyy.zzz to be recognized as a Host? And if so, why not recognize aaa@123_yyy.zzz as an email address?
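
To make the RFC side of this concrete, here is a minimal sketch of the "preferred name syntax" from RFC 1035, section 2.3.1, applied to the names used above. It only illustrates the RFC grammar and is not how the PostgreSQL text search parser decides anything. The helper names (is_rfc1035_label, is_rfc1035_domain) are made up for this example.

```python
import re

# RFC 1035, section 2.3.1 (preferred name syntax):
#   <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# A label starts with a letter, ends with a letter or digit, and contains
# only letters, digits and hyphens in between. Labels are at most 63 chars.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_label(label: str) -> bool:
    """Hypothetical helper: True if `label` matches the RFC 1035 <label> rule."""
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None


def is_rfc1035_domain(name: str) -> bool:
    """Hypothetical helper: True if every dot-separated label is valid."""
    return all(is_rfc1035_label(label) for label in name.split("."))


if __name__ == "__main__":
    # The host names from the ts_debug examples in this message.
    for name in ("123-yyy.zzz", "123_yyy.zzz", "abc_yyy.zzz", "1abc_yyy.zzz"):
        print(name, is_rfc1035_domain(name))
```

Under this strict reading all four names are rejected: two for the leading digit only, one for the underscore only, and 123_yyy.zzz for both. The parser, however, already accepts three of them as Host, which is the inconsistency in question.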

Another option is to prohibit underscore in recognized host names, but this has more breakage potential IMO.

--
Alex
