Thread: pgsql: Fix support of digits in email/hostnames.

pgsql: Fix support of digits in email/hostnames.

From
Teodor Sigaev
Date:
Fix support of digits in email/hostnames.

When tsearch was implemented I made several mistakes in the hostname/email
definition rules:
1) allowed underscore in hostnames, which is prohibited by the RFC
2) forgot to allow leading digits separated by a hyphen (like 123-x.com)
   in hostnames
3) did not allow underscore/hyphen after leading digits in the localpart of
   an email address

Artur's patch resolves the last two issues, but as a side effect it allows
hostnames like 123_x.com together with 123-x.com. The RFC forbids underscores
in hostnames, but pg has allowed them since the initial tsearch version in
core, although only for non-digits. The patch makes the handling of digits and
non-digits consistent in both hostnames and emails.

Forbidding underscores in hostnames may break existing usage of tsearch and,
in any case, it should be done in a separate patch.

Author: Artur Zakirov
BUG: #13964
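
As a rough illustration of the hostname rules discussed in the commit message
above, the following Python sketch contrasts an RFC-style label (no
underscore) with a lenient label that also tolerates underscores. It is a toy
model only: the real parser in wparser_def.c is a hand-written state machine,
and the regular expressions and function names here are assumptions made for
the example, not the rules the patch implements.

```python
import re

# Toy model: these patterns are illustrative assumptions, not the parser's rules.

# RFC-style label: letters, digits and hyphens; must start and end alphanumeric.
RFC_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# Lenient label: same shape, but underscores are also tolerated inside it.
LENIENT_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?"


def _hostname_re(label):
    """Build a pattern for two or more dot-separated labels."""
    return re.compile(rf"{label}(?:\.{label})+\Z")


RFC_HOSTNAME = _hostname_re(RFC_LABEL)
LENIENT_HOSTNAME = _hostname_re(LENIENT_LABEL)


def is_rfc_hostname(text):
    """True if text looks like a hostname without underscores."""
    return RFC_HOSTNAME.match(text) is not None


def is_lenient_hostname(text):
    """True if text looks like a hostname when underscores are tolerated."""
    return LENIENT_HOSTNAME.match(text) is not None


if __name__ == "__main__":
    for host in ("123-x.com", "123_x.com", "x_y.com"):
        print(host, is_rfc_hostname(host), is_lenient_hostname(host))
```

Under this model 123-x.com passes both checks, while 123_x.com and x_y.com
pass only the lenient one, which mirrors the side effect described above.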

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/61d66c44f18c73094a50a2ef97d26cc03e171dc0

Modified Files
--------------
src/backend/tsearch/wparser_def.c     |  3 +++
src/test/regress/expected/tsearch.out | 22 ++++++++++++++--------
src/test/regress/sql/tsearch.sql      |  6 +++---
3 files changed, 20 insertions(+), 11 deletions(-)


Re: pgsql: Fix support of digits in email/hostnames.

From
Bruce Momjian
Date:
On Tue, Mar 29, 2016 at 03:29:20PM +0000, Teodor Sigaev wrote:
> Fix support of digits in email/hostnames.
>
> When tsearch was implemented I made several mistakes in the hostname/email
> definition rules:
> 1) allowed underscore in hostnames, which is prohibited by the RFC
> 2) forgot to allow leading digits separated by a hyphen (like 123-x.com)
>    in hostnames
> 3) did not allow underscore/hyphen after leading digits in the localpart of
>    an email address
>
> Artur's patch resolves the last two issues, but as a side effect it allows
> hostnames like 123_x.com together with 123-x.com. The RFC forbids underscores
> in hostnames, but pg has allowed them since the initial tsearch version in
> core, although only for non-digits. The patch makes the handling of digits and
> non-digits consistent in both hostnames and emails.
>
> Forbidding underscores in hostnames may break existing usage of tsearch and,
> in any case, it should be done in a separate patch.
>
> Author: Artur Zakirov
> BUG: #13964

Doesn't this invalidate tsvector indexes upgraded by pg_upgrade?  Should
they be marked as invalid?

Can you also fix the other two TODO items related to this?

    Improve handling of dash and plus signs in email address
    user names, and perhaps improve URL parsing

    http://www.postgresql.org/message-id/201010122203.o9CM3RW09263@momjian.us

    http://www.postgresql.org/message-id/E1Ri8il-0008Ct-9p@wrigleys.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


Re: pgsql: Fix support of digits in email/hostnames.

From
Teodor Sigaev
Date:
> Doesn't this invalidate tsvector indexes upgraded by pg_upgrade?  Should
> they be marked as invalid?
Directly, it affects functional indexes, i.e. those over to_tsvector(). But it
also affects tsvector columns: such a column should be recreated if it was
generated by the to_tsvector() function.
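
To make the point above concrete, here is a small Python sketch of why stored
results go stale when the parser changes. Both tokenizers are hypothetical
stand-ins (they are not how to_tsvector() works internally); the only thing
the example shows is that tokens stored under old rules may no longer match
a query parsed under new rules until the stored value is regenerated.

```python
import re


def tokenize_old(text):
    # Assumption: the "old" rules split on hyphens and underscores.
    return set(re.findall(r"[a-z0-9.]+", text.lower()))


def tokenize_new(text):
    # Assumption: the "new" rules keep hyphens and underscores inside a token.
    return set(re.findall(r"[a-z0-9._-]+", text.lower()))


doc = "contact 123-x.com"

# Stored before the upgrade, like a tsvector column or an expression index.
stored = tokenize_old(doc)

# Parsed after the upgrade, like the query side of a search.
query = tokenize_new("123-x.com")

# The stale stored value does not contain the token the new parser produces.
matches_stale = query <= stored

# Regenerating the stored value with the new rules restores the match.
matches_rebuilt = query <= tokenize_new(doc)

if __name__ == "__main__":
    print(sorted(stored), sorted(query), matches_stale, matches_rebuilt)
```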

>
> Can you also fix the other two TODO items related to this?
>
>     Improve handling of dash and plus signs in email address
>     user names, and perhaps improve URL parsing
>
>     http://www.postgresql.org/message-id/201010122203.o9CM3RW09263@momjian.us
>
>     http://www.postgresql.org/message-id/E1Ri8il-0008Ct-9p@wrigleys.postgresql.org
>

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: pgsql: Fix support of digits in email/hostnames.

From
Bruce Momjian
Date:
On Fri, Apr 29, 2016 at 01:20:35PM +0300, Teodor Sigaev wrote:
> >Doesn't this invalidate tsvector indexes upgraded by pg_upgrade?  Should
> >they be marked as invalid?
> Directly, it affects functional indexes, i.e. those over to_tsvector(). But it
> also affects tsvector columns: such a column should be recreated if it was
> generated by the to_tsvector() function.

OK, so every tsvector column or expression index needs to be reported by
pg_upgrade?  Do we want to fix everything else in this same release?
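
For illustration, one way to list the objects in question by hand is to query
the system catalogs. The sketch below is an assumption about how such a check
could look, not what pg_upgrade does: the function name is hypothetical, the
cursor is any Python DB-API cursor connected to the database, and matching on
the text 'to_tsvector' in the index definition is a heuristic that will miss
indexes that call it through a wrapper function.

```python
# Ordinary tables that have a column of type tsvector.
TSVECTOR_COLUMNS_SQL = """
SELECT c.oid::regclass AS table_name, a.attname AS column_name
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
WHERE a.atttypid = 'tsvector'::regtype
  AND c.relkind = 'r'
  AND a.attnum > 0
  AND NOT a.attisdropped
"""

# Expression indexes whose definition mentions to_tsvector (heuristic).
TSVECTOR_EXPR_INDEXES_SQL = """
SELECT indexrelid::regclass AS index_name,
       pg_get_indexdef(indexrelid) AS definition
FROM pg_index
WHERE indexprs IS NOT NULL
  AND position('to_tsvector' in pg_get_indexdef(indexrelid)) > 0
"""


def find_tsvector_objects(cursor):
    """Run both catalog queries on a DB-API cursor and collect the rows."""
    cursor.execute(TSVECTOR_COLUMNS_SQL)
    columns = cursor.fetchall()
    cursor.execute(TSVECTOR_EXPR_INDEXES_SQL)
    indexes = cursor.fetchall()
    return {"columns": columns, "expression_indexes": indexes}
```

Whether a given tsvector column was actually filled by to_tsvector() cannot
be read from the catalogs, so the column list is only a set of candidates.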



Re: pgsql: Fix support of digits in email/hostnames.

From
Bruce Momjian
Date:
On Fri, Apr 29, 2016 at 06:43:04AM -0400, Bruce Momjian wrote:
> On Fri, Apr 29, 2016 at 01:20:35PM +0300, Teodor Sigaev wrote:
> > >Doesn't this invalidate tsvector indexes upgraded by pg_upgrade?  Should
> > >they be marked as invalid?
> > Directly, it affects functional indexes, i.e. those over to_tsvector(). But it
> > also affects tsvector columns: such a column should be recreated if it was
> > generated by the to_tsvector() function.
>
> OK, so every tsvector column or expression index needs to be reported by
> pg_upgrade?  Do we want to fix everything else in this same release?

I guess my point is that we should do all pg_upgrade-breaking tsvector
changes in a single release so we don't need to invalidate tsvector
columns and indexes in two releases.  If we can't do them all in 9.6,
perhaps we should revert this change and do them all in 9.7.
