Re: CREATE DATABASE command for non-libc providers - Mailing list pgsql-hackers
From | Daniel Verite |
---|---|
Subject | Re: CREATE DATABASE command for non-libc providers |
Date | |
Msg-id | eaafe5c4-a1eb-4028-92a1-722304875d86@manitou-mail.org |
In response to | Re: CREATE DATABASE command for non-libc providers (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: CREATE DATABASE command for non-libc providers |
List | pgsql-hackers |
Jeff Davis wrote:

> The main challenge is backwards compatibility. Users of FTS would need
> to recreate all of their tsvectors and indexes dependent on them. It's
> even possible that some users only have tsvectors and don't store the
> original data in the database, which would further complicate matters.

Why would it be that bad? FTS indexes don't get corrupted that way.
You may get different lexemes before and after the upgrade for some
documents, and then what?

The FTS parser has seen user-visible changes in the past, and
regenerating tsvectors because of that was merely a suggestion.

commit 61d66c44f18c73094a50a2ef97d26cc03e171dc0
Author: Teodor Sigaev <teodor@sigaev.ru>
Date:   Tue Mar 29 17:59:58 2016 +0300

    Fix support of digits in email/hostnames.

    When tsearch was implemented I did several mistakes in hostname/email
    definition rules:
    1) allow underscore in hostname what ted by RFC
    2) forget to allow leading digits separated by hyphen (like 123-x.com)
       in hostname
    3) do no allow underscore/hyphen after leading digits in localpart of email
    Artur's patch resolves two last issues, but by the way allows hosts name
    like 123_x.com together with 123-x.com. RFC forbids underscore usage in
    hostname but pg allows that since initial tsearch version in core, although
    only for non-digits. Patch syncs support digits and nondigits in both
    hostname and email. Forbidding underscore in hostname may break existsing
    usage of tsearch and, anyhow, it should be done by separate patch.

    Author: Artur Zakirov
    BUG: #13964

In the release notes:

  Fix the default text search parser to allow leading digits in email
  and host tokens (Artur Zakirov)

  In most cases this will result in few changes in the parsing of text.
  But if you have data where such addresses occur frequently, it may be
  worth rebuilding dependent tsvector columns and indexes so that
  addresses of this form will be found properly by text searches.
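To make the argument concrete, here is a toy Python sketch. It is not PostgreSQL code: the two host-name rules (`OLD_HOST`, `NEW_HOST`) are hypothetical regexes that loosely mimic the 2016 change, and a stored token set plays the role of a tsvector computed before the upgrade. How the old parser really split a string like 123-x.com is not modeled; the split produced here is an assumption of the toy. What it shows is that nothing gets corrupted: words whose tokens did not change still match, and only queries on the affected tokens miss old documents until their tsvectors are rebuilt.

```python
import re

# Hypothetical, heavily simplified host-name rules. NOT PostgreSQL's text
# search parser; they only mimic the rule changed by commit 61d66c44:
# whether a host name may start with leading digits (e.g. 123-x.com).
OLD_HOST = r"[a-z][a-z0-9-]*(?:\.[a-z0-9-]+)+"     # must start with a letter
NEW_HOST = r"[a-z0-9][a-z0-9-]*(?:\.[a-z0-9-]+)+"  # leading digits allowed


def lexemes(text, host_rule):
    """Return the set of tokens for text: a host name wherever host_rule
    matches at the current position, otherwise a plain alphanumeric run."""
    token = re.compile(rf"(?P<host>{host_rule})|(?P<word>[a-z0-9]+)")
    return {m.group(0) for m in token.finditer(text.lower())}


def matches(query_lexemes, doc_lexemes):
    """AND-only semantics: every query lexeme must be present in the doc."""
    return query_lexemes <= doc_lexemes


doc = "Contact mail 123-x.com"
stored = lexemes(doc, OLD_HOST)         # "tsvector" built before the upgrade
rebuilt = lexemes(doc, NEW_HOST)        # same document re-parsed afterwards
query = lexemes("123-x.com", NEW_HOST)  # new queries use the new rules

print(sorted(stored))          # ['123', 'contact', 'mail', 'x.com']
print(sorted(rebuilt))         # ['123-x.com', 'contact', 'mail']
print(matches(query, stored))  # False: stale lexemes, document is missed
print(matches(query, rebuilt))  # True once the tsvector is rebuilt
print(matches(lexemes("mail", NEW_HOST), stored))  # True: unaffected words
```

The set-inclusion test stands in for an AND-only tsquery; real matching also goes through dictionaries and can use weights and positions, which the sketch ignores.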
commit 2c265adea3129c917296b46a82786d67988ece2c
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Apr 28 02:04:16 2010 +0000

    Modify the built-in text search parser to handle URLs more nearly
    according to RFC 3986. In particular, these characters now terminate
    the path part of a URL: '"', '<', '>', '\', '^', '`', '{', '|', '}'.
    The previous behavior was inconsistent and depended on whether a "?"
    was present in the path.

    Per gripe from Donald Fraser and spec research by Kevin Grittner.
    This is a pre-existing bug, but not back-patching since the risks of
    breaking existing applications seem to outweigh the benefits.

https://www.postgresql.org/docs/release/9.0.0/

  E.24.3.5.1. Full Text Search

  Use more standards-compliant rules for parsing URL tokens (Tom Lane)

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/