Thread: pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi

Get rid of USE_WIDE_UPPER_LOWER dependency in trigram construction.

contrib/pg_trgm's make_trigrams() was coded to ignore multibyte character
boundaries and just make trigrams from bytes if USE_WIDE_UPPER_LOWER wasn't
defined.  This is a bit odd, since there's no obvious reason why trigram
compaction rules should depend on the presence of towlower() and friends.
What's more, there was an Assert() that would fail if that code path was
fed any multibyte characters.

We need to do something about this since the pending regex-indexing patch
has an assumption that you get just one "trgm" from any three characters.
The best solution seems to be to remove the USE_WIDE_UPPER_LOWER
dependency, which shouldn't really have been there in the first place.
The second loop in make_trigrams() is now just a fast path and not a
potentially incompatible algorithm.

If there is anybody still using Postgres on machines without wcstombs() or
towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
incorrectly. It seems likely that there are no such installations, though.

In passing, rename cnt_trigram to compact_trigram, which seems to better
describe its functionality, and improve make_trigrams' test for whether it
has to use the slow path or not (per a suggestion from Alexander Korotkov).

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/7844608e54a3a2e3dee461b00fd6ef028a845d7c

Modified Files
--------------
contrib/pg_trgm/trgm_op.c |   17 ++++++++++-------
1 files changed, 10 insertions(+), 7 deletions(-)


Re: pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> If there is anybody still using Postgres on machines without wcstombs() or
> towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
> to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
> incorrectly. It seems likely that there are no such installations, though.

Those conditions seem just complex enough to require a test script that
will check that for you. What if we created a new binary responsible for
auto checking all those release-note items that are possible to machine
check, then issue a WARNING containing the URL to the release notes you
should be reading, and a SQL script (ala pg_upgrade) to run after
upgrade?

  $ pg_checkupgrade -d "connection=string" > upgrade.sql
  NOTICE: checking 9.3 upgrade release notes
  WARNING: RN-93-0001 index idx_trgm_abc is not on-disk compatible with 9.3
  WARNING: TN-93-0012 …
  WARNING: This script is NOT comprehensive, read release notes at …

The target version would be hard coded on the binary itself for easier
maintaining of it, and that proposal includes a unique identifier for
any release note worthy warning that we know about, that would be
included in the output of the program.

I think most of the checks would only have to be SQL code, and some of
them should include running some binary code the server side. When
that's possible, we could maybe expose that binary code in a server side
extension so as to make the client side binary life's easier. That would
also be an excuse for the project to install some upgrade material on
the old server, which has been discussed in the past for preparing
pg_upgrade when we have a page format change.

What do you think?
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi

From
Stefan Kaltenbrunner
Date:
On 04/08/2013 10:11 AM, Dimitri Fontaine wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> If there is anybody still using Postgres on machines without wcstombs() or
>> towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
>> to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
>> incorrectly. It seems likely that there are no such installations, though.
>
> Those conditions seem just complex enough to require a test script that
> will check that for you. What if we created a new binary responsible for
> auto checking all those release-note items that are possible to machine
> check, then issue a WARNING containing the URL to the release notes you
> should be reading, and a SQL script (ala pg_upgrade) to run after
> upgrade?
>
>   $ pg_checkupgrade -d "connection=string" > upgrade.sql
>   NOTICE: checking 9.3 upgrade release notes
>   WARNING: RN-93-0001 index idx_trgm_abc is not on-disk compatible with 9.3
>   WARNING: TN-93-0012 …
>   WARNING: This script is NOT comprehensive, read release notes at …
>
> The target version would be hard coded on the binary itself for easier
> maintaining of it, and that proposal includes a unique identifier for
> any release note worthy warning that we know about, that would be
> included in the output of the program.
>
> I think most of the checks would only have to be SQL code, and some of
> them should include running some binary code the server side. When
> that's possible, we could maybe expose that binary code in a server side
> extension so as to make the client side binary life's easier. That would
> also be an excuse for the project to install some upgrade material on
> the old server, which has been discussed in the past for preparing
> pg_upgrade when we have a page format change.

given something like this also will have to be dealt with by pg_upgrade,
why not fold it into that (like into -c) completly and recommend running
that? on the flipside if people don't read the release notes they will
also not run any kind of binary/script mentioned there...



Stefan