Home > mailing lists

Slightly insane use of USE_WIDE_UPPER_LOWER in pg_trgm - Mailing list pgsql-hackers

From	Tom Lane
Subject	Slightly insane use of USE_WIDE_UPPER_LOWER in pg_trgm
Date	April 7, 2013 15:43:33
Msg-id	21256.1365349406@sss.pgh.pa.us Whole thread Raw
Responses	Re: Slightly insane use of USE_WIDE_UPPER_LOWER in pg_trgm
List	pgsql-hackers

Tree view

While reviewing the latest incarnation of the regex indexing patch,
I noticed that make_trigrams() in contrib/pg_trgm/trgm_op.c is coded
so that if USE_WIDE_UPPER_LOWER is not set, it ignores multibyte
character boundaries and just makes trigrams from 3-byte substrings.
This seems slightly insane, not least because there's an Assert there
that will fail if it's fed any multibyte characters.  I suppose no one
has actually tried this code with non-ASCII data on machines where
USE_WIDE_UPPER_LOWER isn't set; at least not with Asserts turned on.
(Considering that even my favorite dinosaur HPUX machine has got both
HAVE_WCSTOMBS and HAVE_TOWLOWER, it may well be that there *aren't* any
such machines anymore.)

So I'm inclined to remove the two #ifdef USE_WIDE_UPPER_LOWER tests
in trgm_op.c, and just use the multibyte-aware code all the time.
A downside of this is that if there is indeed anyone out there storing
non-ASCII trigrams on a machine without USE_WIDE_UPPER_LOWER, their
indexes would break if they pg_upgrade to 9.3.  OTOH their indexes would
break anyway if they rebuilt against a more modern libc, or built with
Asserts on.

If we don't do this then we'll have to complicate the regex indexing
patch some more, since it's currently imagining that cnt_trigram()
is always the way to make storable trigrams from raw text, and this
is just wrong in the existing non-USE_WIDE_UPPER_LOWER code path.

Comments?
        regards, tom lane

pgsql-hackers by date:

From: Craig Ringer
Date: 07 April 2013, 11:55:04
Subject: Re: doc hdparm also support SATA

From: Alvaro Herrera
Date: 07 April 2013, 15:50:24
Subject: Re: Process title for autovac

Slightly insane use of USE_WIDE_UPPER_LOWER in pg_trgm - Mailing list pgsql-hackers

Previous

Next