tsvector and tsquery are not collatable types, but they do need locale
information to parse the original text. It would not do any good to
make them collatable types, because a COLLATE clause would typically be
applied after the parsing is done.
Previously, tsearch used the database CTYPE for parsing, but that's not
good because it creates an unnecessary dependency on libc even when the
user has requested another provider.
This patch series allows tsearch to use the database default locale for
parsing. If the database locale provider is libc, there's no change.
Motivation:
(a) it reduces the dependence on setlocale(), which is not thread-
safe;
(b) if a user is using the builtin or ICU providers, understanding
the effects of LC_CTYPE can be very confusing;
(c) it would allow us to test more of the tsearch parsing behavior.
Notes:
* This should have exactly the same behavior as before if the database
locale provider is libc. If the database locale provider is builtin or
ICU, then there will be some differences in tsearch parsing behavior.
* Most of the patches are straightforward, but v1-0005 might need extra
attention. There are quite a few cases there with subtle distinctions,
and I might have missed something. For example, in the "C" locale,
tsearch treats non-ASCII characters as alpha, even though the libc
functions do not do so (I preserved this behavior).
* This introduces redundancy between the character isxyz() functions in
regc_pg_locale.c and similar functions in pg_locale.c. It would be easy
enough to refactor to eliminate the redundancy, but that might have
performance implications, so I didn't do it yet.
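
To illustrate the "C" locale special case mentioned above, here is a
minimal sketch of the rule as described. It is not code from the patch
series: the function name sketch_c_locale_isalpha() is hypothetical, and
the sketch assumes the input is a single code point and that the process
is running in the "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a PostgreSQL function): classify one code
 * point the way the note above describes tsearch behaving in the "C"
 * locale.
 */
static bool
sketch_c_locale_isalpha(unsigned int codepoint)
{
	/* Non-ASCII characters are treated as alphabetic. */
	if (codepoint > 0x7F)
		return true;

	/* ASCII characters defer to libc's isalpha(). */
	return isalpha((unsigned char) codepoint) != 0;
}
```

With this rule, 'a' and U+00E9 are both classified as alpha, while '1'
and ' ' are not. Calling libc's isalpha() directly in the "C" locale
would reject the non-ASCII case.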
Regards,
Jeff Davis