Thread: pgsql: Teach the regular expression functions to do case-insensitive
From: tgl@postgresql.org (Tom Lane)
Date:
Log Message:
-----------
Teach the regular expression functions to do case-insensitive matching and locale-dependent character classification properly when the database encoding is UTF8. The previous coding worked okay in single-byte encodings, or in any case for ASCII characters, but failed entirely on multibyte characters. The fix assumes that the <wctype.h> functions use Unicode code points as the wchar representation for Unicode, ie, wchar matches pg_wchar.

This is only a partial solution, since we're still stupid about non-ASCII characters in multibyte encodings other than UTF8. The practical effect of that is limited, however, since those cases are generally Far Eastern glyphs for which concepts like case-folding don't apply anyway. Certainly all or nearly all of the field reports of problems have been about UTF8. A more general solution would require switching to the platform's wchar representation for all regex operations; which is possible but would have substantial disadvantages. Let's try this and see if it's sufficient in practice.

Modified Files:
--------------
    pgsql/src/backend/regex:
        regc_locale.c (r1.9 -> r1.10)
        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/regex/regc_locale.c?r1=1.9&r2=1.10)
    pgsql/src/include/regex:
        regcustom.h (r1.7 -> r1.8)
        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/regex/regcustom.h?r1=1.7&r2=1.8)
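
For readers who want a concrete picture of the approach described in the log message, the following is a minimal sketch, not the committed code. It assumes, as the message does, that wchar_t holds Unicode code points when the encoding is UTF8. The names sketch_wchar, sketch_encoding and sketch_tolower are hypothetical stand-ins invented for this illustration; they are not PostgreSQL identifiers.

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for the regex engine's wide character type. */
typedef unsigned int sketch_wchar;

/* Hypothetical encoding tags; the real backend has its own encoding IDs. */
typedef enum
{
    SKETCH_ENC_UTF8,
    SKETCH_ENC_SINGLE_BYTE,
    SKETCH_ENC_OTHER_MULTIBYTE
} sketch_encoding;

/*
 * Case-fold one character.
 *
 * UTF8: hand the code point to towlower(), on the assumption that the
 * platform's wchar_t representation is the Unicode code point.  Skip
 * the call if the value cannot fit in wchar_t.
 *
 * Anything else: only values that fit in one byte are folded, using
 * tolower().  Non-ASCII characters of other multibyte encodings pass
 * through unchanged, which is the "partial solution" limitation.
 */
static sketch_wchar
sketch_tolower(sketch_wchar c, sketch_encoding enc)
{
    if (enc == SKETCH_ENC_UTF8)
    {
        if (sizeof(wchar_t) >= 4 || c <= (sketch_wchar) 0xFFFF)
            return (sketch_wchar) towlower((wint_t) c);
        return c;
    }

    if (c <= (sketch_wchar) UCHAR_MAX)
        return (sketch_wchar) tolower((unsigned char) c);

    return c;
}
```

Whether non-ASCII code points are actually folded in the UTF8 branch depends on the process locale (LC_CTYPE) in effect when towlower() is called; ASCII letters are folded in any locale.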