pgsql: Improve performance of Unicode {de,re}composition in the backend - Mailing list pgsql-committers

From Michael Paquier
Subject pgsql: Improve performance of Unicode {de,re}composition in the backend
Date
Msg-id E1kVmWU-0005ls-SS@gemulon.postgresql.org
List pgsql-committers
Improve performance of Unicode {de,re}composition in the backend

This replaces the existing binary search with two perfect hash functions
for the composition and the decomposition in the backend code, at the
cost of slightly larger binaries there (35kB in libpgcommon_srv.a).  Per
the measurements done, this improves the speed of the recomposition and
decomposition by up to 30~40 times for the NFC and NFKC conversions,
while all other operations get at least 40% faster.  This is not as
"good" as what libicu has, but it closes the gap a lot as per the
feedback from Daniel Verite.
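
To illustrate the trade-off described above, here is a minimal sketch of the two lookup strategies over a toy decomposition table. Everything in it is an assumption made for illustration: the `toy_decomp` struct, the six-entry table, the `cp % 11` hash and the `toy_slots` array are hypothetical and are not the generated code from unicode_norm_hashfunc.h. The point is only that a perfect hash replaces the O(log n) search loop with one hash computation plus one key comparison, paid for with an extra table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy entry: one code point and its two-part decomposition. */
typedef struct
{
	uint32_t	codepoint;
	uint32_t	base;
	uint32_t	mark;
} toy_decomp;

/* Sorted by codepoint so that binary search works. */
static const toy_decomp toy_table[] = {
	{0x00C0, 0x0041, 0x0300},	/* A + grave */
	{0x00C1, 0x0041, 0x0301},	/* A + acute */
	{0x00C7, 0x0043, 0x0327},	/* C + cedilla */
	{0x00C9, 0x0045, 0x0301},	/* E + acute */
	{0x00E9, 0x0065, 0x0301},	/* e + acute */
	{0x00F1, 0x006E, 0x0303},	/* n + tilde */
};

#define TOY_TABLE_LEN (sizeof(toy_table) / sizeof(toy_table[0]))

/* Baseline: binary search over the sorted table. */
static const toy_decomp *
lookup_bsearch(uint32_t cp)
{
	size_t		lo = 0;
	size_t		hi = TOY_TABLE_LEN;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (toy_table[mid].codepoint == cp)
			return &toy_table[mid];
		if (toy_table[mid].codepoint < cp)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Toy perfect hash: cp % 11 is collision-free for the six keys above.
 * Each slot holds an index into toy_table, or -1 if no key maps there.
 */
static const int8_t toy_slots[11] = {-1, 2, 4, 3, -1, 0, 1, -1, -1, -1, 5};

static const toy_decomp *
lookup_hash(uint32_t cp)
{
	int			idx = toy_slots[cp % 11];

	/* Non-member keys can land on a used slot, so verify the key. */
	if (idx < 0 || toy_table[idx].codepoint != cp)
		return NULL;
	return &toy_table[idx];
}
```

Both functions return the same entry for any member key and NULL otherwise; only the cost of getting there differs.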

The decomposition table remains the same and is still used for the binary
search in the frontend code, where the size of libraries like libpq
matters more than performance, as this code is involved only in paths
related to SCRAM authentication.  As a consequence, the perfect hash
function for the recomposition needs a new inverse lookup array that
points back to the existing decomposition table.
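
The inverse lookup can be sketched as follows, again under loudly stated assumptions: the `sketch_decomp` struct, the toy table, the `(base + mark) % 13` hash and the `sketch_inverse` array are all hypothetical stand-ins, not the generated PostgreSQL tables. The sketch shows the shape of the idea: recomposition hashes the (base, mark) pair, the inverse array turns the hash slot into an index into the unchanged decomposition table, and the entry found there is checked before its code point is returned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy entry, same layout idea as a decomposition table row. */
typedef struct
{
	uint32_t	codepoint;
	uint32_t	base;
	uint32_t	mark;
} sketch_decomp;

/* The decomposition table itself is left untouched, sorted by codepoint. */
static const sketch_decomp sketch_table[] = {
	{0x00C0, 0x0041, 0x0300},	/* A + grave */
	{0x00C1, 0x0041, 0x0301},	/* A + acute */
	{0x00C7, 0x0043, 0x0327},	/* C + cedilla */
	{0x00C9, 0x0045, 0x0301},	/* E + acute */
	{0x00E9, 0x0065, 0x0301},	/* e + acute */
	{0x00F1, 0x006E, 0x0303},	/* n + tilde */
};

/*
 * Toy inverse lookup array: (base + mark) % 13 is collision-free for the
 * six pairs above.  Each slot stores an index into sketch_table, or -1.
 */
static const int8_t sketch_inverse[13] = {
	-1, 0, 1, 2, -1, -1, 3, -1, -1, -1, 5, -1, 4
};

/* Try to recompose (base, mark); on success store the result in *result. */
static bool
sketch_recompose(uint32_t base, uint32_t mark, uint32_t *result)
{
	int			idx = sketch_inverse[(base + mark) % 13];

	if (idx < 0)
		return false;

	/* Pairs outside the table can hit a used slot, so verify both parts. */
	if (sketch_table[idx].base != base || sketch_table[idx].mark != mark)
		return false;

	*result = sketch_table[idx].codepoint;
	return true;
}
```

Because the inverse array holds only small indexes rather than a second copy of the data, the recomposition path reuses the decomposition table instead of duplicating it.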

The size of all frontend deliverables remains unchanged, even with
--enable-debug, including libpq.

Author: John Naylor
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/CAFBsxsHUuMFCt6-pU+oG-F1==CmEp8wR+O+bRouXWu6i8kXuqA@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/783f0cc64dcc05e3d112a06b1cd181e5a1ca9099

Modified Files
--------------
src/common/unicode/Makefile                       |    4 +-
src/common/unicode/generate-unicode_norm_table.pl |  226 +-
src/common/unicode_norm.c                         |  106 +-
src/include/common/unicode_norm_hashfunc.h        | 2932 +++++++++++++++++++++
src/tools/pgindent/exclude_file_patterns          |    3 +-
5 files changed, 3227 insertions(+), 44 deletions(-)

