Re: Unicode normalization SQL functions - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Unicode normalization SQL functions
Msg-id 7052cc8f-0164-72a8-d9a4-fd32066c938e@2ndquadrant.com
In response to Re: Unicode normalization SQL functions  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: Unicode normalization SQL functions  (John Naylor <john.naylor@2ndquadrant.com>)
Re: Unicode normalization SQL functions  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On 2020-03-24 10:20, Peter Eisentraut wrote:
> Now I have some concerns about the size of the new table in
> unicode_normprops_table.h, and the resulting binary size.  At the very
> least, we should probably make that #ifndef FRONTEND or something like
> that so libpq isn't bloated by it unnecessarily.  Perhaps there is a
> better format for that table?  Any ideas?

I have figured this out.  New patch is attached.

First, I have added #ifndef FRONTEND, as mentioned above, so libpq isn't
bloated.  Second, I have changed the lookup structure to a bitfield, so
each entry is only 32 bits instead of 64.  Third, I have dropped the
quickcheck tables for the NFD and NFKD forms.  Those are by far the
biggest tables, and you still get okay performance if you do the
normalization check the long way (normalize the string and compare the
result with the input), since we don't need the recomposition step in
those cases, which is by far the slowest part.  The main use case of all
of this, I expect, is to check for NFC normalization, so it's okay if
the other variants are not optimized to the same extent.
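
To make the first two points concrete, here is a minimal sketch of what a
32-bit bitfield entry guarded by #ifndef FRONTEND could look like.  It is
not the code from the attached patch: the names qc_entry, sample_table,
qc_lookup and the QC_* constants are hypothetical, the three table rows
are illustrative sample data only, and the 4-byte size relies on the
usual bitfield packing of common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical quick-check result values (names are illustrative). */
enum { QC_NO = 0, QC_YES = 1, QC_MAYBE = -1 };

/*
 * One table entry packed into bitfields.  A Unicode code point needs at
 * most 21 bits, which leaves room for a small signed quick-check value
 * within the same 32-bit word on common compilers.
 */
typedef struct
{
	unsigned int codepoint:21;
	signed int	quickcheck:4;
} qc_entry;

/* Only build the table where it is needed, so frontend code stays small. */
#ifndef FRONTEND
/* Illustrative sample rows only, sorted by code point. */
static const qc_entry sample_table[] = {
	{0x0300, QC_MAYBE},
	{0x0340, QC_NO},
	{0x0958, QC_NO},
};
#endif

/* bsearch() comparator: the key is a bare code point, elem a table entry. */
static int
qc_entry_cmp(const void *key, const void *elem)
{
	uint32_t	cp = *(const uint32_t *) key;
	uint32_t	entry_cp = ((const qc_entry *) elem)->codepoint;

	return (cp > entry_cp) - (cp < entry_cp);
}

/* Code points that are absent from the table are treated as QC_YES. */
static int
qc_lookup(const qc_entry *table, size_t n, uint32_t cp)
{
	const qc_entry *found;

	assert(table != NULL);
	found = bsearch(&cp, table, n, sizeof(qc_entry), qc_entry_cmp);
	return found ? found->quickcheck : QC_YES;
}
```

Storing only the code points whose answer differs from the default keeps
the table short, and the binary search relies on the rows being sorted by
code point.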

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
