Re: Unicode normalization SQL functions - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	Re: Unicode normalization SQL functions
Date	April 2, 2020 10:51:19
Msg-id	2d3e7d66-0f89-a663-280b-45d5b88b196b@2ndquadrant.com
In response to	Re: Unicode normalization SQL functions (John Naylor <john.naylor@2ndquadrant.com>)
Responses	Re: Unicode normalization SQL functions (John Naylor <john.naylor@2ndquadrant.com>)
List	pgsql-hackers

On 2020-03-26 18:41, John Naylor wrote:
> We don't have a trie implementation in Postgres, but we do have a
> perfect hash implementation. Doing that would bring the tables back to
> 64 bits per entry, but would likely be noticeably faster than binary
> search. Since v4 has left out the biggest tables entirely, I think
> this might be worth a look for the smaller tables remaining.
> 
> In the attached v5, when building the hash tables, we sort the code
> points by NO/MAYBE, and store the index of the beginning of the NO
> block:

This is a valuable idea, but I fear it's a bit late now in this cycle. 
I have questions about some details.  For example, you mention that you 
had to fiddle with the hash seed.  How does that affect other users of 
PerfectHash?  What happens when we update Unicode data and the hash 
doesn't work anymore?  These discussions might derail this patch at this 
hour, so I have committed the previous patch.  We can consider your 
patch as a follow-up patch, either now or in the future.
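
For readers following the thread, the table layout described in the quoted text (entries sorted by NO/MAYBE, plus the index where the NO block begins) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from the v5 patch: the names qc_table, qc_no_start, qc_find and quick_check are hypothetical, the code points are illustrative sample entries, and a plain linear scan stands in for the generated perfect hash function.

```c
#include <stddef.h>
#include <stdint.h>

/* Possible answers of a normalization quick check. */
typedef enum { QC_YES, QC_NO, QC_MAYBE } QuickCheck;

/*
 * Hypothetical table holding only code points whose answer is not YES.
 * MAYBE entries are placed first and NO entries after them, so the
 * answer can be derived from an entry's position instead of being
 * stored per entry. The values are illustrative samples.
 */
static const uint32_t qc_table[] = {
    0x0300, 0x0301, 0x0302,     /* MAYBE block */
    0x0340, 0x0341, 0x0343      /* NO block */
};
static const size_t qc_table_len = sizeof(qc_table) / sizeof(qc_table[0]);

/* Index of the first NO entry (assumed to be emitted by the generator). */
static const size_t qc_no_start = 3;

/*
 * Stand-in for the perfect hash lookup: returns the index of cp in
 * qc_table, or -1 if cp is not present. A real perfect hash would
 * compute a candidate index and then verify the stored code point.
 */
static int
qc_find(uint32_t cp)
{
    for (size_t i = 0; i < qc_table_len; i++)
    {
        if (qc_table[i] == cp)
            return (int) i;
    }
    return -1;
}

/* Code points absent from the table are YES; others depend on position. */
static QuickCheck
quick_check(uint32_t cp)
{
    int idx = qc_find(cp);

    if (idx < 0)
        return QC_YES;
    return ((size_t) idx >= qc_no_start) ? QC_NO : QC_MAYBE;
}
```

With this layout, quick_check(0x0041) reports QC_YES because the code point is not in the table, while entries before and after qc_no_start report QC_MAYBE and QC_NO respectively. The seed and table-regeneration questions raised above concern the hash function itself, which this sketch deliberately leaves out.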

 > Also, if we go with v4, I noticed the following test is present twice:
 >
 > +SELECT "normalize"('abc', 'def');  -- run-time error

I think this is correct.  The other test is for "is_normalized".

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services