On Thu, Oct 15, 2020 at 01:59:38PM -0400, John Naylor wrote:
> I think I've seen a trie recommended somewhere, maybe the official website.
> That said, I was able to get the hash working for recomposition (split into
> a separate patch, and both of them now leave frontend alone), and I'm
> pleased to say it's 50-75x faster than linear search in simple tests. I'd
> be curious how it compares to ICU now. Perhaps Daniel Verite would be
> interested in testing again? (CC'd)
Yeah, that would be interesting to compare. That said, the gains from
this patch are already a good step forward, so I don't think the
comparison should block a solution we have at hand; the numbers speak
for themselves here. If something better gets proposed later, we can
always change the decomposition and recomposition logic as needed.
> select count(normalize(t, NFC)) from (
> select md5(i::text) as t from
> generate_series(1,100000) as i
> ) s;
>
> master    patch
> 18800ms   257ms
In my environment HEAD is a bit faster, at about 15s, while the patch
gets "only" down to 290~300ms (compiled with -O2, as I guess you did).
Nice.
+ # Then the second
+ return -1 if $a2 < $b2;
+ return 1 if $a2 > $b2;
Shouldn't this comment say "second code point"?
+ hashkey = pg_hton64(((uint64) start << 32) | (uint64) code);
+ h = recompinfo.hash(&hashkey);
This choice of hash key layout should be documented, and most likely
we should have comments on both the Perl and C sides to keep track of
the relationship between the two.
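For illustration, here is a minimal standalone sketch of the key layout
that the quoted lines build. The helper names make_recomp_key() and
key_to_network_bytes() are hypothetical, and the byte-serialization loop
is only a portable stand-in for pg_hton64(). I am assuming that the point
of the byte swap is for the C side to hash the same byte sequence,
whatever the host endianness, as the Perl generator did; this is the
kind of thing the comments should spell out.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack a pair of code points into one 64-bit key,
 * with the first code point in the high 32 bits and the second in the
 * low 32 bits, as in the quoted patch line.
 */
static uint64_t
make_recomp_key(uint32_t start, uint32_t code)
{
	return ((uint64_t) start << 32) | (uint64_t) code;
}

/*
 * Portable stand-in for pg_hton64(): write the key most significant
 * byte first, so the bytes fed to the hash function do not depend on
 * host endianness.
 */
static void
key_to_network_bytes(uint64_t key, unsigned char out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char) (key >> (56 - 8 * i));
}
```

With this layout, the pair (U+0041, U+0300) becomes the key
0x0000004100000300, serialized as the bytes 00 00 00 41 00 00 03 00.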
The binary sizes of libpgcommon_shlib.a and libpgcommon.a change
because Decomp_hash_func() gets included, impacting libpq.
Structurally, wouldn't it be better to move this part into its own,
backend-only, header? It would be possible to hide it from frontend
code with some ifdef FRONTEND of course, or to just keep things as
they are, since this could be useful for some out-of-core frontend
tool. But if we keep it in a separate header, then any C file can
decide whether to include it, so frontend tools could also make this
choice. Note that unicode_norm.c does not include
unicode_normprops_table.h for frontends, but it does include
unicode_norm_table.h.
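To make the ifdef FRONTEND option concrete, here is a rough sketch with
entirely hypothetical names (demo_entry, demo_table, demo_hash,
demo_lookup) and a toy four-entry table. The hash path is compiled only
when FRONTEND is not defined, and frontend builds fall back to
bsearch(). The toy demo_hash() is only a stand-in for a generated
perfect hash function, not the patch's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, tiny stand-in for a table sorted by code point. */
typedef struct
{
	uint32_t	codepoint;
	int			value;
} demo_entry;

static const demo_entry demo_table[] = {
	{0x00C0, 1}, {0x00C1, 2}, {0x00C2, 3}, {0x00C3, 4}
};

#define DEMO_TABLE_LEN (sizeof(demo_table) / sizeof(demo_table[0]))

#ifndef FRONTEND
/*
 * Backend-only part. In a real tree this could live in its own header,
 * so that frontend code never pulls it in. Returns a table index, or -1
 * if the code point is outside the toy table's range.
 */
static int
demo_hash(uint32_t cp)
{
	if (cp < 0x00C0 || cp > 0x00C3)
		return -1;
	return (int) (cp - 0x00C0);
}
#else
/* Frontend-only part: comparator for bsearch() over demo_table. */
static int
demo_cmp(const void *key, const void *elem)
{
	uint32_t	k = *(const uint32_t *) key;
	uint32_t	e = ((const demo_entry *) elem)->codepoint;

	return (k < e) ? -1 : (k > e);
}
#endif

/* Look up a code point; returns NULL if it is not in the table. */
const demo_entry *
demo_lookup(uint32_t cp)
{
#ifndef FRONTEND
	int			h = demo_hash(cp);

	/* Verify the hit, since a hash may map unknown keys to any slot. */
	if (h < 0 || (size_t) h >= DEMO_TABLE_LEN ||
		demo_table[h].codepoint != cp)
		return NULL;
	return &demo_table[h];
#else
	return bsearch(&cp, demo_table, DEMO_TABLE_LEN,
				   sizeof(demo_entry), demo_cmp);
#endif
}
```

Built with -DFRONTEND, the hash function is not compiled into the
object at all, which is the binary-size effect discussed above. Moving
the backend-only part to a separate header would give the same result
without the ifdef.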
--
Michael