Re: Unicode normalization SQL functions - Mailing list pgsql-hackers

From Andreas Karlsson
Subject Re: Unicode normalization SQL functions
Date
Msg-id 26150b35-240f-941c-e5a7-24f2d489b316@proxel.se
In response to Re: Unicode normalization SQL functions  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: Unicode normalization SQL functions  (Michael Paquier <michael@paquier.xyz>)
Re: Unicode normalization SQL functions  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On 1/28/20 9:21 PM, Peter Eisentraut wrote:
> You're right, this didn't make any sense.  Here is a new patch set with 
> that fixed.

Thanks for this patch. This is a feature which has been on my personal 
todo list for a while and something which I have wished to have a couple 
of times.

I took a quick look at the patch and here is some feedback:

A possible concern is the increased binary size from the new quick check 
tables, but personally I think they are worth it.

A potential optimization would be to merge utf8_to_unicode() and 
pg_utf_mblen() into one function in unicode_normalize_func(), since 
utf8_to_unicode() already knows the length of the character. Probably 
not worth it though.
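
To make the idea concrete, here is a rough sketch of a combined decoder. 
The name utf8_decode_with_len() is hypothetical and not from the patch, 
and the sketch assumes the input is already valid UTF-8, so it does not 
verify continuation bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): decode one UTF-8 sequence
 * and report its byte length in the same pass, instead of calling one
 * function for the code point and another for the length.
 *
 * Assumes valid UTF-8 input; continuation bytes are not validated.
 */
static uint32_t
utf8_decode_with_len(const unsigned char *s, int *len)
{
    assert(s != NULL && len != NULL);

    if ((s[0] & 0x80) == 0)
    {
        /* 0xxxxxxx: one byte (ASCII) */
        *len = 1;
        return s[0];
    }
    else if ((s[0] & 0xe0) == 0xc0)
    {
        /* 110xxxxx 10xxxxxx */
        *len = 2;
        return ((uint32_t) (s[0] & 0x1f) << 6) |
            (uint32_t) (s[1] & 0x3f);
    }
    else if ((s[0] & 0xf0) == 0xe0)
    {
        /* 1110xxxx 10xxxxxx 10xxxxxx */
        *len = 3;
        return ((uint32_t) (s[0] & 0x0f) << 12) |
            ((uint32_t) (s[1] & 0x3f) << 6) |
            (uint32_t) (s[2] & 0x3f);
    }
    else if ((s[0] & 0xf8) == 0xf0)
    {
        /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        *len = 4;
        return ((uint32_t) (s[0] & 0x07) << 18) |
            ((uint32_t) (s[1] & 0x3f) << 12) |
            ((uint32_t) (s[2] & 0x3f) << 6) |
            (uint32_t) (s[3] & 0x3f);
    }

    /* Invalid lead byte: consume one byte and return a sentinel. */
    *len = 1;
    return 0xffffffff;
}
```

A caller would then advance its input pointer by the returned length 
rather than computing it a second time.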

It feels a bit wasteful to measure output_size in 
unicode_is_normalized(), since unicode_normalize() already knows the 
length of the buffer; it just does not return it.

A potential optimization for the normalized case would be to abort the 
quick check on the first "maybe" result and only normalize from that 
point on. If I can find the time I might try this out and benchmark it.

Nitpick: "split/\s*;\s*/, $line" in generate-unicode_normprops_table.pl 
should be "split /\s*;\s*/, $line".

What about using else if in the code below for clarity?

+        if (check == UNICODE_NORM_QC_NO)
+            return UNICODE_NORM_QC_NO;
+        if (check == UNICODE_NORM_QC_MAYBE)
+            result = UNICODE_NORM_QC_MAYBE;
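
For illustration, the else if form could look something like the sketch 
below. The enum is a stand-in with assumed values (UNICODE_NORM_QC_YES 
does not appear in the quoted hunk), combine_quickcheck() is a 
hypothetical wrapper that is not in the patch, and other steps of the 
real quick check loop are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the patch's quick check result type; values are assumed. */
typedef enum
{
    UNICODE_NORM_QC_NO = 0,
    UNICODE_NORM_QC_YES = 1,
    UNICODE_NORM_QC_MAYBE = -1
} UnicodeNormalizationQC;

/*
 * Hypothetical wrapper: fold per-character quick check results into one
 * answer. A single NO decides the result immediately; a MAYBE is
 * remembered but can still be overridden by a later NO.
 */
static UnicodeNormalizationQC
combine_quickcheck(const UnicodeNormalizationQC *checks, size_t n)
{
    UnicodeNormalizationQC result = UNICODE_NORM_QC_YES;

    assert(checks != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        UnicodeNormalizationQC check = checks[i];

        /* The two cases are mutually exclusive, hence else if. */
        if (check == UNICODE_NORM_QC_NO)
            return UNICODE_NORM_QC_NO;
        else if (check == UNICODE_NORM_QC_MAYBE)
            result = UNICODE_NORM_QC_MAYBE;
    }

    return result;
}
```

The behavior is the same as with two separate if statements; the else 
only makes it obvious that the branches cannot both be taken.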

Remove extra space in the line below.

+    else if (quickcheck == UNICODE_NORM_QC_NO )

Andreas


