Re: speed up unicode normalization quick check - Mailing list pgsql-hackers

From John Naylor
Subject Re: speed up unicode normalization quick check
Date
Msg-id CACPNZCvUBmKSivCGAjh-sERQ9bAigB14cPZr5Zc4Do_ryd5Ezg@mail.gmail.com
In response to Re: speed up unicode normalization quick check  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: speed up unicode normalization quick check  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Fri, May 29, 2020 at 5:59 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
>
> > On May 21, 2020, at 12:12 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:

> > very picky in general. As a test, it also successfully finds a
> > function for the OS "words" file, the "D" sets of codepoints, and for
> > sets of the first n built-in OIDs, where n > 5.
>
> Prior to this patch, src/tools/gen_keywordlist.pl is the only script that
> uses PerfectHash.  Your patch adds a second.  I'm not convinced that
> modifying the PerfectHash code directly each time a new caller needs
> different multipliers is the right way to go.

Calling it "each time" with a sample size of two is a bit of a
stretch. The first implementation made a reasonable attempt to suit
future uses and I simply made it a bit more robust. In the text quoted
above you can see I tested some scenarios beyond the current use
cases, with key set sizes as low as 6 and as high as 250k.

> Could you instead make them arguments such that gen_keywordlist.pl,
> generate-unicode_combining_table.pl, and future callers can pass in the
> numbers they want?  Or is there some advantage to having it this way?

That is an implementation detail that callers have no business knowing about.
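
To illustrate the encapsulation point, here is a minimal sketch in Python. It is not the algorithm in PerfectHash.pm: it is a brute-force search for a collision-free (non-minimal) hash, and the candidate multipliers, seeds, and function names are all hypothetical, chosen only for illustration. What it shows is the interface shape: the caller supplies the key set, and the search over parameters stays inside the generator.

```python
from itertools import product

# Hypothetical candidates, chosen for illustration only; these are not
# the values used by PostgreSQL's PerfectHash.pm.
_MULTIPLIERS = (17, 31, 127, 257, 8191)
_SEEDS = range(64)


def _mult_hash(key: bytes, mult: int, seed: int) -> int:
    """Simple multiplicative string hash, truncated to 32 bits."""
    h = seed
    for byte in key:
        h = (h * mult + byte) & 0xFFFFFFFF
    return h


def build_perfect_hash(keys):
    """Return (hash_fn, table_size) with no collisions among `keys`.

    The caller passes only the key set; the parameter search stays
    private to this module.
    """
    keys = list(keys)
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be distinct")
    table_size = max(1, 2 * len(keys))  # non-minimal: leaves spare slots
    for mult, seed in product(_MULTIPLIERS, _SEEDS):
        slots = {_mult_hash(k, mult, seed) % table_size for k in keys}
        if len(slots) == len(keys):
            # Bind the winning parameters so the caller never sees them.
            def hash_fn(key, _m=mult, _s=seed, _n=table_size):
                return _mult_hash(key, _m, _s) % _n
            return hash_fn, table_size
    raise RuntimeError("no collision-free parameters among the candidates")
```

With this shape, making the search more robust for a new key set means extending the private candidate lists in one place, and existing callers are unaffected.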

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


