Re: Improve the performance of Unicode Normalization Forms. - Mailing list pgsql-hackers
From | Alexander Borisov |
---|---|
Subject | Re: Improve the performance of Unicode Normalization Forms. |
Date | |
Msg-id | cfd504f7-1fc1-43df-9356-f68818f30921@gmail.com |
In response to | Re: Improve the performance of Unicode Normalization Forms. (John Naylor <johncnaylorls@gmail.com>) |
Responses | Re: Improve the performance of Unicode Normalization Forms. |
List | pgsql-hackers |
11.06.2025 10:13, John Naylor wrote:
> On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov <lex.borisov@gmail.com> wrote:
>> 5. The server part "lost weight" in the binary, but the frontend
>> "gained weight" a little.
>>
>> I read the old commits, which say that the size of the frontend is very
>> important and that speed is not important
>> (speed is important on the server).
>> I'm not quite sure what to do if this is really the case. Perhaps
>> we should leave the slow version for the frontend.
>
> In the "small" patch, the frontend files got a few kB bigger, but the
> backend got quite a bit smaller. If we decided to go with this patch,
> I'd say it's preferable to do it in a way that keeps both paths the
> same.

Okay, then I'll leave the frontend unchanged so that the size remains
the same. The changes will only affect the backend.

>> How was it tested?
>> Four files were created for each normalization form: NFC, NFD, NFKC,
>> and NFKD.
>> The files were sent via pgbench. The files contain all code points that
>> need to be normalized.
>> Unfortunately, the patches are already quite large, but if necessary,
>> I can send these files in a separate email or upload them somewhere.
>
> What kind of workload do they present?
> Did you consider running the same tests from the thread that lead to
> the current implementation?

I found performance tests in this discussion:
https://www.postgresql.org/message-id/CAFBsxsHUuMFCt6-pU+oG-F1==CmEp8wR+O+bRouXWu6i8kXuqA@mail.gmail.com

Below are the performance test results.

* Ubuntu 24.04.1 (Intel(R) Xeon(R) Gold 6140) (gcc version 13.3.0)

1. Normalize, decomp only

select count(normalize(t, NFD)) from (
  select md5(i::text) as t from
  generate_series(1,100000) as i
) s;

Patch (big table): 279,858 ms
Patch (small table): 282,925 ms
Without: 444,118 ms

2.

select count(normalize(t, NFD)) from (
  select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D',
                i % 3 + 1) as t from
  generate_series(1,100000) as i
) s;

Patch (big table): 219,858 ms
Patch (small table): 247,893 ms
Without: 376,906 ms

3. Normalize, decomp+recomp

select count(normalize(t, NFC)) from (
  select md5(i::text) as t from
  generate_series(1,1000) as i
) s;

Patch (big table): 7,553 ms
Patch (small table): 7,876 ms
Without: 13,177 ms

4.

select count(normalize(t, NFC)) from (
  select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D',
                i % 3 + 1) as t from
  generate_series(1,1000) as i
) s;

Patch (big table): 5,765 ms
Patch (small table): 6,782 ms
Without: 10,800 ms

5. Quick check has not changed because these patches do not affect it:

-- all chars are quickcheck YES
select count(*) from (
  select md5(i::text) as t from
  generate_series(1,100000) as i
) s;

Patch (big table): 29,477 ms
Patch (small table): 29,436 ms
Without: 29,378 ms

From these tests, we see a speedup approaching 2x in some tests
(about 1.5x to 1.9x across tests 1-4).

--
Best regards,
Alexander Borisov
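The speedups implied by the timings above can be recomputed with a short script. This is only a sketch: it assumes the comma in the reported values is a decimal separator (so "279,858 ms" is read as 279.858 ms), and the dictionary keys and the `speedups` helper are illustrative names, not part of the patch or of the original message.

```python
# Timings in milliseconds, copied from the message.
# Assumption: the comma in the original is a decimal separator.
# The test labels are illustrative; tests 2 and 4 are untitled in the message.
timings = {
    "1 NFD md5":       {"big": 279.858, "small": 282.925, "without": 444.118},
    "2 NFD non-ASCII": {"big": 219.858, "small": 247.893, "without": 376.906},
    "3 NFC md5":       {"big": 7.553,   "small": 7.876,   "without": 13.177},
    "4 NFC non-ASCII": {"big": 5.765,   "small": 6.782,   "without": 10.800},
    "5 quick check":   {"big": 29.477,  "small": 29.436,  "without": 29.378},
}


def speedups(row):
    """Return unpatched time divided by patched time for each patch variant."""
    return {variant: row["without"] / row[variant] for variant in ("big", "small")}


results = {test: speedups(row) for test, row in timings.items()}

for test, ratio in results.items():
    print(f"{test}: big table {ratio['big']:.2f}x, small table {ratio['small']:.2f}x")
```

Under that reading, a ratio above 1 means the patched build is faster, and a ratio near 1 (as expected for the quick-check test) means no measurable change.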