Thread: Speed up collation cache
The blog post here (thank you depesz!):

https://www.depesz.com/2024/06/11/how-much-speed-youre-leaving-at-the-table-if-you-use-default-locale/

showed an interesting result where the builtin provider is not quite as
fast as "C" for queries like:

  SELECT * FROM a WHERE t = '...';

The reason is that it's calling varstr_cmp() many times, which does a
lookup in the collation cache for each call. For sorts, it only does a
lookup in the collation cache once, so the effect is not significant.

The reason looking up "C" is faster is that there's a special check for
C_COLLATION_OID, so it doesn't even need to do the hash lookup. If you
create an equivalent collation like:

  CREATE COLLATION libc_c(PROVIDER = libc, LOCALE = 'C');

it will perform the same as a collation with the builtin provider.

Attached is a patch to use simplehash.h instead, which speeds things up
enough to make them fairly close (from around 15% slower to around 8%).
The patch is based on the series here:

https://postgr.es/m/f1935bc481438c9d86c2e0ac537b1c110d41a00a.camel@j-davis.com

which does some refactoring in a related area, but I can make them
independent.

We can also consider what to do about those special cases:

 * add a special case for PG_C_UTF8?
 * instead of a hardwired set of special collation IDs, have a
   single-element "last collation ID" to check before doing the hash
   lookup?
 * remove the special cases entirely if we can close the performance
   gap enough that it's not important?

(Note: the special case in lc_ctype_is_c() is currently required for
correctness because hba.c uses C_COLLATION_OID for regexes before the
syscache is initialized. That can be fixed pretty easily a couple
different ways, though.)

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS
Attachment
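[For readers who haven't used simplehash.h: it generates a specialized,
inlineable hash table from a set of macros at include time, instead of
going through the generic hash_search() machinery. A rough sketch of a
simplehash-based collation cache follows; the entry layout and names
here are illustrative assumptions, not taken from the attached patch.]

  #include "postgres.h"
  #include "common/hashfn.h"        /* hash_uint32() */
  #include "utils/pg_locale.h"      /* pg_locale_t */

  /* Sketch only: entry layout and names are illustrative */
  typedef struct collation_cache_entry
  {
      Oid         collid;           /* hash key: pg_collation OID */
      char        status;           /* entry status, required by simplehash.h */
      pg_locale_t locale;           /* cached locale information */
  } collation_cache_entry;

  #define SH_PREFIX        collation_cache
  #define SH_ELEMENT_TYPE  collation_cache_entry
  #define SH_KEY_TYPE      Oid
  #define SH_KEY           collid
  #define SH_HASH_KEY(tb, key)    hash_uint32((uint32) key)
  #define SH_EQUAL(tb, a, b)      ((a) == (b))
  #define SH_SCOPE         static inline
  #define SH_DECLARE
  #define SH_DEFINE
  #include "lib/simplehash.h"

  static collation_cache_hash *CollationCache = NULL;

[Lookups then go through the generated collation_cache_create(),
collation_cache_lookup() and collation_cache_insert() functions, which
simplehash emits as static inline code in the including file.]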
On 15.06.24 01:46, Jeff Davis wrote:
> * instead of a hardwired set of special collation IDs, have a single-
> element "last collation ID" to check before doing the hash lookup?

I'd imagine that method could be very effective.
On Sat, Jun 15, 2024 at 6:46 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Attached is a patch to use simplehash.h instead, which speeds things up
> enough to make them fairly close (from around 15% slower to around 8%).

+#define SH_HASH_KEY(tb, key) hash_uint32((uint32) key)

For a static inline hash for speed reasons, we can use murmurhash32
here, which is also inline.
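[That is, something like the following; murmurhash32() is a static
inline function in common/hashfn.h, so the key hash can be fully
inlined into the generated simplehash lookup code:]

  #include "common/hashfn.h"        /* murmurhash32() */

  #define SH_HASH_KEY(tb, key)    murmurhash32((uint32) key)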
On Thu, 2024-06-20 at 17:07 +0700, John Naylor wrote:
> On Sat, Jun 15, 2024 at 6:46 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > Attached is a patch to use simplehash.h instead, which speeds
> > things up
> > enough to make them fairly close (from around 15% slower to around
> > 8%).
>
> +#define SH_HASH_KEY(tb, key) hash_uint32((uint32) key)
>
> For a static inline hash for speed reasons, we can use murmurhash32
> here, which is also inline.

Thank you, that brings it down a few more percentage points. New
patches attached, still based on the setlocale-removal patch series.

Setup:

  create collation libc_c (provider=libc, locale='C');
  create table collation_cache_test(t text);
  insert into collation_cache_test
    select g::text||' '||g::text from generate_series(1,200000000) g;

Queries:

  select * from collation_cache_test where t < '0' collate "C";
  select * from collation_cache_test where t < '0' collate libc_c;

The two collations are identical except that the former benefits from
the optimization for C_COLLATION_OID, and the latter does not, so these
queries measure the overhead of the collation cache lookup.

Results (in ms):

             "C"    "libc_c"   overhead
  master:    6350     7855        24%
  v4-0001:   6091     6324         4%

(Note: I don't have an explanation for the difference in performance of
the "C" locale -- probably just some noise in the test.)

Considering that simplehash brings the worst case overhead under 5%, I
don't see a big reason to use the single-element cache also.

Regards,
	Jeff Davis
Attachment
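[For context on what the "overhead" column measures: the plain "C"
query avoids the collation cache entirely thanks to the hardwired check
mentioned upthread, roughly along these lines (a simplified
illustration, not the actual pg_locale.c code):]

  #include "postgres.h"
  #include "catalog/pg_collation.h"   /* C_COLLATION_OID, POSIX_COLLATION_OID */

  /* Simplified illustration: why COLLATE "C" skips the per-row cache lookup */
  static bool
  collation_is_hardwired_c(Oid collation)
  {
      /* "C" and "POSIX" are recognized without any cache or catalog access */
      if (collation == C_COLLATION_OID ||
          collation == POSIX_COLLATION_OID)
          return true;

      /*
       * Anything else -- including libc_c above, even though it behaves
       * identically -- falls through to the collation cache lookup that
       * these queries exercise once per row.
       */
      return false;
  }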
On 7/26/24 11:00 PM, Jeff Davis wrote:
> Results (in ms):
>
>            "C"    "libc_c"   overhead
> master:    6350     7855        24%
> v4-0001:   6091     6324         4%

I got more overhead in my quick benchmarking when I ran the same
benchmark. I also tried your idea of caching the last lookup (PoC patch
attached), and it basically removed all overhead, but I guess it will
not help if you have two different non-default locales in the same
query.

               "C"    "libc_c"   overhead
  before:      6695     8376        25%
  after:       6605     7340        11%
  cache last:  6618     6677         1%

But even without that extra optimization I think this patch is worth
merging: it is small, simple, clean, easy to understand, and a clear
speedup. Feels like a no-brainer. I think it is ready for committer.

And then we can discuss after committing if an additional cache of the
last locale is worth it or not.

Andreas
Attachment
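[The "cache last" PoC boils down to a one-element cache checked before
the simplehash lookup. A minimal sketch of the idea, reusing the
illustrative names (CollationCache, collation_cache_entry) from the
sketch upthread rather than the actual PoC patch:]

  #include "utils/memutils.h"        /* TopMemoryContext */

  /* Sketch only: a one-element cache in front of the hash lookup */
  static Oid          last_collation_oid = InvalidOid;
  static pg_locale_t  last_collation_locale = NULL;

  static pg_locale_t
  get_cached_locale(Oid collation)
  {
      collation_cache_entry *entry;
      bool        found;

      /* fast path: same collation as the previous call, which is the
       * common case when varstr_cmp() runs once per row with a single
       * collation */
      if (collation == last_collation_oid && last_collation_locale != NULL)
          return last_collation_locale;

      if (CollationCache == NULL)
          CollationCache = collation_cache_create(TopMemoryContext, 16, NULL);

      entry = collation_cache_insert(CollationCache, collation, &found);
      if (!found)
      {
          /* placeholder: the real code would build the pg_locale_t here */
          entry->locale = NULL;
      }

      last_collation_oid = collation;
      last_collation_locale = entry->locale;
      return entry->locale;
  }

[Caching the pg_locale_t value rather than a pointer to the hash entry
sidesteps the fact that simplehash entries can move when the table
grows.]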
On Sun, 2024-07-28 at 00:14 +0200, Andreas Karlsson wrote:
> But even without that extra optimization I think this patch is worth
> merging: it is small, simple, clean, easy to understand, and a clear
> speedup. Feels like a no-brainer. I think it is ready for committer.

Committed, thank you.

> And then we can discuss after committing if an additional cache of
> the last locale is worth it or not.

Yeah, I'm holding off on that until refactoring in the area settles,
and we'll see if it's still worth it.

Regards,
	Jeff Davis