introduce optimized linear search functions that return index of matching element - Mailing list pgsql-hackers
From | Nathan Bossart |
---|---|
Subject | introduce optimized linear search functions that return index of matching element |
Date | |
Msg-id | 20220917052903.GA3172400@nathanxps13 Whole thread Raw |
Responses |
Re: introduce optimized linear search functions that return index of matching element
|
List | pgsql-hackers |
On Fri, Sep 16, 2022 at 02:54:14PM +0700, John Naylor wrote: > v6 demonstrates why this should have been put off towards the end. (more below) Since the SIMD code is fresh in my mind, I wanted to offer my review for 0001 in the "Improve dead tuple storage for lazy vacuum" thread [0]. However, I agree with John that the SIMD part of that work should be left for the end, and I didn't want to distract from the radix tree part too much. So, here is a new thread for just the SIMD part. >> I've updated the radix tree patch. It's now separated into two patches. >> >> 0001 patch introduces pg_lsearch8() and pg_lsearch8_ge() (we may find >> better names) that are similar to the pg_lfind8() family but they >> return the index of the key in the vector instead of true/false. The >> patch includes regression tests. I don't think it's clear that the "lfind" functions return whether there is a match while the "lsearch" functions return the index of the first match. It might be better to call these something like "pg_lfind8_idx" and "pg_lfind8_ge_idx" instead. > +/* > + * Return the index of the first element in the vector that is greater than > + * or eual to the given scalar. Return sizeof(Vector8) if there is no such > + * element. > > That's a bizarre API to indicate non-existence. +1. It should probably just return -1 in that case. > + * > + * Note that this function assumes the elements in the vector are sorted. > + */ > > That is *completely* unacceptable for a general-purpose function. +1 > +#else /* USE_NO_SIMD */ > + Vector8 r = 0; > + uint8 *rp = (uint8 *) &r; > + > + for (Size i = 0; i < sizeof(Vector8); i++) > + rp[i] = (((const uint8 *) &v1)[i] == ((const uint8 *) &v2)[i]) ? 0xFF : 0; > > I don't think we should try to force the non-simd case to adopt the > special semantics of vector comparisons. It's much easier to just use > the same logic as the assert builds. +1 > +#ifdef USE_SSE2 > + return (uint32) _mm_movemask_epi8(v); > +#elif defined(USE_NEON) > + static const uint8 mask[16] = { > + 1 << 0, 1 << 1, 1 << 2, 1 << 3, > + 1 << 4, 1 << 5, 1 << 6, 1 << 7, > + 1 << 0, 1 << 1, 1 << 2, 1 << 3, > + 1 << 4, 1 << 5, 1 << 6, 1 << 7, > + }; > + > + uint8x16_t masked = vandq_u8(vld1q_u8(mask), (uint8x16_t) > vshrq_n_s8(v, 7)); > + uint8x16_t maskedhi = vextq_u8(masked, masked, 8); > + > + return (uint32) vaddvq_u16((uint16x8_t) vzip1q_u8(masked, maskedhi)); > > For Arm, we need to be careful here. This article goes into a lot of > detail for this situation: > > https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon The technique demonstrated in this article seems to work nicely. For these kinds of patches, I find the best way to review them is to try out my proposed changes as I'm reading through the patch. I hope you don't mind that I've done so here and attached a new version of the patch. In addition to addressing the aforementioned feedback, I made the following changes: * I renamed the vector8_search_* functions to vector8_find() and vector8_find_ge(). IMO this is more in the spirit of existing function names like vector8_has(). * I simplified vector8_find_ge() by essentially making it do the opposite of what vector8_has_le() does (i.e., using saturating subtraction to find matching bytes). This removes the need for vector8_min(), and since vector8_find_ge() can just call vector8_search() to find any 0 bytes, vector8_highbit_mask() can be removed as well. * I simplified the test for pg_lfind8_ge_idx() by making it look a little more like the test for pg_lfind32(). I wasn't sure about the use of rand() and qsort(), and overall it just felt a little too complicated to me. I've tested all three code paths (i.e., SSE2, Neon, and USE_NO_SIMD), but I haven't done any performance analysis yet. [0] https://postgr.es/m/CAD21AoD3w76wERs_Lq7_uA6%2BgTaoOERPji%2BYz8Ac6aui4JwvTg%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: