On 21/04/2023 23:33, Nathan Bossart wrote:
> On Fri, Apr 21, 2023 at 01:50:34PM +0700, John Naylor wrote:
>> On Wed, Mar 8, 2023 at 7:25 AM Nathan Bossart <nathandbossart@gmail.com>
>> wrote:
>>> was mostly a fun weekend project, and I don't presently have any concrete
>>> examples of workloads where this might help.
>>
>> It seems like that should be demonstrated before seriously considering
>> this, like a profile where the relevant list functions show up.
>
> Agreed.
Grepping for "tlist_member" and "list_delete_ptr", I don't see any
callers in hot codepaths where this could make a noticeable difference.
So I've marked this as Returned with Feedback in the commitfest.
> I noticed that several of the List functions do simple linear searches that
> can be optimized with SIMD intrinsics (as was done for XidInMVCCSnapshot in
> 37a6e5d). The following table shows the time spent iterating over a list
> of n elements (via list_member_int) one billion times on my x86 laptop.
>
>
>     n | head (ms) | patched (ms)
> ------+-----------+--------------
>     2 |      3884 |         3001
>     4 |      5506 |         4092
>     8 |      6209 |         3026
>    16 |      8797 |         4458
>    32 |     25051 |         7032
>    64 |     37611 |        12763
>   128 |     61886 |        22770
>   256 |    111170 |        59885
>   512 |    209612 |       103378
>  1024 |    407462 |       189484
I'm surprised to see an improvement with n=2 and n=4. AFAICS, the
vectorization only kicks in when n >= 8, so those sizes should still be
going through the plain scalar loop.
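
For context, here is a minimal standalone sketch of the shape such a
search usually takes. This assumes SSE2 and a hypothetical
lfind32_sketch() helper; it is not the actual patch, and the real helper
may well process larger blocks per iteration, which would explain the
n >= 8 threshold. Whole vectors are compared at once, and anything
shorter than one block falls through to a plain scalar loop.

/*
 * Sketch only: SIMD linear search over 32-bit integers using SSE2.
 * Four elements are compared per vector iteration; leftovers and
 * arrays shorter than one vector go through the scalar loop.
 */
#include <stdbool.h>
#include <stdint.h>
#include <emmintrin.h>

static bool
lfind32_sketch(uint32_t key, const uint32_t *base, uint32_t nelem)
{
	uint32_t	i = 0;
	const __m128i keys = _mm_set1_epi32((int) key);	/* key in all 4 lanes */

	/* vector loop: 4 elements per iteration */
	for (; i + 4 <= nelem; i += 4)
	{
		__m128i		chunk = _mm_loadu_si128((const __m128i *) &base[i]);
		__m128i		cmp = _mm_cmpeq_epi32(chunk, keys);

		if (_mm_movemask_epi8(cmp) != 0)
			return true;
	}

	/* scalar tail: remaining 0..3 elements and short inputs */
	for (; i < nelem; i++)
	{
		if (base[i] == key)
			return true;
	}

	return false;
}

The scalar tail is there for correctness on inputs that don't fill a
whole vector, so with this structure the small-n cases never touch the
SIMD path at all.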
--
Heikki Linnakangas
Neon (https://neon.tech)