Re: optimize lookups in snapshot [sub]xip arrays - Mailing list pgsql-hackers

From Andres Freund
Subject Re: optimize lookups in snapshot [sub]xip arrays
Date
Msg-id 20220726181906.7l4ulzjdblnanclm@alap3.anarazel.de
Whole thread Raw
In response to Re: optimize lookups in snapshot [sub]xip arrays  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: optimize lookups in snapshot [sub]xip arrays
List pgsql-hackers
On 2022-07-25 12:04:19 -0700, Nathan Bossart wrote:
> On Sat, Jul 16, 2022 at 08:59:57PM -0700, Nathan Bossart wrote:
> > On Fri, Jul 15, 2022 at 01:08:57PM -0700, Andres Freund wrote:
> >> I wonder if we additionally / alternatively could use a faster method of
> >> searching the array linearly, e.g. using SIMD.
> > 
> > I looked into using SIMD.  The patch is attached, but it is only intended
> > for benchmarking purposes and isn't anywhere close to being worth serious
> > review.  There may be a simpler/better way to implement the linear search,
> > but this seemed to work well.  Overall, SIMD provided a decent improvement.
> > I had to increase the number of writers quite a bit in order to demonstrate
> > where the hash tables began winning.  Here are the numbers:
> > 
> >     writers head simd hash
> >     256     663  632  694
> >     512     530  618  637
> >     768     489  544  573
> >     1024    364  508  562
> >     2048    185  306  485
> >     4096    146  197  441
> > 
> > While it is unsurprising that the hash tables perform the best, there are a
> > couple of advantages to SIMD that might make that approach worth
> > considering.  For one, there's really no overhead (i.e., you don't need to
> > sort the array or build a hash table), so we can avoid picking an arbitrary
> > threshold and just have one code path.  Also, a SIMD implementation for a
> > linear search through an array of integers could likely be easily reused
> > elsewhere.
> 
> From the discussion thus far, it seems there is interest in optimizing
> [sub]xip lookups, so I'd like to spend some time moving it forward.  I
> think the biggest open question is which approach to take.  Both the SIMD
> and hash table approaches seem viable, but I think I prefer the SIMD
> approach at the moment (see the last paragraph of quoted text for the
> reasons).  What do folks think?

Agreed on all points.



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Unstable tests for recovery conflict handling
Next
From: Jacob Champion
Date:
Subject: Re: Transparent column encryption