On Mon, Aug 07, 2023 at 12:51:24PM +0200, Tomas Vondra wrote:
> The bad news is this seems to have negative impact on cases with few
> partitions, that'd fit into 16 slots. Which is not surprising, as the
> code has to walk longer arrays, it probably affects caching etc. So this
> would hurt the systems that don't use that many relations - not much,
> but still.
> 
> The regression appears to be consistently ~3%, and v2 aimed to improve
> that - at least for the case with just 100 rows. It even gains ~5% in a
> couple cases. It's however a bit strange v2 doesn't really help the two
> larger cases.
> 
> Overall, I think this seems interesting - it's hard to not like doubling
> the throughput in some cases. Yes, it's 100 rows only, and the real
> improvements are bound to be smaller, it would help short OLTP queries
> that only process a couple rows.
Indeed.  I wonder whether we could mitigate the regressions by using SIMD
intrinsics in the loops.  Or auto-vectorization, if that is possible.
-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com