Hi,
On 2018-02-03 01:13:21 -0800, Andres Freund wrote:
> On 2018-02-02 18:21:12 -0800, Jeff Davis wrote:
> > I think I saw about a 2% gain here over master, but when I applied it
> > on top of the fast scans it did not seem to add anything on top of
> > fast scans. Seems reproducible, but I don't have an explanation.
>
> Yea, that makes sense. The primary reason the patch is beneficial is
> that it centralizes the place where the HeapTupleHeader is accessed to a
> single piece of code (slot_deform_tuple()). In a lot of cases that first
> access will result in a cache miss in all layers, requiring a memory
> access. In slot_getsomeattrs() there's very little that can be done in
> an out-of-order manner, whereas slot_deform_tuple() can continue
> execution a bit further. Also, the latter will then go and sequentially
> access the rest (or a significant part of) the tuple, so a centralized
> access is more prefetchable.

Oops, I missed part of the argument here: the reason this isn't as large
an effect anymore with the scan order patch applied is that the accesses
are, due to the better scan order, more likely to be cacheable and
prefetchable. So in that case the few additional instructions and
branches in slot_getsomeattrs()/slot_getattr() don't hurt as much
anymore. IIRC I could still show a gain, but it's a much smaller win.
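
To make the cache argument a bit more concrete, here is a standalone C
sketch. It is not PostgreSQL code: DemoTuple, fill_sequential(),
fill_shuffled() and visit_tuples() are made-up names for illustration
only. It visits the same array of tuple-like structs either in
sequential or in shuffled order. Both orders do identical work and
return the same sum; only the memory access pattern differs, which is
roughly the kind of difference a better scan order makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NATTRS 7

/* Hypothetical stand-in for a tuple: a header word plus a few attributes. */
typedef struct DemoTuple
{
    uint32_t header;             /* touched first, like a tuple header */
    uint32_t attrs[DEMO_NATTRS]; /* read sequentially after the header */
} DemoTuple;

/* Visit order 0, 1, 2, ...: neighbouring tuples share cache lines. */
static void
fill_sequential(size_t *order, size_t n)
{
    for (size_t i = 0; i < n; i++)
        order[i] = i;
}

/* Fisher-Yates shuffle driven by a small LCG: a cache-unfriendly order. */
static void
fill_shuffled(size_t *order, size_t n, uint32_t seed)
{
    fill_sequential(order, n);
    for (size_t i = n; i > 1; i--)
    {
        seed = seed * 1664525u + 1013904223u;
        size_t j = (size_t) (seed >> 8) % i;
        size_t tmp = order[i - 1];

        order[i - 1] = order[j];
        order[j] = tmp;
    }
}

/* Read the header, then all attributes, of each tuple in the given order. */
static uint64_t
visit_tuples(const DemoTuple *tuples, const size_t *order, size_t n)
{
    uint64_t sum = 0;

    assert(tuples != NULL && order != NULL);
    for (size_t i = 0; i < n; i++)
    {
        const DemoTuple *tup = &tuples[order[i]];

        sum += tup->header;      /* first touch: the likely cache miss */
        for (size_t a = 0; a < DEMO_NATTRS; a++)
            sum += tup->attrs[a];
    }
    return sum;
}
```

To compare the two, time visit_tuples() over an array much larger than
the last-level cache, once with each order. I'd expect the shuffled
order to be noticeably slower, but that is an assumption to be measured,
not a result.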

Greetings,
Andres Freund