On Wed, Nov 2, 2022 at 12:09 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> In theory, why does the prefetch make things better? I am asking this
> because I think we need to read the data from buffer to cache line once
> in either case (I'm obviously wrong in the face of the test result.)
CPUs have several different kinds of 'hardware prefetchers' (worth
reading about) that look out for sequential and striding patterns and
try to get the cache line ready before you access it. Using the
prefetch instructions explicitly is called 'software prefetching'
(special instructions inserted by programmers or compilers). You're
right that the line is loaded once in either case; what changes is
when the load starts, so the memory latency can overlap with other
work instead of stalling the access. The theory here would have to be
that the hardware prefetchers couldn't pick up the pattern, but we
know which addresses we'll touch next. The exact details of
the hardware prefetchers vary between chips, and there are even some
parameters you can adjust in BIOS settings. One idea is that the
hardware prefetchers are generally biased towards increasing
addresses, but our tuples tend to go backwards on the page[1]: tuple
data is allocated from the end of the page towards the start, so
visiting line pointers in order walks down through memory. It's
possible that some other CPUs can detect backwards strides better, but
since real world tuples aren't of equal size anyway, there isn't
really a fixed stride at all, so software prefetching seems quite
promising for this...
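
To make that concrete, here is a rough standalone sketch of the idea,
not the actual patch. Everything in it is made up for illustration:
ItemSlot is a hypothetical stand-in for line pointers (kept in a
separate array rather than in the page header), SKETCH_PAGE_SIZE and
PREFETCH_DISTANCE are assumed values, and __builtin_prefetch is the
GCC/Clang builtin, so other compilers would need something else. Items
are packed from the end of the buffer backwards with varying lengths,
and the scan issues a prefetch a few items ahead of the one it reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  8192  /* assumed page size for this sketch */
#define PREFETCH_DISTANCE 4     /* hypothetical knob: items to look ahead */

/* Hypothetical, simplified line pointer: where an item lives in the page. */
typedef struct
{
    uint16_t off;   /* byte offset of the item within the page */
    uint16_t len;   /* item length in bytes */
} ItemSlot;

/*
 * Pack up to max_items variable-length items from the end of the page
 * towards the start, so later items sit at lower addresses.
 * Returns the number of items stored.
 */
static size_t
build_page(unsigned char *page, ItemSlot *slots, size_t max_items)
{
    size_t upper = SKETCH_PAGE_SIZE;
    size_t n = 0;

    while (n < max_items)
    {
        uint16_t len = (uint16_t) (24 + (n * 37) % 200);

        if (upper < len)
            break;              /* no room left */
        upper -= len;
        assert(upper < SKETCH_PAGE_SIZE);
        memset(page + upper, (int) (n & 0xff), len);
        slots[n].off = (uint16_t) upper;
        slots[n].len = len;
        n++;
    }
    return n;
}

/* Sum all item bytes, with no explicit prefetching. */
static uint64_t
sum_items_plain(const unsigned char *page, const ItemSlot *slots, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        for (uint16_t j = 0; j < slots[i].len; j++)
            sum += page[slots[i].off + j];
    return sum;
}

/*
 * Same scan, but hint the CPU about an item PREFETCH_DISTANCE ahead.
 * The hint only affects timing; the result is identical.
 */
static uint64_t
sum_items_prefetch(const unsigned char *page, const ItemSlot *slots, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(page + slots[i + PREFETCH_DISTANCE].off,
                               0 /* read */, 3 /* high locality */);
        for (uint16_t j = 0; j < slots[i].len; j++)
            sum += page[slots[i].off + j];
    }
    return sum;
}
```

The offsets come out of the slot array, so the code knows the next
address even though the stride is irregular and negative, which is
exactly the case where a stride detector might give up. Whether it is
actually faster depends on the chip and the prefetch distance, so it
would need measuring.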
[1] https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-PAGE-LAYOUT-FIGURE