"Simon Riggs" <simon@2ndquadrant.com> wrote
>
> You may be trying to use the memory too early. Prefetched memory takes
> time to arrive in cache, so you may need to issue prefetch calls for N
> +2, N+3 etc rather than simply N+1.
>
> p.6-11 covers this.
>
I actually tried that, but observed no improvement. It may also conflict with
the "try to mix prefetch with computation" suggestion from the manual that
you pointed out. In any case, that part looks fixable compared with the
following "prefetch distance" problem. As I read the manual, prefetch
distance is a key factor in efficiency, which matches our intuition.
However, when we process each tuple on a page, the number of CPU cycles
needed can vary quite a bit:
---
for (each tuple on a page)
{
    if (ItemIdIsUsed(lpp))      /* some stop here */
    {
        ...                     /* some involve deeper function calls here */
        valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
        if (valid)
            scan->rs_vistuples[ntup++] = lineoff;
    }
}
---
So it is pretty hard to predict the prefetch distance. The prefetch
improvements to memcpy/memmove do not have this problem: there the prefetch
distance can be fixed, and it does not change across differently clocked
CPUs of the same processor series.
Maybe the L2 cache is big enough that we need not worry about fetching too
far ahead? That seems not to be true, since the idea is vulnerable to a busy
system: no data will be kept in L2 for you for very long.
As Luke suggested, the operators above the scan, such as sort, might be a
better place to look. I will take a look there.
Regards,
Qingqing