On 2015-09-02 12:24:33 -0700, Peter Geoghegan wrote:
> "Read fetch". One argument past to the intrinsic here specifies that
> the variable will be read only. I did things this way because I
> imagined that there would be very limited uses for the macro only. I
> probably cover almost all interesting cases for explicit memory
> prefetching already.
I'd be less brief in this case then, no need to be super short here.
> > I don't think you should specify the read/write and locality parameters
> > if we don't hand-pick them - right now you're saying the memory will
> > only be read and that it has no temporal locality.
> >
> > I think there should be a empty fallback definition even if the current
> > only caller ends up not needing it - not all callers will require it.
>
> It started out that way, but Tom felt that it was better to have a
> USE_MEM_PREFETCH because of the branch below...
That doesn't mean we shouldn't still provide an empty definition.
> > Hm. Why do we need a branch here? The target of prefetches don't need to
> > be valid addresses and adding a bunch of arithmetic and a branch for the
> > corner case doesn't sound worthwhile to me.
>
> ...which is needed, because I'm still required to not dereference wild
> pointers. > In other words, although pg_rfetch()/__builtin_prefetch()
> does not require that a valid pointer be passed, it is not okay to
> read past an array's bounds to read that pointer. The GCC docs are
> clear on this -- "Data prefetch does not generate faults if addr is
> invalid, but the address expression itself must be valid".
That's just a question of how to formulate this though?
pg_rfetch(((char *) state->memtuples ) + 3 * sizeof(SortTuple) + offsetof(SortTuple, tuple))?
For something heavily platform dependent like this that seems ok.
> Because that was the fastest value following testing on my laptop. You
> are absolutely right to point out that this isn't a good reason to go
> with the patch -- I share your concern. All I can say in defense of
> that is that other major system software does the same, without any
> regard for the underlying microarchitecture AFAICT.
I know linux stripped out most prefetches at some point, even from x86
specific code, because it showed that they aged very badly. I.e. they
removed a bunch of them and stuff got faster, whereas they were
beneficial on earlier architectures.
Greetings,
Andres Freund