Re: PATCH: postpone building buckets to the end of Hash (in HashJoin) - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)
Date:
Msg-id: CA+TgmoYr90JXogitzKVR6=LXmivm1OH=AevrF4pgskL53xRnGQ@mail.gmail.com
In response to: Re: PATCH: postpone building buckets to the end of Hash (in HashJoin) (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: PATCH: postpone building buckets to the end of Hash (in HashJoin) (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Mon, Jan 18, 2016 at 10:57 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>>> If this doesn't regress performance in the case where the number of
>>> buckets is estimated accurately to begin with, then I think this is
>>> a great idea. Can you supply some performance test results for that
>>> case, and maybe some of the other cases also?
>>
>> I don't see how it could regress performance, and the benchmarks I've
>> done confirm that. I'll do more thorough benchmarking and post the
>> results here, but not now as this patch is in 2016-01 CF and I want to
>> put all my time into reviewing patches from the open commitfest.
>
> I've finally got to do more thorough benchmarks, and surprisingly there
> seems to be a minor regression.
...
>
> So even if we entirely skipped the bucket build, we would not recover the
> difference. Not sure what's going on here.
>
> I've also done some profiling using perf, but I haven't really found
> anything suspicious there.
>
> Any ideas what might be the cause of this?

My guess is that memory locality is not as good with this patch.  Consider this:

copyTuple->next = hashtable->buckets[bucketno];

We've only just populated copyTuple via memcpy, so that cache line is
presumably in whatever place cache lines go when they are really hot.
We basically make one pass over the allocated space for the hash
table, filling in the hash tuples and linking them into the bucket
chains as we go.  OTOH, with the patch, we skip the linking on the
first pass and just write the tuples.  Then, when we come back
through, we still have to do that part:

hashTuple->next = hashtable->buckets[bucketno];

By the time we're doing this, especially at larger work_mem settings,
we've probably pushed those cache lines out of the faster levels of
the CPU cache, and we have to pull them back in again, and that takes
time.  If the tuples are small, we'll dirty every cache line twice;
if they're larger, we might not dirty every cache line on the second
pass, but just some fraction of them.
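
To put rough, illustrative numbers on it (assumed figures, not
measurements): with 64-byte cache lines, a 256MB hash table spans
about 4 million cache lines, while even a large 32MB L3 cache holds
only about 500,000 of them.  So by the time the second pass starts,
almost everything written early in the first pass has been evicted
and has to be fetched from main memory again.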

I could be totally off base, but this is why I was concerned about the patch.
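
To make the two access patterns concrete, here's a toy sketch of the
difference.  These are simplified stand-ins of my own devising, not
the real nodeHash.c structures, and I'm pretending the tuples sit in
one dense array:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins -- not the real nodeHash.c types. */
typedef struct ToyTuple
{
    struct ToyTuple *next;      /* bucket chain link */
    uint32_t    hashvalue;
    /* tuple payload would follow here */
} ToyTuple;

/* Unpatched behavior: link each tuple into its bucket immediately,
 * while the cache line we just wrote is still hot. */
static void
insert_and_link(ToyTuple **buckets, uint32_t nbuckets, ToyTuple *tup)
{
    /* toy hashing; the real code masks with a power-of-two nbuckets */
    uint32_t    bucketno = tup->hashvalue % nbuckets;

    tup->next = buckets[bucketno];
    buckets[bucketno] = tup;
}

/* Patched behavior: one extra pass over all tuples at the end of the
 * build.  Each tuple's line was written long ago and has likely been
 * evicted, so it gets pulled back in just to set "next". */
static void
build_buckets_at_end(ToyTuple **buckets, uint32_t nbuckets,
                     ToyTuple *tuples, size_t ntuples)
{
    for (size_t i = 0; i < ntuples; i++)
    {
        uint32_t    bucketno = tuples[i].hashvalue % nbuckets;

        tuples[i].next = buckets[bucketno];
        buckets[bucketno] = &tuples[i];
    }
}

The second loop exists only to set the next pointers, but it still has
to touch every tuple's cache line a second time.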

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


