Re: Hash Join cost estimates - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hash Join cost estimates
Msg-id 10502.1365107399@sss.pgh.pa.us
In response to Re: Hash Join cost estimates  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers

Stephen Frost <sfrost@snowman.net> writes:
> Looking with opannotate, there's two main hotspots in
> ExecScanHashBucket:

>  12846 17.4001 :        hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
> and
>  22100 29.9348 :        hashTuple = hashTuple->next;

Those are, of course, pretty trivial statements, so the way I read this
is that the fundamental slowdown comes from the hash table being large
compared to the CPU's cache: you're incurring lots of cache misses at
these particular fetches.  (You might be able to confirm that by setting
oprofile to count cache misses rather than wall-clock time.)

> I'm certainly curious about those, but I'm also very interested in the
> possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable
> depending on the work_mem setting.

Not sure about that.  That would make the hash-bucket-header array
larger without changing the size of the rest of the hash table, thus
probably making the CPU cache situation worse, not better (which would
manifest as more time spent at the first of these two statements relative
to the second).
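
To put some rough numbers on that, here's a back-of-the-envelope sketch
(this is not the actual ExecChooseHashTableSize() logic, and the tuple
counts/widths are made up; NTUP_PER_BUCKET is 10 in nodeHash.c, if memory
serves):

    #include <stdio.h>

    #define NTUP_PER_BUCKET 10	/* value in nodeHash.c, IIRC */

    /*
     * Shrinking NTUP_PER_BUCKET grows the bucket-header (pointer) array
     * proportionally, while the space used by the tuples themselves is
     * unaffected.
     */
    static void
    show_hash_footprint(double ntuples, int tupsize, int ntup_per_bucket)
    {
    	double		nbuckets = ntuples / ntup_per_bucket;
    	double		header_mb = nbuckets * sizeof(void *) / (1024.0 * 1024.0);
    	double		tuple_mb = ntuples * tupsize / (1024.0 * 1024.0);

    	printf("NTUP_PER_BUCKET=%d: %.0f buckets, header array %.1f MB, tuple data %.1f MB\n",
    		   ntup_per_bucket, nbuckets, header_mb, tuple_mb);
    }

    int
    main(void)
    {
    	/* hypothetical inner relation: 40 million tuples of ~48 bytes */
    	show_hash_footprint(40e6, 48, NTUP_PER_BUCKET);
    	show_hash_footprint(40e6, 48, 1);	/* proposed much-smaller setting */
    	return 0;
    }

The tuple data dwarfs the header array either way, but the header array
is the part you'd be growing by an order of magnitude.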

Can you add some instrumentation code to get us information about the
average/max length of the bucket chains?  It would also be worth figuring
out how many distinct hash values fall into each bucket, which would give
us a clue as to whether your two-level-list idea is worth anything.
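
Something along these lines might be enough to get started -- a rough,
untested sketch meant to drop into nodeHash.c (the function name is made
up), assuming the usual HashJoinTableData/HashJoinTupleData layout from
hashjoin.h (nbuckets, buckets[], and per-tuple next/hashvalue); it could
be called once the inner relation has been loaded:

    static void
    ExecHashDumpChainStats(HashJoinTable hashtable)
    {
    	long		total_tuples = 0;
    	long		total_distinct = 0;
    	int			max_chain = 0;
    	int			used_buckets = 0;
    	int			i;

    	for (i = 0; i < hashtable->nbuckets; i++)
    	{
    		HashJoinTuple tuple;
    		int			chain_len = 0;
    		int			distinct = 0;

    		for (tuple = hashtable->buckets[i]; tuple != NULL; tuple = tuple->next)
    		{
    			HashJoinTuple prev;
    			bool		seen = false;

    			chain_len++;

    			/*
    			 * O(chain^2) rescan for an equal hash value earlier in the
    			 * chain; fine for one-off instrumentation.
    			 */
    			for (prev = hashtable->buckets[i]; prev != tuple; prev = prev->next)
    			{
    				if (prev->hashvalue == tuple->hashvalue)
    				{
    					seen = true;
    					break;
    				}
    			}
    			if (!seen)
    				distinct++;
    		}

    		if (chain_len > 0)
    			used_buckets++;
    		if (chain_len > max_chain)
    			max_chain = chain_len;
    		total_tuples += chain_len;
    		total_distinct += distinct;
    	}

    	elog(LOG, "hashjoin: %d buckets, %d used, %ld tuples, avg chain %.2f, max chain %d, avg distinct hash values per used bucket %.2f",
    		 hashtable->nbuckets, used_buckets, total_tuples,
    		 used_buckets > 0 ? (double) total_tuples / used_buckets : 0.0,
    		 max_chain,
    		 used_buckets > 0 ? (double) total_distinct / used_buckets : 0.0);
    }
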
        regards, tom lane


