Re: Hash Join cost estimates - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Hash Join cost estimates
Date
Msg-id 20130404192547.GK4361@tamriel.snowman.net
In response to Re: Hash Join cost estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> perf or oprofile reveal anything?

Here's what we get from oprofile (perhaps not too surprising):

Hash the small table / scan the big table:
samples  cum. samples  %        cum. %     linenr info                 image name               symbol name
167374   167374        47.9491  47.9491    nodeHash.c:915              postgres                 ExecScanHashBucket
85041    252415        24.3624  72.3115    mcount.c:60                 libc-2.15.so             __mcount_internal
28370    280785         8.1274  80.4389    _mcount.S:33                libc-2.15.so             mcount
15856    296641         4.5424  84.9814    (no location information)   [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ffffff)  [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ffffff)
6291     302932         1.8022  86.7836    xact.c:682                  postgres                 TransactionIdIsCurrentTransactionId
4555     307487         1.3049  88.0885    instrument.c:70             postgres                 InstrStopNode
3849     311336         1.1027  89.1912    heapam.c:711                postgres                 heapgettup_pagemode
3567     314903         1.0219  90.2130    nodeHashjoin.c:63           postgres                 ExecHashJoin

Hash the big table / scan the small table:
samples  cum. samples  %        cum. %     linenr info                 image name               symbol name
112060   112060        39.2123  39.2123    mcount.c:60                 libc-2.15.so             __mcount_internal
36547    148607        12.7886  52.0009    nodeHash.c:709              postgres                 ExecHashTableInsert
33570    182177        11.7469  63.7477    _mcount.S:33                libc-2.15.so             mcount
16383    198560         5.7328  69.4805    (no location information)   [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ffffff)  [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ffffff)
13200    211760         4.6190  74.0995    (no location information)   no-vmlinux               /no-vmlinux
6345     218105         2.2203  76.3197    xact.c:682                  postgres                 TransactionIdIsCurrentTransactionId
5250     223355         1.8371  78.1568    nodeHash.c:915              postgres                 ExecScanHashBucket
4797     228152         1.6786  79.8354    heapam.c:711                postgres                 heapgettup_pagemode
4661     232813         1.6310  81.4664    aset.c:563                  postgres                 AllocSetAlloc
4588     237401         1.6054  83.0718    instrument.c:70             postgres                 InstrStopNode
3550     240951         1.2422  84.3140    memcpy-ssse3-back.S:60      libc-2.15.so             __memcpy_ssse3_back
3013     243964         1.0543  85.3684    aset.c:1109                 postgres                 AllocSetCheck

Looking at the 'Hash the small table / scan the big table' case,
opannotate claims that this line is by far the worst offender:

147588 42.2808 :        hashTuple = hashTuple->next;

While most of the time in the 'Hash the big table / scan the small
table' case is spent in:
34572 12.0975 :        hashTuple->next = hashtable->buckets[bucketno];

Neither of those strikes me as terribly informative though.  To be
honest, I've not really played with oprofile all that much.  Now that
I've got things set up to support this, I'd be happy to provide more
info if anyone has suggestions on how to get something more useful.
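
For anyone without nodeHash.c open, here's a boiled-down sketch of
what those two hot lines are doing (just the shape of it, not the
actual code): the build side pushes each tuple onto the head of its
bucket's singly-linked chain, and the probe side then walks that
chain, so probe cost scales directly with chain length:

/* Boiled-down sketch only -- not the actual nodeHash.c code. */
typedef struct SketchTuple
{
    struct SketchTuple *next;       /* chain link within a bucket */
    unsigned int hashvalue;
    /* the real code carries the tuple itself here */
} SketchTuple;

/* Build side (cf. ExecHashTableInsert): push onto the chain head,
 * i.e. the "hashTuple->next = hashtable->buckets[bucketno]" line
 * above. */
static void
sketch_insert(SketchTuple **buckets, int bucketno, SketchTuple *tuple)
{
    tuple->next = buckets[bucketno];
    buckets[bucketno] = tuple;
}

/* Probe side (cf. ExecScanHashBucket): walk the chain looking for a
 * matching hash value, i.e. the "hashTuple = hashTuple->next" line
 * above.  Every extra tuple hanging off the bucket means another trip
 * around this loop for every probe that lands there. */
static SketchTuple *
sketch_scan(SketchTuple **buckets, int bucketno, unsigned int hashvalue)
{
    SketchTuple *tuple;

    for (tuple = buckets[bucketno]; tuple != NULL; tuple = tuple->next)
    {
        if (tuple->hashvalue == hashvalue)
            return tuple;       /* real code still checks the join quals */
    }
    return NULL;
}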

It does look like reducing bucket depth, as I outlined before through
the use of a 2-level hashing system, might help speed up
ExecScanHashBucket, as each probe would hopefully have only a few
(e.g. 1-2) entries to consider rather than a long chain.  Along those
same lines, I really wonder if we're being too generous with the
bucket-depth goal of '10' instead of, say, '1', especially when we've
got plenty of work_mem available.
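
To put rough numbers on that, here's a trivial standalone program
(made-up row count, and it ignores the other constraints the planner
puts on the bucket count) showing how the tuples-per-bucket target
translates into average chain length once the bucket count gets
rounded up to a power of two:

/* Back-of-the-envelope sketch only; the row count is made up and the
 * real nbuckets calculation has more constraints than this. */
#include <stdio.h>

/* round up to the next power of two */
static long
next_pow2(long n)
{
    long    p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

int
main(void)
{
    long    ntuples = 40000000;     /* hypothetical inner-side rows */
    int     targets[] = {10, 1};    /* tuples-per-bucket goals */
    int     i;

    for (i = 0; i < 2; i++)
    {
        long    nbuckets = next_pow2(ntuples / targets[i] + 1);
        double  avg_chain = (double) ntuples / nbuckets;

        printf("target %2d: nbuckets = %ld, avg chain = %.2f\n",
               targets[i], nbuckets, avg_chain);
    }
    return 0;
}

With the '10' goal a probe into a non-empty bucket steps through
something like ten entries on average, while with '1' it's closer to
one, which lines up with where the time is going in the profiles
above.
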
Thanks,
    Stephen
