Re: bad estimation together with large work_mem generates terrible slow hash joins - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: bad estimation together with large work_mem generates terrible slow hash joins
Date
Msg-id 53C2EF5A.4030405@fuzzy.cz
Whole thread Raw
In response to Re: bad estimation together with large work_mem generates terrible slow hash joins  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: bad estimation together with large work_mem generates terrible slow hash joins  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers
On 3.7.2014 19:36, Tomas Vondra wrote:
> On 1.7.2014 01:24, Tomas Vondra wrote:
>> On 30.6.2014 23:12, Tomas Vondra wrote:
>>> Hi,
>>
>> Hopefully I got it right this time. At least it seems to be working
>> for cases that failed before (no file leaks, proper rowcounts so
>> far).
>
> Attached v7, fixing nbatch/ntuples in an assert.
>
> regards
> Tomas

Attached is v8 of this patch, significantly reworked based on the
related discussion (mostly in 'tweaking NTUP_PER_BUCKET').

With the patch applied, the hash table works like this:

initial sizing (ExecChooseHashTableSize)
----------------------------------------
- size nbuckets for NTUP_PER_BUCKET=1
- account for memory allocated for buckets

building the hash table
-----------------------
- while adding tuples, keep track of the optimal number of buckets
  for the current number of tuples per batch
- don't resize until all the tuples are added (increasing nbatch may
  lower the optimal number of buckets)
- after adding all the tuples, resize the table if the optimal
  nbuckets differs from the initial nbuckets

This means that for well estimated plans (reasonably exact tuple count
for the inner relation), the hash table is properly sized and no resize
is needed.

For badly estimated plans, this works about the same as the previous
patches (we're accounting for memory needed for buckets etc.).


More batches or fewer buckets?
------------------------------
In the related threads, I repeatedly mentioned that I don't know a good
answer to "Should I add more batches, or decrease the number of
buckets?" when sizing the hash table. The current code (as in HEAD)
does not face this dilemma, because (a) it does not count bucket memory
against work_mem and (b) it does not allow changing nbuckets at all.

I still don't have a good answer to this question, but I think the
patch can stand even without it. If you want more/fewer batches, use
a smaller/larger work_mem - same as with the current code. Also, with
the 'dense allocation' patch [1], it's possible to use larger work_mem
values (and thus compensate for counting buckets into work_mem).


Changes since v7
----------------

  (a) remove NTUP_GROW_THRESHOLD (just use NTUP_PER_BUCKET directly)
  (b) set NTUP_PER_BUCKET=1
  (c) include buckets (optimal) when considering nbatch increase
  (d) track optimal number of buckets (for NTUP_PER_BUCKET)
  (e) after loading all the tuples, resize the hash table to get
      nbuckets_optimal (if needed)
  (f) renamed GUC to enable_hash_resize (makes a bit more sense)


Comments
--------

  (a) NTUP_GROW_THRESHOLD was overcomplicated, mostly a result of
      misunderstanding how NTUP_PER_BUCKET works (it's an upper
      threshold, not an average), and is not really needed.

  (b) The value "1" gives the best performance, but also requires more
      memory for the buckets. This should be more than compensated by
      densely allocating the tuples (see hashjoin-alloc-v3.patch
      in the "tweaking NTUP_PER_BUCKET" thread [1]).

  (c,d) Originally, the memory needed for buckets was not included in
        'spaceUsed', but with the NTUP_PER_BUCKET=1 change that is no
        longer acceptable (it needs much more memory than before).
        So now it's included both in the initial sizing of the hash
        table and when adding more tuples (nbuckets_optimal).

        Still, spaceUsed does not include palloc overhead (which may be
        a significant amount of memory). This is tackled by [1].

  (e) No change here.

  (f) A bit better GUC name. Anyway, this is for easier development
      only, and won't be included in the final patch.


regards
Tomas

[1] http://www.postgresql.org/message-id/53C2DECC.2010408@fuzzy.cz
