Re: bad estimation together with large work_mem generates terrible slow hash joins - Mailing list pgsql-hackers
| From | Tomas Vondra |
| --- | --- |
| Subject | Re: bad estimation together with large work_mem generates terrible slow hash joins |
| Date | |
| Msg-id | 53B5A28A.1080803@fuzzy.cz |
| In response to | Re: bad estimation together with large work_mem generates terrible slow hash joins (Atri Sharma <atri.jiit@gmail.com>) |
| List | pgsql-hackers |
On 3.7.2014 15:42, Atri Sharma wrote:
>
> On Tue, Jul 1, 2014 at 4:54 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>     On 30.6.2014 23:12, Tomas Vondra wrote:
>     > Hi,
>     >
>     > attached is v5 of the patch. The main change is that scaling the
>     > number of buckets is done only once, after the initial hash table
>     > is built. The main advantage of this is lower cost. This also
>     > allowed some cleanup of unnecessary code.
>     >
>     > However, this new patch causes a warning like this:
>     >
>     >     WARNING: temporary file leak: File 231 still referenced
>     >
>     > I've been eyeballing the code for a while now, but I still fail to
>     > see where this comes from :-( Any ideas?
>
>     Meh, the patches are wrong - I hadn't realized the tight coupling
>     between buckets/batches in ExecHashGetBucketAndBatch:
>
>         *bucketno = hashvalue & (nbuckets - 1);
>         *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);
>
>     The previous patches worked mostly by pure luck (the nbuckets was set
>     only in the first batch), but once I moved the code to the end, it
>     started failing. And by "worked" I mean "didn't throw an error, but
>     probably returned bogus results with (nbatch > 1)".
>
>     However, ISTM this nbuckets-nbatch coupling is not really necessary;
>     it's only constructed this way to assign independent batch/bucket
>     bits. So I went and changed the code like this:
>
>         *bucketno = hashvalue & (nbuckets - 1);
>         *batchno = (hashvalue >> (32 - hashtable->log2_nbatch));
>
>     I.e. using the top bits for batchno, low bits for bucketno (as before).
>
>     Hopefully I got it right this time. At least it seems to be working
>     for the cases that failed before (no file leaks, proper rowcounts so
>     far).
>
> Hi,
>
> I had a look at the patch and I was wondering if there is a way we can
> set a cap on the resized size, and increase the number of batches
> instead? Since this patch evaluates the growth of tuples vs. the
> increase of space, we could look at increasing the number of batches
> rather than the number of buckets, if the resized number of buckets
> exceeds a threshold. It shouldn't be too hard; AIUI it would involve a
> call in MultiExecHash right after the new cost evaluation the patch
> adds.

So you propose to fill the hash table, and when hitting NTUP_PER_BUCKET
(i.e. after adding 'nbuckets * NTUP_PER_BUCKET' tuples) increase the
number of batches? Hmmm, that'd be easy to implement.

I don't think it's possible to do that in MultiExecHash (that's too
late). But it's a matter of changing this condition in
ExecHashTableInsert:

    if (hashtable->spaceUsed > hashtable->spaceAllowed)
        ExecHashIncreaseNumBatches(hashtable);

to something like this:

    if ((hashtable->spaceUsed > hashtable->spaceAllowed) ||
        (hashtable->totalTuples / hashtable->nbatch
             == NTUP_PER_BUCKET * hashtable->nbuckets))
        ExecHashIncreaseNumBatches(hashtable);

But I don't think increasing the number of batches is a good approach,
as it means more rescans. I think using memory is more efficient (after
all, that's what work_mem is for, right?).

regards
Tomas
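To see why the top-bits split behaves better, here is a minimal standalone C
sketch. It mimics the two formulas quoted above, but it is not the actual
nodeHash.c code, and the hash value, bucket counts and batch count are made up
for illustration; they are chosen so that the relevant bits differ:

    /*
     * Toy illustration of the two bucket/batch split schemes discussed
     * above (not PostgreSQL source).
     */
    #include <stdio.h>
    #include <stdint.h>

    /* original scheme: batchno comes from the bits right above the bucket
     * bits, so it moves whenever log2_nbuckets changes */
    static void
    split_old(uint32_t hashvalue, int nbuckets, int log2_nbuckets, int nbatch,
              int *bucketno, int *batchno)
    {
        *bucketno = hashvalue & (nbuckets - 1);
        *batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1);
    }

    /* proposed scheme: batchno comes from the topmost bits, so it does not
     * depend on nbuckets at all */
    static void
    split_new(uint32_t hashvalue, int nbuckets, int nbatch, int log2_nbatch,
              int *bucketno, int *batchno)
    {
        *bucketno = hashvalue & (nbuckets - 1);
        if (nbatch > 1)
            *batchno = (int) (hashvalue >> (32 - log2_nbatch));
        else
            *batchno = 0;   /* avoid shifting by 32 when there is one batch */
    }

    int
    main(void)
    {
        uint32_t    hash = 0x2BAD1400;  /* made-up value with bits 10-12 = 101 */
        int         nbatch = 4, log2_nbatch = 2;
        int         bucketno, batchno;

        /* old scheme: doubling nbuckets shifts which bits feed batchno */
        split_old(hash, 1024, 10, nbatch, &bucketno, &batchno);
        printf("old, nbuckets=1024: bucket=%d batch=%d\n", bucketno, batchno); /* batch=1 */
        split_old(hash, 2048, 11, nbatch, &bucketno, &batchno);
        printf("old, nbuckets=2048: bucket=%d batch=%d\n", bucketno, batchno); /* batch=2 */

        /* new scheme: batchno stays put no matter how nbuckets is resized */
        split_new(hash, 1024, nbatch, log2_nbatch, &bucketno, &batchno);
        printf("new, nbuckets=1024: bucket=%d batch=%d\n", bucketno, batchno); /* batch=0 */
        split_new(hash, 2048, nbatch, log2_nbatch, &bucketno, &batchno);
        printf("new, nbuckets=2048: bucket=%d batch=%d\n", bucketno, batchno); /* batch=0 */

        return 0;
    }

With the old split, this example hash lands in batch 1 while nbuckets is 1024
but in batch 2 once nbuckets is doubled to 2048, so tuples already written to
batch files would be looked up in the wrong batch. With the new split the
batch number depends only on the top log2_nbatch bits, so the buckets can be
resized after the initial hash table is built without disturbing the batching.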