On Tue, Jul 1, 2014 at 4:54 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 30.6.2014 23:12, Tomas Vondra wrote:
> Hi,
>
> attached is v5 of the patch. The main change is that scaling the number
> of buckets is done only once, after the initial hash table is built.
> The main advantage of this is lower cost. This also allowed some
> cleanup of unnecessary code.
>
> However, this new patch causes warnings like this:
>
>     WARNING: temporary file leak: File 231 still referenced
>
> I've been eyeballing the code for a while now, but I still fail to see
> where this comes from :-( Any ideas?
Meh, the patches are wrong - I hadn't realized the tight coupling between buckets and batches in ExecHashGetBucketAndBatch:
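For reference, the computation looks about like this (paraphrasing nodeHash.c from memory, so the exact lines may differ):

    void
    ExecHashGetBucketAndBatch(HashJoinTable hashtable, uint32 hashvalue,
                              int *bucketno, int *batchno)
    {
        uint32  nbuckets = (uint32) hashtable->nbuckets;
        uint32  nbatch = (uint32) hashtable->nbatch;

        if (nbatch > 1)
        {
            /* we can do MOD by masking, DIV by shifting */
            *bucketno = hashvalue & (nbuckets - 1);
            *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);
        }
        else
        {
            *bucketno = hashvalue & (nbuckets - 1);
            *batchno = 0;
        }
    }

Note how batchno is taken from the bits immediately above the bucket bits: change nbuckets (and thus log2_nbuckets), and the very same hashvalue suddenly maps to a different batch.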
The previous patches worked mostly by pure luck (nbuckets was set only in the first batch), but once I moved the resize to the end of the build, it started failing. And by "worked" I mean "didn't throw an error, but probably returned bogus results with nbatch > 1".
However, ISTM this nbuckets-nbatch coupling is not really necessary; the code is only constructed this way to derive independent batch and bucket numbers from a single hash value. So I went and changed it like this:
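(Reconstructing the changed lines from the description below; I'm assuming a log2_nbatch counterpart to the existing log2_nbuckets field, so the actual patch may differ in detail.)

    if (nbatch > 1)
    {
        /* low bits still select the bucket */
        *bucketno = hashvalue & (nbuckets - 1);
        /* ... but the batch now comes from the top bits of the hash */
        *batchno = (hashvalue >> (32 - hashtable->log2_nbatch)) & (nbatch - 1);
    }
    else
    {
        *bucketno = hashvalue & (nbuckets - 1);
        *batchno = 0;
    }

With this, batchno no longer depends on log2_nbuckets, so resizing the bucket array after the initial build cannot move a tuple to a different batch.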
I.e. using the top bits for batchno, low bits for bucketno (as before).
Hopefully I got it right this time. At least it seems to be working for the cases that failed before (no file leaks, correct row counts so far).
Hi,
I had a look at the patch, and I was wondering: is there a way we could put a cap on the resized number of buckets and increase the number of batches instead? Since this patch weighs the growth in tuples against the increase in space, we could grow the number of batches rather than the number of buckets whenever the resized bucket count would exceed some threshold. It shouldn't be too hard; AIUI it would involve a call in MultiExecHash right after the new cost evaluation the patch adds.
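Very roughly the shape I have in mind (just a sketch: MAX_NBUCKETS and its value are invented, nbuckets_optimal stands for whatever bucket count the patch computes, and ExecHashIncreaseNumBatches is the existing routine in nodeHash.c):

    /* hypothetical cap on the bucket count, value for illustration only */
    #define MAX_NBUCKETS (1 << 22)

    /* after the patch computes its optimal bucket count in MultiExecHash */
    if (nbuckets_optimal > MAX_NBUCKETS)
    {
        /*
         * Don't grow the bucket array past the cap; double the number of
         * batches instead, spilling more tuples to temp files and keeping
         * the in-memory table (and its bucket array) bounded.
         */
        nbuckets_optimal = MAX_NBUCKETS;
        ExecHashIncreaseNumBatches(hashtable);
    }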
It isn't a target of the current patch, but I think it is a logical extension of it.