Re: bad estimation together with large work_mem generates terrible slow hash joins - Mailing list pgsql-hackers
| From | Tomas Vondra |
| --- | --- |
| Subject | Re: bad estimation together with large work_mem generates terrible slow hash joins |
| Date | |
| Msg-id | 53B5A28A.1080803@fuzzy.cz |
| In response to | Re: bad estimation together with large work_mem generates terrible slow hash joins (Atri Sharma <atri.jiit@gmail.com>) |
| List | pgsql-hackers |
On 3.7.2014 15:42, Atri Sharma wrote:
>
> On Tue, Jul 1, 2014 at 4:54 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>     On 30.6.2014 23:12, Tomas Vondra wrote:
>     > Hi,
>     >
>     > attached is v5 of the patch. The main change is that scaling the
>     > number of buckets is done only once, after the initial hash table
>     > is built. The main advantage of this is lower cost. This also
>     > allowed some cleanup of unnecessary code.
>     >
>     > However, this new patch causes a warning like this:
>     >
>     >     WARNING: temporary file leak: File 231 still referenced
>     >
>     > I've been eyeballing the code for a while now, but I still fail to
>     > see where this comes from :-( Any ideas?
>
>     Meh, the patches are wrong - I hadn't realized the tight coupling
>     between buckets/batches in ExecHashGetBucketAndBatch:
>
>         *bucketno = hashvalue & (nbuckets - 1);
>         *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);
>
>     The previous patches worked mostly by pure luck (the nbuckets was set
>     only in the first batch), but once I moved the code to the end, it
>     started failing. And by "worked" I mean "didn't throw an error, but
>     probably returned bogus results with (nbatch > 1)".
>
>     However, ISTM this nbuckets-nbatch coupling is not really necessary;
>     it's only constructed this way to assign independent batch/bucket
>     bits. So I went and changed the code like this:
>
>         *bucketno = hashvalue & (nbuckets - 1);
>         *batchno = (hashvalue >> (32 - hashtable->log2_nbatch));
>
>     I.e. using the top bits for batchno, low bits for bucketno (as before).
>
>     Hopefully I got it right this time. At least it seems to be working
>     for the cases that failed before (no file leaks, proper rowcounts so
>     far).
>
> Hi,
>
> I had a look at the patch and I was wondering if there is a way we can
> set a cap on the resized size, and increase the number of batches
> instead? Since this patch evaluates the growth of tuples vs. the
> increase of space, we could look at increasing the number of batches
> rather than the number of buckets, if the resized number of buckets
> exceeds a threshold. It shouldn't be too hard; AIUI it would involve a
> call in MultiExecHash right after the new cost evaluation the patch
> adds.

So you propose to fill the hash table, and when hitting NTUP_PER_BUCKET
(i.e. after adding 'nbuckets * NTUP_PER_BUCKET' tuples) increase the
number of batches? Hmmm, that'd be easy to implement.

I don't think it's possible to do that in MultiExecHash (that's too
late). But it's a matter of changing this condition in
ExecHashTableInsert:

    if (hashtable->spaceUsed > hashtable->spaceAllowed)
        ExecHashIncreaseNumBatches(hashtable);

to something like this:

    if ((hashtable->spaceUsed > hashtable->spaceAllowed) ||
        (hashtable->totalTuples / hashtable->nbatch
             == NTUP_PER_BUCKET * hashtable->nbuckets))
        ExecHashIncreaseNumBatches(hashtable);

But I don't think increasing the number of batches is a good approach,
as it means more rescans. I think using memory is more efficient (after
all, that's what work_mem is for, right?).

regards
Tomas
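To see why the top-bits split behaves better, here is a minimal standalone C
sketch. It mimics the two formulas quoted above, but it is not the actual
nodeHash.c code, and the hash value, bucket counts and batch count are made up
for illustration; they are chosen so that the relevant bits differ:

    /*
     * Toy illustration of the two bucket/batch split schemes discussed
     * above (not PostgreSQL source).
     */
    #include <stdio.h>
    #include <stdint.h>

    /* original scheme: batchno comes from the bits right above the bucket
     * bits, so it moves whenever log2_nbuckets changes */
    static void
    split_old(uint32_t hashvalue, int nbuckets, int log2_nbuckets, int nbatch,
              int *bucketno, int *batchno)
    {
        *bucketno = hashvalue & (nbuckets - 1);
        *batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1);
    }

    /* proposed scheme: batchno comes from the topmost bits, so it does not
     * depend on nbuckets at all */
    static void
    split_new(uint32_t hashvalue, int nbuckets, int nbatch, int log2_nbatch,
              int *bucketno, int *batchno)
    {
        *bucketno = hashvalue & (nbuckets - 1);
        if (nbatch > 1)
            *batchno = (int) (hashvalue >> (32 - log2_nbatch));
        else
            *batchno = 0;   /* avoid shifting by 32 when there is one batch */
    }

    int
    main(void)
    {
        uint32_t    hash = 0x2BAD1400;  /* made-up value with bits 10-12 = 101 */
        int         nbatch = 4, log2_nbatch = 2;
        int         bucketno, batchno;

        /* old scheme: doubling nbuckets shifts which bits feed batchno */
        split_old(hash, 1024, 10, nbatch, &bucketno, &batchno);
        printf("old, nbuckets=1024: bucket=%d batch=%d\n", bucketno, batchno); /* batch=1 */
        split_old(hash, 2048, 11, nbatch, &bucketno, &batchno);
        printf("old, nbuckets=2048: bucket=%d batch=%d\n", bucketno, batchno); /* batch=2 */

        /* new scheme: batchno stays put no matter how nbuckets is resized */
        split_new(hash, 1024, nbatch, log2_nbatch, &bucketno, &batchno);
        printf("new, nbuckets=1024: bucket=%d batch=%d\n", bucketno, batchno); /* batch=0 */
        split_new(hash, 2048, nbatch, log2_nbatch, &bucketno, &batchno);
        printf("new, nbuckets=2048: bucket=%d batch=%d\n", bucketno, batchno); /* batch=0 */

        return 0;
    }

With the old split, this example hash lands in batch 1 while nbuckets is 1024
but in batch 2 once nbuckets is doubled to 2048, so tuples already written to
batch files would be looked up in the wrong batch. With the new split the
batch number depends only on the top log2_nbatch bits, so the buckets can be
resized after the initial hash table is built without disturbing the batching.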