On Tue, Jul 1, 2014 at 4:54 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 30.6.2014 23:12, Tomas Vondra wrote:
> Hi,
>
> attached is v5 of the patch. The main change is that scaling the number
> of buckets is done only once, after the initial hash table is built.
> The main advantage of this is lower cost. This also allowed some
> cleanup of unnecessary code.
>
> However, this new patch causes warnings like this:
>
>     WARNING: temporary file leak: File 231 still referenced
>
> I've been eyeballing the code for a while now, but I still fail to see
> where this comes from :-( Any ideas?
Meh, the patches are wrong - I hadn't realized the tight coupling between buckets and batches in ExecHashGetBucketAndBatch:
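For reference, the computation looks about like this (paraphrasing nodeHash.c from memory, so the exact lines may differ):

    void
    ExecHashGetBucketAndBatch(HashJoinTable hashtable, uint32 hashvalue,
                              int *bucketno, int *batchno)
    {
        uint32  nbuckets = (uint32) hashtable->nbuckets;
        uint32  nbatch = (uint32) hashtable->nbatch;

        if (nbatch > 1)
        {
            /* we can do MOD by masking, DIV by shifting */
            *bucketno = hashvalue & (nbuckets - 1);
            *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);
        }
        else
        {
            *bucketno = hashvalue & (nbuckets - 1);
            *batchno = 0;
        }
    }

Note how batchno is taken from the bits immediately above the bucket bits: change nbuckets (and thus log2_nbuckets), and the very same hashvalue suddenly maps to a different batch.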
The previous patches worked mostly by pure luck (nbuckets was set only in the first batch), but once I moved the resize to the end of the build, it started failing. And by "worked" I mean "didn't throw an error, but probably returned bogus results with nbatch > 1".
However, ISTM this nbuckets-nbatch coupling is not really necessary; the code is only constructed this way to derive independent batch and bucket numbers from a single hash value. So I went and changed it like this:
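(Reconstructing the changed lines from the description below; I'm assuming a log2_nbatch counterpart to the existing log2_nbuckets field, so the actual patch may differ in detail.)

    if (nbatch > 1)
    {
        /* low bits still select the bucket */
        *bucketno = hashvalue & (nbuckets - 1);
        /* ... but the batch now comes from the top bits of the hash */
        *batchno = (hashvalue >> (32 - hashtable->log2_nbatch)) & (nbatch - 1);
    }
    else
    {
        *bucketno = hashvalue & (nbuckets - 1);
        *batchno = 0;
    }

With this, batchno no longer depends on log2_nbuckets, so resizing the bucket array after the initial build cannot move a tuple to a different batch.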
I.e. using the top bits for batchno, low bits for bucketno (as before).
Hopefully I got it right this time. At least it seems to be working for the cases that failed before (no file leaks, correct row counts so far).
Hi,
I had a look at the patch, and I was wondering: is there a way we could put a cap on the resized number of buckets and increase the number of batches instead? Since this patch weighs the growth in tuples against the increase in space, we could grow the number of batches rather than the number of buckets whenever the resized bucket count would exceed some threshold. It shouldn't be too hard; AIUI it would involve a call in MultiExecHash right after the new cost evaluation the patch adds.
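Very roughly the shape I have in mind (just a sketch: MAX_NBUCKETS and its value are invented, nbuckets_optimal stands for whatever bucket count the patch computes, and ExecHashIncreaseNumBatches is the existing routine in nodeHash.c):

    /* hypothetical cap on the bucket count, value for illustration only */
    #define MAX_NBUCKETS (1 << 22)

    /* after the patch computes its optimal bucket count in MultiExecHash */
    if (nbuckets_optimal > MAX_NBUCKETS)
    {
        /*
         * Don't grow the bucket array past the cap; double the number of
         * batches instead, spilling more tuples to temp files and keeping
         * the in-memory table (and its bucket array) bounded.
         */
        nbuckets_optimal = MAX_NBUCKETS;
        ExecHashIncreaseNumBatches(hashtable);
    }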
It isn't a target of the current patch, but I think it is a logical extension of it.