Re: DBT-3 with SF=20 got failed - Mailing list pgsql-hackers

From Tom Lane
Subject Re: DBT-3 with SF=20 got failed
Msg-id 20933.1443043553@sss.pgh.pa.us
In response to Re: DBT-3 with SF=20 got failed  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: DBT-3 with SF=20 got failed  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 09/11/2015 07:16 PM, Robert Haas wrote:
>> On Fri, Sep 11, 2015 at 1:12 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> I'm arguing for fixing the existing bug, and then addressing the case of
>>> over-estimation separately, with proper analysis.

>> Well, this is part of how we're looking at it differently.  I think the
>> bug is "we're passing a value to palloc that is too large, so
>> sometimes it fails" and the way to fix that is to properly limit the
>> value.  You are clearly defining the bug a bit differently.

> Yes, I see it differently.

> I don't quite understand why limiting the value is more "proper" than 
> using a function that can handle the actual value.

> The proposed bugfix addresses the issue in the most straightforward way, 
> without introducing additional considerations about possible 
> over-estimations (which the current code completely ignores, so this is 
> a new thing). I think bugfixes should not introduce such changes to 
> behavior (albeit internal), especially not without any numbers.

This thread seems to have stalled out...

After re-reading it, I'm going to agree with Robert that we should clamp
the initial pointer-array size to what palloc can handle (i.e., 512MB
worth of pointers, which, please recall, will represent at least 10X that
much in hashtable entries, probably more).  The argument that allowing it
to be larger would be a performance win is not supported by any evidence,
while the risks of allowing arbitrarily large allocations based on planner
estimates seem pretty obvious to me.  And the argument that such
overestimates are a bug that we can easily fix rests on even less
evidence; in fact, I would dismiss that out of hand as hubris.
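
To make the clamp concrete, here is a minimal standalone sketch (this is
not the actual nodeHash.c code; ALLOC_CAP_BYTES and clamp_nbuckets are
illustrative stand-ins, with the cap mirroring MaxAllocSize at just under
1GB).  Because the bucket count must be a power of two and pointers are
8 bytes, the clamp works out to the 512MB pointer array mentioned above:

#include <stddef.h>

#define ALLOC_CAP_BYTES ((size_t) 0x3fffffff)  /* stand-in for MaxAllocSize */

static size_t
clamp_nbuckets(size_t requested_nbuckets)
{
    /* largest bucket count whose pointer array fits in one allocation */
    size_t  max_buckets = ALLOC_CAP_BYTES / sizeof(void *);
    size_t  nbuckets = 1;

    if (requested_nbuckets > max_buckets)
        requested_nbuckets = max_buckets;

    /* round down to a power of two, as the bucket count must be one */
    while (nbuckets * 2 <= requested_nbuckets)
        nbuckets *= 2;
    return nbuckets;
}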

Now there is a separate issue about whether we should allow hashtable
resizes to exceed that limit.  There I would vote yes, as long as the
resize is based on arrival of actual data and not estimates (following
Robert's point that the existing uses of repalloc_huge are driven by
actual need).
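
Sketched in the same illustrative style (hypothetical names, not the real
executor code), the distinction being drawn is roughly:

#include <stdbool.h>
#include <stddef.h>

#define ALLOC_CAP_BYTES ((size_t) 0x3fffffff)  /* stand-in for MaxAllocSize */

static bool
bucket_grow_allowed(size_t new_nbuckets, bool driven_by_actual_tuples)
{
    /*
     * An estimate-driven request must stay within the ordinary palloc
     * limit; a resize triggered by tuples already inserted may exceed it
     * (the real code would reach for repalloc_huge there), since the
     * memory is demonstrably needed.
     */
    if (driven_by_actual_tuples)
        return true;
    return new_nbuckets <= ALLOC_CAP_BYTES / sizeof(void *);
}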

So I don't like any of the proposed patches exactly as-is.

BTW, just looking at the code in question, it seems to be desperately
in need of editorial review.  A few examples:

* Some of the code goes to the trouble of maintaining a
log2_nbuckets_optimal value, but ExecHashIncreaseNumBuckets doesn't
know about that and recomputes the value (see the sketch after this
list).

* ExecHashIncreaseNumBuckets starts out with a run-time test on something
that its sole caller has just Assert()'d to not be the case, and which
really ought to be impossible with or without that Assert.

* ExecHashTableInsert believes it can only increase nbuckets_optimal
if we're still in the first batch, but MultiExecHash() will call
ExecHashIncreaseNumBuckets at the end of input-acquisition whether we've
created more than one batch or not.  Meanwhile, ExecHashIncreaseNumBatches
thinks it can change the number of buckets in any case.  Maybe that's all
okay, but at the very least those tests ought to look like they'd heard of
each other.  And of those three places, having the safety check where it
is seems like the least reasonable place.  Tracking an "optimal" number
of buckets seems like an activity that should not be constrained by
whether we have any hope of being able to apply the number.
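
On the first of those points, the bookkeeping amounts to updating the two
fields together at the single place they change, so nothing downstream
ever needs to recompute the log.  A minimal sketch (illustrative struct
and function names, not the actual HashJoinTable definition):

typedef struct HashSketch
{
    int     nbuckets_optimal;       /* current target bucket count */
    int     log2_nbuckets_optimal;  /* always log2(nbuckets_optimal) */
} HashSketch;

static void
double_optimal_buckets(HashSketch *ht)
{
    ht->nbuckets_optimal *= 2;
    ht->log2_nbuckets_optimal += 1; /* kept consistent by construction */
}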

So I'm not having a lot of faith that there aren't other bugs in the
immediate area.
        regards, tom lane


