Re: DBT-3 with SF=20 got failed - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: DBT-3 with SF=20 got failed
Date
Msg-id 55D78207.1080104@2ndquadrant.com
In response to Re: DBT-3 with SF=20 got failed  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses Re: DBT-3 with SF=20 got failed  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
Hello KaiGai-san,

On 08/21/2015 02:28 AM, Kouhei Kaigai wrote:
...
>>
>> But what is the impact on queries that actually need more than 1GB
>> of buckets? I assume we'd only limit the initial allocation and
>> still allow the resize based on the actual data (i.e. the 9.5
>> improvement), so the queries would start with 1GB and then resize
>> once finding out the optimal size (as done in 9.5). The resize is
>> not very expensive, but it's not free either, and with so many
>> tuples (requiring more than 1GB of buckets, i.e. ~130M tuples) it's
>> probably just noise in the total query runtime. But it'd be nice
>> to see some proof of that ...
>>
> The problem here is we cannot know the exact size until the Hash
> node has read the entire inner relation. All we can do is rely on
> the planner's estimation; however, it often computes a crazy number
> of rows. I think resizing of hash buckets is a reasonable compromise.

I understand the estimation problem. The question I think we need to 
answer is how to balance the behavior for well- and poorly-estimated 
cases. It'd be unfortunate if we lower the memory consumption in the 
over-estimated case while significantly slowing down the well-estimated 
ones.

I don't think we have a clear answer at this point - maybe it's not a 
problem at all and it'll be a win no matter what threshold we choose. 
But it's a separate problem from the bugfix.
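
For a rough sense of the numbers in that trade-off, here is a minimal sketch in Python (not PostgreSQL code). It assumes 8-byte bucket pointers, about one tuple per bucket, power-of-two bucket counts, and a cap of exactly 1GB on the initial allocation; the function names are made up for illustration only.

```python
# Assumptions (illustrative, not taken from the PostgreSQL source):
# 8-byte bucket pointers, ~1 tuple per bucket, power-of-two bucket counts.
POINTER_SIZE = 8
GIB = 1024 ** 3


def max_buckets(limit_bytes: int, pointer_size: int = POINTER_SIZE) -> int:
    """Largest power-of-two bucket count whose pointer array fits in limit_bytes."""
    n = limit_bytes // pointer_size
    return 1 << (n.bit_length() - 1) if n > 0 else 0


def doublings_needed(initial_buckets: int, tuples: int) -> int:
    """Number of times the bucket array doubles until it has >= tuples buckets."""
    if initial_buckets <= 0:
        raise ValueError("initial_buckets must be positive")
    steps = 0
    buckets = initial_buckets
    while buckets < tuples:
        buckets *= 2
        steps += 1
    return steps


cap = max_buckets(GIB)
print(cap)                                  # 134217728 buckets (2**27), i.e. ~130M
print(doublings_needed(cap, 500_000_000))   # 2 doublings for a 500M-tuple inner side
```

Under these assumptions a well-estimated 500M-tuple join that starts at the cap pays for two resizes, while a join with fewer than ~130M tuples never resizes at all. Whether those resizes are measurable is exactly what would need benchmarking.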

>> I believe the patch proposed by KaiGai-san is the right one to fix
>> the bug discussed in this thread. My understanding is KaiGai-san
>> withdrew the patch as he wants to extend it to address the
>> over-estimation issue.
>>
>> I don't think we should do that - IMHO that's an unrelated
>> improvement and should be addressed in a separate patch.
>>
> OK, it might not be a problem we should conclude within a few days,
> just before the beta release.

I don't quite see a reason to wait for the over-estimation patch. We
probably should backpatch the bugfix anyway (although it's much less
likely to hit the bug before 9.5), and we can't really backpatch the
behavior change to those branches (as they have no hash resize).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


