Re: DBT-3 with SF=20 got failed - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: DBT-3 with SF=20 got failed
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F801136A5F@BPXM15GP.gisp.nec.co.jp
In response to Re: DBT-3 with SF=20 got failed  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
> Hello KaiGai-san,
> 
> On 08/21/2015 02:28 AM, Kouhei Kaigai wrote:
> ...
> >>
> >> But what is the impact on queries that actually need more than 1GB
> >> of buckets? I assume we'd only limit the initial allocation and
> >> still allow the resize based on the actual data (i.e. the 9.5
> >> improvement), so the queries would start with 1GB and then resize
> >> once finding out the optimal size (as done in 9.5). The resize is
> >> not very expensive, but it's not free either, and with so many
> >> tuples (requiring more than 1GB of buckets, i.e. ~130M tuples) it's
> >> probably just noise in the total query runtime. But it'd be nice
> >> to see some proof of that ...
> >>
> > > The problem here is that we cannot know the exact size until the
> > > Hash node has read the entire inner relation. All we can do is rely
> > > on the planner's estimate; however, it often computes a crazy number
> > > of rows. I think resizing the hash buckets is a reasonable compromise.
> 
> I understand the estimation problem. The question I think we need to
> answer is how to balance the behavior for well- and poorly-estimated
> cases. It'd be unfortunate if we lower the memory consumption in the
> over-estimated case while significantly slowing down the well-estimated
> ones.
> 
> I don't think we have a clear answer at this point - maybe it's not a
> problem at all and it'll be a win no matter what threshold we choose.
> But it's a separate problem from the bugfix.
>
I agree that this is a separate (and maybe not easy) problem.

If somebody knows of previous academic research in this area, please share it with us.
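
To make the numbers in the quoted discussion concrete, here is a rough sketch of the "cap the initial allocation, then resize on actual data" idea. It is only an illustration, not the code in nodeHash.c: the function names are hypothetical, and it assumes 8-byte bucket pointers, power-of-two bucket counts, and a target of one tuple per bucket. Under those assumptions a 1GB bucket array holds 2^30 / 8 = 134,217,728 buckets, which matches the "~130M tuples" figure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest power of two that is <= n (requires n >= 1). */
static uint64_t
floor_pow2(uint64_t n)
{
    uint64_t p = 1;

    assert(n >= 1);
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Smallest power of two that is >= n (requires 1 <= n <= 2^63). */
static uint64_t
ceil_pow2(uint64_t n)
{
    uint64_t p = 1;

    assert(n >= 1 && n <= (UINT64_C(1) << 63));
    while (p < n)
        p *= 2;
    return p;
}

/*
 * Hypothetical helper: choose the initial bucket count from the planner's
 * row estimate, but never let the bucket array exceed cap_bytes.
 * Assumes one tuple per bucket and ptr_size bytes per bucket pointer.
 */
static uint64_t
initial_nbuckets(uint64_t estimated_rows, uint64_t cap_bytes, size_t ptr_size)
{
    uint64_t max_buckets;
    uint64_t wanted;

    assert(ptr_size > 0 && cap_bytes >= ptr_size);
    max_buckets = floor_pow2(cap_bytes / ptr_size);
    wanted = ceil_pow2(estimated_rows > 0 ? estimated_rows : 1);
    return wanted < max_buckets ? wanted : max_buckets;
}

/*
 * Hypothetical helper: how many doublings separate the capped initial
 * bucket count from one that covers the rows actually loaded.
 */
static int
doublings_needed(uint64_t nbuckets, uint64_t actual_rows)
{
    int n = 0;

    assert(nbuckets >= 1 && actual_rows <= (UINT64_C(1) << 63));
    while (nbuckets < actual_rows)
    {
        nbuckets *= 2;
        n++;
    }
    return n;
}
```

With a 1GB cap, an over-estimate of 10 billion rows still starts at 134,217,728 buckets, and a build that really loads 500 million tuples would be two doublings away from its final size. Whether that extra resize work is visible in the total runtime is exactly the open question.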

> >> I believe the patch proposed by KaiGai-san is the right one to fix
> >> the bug discussed in this thread. My understanding is KaiGai-san
> >> withdrew the patch as he wants to extend it to address the
> >> over-estimation issue.
> >>
> >> I don't think we should do that - IMHO that's an unrelated
> >> improvement and should be addressed in a separate patch.
> >>
> > OK, it might not be a problem we should conclude within a few days,
> > just before the beta release.
> 
> I don't quite see a reason to wait for the over-estimation patch. We
> probably should backpatch the bugfix anyway (although it's much less
> likely to run into that before 9.5), and we can't really backpatch the
> behavior change there (as there's no hash resize).
>
I won't argue against this bugfix anymore.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

