Re: accounting for memory used for BufFile during hash joins - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: accounting for memory used for BufFile during hash joins
Msg-id 20190910134751.x64idfqj6qgt37om@development
In response to Re: accounting for memory used for BufFile during hash joins  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: accounting for memory used for BufFile during hash joins
List pgsql-hackers
On Thu, Sep 05, 2019 at 09:54:33AM -0700, Melanie Plageman wrote:
>On Tue, Sep 3, 2019 at 9:36 AM Alvaro Herrera <alvherre@2ndquadrant.com>
>wrote:
>
>> On 2019-Jul-11, Tomas Vondra wrote:
>>
>> > On Wed, Jul 10, 2019 at 04:51:02PM -0700, Melanie Plageman wrote:
>>
>> > > I think implementing support for parallel hashjoin or explicitly
>> > > disabling it would be the bare minimum for this patch, which is why I
>> > > made 2 its own item. I've marked it as returned to author for this
>> > > reason.
>> >
>> > OK. I'm a bit confused / unsure what exactly our solution to the various
>> > hashjoin issues is. I have not been paying attention to all the various
>> > threads, but I thought we kinda pivoted to the BNL approach, no? I'm not
>> > against pushing this patch (the slicing one) forward and then maybe add
>> > BNL on top.
>>
>> So what's a good way forward for this patch?  Stalling forever like a
>> glacier is not an option; it'll probably end up melting.  There's a lot
>> of discussion on this thread which I haven't read, and it's not
>> immediately clear to me whether this patch should just be thrown away in
>> favor of something completely different, or it can be considered a first
>> step in a long road.
>>
>
>So, I have been working on the fallback to block nested loop join
>patch--latest non-parallel version posted here [1]. I am currently
>still working on the parallel version but don't have a complete
>working patch yet. I am hoping to finish it and solicit feedback in
>the next couple weeks.
>
>My patch chunks up a bad inner side batch and processes it a chunk
>at a time. I haven't spent too much time yet thinking about Hubert's
>suggestion proposed upthread. In the past I had asked Tomas about the
>idea of splitting up only "bad batches" to avoid having other batches
>which are very small. It seemed like this introduced additional
>complexity for future spilled tuples finding a home, however, I had
>not considered the hash function chain method Hubert is mentioning.
>
>Even if we implemented additional strategies like the one Hubert is
>suggesting, I still think that both the slicing patch originally
>proposed in this thread as well as a BNLJ fallback option could all
>work together, as I believe they solve slightly different problems.
>

I have to admit I kinda lost track of how exactly all the HJ patches
posted in various -hackers threads are supposed to work together in the
end. We have far too many in-flight patches dealing with this part of the
code at the moment. It's a bit like with buses - for years there were no
patches fixing those issues, and now we have 17 ;-)

My feeling is that we should get the BNLJ committed first, and then maybe
use some of those additional strategies as fallbacks (depending on which
issues are still unsolved by the BNLJ).

>If Tomas or someone else has time to pick up and modify the BufFile
>accounting patch, committing that still seems like the next logical
>step.
>

OK, I'll look into that (i.e. considering BufFile memory during planning,
and disabling HJ if not possible).
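
Purely for illustration, a minimal sketch of what such a planning-time
check could look like. It assumes each batch needs two BufFiles (one for
the inner side, one for the outer side) and that each BufFile holds a
single block-sized buffer (8kB with the default block size). The names
sketch_buffile_memory and sketch_hashjoin_fits are hypothetical and do not
come from any posted patch, and comparing against a single memory budget
is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one block-sized buffer per BufFile (8kB by default). */
#define SKETCH_BLCKSZ ((size_t) 8192)

/*
 * Hypothetical helper: rough upper bound on the memory consumed by the
 * batch files alone, i.e. one inner and one outer BufFile per batch.
 */
size_t
sketch_buffile_memory(size_t nbatch)
{
    assert(nbatch > 0);
    return nbatch * 2 * SKETCH_BLCKSZ;
}

/*
 * Hypothetical planner-side check: the hash join only "fits" if the
 * in-memory hash table plus the BufFile buffers stay within the budget.
 * A caller could treat a false result as "do not consider hash join".
 */
bool
sketch_hashjoin_fits(size_t nbatch, size_t hashtable_bytes,
                     size_t budget_bytes)
{
    return hashtable_bytes + sketch_buffile_memory(nbatch) <= budget_bytes;
}
```

With these assumptions, 1024 batches already need 16MB just for the file
buffers, so a 4MB budget is exceeded before a single tuple is loaded.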

>I will work on getting a complete (parallel-aware) BNLJ patch posted
>soon.
>

Good!


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



