Re: accounting for memory used for BufFile during hash joins - Mailing list pgsql-hackers

From	Melanie Plageman
Subject	Re: accounting for memory used for BufFile during hash joins
Date	September 5, 2019 16:54:33
Msg-id	CAAKRu_b6+jC93WP+pWxqK5KAZJC5Rmxm8uquKtEf-KQ++1Li6Q@mail.gmail.com
In response to	Re: accounting for memory used for BufFile during hash joins (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses	Re: accounting for memory used for BufFile during hash joins
List	pgsql-hackers

On Tue, Sep 3, 2019 at 9:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2019-Jul-11, Tomas Vondra wrote:
>
> > On Wed, Jul 10, 2019 at 04:51:02PM -0700, Melanie Plageman wrote:
> >
> > > I think implementing support for parallel hashjoin or explicitly
> > > disabling it would be the bare minimum for this patch, which is why I
> > > made 2 its own item. I've marked it as returned to author for this
> > > reason.
> >
> > OK. I'm a bit confused / unsure what exactly our solution to the various
> > hashjoin issues is. I have not been paying attention to all the various
> > threads, but I thought we kinda pivoted to the BNL approach, no? I'm not
> > against pushing this patch (the slicing one) forward and then maybe add
> > BNL on top.
>
> So what's a good way forward for this patch? Stalling forever like a
> glacier is not an option; it'll probably end up melting. There's a lot
> of discussion on this thread which I haven't read, and it's not
> immediately clear to me whether this patch should just be thrown away in
> favor of something completely different, or it can be considered a first
> step in a long road.

So, I have been working on the fallback to block nested loop join
patch--latest non-parallel version posted here [1]. I am currently
still working on the parallel version but don't have a complete
working patch yet. I am hoping to finish it and solicit feedback in
the next couple weeks.

My patch chunks up a bad inner side batch and processes it a chunk
at a time. I haven't spent too much time yet thinking about Hubert's
suggestion upthread. In the past I had asked Tomas about the
idea of splitting up only "bad batches" to avoid having other batches
which are very small. It seemed like this introduced additional
complexity for future spilled tuples finding a home; however, I had
not considered the hash function chain method Hubert is mentioning.
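
To make the chunking idea concrete, here is a rough sketch in Python. It
is not the patch (which is C in the executor) and only illustrates the
general shape: the inner side of one oversized batch is cut into chunks
that fit in memory, a hash table is built per chunk, and the whole outer
batch is probed against each chunk in turn. The names
join_batch_in_chunks and chunk_size are made up for illustration, and
the sketch assumes plain inner-join semantics on (key, payload) tuples.

```python
from collections import defaultdict
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def join_batch_in_chunks(inner, outer, chunk_size):
    """Inner-join two lists of (key, payload) tuples for a single batch.

    Hypothetical illustration: `chunk_size` stands in for the number of
    inner tuples that fit in the memory budget. Only one chunk's hash
    table exists at a time; the outer side is rescanned once per chunk.
    """
    results = []
    for chunk in chunked(inner, chunk_size):
        # Build a hash table over just this chunk of the inner side.
        table = defaultdict(list)
        for key, payload in chunk:
            table[key].append(payload)
        # Probe the entire outer batch against the current chunk.
        for key, payload in outer:
            for inner_payload in table.get(key, ()):
                results.append((key, inner_payload, payload))
    return results
```

The cost of this approach is that the outer batch is read once per
chunk. An outer join would additionally need to remember which outer
tuples found a match in any chunk, which this sketch does not attempt.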

Even if we implemented additional strategies like the one Hubert is
suggesting, I still think that the slicing patch originally
proposed in this thread and a BNLJ fallback option could
work together, as I believe they solve slightly different problems.

If Tomas or someone else has time to pick up and modify the BufFile
accounting patch, committing that still seems like the next logical
step.

I will work on getting a complete (parallel-aware) BNLJ patch posted
soon.

[1] https://www.postgresql.org/message-id/CAAKRu_ZsRU%2BnszShs3AGVorx%3De%2B2jYkL7X%3DjiNO6%2Bqbho7vRpw%40mail.gmail.com

Melanie Plageman
