Re: pgsql: Add parallel-aware hash joins. - Mailing list pgsql-committers

From Thomas Munro
Subject Re: pgsql: Add parallel-aware hash joins.
Date
Msg-id CAEepm=3SeFvsfnnOLSA3tLtBe-rtyL=c+vfzyPCsViBjk521qw@mail.gmail.com
In response to Re: pgsql: Add parallel-aware hash joins.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pgsql: Add parallel-aware hash joins.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-committers
On Sun, Dec 31, 2017 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> You mentioned that prairiedog sees the problem about one time in
>> thirty.  Would you mind checking if it goes away with this patch
>> applied?
>
> I've run 55 cycles of "make installcheck" without seeing a failure
> with this patch installed.  That's not enough to be totally sure
> of course, but I think this probably fixes it.

Thanks!

> However ... I noticed that my other dinosaur gaur shows the other failure
> mode we see in the buildfarm, the "increased_batches = t" diff, and
> I can report that this patch does *not* help that.  The underlying
> EXPLAIN output goes from something like
>
> !                            Buckets: 4096  Batches: 8  Memory Usage: 208kB
>
> to something like
>
> !                            Buckets: 4096 (originally 4096)  Batches: 16 (originally 8)  Memory Usage: 176kB
>
> so again we have a case where the plan didn't change but the execution
> behavior did.  This isn't quite 100% reproducible on gaur/pademelon,
> but it fails more often than not seems like, so I can poke into it
> if you can say what info would be helpful.

Right.  That's apparently unrelated and is the last buildfarm issue
on my list (so far).  I had noticed that certain BF animals are prone
to that particular failure, and they mostly have architectures that I
don't have so a few things are probably just differently sized.  At
first I thought I'd tweak the tests so that the parameters were always
stable, and I got as far as installing Debian on qemu-system-ppc (it
took a looong time to compile PostgreSQL), but that seems a bit cheap
and flimsy... better to fix the size estimation error.

I assume that what happens here is the planner's size estimation code
sometimes disagrees with Parallel Hash's chunk-based memory
accounting, even though in this case we had perfect tuple count and
tuple size information.  In an earlier version of the patch set I
refactored the planner to be chunk-aware (even for parallel-oblivious
hash join), but later in the process I tried to simplify and shrink
the patch set and avoid making unnecessary changes to non-Parallel
Hash code paths.  I think I'll need to make the planner aware of the
maximum amount of fragmentation possible when the join is parallel-aware
(something like: up to one tuple's worth at the end of each chunk, and
up to one whole wasted chunk per participating backend).  More soon.
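
To make that bound concrete, here is a rough sketch of the arithmetic,
not the actual planner code.  The function name, the fixed tuple size
and the 32kB chunk size are all assumptions made for illustration, and
per-chunk header overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed chunk size, for illustration only. */
#define SKETCH_CHUNK_SIZE ((size_t) 32 * 1024)

/*
 * Hypothetical worst-case memory estimate for chunk-based accounting.
 *
 * Assumes every tuple occupies exactly tuple_size bytes.  Each chunk
 * holds only whole tuples, so up to one tuple's worth of space is lost
 * at the end of each chunk.  On top of that, one whole chunk per
 * participating backend is counted as potentially wasted.
 */
size_t
sketch_worst_case_bytes(size_t ntuples, size_t tuple_size,
                        size_t nparticipants)
{
    size_t tuples_per_chunk;
    size_t chunks_for_tuples;

    assert(tuple_size > 0 && tuple_size <= SKETCH_CHUNK_SIZE);

    /* Whole tuples only; the remainder of the chunk is the tail waste. */
    tuples_per_chunk = SKETCH_CHUNK_SIZE / tuple_size;

    /* Round up: a partially filled chunk still costs a full chunk. */
    chunks_for_tuples = (ntuples + tuples_per_chunk - 1) / tuples_per_chunk;

    /* Add one possibly wasted chunk per participating backend. */
    return (chunks_for_tuples + nparticipants) * SKETCH_CHUNK_SIZE;
}
```

Under those assumptions, 1000 tuples of 100 bytes with 3 participants
come to 7 chunks (229376 bytes), where a naive tuple-count times
tuple-size estimate would give only 100000 bytes.  That is the kind of
gap that could tip execution into an extra batch doubling.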

-- 
Thomas Munro
http://www.enterprisedb.com

