Re: pgsql: Add parallel-aware hash joins. - Mailing list pgsql-committers

From	Thomas Munro
Subject	Re: pgsql: Add parallel-aware hash joins.
Date	December 22, 2017 14:16:10
Msg-id	CAEepm=0WxwzpHVHt3PcWHBV=L3k3FDb6dvMq1A2Li49LGBa7TA@mail.gmail.com
In response to	Re: pgsql: Add parallel-aware hash joins. (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: pgsql: Add parallel-aware hash joins. (Andres Freund <andres@anarazel.de>)
List	pgsql-committers

On Fri, Dec 22, 2017 at 1:48 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I don't think that's quite it, because it should never have set
> 'writing' for any batch number >= nbatch.
>
> It's late here, but I'll take this up tomorrow and either find a fix
> or figure out how to avoid antisocial noise levels on the build farm
> in the meantime.

Not there yet but I learned some things and am still working on it.  I
spent a lot of time trying to reproduce the assertion failure, and
succeeded exactly once.  Unfortunately the one time I managed to do
that I'd built with clang -O2 and got a core file that I couldn't get
much useful info out of, and I've been trying to do it again with -O0
ever since without luck.  The time I succeeded, I reproduced it by
creating the tables "simple" and "bigger_than_it_looks" from join.sql
and then doing this in a loop:

  set min_parallel_table_scan_size = 0;
  set parallel_setup_cost = 0;
  set work_mem = '192kB';

  explain analyze select count(*)
    from simple r join bigger_than_it_looks s using (id);
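
The message does not show the loop itself.  As a rough sketch only, one
way to drive it is to feed the statements above to psql repeatedly and
stop at the first non-zero exit status.  The helper names, the database
name "regression", and the iteration cap are assumptions made for
illustration, and the tables "simple" and "bigger_than_it_looks" are
assumed to exist already:

```python
import subprocess

# The settings and query quoted in the message above.
REPRO_SQL = """\
set min_parallel_table_scan_size = 0;
set parallel_setup_cost = 0;
set work_mem = '192kB';
explain analyze select count(*) from simple r join bigger_than_it_looks s using (id);
"""


def build_psql_command(dbname):
    """Build a psql invocation that reads the script from stdin.

    -X skips ~/.psqlrc, -q suppresses chatter, and ON_ERROR_STOP makes
    psql exit with a non-zero status when a statement fails.
    """
    return ["psql", "-X", "-q", "-v", "ON_ERROR_STOP=1", "-d", dbname, "-f", "-"]


def run_until_failure(dbname="regression", max_iterations=1000):
    """Run the script repeatedly; return (iteration, stderr) on first failure.

    Returns (None, "") if every iteration exits with status 0.
    The default dbname and iteration cap are hypothetical.
    """
    for i in range(1, max_iterations + 1):
        result = subprocess.run(
            build_psql_command(dbname),
            input=REPRO_SQL,
            text=True,
            capture_output=True,
        )
        if result.returncode != 0:
            return i, result.stderr
    return None, ""
```

Calling run_until_failure() would then report the iteration on which
psql first exited abnormally, for example after losing its connection
when a backend crashes.  Given how rarely the failure reproduced, a
much larger max_iterations may be needed.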

The machine that it happened on is resource constrained, and exhibits
another problem: though the above query normally runs in ~20ms,
sometimes it takes several seconds and occasionally much longer.  That
never happens on fast development systems or test servers which run it
quickly every time, and it doesn't happen on my 2 core slow system if
I have only two workers (or one worker + leader).  I dug into that and
figured out what was going wrong and wrote that up separately[1],
because I think it's an independent bug needing to be fixed, not the
root cause here.  However, I think it could easily be contributing to
the timing required to trigger the bug we're looking for.

Andres, your machine francolin crashed -- got a core file?

[1] https://www.postgresql.org/message-id/CAEepm%3D0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com
