Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90
Msg-id CA+Tgmob-jDKc8DUTzsCsQEmX7JNqHQC9u3=059WaO95YLwoaAA@mail.gmail.com
In response to Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Aug 28, 2017 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That sentence isn't wrong as written.
>
> Count the "that"s (you're failing to notice the next line).

OK, true.  But "Academic literature" -> "The academic literature" is
just second-guessing, I think.

>> I don't really understand the part about depending on a parallel-aware
>> node.  I mean, there should always be one, except in the
>> single-copy-Gather case, but why is it right to depend on that rather
>> than anything else?  What happens when the Parallel Hash patch goes in
>> and we have multiple parallel-aware scan nodes (plus a parallel-aware
>> Hash node) under the same Gather?
>
> Well, that's what I'm asking.  AFAICS we only really need the scan node(s)
> to be marked as depending on the Gather's rescan parameter.  It would not,
> however, hurt anything for nodes above them to be so marked as well ---
> and even if we didn't explicitly mark them, those nodes would end up
> depending on the parameter anyway because of the way that parameter
> dependency propagation works.  I think the question boils down to whether
> a "parallel_aware" node would ever not be underneath a related Gather.

There should never be a parallel_aware node that's not beneath a
Gather or Gather Merge; I don't know what the meaning of such a plan
would be, so I think we're safe against such a thing appearing in the
future.  What I'm unclear about is what happens with nodes that aren't
directly in the chain between the Gather and the parallel-aware node.
For instance:

Something
-> Gather
  -> Merge Join
    -> Sort
      -> Parallel Seq Scan on a
    -> Merge Join
      -> Sort
        -> Seq Scan on b
      -> Sort
        -> Seq Scan on c
 

If the Gather gets rescanned, is it OK to force a re-sort of a but not
of b or c?  Hmm, maybe so.  The workers are going to have to do the
sorts of b and c since any workers from a previous scan are GONE, but
if the leader's done that work, it's still good.  Similarly:

Something
-> Gather
  -> Merge Join
    -> Sort
      -> Parallel Seq Scan on a
    -> Hash Join
      -> Index Scan on b
      -> Hash
        -> Seq Scan on c
 

If the leader's got an existing hash table built on c, it can reuse
it.  The workers will need to build one.  Now consider Parallel Hash
(not yet committed), where we might get this:

Something
-> Gather
  -> Merge Join
    -> Sort
      -> Parallel Seq Scan on a
    -> Hash Join
      -> Index Scan on b
      -> Parallel Hash
        -> Parallel Seq Scan on c
 

Now what?  We clearly still need to force a re-sort of a, but what
about the shared hash table built on c?  If we've still got that hash
table and it was a single-batch join, there's probably no need to
rescan it even though both the Parallel Hash node and the Parallel Seq
Scan beneath it are parallel-aware.  In this case all workers
collaborated to build a shared hash table covering all rows from c; if
we've still got that, it's still good.  In fact, the workers can
reuse that hash table too, even though for them it won't really be a
re-scan at all.
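
To make that last case concrete, here is a toy Python model of the marking rule under discussion: every parallel-aware node beneath a Gather is marked as depending on the Gather's rescan parameter, and each parent inherits the union of its children's dependencies. This is only a sketch. `Node`, `RESCAN`, `propagate`, and the choice to drop the parameter at the Gather itself are assumptions made for illustration, not PostgreSQL's planner code.

```python
from dataclasses import dataclass, field

RESCAN = "gather_rescan"  # hypothetical name for the Gather's rescan parameter


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    parallel_aware: bool = False
    params: set = field(default_factory=set)  # parameters this node depends on


def propagate(node, under_gather=False):
    """Mark parallel-aware nodes below a Gather, then union dependencies upward."""
    is_gather = node.name in ("Gather", "Gather Merge")
    deps = set()
    if under_gather and node.parallel_aware:
        deps.add(RESCAN)
    for child in node.children:
        deps |= propagate(child, under_gather or is_gather)
    if is_gather:
        # Assumption: the Gather supplies the parameter, so it does not
        # leak to nodes above the Gather.
        deps.discard(RESCAN)
    node.params = deps
    return deps


# The third plan above (the Parallel Hash case).
scan_a = Node("Parallel Seq Scan on a", parallel_aware=True)
sort_a = Node("Sort", [scan_a])
index_b = Node("Index Scan on b")
scan_c = Node("Parallel Seq Scan on c", parallel_aware=True)
hash_c = Node("Parallel Hash", [scan_c], parallel_aware=True)
hash_join = Node("Hash Join", [index_b, hash_c])
merge_join = Node("Merge Join", [sort_a, hash_join])
gather = Node("Gather", [merge_join])

propagate(gather)

for n in (sort_a, scan_a, hash_join, index_b, hash_c, scan_c):
    status = "rescan forced" if RESCAN in n.params else "may be reused"
    print(f"{n.name}: {status}")
```

In this model the Sort over a is forced to rerun, which is what we want, but the Parallel Hash and the scan of c are marked too, even though a surviving single-batch shared hash table could probably be reused. Only the Index Scan on b stays unmarked.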

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


