Re: Redesigning the executor (async, JIT, memory efficiency) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Redesigning the executor (async, JIT, memory efficiency)
Date
Msg-id CA+TgmoZYnDhA_fsWaJdC60H-kL9KZYV8KRdUx0znPLo1q-p=Ag@mail.gmail.com
Whole thread Raw
In response to Re: Redesigning the executor (async, JIT, memory efficiency)  (Andres Freund <andres@anarazel.de>)
Responses Re: Redesigning the executor (async, JIT, memory efficiency)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, May 25, 2018 at 2:40 AM, Andres Freund <andres@anarazel.de> wrote:
> I think we're going to have to continue showing the tree plan. I think
> the linear version is far too hard to understand for anything
> nontrivial.

Some of that is because this format is intrinsically hard to read, but
it's also partly because of unfamiliarity and partly because you
haven't put much work into it.  You could certainly make it easier to
read just by expending a little more effort on it:

0: Initialize Sort (for step 7)
1: Initialize Seq Scan on lineitem (for step 2)
2: Iterate Seq Scan on linitem into slot 0
    If no more tuples, go to step 5.
3: Test slot 0 using <deparsed qual>
    If failed, go to step 2.
4: Load Hash Aggregate from slot 0.
    Go to step 2.
5: Retrieve from Hash Aggregate into slot 1.
    If no more tuples, go to step 7.
etc.

Even with this sort of TLC, I agree that reading this is no picnic,
but I think continuing to show the tree structure when the executing
plan has been linearized seems to have its own set of difficulties.
The way that time is reported today in EXPLAIN ANALYZE presumes that
execute recurses down the tree; that's how times for higher nodes end
up including times for subordinate nodes.  If we time each step
executed using this method, that will no longer be true.  We could
make it true.  For example we could make the Seq Scan time include
steps 1, 2, and 3, the Hash Aggregate on top of it include that time
plus steps 4 and 5, the Sort on top of that also include steps 0, 6,
7, and 8, etc.  But that sucks.  If we've got timing information for
each step individually, that's awesome, and power users are going to
want to see it.  One of the most common questions one has when looking
at today's EXPLAIN ANALYZE output is "where did the time go?" and we
want to provide the most accurate possible answer to that question.

Also, consider a plan that today looks like this:

Append
-> Seq Scan on p1
-> Seq Scan on p2
-> Seq Scan on p3

The linearized plan could omit the Append node altogether.  We just
went and made Append nodes have a cost because they do, but in cases
where the Append is neither parallel-aware nor the new async-capable
Append type we're presumably going to add nor doing partition pruning,
we can and probably should optimize it away completely.  That will
change which plan is fastest in some cases, I'm fairly sure.  It
doesn't keep you from reporting on the Append node as if it still
existed, but if you do, you're sorta lying.  Another case that I think
is kind of interesting is:

Nested Loop Semi Join
-> Seq Scan
-> Index Scan
    Index Cond: ...

If the index cond passes, we can emit the tuple returned by the Seq
Scan and jump back to fetch the next tuple from the Seq Scan.  Perhaps
the Nested Loop doesn't really exist at all.  If the join is something
other than a semi/anti join then you probably need the join to
project, unless all of the columns needed are from one side, but
that's all, unless the join itself has a Join Filter.  I think users
are going to want to see EXPLAIN ANALYZE output that makes it really
clear how the query has been optimized.  Sure, the existing format may
be easier for users in the short term, and certainly can be an option,
but seeing something that conceals rather than reveals how the
execution is really progressing may not be all that great in the end.

Maybe there are other ways of displaying the linearized plan that
would increase comprehensibility further.  For example, in your
version of the plan for TPC-H Q01, steps 0-1 are initialization, 2-4
are a loop, 5-6 are a loop, 7 is an independent step, 8-9 are another
loop, and 10 represents completion.  Maybe that can be used somehow to
make something easier to understand.  For example:

Initialize
-> Initialize Sort 1
-> Initiialize Seq Scan on lineitem
Loop
-> Iterate Seq Scan on lineitem
    Break if no more tuples
-> Continue if ...
-> Load Into Hash Aggregate 1
Loop
-> Fetch From Hash Aggregate 1
-> Load Into Sort 1
Perform Sort 1
Loop
-> Fetch from Sort 1
-> Return Tuple

Maybe that doesn't seem that much easier, but at least it doesn't
involve needing to understand step numbers.  It strikes me that this
kind of organization could potentially also be useful for a query
progress indicator.  If the whole plan is a series of loops and other
steps, you can say things like -- well, the steps we've already
completed represent 30% of the estimated cost.  The steps we haven't
started yet represent another 30% of the estimated cost.  So we're
somewhere between 30% done and 70% done.  And then we can guess what
fraction of the iterations we've completed for the loop we're
currently performing.  Obviously this is all pie-in-the-sky, but it
certainly seems like this kind of thing would enable estimation that
we can't easily perform today.

How will a linearized plan handle handle backward fetches?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Next
From: Andrew Gierth
Date:
Subject: Re: SPI/backend equivalent of extended-query Describe(statement)?