Re: Redesigning the executor (async, JIT, memory efficiency) - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Redesigning the executor (async, JIT, memory efficiency)
Msg-id: CA+TgmoZYnDhA_fsWaJdC60H-kL9KZYV8KRdUx0znPLo1q-p=Ag@mail.gmail.com
In response to: Re: Redesigning the executor (async, JIT, memory efficiency) (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Fri, May 25, 2018 at 2:40 AM, Andres Freund <andres@anarazel.de> wrote:
> I think we're going to have to continue showing the tree plan. I think
> the linear version is far too hard to understand for anything
> nontrivial.

Some of that is because this format is intrinsically hard to read, but
it's also partly because of unfamiliarity and partly because you
haven't put much work into it. You could certainly make it easier to
read just by expending a little more effort on it:

0: Initialize Sort (for step 7)
1: Initialize Seq Scan on lineitem (for step 2)
2: Iterate Seq Scan on lineitem into slot 0
   If no more tuples, go to step 5.
3: Test slot 0 using <deparsed qual>
   If failed, go to step 2.
4: Load Hash Aggregate from slot 0.
   Go to step 2.
5: Retrieve from Hash Aggregate into slot 1.
   If no more tuples, go to step 7.
etc.

Even with this sort of TLC, I agree that reading this is no picnic,
but continuing to show the tree structure when the executing plan has
been linearized has its own set of difficulties. The way that time is
reported today in EXPLAIN ANALYZE presumes that execution recurses
down the tree; that's how times for higher nodes end up including
times for subordinate nodes. If we time each step executed using this
method, that will no longer be true. We could make it true. For
example, we could make the Seq Scan time include steps 1, 2, and 3;
the Hash Aggregate on top of it include that time plus steps 4 and 5;
the Sort on top of that also include steps 0, 6, 7, and 8; etc. But
that sucks. If we've got timing information for each step
individually, that's awesome, and power users are going to want to see
it. One of the most common questions one has when looking at today's
EXPLAIN ANALYZE output is "where did the time go?", and we want to
provide the most accurate possible answer to that question.
Also, consider a plan that today looks like this:

Append
-> Seq Scan on p1
-> Seq Scan on p2
-> Seq Scan on p3

The linearized plan could omit the Append node altogether. We just
went and made Append nodes have a cost, because they do; but in cases
where the Append is neither parallel-aware, nor the new async-capable
Append type we're presumably going to add, nor doing partition
pruning, we can and probably should optimize it away completely. That
will change which plan is fastest in some cases, I'm fairly sure. It
doesn't keep you from reporting on the Append node as if it still
existed, but if you do, you're sorta lying.

Another case that I think is kind of interesting is:

Nested Loop Semi Join
-> Seq Scan
-> Index Scan
   Index Cond: ...

If the index cond passes, we can emit the tuple returned by the Seq
Scan and jump back to fetch the next tuple from the Seq Scan. Perhaps
the Nested Loop doesn't really exist at all. If the join is something
other than a semi/anti join, then you probably need the join to
project, unless all of the columns needed are from one side; but
that's all, unless the join itself has a Join Filter.

I think users are going to want to see EXPLAIN ANALYZE output that
makes it really clear how the query has been optimized. Sure, the
existing format may be easier for users in the short term, and it can
certainly remain an option, but output that conceals rather than
reveals how execution is really progressing may not be all that great
in the end.

Maybe there are other ways of displaying the linearized plan that
would increase comprehensibility further. For example, in your
version of the plan for TPC-H Q01, steps 0-1 are initialization, 2-4
are a loop, 5-6 are a loop, 7 is an independent step, 8-9 are another
loop, and 10 represents completion. Maybe that structure can be used
somehow to make something easier to understand.
For example:

Initialize
-> Initialize Sort 1
-> Initialize Seq Scan on lineitem
Loop
-> Iterate Seq Scan on lineitem
   Break if no more tuples
-> Continue if ...
-> Load Into Hash Aggregate 1
Loop
-> Fetch From Hash Aggregate 1
-> Load Into Sort 1
Perform Sort 1
Loop
-> Fetch from Sort 1
-> Return Tuple

Maybe that doesn't seem that much easier, but at least it doesn't
involve needing to understand step numbers.

It strikes me that this kind of organization could potentially also be
useful for a query progress indicator. If the whole plan is a series
of loops and other steps, you can say things like: well, the steps
we've already completed represent 30% of the estimated cost, and the
steps we haven't started yet represent another 30% of the estimated
cost, so we're somewhere between 30% done and 70% done. And then we
can guess what fraction of the iterations we've completed for the loop
we're currently performing. Obviously this is all pie-in-the-sky, but
it certainly seems like this kind of organization would enable
estimation that we can't easily perform today.

How will a linearized plan handle backward fetches?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company