Thread: match_unsorted_outer() vs. cost_nestloop()
In joinpath.c, match_unsorted_outer() considers materializing the inner
side of each nested loop if the inner path is not an index scan, bitmap
heap scan, tid scan, material path, function scan, CTE scan, or
worktable scan. In costsize.c, cost_nestloop() charges the startup cost
only once if the inner path is a hash path or material path; otherwise,
it charges it for every anticipated rescan.

It seems to me, perhaps naively, like the criteria used in these two
places are more different than they maybe should be. For example,
function scan nodes insert their results into a tuplestore so that
rescans get the same set of tuples, which is why we don't consider
inserting a materialize node over them in match_unsorted_outer() - but
I think that also means that we oughtn't to be counting the startup
cost for every rescan.

I'm not exactly sure which ones should match or not match. Hash paths,
maybe, shouldn't. I believe the reason why we don't count the startup
cost of the hash path over again is that we're assuming it's
attributable to the cost of building the hash table, which only needs
to be done once. I don't think that's 100% accurate, because the hash
path could have inherited some of that cost from its underlying paths.
At any rate, it's conceivable that materializing could be enough
cheaper than repeating the join that a materialize node makes sense.

Thoughts?

...Robert
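For reference, here is a rough paraphrase of the two checks being
compared - a simplified sketch, not the actual source; the node-tag and
variable names follow the tree of that era but are abridged:

    /* match_unsorted_outer(): worth sticking a Material node on top? */
    consider_material =
        !(IsA(inner_cheapest_total, IndexPath) ||
          IsA(inner_cheapest_total, BitmapHeapPath) ||
          IsA(inner_cheapest_total, TidPath) ||
          IsA(inner_cheapest_total, MaterialPath) ||
          inner_cheapest_total->pathtype == T_FunctionScan ||
          inner_cheapest_total->pathtype == T_CteScan ||
          inner_cheapest_total->pathtype == T_WorkTableScan);

    /* cost_nestloop(): does the inner startup cost repeat per rescan? */
    if (IsA(inner_path, MaterialPath) || IsA(inner_path, HashPath))
        ;                       /* charge inner startup cost only once */
    else
        run_cost += (outer_path_rows - 1) * inner_path->startup_cost;

Note the mismatch Robert describes: the first test exempts the
tuplestore-backed scan types from materialization, while the second
still charges them full startup cost on every rescan.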
Robert Haas <robertmhaas@gmail.com> writes:
> In joinpath.c, match_unsorted_outer() considers materializing the
> inner side of each nested loop if the inner path is not an index scan,
> bitmap heap scan, tid scan, material path, function scan, CTE scan, or
> worktable scan. In costsize.c, cost_nestloop() charges the startup
> cost only once if the inner path is a hash path or material path;
> otherwise, it charges it for every anticipated rescan.

> It seems to me, perhaps naively, like the criteria used in these two
> places are more different than they maybe should be.

They are considering totally different effects, so I'm not sure I
follow that conclusion.

I'll certainly concede that the costing of materialize plans is rather
bogus --- it's been a long time since materialize behaved the way
cost_material thinks it does (ie, read the whole input before handing
anything back). But our cost model doesn't have a way to represent the
true value of a materialize node, which is that a re-read is a lot
cheaper than the original fetch. I've occasionally tried to think of a
way to deal with that without introducing a lot of extra calculations
and complexity everywhere else ...

        regards, tom lane
On Fri, Sep 4, 2009 at 9:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> In joinpath.c, match_unsorted_outer() considers materializing the
>> inner side of each nested loop if the inner path is not an index scan,
>> bitmap heap scan, tid scan, material path, function scan, CTE scan, or
>> worktable scan. In costsize.c, cost_nestloop() charges the startup
>> cost only once if the inner path is a hash path or material path;
>> otherwise, it charges it for every anticipated rescan.
>
>> It seems to me, perhaps naively, like the criteria used in these two
>> places are more different than they maybe should be.
>
> They are considering totally different effects, so I'm not sure I
> follow that conclusion.
>
> I'll certainly concede that the costing of materialize plans is rather
> bogus --- it's been a long time since materialize behaved the way
> cost_material thinks it does (ie, read the whole input before handing
> anything back). But our cost model doesn't have a way to represent the
> true value of a materialize node, which is that a re-read is a lot
> cheaper than the original fetch. I've occasionally tried to think of a
> way to deal with that without introducing a lot of extra calculations
> and complexity everywhere else ...

I noticed that, too, and I don't know what to do about it either. I
guess my point is that for node types that dump their output into a
tuplestore anyway, it doesn't seem like cost_nestloop() should charge
n * the startup cost. I believe that at least function, CTE, and
worktable scans fall into this category. No?

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> I guess my point is that for node types that dump their output into a
> tuplestore anyway, it doesn't seem like cost_nestloop() should charge
> n * the startup cost. I believe that at least function, CTE, and
> worktable scans fall into this category. No?

Yeah, probably. The comment is correct as is:

 * their sum.  What's not so clear is whether the inner path's
 * startup_cost must be paid again on each rescan of the inner path.  This
 * is not true if the inner path is materialized or is a hashjoin, but
 * probably is true otherwise.

What's not correct is the code's expansion of "is materialized" as "is
a MaterialPath".

However, I'm not sure it's worth just adding these other
tuplestore-using types to the list. We really ought to think a bit
harder about representing the difference between initial scan cost and
rescan cost.

It might be sufficient to have cost_nestloop just hardwire the
knowledge that certain inner path types have a different behavior here
--- that is, for a rescan there is zero start cost and some very low
per-tuple cost, independent of the path's nominal cost values (which
would now be defined as always the costs for the first scan). And maybe
the same in cost_mergejoin. Offhand I don't think anyplace else really
needs to think about rescan costs.

I think this would be enough to deal with the issue for those plan
types that materialize their output, because they all have about the
same runtime behavior in this regard. What gets more exciting is if
you'd like to model other effects this way --- for example, the one
that rescanning an indexscan is probably a lot cheaper than the
original fetch because of caching effects. But we already have that
sort of thing accounted for (to some extent anyway) elsewhere, so I
think we can probably ignore it here.

        regards, tom lane
I wrote:
> It might be sufficient to have cost_nestloop just hardwire the knowledge
> that certain inner path types have a different behavior here --- that
> is, for a rescan there is zero start cost and some very low per-tuple
> cost, independent of the path's nominal cost values (which would now
> be defined as always the costs for the first scan). And maybe the same
> in cost_mergejoin. Offhand I don't think anyplace else really needs to
> think about rescan costs.

After thinking about that a bit more, I think the best way might be to
create a "cost_rescan" function that is given a Path and returns the
startup cost and total cost to be assumed for a rescan of this Path. It
would know about the special behavior of MaterialPath and the other
tuplestore-using plan types, and for everything else would just return
the path's regular costs.

Alternatively we could create a cost_foo_rescan() function paralleling
each cost_foo() function, but given the small number of distinct
behaviors I think that would be fairly redundant and hard to maintain.

        regards, tom lane
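A minimal sketch of what such a cost_rescan function might look like,
assuming the behavior described above (zero rescan startup cost and a
nominal per-tuple charge for the tuplestore-backed node types); this is
an illustration of the idea, not the function as eventually committed:

    static void
    cost_rescan(Path *path,
                Cost *rescan_startup_cost, Cost *rescan_total_cost)
    {
        switch (path->pathtype)
        {
            case T_FunctionScan:
            case T_CteScan:
            case T_WorkTableScan:
            case T_Material:
                /*
                 * These node types re-read a tuplestore on rescan: no
                 * startup cost, just a cheap per-tuple fetch charge.
                 */
                *rescan_startup_cost = 0;
                *rescan_total_cost = cpu_tuple_cost * path->parent->rows;
                break;
            default:
                /* everything else repeats its nominal first-scan costs */
                *rescan_startup_cost = path->startup_cost;
                *rescan_total_cost = path->total_cost;
                break;
        }
    }

cost_nestloop (and cost_mergejoin) would then call this once per inner
path and multiply the rescan costs by the expected number of rescans,
instead of special-casing MaterialPath and HashPath inline.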
On Sat, Sep 5, 2009 at 8:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> It might be sufficient to have cost_nestloop just hardwire the knowledge
>> that certain inner path types have a different behavior here --- that
>> is, for a rescan there is zero start cost and some very low per-tuple
>> cost, independent of the path's nominal cost values (which would now
>> be defined as always the costs for the first scan). And maybe the same
>> in cost_mergejoin. Offhand I don't think anyplace else really needs to
>> think about rescan costs.
>
> After thinking about that a bit more, I think the best way might be
> to create a "cost_rescan" function that is given a Path and returns
> the startup cost and total cost to be assumed for a rescan of this Path.
> It would know about the special behavior of MaterialPath and the other
> tuplestore-using plan types, and for everything else would just return
> the path's regular costs.
>
> Alternatively we could create a cost_foo_rescan() function paralleling
> each cost_foo() function, but given the small number of distinct
> behaviors I think that would be fairly redundant and hard to maintain.

That sounds very reasonable.

There's another sort of interesting point here, too, relating to
dealing with imperfect statistics. When costing a nested loop, the
number of times we expect to rescan the inner side is equal to the
estimated number of outer rows. If that estimate is > 1,
materialization will often look like a winner, provided that it's not
estimated to overflow work_mem, because we'll say, oh, look, all these
rescans get much cheaper, and all it costs us is a little bookkeeping
overhead. But if the estimate is <= 1, materialization will certainly
look like a loser, because there are no rescans to make cheaper and we
still have the bookkeeping overhead.

But let's say the estimate is off. If the real number of outer rows
turns out to be 0, then materialization costs nothing, because we'll
never evaluate the inner side at all. And if it turns out to be 2 or
more, then materialization probably wins, as long as it doesn't spill
to disk. Only if the number of outer rows turns out to be exactly 1
does materialization lose - and even then it's pretty cheap, unless, of
course, it spills to disk. So maybe we should consider FORCING
materialization to be used in certain cases, something like this
(sketched in code below):

1. If the inner path is something that's relatively cheap to rescan
(already materialized or an index scan), then try only the
non-materialized path. This is what we already do now.

2. Otherwise, estimate the memory cost of materializing the inner side.
If it seems that it will fit within work_mem, then try the materialized
path ONLY, on the assumption that there's far more upside than
downside.

3. Otherwise, try both plans (?). Likely both of them are going to be
pretty bad...

It's also worth noting that the comment here doesn't really match the
code. "If the cheapest inner path is a join or seqscan, we should
consider materializing it." But that's not what the code actually tests
- instead, it checks that the cheapest inner path is NOT an index scan,
bitmap heap scan, tid scan, material path, function scan, CTE scan, or
worktable scan. Maybe that works out to the same thing, but it's not
obvious.

And, by the way, is the algorithm proposed in the comment sensible
anyway? Under what circumstances would it make sense to materialize a
sequential scan? It seems to me that if the relation is small enough to
fit in memory, then materializing it won't save much - and if it's not,
then materialization is going to be slow, too. I guess if you have
work_mem very large relative to shared buffers, then you might get the
planner to think that it can cache it with materialization but not
without materialization, but I don't think that the real world can ever
work that way.

I don't think I've ever seen the planner choose to materialize a
seqscan, but it might become much more common with this change if we're
not careful, because while cost_index() contains some logic to take
into account the caching effects associated with rescanning an index,
cost_seqscan() does not.

...Robert
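A rough sketch of the three-case heuristic proposed above. The enum and
helper names here are invented for illustration; none of them exist in
PostgreSQL:

    /* Hypothetical sketch of the proposed heuristic. */
    typedef enum
    {
        TRY_PLAIN_ONLY,             /* case 1 */
        TRY_MATERIALIZED_ONLY,      /* case 2 */
        TRY_BOTH                    /* case 3 */
    } InnerPlanChoice;

    static InnerPlanChoice
    choose_inner_plans(Path *inner_path, double inner_bytes)
    {
        /* 1. Cheap to rescan already: don't bother with a Material node */
        if (path_is_cheap_to_rescan(inner_path))    /* hypothetical helper */
            return TRY_PLAIN_ONLY;

        /*
         * 2. Tuplestore expected to fit in work_mem: big upside, tiny
         *    downside, so consider only the materialized plan.
         */
        if (inner_bytes <= work_mem * 1024L)
            return TRY_MATERIALIZED_ONLY;

        /* 3. Would spill to disk: cost both and pick the lesser evil */
        return TRY_BOTH;
    }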
Robert Haas <robertmhaas@gmail.com> writes:
> And, by the way, is the algorithm proposed in the comment sensible
> anyway? Under what circumstances would it make sense to materialize a
> sequential scan?

Expensive filter conditions, for example.

I've occasionally wondered if this code isn't outright wrong anyway:
when you consider the costs of checking tuple visibility and the costs
involved in access to a shared buffer, it's possible that copying
tuples to a local materialization store would be a win for rescans in
any case. (Of course it's a lot easier to credit that concept when the
store doesn't spill to disk.) Given the basic bogosity of the costing
rules I wasn't eager to mess with it; in fact I think we deliberately
tweaked things in this area to prevent materialization, because
otherwise the planner *always* wanted to materialize and that didn't
seem to be a win. But now that we have a plan for a less obviously
broken costing approach, maybe we should open the floodgates and allow
materialization to be considered for any inner path that doesn't
materialize itself already.

        regards, tom lane
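To make the expensive-filter case concrete, a back-of-envelope
comparison with made-up numbers, using the default cpu_tuple_cost
(0.01) and cpu_operator_cost (0.0025) and ignoring I/O:

    /*
     * Hypothetical numbers: a seqscan over 100,000 rows with a filter
     * costing ~100 operator evaluations per row, of which 1,000 rows
     * survive.  Rescanning the seqscan re-runs the filter on every
     * row; rescanning a Material node re-reads only the survivors.
     */
    double seqscan_rescan  = 100000 * (0.01 + 100 * 0.0025);  /* = 26000 */
    double material_rescan = 1000 * 0.01;                     /* = 10 */

With a filter that expensive, the materialized rescan is cheaper by
three orders of magnitude, even before counting visibility checks and
shared-buffer access.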
On Sep 6, 2009, at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> And, by the way, is the algorithm proposed in the comment sensible
>> anyway? Under what circumstances would it make sense to materialize a
>> sequential scan?
>
> Expensive filter conditions, for example.

Ah, right. Yeah, that could be a big win.

> I've occasionally wondered if this code isn't outright wrong anyway:
> when you consider the costs of checking tuple visibility and the costs
> involved in access to a shared buffer, it's possible that copying
> tuples to a local materialization store would be a win for rescans in
> any case. (Of course it's a lot easier to credit that concept when the
> store doesn't spill to disk.) Given the basic bogosity of the costing
> rules I wasn't eager to mess with it; in fact I think we deliberately
> tweaked things in this area to prevent materialization, because
> otherwise the planner *always* wanted to materialize and that didn't
> seem to be a win. But now that we have a plan for a less obviously
> broken costing approach, maybe we should open the floodgates and allow
> materialization to be considered for any inner path that doesn't
> materialize itself already.

Maybe. I think some experimentation will be required. We also have to
be aware of effects on planning time; match_unsorted_outer() is, AIR, a
significant part of the CPU cost of planning large join problems.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Sep 6, 2009, at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... But now that we have a plan for a less obviously broken costing
>> approach, maybe we should open the floodgates and allow
>> materialization to be considered for any inner path that doesn't
>> materialize itself already

> Maybe. I think some experimentation will be required. We also have
> to be aware of effects on planning time; match_unsorted_outer() is,
> AIR, a significant part of the CPU cost of planning large join
> problems.

I've committed some changes pursuant to this discussion. It may be that
match_unsorted_outer gets a bit slower, but I'm not too worried about
that. My experience is that the code that tries different mergejoin
options eats way more cycles than the nestloop code does.

        regards, tom lane