Thread: EXPLAIN WITH
I'm a bit unsatisfied with this output because it doesn't tell me the plan it used to construct the CTE being scanned. rhaas=# explain with wumpus as (select * from foo where id < 200) select * from foo f, wumpus c, wumpus u where f.creator_id = c.id and f.last_updater_id = u.id; QUERY PLAN -----------------------------------------------------------------------------Hash Join (cost=31.44..74.94 rows=1000 width=36) Hash Cond: (f.creator_id = c.id) InitPlan -> Seq Scan on foo (cost=0.00..18.50 rows=199 width=12) Filter: (id < 200) -> Hash Join (cost=6.47..36.22 rows=1000 width=24) Hash Cond: (f.last_updater_id = u.id) -> Seq Scan on foo f (cost=0.00..16.00 rows=1000 width=12) -> Hash (cost=3.98..3.98 rows=199 width=12) -> CTE Scan on wumpus u (cost=0.00..3.98 rows=199 width=12) -> Hash (cost=3.98..3.98 rows=199 width=12) -> CTE Scan on wumpus c (cost=0.00..3.98 rows=199 width=12) (12 rows) I haven't looked at what would be required to fix this, but I think we should fix it before releasing 8.4. ...Robert
On Sun, Apr 5, 2009 at 9:40 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I'm a bit unsatisfied with this output because it doesn't tell me the > plan it used to construct the CTE being scanned. I'm totally wrong. Sorry for the noise. ...Robert
* Robert Haas (robertmhaas@gmail.com) wrote: > On Sun, Apr 5, 2009 at 9:40 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > I'm a bit unsatisfied with this output because it doesn't tell me the > > plan it used to construct the CTE being scanned. > > I'm totally wrong. Sorry for the noise. Eh. It could be made clearer what's happening, imv. Esp. if you have a number of WITH pieces and want to know which ones connect to which in the plan between the CTE nodes and the plans that they're running... eg: postgres=# explain with wumpus as (select * from foo where id < 200), abc as (select * from tab1) select * from foo f, wumpus c, wumpus u, abc x where f.creator_id = c.id and f.last_updater_id = u.id; QUERY PLAN ------------------------------------------------------------------------------------------Nested Loop (cost=1266.59..1001458.19rows=35936310 width=52) InitPlan -> Seq Scan on foo (cost=0.00..34.25 rows=647 width=12) Filter: (id < 200) -> Seq Scan on tab1 (cost=0.00..27.70 rows=1770 width=16) -> CTE Scan on abc x (cost=0.00..35.40rows=1770 width=16) -> Materialize (cost=1204.64..1566.67 rows=20303 width=36) -> Hash Join (cost=42.05..1025.34 rows=20303 width=36) Hash Cond: (f.creator_id = c.id) -> Hash Join (cost=21.03..275.66rows=6276 width=24) Hash Cond: (f.last_updater_id = u.id) -> SeqScan on foo f (cost=0.00..29.40 rows=1940 width=12) -> Hash (cost=12.94..12.94 rows=647 width=12) -> CTE Scan on wumpus u (cost=0.00..12.94 rows=647 width=12) -> Hash (cost=12.94..12.94rows=647 width=12) -> CTE Scan on wumpus c (cost=0.00..12.94 rows=647 width=12) (16 rows) I see the initplan's for foo and tab1, and can figure out where those fit in, but in a more complex situation it would be much less clear, I believe. Would be nice if there was a CTE ID or similar to link between the pieces of the InitPlan and the CTE nodes. I think that makes sense anyway, I havn't played with CTE much but that's what it looks like to me. Stephen
* Stephen Frost (sfrost@snowman.net) wrote: > Would be nice if there was a CTE ID or similar to link between > the pieces of the InitPlan and the CTE nodes. Erm, of course, the CTE *has* an ID already, since you name them. Could we get that ID/name up into the piece of the InitPlan that is handling that CTE? Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Stephen Frost (sfrost@snowman.net) wrote: >> Would be nice if there was a CTE ID or similar to link between >> the pieces of the InitPlan and the CTE nodes. > Erm, of course, the CTE *has* an ID already, since you name them. Could > we get that ID/name up into the piece of the InitPlan that is handling > that CTE? I'm not sure but will be glad to take a look. Assuming it's not unreasonably difficult, does anyone object to a format like this: Nested Loop (cost=1266.59..1001458.19 rows=35936310 width=52) InitPlan CTE abc -> Seq Scan on foo (cost=0.00..34.25rows=647 width=12) Filter: (id < 200) CTE wumpus -> Seq Scan on tab1 (cost=0.00..27.70rows=1770 width=16) -> CTE Scan on abc x (cost=0.00..35.40 rows=1770 width=16) -> Materialize (cost=1204.64..1566.67rows=20303 width=36) -> Hash Join (cost=42.05..1025.34 rows=20303 width=36) ... regards, tom lane
I wrote: > Stephen Frost <sfrost@snowman.net> writes: >> Erm, of course, the CTE *has* an ID already, since you name them. Could >> we get that ID/name up into the piece of the InitPlan that is handling >> that CTE? > I'm not sure but will be glad to take a look. Assuming it's not > unreasonably difficult, does anyone object to a format like this: I looked at this a bit and found that it *is* unreasonably difficult given what's currently stored in plan trees. SubPlans and InitPlans are identified only by ID numbers. In principle we could search the plan tree for a CTE Scan node referencing that ID number and then grab the name of the RTE it refers to ... but ick, especially considering we'd have to do that for every subplan, even the ones that are not CTEs. What we could do instead, which is a pretty simple change, is to add a "name" field to struct SubPlan. If we were going to do this, I'd vote for filling it in for every subplan. For actual CTEs we could fill in "CTE name"; for anonymous subqueries we could do no better than "InitPlan n" or "SubPlan n". However, that would still be a step forward, because then we could have references to subplans say "subplan n" instead of just "subplan". So instead of regression=# explain select * from tenk1 a where unique2 not in (select f1 from int4_tbl); QUERYPLAN ----------------------------------------------------------------Seq Scan on tenk1 a (cost=1.06..484.06 rows=5000 width=244) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on int4_tbl (cost=0.00..1.05 rows=5 width=4) (4 rows) you'd get Seq Scan on tenk1 a (cost=1.06..484.06 rows=5000 width=244) Filter: (NOT (hashed subplan 1)) SubPlan 1 -> Seq Scanon int4_tbl (cost=0.00..1.05 rows=5 width=4) This isn't terribly compelling in this example, of course, but it gets a lot more important when you've got a dozen of 'em. From the perspective of the backend this is a simple and cheap change. I think the objection that is most likely to be raised is that it would confuse or break programs that analyze EXPLAIN output in any degree of detail. Of course such programs are going to need some work for 8.4 already. Comments? regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > This isn't terribly compelling in this example, of course, but > it gets a lot more important when you've got a dozen of 'em. Exactly. > >From the perspective of the backend this is a simple and cheap change. Awesome. > I think the objection that is most likely to be raised is that it would > confuse or break programs that analyze EXPLAIN output in any degree of > detail. Of course such programs are going to need some work for 8.4 > already. I definitely feel that it would be best to make this change now, when we're introducing CTE as a type that anything doing EXPLAIN would need to deal with at some level anyway than to add it later (eg: 8.5). I'm definitely a +1 on this. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > I definitely feel that it would be best to make this change now, when > we're introducing CTE as a type that anything doing EXPLAIN would need > to deal with at some level anyway than to add it later (eg: 8.5). I'm > definitely a +1 on this. Here's a somewhat contrived example of what you get now: regression=# explain with wumpus as (select * from tenk1 ) select * from wumpus where unique2 = (select sum(f1) from int4_tbl)or exists(select 1 from tenk1 z where z.unique1 = wumpus.thousand); QUERY PLAN -------------------------------------------------------------------------------------CTE Scan on wumpus (cost=459.07..83371.07rows=5025 width=244) Filter: ((unique2 = $1) OR (alternatives: SubPlan 3 or hashed SubPlan 4)) CTEwumpus -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) InitPlan 2 (returns $1) -> Aggregate (cost=1.06..1.07rows=1 width=4) -> Seq Scan on int4_tbl (cost=0.00..1.05 rows=5 width=4) SubPlan 4 -> SeqScan on tenk1 z (cost=0.00..458.00 rows=10000 width=4) SubPlan 3 -> Index Scan using tenk1_unique1 on tenk1 z (cost=0.00..8.27rows=1 width=0) Index Cond: (unique1 = $2) (12 rows) Before, it would have been pretty tough to tell which subplan did what. The ruleutils printout of a subplan reference still leaves some things to be desired, eg in this example you can't really tell it's an EXISTS check as opposed to something else. But fixing that is probably too much for this late date --- I'm not even real clear on what a better printout would look like, exactly. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > I think the objection that is most likely to be raised is that it would > confuse or break programs that analyze EXPLAIN output in any degree of > detail. Of course such programs are going to need some work for 8.4 > already. As someone who has written more than one such program, I say go for it, looks like a good solution. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200904052149 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAknZX7IACgkQvJuQZxSWSsjTXgCfdF+qXRLliWjgd4FFFJLzVsh0 zf0AoNN3S02y+4DopOwo/bGTwEXZuA5S =yiKq -----END PGP SIGNATURE-----
On Mon, Apr 6, 2009 at 2:49 AM, Greg Sabino Mullane <greg@turnstep.com> wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: RIPEMD160 > > >> I think the objection that is most likely to be raised is that it would >> confuse or break programs that analyze EXPLAIN output in any degree of >> detail. Of course such programs are going to need some work for 8.4 >> already. > > As someone who has written more than one such program, I say go for it, > looks like a good solution. Argh! Some of us have already made those changes and spent quite some time testing them for 8.4 :-( -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
On Mon, Apr 6, 2009 at 3:52 AM, Dave Page <dpage@pgadmin.org> wrote: > Argh! Some of us have already made those changes and spent quite some > time testing them for 8.4 :-( I feel your pain, but OTOH, if it's not possible to fix a feature that was introduced in 8.4 before 8.4 has even gone to beta, that's setting the bar pretty high. It would not surprise me if we have other issues of similar magnitude come up during beta, and if we're not going to fix them, then what's the point of beta? ...Robert
On Mon, Apr 6, 2009 at 11:27 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Apr 6, 2009 at 3:52 AM, Dave Page <dpage@pgadmin.org> wrote: >> Argh! Some of us have already made those changes and spent quite some >> time testing them for 8.4 :-( > > I feel your pain, but OTOH, if it's not possible to fix a feature that > was introduced in 8.4 before 8.4 has even gone to beta, that's setting > the bar pretty high. It would not surprise me if we have other issues > of similar magnitude come up during beta, and if we're not going to > fix them, then what's the point of beta? I realise it needs to be done, it's just annoying. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com