Thread: REVIEW: EXPLAIN and nfiltered
Greetings, On 2010-01-15 11:37 PM +200, Marko Tiikkaja wrote: > On 2010-11-18 5:45 PM +0200, Marko Tiikkaja wrote: > > Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan > > qual filtered from a node's input. > > Rebased against master. This patch looked good, in general, to me. I added a few documentation updates and a comment, but it's a very straight-forward patch as far as I can tell. Passes all regressions and my additional testing. commit fac899f7967ce74e14a90af9ca24e1a1f5a580e7 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 22:14:54 2011 -0500 Fix < & > in docs to be < >, as required. commit 5fcdb75a646912b8b273703caf33dadb80122e1c Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 22:05:05 2011 -0500 Update documentation for EXPLAIN ANALYZE/nfiltered This patch updates some documentation around EXPLAIN ANALYZE, whose output has been changed by the patch which added nfiltered to it. Also added a comment in the only place that seemed to need one. commit 9ebb0108a217c2d3b7f815d1d902d6bdcc276104 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 21:33:28 2011 -0500 Add nfiltered in EXPLAIN ANALYZE This patch add the number of rows a plan qual filtered from a node's input to the EXPLAIN ANALYZE output. Patch by: Marko Tiikkaja Thanks, Stephen
Attachment
On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote: > This patch looked good, in general, to me. I added a few documentation > updates and a comment, but it's a very straight-forward patch as far as > I can tell. Passes all regressions and my additional testing. I have not looked at the code for this patch at all as yet, but just as a general user comment - I really, really want this. It's one of about, uh, two pieces of information that the EXPLAIN output doesn't give you that is incredibly important for troubleshooting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jan 20, 2011 at 12:16, Stephen Frost <sfrost@snowman.net> wrote: > This patch looked good, in general, to me. I added a few documentation > updates and a comment, but it's a very straight-forward patch as far as > I can tell. Passes all regressions and my additional testing. Looks good and useful for me, too. We need to adjust a bit more documentation. The patch changes all of EXPLAIN ANALYZE outputs. When I grep'ed the docs with "loops=", EXPLAIN ANALYZE is also used in perform.sgml and auto-explain.sgml in addition to explain.sgml. It's good to have documentation about "nfiltered" parameter. The best place would be around the descriptions of "loops" in "Using EXPLAIN" page: http://developer.postgresql.org/pgdocs/postgres/using-explain.html -- Itagaki Takahiro
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote: >> This patch looked good, in general, to me. �I added a few documentation >> updates and a comment, but it's a very straight-forward patch as far as >> I can tell. �Passes all regressions and my additional testing. > I have not looked at the code for this patch at all as yet, but just > as a general user comment - I really, really want this. It's one of > about, uh, two pieces of information that the EXPLAIN output doesn't > give you that is incredibly important for troubleshooting. What's the other one? The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. regards, tom lane
On Thu, Jan 20, 2011 at 11:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote: >>> This patch looked good, in general, to me. I added a few documentation >>> updates and a comment, but it's a very straight-forward patch as far as >>> I can tell. Passes all regressions and my additional testing. > >> I have not looked at the code for this patch at all as yet, but just >> as a general user comment - I really, really want this. It's one of >> about, uh, two pieces of information that the EXPLAIN output doesn't >> give you that is incredibly important for troubleshooting. > > What's the other one? In the following sort of plan: rhaas=# explain analyze select * from bob b, sally s where b.a = s.a; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=0.00..117890.00 rows=1000 width=8) (actual time=0.036..533.372 rows=1000 loops=1) -> Seq Scan on sally s (cost=0.00..5770.00 rows=400000 width=4) (actual time=0.014..46.469 rows=400000 loops=1) -> Index Scan using bob_pkey on bob b (cost=0.00..0.27 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=400000) Index Cond: (a = s.a)Total runtime: 533.935 ms (5 rows) ...you cannot really tell how many rows the index scan was expected to match, or actually did match. The answer to the latter question certainly isn't 0. We previously discussed making the rows= line go out to three decimal places when used in an inner-index-scan context, which would help a lot - you could then multiply rows by loops to get an approximate answer. My preferred fix would be just to remove the unhelpful division-by-nloops code that gets applied in this case, but that's less backward-compatible. > The main problem I've got with this patch is that there's no place to > shoehorn the information into the textual EXPLAIN format without > breaking a lot of expectations (and hence code --- it's insane to > imagine that any significant amount of client-side code has been > rewritten to make use of xml/json output yet). It would be nice to know > what other requests are likely to be coming down the pike before we > decide exactly how we're going to break things. It's hard to predict the nature of future feature requests, but this and the above are at the top of my list of ongoing gripes, and there isn't a close third. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > The main problem I've got with this patch is that there's no place to > shoehorn the information into the textual EXPLAIN format without > breaking a lot of expectations (and hence code --- it's insane to > imagine that any significant amount of client-side code has been > rewritten to make use of xml/json output yet). It would be nice to know > what other requests are likely to be coming down the pike before we > decide exactly how we're going to break things. While I agree completely about the general "if you're going to break, break it big" approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. This is going into a new major version, not something which is being back-patched, and users now have a way in a released version to get away from relying on the string output. Have we worried about adding new plan nodes due to breakage in the explain output..? It strikes me that we've actually changed it with some regularity, in one aspect or another, over a couple of releases. Maybe my memory is bad though. Thanks, Stephen
On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> The main problem I've got with this patch is that there's no place to >> shoehorn the information into the textual EXPLAIN format without >> breaking a lot of expectations (and hence code --- it's insane to >> imagine that any significant amount of client-side code has been >> rewritten to make use of xml/json output yet). It would be nice to know >> what other requests are likely to be coming down the pike before we >> decide exactly how we're going to break things. > > While I agree completely about the general "if you're going to break, > break it big" approach, but I don't particularly care for holding output > strings from EXPLAIN to the same level that we do the wireline protocol. > This is going into a new major version, not something which is being > back-patched, and users now have a way in a released version to get away > from relying on the string output. I agree; we make bigger changes than this all the time. At the risk of putting words in Tom's mouth, I think part of the concern here may be that the EXPLAIN output is quite verbose already, and adding a few more details is going to make it harder to read in the cases where you *don't* care about this additional information. That's a valid concern, but I don't know what to do about it - not having this information available isn't better. It's tempting to propose moving the "actual" numbers down to the next line, so that the lines aren't so ridiculously long. Looking at the patch, I have to say I had hoped this was going to show nfiltered in both the estimated and actual cases, which it doesn't. Now maybe that's more work than we want to put in, but it would be nice to have. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2011-01-20 7:07 PM +0200, Robert Haas wrote: > Looking at the patch, I have to say I had hoped this was going to show > nfiltered in both the estimated and actual cases, which it doesn't. > Now maybe that's more work than we want to put in, but it would be > nice to have. That would be fantastical, if we can make it happen. Regards, Marko Tiikkaja
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost <sfrost@snowman.net> wrote: >> While I agree completely about the general "if you're going to break, >> break it big" approach, but I don't particularly care for holding output >> strings from EXPLAIN to the same level that we do the wireline protocol. > I agree; we make bigger changes than this all the time. No, we don't. It's true that a client that wants to truly *understand* the plan has to know a lot of things, but the fundamental format of EXPLAIN ANALYZE output has been real stable for a real long time: node name (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx) detail line: somethingor other -> subnode name ... more of the same ... This level of understanding seems plenty sufficient for something like explain.depesz.com, to name just one popular tool. The last format change of any kind we made in this skeleton was to increase the number of decimal places in the "actual time" numbers from 2 to 3 (wow). That was in 7.4. Modulo that detail, this basic contract has been valid since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch will break it. It might be interesting for somebody to go look at Hubert's code and see just how much it really knows about the EXPLAIN output format, and how much it's had to change across PG releases. regards, tom lane
On Thu, Jan 20, 2011 at 12:57 PM, Magnus Hagander <magnus@hagander.net> wrote: > > On Jan 20, 2011 6:43 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> >> Robert Haas <robertmhaas@gmail.com> writes: >> > On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost <sfrost@snowman.net> >> > wrote: >> >> While I agree completely about the general "if you're going to break, >> >> break it big" approach, but I don't particularly care for holding >> >> output >> >> strings from EXPLAIN to the same level that we do the wireline >> >> protocol. >> >> > I agree; we make bigger changes than this all the time. >> >> No, we don't. It's true that a client that wants to truly *understand* >> the plan has to know a lot of things, but the fundamental format of >> EXPLAIN ANALYZE output has been real stable for a real long time: >> >> node name (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual >> time=xxx.xxx..xxx.xxx rows=xxx loops=xxx) >> detail line: something or other >> -> subnode name ... more of the same ... >> >> This level of understanding seems plenty sufficient for something like >> explain.depesz.com, to name just one popular tool. The last format >> change of any kind we made in this skeleton was to increase the number >> of decimal places in the "actual time" numbers from 2 to 3 (wow). >> That was in 7.4. Modulo that detail, this basic contract has been valid >> since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch >> will break it. >> >> It might be interesting for somebody to go look at Hubert's code and see >> just how much it really knows about the EXPLAIN output format, and how >> much it's had to change across PG releases. >> > > Haven't looked at what changes with this patch, but dont forget PgAdmin that > also parses the output. Though if the format changes enough to affect it, > that might be the driving force to have it use xml format instead, which is > the one that is intended for machine parsing after all.. How much has that code been updated from one release to the next? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
<p><br /> On Jan 20, 2011 6:43 PM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br/> ><br /> > Robert Haas <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>> writes:<br/> > > On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost <<a href="mailto:sfrost@snowman.net">sfrost@snowman.net</a>>wrote:<br /> > >> While I agree completely about thegeneral "if you're going to break,<br /> > >> break it big" approach, but I don't particularly care for holdingoutput<br /> > >> strings from EXPLAIN to the same level that we do the wireline protocol.<br /> ><br/> > > I agree; we make bigger changes than this all the time.<br /> ><br /> > No, we don't. It's truethat a client that wants to truly *understand*<br /> > the plan has to know a lot of things, but the fundamental formatof<br /> > EXPLAIN ANALYZE output has been real stable for a real long time:<br /> ><br /> > node name (cost=xxx.xx..xxx.xxrows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx)<br /> > detail line: somethingor other<br /> > -> subnode name ... more of the same ...<br /> ><br /> > This level of understandingseems plenty sufficient for something like<br /> > <a href="http://explain.depesz.com">explain.depesz.com</a>,to name just one popular tool. The last format<br /> > changeof any kind we made in this skeleton was to increase the number<br /> > of decimal places in the "actual time" numbersfrom 2 to 3 (wow).<br /> > That was in 7.4. Modulo that detail, this basic contract has been valid<br /> >since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch<br /> > will break it.<br /> ><br /> >It might be interesting for somebody to go look at Hubert's code and see<br /> > just how much it really knows aboutthe EXPLAIN output format, and how<br /> > much it's had to change across PG releases.<br /> ><p>Haven't lookedat what changes with this patch, but dont forget PgAdmin that also parses the output. Though if the format changesenough to affect it, that might be the driving force to have it use xml format instead, which is the one that is intendedfor machine parsing after all.. <p>/Magnus
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > I agree; we make bigger changes than this all the time. > > No, we don't. Alright, do we want to go down the road of adding new things to the XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT version, to avoid this concern? Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > How much has that code been updated from one release to the next? Just an FYI, I talked to depesz on IRC (please chime in if you disagree with any of this) and he indicated that he's had to update the code from time to time, mostly because the parser was too strict. He also mentioned that he didn't feel it was terribly complicated or that it'd be difficult to update for this. Looking over the code, it's got a simple regex for matching that line which would have to be updated, but I don't think it'd require much more than that. Thanks, Stephen
On Thu, Jan 20, 2011 at 1:47 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >> > I agree; we make bigger changes than this all the time. >> >> No, we don't. > > Alright, do we want to go down the road of adding new things to the > XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT > version, to avoid this concern? No, because, for one thing, the text output is what people are going to send me when they want me to fix their crap. If the information isn't there, I lose. And no, I don't want them to send me the XML. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: > > How much has that code been updated from one release to the next? > > Just an FYI, I talked to depesz on IRC (please chime in if you disagree > with any of this) and he indicated that he's had to update the code > from time to time, mostly because the parser was too strict. > > He also mentioned that he didn't feel it was terribly complicated or > that it'd be difficult to update for this. Looking over the code, it's > got a simple regex for matching that line which would have to be > updated, but I don't think it'd require much more than that. i'll be happy to update the Pg::Explain to handle new elements of textual plans, so if this would be of concern - please don't treat "compatibility with explain.depesz.com" as your responsibility/problem. I'll fix the parser (have to add json/xml parsing too anyway), and I, too, would love to get more information. Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
hubert depesz lubaczewski <depesz@depesz.com> writes: > On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote: >> He also mentioned that he didn't feel it was terribly complicated or >> that it'd be difficult to update for this. Looking over the code, it's >> got a simple regex for matching that line which would have to be >> updated, but I don't think it'd require much more than that. > i'll be happy to update the Pg::Explain to handle new elements of > textual plans, so if this would be of concern - please don't treat > "compatibility with explain.depesz.com" as your responsibility/problem. The point isn't whether it'd be "terribly difficult" to update client side EXPLAIN-parsing code ... it's whether we should break it in the first place. I don't find the proposed format so remarkably well-designed that it's worth creating compatibility problems for. The main functional problem I see with this format is that it assumes there is one and only one filter step associated with every plan node. That is just plain wrong. Many don't have any, and there are important cases where there are two. I'm thinking in particular that it might be useful to distinguish the effects of the recheck and the filter conditions of a bitmap heap scan. Maybe it'd also be interesting to separate the join and non-join filter clauses of a join node, though I'm less sure about the usefulness of that. So the line I'm thinking we should pursue is to visually associate the new counter with the filter condition, either like Filter Cond: (x > 42) (nfiltered = 123) or Filter Cond: (x > 42)Rows Filtered: 123 The latter is less ambiguous, but takes more vertical space. The former is very unlikely to break any client code, because I doubt there is any that inquires into the details of what a filter condition expression really means. The latter *might* break code depending on how much it assumes about the number of detail lines attached to a plan node ... but as Robert pointed out, we've added new detail lines before. BTW, is it just me, or is the terminology "number filtered" pretty confusing/ambiguous in itself? It doesn't seem at all clear to me whether that's the number of rows passed by the filter condition or the number of rows rejected. Perhaps "nremoved" would be clearer. regards, tom lane
On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The main functional problem I see with this format is that it assumes > there is one and only one filter step associated with every plan node. > That is just plain wrong. Many don't have any, and there are important > cases where there are two. I'm thinking in particular that it might be > useful to distinguish the effects of the recheck and the filter > conditions of a bitmap heap scan. If it's not too hard to do that, I'm all in favor. > Maybe it'd also be interesting to > separate the join and non-join filter clauses of a join node, though > I'm less sure about the usefulness of that. That would also be extremely useful. > So the line I'm thinking we should pursue is to visually associate the > new counter with the filter condition, either like > > Filter Cond: (x > 42) (nfiltered = 123) > > or > > Filter Cond: (x > 42) > Rows Filtered: 123 > > The latter is less ambiguous, but takes more vertical space. The former > is very unlikely to break any client code, because I doubt there is any > that inquires into the details of what a filter condition expression > really means. The latter *might* break code depending on how much > it assumes about the number of detail lines attached to a plan node > ... but as Robert pointed out, we've added new detail lines before. I like the idea of putting it on the same line as the filter condition, but your proposal for how to do that doesn't wow me - the parentheses look too similar to the ones around the qual itself. > BTW, is it just me, or is the terminology "number filtered" pretty > confusing/ambiguous in itself? It doesn't seem at all clear to me > whether that's the number of rows passed by the filter condition or > the number of rows rejected. Perhaps "nremoved" would be clearer. I think filtered is pretty clear and like it... removed sounds like you deleted something. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > I think filtered is pretty clear and like it... I find it ambiguous. [Takes sip of filtered water.] How about excluded? -Kevin
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, is it just me, or is the terminology "number filtered" pretty >> confusing/ambiguous in itself? �It doesn't seem at all clear to me >> whether that's the number of rows passed by the filter condition or >> the number of rows rejected. �Perhaps "nremoved" would be clearer. > I think filtered is pretty clear and like it... removed sounds like > you deleted something. Well, you did delete something, no? There are rows that aren't in the output that would have been there if not for the filter condition. And, btw, one person thinking it's clear doesn't make it so. There are actually three numbers that could be involved here: the number of rows arriving at the filter, the number passed by it, and the number rejected by it. I think that "nfiltered" could possibly mean any of those three. A non-native speaker of English would be even less likely to be sure of what was meant. regards, tom lane
On 1/20/2011 12:47 PM, Tom Lane wrote: > So the line I'm thinking we should pursue is to visually associate the > new counter with the filter condition, either like > > Filter Cond: (x> 42) (nfiltered = 123) > > or > > Filter Cond: (x> 42) > Rows Filtered: 123 I'd prefer the latter. Sometimes the Filter Cond is very complex and finding the nfiltered information would be easier if it always had its own row. Regards, Marko Tiikkaja
On Thu, Jan 20, 2011 at 3:57 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> I think filtered is pretty clear and like it... > > I find it ambiguous. [Takes sip of filtered water.] Oh, you mean water that had some things you didn't want taken out of it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jan 20, 2011 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> BTW, is it just me, or is the terminology "number filtered" pretty >>> confusing/ambiguous in itself? It doesn't seem at all clear to me >>> whether that's the number of rows passed by the filter condition or >>> the number of rows rejected. Perhaps "nremoved" would be clearer. > >> I think filtered is pretty clear and like it... removed sounds like >> you deleted something. > > Well, you did delete something, no? There are rows that aren't in the > output that would have been there if not for the filter condition. What I mean to say is that I fear that removed would give the impression that some modification had been made to the database. Perhaps that's silly, but it's what came to mind. > And, btw, one person thinking it's clear doesn't make it so. That's why I said "I think" rather than "Any fool should be able to see that". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Robert Haas wrote: > On Thu, Jan 20, 2011 at 3:57 PM, Kevin Grittner > wrote: >> Robert Haas wrote: >> >>> I think filtered is pretty clear and like it... >> >> I find it ambiguous. [Takes sip of filtered water.] > > Oh, you mean water that had some things you didn't want taken out > of it? Right -- God only knows the number of things were filtered out to leave me with filtered water. What's "filtered" in this case is what was passed through, not what was removed. I hadn't even thought about "filtered" as meaning the input to the filtering process until Tom mentioned it, but on a different day, in a different mood, it might also be what I assumed was meant by "number filtered". It's kinda hard to imagine a *more* ambiguous term. -Kevin
On Thu, Jan 20, 2011 at 9:17 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Right -- God only knows the number of things were filtered out to > leave me with filtered water. What's "filtered" in this case is what > was passed through, not what was removed. Hmm, I guess I see your point now. Well, I'm not wedded to the name, but I don't like removed any better. Rows eliminated? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Robert Haas wrote: >> Oh, you mean water that had some things you didn't want taken out >> of it? > Right -- God only knows the number of things were filtered out to > leave me with filtered water. What's "filtered" in this case is what > was passed through, not what was removed. I think it's pretty common to use the phrase "filtered out" to identify the stuff that gets removed by the filter, as opposed to what gets through. So we could possibly use "Rows Filtered Out: nnn". I still think that's more awkward than "Rows Removed: nnn" though. regards, tom lane
On 21/01/11 15:24, Robert Haas wrote: > On Thu, Jan 20, 2011 at 9:17 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >> Right -- God only knows the number of things were filtered out to >> leave me with filtered water. What's "filtered" in this case is what >> was passed through, not what was removed. > Hmm, I guess I see your point now. Well, I'm not wedded to the name, > but I don't like removed any better. > > Rows eliminated? > Rows filtered *out* ?
On Thu, Jan 20, 2011 at 9:35 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > On 21/01/11 15:24, Robert Haas wrote: >> >> On Thu, Jan 20, 2011 at 9:17 PM, Kevin Grittner >> <Kevin.Grittner@wicourts.gov> wrote: >>> >>> Right -- God only knows the number of things were filtered out to >>> leave me with filtered water. What's "filtered" in this case is what >>> was passed through, not what was removed. >> >> Hmm, I guess I see your point now. Well, I'm not wedded to the name, >> but I don't like removed any better. >> >> Rows eliminated? >> > > Rows filtered *out* ? Seems like Tom just had the same thought. Works for me. I'm still not thrilled with the proposed formatting, but I can probably live with it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jan21, 2011, at 03:29 , Tom Lane wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>> Robert Haas wrote: >>> Oh, you mean water that had some things you didn't want taken out >>> of it? > >> Right -- God only knows the number of things were filtered out to >> leave me with filtered water. What's "filtered" in this case is what >> was passed through, not what was removed. > > I think it's pretty common to use the phrase "filtered out" to identify > the stuff that gets removed by the filter, as opposed to what gets > through. So we could possibly use "Rows Filtered Out: nnn". I still > think that's more awkward than "Rows Removed: nnn" though. "Rows Skipped: nnn", maybe? best regards, Florian Pflug
2011/1/21 Florian Pflug <fgp@phlo.org>: > On Jan21, 2011, at 03:29 , Tom Lane wrote: >> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>>> Robert Haas wrote: >>>> Oh, you mean water that had some things you didn't want taken out >>>> of it? >> >>> Right -- God only knows the number of things were filtered out to >>> leave me with filtered water. What's "filtered" in this case is what >>> was passed through, not what was removed. >> >> I think it's pretty common to use the phrase "filtered out" to identify >> the stuff that gets removed by the filter, as opposed to what gets >> through. So we could possibly use "Rows Filtered Out: nnn". I still >> think that's more awkward than "Rows Removed: nnn" though. > > "Rows Skipped: nnn", maybe? +1. Very straightforward to me. Regards, -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > 2011/1/21 Florian Pflug <fgp@phlo.org>: >> "Rows Skipped: nnn", maybe? > +1. Very straightforward to me. I didn't really care for that one, because I think it *won't* be straightforward when there's more than one filter condition at a node. Imagine Bitmap Heap Scan ... Recheck Cond: blah blah Rows Skipped: 42 Filter Cond: blah blah blah Rows Skipped: 77 To me, "rows skipped" sounds like a statement about the overall behavior of the plan node, and thus the above looks contradictory. Another point is that even if you're okay with the above for textual output, we do not have a choice about choosing distinct field names for the two counts for XML/JSON output. Reflecting on that, I'm inclined to suggest Bitmap Heap Scan ... Recheck Cond: blah blah Rows Removed by Recheck: 42 Filter Cond: blah blah blah Rows Removedby Filter: 77 or even more verbosely Bitmap Heap Scan ... Recheck Cond: blah blah Rows Removed by Recheck Cond: 42 Filter Cond: blah blah blah RowsRemoved by Filter Cond: 77 ie repeat the label of the filtering condition exactly. This is looking pretty long, but from the viewpoint of vertical or horizontal space occupied by the printout, I doubt it matters. regards, tom lane
On Sat, Jan 22, 2011 at 11:55:51AM -0500, Tom Lane wrote: > Hitoshi Harada <umi.tanuki@gmail.com> writes: > > 2011/1/21 Florian Pflug <fgp@phlo.org>: > >> "Rows Skipped: nnn", maybe? > > > +1. Very straightforward to me. > > I didn't really care for that one, because I think it *won't* be > straightforward when there's more than one filter condition at a node. > Imagine > > Bitmap Heap Scan ... > Recheck Cond: blah blah > Rows Skipped: 42 > Filter Cond: blah blah blah > Rows Skipped: 77 > > To me, "rows skipped" sounds like a statement about the overall behavior > of the plan node, and thus the above looks contradictory. Another point > is that even if you're okay with the above for textual output, we do not > have a choice about choosing distinct field names for the two counts for > XML/JSON output. > > Reflecting on that, I'm inclined to suggest > > Bitmap Heap Scan ... > Recheck Cond: blah blah > Rows Removed by Recheck: 42 > Filter Cond: blah blah blah > Rows Removed by Filter: 77 > > or even more verbosely > > Bitmap Heap Scan ... > Recheck Cond: blah blah > Rows Removed by Recheck Cond: 42 > Filter Cond: blah blah blah > Rows Removed by Filter Cond: 77 > > ie repeat the label of the filtering condition exactly. This is looking > pretty long, but from the viewpoint of vertical or horizontal space > occupied by the printout, I doubt it matters. +1 for this. It says what happened. :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Jan22, 2011, at 17:55 , Tom Lane wrote: > Reflecting on that, I'm inclined to suggest > > Bitmap Heap Scan ... > Recheck Cond: blah blah > Rows Removed by Recheck: 42 > Filter Cond: blah blah blah > Rows Removed by Filter: 77 > > or even more verbosely > > Bitmap Heap Scan ... > Recheck Cond: blah blah > Rows Removed by Recheck Cond: 42 > Filter Cond: blah blah blah > Rows Removed by Filter Cond: 77 > > ie repeat the label of the filtering condition exactly. This is looking > pretty long, but from the viewpoint of vertical or horizontal space > occupied by the printout, I doubt it matters. +1. Repeating the label of the condition adds enough context to make "Removed" unambiguous IMHO. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > On Jan22, 2011, at 17:55 , Tom Lane wrote: >> Reflecting on that, I'm inclined to suggest >> Bitmap Heap Scan ... >> Recheck Cond: blah blah >> Rows Removed by Recheck Cond: 42 >> Filter Cond: blah blah blah >> Rows Removed by Filter Cond: 77 > +1. Repeating the label of the condition adds enough context to make > "Removed" unambiguous IMHO. Given where we've ended up on what we want printed, I'm forced to conclude that there is basically nothing usable in the submitted patch. We have the following problems: 1. It only instruments the ExecQual() call in execScan.c. There are close to 20 other calls that also need instrumentation, unless we intentionally decide not to bother with counting results for certain filters (but see #4). 2. Putting the counter in struct Instrumentation doesn't scale very nicely to cases with more than one qual per node. We could put some arbitrary number of counters into that struct and make some arbitrary assignments of which is used for what, but ugh. I am tempted to suggest that these counters should go right into the PlanState nodes for the relevant plan node types; which would likely mean that they'd get incremented unconditionally whether we're running EXPLAIN ANALYZE or not. Offhand I don't have a problem with that --- it's not apparent to me that if (node->ps.instrument) node->counter += 1; is faster than an unconditional node->counter += 1; on modern machines in the first place, and in the second place we have certainly expended many more cycles than that by the time we have completed a failing ExecQual call, so it's unlikely to matter. 3. The additions to explain.c of course need complete revision to support this output style. Likewise the documentation (and as was mentioned upthread this isn't enough documentation anyway, since it utterly fails to explain what nfiltered is to users). 4. I'm not entirely sure what to do with nodeResult's resconstantqual. If we do instrument that, it would have to be done differently from everywhere else, since execQual is called only once not once per tuple. But on the whole I think we could leave it out, since it's pretty obvious from the node's overall behavior whether the qual passed or not. I had thought perhaps I could fix this patch up and commit it, but a complete rewrite seems beyond the bounds of what should happen during a CommitFest, especially since there are lots of other patches in need of attention. So I'm marking it Returned With Feedback. regards, tom lane
On 1/24/2011 7:02 PM, Tom Lane wrote: > Given where we've ended up on what we want printed, I'm forced to > conclude that there is basically nothing usable in the submitted patch. I personally feel that if we could even add this for explicit Filter conditions, people would be a lot happier. While I agree that having all the fancy stuff discussed in this thread would be nice, I don't think they're worth postponing the Filter part to 9.2. Regards, Marko Tiikkaja
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > On 1/24/2011 7:02 PM, Tom Lane wrote: >> Given where we've ended up on what we want printed, I'm forced to >> conclude that there is basically nothing usable in the submitted patch. > I personally feel that if we could even add this for explicit Filter > conditions, people would be a lot happier. While I agree that having > all the fancy stuff discussed in this thread would be nice, I don't > think they're worth postponing the Filter part to 9.2. I think there's probably only a day or two's work involved in coding up what I sketched. If you were to commit to doing that pretty quickly, I'd personally be happy to regard the patch as Waiting On Author rather than postponed to 9.2. regards, tom lane