Thread: REVIEW: EXPLAIN and nfiltered

REVIEW: EXPLAIN and nfiltered

From
Stephen Frost
Date:
Greetings,

On 2010-01-15 11:37 PM +200, Marko Tiikkaja wrote:
> On 2010-11-18 5:45 PM +0200, Marko Tiikkaja wrote:
> > Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan
> > qual filtered from a node's input.
>
> Rebased against master.

This patch looked good, in general, to me.  I added a few documentation
updates and a comment, but it's a very straight-forward patch as far as
I can tell.  Passes all regressions and my additional testing.

commit fac899f7967ce74e14a90af9ca24e1a1f5a580e7
Author: Stephen Frost <sfrost@snowman.net>
Date:   Wed Jan 19 22:14:54 2011 -0500

    Fix < & > in docs to be < >, as required.

commit 5fcdb75a646912b8b273703caf33dadb80122e1c
Author: Stephen Frost <sfrost@snowman.net>
Date:   Wed Jan 19 22:05:05 2011 -0500

    Update documentation for EXPLAIN ANALYZE/nfiltered

    This patch updates some documentation around EXPLAIN ANALYZE, whose
    output has been changed by the patch which added nfiltered to it.

    Also added a comment in the only place that seemed to need one.

commit 9ebb0108a217c2d3b7f815d1d902d6bdcc276104
Author: Stephen Frost <sfrost@snowman.net>
Date:   Wed Jan 19 21:33:28 2011 -0500

    Add nfiltered in EXPLAIN ANALYZE

    This patch add the number of rows a plan qual filtered from a node's
    input to the EXPLAIN ANALYZE output.

    Patch by: Marko Tiikkaja

    Thanks,

        Stephen

Attachment

Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote:
> This patch looked good, in general, to me.  I added a few documentation
> updates and a comment, but it's a very straight-forward patch as far as
> I can tell.  Passes all regressions and my additional testing.

I have not looked at the code for this patch at all as yet, but just
as a general user comment - I really, really want this.  It's one of
about, uh, two pieces of information that the EXPLAIN output doesn't
give you that is incredibly important for troubleshooting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
Itagaki Takahiro
Date:
On Thu, Jan 20, 2011 at 12:16, Stephen Frost <sfrost@snowman.net> wrote:
> This patch looked good, in general, to me.  I added a few documentation
> updates and a comment, but it's a very straight-forward patch as far as
> I can tell.  Passes all regressions and my additional testing.

Looks good and useful for me, too.

We need to adjust a bit more documentation. The patch changes all of
EXPLAIN ANALYZE outputs. When I grep'ed the docs with "loops=",
EXPLAIN ANALYZE is also used in perform.sgml and auto-explain.sgml
in addition to explain.sgml.

It's good to have documentation about "nfiltered" parameter. The best
place would be around the descriptions of "loops" in "Using EXPLAIN" page:

http://developer.postgresql.org/pgdocs/postgres/using-explain.html

--
Itagaki Takahiro


Re: REVIEW: EXPLAIN and nfiltered

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> This patch looked good, in general, to me. �I added a few documentation
>> updates and a comment, but it's a very straight-forward patch as far as
>> I can tell. �Passes all regressions and my additional testing.

> I have not looked at the code for this patch at all as yet, but just
> as a general user comment - I really, really want this.  It's one of
> about, uh, two pieces of information that the EXPLAIN output doesn't
> give you that is incredibly important for troubleshooting.

What's the other one?

The main problem I've got with this patch is that there's no place to
shoehorn the information into the textual EXPLAIN format without
breaking a lot of expectations (and hence code --- it's insane to
imagine that any significant amount of client-side code has been
rewritten to make use of xml/json output yet).  It would be nice to know
what other requests are likely to be coming down the pike before we
decide exactly how we're going to break things.
        regards, tom lane


Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 11:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote:
>>> This patch looked good, in general, to me.  I added a few documentation
>>> updates and a comment, but it's a very straight-forward patch as far as
>>> I can tell.  Passes all regressions and my additional testing.
>
>> I have not looked at the code for this patch at all as yet, but just
>> as a general user comment - I really, really want this.  It's one of
>> about, uh, two pieces of information that the EXPLAIN output doesn't
>> give you that is incredibly important for troubleshooting.
>
> What's the other one?

In the following sort of plan:

rhaas=# explain analyze select * from bob b, sally s where b.a = s.a;
   QUERY PLAN 

---------------------------------------------------------------------------------------------------------------------------Nested
Loop (cost=0.00..117890.00 rows=1000 width=8) (actual 
time=0.036..533.372 rows=1000 loops=1)  ->  Seq Scan on sally s  (cost=0.00..5770.00 rows=400000 width=4)
(actual time=0.014..46.469 rows=400000 loops=1)  ->  Index Scan using bob_pkey on bob b  (cost=0.00..0.27 rows=1
width=4) (actual time=0.001..0.001 rows=0 loops=400000)        Index Cond: (a = s.a)Total runtime: 533.935 ms
(5 rows)

...you cannot really tell how many rows the index scan was expected to
match, or actually did match.  The answer to the latter question
certainly isn't 0.  We previously discussed making the rows= line go
out to three decimal places when used in an inner-index-scan context,
which would help a lot - you could then multiply rows by loops to get
an approximate answer.  My preferred fix would be just to remove the
unhelpful division-by-nloops code that gets applied in this case, but
that's less backward-compatible.

> The main problem I've got with this patch is that there's no place to
> shoehorn the information into the textual EXPLAIN format without
> breaking a lot of expectations (and hence code --- it's insane to
> imagine that any significant amount of client-side code has been
> rewritten to make use of xml/json output yet).  It would be nice to know
> what other requests are likely to be coming down the pike before we
> decide exactly how we're going to break things.

It's hard to predict the nature of future feature requests, but this
and the above are at the top of my list of ongoing gripes, and there
isn't a close third.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> The main problem I've got with this patch is that there's no place to
> shoehorn the information into the textual EXPLAIN format without
> breaking a lot of expectations (and hence code --- it's insane to
> imagine that any significant amount of client-side code has been
> rewritten to make use of xml/json output yet).  It would be nice to know
> what other requests are likely to be coming down the pike before we
> decide exactly how we're going to break things.

While I agree completely about the general "if you're going to break,
break it big" approach, but I don't particularly care for holding output
strings from EXPLAIN to the same level that we do the wireline protocol.
This is going into a new major version, not something which is being
back-patched, and users now have a way in a released version to get away
from relying on the string output.

Have we worried about adding new plan nodes due to breakage in the
explain output..?  It strikes me that we've actually changed it with
some regularity, in one aspect or another, over a couple of releases.
Maybe my memory is bad though.
Thanks,
    Stephen

Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> The main problem I've got with this patch is that there's no place to
>> shoehorn the information into the textual EXPLAIN format without
>> breaking a lot of expectations (and hence code --- it's insane to
>> imagine that any significant amount of client-side code has been
>> rewritten to make use of xml/json output yet).  It would be nice to know
>> what other requests are likely to be coming down the pike before we
>> decide exactly how we're going to break things.
>
> While I agree completely about the general "if you're going to break,
> break it big" approach, but I don't particularly care for holding output
> strings from EXPLAIN to the same level that we do the wireline protocol.
> This is going into a new major version, not something which is being
> back-patched, and users now have a way in a released version to get away
> from relying on the string output.

I agree; we make bigger changes than this all the time.  At the risk
of putting words in Tom's mouth, I think part of the concern here may
be that the EXPLAIN output is quite verbose already, and adding a few
more details is going to make it harder to read in the cases where you
*don't* care about this additional information.  That's a valid
concern, but I don't know what to do about it - not having this
information available isn't better.  It's tempting to propose moving
the "actual" numbers down to the next line, so that the lines aren't
so ridiculously long.

Looking at the patch, I have to say I had hoped this was going to show
nfiltered in both the estimated and actual cases, which it doesn't.
Now maybe that's more work than we want to put in, but it would be
nice to have.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
Marko Tiikkaja
Date:
On 2011-01-20 7:07 PM +0200, Robert Haas wrote:
> Looking at the patch, I have to say I had hoped this was going to show
> nfiltered in both the estimated and actual cases, which it doesn't.
> Now maybe that's more work than we want to put in, but it would be
> nice to have.

That would be fantastical, if we can make it happen.


Regards,
Marko Tiikkaja


Re: REVIEW: EXPLAIN and nfiltered

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> While I agree completely about the general "if you're going to break,
>> break it big" approach, but I don't particularly care for holding output
>> strings from EXPLAIN to the same level that we do the wireline protocol.

> I agree; we make bigger changes than this all the time.

No, we don't.  It's true that a client that wants to truly *understand*
the plan has to know a lot of things, but the fundamental format of
EXPLAIN ANALYZE output has been real stable for a real long time:
node name  (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx)  detail line:
somethingor other  ->  subnode name  ... more of the same ...
 

This level of understanding seems plenty sufficient for something like
explain.depesz.com, to name just one popular tool.  The last format
change of any kind we made in this skeleton was to increase the number
of decimal places in the "actual time" numbers from 2 to 3 (wow).
That was in 7.4.  Modulo that detail, this basic contract has been valid
since EXPLAIN ANALYZE was invented, in 7.2.  As proposed, this patch
will break it.

It might be interesting for somebody to go look at Hubert's code and see
just how much it really knows about the EXPLAIN output format, and how
much it's had to change across PG releases.
        regards, tom lane


Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 12:57 PM, Magnus Hagander <magnus@hagander.net> wrote:
>
> On Jan 20, 2011 6:43 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost <sfrost@snowman.net>
>> > wrote:
>> >> While I agree completely about the general "if you're going to break,
>> >> break it big" approach, but I don't particularly care for holding
>> >> output
>> >> strings from EXPLAIN to the same level that we do the wireline
>> >> protocol.
>>
>> > I agree; we make bigger changes than this all the time.
>>
>> No, we don't.  It's true that a client that wants to truly *understand*
>> the plan has to know a lot of things, but the fundamental format of
>> EXPLAIN ANALYZE output has been real stable for a real long time:
>>
>>  node name  (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual
>> time=xxx.xxx..xxx.xxx rows=xxx loops=xxx)
>>   detail line: something or other
>>   ->  subnode name  ... more of the same ...
>>
>> This level of understanding seems plenty sufficient for something like
>> explain.depesz.com, to name just one popular tool.  The last format
>> change of any kind we made in this skeleton was to increase the number
>> of decimal places in the "actual time" numbers from 2 to 3 (wow).
>> That was in 7.4.  Modulo that detail, this basic contract has been valid
>> since EXPLAIN ANALYZE was invented, in 7.2.  As proposed, this patch
>> will break it.
>>
>> It might be interesting for somebody to go look at Hubert's code and see
>> just how much it really knows about the EXPLAIN output format, and how
>> much it's had to change across PG releases.
>>
>
> Haven't looked at what changes with this patch, but dont forget PgAdmin that
> also parses the output. Though if the format changes enough to affect it,
> that might be the driving force to have it use xml format instead, which is
> the one that is intended for machine parsing after all..

How much has that code been updated from one release to the next?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
Magnus Hagander
Date:
<p><br /> On Jan 20, 2011 6:43 PM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>
wrote:<br/> ><br /> > Robert Haas <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>
writes:<br/> > > On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost <<a
href="mailto:sfrost@snowman.net">sfrost@snowman.net</a>>wrote:<br /> > >> While I agree completely about
thegeneral "if you're going to break,<br /> > >> break it big" approach, but I don't particularly care for
holdingoutput<br /> > >> strings from EXPLAIN to the same level that we do the wireline protocol.<br />
><br/> > > I agree; we make bigger changes than this all the time.<br /> ><br /> > No, we don't.  It's
truethat a client that wants to truly *understand*<br /> > the plan has to know a lot of things, but the fundamental
formatof<br /> > EXPLAIN ANALYZE output has been real stable for a real long time:<br /> ><br /> >  node name
 (cost=xxx.xx..xxx.xxrows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx)<br /> >   detail line:
somethingor other<br /> >   ->  subnode name  ... more of the same ...<br /> ><br /> > This level of
understandingseems plenty sufficient for something like<br /> > <a
href="http://explain.depesz.com">explain.depesz.com</a>,to name just one popular tool.  The last format<br /> >
changeof any kind we made in this skeleton was to increase the number<br /> > of decimal places in the "actual time"
numbersfrom 2 to 3 (wow).<br /> > That was in 7.4.  Modulo that detail, this basic contract has been valid<br />
>since EXPLAIN ANALYZE was invented, in 7.2.  As proposed, this patch<br /> > will break it.<br /> ><br />
>It might be interesting for somebody to go look at Hubert's code and see<br /> > just how much it really knows
aboutthe EXPLAIN output format, and how<br /> > much it's had to change across PG releases.<br /> ><p>Haven't
lookedat what changes with this patch, but dont forget PgAdmin that also parses the output. Though if the format
changesenough to affect it, that might be the driving force to have it use xml format instead, which is the one that is
intendedfor machine parsing after all.. <p>/Magnus 

Re: REVIEW: EXPLAIN and nfiltered

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I agree; we make bigger changes than this all the time.
>
> No, we don't.

Alright, do we want to go down the road of adding new things to the
XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT
version, to avoid this concern?
Stephen

Re: REVIEW: EXPLAIN and nfiltered

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> How much has that code been updated from one release to the next?

Just an FYI, I talked to depesz on IRC (please chime in if you disagree
with any of this) and he indicated that he's had to update the code
from time to time, mostly because the parser was too strict.

He also mentioned that he didn't feel it was terribly complicated or
that it'd be difficult to update for this.  Looking over the code, it's
got a simple regex for matching that line which would have to be
updated, but I don't think it'd require much more than that.
Thanks,
    Stephen

Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 1:47 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > I agree; we make bigger changes than this all the time.
>>
>> No, we don't.
>
> Alright, do we want to go down the road of adding new things to the
> XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT
> version, to avoid this concern?

No, because, for one thing, the text output is what people are going
to send me when they want me to fix their crap.  If the information
isn't there, I lose.  And no, I don't want them to send me the XML.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
hubert depesz lubaczewski
Date:
On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > How much has that code been updated from one release to the next?
> 
> Just an FYI, I talked to depesz on IRC (please chime in if you disagree
> with any of this) and he indicated that he's had to update the code
> from time to time, mostly because the parser was too strict.
> 
> He also mentioned that he didn't feel it was terribly complicated or
> that it'd be difficult to update for this.  Looking over the code, it's
> got a simple regex for matching that line which would have to be
> updated, but I don't think it'd require much more than that.

i'll be happy to update the Pg::Explain to handle new elements of
textual plans, so if this would be of concern - please don't treat
"compatibility with explain.depesz.com" as your responsibility/problem.

I'll fix the parser (have to add json/xml parsing too anyway), and I,
too, would love to get more information.

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007


Re: REVIEW: EXPLAIN and nfiltered

From
Tom Lane
Date:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote:
>> He also mentioned that he didn't feel it was terribly complicated or
>> that it'd be difficult to update for this.  Looking over the code, it's
>> got a simple regex for matching that line which would have to be
>> updated, but I don't think it'd require much more than that.

> i'll be happy to update the Pg::Explain to handle new elements of
> textual plans, so if this would be of concern - please don't treat
> "compatibility with explain.depesz.com" as your responsibility/problem.

The point isn't whether it'd be "terribly difficult" to update client
side EXPLAIN-parsing code ... it's whether we should break it in the
first place.  I don't find the proposed format so remarkably
well-designed that it's worth creating compatibility problems for.

The main functional problem I see with this format is that it assumes
there is one and only one filter step associated with every plan node.
That is just plain wrong.  Many don't have any, and there are important
cases where there are two.  I'm thinking in particular that it might be
useful to distinguish the effects of the recheck and the filter
conditions of a bitmap heap scan.  Maybe it'd also be interesting to
separate the join and non-join filter clauses of a join node, though
I'm less sure about the usefulness of that.

So the line I'm thinking we should pursue is to visually associate the
new counter with the filter condition, either like
Filter Cond: (x > 42)  (nfiltered = 123)

or
Filter Cond: (x > 42)Rows Filtered: 123

The latter is less ambiguous, but takes more vertical space.  The former
is very unlikely to break any client code, because I doubt there is any
that inquires into the details of what a filter condition expression
really means.  The latter *might* break code depending on how much
it assumes about the number of detail lines attached to a plan node
... but as Robert pointed out, we've added new detail lines before.

BTW, is it just me, or is the terminology "number filtered" pretty
confusing/ambiguous in itself?  It doesn't seem at all clear to me
whether that's the number of rows passed by the filter condition or
the number of rows rejected.  Perhaps "nremoved" would be clearer.
        regards, tom lane


Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The main functional problem I see with this format is that it assumes
> there is one and only one filter step associated with every plan node.
> That is just plain wrong.  Many don't have any, and there are important
> cases where there are two.  I'm thinking in particular that it might be
> useful to distinguish the effects of the recheck and the filter
> conditions of a bitmap heap scan.

If it's not too hard to do that, I'm all in favor.

> Maybe it'd also be interesting to
> separate the join and non-join filter clauses of a join node, though
> I'm less sure about the usefulness of that.

That would also be extremely useful.

> So the line I'm thinking we should pursue is to visually associate the
> new counter with the filter condition, either like
>
>        Filter Cond: (x > 42)  (nfiltered = 123)
>
> or
>
>        Filter Cond: (x > 42)
>        Rows Filtered: 123
>
> The latter is less ambiguous, but takes more vertical space.  The former
> is very unlikely to break any client code, because I doubt there is any
> that inquires into the details of what a filter condition expression
> really means.  The latter *might* break code depending on how much
> it assumes about the number of detail lines attached to a plan node
> ... but as Robert pointed out, we've added new detail lines before.

I like the idea of putting it on the same line as the filter
condition, but your proposal for how to do that doesn't wow me - the
parentheses look too similar to the ones around the qual itself.

> BTW, is it just me, or is the terminology "number filtered" pretty
> confusing/ambiguous in itself?  It doesn't seem at all clear to me
> whether that's the number of rows passed by the filter condition or
> the number of rows rejected.  Perhaps "nremoved" would be clearer.

I think filtered is pretty clear and like it...  removed sounds like
you deleted something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> I think filtered is pretty clear and like it...
I find it ambiguous.  [Takes sip of filtered water.]  How about
excluded?
-Kevin


Re: REVIEW: EXPLAIN and nfiltered

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, is it just me, or is the terminology "number filtered" pretty
>> confusing/ambiguous in itself? �It doesn't seem at all clear to me
>> whether that's the number of rows passed by the filter condition or
>> the number of rows rejected. �Perhaps "nremoved" would be clearer.

> I think filtered is pretty clear and like it...  removed sounds like
> you deleted something.

Well, you did delete something, no?  There are rows that aren't in the
output that would have been there if not for the filter condition.

And, btw, one person thinking it's clear doesn't make it so.  There
are actually three numbers that could be involved here: the number of
rows arriving at the filter, the number passed by it, and the number
rejected by it.  I think that "nfiltered" could possibly mean any of
those three.  A non-native speaker of English would be even less
likely to be sure of what was meant.
        regards, tom lane


Re: REVIEW: EXPLAIN and nfiltered

From
Marko Tiikkaja
Date:
On 1/20/2011 12:47 PM, Tom Lane wrote:
> So the line I'm thinking we should pursue is to visually associate the
> new counter with the filter condition, either like
>
>     Filter Cond: (x>  42)  (nfiltered = 123)
>
> or
>
>     Filter Cond: (x>  42)
>     Rows Filtered: 123

I'd prefer the latter.  Sometimes the Filter Cond is very complex and 
finding the nfiltered information would be easier if it always had its 
own row.


Regards,
Marko Tiikkaja


Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 3:57 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I think filtered is pretty clear and like it...
>
> I find it ambiguous.  [Takes sip of filtered water.]

Oh, you mean water that had some things you didn't want taken out of it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> BTW, is it just me, or is the terminology "number filtered" pretty
>>> confusing/ambiguous in itself?  It doesn't seem at all clear to me
>>> whether that's the number of rows passed by the filter condition or
>>> the number of rows rejected.  Perhaps "nremoved" would be clearer.
>
>> I think filtered is pretty clear and like it...  removed sounds like
>> you deleted something.
>
> Well, you did delete something, no?  There are rows that aren't in the
> output that would have been there if not for the filter condition.

What I mean to say is that I fear that removed would give the
impression that some modification had been made to the database.
Perhaps that's silly, but it's what came to mind.

> And, btw, one person thinking it's clear doesn't make it so.

That's why I said "I think" rather than "Any fool should be able to see that".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
"Kevin Grittner"
Date:
> Robert Haas  wrote:
> On Thu, Jan 20, 2011 at 3:57 PM, Kevin Grittner
>  wrote:
>> Robert Haas  wrote:
>>
>>> I think filtered is pretty clear and like it...
>>
>> I find it ambiguous. [Takes sip of filtered water.]
> 
> Oh, you mean water that had some things you didn't want taken out
> of it?
Right -- God only knows the number of things were filtered out to
leave me with filtered water.  What's "filtered" in this case is what
was passed through, not what was removed.
I hadn't even thought about "filtered" as meaning the input to the
filtering process until Tom mentioned it, but on a different day, in
a different mood, it might also be what I assumed was meant by
"number filtered".
It's kinda hard to imagine a *more* ambiguous term.
-Kevin


Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 9:17 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Right -- God only knows the number of things were filtered out to
> leave me with filtered water.  What's "filtered" in this case is what
> was passed through, not what was removed.

Hmm, I guess I see your point now.  Well, I'm not wedded to the name,
but I don't like removed any better.

Rows eliminated?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Robert Haas  wrote:
>> Oh, you mean water that had some things you didn't want taken out
>> of it?
> Right -- God only knows the number of things were filtered out to
> leave me with filtered water.  What's "filtered" in this case is what
> was passed through, not what was removed.

I think it's pretty common to use the phrase "filtered out" to identify
the stuff that gets removed by the filter, as opposed to what gets
through.  So we could possibly use "Rows Filtered Out: nnn".  I still
think that's more awkward than "Rows Removed: nnn" though.
        regards, tom lane


Re: REVIEW: EXPLAIN and nfiltered

From
Mark Kirkwood
Date:
On 21/01/11 15:24, Robert Haas wrote:
> On Thu, Jan 20, 2011 at 9:17 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov>  wrote:
>> Right -- God only knows the number of things were filtered out to
>> leave me with filtered water.  What's "filtered" in this case is what
>> was passed through, not what was removed.
> Hmm, I guess I see your point now.  Well, I'm not wedded to the name,
> but I don't like removed any better.
>
> Rows eliminated?
>

Rows filtered *out* ?


Re: REVIEW: EXPLAIN and nfiltered

From
Robert Haas
Date:
On Thu, Jan 20, 2011 at 9:35 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:
> On 21/01/11 15:24, Robert Haas wrote:
>>
>> On Thu, Jan 20, 2011 at 9:17 PM, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov>  wrote:
>>>
>>> Right -- God only knows the number of things were filtered out to
>>> leave me with filtered water.  What's "filtered" in this case is what
>>> was passed through, not what was removed.
>>
>> Hmm, I guess I see your point now.  Well, I'm not wedded to the name,
>> but I don't like removed any better.
>>
>> Rows eliminated?
>>
>
> Rows filtered *out* ?

Seems like Tom just had the same thought.  Works for me.  I'm still
not thrilled with the proposed formatting, but I can probably live
with it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: REVIEW: EXPLAIN and nfiltered

From
Florian Pflug
Date:
On Jan21, 2011, at 03:29 , Tom Lane wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>> Robert Haas  wrote:
>>> Oh, you mean water that had some things you didn't want taken out
>>> of it?
> 
>> Right -- God only knows the number of things were filtered out to
>> leave me with filtered water.  What's "filtered" in this case is what
>> was passed through, not what was removed.
> 
> I think it's pretty common to use the phrase "filtered out" to identify
> the stuff that gets removed by the filter, as opposed to what gets
> through.  So we could possibly use "Rows Filtered Out: nnn".  I still
> think that's more awkward than "Rows Removed: nnn" though.

"Rows Skipped: nnn", maybe?

best regards,
Florian Pflug



Re: REVIEW: EXPLAIN and nfiltered

From
Hitoshi Harada
Date:
2011/1/21 Florian Pflug <fgp@phlo.org>:
> On Jan21, 2011, at 03:29 , Tom Lane wrote:
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>>> Robert Haas  wrote:
>>>> Oh, you mean water that had some things you didn't want taken out
>>>> of it?
>>
>>> Right -- God only knows the number of things were filtered out to
>>> leave me with filtered water.  What's "filtered" in this case is what
>>> was passed through, not what was removed.
>>
>> I think it's pretty common to use the phrase "filtered out" to identify
>> the stuff that gets removed by the filter, as opposed to what gets
>> through.  So we could possibly use "Rows Filtered Out: nnn".  I still
>> think that's more awkward than "Rows Removed: nnn" though.
>
> "Rows Skipped: nnn", maybe?

+1. Very straightforward to me.

Regards,

--
Hitoshi Harada


Re: REVIEW: EXPLAIN and nfiltered

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> 2011/1/21 Florian Pflug <fgp@phlo.org>:
>> "Rows Skipped: nnn", maybe?

> +1. Very straightforward to me.

I didn't really care for that one, because I think it *won't* be
straightforward when there's more than one filter condition at a node.
Imagine
Bitmap Heap Scan ...    Recheck Cond: blah blah    Rows Skipped: 42    Filter Cond: blah blah blah    Rows Skipped: 77

To me, "rows skipped" sounds like a statement about the overall behavior
of the plan node, and thus the above looks contradictory.  Another point
is that even if you're okay with the above for textual output, we do not
have a choice about choosing distinct field names for the two counts for
XML/JSON output.

Reflecting on that, I'm inclined to suggest
Bitmap Heap Scan ...    Recheck Cond: blah blah    Rows Removed by Recheck: 42    Filter Cond: blah blah blah    Rows
Removedby Filter: 77
 

or even more verbosely
Bitmap Heap Scan ...    Recheck Cond: blah blah    Rows Removed by Recheck Cond: 42    Filter Cond: blah blah blah
RowsRemoved by Filter Cond: 77
 

ie repeat the label of the filtering condition exactly.  This is looking
pretty long, but from the viewpoint of vertical or horizontal space
occupied by the printout, I doubt it matters.
        regards, tom lane


Re: REVIEW: EXPLAIN and nfiltered

From
David Fetter
Date:
On Sat, Jan 22, 2011 at 11:55:51AM -0500, Tom Lane wrote:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
> > 2011/1/21 Florian Pflug <fgp@phlo.org>:
> >> "Rows Skipped: nnn", maybe?
> 
> > +1. Very straightforward to me.
> 
> I didn't really care for that one, because I think it *won't* be
> straightforward when there's more than one filter condition at a node.
> Imagine
> 
>     Bitmap Heap Scan ...
>         Recheck Cond: blah blah
>         Rows Skipped: 42
>         Filter Cond: blah blah blah
>         Rows Skipped: 77
> 
> To me, "rows skipped" sounds like a statement about the overall behavior
> of the plan node, and thus the above looks contradictory.  Another point
> is that even if you're okay with the above for textual output, we do not
> have a choice about choosing distinct field names for the two counts for
> XML/JSON output.
> 
> Reflecting on that, I'm inclined to suggest
> 
>     Bitmap Heap Scan ...
>         Recheck Cond: blah blah
>         Rows Removed by Recheck: 42
>         Filter Cond: blah blah blah
>         Rows Removed by Filter: 77
> 
> or even more verbosely
> 
>     Bitmap Heap Scan ...
>         Recheck Cond: blah blah
>         Rows Removed by Recheck Cond: 42
>         Filter Cond: blah blah blah
>         Rows Removed by Filter Cond: 77
> 
> ie repeat the label of the filtering condition exactly.  This is looking
> pretty long, but from the viewpoint of vertical or horizontal space
> occupied by the printout, I doubt it matters.

+1 for this.  It says what happened. :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: REVIEW: EXPLAIN and nfiltered

From
Florian Pflug
Date:
On Jan22, 2011, at 17:55 , Tom Lane wrote:
> Reflecting on that, I'm inclined to suggest
> 
>     Bitmap Heap Scan ...
>         Recheck Cond: blah blah
>         Rows Removed by Recheck: 42
>         Filter Cond: blah blah blah
>         Rows Removed by Filter: 77
> 
> or even more verbosely
> 
>     Bitmap Heap Scan ...
>         Recheck Cond: blah blah
>         Rows Removed by Recheck Cond: 42
>         Filter Cond: blah blah blah
>         Rows Removed by Filter Cond: 77
> 
> ie repeat the label of the filtering condition exactly.  This is looking
> pretty long, but from the viewpoint of vertical or horizontal space
> occupied by the printout, I doubt it matters.

+1. Repeating the label of the condition adds enough context to make
"Removed" unambiguous IMHO.

best regards,
Florian Pflug



Re: REVIEW: EXPLAIN and nfiltered

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> On Jan22, 2011, at 17:55 , Tom Lane wrote:
>> Reflecting on that, I'm inclined to suggest
>>    Bitmap Heap Scan ...
>>         Recheck Cond: blah blah
>>         Rows Removed by Recheck Cond: 42
>>         Filter Cond: blah blah blah
>>         Rows Removed by Filter Cond: 77

> +1. Repeating the label of the condition adds enough context to make
> "Removed" unambiguous IMHO.

Given where we've ended up on what we want printed, I'm forced to
conclude that there is basically nothing usable in the submitted patch.
We have the following problems:

1. It only instruments the ExecQual() call in execScan.c.  There are
close to 20 other calls that also need instrumentation, unless we
intentionally decide not to bother with counting results for certain
filters (but see #4).

2. Putting the counter in struct Instrumentation doesn't scale very
nicely to cases with more than one qual per node.  We could put some
arbitrary number of counters into that struct and make some arbitrary
assignments of which is used for what, but ugh.  I am tempted to suggest
that these counters should go right into the PlanState nodes for the
relevant plan node types; which would likely mean that they'd get
incremented unconditionally whether we're running EXPLAIN ANALYZE or
not.  Offhand I don't have a problem with that --- it's not apparent
to me that
    if (node->ps.instrument)        node->counter += 1;

is faster than an unconditional
    node->counter += 1;

on modern machines in the first place, and in the second place we have
certainly expended many more cycles than that by the time we have
completed a failing ExecQual call, so it's unlikely to matter.

3. The additions to explain.c of course need complete revision to
support this output style.  Likewise the documentation (and as was
mentioned upthread this isn't enough documentation anyway, since it
utterly fails to explain what nfiltered is to users).

4. I'm not entirely sure what to do with nodeResult's resconstantqual.
If we do instrument that, it would have to be done differently from
everywhere else, since execQual is called only once not once per tuple.
But on the whole I think we could leave it out, since it's pretty
obvious from the node's overall behavior whether the qual passed or not.


I had thought perhaps I could fix this patch up and commit it, but a
complete rewrite seems beyond the bounds of what should happen during a
CommitFest, especially since there are lots of other patches in need of
attention.  So I'm marking it Returned With Feedback.
        regards, tom lane


Re: REVIEW: EXPLAIN and nfiltered

From
Marko Tiikkaja
Date:
On 1/24/2011 7:02 PM, Tom Lane wrote:
> Given where we've ended up on what we want printed, I'm forced to
> conclude that there is basically nothing usable in the submitted patch.

I personally feel that if we could even add this for explicit Filter 
conditions, people would be a lot happier.  While I agree that having 
all the fancy stuff discussed in this thread would be nice, I don't 
think they're worth postponing the Filter part to 9.2.


Regards,
Marko Tiikkaja


Re: REVIEW: EXPLAIN and nfiltered

From
Tom Lane
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 1/24/2011 7:02 PM, Tom Lane wrote:
>> Given where we've ended up on what we want printed, I'm forced to
>> conclude that there is basically nothing usable in the submitted patch.

> I personally feel that if we could even add this for explicit Filter 
> conditions, people would be a lot happier.  While I agree that having 
> all the fancy stuff discussed in this thread would be nice, I don't 
> think they're worth postponing the Filter part to 9.2.

I think there's probably only a day or two's work involved in coding up
what I sketched.  If you were to commit to doing that pretty quickly,
I'd personally be happy to regard the patch as Waiting On Author rather
than postponed to 9.2.
        regards, tom lane