Re: EXPLAIN and nfiltered, take two - Mailing list pgsql-hackers

From Robert Haas
Subject Re: EXPLAIN and nfiltered, take two
Date
Msg-id CA+TgmobvNt+C9E1jB-3c3Ev6KdqPRt_0nZa0U+nvHS_WnBk5+g@mail.gmail.com
Whole thread Raw
In response to Re: EXPLAIN and nfiltered, take two  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: EXPLAIN and nfiltered, take two
List pgsql-hackers
On Tue, Sep 20, 2011 at 8:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <marko.tiikkaja@2ndquadrant.com> writes:
>>> The attached patch is the best I could come up with.  I considered
>>> showing "Rows Removed by Foo: (never executed)" and omitting the line
>>> altogether, but I didn't particularly like either of those options.  The
>>> current patch simply displays "Rows Removed by Foo: 0".
>
> I ran into a couple more issues with this patch.
>
> One is the handling of bitmapqualorig filtering (and correspondingly
> indexqualorig, which the patch misses entirely).  These counts are
> really quite a bit different from the other filter conditions we are
> dealing with, because what they represent is not anything directly
> user-controllable, but how lossy the indexscan is.  That is, we get a
> count for any tuple that the index told us to visit but that turned out
> to not actually satisfy the indexqual.  So the count is guaranteed zero
> for any non-lossy indexscan, which includes most cases.  In view of
> that, I find it useless and probably confusing to put out "Rows Removed
> by Recheck Cond: 0" unless we're dealing with a lossy index.

I don't really see the point of this.  I think printing it always is
both clear and appropriate; it would be even nicer if we also had a
line for "Rows Rechecked".

I am slightly worried that this additional information is going to
make the output too verbose.  But if that turns out to be the problem,
I think the solution is to add another option to control whether this
information is emitted - that's why we have a flexible options syntax
in the first place - and not to start guessing which information the
user will think is boring or confusing in any particular case.

I think we are getting to the point where EXPLAIN is complex enough
that it should really have its own chapter in the documentation.  The
existing treatment in the SQL reference page is a good start, but it
doesn't really do the topic justice.  And being able to document what
all of these things mean would, I think, ease the problem of trying to
make everything 100% self-documenting.

> Also, upthread it was argued that we shouldn't measure the effects of
> joinqual filtering.  I don't buy this for a minute, especially not in
> merge/hash joins, where a row thrown away by joinqual filtering is just
> as expensive as one thrown away by otherqual filtering, and where you
> can *not* determine how big the raw merge/hash join result is if you're
> not told how much the joinqual removed.  I see the point about it not
> being clear how to explain things for SEMI/ANTI join cases, but I think
> we need to figure that out, not just punt.

Yep, I agree.  We should measure everything we possibly can.  I don't
have a clear idea how this ought to be represented, but leaving it out
doesn't seem like the right answer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Isolation tests still falling over routinely
Next
From: Fujii Masao
Date:
Subject: Re: Online base backup from the hot-standby