Thread: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

From
Andres Freund
Date:
Hi,

When analyzing the plan of a query I often find myself questioning 
whether an additional index may be sensible, or if it is sensible that a 
SeqScan is used if an index is available.

The current EXPLAIN ANALYZE only shows the number of tuples matching the 
qualifier of an SeqScan Node - for analyzing the above situation it is 
at least equally interesting how many tuples were read and discarded.

Therefore I produced a patch which adds a 'discarded=%f' part to the 
analyze output.
As this is only a RFD the implementation is a bit hackish at the moment 
- the discarded counter is increased in execScan directly instead of a 
helper routine in instrument.c.
Also the discarded count is displayed in other node types as well - for 
some there might be a sensible semantic meaning to it...

Good idea - Bad idea?

Greetings,

Andres


Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

From
Heikki Linnakangas
Date:
Andres Freund wrote:
> 
> When analyzing the plan of a query I often find myself questioning 
> whether an additional index may be sensible, or if it is sensible that a 
> SeqScan is used if an index is available.
> 
> The current EXPLAIN ANALYZE only shows the number of tuples matching the 
> qualifier of an SeqScan Node - for analyzing the above situation it is 
> at least equally interesting how many tuples were read and discarded.
> 
> Therefore I produced a patch which adds a 'discarded=%f' part to the 
> analyze output.
> As this is only a RFD the implementation is a bit hackish at the moment 
> - the discarded counter is increased in execScan directly instead of a 
> helper routine in instrument.c.
> Also the discarded count is displayed in other node types as well - for 
> some there might be a sensible semantic meaning to it...
> 
> Good idea - Bad idea?

Isn't the discarded count always equal to (# of rows in table - matched 
tuples)? Seems pretty redundant to me.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

From
Andres Freund
Date:
Hi,

On 05/22/2009 03:42 PM, Heikki Linnakangas wrote:
> Andres Freund wrote:
>> When analyzing the plan of a query I often find myself questioning
>> whether an additional index may be sensible, or if it is sensible that
>> a SeqScan is used if an index is available.
>>
>> The current EXPLAIN ANALYZE only shows the number of tuples matching
>> the qualifier of an SeqScan Node - for analyzing the above situation
>> it is at least equally interesting how many tuples were read and
>> discarded.
>> Good idea - Bad idea?
> Isn't the discarded count always equal to (# of rows in table - matched
> tuples)? Seems pretty redundant to me.
Not for EXISTS(), LIMIT and similar.

Also when looking at more complex plans its quite a nuisance to go 
through all participating tables and do a separate count(*). Especially 
its not your plan but some clients plan etc.

Andres


Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 05/22/2009 03:42 PM, Heikki Linnakangas wrote:
>> Isn't the discarded count always equal to (# of rows in table - matched
>> tuples)? Seems pretty redundant to me.

> Not for EXISTS(), LIMIT and similar.

It doesn't really seem useful enough to justify breaking client-side
code that looks at EXPLAIN output.

This sort of ties into the discussions we have periodically about
allowing EXPLAIN to output XML or some other more-machine-friendly
data format.  The barrier for adding additional output fields would
be a lot lower in such a format.
        regards, tom lane


Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

From
Greg Stark
Date:
On Fri, May 22, 2009 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It doesn't really seem useful enough to justify breaking client-side
> code that looks at EXPLAIN output.

Fwiw at least pgadmin I don't think would be confused by this. These
tool authors aren't enamoured of fragile assumptions and the
maintenance headaches they cause either.

> This sort of ties into the discussions we have periodically about
> allowing EXPLAIN to output XML or some other more-machine-friendly
> data format.  The barrier for adding additional output fields would
> be a lot lower in such a format.

This is still pretty much true if only for the sheer unscalability of
the amount of data being presented for users to sift through. I do
want us to add a ton more instrumentation into the explain plan and
this is only one small addition. If we add number of hard and soft
i/os, time spent in user and system space, etc the result would be
pretty unreadable and they're at least as important as things like
this.

--
greg


Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

From
Andres Freund
Date:
Hi,

On 05/22/2009 05:54 PM, Tom Lane wrote:
> This sort of ties into the discussions we have periodically about
> allowing EXPLAIN to output XML or some other more-machine-friendly
> data format.  The barrier for adding additional output fields would
> be a lot lower in such a format.
So the best thing would be to work on that front...

Tom (Raney), did you further work on your XML explain patch? Could you 
use help?


Greetings,

Andres