Thread: [PATCH] explain tup_fetched/returned in monitoring-stats

[PATCH] explain tup_fetched/returned in monitoring-stats

From
Abhijit Menon-Sen
Date:
I'm making some changes to a program that, among other things, reports
tup_fetched/tup_returned as if it were a cache hit rate, analogous to
blks_hit/blks_fetched.

The documentation didn't help me to understand if that was appropriate,
so I looked at the source and asked on IRC. It seems I'm not the first
person to be confused by these descriptions, so here's a tiny patch to
clarify the meaning of fetched and returned.

-- Abhijit

Attachment

Re: [PATCH] explain tup_fetched/returned in monitoring-stats

From
Tom Lane
Date:
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
> I'm making some changes to a program that, among other things, reports
> tup_fetched/tup_returned as if it were a cache hit rate, analogous to
> blks_hit/blks_fetched.

> The documentation didn't help me to understand if that was appropriate,
> so I looked at the source and asked on IRC. It seems I'm not the first
> person to be confused by these descriptions, so here's a tiny patch to
> clarify the meaning of fetched and returned.

It may be tiny, but I don't believe it's correct.  t_tuples_returned for
instance is incremented by both pgstat_count_heap_getnext() (ie,
successful returns from heap_getnext()) and pgstat_count_index_tuples()
(which counts heap TIDs returned from either index_getnext_tid or
index_getbitmap).  t_tuples_fetched is incremented by
pgstat_count_heap_fetch(), which is called in heap_fetch and
index_fetch_heap.

Right at the moment it's not obvious to me that these calls are usefully
placed.  My intuitive understanding of "fetched" vs "returned" is that
the former should count physical tuple visits while the latter should
count tuples that satisfied some qual or other.  It doesn't seem like
the implementation actually squares with that.  But in any case,
indexscan vs heapscan is a completely wrong description of the
difference.
        regards, tom lane



Re: [PATCH] explain tup_fetched/returned in monitoring-stats

From
Abhijit Menon-Sen
Date:
At 2012-10-12 13:05:44 -0400, tgl@sss.pgh.pa.us wrote:
>
> t_tuples_returned for instance is incremented by both
> pgstat_count_heap_getnext() (ie, successful returns from
> heap_getnext()) and pgstat_count_index_tuples() (which
> counts heap TIDs returned from either index_getnext_tid
> or index_getbitmap).

But pgstat_count_index_tuples() is called only on the index relation,
right? And pgstat_count_heap_fetch() is called by index_fetch_heap on
the index relation too.

Earlier, I thought that pgstat_recv_tabstat() adds t_tuples_fetched and
t_tuples_returned only for tables to the database stats (as the comments
and variable names suggest), but it makes more sense for it to include
index relations too (and the code in pgstat_initstats does imply that's
what is happening).

> But in any case, indexscan vs heapscan is a completely wrong
> description of the difference.

Yes. I'm sorry. Is there any concise description that applies? I think
it's worth fixing, seeing that multiple competent people have got the
wrong idea about what it means.

-- Abhijit



Re: [PATCH] explain tup_fetched/returned in monitoring-stats

From
Robert Haas
Date:
On Fri, Oct 12, 2012 at 2:05 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> Yes. I'm sorry. Is there any concise description that applies? I think
> it's worth fixing, seeing that multiple competent people have got the
> wrong idea about what it means.

I don't think there is.  I think we need to replace those counters
with something better.  The status quo is quite bizarre.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] explain tup_fetched/returned in monitoring-stats

From
Abhijit Menon-Sen
Date:
At 2012-10-15 10:28:17 -0400, robertmhaas@gmail.com wrote:
>
> > Is there any concise description that applies? […]
>
> I don't think there is.  I think we need to replace those counters
> with something better.  The status quo is quite bizarre.

Fair enough. Do you have any ideas?

I see two possibilities: first, they could become the tuple analogue of
blks_read and blks_hit, i.e. tuples fetched from disk, and tuples found
in memory. (I don't know if there's a simple way to count that, and I'm
not sure it would be very useful; we have blks_{read,hit} after all.)

Second, it could do what I thought it did, which is count tuples fetched
by sequential and index scans respectively. I'm not sure how useful the
values would be, but at least it's information you can't get elsewhere.

Also, what are the compatibility implications of changing this? I don't
think anyone is using the current *values*, but I imagine that changing
the column names might break some people's queries.

(I don't feel strongly about any course of action here. I just think the
current situation is unhelpful, and if there's a consensus about what to
change—whether code or documentation—I'm willing to do the work.)

-- Abhijit



Re: [PATCH] explain tup_fetched/returned in monitoring-stats

From
Simon Riggs
Date:
On 20 October 2012 07:43, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> At 2012-10-15 10:28:17 -0400, robertmhaas@gmail.com wrote:
>>
>> > Is there any concise description that applies? […]
>>
>> I don't think there is.  I think we need to replace those counters
>> with something better.  The status quo is quite bizarre.
>
> Fair enough. Do you have any ideas?
>
> I see two possibilities: first, they could become the tuple analogue of
> blks_read and blks_hit, i.e. tuples fetched from disk, and tuples found
> in memory. (I don't know if there's a simple way to count that, and I'm
> not sure it would be very useful; we have blks_{read,hit} after all.)
>
> Second, it could do what I thought it did, which is count tuples fetched
> by sequential and index scans respectively. I'm not sure how useful the
> values would be, but at least it's information you can't get elsewhere.

We already have the second one on pg_stat_all_tables.

A third possibility exists, which is the one Tom described above.

Collecting information at pg_stat_database level isn't interesting
anyway (to me) for information that can be collected at table level.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] explain tup_fetched/returned in monitoring-stats

From
Bruce Momjian
Date:
On Sat, Oct 20, 2012 at 12:27:16PM +0100, Simon Riggs wrote:
> On 20 October 2012 07:43, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> > At 2012-10-15 10:28:17 -0400, robertmhaas@gmail.com wrote:
> >>
> >> > Is there any concise description that applies? […]
> >>
> >> I don't think there is.  I think we need to replace those counters
> >> with something better.  The status quo is quite bizarre.
> >
> > Fair enough. Do you have any ideas?
> >
> > I see two possibilities: first, they could become the tuple analogue of
> > blks_read and blks_hit, i.e. tuples fetched from disk, and tuples found
> > in memory. (I don't know if there's a simple way to count that, and I'm
> > not sure it would be very useful; we have blks_{read,hit} after all.)
> >
> > Second, it could do what I thought it did, which is count tuples fetched
> > by sequential and index scans respectively. I'm not sure how useful the
> > values would be, but at least it's information you can't get elsewhere.
> 
> We already have the second one on pg_stat_all_tables.
> 
> A third possibility exists, which is the one Tom described above.
> 
> Collecting information at pg_stat_database level isn't interesting
> anyway (to me) for information that can be collected at table level.

Added to TODO:
Restructure pg_stat_database columns tup_returned and tup_fetched toreturn meaningful values

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PATCH] explain tup_fetched/returned in monitoring-stats

From
Robert Haas
Date:
On Sat, Oct 20, 2012 at 2:43 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> At 2012-10-15 10:28:17 -0400, robertmhaas@gmail.com wrote:
>>
>> > Is there any concise description that applies? […]
>>
>> I don't think there is.  I think we need to replace those counters
>> with something better.  The status quo is quite bizarre.
>
> Fair enough. Do you have any ideas?
>
> I see two possibilities: first, they could become the tuple analogue of
> blks_read and blks_hit, i.e. tuples fetched from disk, and tuples found
> in memory. (I don't know if there's a simple way to count that, and I'm
> not sure it would be very useful; we have blks_{read,hit} after all.)
>
> Second, it could do what I thought it did, which is count tuples fetched
> by sequential and index scans respectively. I'm not sure how useful the
> values would be, but at least it's information you can't get elsewhere.

I guess I'd LIKE to see:

- index tuples fetched by index scans
- heap tuples fetched by index scans (not necessarily the same because
of HOT chains)
- index tuples fetched by index-only scans
- heap tuples fetched by index-only scans
- index tuples fetched by bitmap index scans
- heap tuples fetched by bitmap index scans
- heap tuples fetched by sequential scans

...and then, within that, it might be useful to break it down by
buffers hit vs. buffers missed.  However, that's probably more
counters than we can afford with the current stats infrastructure.

However, if I get to dream...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company