Thread: Interpreting statistics collector output

Interpreting statistics collector output

From: Steve Madsen
Can anyone provide a brief overview of how to go about interpreting
the information generated by the statistics collector?  I've looked
around and can't find old mailing list messages or anything in the
manual beyond the basics of how to query the statistics.

Cache hit rates are easy to compute, but is this useful beyond
knowing if Postgres has enough buffers?

Can anything useful be gleaned from looking at idx_scan /
idx_tup_read / idx_tup_fetch?

--
Steve Madsen <steve@lightyearsoftware.com>
Light Year Software, LLC  http://lightyearsoftware.com
ZingLists: Stay organized, and share lists online.  http://zinglists.com



Re: Interpreting statistics collector output

From: Decibel!
On Wed, Aug 08, 2007 at 11:01:13AM -0400, Steve Madsen wrote:
> Can anyone provide a brief overview of how to go about interpreting
> the information generated by the statistics collector?  I've looked
> around and can't find old mailing list messages or anything in the
> manual beyond the basics of how to query the statistics.
>
> Cache hit rates are easy to compute, but is this useful beyond
> knowing if Postgres has enough buffers?
>
> Can anything useful be gleaned from looking at idx_scan /
> idx_tup_read / idx_tup_fetch?

Yes, those columns tell you how often each index is being used. If you
see indexes where idx_scan is a small number, that's an indication the
index isn't being used by queries and could potentially be dropped.
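For example, a query along these lines will list candidates (the cutoff
of 50 scans is just an illustration; pick something that makes sense for
how long your stats have been accumulating):

```sql
-- Sketch: indexes that have rarely been scanned since the statistics
-- were last reset. The threshold of 50 is an arbitrary example.
SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_all_indexes
 WHERE idx_scan < 50
 ORDER BY idx_scan;
```

Keep in mind that an index can show zero scans and still be needed,
e.g. if it backs a unique or primary key constraint.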

Something else I like to look at is pg_stat_all_tables seq_scan and
seq_tup_read. If seq_scan is a large number and seq_tup_read/seq_scan is
also large, that indicates that you could use an index on that table.
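A sketch of that heuristic as a query (the cutoffs of 1000 scans and
10000 rows per scan are arbitrary examples, not recommendations):

```sql
-- Sketch: tables with many sequential scans where each scan reads
-- many rows -- candidates for a new index.
SELECT schemaname, relname, seq_scan, seq_tup_read,
       seq_tup_read / seq_scan AS avg_tup_per_scan
  FROM pg_stat_all_tables
 WHERE seq_scan > 1000
   AND seq_tup_read / seq_scan > 10000
 ORDER BY seq_tup_read DESC;
```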
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Interpreting statistics collector output

From: Steve Madsen
On Aug 8, 2007, at 6:08 PM, Decibel! wrote:
> Something else I like to look at is pg_stat_all_tables seq_scan and
> seq_tup_read. If seq_scan is a large number and seq_tup_read/seq_scan
> is also large, that indicates that you could use an index on that table.

If seq_tup_read / seq_scan is large relative to the number of rows in
the table, wouldn't that imply that those sequential scans are often
returning most of the rows in the table?  In that case, would an
index help much or is a sequential scan the expected result?

--
Steve Madsen <steve@lightyearsoftware.com>
Light Year Software, LLC  http://lightyearsoftware.com
ZingLists: Stay organized, and share lists online.  http://zinglists.com



Re: Interpreting statistics collector output

From: Decibel!
On Thu, Aug 09, 2007 at 09:14:55PM -0400, Steve Madsen wrote:
> On Aug 8, 2007, at 6:08 PM, Decibel! wrote:
> >Something else I like to look at is pg_stat_all_tables seq_scan and
> >seq_tup_read. If seq_scan is a large number and seq_tup_read/seq_scan
> >is also large, that indicates that you could use an index on that table.
>
> If seq_tup_read / seq_scan is large relative to the number of rows in
> the table, wouldn't that imply that those sequential scans are often
> returning most of the rows in the table?  In that case, would an
> index help much or is a sequential scan the expected result?

I can't really think of a case where a seqscan wouldn't return all the
rows in the table... that's what it's meant to do.

What I was driving at by looking at seq_tup_read is that a small table
isn't going to use indexes anyway, so for the small tables it's
generally not worth worrying about indexes. If you wanted to be more
accurate you could look at reltuples or maybe relpages in pg_class
instead.
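
A sketch of that more accurate version, joining the stats view to
pg_class (the 10000-row cutoff is an arbitrary example):

```sql
-- Sketch: the same seqscan heuristic, but using pg_class.reltuples
-- (the planner's row estimate) to skip small tables.
SELECT s.schemaname, s.relname, s.seq_scan, s.seq_tup_read, c.reltuples
  FROM pg_stat_all_tables s
  JOIN pg_class c ON c.oid = s.relid
 WHERE c.reltuples > 10000
   AND s.seq_scan > 0
 ORDER BY s.seq_tup_read DESC;
```

Note that reltuples is only an estimate, refreshed by VACUUM and
ANALYZE, so it can lag behind the table's actual row count.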
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Interpreting statistics collector output

From: Steve Madsen
On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
> I can't really think of a case where a seqscan wouldn't return all the
> rows in the table... that's what it's meant to do.

Isn't a sequential scan the only option if an appropriate index does
not exist?  E.g., for a query with a WHERE clause, but none of the
referenced columns are indexed.

Put another way: consider a large table with no indexes.
seq_tup_read / seq_scan is the average number of rows returned per
scan, and if this is a small percentage of the row count, then it
seems reasonable to say an index should help query performance.
(With the understanding that the workload is a few common queries
rather than many unique ones.)

> What I was driving at by looking at seq_tup_read is that a small table
> isn't going to use indexes anyway, so for the small tables it's
> generally not worth worrying about indexes.

OK, so it sounds like there is a threshold to be determined where
this sort of analysis isn't very interesting.

In the interests of stirring up more interest, earlier this week I
released a plugin for Ruby on Rails that extracts these statistics
from Postgres and provides helper methods to format them nicely in
web page views.  I'm using it to put DB monitors on an administrative
dashboard for a site I run.

I'd love to add some analysis logic to the plugin.  If useful
suggestions can be drawn from interpreting the numbers, small badges
can show up in the web page, making the tuning process a little less
painful.

(The plugin's project home page is at
http://groups.google.com/group/pgsql_stats.)

--
Steve Madsen <steve@lightyearsoftware.com>
Light Year Software, LLC  http://lightyearsoftware.com
ZingLists: Stay organized, and share lists online.  http://zinglists.com



Re: Interpreting statistics collector output

From: Decibel!
On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
> >I can't really think of a case where a seqscan wouldn't return all the
> >rows in the table... that's what it's meant to do.
>
> Isn't a sequential scan the only option if an appropriate index does
> not exist?  E.g., for a query with a WHERE clause, but none of the
> referenced columns are indexed.

Yes, and that seqscan is going to read the entire table and then apply a
filter.

> Put another way: consider a large table with no indexes.
> seq_tup_read / seq_scan is the average number of rows returned per
> scan, and if this is a small percentage of the row count, then it
> seems reasonable to say an index should help query performance.
> (With the understanding that the workload is a few common queries
> rather than many unique ones.)

decibel=# select * into i from generate_series(1,99999) i;
SELECT
decibel=# select seq_scan, seq_tup_read  from pg_stat_all_tables where relname='i';
 seq_scan | seq_tup_read
----------+--------------
        0 |            0
(1 row)

decibel=# select * from i where i=1;
 i
---
 1
(1 row)

decibel=# select seq_scan, seq_tup_read  from pg_stat_all_tables where relname='i';
 seq_scan | seq_tup_read
----------+--------------
        1 |        99999
(1 row)
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Interpreting statistics collector output

From: Gregory Stark
"Decibel!" <decibel@decibel.org> writes:

> On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
>> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
>> >I can't really think of a case where a seqscan wouldn't return all the
>> >rows in the table... that's what it's meant to do.

LIMIT

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Interpreting statistics collector output

From: Decibel!
On Aug 15, 2007, at 2:11 PM, Gregory Stark wrote:
> "Decibel!" <decibel@decibel.org> writes:
>> On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
>>> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
>>>> I can't really think of a case where a seqscan wouldn't return all
>>>> the rows in the table... that's what it's meant to do.
>
> LIMIT

Ok, you got me. :P But normally you wouldn't do a LIMIT without some
kind of an ORDER BY, which would mean scanning the whole table.

In any case, it's not a perfect metric, but in general use it seems
to be "good enough".
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: Interpreting statistics collector output

From: Ron Mayer
Decibel! wrote:
> On Aug 15, 2007, at 2:11 PM, Gregory Stark wrote:
>> "Decibel!" <decibel@decibel.org> writes:
>>> On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
>>>> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
>>>>> I can't really think of a case where a seqscan wouldn't return all the
>>>>> rows in the table... that's what it's meant to do.
>>
>> LIMIT
>
> Ok, you got me. :P But normally you wouldn't do a LIMIT without some
> kind of an ORDER BY, which would mean scanning the whole table.


Seems "where exists" does it too, no?



test=#  explain analyze select 1 where exists (select * from bigtbl);
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Result  (cost=9681.61..9681.62 rows=1 width=0) (actual time=0.033..0.034 rows=1 loops=1)
   One-Time Filter: $0
   InitPlan
     ->  Seq Scan on bigtbl  (cost=0.00..9681.61 rows=140461 width=443) (actual time=0.027..0.027 rows=1 loops=1)
 Total runtime: 0.177 ms
(5 rows)