Thread: Interpreting statistics collector output
Can anyone provide a brief overview of how to go about interpreting the
information generated by the statistics collector? I've looked around and
can't find old mailing list messages or anything in the manual beyond the
basics of how to query the statistics.

Cache hit rates are easy to compute, but is this useful beyond knowing
if Postgres has enough buffers?

Can anything useful be gleaned from looking at idx_scan / idx_tup_read /
idx_tup_fetch?

--
Steve Madsen <steve@lightyearsoftware.com>
Light Year Software, LLC      http://lightyearsoftware.com
ZingLists: Stay organized, and share lists online.  http://zinglists.com
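For reference, the cache hit rate mentioned above can be computed from the
pg_statio views; a sketch (heap_blks_hit counts reads satisfied from shared
buffers, heap_blks_read counts reads that went to the OS or disk):

```sql
-- Overall heap buffer-cache hit rate across all tables.
-- nullif() avoids division by zero on a freshly reset stats collector.
SELECT sum(heap_blks_hit)::float
       / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0)
       AS heap_hit_rate
FROM pg_statio_all_tables;
```

A value near 1.0 suggests the working set fits in shared buffers; note that a
block "missed" here may still be served from the OS page cache rather than disk.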
On Wed, Aug 08, 2007 at 11:01:13AM -0400, Steve Madsen wrote:
> Can anyone provide a brief overview of how to go about interpreting
> the information generated by the statistics collector? I've looked
> around and can't find old mailing list messages or anything in the
> manual beyond the basics of how to query the statistics.
>
> Cache hit rates are easy to compute, but is this useful beyond
> knowing if Postgres has enough buffers?
>
> Can anything useful be gleaned from looking at idx_scan /
> idx_tup_read / idx_tup_fetch?

Yes, that will give you information about how often an index is being
used. If you see indexes where idx_scan is a small number, that's an
indication that the index isn't being used for queries and could
potentially be dropped.

Something else I like to look at is pg_stat_all_tables seq_scan and
seq_tup_read. If seq_scan is a large number and seq_tup_read/seq_scan is
also large, that indicates that you could use an index on that table.

--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
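Both checks described above can be run directly against the stats views; a
sketch (the thresholds 50 and the DESC ordering are arbitrary choices, not
anything built into Postgres):

```sql
-- Candidate unused indexes: indexes that are almost never scanned.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_all_indexes
WHERE idx_scan < 50          -- arbitrary cutoff; tune for your workload
ORDER BY idx_scan;

-- Tables that are seqscanned often and read many tuples per scan,
-- i.e. the candidates for a new index.
SELECT schemaname, relname, seq_scan,
       seq_tup_read / seq_scan AS avg_tup_per_scan
FROM pg_stat_all_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;
```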
On Aug 8, 2007, at 6:08 PM, Decibel! wrote:
> Something else I like to look at is pg_stat_all_tables seq_scan and
> seq_tup_read. If seq_scan is a large number and seq_tup_read/seq_scan
> is also large, that indicates that you could use an index on that table.

If seq_tup_read / seq_scan is large relative to the number of rows in
the table, wouldn't that imply that those sequential scans are often
returning most of the rows in the table? In that case, would an index
help much, or is a sequential scan the expected result?

--
Steve Madsen <steve@lightyearsoftware.com>
Light Year Software, LLC      http://lightyearsoftware.com
ZingLists: Stay organized, and share lists online.  http://zinglists.com
On Thu, Aug 09, 2007 at 09:14:55PM -0400, Steve Madsen wrote:
> On Aug 8, 2007, at 6:08 PM, Decibel! wrote:
> > Something else I like to look at is pg_stat_all_tables seq_scan and
> > seq_tup_read. If seq_scan is a large number and seq_tup_read/seq_scan
> > is also large, that indicates that you could use an index on that table.
>
> If seq_tup_read / seq_scan is large relative to the number of rows in
> the table, wouldn't that imply that those sequential scans are often
> returning most of the rows in the table? In that case, would an
> index help much or is a sequential scan the expected result?

I can't really think of a case where a seqscan wouldn't return all the
rows in the table... that's what it's meant to do.

What I was driving at by looking at seq_tup_read is that a small table
isn't going to use indexes anyway, so for the small tables it's
generally not worth worrying about indexes. If you wanted to be more
accurate you could look at reltuples or maybe relpages in pg_class
instead.

--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
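The reltuples/relpages refinement suggested above can be sketched by joining
the stats view to pg_class on relid (the relpages cutoff of 100 is an
arbitrary illustration, not a recommended value):

```sql
-- Restrict the seqscan heuristic to tables large enough that an
-- index would plausibly be used. reltuples and relpages are the
-- planner's estimates, refreshed by VACUUM/ANALYZE.
SELECT s.schemaname, s.relname, s.seq_scan, s.seq_tup_read,
       c.reltuples, c.relpages
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE c.relpages > 100       -- skip small tables; threshold is arbitrary
  AND s.seq_scan > 0
ORDER BY s.seq_tup_read DESC;
```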
On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
> I can't really think of a case where a seqscan wouldn't return all the
> rows in the table... that's what it's meant to do.

Isn't a sequential scan the only option if an appropriate index does
not exist? E.g., for a query with a WHERE clause where none of the
referenced columns are indexed.

Put another way: consider a large table with no indexes. seq_tup_read /
seq_scan is the average number of rows returned per scan, and if this is
a small percentage of the row count, then it seems reasonable to say an
index should help query performance. (With the understanding that the
workload is a few common queries rather than many unique ones.)

> What I was driving at by looking at seq_tup_read is that a small table
> isn't going to use indexes anyway, so for the small tables it's
> generally not worth worrying about indexes.

OK, so it sounds like there is a threshold to be determined below which
this sort of analysis isn't very interesting.

In the interest of stirring up more interest, earlier this week I
released a plugin for Ruby on Rails that extracts these statistics from
Postgres and provides helper methods to format them nicely in web page
views. I'm using it to put DB monitors on an administrative dashboard
for a site I run.

I'd love to add some analysis logic to the plugin. If useful suggestions
can be drawn from interpreting the numbers, small badges can show up in
the web page, making the tuning process a little less painful.

(The plugin's project home page is at
http://groups.google.com/group/pgsql_stats.)

--
Steve Madsen <steve@lightyearsoftware.com>
Light Year Software, LLC      http://lightyearsoftware.com
ZingLists: Stay organized, and share lists online.  http://zinglists.com
On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
> > I can't really think of a case where a seqscan wouldn't return all the
> > rows in the table... that's what it's meant to do.
>
> Isn't a sequential scan the only option if an appropriate index does
> not exist? E.g., for a query with a WHERE clause, but none of the
> referenced columns are indexed.

Yes, and that seqscan is going to read the entire table and then apply
a filter.

> Put another way: consider a large table with no indexes.
> seq_tup_read / seq_scan is the average number of rows returned per
> scan, and if this is a small percentage of the row count, then it
> seems reasonable to say an index should help query performance.

Note that seq_tup_read counts the rows scanned, not the rows the query
returns after filtering:

decibel=# select * into i from generate_series(1,99999) i;
SELECT
decibel=# select seq_scan, seq_tup_read from pg_stat_all_tables where relname='i';
 seq_scan | seq_tup_read
----------+--------------
        0 |            0
(1 row)

decibel=# select * from i where i=1;
 i
---
 1
(1 row)

decibel=# select seq_scan, seq_tup_read from pg_stat_all_tables where relname='i';
 seq_scan | seq_tup_read
----------+--------------
        1 |        99999
(1 row)

--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
"Decibel!" <decibel@decibel.org> writes:
> On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
> > On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
> > > I can't really think of a case where a seqscan wouldn't return all
> > > the rows in the table... that's what it's meant to do.

LIMIT

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
On Aug 15, 2007, at 2:11 PM, Gregory Stark wrote:
> "Decibel!" <decibel@decibel.org> writes:
> > On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
> > > On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
> > > > I can't really think of a case where a seqscan wouldn't return
> > > > all the rows in the table... that's what it's meant to do.
>
> LIMIT

Ok, you got me. :P But normally you wouldn't do a LIMIT without some
kind of an ORDER BY, which would mean scanning the whole table.

In any case, it's not a perfect metric, but in general use it seems to
be "good enough".

--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
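To illustrate the LIMIT case, continuing the earlier session with the
generated table i: a LIMIT without an ORDER BY lets the seqscan stop early,
so seq_tup_read grows by far less than the table's row count (a sketch;
exact counter values assume no other activity on the table and that the
stats collector has flushed its counts):

```sql
-- The executor stops the scan as soon as one tuple is produced,
-- so this adds only 1 to seq_tup_read, not 99999.
SELECT * FROM i LIMIT 1;

-- Inspect the counters afterwards.
SELECT seq_scan, seq_tup_read
FROM pg_stat_all_tables
WHERE relname = 'i';
```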
Decibel! wrote:
> On Aug 15, 2007, at 2:11 PM, Gregory Stark wrote:
> > "Decibel!" <decibel@decibel.org> writes:
> > > On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
> > > > On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
> > > > > I can't really think of a case where a seqscan wouldn't return
> > > > > all the rows in the table... that's what it's meant to do.
> >
> > LIMIT
>
> Ok, you got me. :P But normally you wouldn't do a LIMIT without some
> kind of an ORDER BY, which would mean scanning the whole table.

Seems "where exists" does it too, no?

test=# explain analyze select 1 where exists (select * from bigtbl);
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Result  (cost=9681.61..9681.62 rows=1 width=0) (actual time=0.033..0.034 rows=1 loops=1)
   One-Time Filter: $0
   InitPlan
     ->  Seq Scan on bigtbl  (cost=0.00..9681.61 rows=140461 width=443) (actual time=0.027..0.027 rows=1 loops=1)
 Total runtime: 0.177 ms
(5 rows)