Thread: row-level stats and last analyze time
[ CC'ing -hackers ] On Sun, 2007-04-22 at 16:10 +0200, Guillaume Lelarge wrote: > This patch adds a sentence on monitoring.sgml explaining that > stats_row_level needs to be enabled if user wants to get last > vacuum/analyze execution time. This behavior was introduced in r1.120 of postmaster/pgstat.c: Modify pgstats code to reduce performance penalties from oversized stats data files: avoid creating stats hashtable entries for tables that aren't being touched except by vacuum/analyze [...] which included other modifications to reduce the pgstat I/O volume in 8.1. I don't think this particular change was wise: the reduction in pgstat volume is pretty marginal, and it is counter-intuitive for stats_row_level to effect whether the last ANALYZE / VACUUM is recorded. (Plus, the optimization is not even enabled with the default postgresql.conf settings.) -Neil
On Tuesday 24 April 2007 17:38, Neil Conway wrote: > [ CC'ing -hackers ] > > On Sun, 2007-04-22 at 16:10 +0200, Guillaume Lelarge wrote: > > This patch adds a sentence on monitoring.sgml explaining that > > stats_row_level needs to be enabled if user wants to get last > > vacuum/analyze execution time. > > This behavior was introduced in r1.120 of postmaster/pgstat.c: > > Modify pgstats code to reduce performance penalties from > oversized stats data files: avoid creating stats > hashtable entries for tables that aren't being touched > except by vacuum/analyze [...] > > which included other modifications to reduce the pgstat I/O volume in > 8.1. I don't think this particular change was wise: the reduction in > pgstat volume is pretty marginal, and it is counter-intuitive for > stats_row_level to effect whether the last ANALYZE / VACUUM is recorded. > (Plus, the optimization is not even enabled with the default > postgresql.conf settings.) > +1 -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Tue, 2007-04-24 at 17:38 -0400, Neil Conway wrote: > which included other modifications to reduce the pgstat I/O volume in > 8.1. I don't think this particular change was wise I looked into this a bit further: (1) I believe the reasoning for Tom's earlier change was not to reduce the I/O between the backend and the pgstat process: it was to keep the in-memory stats hash tables small, and to reduce the amount of data that needs to be written to disk. When the only stats messages we get for a table are VACUUM or ANALYZE messages, we discard the message in the pgstat daemon. (2) If stats_row_level is false, there won't be a stats hash entry for any tables, so we can skip sending the VACUUM or ANALYZE message in the first place, by the same logic. (This is more debatable if the user just disabled stats_row_level for the current session, although since only a super-user can do that, perhaps that's OK.) (3) I don't like the fact that the current coding is so willing to throw away VACUUM and ANALYZE pgstat messages. I think it is quite plausible that the DBA might be interested in the last-VACUUM and last-ANALYZE information for a table which hasn't had live operations applied to it recently. The rest of the pgstat code has a similarly disappointing willingness to silently discard messages it doesn't think are worth keeping (e.g. pgstat_recv_autovac() is ignored for databases with no other activity, and pgstat_count_xact_commit/rollback() is a no-op unless *either* row-level or block-level stats are enabled.) If we're so concerned about saving space in the stats hash tables for tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange to record the timestamps for database-wide VACUUMs and ANALYZEs separately from table-local VACUUMs and ANALYZEs? That is, a table's last_vacuum time could effectively be the max of the last database-wide vacuum time and the last VACUUM on that particular table. (Recording the time of the last database-wide VACUUM might be worth doing anyway, e.g. for avoiding wraparound failure). Comments? -Neil
Neil Conway wrote: > (3) I don't like the fact that the current coding is so willing to throw > away VACUUM and ANALYZE pgstat messages. I think it is quite plausible > that the DBA might be interested in the last-VACUUM and last-ANALYZE > information for a table which hasn't had live operations applied to it > recently. The rest of the pgstat code has a similarly disappointing > willingness to silently discard messages it doesn't think are worth > keeping (e.g. pgstat_recv_autovac() is ignored for databases with no > other activity, and pgstat_count_xact_commit/rollback() is a no-op > unless *either* row-level or block-level stats are enabled.) One thing to keep in mind is that autovac drives some decision from whether the database has a pgstat entry or not. In particular it means it doesn't bother processing non-connectable databases, unless they are close to Xid wraparound. I think this behavior is a useful one, since usually vacuuming those databases is a waste of time anyway. Whether to drive it from pgstat or from somewhere else is another matter, but if you want to drive it from another mechanism, keep in mind that the autovacuum launcher (which is the process that makes this decision) is not connected to any database so it cannot examine any catalog's content. There are of course ways around that: for example you could put the information in the pg_database flatfile. But it's something to keep in mind if you want to change it. > If we're so concerned about saving space in the stats hash tables for > tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange > to record the timestamps for database-wide VACUUMs and ANALYZEs > separately from table-local VACUUMs and ANALYZEs? That is, a table's > last_vacuum time could effectively be the max of the last database-wide > vacuum time and the last VACUUM on that particular table. (Recording the > time of the last database-wide VACUUM might be worth doing anyway, e.g. > for avoiding wraparound failure). Another thing to keep in mind is that autovacuum does not do database-wide vacuums anymore -- they are not needed. Xid wraparound decisions are handled on a table-by-table basis, so information about when the last database-wide vacuum was is not needed. Note that Xid wraparound decisions are driven by information in pg_class. So it's not a problem that pgstat may lose the info from this POV. The bottom line is that the current pgstat behavior and autovacuum are closely related. So if you want to change pgstats you should also keep an eye on how it's going to affect autovac. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, 2007-26-04 at 18:07 -0400, Neil Conway wrote: > (1) I believe the reasoning for Tom's earlier change was not to reduce > the I/O between the backend and the pgstat process [...] Tom, any comments on this? Your change introduced an undocumented regression into 8.2. I think you're on the hook for a documentation update at the very least, if not a revert. -Neil
Neil Conway <neilc@samurai.com> writes: > On Thu, 2007-26-04 at 18:07 -0400, Neil Conway wrote: >> (1) I believe the reasoning for Tom's earlier change was not to reduce >> the I/O between the backend and the pgstat process [...] > Tom, any comments on this? Your change introduced an undocumented > regression into 8.2. I think you're on the hook for a documentation > update at the very least, if not a revert. The documentation update seems the most prudent thing to me. The problem with the prior behavior is that it guarantees that every table in the database will eventually have a pg_stat entry, even if stats_row_level and stats_block_level are both off. In a DB with lots of tables that creates a significant overhead *for a feature the DBA probably thinks is turned off*. This is not how it worked before 8.2, and so 8.2.0's behavior is arguably a performance regression compared to 8.1 and before. Now this patch went in before we realized that 8.2.x had a bug in computing the stats-file-update delay, and it could be that after fixing that the problem is not so pressing. But I don't particularly care for new features that impose a performance penalty on those who aren't using them, and that's exactly what last_vacuum/last_analyze tracking does if we allow it to bloat the stats file in the default configuration. The long-term answer of course is to devise a more efficient stats reporting scheme, but I'm not sure offhand what that would look like. regards, tom lane
Has this been done yet? I don't think so. --------------------------------------------------------------------------- Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: > > On Thu, 2007-26-04 at 18:07 -0400, Neil Conway wrote: > >> (1) I believe the reasoning for Tom's earlier change was not to reduce > >> the I/O between the backend and the pgstat process [...] > > > Tom, any comments on this? Your change introduced an undocumented > > regression into 8.2. I think you're on the hook for a documentation > > update at the very least, if not a revert. > > The documentation update seems the most prudent thing to me. The > problem with the prior behavior is that it guarantees that every table > in the database will eventually have a pg_stat entry, even if > stats_row_level and stats_block_level are both off. In a DB with lots > of tables that creates a significant overhead *for a feature the DBA > probably thinks is turned off*. This is not how it worked before 8.2, > and so 8.2.0's behavior is arguably a performance regression compared > to 8.1 and before. > > Now this patch went in before we realized that 8.2.x had a bug in > computing the stats-file-update delay, and it could be that after fixing > that the problem is not so pressing. But I don't particularly care for > new features that impose a performance penalty on those who aren't using > them, and that's exactly what last_vacuum/last_analyze tracking does if > we allow it to bloat the stats file in the default configuration. > > The long-term answer of course is to devise a more efficient stats > reporting scheme, but I'm not sure offhand what that would look like. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +