Thread: row-level stats and last analyze time

row-level stats and last analyze time

From
Neil Conway
Date:
[ CC'ing -hackers ]

On Sun, 2007-04-22 at 16:10 +0200, Guillaume Lelarge wrote:
> This patch adds a sentence on monitoring.sgml explaining that
> stats_row_level needs to be enabled if user wants to get last
> vacuum/analyze execution time.

This behavior was introduced in r1.120 of postmaster/pgstat.c:

    Modify pgstats code to reduce performance penalties from
    oversized stats data files: avoid creating stats
    hashtable entries for tables that aren't being touched
    except by vacuum/analyze [...]

which included other modifications to reduce the pgstat I/O volume in
8.1. I don't think this particular change was wise: the reduction in
pgstat volume is pretty marginal, and it is counter-intuitive for
stats_row_level to effect whether the last ANALYZE / VACUUM is recorded.
(Plus, the optimization is not even enabled with the default
postgresql.conf settings.)

-Neil



Re: [DOCS] row-level stats and last analyze time

From
Robert Treat
Date:
On Tuesday 24 April 2007 17:38, Neil Conway wrote:
> [ CC'ing -hackers ]
>
> On Sun, 2007-04-22 at 16:10 +0200, Guillaume Lelarge wrote:
> > This patch adds a sentence on monitoring.sgml explaining that
> > stats_row_level needs to be enabled if user wants to get last
> > vacuum/analyze execution time.
>
> This behavior was introduced in r1.120 of postmaster/pgstat.c:
>
>     Modify pgstats code to reduce performance penalties from
>     oversized stats data files: avoid creating stats
>     hashtable entries for tables that aren't being touched
>     except by vacuum/analyze [...]
>
> which included other modifications to reduce the pgstat I/O volume in
> 8.1. I don't think this particular change was wise: the reduction in
> pgstat volume is pretty marginal, and it is counter-intuitive for
> stats_row_level to effect whether the last ANALYZE / VACUUM is recorded.
> (Plus, the optimization is not even enabled with the default
> postgresql.conf settings.)
>

+1

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

Re: [DOCS] row-level stats and last analyze time

From
Neil Conway
Date:
On Tue, 2007-04-24 at 17:38 -0400, Neil Conway wrote:
> which included other modifications to reduce the pgstat I/O volume in
> 8.1. I don't think this particular change was wise

I looked into this a bit further:

(1) I believe the reasoning for Tom's earlier change was not to reduce
the I/O between the backend and the pgstat process: it was to keep the
in-memory stats hash tables small, and to reduce the amount of data that
needs to be written to disk. When the only stats messages we get for a
table are VACUUM or ANALYZE messages, we discard the message in the
pgstat daemon.

(2) If stats_row_level is false, there won't be a stats hash entry for
any tables, so we can skip sending the VACUUM or ANALYZE message in the
first place, by the same logic. (This is more debatable if the user just
disabled stats_row_level for the current session, although since only a
super-user can do that, perhaps that's OK.)

(3) I don't like the fact that the current coding is so willing to throw
away VACUUM and ANALYZE pgstat messages. I think it is quite plausible
that the DBA might be interested in the last-VACUUM and last-ANALYZE
information for a table which hasn't had live operations applied to it
recently. The rest of the pgstat code has a similarly disappointing
willingness to silently discard messages it doesn't think are worth
keeping (e.g. pgstat_recv_autovac() is ignored for databases with no
other activity, and pgstat_count_xact_commit/rollback() is a no-op
unless *either* row-level or block-level stats are enabled.)

If we're so concerned about saving space in the stats hash tables for
tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange
to record the timestamps for database-wide VACUUMs and ANALYZEs
separately from table-local VACUUMs and ANALYZEs? That is, a table's
last_vacuum time could effectively be the max of the last database-wide
vacuum time and the last VACUUM on that particular table. (Recording the
time of the last database-wide VACUUM might be worth doing anyway, e.g.
for avoiding wraparound failure).

Comments?

-Neil



Re: [DOCS] row-level stats and last analyze time

From
Alvaro Herrera
Date:
Neil Conway wrote:

> (3) I don't like the fact that the current coding is so willing to throw
> away VACUUM and ANALYZE pgstat messages. I think it is quite plausible
> that the DBA might be interested in the last-VACUUM and last-ANALYZE
> information for a table which hasn't had live operations applied to it
> recently. The rest of the pgstat code has a similarly disappointing
> willingness to silently discard messages it doesn't think are worth
> keeping (e.g. pgstat_recv_autovac() is ignored for databases with no
> other activity, and pgstat_count_xact_commit/rollback() is a no-op
> unless *either* row-level or block-level stats are enabled.)

One thing to keep in mind is that autovac drives some decision from
whether the database has a pgstat entry or not.  In particular it means
it doesn't bother processing non-connectable databases, unless they are
close to Xid wraparound.

I think this behavior is a useful one, since usually vacuuming those
databases is a waste of time anyway.  Whether to drive it from pgstat or
from somewhere else is another matter, but if you want to drive it from
another mechanism, keep in mind that the autovacuum launcher (which is
the process that makes this decision) is not connected to any database
so it cannot examine any catalog's content.  There are of course ways
around that: for example you could put the information in the
pg_database flatfile.  But it's something to keep in mind if you want to
change it.

> If we're so concerned about saving space in the stats hash tables for
> tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange
> to record the timestamps for database-wide VACUUMs and ANALYZEs
> separately from table-local VACUUMs and ANALYZEs? That is, a table's
> last_vacuum time could effectively be the max of the last database-wide
> vacuum time and the last VACUUM on that particular table. (Recording the
> time of the last database-wide VACUUM might be worth doing anyway, e.g.
> for avoiding wraparound failure).

Another thing to keep in mind is that autovacuum does not do
database-wide vacuums anymore -- they are not needed.  Xid wraparound
decisions are handled on a table-by-table basis, so information about
when the last database-wide vacuum was is not needed.

Note that Xid wraparound decisions are driven by information in
pg_class.  So it's not a problem that pgstat may lose the info from this
POV.

The bottom line is that the current pgstat behavior and autovacuum are
closely related.  So if you want to change pgstats you should also keep
an eye on how it's going to affect autovac.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [DOCS] row-level stats and last analyze time

From
Neil Conway
Date:
On Thu, 2007-26-04 at 18:07 -0400, Neil Conway wrote:
> (1) I believe the reasoning for Tom's earlier change was not to reduce
> the I/O between the backend and the pgstat process [...]

Tom, any comments on this? Your change introduced an undocumented
regression into 8.2. I think you're on the hook for a documentation
update at the very least, if not a revert.

-Neil



Re: [DOCS] row-level stats and last analyze time

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Thu, 2007-26-04 at 18:07 -0400, Neil Conway wrote:
>> (1) I believe the reasoning for Tom's earlier change was not to reduce
>> the I/O between the backend and the pgstat process [...]

> Tom, any comments on this? Your change introduced an undocumented
> regression into 8.2. I think you're on the hook for a documentation
> update at the very least, if not a revert.

The documentation update seems the most prudent thing to me.  The
problem with the prior behavior is that it guarantees that every table
in the database will eventually have a pg_stat entry, even if
stats_row_level and stats_block_level are both off.  In a DB with lots
of tables that creates a significant overhead *for a feature the DBA
probably thinks is turned off*.  This is not how it worked before 8.2,
and so 8.2.0's behavior is arguably a performance regression compared
to 8.1 and before.

Now this patch went in before we realized that 8.2.x had a bug in
computing the stats-file-update delay, and it could be that after fixing
that the problem is not so pressing.  But I don't particularly care for
new features that impose a performance penalty on those who aren't using
them, and that's exactly what last_vacuum/last_analyze tracking does if
we allow it to bloat the stats file in the default configuration.

The long-term answer of course is to devise a more efficient stats
reporting scheme, but I'm not sure offhand what that would look like.

            regards, tom lane

Re: [DOCS] row-level stats and last analyze time

From
Bruce Momjian
Date:
Has this been done yet?  I don't think so.

---------------------------------------------------------------------------

Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > On Thu, 2007-26-04 at 18:07 -0400, Neil Conway wrote:
> >> (1) I believe the reasoning for Tom's earlier change was not to reduce
> >> the I/O between the backend and the pgstat process [...]
>
> > Tom, any comments on this? Your change introduced an undocumented
> > regression into 8.2. I think you're on the hook for a documentation
> > update at the very least, if not a revert.
>
> The documentation update seems the most prudent thing to me.  The
> problem with the prior behavior is that it guarantees that every table
> in the database will eventually have a pg_stat entry, even if
> stats_row_level and stats_block_level are both off.  In a DB with lots
> of tables that creates a significant overhead *for a feature the DBA
> probably thinks is turned off*.  This is not how it worked before 8.2,
> and so 8.2.0's behavior is arguably a performance regression compared
> to 8.1 and before.
>
> Now this patch went in before we realized that 8.2.x had a bug in
> computing the stats-file-update delay, and it could be that after fixing
> that the problem is not so pressing.  But I don't particularly care for
> new features that impose a performance penalty on those who aren't using
> them, and that's exactly what last_vacuum/last_analyze tracking does if
> we allow it to bloat the stats file in the default configuration.
>
> The long-term answer of course is to devise a more efficient stats
> reporting scheme, but I'm not sure offhand what that would look like.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                 http://www.postgresql.org/about/donate

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +