Thread: Higher level questions around shared memory stats

Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

Separate from the minutia in [1] I'd like to discuss a few questions of more
general interest. I'll post another question or two later.


1) How to react to corrupted statsfiles?

In HEAD we stop reading stats at the point we detect the stats file to be
corrupted. The contents of the currently read stats "entry" are zeroed out,
but prior entries stay loaded.

This means that we can get an entirely inconsisten state of the stats, without
really knowing:

E.g. if a per-db stats file fails to load halfway through, we'll have already
loaded the pg_stat_database entry. Thus pg_stat_database.stats_reset will not
be reset, but at the same time we'll only have part of the data in
pg_stat_database.


That seems like a mess? IMO it'd make more sense to just throw away all stats
in that case.  Either works for the shmstats patch, it's just a few lines to
change.

Unless there's support for throwing away stats, I'll change the shmstats patch
to match the current behaviour. Currently it resets all "global" stats like
bgwriter, checkpointer etc whenever there's a failure, but keeps already
loaded database / table / function stats, which doesn't make much sense.


2) What do we want to do with stats when starting up in recovery?

Today we throw out stats whenever crash recovery is needed. Which includes
starting up a replica DB_SHUTDOWNED_IN_RECOVERY.

The shared memory stats patch loads the stats in the startup process (whether
there's recovery or not). It's obviously no problem to just mirror the current
behaviour, we just need to decide what we want?

The reason we throw stats away before recovery seem to originate in concerns
around old stats being confused for new stats. Of course, "now" that we have
streaming replication, throwing away stats *before* recovery, doesn't provide
any sort of protection against that. In fact, currently there's no automatic
mechanism whatsoever to get rid of stats for dropped objects on a standby.


In the shared memory stats patch, dropped catalog objects cause the stats to
also be dropped on the standby. So that whole concern is largely gone.

I'm inclined to, for now, either leave the behaviour exactly as it currently
is, or to continuing throwing away stats during normal crash recovery, but to
stop doing so for DB_SHUTDOWNED_IN_RECOVERY.

I think it'd now be feasible to just never throw stats away during crash
recovery, by writing out stats during checkpoint/restartpoints, but that's
clearly work for later. The alternatives in the prior paragraph in contrast is
just a question of when to call the "restore" and when the "throw away"
function, a call that has to be made anyway.


3) Does anybody care about preserving the mishmash style of function comments
present in pgstat? There's at least:

/* ----------
 * pgstat_write_db_statsfile() -
 *        Write the stat file for a single database.
 *
 *    If writing to the permanent file (happens when the collector is

/* ----------
 * pgstat_get_replslot_entry
 *
 * Return the entry of replication slot stats with the given name. Return
 * NULL if not found and the caller didn't request to create it.

/*
 * Lookup the hash table entry for the specified table. If no hash
 * table entry exists, initialize it, if the create parameter is true.
 * Else, return NULL.
 */

/* ----------
 * pgstat_send() -
 *
 *        Send out one statistics message to the collector
 * ----------
 */

/*
 * PostPrepare_PgStat
 *        Clean up after successful PREPARE.
 *
 * Note: AtEOXact_PgStat is not called during PREPARE.
 */


---- or not. Summary indented with two tabs. Longer comment indented with a
tab. The function name in the comment or not. Function parens or not. And
quite a few more differences.

I find these a *pain* to maintain. Most function comments have to be touched
to remove references to the stats collector and invariably such changes end up
changing formatting as well. Whenever adding a new function I have an internal
debate about which comment style to follow.

I've already spent a considerable amount of time going through the patch to
reduce incidental "format" changes, but there's quite a few more instances
that need to be cleaned up.

I'm inclined to just do a pass through the files and normalize the comment
styles, in a separate commit.  Personally I'd go for no ---, no copy of the
function name, no tabs - but I don't really care as long as it's consistent.

Greetings,

Andres Freund

[1] https://postgr.es/m/20220329072417.er26q5pxc4pbldn7%40alap3.anarazel.de



Re: Higher level questions around shared memory stats

From
Robert Haas
Date:
On Tue, Mar 29, 2022 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
> 1) How to react to corrupted statsfiles?
>
> IMO it'd make more sense to just throw away all stats
> in that case.

That seems reasonable to me. I think there's some downside, in that
stats are important, and having some of them might be better than
having none of them. On the other hand, stats file corruption should
be very rare, and if it's not, we need to fix that. I think what's
actually most important here is the error reporting. We need to make
it clear, at least via log messages, that something bad has happened.
And maybe we should have, inside the stats system, something that
keeps track of when the stats file was last recreated from scratch
because of a corruption event, separately from when it was last
intentionally reset.

> 2) What do we want to do with stats when starting up in recovery?
>
> Today we throw out stats whenever crash recovery is needed. Which includes
> starting up a replica DB_SHUTDOWNED_IN_RECOVERY.
>
> The shared memory stats patch loads the stats in the startup process (whether
> there's recovery or not). It's obviously no problem to just mirror the current
> behaviour, we just need to decide what we want?
>
> The reason we throw stats away before recovery seem to originate in concerns
> around old stats being confused for new stats. Of course, "now" that we have
> streaming replication, throwing away stats *before* recovery, doesn't provide
> any sort of protection against that. In fact, currently there's no automatic
> mechanism whatsoever to get rid of stats for dropped objects on a standby.

Does redo update the stats?

> 3) Does anybody care about preserving the mishmash style of function comments
> present in pgstat? There's at least:

I can't speak for everyone in the universe, but I think it should be
fine to clean that up.

> I'm inclined to just do a pass through the files and normalize the comment
> styles, in a separate commit.  Personally I'd go for no ---, no copy of the
> function name, no tabs - but I don't really care as long as it's consistent.

+1 for that style.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-29 16:24:02 -0400, Robert Haas wrote:
> On Tue, Mar 29, 2022 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
> > 1) How to react to corrupted statsfiles?
> >
> > IMO it'd make more sense to just throw away all stats
> > in that case.
>
> That seems reasonable to me. I think there's some downside, in that
> stats are important, and having some of them might be better than
> having none of them. On the other hand, stats file corruption should
> be very rare, and if it's not, we need to fix that.

I think it's reasonably rare because in cases there'd be corruption, we'd
typically not even have written them out / throw them away explicitly - we
only read stats when starting without crash recovery.

So the "expected" case of corruption afaicts solely is a OS crash just after
the shutdown checkpoint completed?


> I think what's actually most important here is the error reporting. We need
> to make it clear, at least via log messages, that something bad has
> happened.

The message currently (on HEAD, but similarly on the path) is:
                ereport(pgStatRunningInCollector ? LOG : WARNING,
                        (errmsg("corrupted statistics file \"%s\"",
                                statfile)));


> And maybe we should have, inside the stats system, something that
> keeps track of when the stats file was last recreated from scratch because
> of a corruption event, separately from when it was last intentionally reset.

That would be easy to add. Don't think we have a good place to show the
information right now - perhaps just new functions not part of any view?

I can think of these different times:

- Last time stats were removed due to starting up in crash recovery
- Last time stats were created from scratch, because no stats data file was
  present at startup
- Last time stats were thrown away due to corruption
- Last time a subset of stats were reset using one of the pg_reset* functions

Makes sense?


> > 2) What do we want to do with stats when starting up in recovery?
> >
> > Today we throw out stats whenever crash recovery is needed. Which includes
> > starting up a replica DB_SHUTDOWNED_IN_RECOVERY.
> >
> > The shared memory stats patch loads the stats in the startup process (whether
> > there's recovery or not). It's obviously no problem to just mirror the current
> > behaviour, we just need to decide what we want?
> >
> > The reason we throw stats away before recovery seem to originate in concerns
> > around old stats being confused for new stats. Of course, "now" that we have
> > streaming replication, throwing away stats *before* recovery, doesn't provide
> > any sort of protection against that. In fact, currently there's no automatic
> > mechanism whatsoever to get rid of stats for dropped objects on a standby.
>
> Does redo update the stats?

With "update" do you mean generate new stats? In the shared memory stats patch
it triggers stats to be dropped, on HEAD it just resets all stats at startup.

Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.


Thanks,

Andres



Re: Higher level questions around shared memory stats

From
Robert Haas
Date:
On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres@anarazel.de> wrote:
> I think it's reasonably rare because in cases there'd be corruption, we'd
> typically not even have written them out / throw them away explicitly - we
> only read stats when starting without crash recovery.
>
> So the "expected" case of corruption afaicts solely is a OS crash just after
> the shutdown checkpoint completed?

Can we prevent that case from occurring, so that there are no expected cases?

> > And maybe we should have, inside the stats system, something that
> > keeps track of when the stats file was last recreated from scratch because
> > of a corruption event, separately from when it was last intentionally reset.
>
> That would be easy to add. Don't think we have a good place to show the
> information right now - perhaps just new functions not part of any view?

I defer to you on where to put it.

> I can think of these different times:
>
> - Last time stats were removed due to starting up in crash recovery
> - Last time stats were created from scratch, because no stats data file was
>   present at startup
> - Last time stats were thrown away due to corruption
> - Last time a subset of stats were reset using one of the pg_reset* functions
>
> Makes sense?

Yes. Possibly that last could be broken in to two: when all stats were
last reset, when some stats were last reset.

> > Does redo update the stats?
>
> With "update" do you mean generate new stats? In the shared memory stats patch
> it triggers stats to be dropped, on HEAD it just resets all stats at startup.
>
> Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.

Well, I guess what I'm trying to figure out is what happens if we run
in recovery for a long time -- say, a year -- and then get promoted.
Do we have reasons to expect that the stats will be accurate enough to
use at that point, or will they be way off?

I don't have a great understanding of how this all works, but if
running recovery for a long time is going to lead to a situation where
the stats progressively diverge from reality, then preserving them
doesn't seem as valuable as if they're going to be more or less
accurate.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Higher level questions around shared memory stats

From
Peter Eisentraut
Date:
On 29.03.22 23:01, Andres Freund wrote:
>> I think what's actually most important here is the error reporting. We need
>> to make it clear, at least via log messages, that something bad has
>> happened.
> The message currently (on HEAD, but similarly on the path) is:
>                 ereport(pgStatRunningInCollector ? LOG : WARNING,
>                         (errmsg("corrupted statistics file \"%s\"",
>                                 statfile)));

Corrupted how?  How does it know?  Is there a checksum, was the file the 
wrong length, what happened?  I think this could use more detail.




Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-30 21:44:20 +0200, Peter Eisentraut wrote:
> On 29.03.22 23:01, Andres Freund wrote:
> > > I think what's actually most important here is the error reporting. We need
> > > to make it clear, at least via log messages, that something bad has
> > > happened.
> > The message currently (on HEAD, but similarly on the path) is:
> >                 ereport(pgStatRunningInCollector ? LOG : WARNING,
> >                         (errmsg("corrupted statistics file \"%s\"",
> >                                 statfile)));
> 
> Corrupted how?

We can't parse it. Which can mean that it's truncated (we notice this because
we expect an 'E' as the last byte), bits flipped in the wrong place (there's
different bytes indicating different types of stats). Corruption within
individual stats aren't detected.

Note that this is very old code / behaviour, not meaningfully affected by
shared memory stats patch.


> How does it know?  Is there a checksum, was the file the wrong length, what
> happened?  I think this could use more detail.

I agree. But it's independent of the shared memory stats patch, so I don't
want to tie improving it to that already huge patch.

Greetings,

Andres Freund



Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-30 14:42:23 -0400, Robert Haas wrote:
> On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres@anarazel.de> wrote:
> > I think it's reasonably rare because in cases there'd be corruption, we'd
> > typically not even have written them out / throw them away explicitly - we
> > only read stats when starting without crash recovery.
> >
> > So the "expected" case of corruption afaicts solely is a OS crash just after
> > the shutdown checkpoint completed?
> 
> Can we prevent that case from occurring, so that there are no expected cases?

We likely can, at least for the causes of corruption I know of. We already
write the statsfile into a temporary filename and then rename into place. I
think all we'd need to do is to use durable_rename() to make sure it's durable
once renamed into place.

It's really unrelated to the shared memory stats patch though, so I'd prefer
not to tie it to that.


> > I can think of these different times:
> >
> > - Last time stats were removed due to starting up in crash recovery
> > - Last time stats were created from scratch, because no stats data file was
> >   present at startup
> > - Last time stats were thrown away due to corruption
> > - Last time a subset of stats were reset using one of the pg_reset* functions
> >
> > Makes sense?
> 
> Yes. Possibly that last could be broken in to two: when all stats were
> last reset, when some stats were last reset.

Believe it or not, we don't currently have a function to reset all stats. We
should definitely add that though, because the invocation to reset all stats
gets more ridiculous^Wcomplicated with each release.

I think the minimal invocation currently is something like:

-- reset all stats shared between databases
SELECT pg_stat_reset_shared('archiver');
SELECT pg_stat_reset_shared('bgwriter');
SELECT pg_stat_reset_shared('wal');
SELECT pg_stat_reset_replication_slot(NULL);
SELECT pg_stat_reset_slru(NULL);
SELECT pg_stat_reset_subscription_stats(NULL);

-- connect to each database and reset the stats in that database
SELECT pg_stat_reset();


I've protested against replication slot, slru, subscription stats not being
resettable via pg_stat_reset_shared(), nobody else seemed to care.


> > > Does redo update the stats?
> >
> > With "update" do you mean generate new stats? In the shared memory stats patch
> > it triggers stats to be dropped, on HEAD it just resets all stats at startup.
> >
> > Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.
> 
> Well, I guess what I'm trying to figure out is what happens if we run
> in recovery for a long time -- say, a year -- and then get promoted.
> Do we have reasons to expect that the stats will be accurate enough to
> use at that point, or will they be way off?

What do you mean with 'accurate enough'?

With or without shared memory stats pg_stat_all_tables.{n_mod_since_analyze,
n_ins_since_vacuum, n_live_tup, n_dead_tup ...} will be be zero. The replay
process doesn't update them.

In contrast to that, things like pg_stat_all_tables.{seq_scan, seq_tup_read,
idx_tup_fetch, ...} will be accurate, with one exception below.

pg_stat_bgwriter, pg_stat_wal, etc will always be accurate.


On HEAD, there may be a lot of dead stats for dropped databases / tables /
functions that have been dropped since the start of the cluster. They will
eventually get removed, once autovacuum starts running in the respective
database (i.e. pgstat_vacuum_stat() gets run).

The exception noted above is that because pg_stat_all_tables contents are
never removed during recovery, it becomes a lot more plausible for oid
conflicts to occur. So the stats for two different tables might get added up
accidentally - but that'll just affect the non-zero columns, of course.


With the shared memory stats patch, stats for dropped objects (i.e. databases,
tables, ... ) are removed shortly after they have been dropped, so that
conflict risk doesn't exist anymore.


So I don't think increasing inaccuracy is a reason to throw away stats on
replica startup. Particularly because we already don't throw them away when
promoting the replica, just when having started it last.



> I don't have a great understanding of how this all works, but if
> running recovery for a long time is going to lead to a situation where
> the stats progressively diverge from reality, then preserving them
> doesn't seem as valuable as if they're going to be more or less
> accurate.

Minus the oid wraparound risk on HEAD, the only way they increasingly diverge
is that the '0' in a bunch of pg_stat_all_tables columns might get less and
less accurate. But that's not the type of divergence you're talking about, I
think.


Greetings,

Andres Freund



Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-29 12:17:27 -0700, Andres Freund wrote:
> Separate from the minutia in [1] I'd like to discuss a few questions of more
> general interest. I'll post another question or two later.

4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define /
   pg_stats_temp directory?

   With shared memory stats patch, the stats system itself doesn't need it
   anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store
   pgss_query_texts.stat. That file can be fairly hot, so there's benefit in
   having something like stats_temp_directory.

   I'm inclined to just leave the guc / define / directory around, with a
   note saying that it's just used by extensions?

Greetings,

Andres Freund



Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-30 16:35:50 -0700, Andres Freund wrote:
> On 2022-03-29 12:17:27 -0700, Andres Freund wrote:
> > Separate from the minutia in [1] I'd like to discuss a few questions of more
> > general interest. I'll post another question or two later.
>
> 4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define /
>    pg_stats_temp directory?
>
>    With shared memory stats patch, the stats system itself doesn't need it
>    anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store
>    pgss_query_texts.stat. That file can be fairly hot, so there's benefit in
>    having something like stats_temp_directory.
>
>    I'm inclined to just leave the guc / define / directory around, with a
>    note saying that it's just used by extensions?

I had searched before on codesearch.debian.net whether there are external
extensions using it, without finding one (just a copy of pgstat.h). Now I
searched using https://cs.github.com/ ([1]) and found

https://github.com/powa-team/pg_sortstats
https://github.com/uptimejp/sql_firewall
https://github.com/legrandlegrand/pg_stat_sql_plans
https://github.com/ossc-db/pg_store_plans

Which seems to weigh in favor of at least keeping the directory and
define. They all don't seem to use the guc, but just PG_STAT_TMP_DIR.


We currently have code removing files both in pg_stat and the configured
pg_stats_temp directory (defaulting to pg_stat_tmp). All files matching
global.(stat|tmp), db_[0-9]+.(tmp|stat) are removed.

With the shared memory stats patch there's only a single file, so we don't
need that anymore. I guess some extension could rely on files being removed
somehow, but it's hard to believe, because it'd conflict with the stats
collector's files.

Greetings,

Andres Freund


[1]
https://cs.github.com/?scopeName=All+repos&scope=&q=PG_STAT_TMP_DIR+NOT+path%3Afilemap.c+NOT+path%3Apgstat.h+NOT+path%3Abasebackup.c+NOT+path%3Apg_stat_statements.c+NOT+path%3Aguc.c



Re: Higher level questions around shared memory stats

From
Kyotaro Horiguchi
Date:
At Wed, 30 Mar 2022 17:09:44 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2022-03-30 16:35:50 -0700, Andres Freund wrote:
> > On 2022-03-29 12:17:27 -0700, Andres Freund wrote:
> > > Separate from the minutia in [1] I'd like to discuss a few questions of more
> > > general interest. I'll post another question or two later.
> >
> > 4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define /
> >    pg_stats_temp directory?
> >
> >    With shared memory stats patch, the stats system itself doesn't need it
> >    anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store
> >    pgss_query_texts.stat. That file can be fairly hot, so there's benefit in
> >    having something like stats_temp_directory.
> >
> >    I'm inclined to just leave the guc / define / directory around, with a
> >    note saying that it's just used by extensions?
> 
> I had searched before on codesearch.debian.net whether there are external
> extensions using it, without finding one (just a copy of pgstat.h). Now I
> searched using https://cs.github.com/ ([1]) and found
> 
> https://github.com/powa-team/pg_sortstats
> https://github.com/uptimejp/sql_firewall
> https://github.com/legrandlegrand/pg_stat_sql_plans
> https://github.com/ossc-db/pg_store_plans
> 
> Which seems to weigh in favor of at least keeping the directory and
> define. They all don't seem to use the guc, but just PG_STAT_TMP_DIR.

The varialbe is not being officially exposed to extensions not even in
the core. That is, it is defined (non-static) in guc.c but does not
appear in header files. I'm not sure why pg_stat_statements decided to
use PG_STAT_TMP_DIR instead of trying to use stats_temp_directory
(also known in the core as pgstast_temp_directory). I guess all
extensions above are just following the pg_stat_statements' practice.
At least pg_store_plans does.

After moving to shared stats, we might want to expose the GUC variable
itself. Then hide/remove the macro PG_STAT_TMP_DIR.  This breaks the
extensions but it is better than keeping using PG_STAT_TMP_DIR for
uncertain reasons. The existence of the macro can be used as the
marker of the feature change.  This is the chance to break the (I
think) bad practice shared among the extensions.  At least I am okay
with that.

#ifdef PG_STAT_TMP_DIR
#define PGSP_TEXT_FILE    PG_STAT_TMP_DIR "/pgsp_plan_texts.stat"
#endif

> We currently have code removing files both in pg_stat and the configured
> pg_stats_temp directory (defaulting to pg_stat_tmp). All files matching
> global.(stat|tmp), db_[0-9]+.(tmp|stat) are removed.
> 
> With the shared memory stats patch there's only a single file, so we don't
> need that anymore. I guess some extension could rely on files being removed
> somehow, but it's hard to believe, because it'd conflict with the stats
> collector's files.

Yes, I intentionally avoided using the file names that are removed by
the logic in pg_store_plans.  But, it is a kind of rare to use such
names inadvertently, though..

> Greetings,
> 
> Andres Freund
> 
> 
> [1]
https://cs.github.com/?scopeName=All+repos&scope=&q=PG_STAT_TMP_DIR+NOT+path%3Afilemap.c+NOT+path%3Apgstat.h+NOT+path%3Abasebackup.c+NOT+path%3Apg_stat_statements.c+NOT+path%3Aguc.c

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote:
> After moving to shared stats, we might want to expose the GUC variable
> itself. Then hide/remove the macro PG_STAT_TMP_DIR.  This breaks the
> extensions but it is better than keeping using PG_STAT_TMP_DIR for
> uncertain reasons. The existence of the macro can be used as the
> marker of the feature change.  This is the chance to break the (I
> think) bad practice shared among the extensions.  At least I am okay
> with that.

I don't really understand why we'd want to do it that way round? Why not just
leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses the
GUC, we're not loosing anything, and we'd not break extensions unnecessarily?

Obviously there's no strong demand for pg_stat_statements et al to use the
user-configurable stats_temp_directory, given they've not done so for years
without complaints?  The code to support the configurable stats_temp_dir isn't
huge, but it's not small either.

Greetings,

Andres Freund



Re: Higher level questions around shared memory stats

From
Kyotaro Horiguchi
Date:
At Thu, 31 Mar 2022 14:04:16 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote:
> > After moving to shared stats, we might want to expose the GUC variable
> > itself. Then hide/remove the macro PG_STAT_TMP_DIR.  This breaks the
> > extensions but it is better than keeping using PG_STAT_TMP_DIR for
> > uncertain reasons. The existence of the macro can be used as the
> > marker of the feature change.  This is the chance to break the (I
> > think) bad practice shared among the extensions.  At least I am okay
> > with that.
> 
> I don't really understand why we'd want to do it that way round? Why not just
> leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses the
> GUC, we're not loosing anything, and we'd not break extensions unnecessarily?

Yeah, I'm two-sided here.

I think so and did so in the past versions of this patch.  But when
asked anew, I came to think I might want to keep (and make use more
of) the configuarable aspect of the dbserver's dedicated temporary
directory.  The change is reliably detectable on extensions and the
requried change is easy.

> Obviously there's no strong demand for pg_stat_statements et al to use the
> user-configurable stats_temp_directory, given they've not done so for years
> without complaints?  The code to support the configurable stats_temp_dir isn't
> huge, but it's not small either.

I even doubt anyone have stats_temp_directory changed in a cluster.
Thus I agree that it is reasonable that we take advantage of the
chance to remove the feature of little significance.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Higher level questions around shared memory stats

From
"David G. Johnston"
Date:
On Thu, Mar 31, 2022 at 7:33 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Thu, 31 Mar 2022 14:04:16 -0700, Andres Freund <andres@anarazel.de> wrote in
> Hi,
>
> On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote:
> > After moving to shared stats, we might want to expose the GUC variable
> > itself. Then hide/remove the macro PG_STAT_TMP_DIR.  This breaks the
> > extensions but it is better than keeping using PG_STAT_TMP_DIR for
> > uncertain reasons. The existence of the macro can be used as the
> > marker of the feature change.  This is the chance to break the (I
> > think) bad practice shared among the extensions.  At least I am okay
> > with that.
>
> I don't really understand why we'd want to do it that way round? Why not just
> leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses the
> GUC, we're not loosing anything, and we'd not break extensions unnecessarily?

Yeah, I'm two-sided here.

I think so and did so in the past versions of this patch.  But when
asked anew, I came to think I might want to keep (and make use more
of) the configuarable aspect of the dbserver's dedicated temporary
directory.  The change is reliably detectable on extensions and the
requried change is easy.

> Obviously there's no strong demand for pg_stat_statements et al to use the
> user-configurable stats_temp_directory, given they've not done so for years
> without complaints?  The code to support the configurable stats_temp_dir isn't
> huge, but it's not small either.

I even doubt anyone have stats_temp_directory changed in a cluster.
Thus I agree that it is reasonable that we take advantage of the
chance to remove the feature of little significance.


Do we really think no one has taken our advice in the documentation and moved their stats_temp_directory to a RAM-based file system?


It doesn't seem right that extensions are making the decision of where to place their temporary statistics files.  If the user has specified a directory for them the system should be placing them there.

The question is whether current uses of PG_STAT_TMP_DIR can be made to use the value of the GUC without them knowing or caring about the fact we changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever the current value of stats_temp_directory is.  I take it from the compiler directive of #define that this isn't doable.

Given all that I'd say we remove stats_temp_directory (noting it as being obsolete as only core code was ever given leave to use it and that core code now uses shared memory instead - basically forcing the choice of RAM-based file system onto the user).

If we later want to coerce extensions (and even our own code) to use a user-supplied directory we can then remove the define and give them an API to use instead.  I suspect such an API would look different than just "here is a directory name" anyway (e.g., force extensions to use a subdirectory under the base directory for their data).  We would name the base directory GUC something like "stats_temp_base_directory" and be happy that stats_temp_directory isn't sitting there already giving us the stink eye.

David J.

Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-31 20:06:07 -0700, David G. Johnston wrote:
> Do we really think no one has taken our advice in the documentation and
> moved their stats_temp_directory to a RAM-based file system?

I'm pretty sure some have, I've seen it in the field in the past.


> The question is whether current uses of PG_STAT_TMP_DIR can be made to use
> the value of the GUC without them knowing or caring about the fact we
> changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever
> the current value of stats_temp_directory is.  I take it from the compiler
> directive of #define that this isn't doable.

Correct, we can't.


> If we later want to coerce extensions (and even our own code) to use a
> user-supplied directory we can then remove the define and give them an API
> to use instead.

FWIW, it's not quite there yet (as in, not a goal for 15), but with a bit
further work, a number of such extensions could use the shared memory stats
interface to store their additional stats. And they wouldn't have to care
about where the files get stored.

Greetings,

Andres Freund



Re: Higher level questions around shared memory stats

From
Kyotaro Horiguchi
Date:
At Thu, 31 Mar 2022 22:12:25 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2022-03-31 20:06:07 -0700, David G. Johnston wrote:
> > Do we really think no one has taken our advice in the documentation and
> > moved their stats_temp_directory to a RAM-based file system?
> 
> I'm pretty sure some have, I've seen it in the field in the past.

Oh. But no problem if no extensions enumerated upthread are not used
there.  If one of them is used, the DBA saw some files generated under
pg_stat_tmp..

> > The question is whether current uses of PG_STAT_TMP_DIR can be made to use
> > the value of the GUC without them knowing or caring about the fact we
> > changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever
> > the current value of stats_temp_directory is.  I take it from the compiler
> > directive of #define that this isn't doable.
> 
> Correct, we can't.
> 
> 
> > If we later want to coerce extensions (and even our own code) to use a
> > user-supplied directory we can then remove the define and give them an API
> > to use instead.
> 
> FWIW, it's not quite there yet (as in, not a goal for 15), but with a bit
> further work, a number of such extensions could use the shared memory stats
> interface to store their additional stats. And they wouldn't have to care
> about where the files get stored.

pg_stat_statements has moved stored queries from shared memory to file
in comparison between memory efficiency and speed.  So fast storage
could give benefit. I'm not sure the files is flushed so frequently,
though.  And in the first place 

But, as Andres saying, currently it *is* stored in the data directory
and no one seems complaing.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-29 12:17:27 -0700, Andres Freund wrote:
> Separate from the minutia in [1] I'd like to discuss a few questions of more
> general interest. I'll post another question or two later.

5) What does track_counts = off mean?

I just was trying to go through the shared memory stats patch to ensure
pgstat_track_counts has the same effect as before.  It's kinda hard to discern
what exactly it supposed to be doing because it's quite inconsistently
applied.

- all "global" stats ignore it (archiver, bgwriter, checkpointer, slru wal)
- subscription, replication slot stats ignore it
- *some* database level stats pay heed
  - pgstat_report_autovac, pgstat_report_connect don't
  - pgstat_report_recovery_conflict, pgstat_report_deadlock,
    pgstat_report_checksum_failures_in_db do respect it
- function stats have their own setting

Unless we conclude something else I'll make sure the patch is "bug
compatible".

Greetings,

Andres Freund



Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

Alvaro, added you because you were the original author for a lot of that
code. Fujii, you touched it last...


6) Should any part of the "reuse_stats" logic in table_recheck_autovac() be
kept?

With the shared memory stats patch, autovacuum can cheaply access individual
stats, so the whole scheme for avoiding stats accesses is moot.

So far the patchset had touched autovacuum.c a bit too lightly, removing the
autovac_refresh_stats() call and rephrasing a few comments, but not removing
e.g. the reuse_stats variable / branches in table_recheck_autovac. Which
doesn't seem great.  I've just tried to go through and update the autovacuum.c
code and comments in light of the shared memory stats patch..

I don't really see a point in keeping any of it - but I was curious whether
anybody else does?

I'm still polishing, so I didn't want to send a whole new version with these
adjustments to the list yet, but you can see the state as of the time of
sending this email at [1].

Greetings,

Andres Freund

[1]
https://github.com/anarazel/postgres/commit/276c053110cfe71bf134519e8e4ab053e6d2a7f0#diff-3035fb5dace7bcd77f0eeafe32458cd808c5adb83d62ebdf54f0170cf7db93e7



Re: Higher level questions around shared memory stats

From
Alvaro Herrera
Date:
On 2022-Apr-02, Andres Freund wrote:

> 6) Should any part of the "reuse_stats" logic in table_recheck_autovac() be
> kept?
> 
> With the shared memory stats patch, autovacuum can cheaply access individual
> stats, so the whole scheme for avoiding stats accesses is moot.

Agreed, I don't think there's need to keep any of that.

> I don't really see a point in keeping any of it - but I was curious whether
> anybody else does?

I don't either.

> I'm still polishing, so I didn't want to send a whole new version with these
> adjustments to the list yet, but you can see the state as of the time of
> sending this email at [1].

I'll have a look, thanks.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Higher level questions around shared memory stats

From
Andres Freund
Date:
Hi,

On 2022-03-30 14:08:41 -0700, Andres Freund wrote:
> On 2022-03-30 14:42:23 -0400, Robert Haas wrote:
> > On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres@anarazel.de> wrote:
> > > I can think of these different times:
> > >
> > > - Last time stats were removed due to starting up in crash recovery
> > > - Last time stats were created from scratch, because no stats data file was
> > >   present at startup
> > > - Last time stats were thrown away due to corruption
> > > - Last time a subset of stats were reset using one of the pg_reset* functions
> > >
> > > Makes sense?
> > 
> > Yes. Possibly that last could be broken in to two: when all stats were
> > last reset, when some stats were last reset.
> 
> Believe it or not, we don't currently have a function to reset all stats. We
> should definitely add that though, because the invocation to reset all stats
> gets more ridiculous^Wcomplicated with each release.

I assume we'd want all of these to be persistent (until the next time stats
are lost, of course)?

We currently have the following SQL functions showing reset times:
  pg_stat_get_bgwriter_stat_reset_time
  pg_stat_get_db_stat_reset_time
and kind of related:
  pg_stat_get_snapshot_timestamp

There's also a few functions showing the last time something has happened,
like pg_stat_get_last_analyze_time().

Trying to fit those patterns we could add functions like:

pg_stat_get_stats_last_cleared_recovery_time
pg_stat_get_stats_last_cleared_corrupted_time
pg_stat_get_stats_last_not_present_time
pg_stat_get_stats_last_partially_reset_time

Not great, but I don't immediately have a better idea.

Maybe it'd look better as a view / SRF? pg_stat_stats / pg_stat_get_stats()?
Potential column names:
- cleared_recovery_time
- cleared_corrupted_time
- not_preset_time
- partially_reset_time

It's not a lot of time to code these up. However, it's late in cycle, and
they're not suddenly needed due to shared memory stats, so I have a few doubts
about adding them now?

Greetings,

Andres Freund