Thread: Higher level questions around shared memory stats
Hi, Separate from the minutia in [1] I'd like to discuss a few questions of more general interest. I'll post another question or two later. 1) How to react to corrupted statsfiles? In HEAD we stop reading stats at the point we detect the stats file to be corrupted. The contents of the currently read stats "entry" are zeroed out, but prior entries stay loaded. This means that we can get an entirely inconsisten state of the stats, without really knowing: E.g. if a per-db stats file fails to load halfway through, we'll have already loaded the pg_stat_database entry. Thus pg_stat_database.stats_reset will not be reset, but at the same time we'll only have part of the data in pg_stat_database. That seems like a mess? IMO it'd make more sense to just throw away all stats in that case. Either works for the shmstats patch, it's just a few lines to change. Unless there's support for throwing away stats, I'll change the shmstats patch to match the current behaviour. Currently it resets all "global" stats like bgwriter, checkpointer etc whenever there's a failure, but keeps already loaded database / table / function stats, which doesn't make much sense. 2) What do we want to do with stats when starting up in recovery? Today we throw out stats whenever crash recovery is needed. Which includes starting up a replica DB_SHUTDOWNED_IN_RECOVERY. The shared memory stats patch loads the stats in the startup process (whether there's recovery or not). It's obviously no problem to just mirror the current behaviour, we just need to decide what we want? The reason we throw stats away before recovery seem to originate in concerns around old stats being confused for new stats. Of course, "now" that we have streaming replication, throwing away stats *before* recovery, doesn't provide any sort of protection against that. In fact, currently there's no automatic mechanism whatsoever to get rid of stats for dropped objects on a standby. In the shared memory stats patch, dropped catalog objects cause the stats to also be dropped on the standby. So that whole concern is largely gone. I'm inclined to, for now, either leave the behaviour exactly as it currently is, or to continuing throwing away stats during normal crash recovery, but to stop doing so for DB_SHUTDOWNED_IN_RECOVERY. I think it'd now be feasible to just never throw stats away during crash recovery, by writing out stats during checkpoint/restartpoints, but that's clearly work for later. The alternatives in the prior paragraph in contrast is just a question of when to call the "restore" and when the "throw away" function, a call that has to be made anyway. 3) Does anybody care about preserving the mishmash style of function comments present in pgstat? There's at least: /* ---------- * pgstat_write_db_statsfile() - * Write the stat file for a single database. * * If writing to the permanent file (happens when the collector is /* ---------- * pgstat_get_replslot_entry * * Return the entry of replication slot stats with the given name. Return * NULL if not found and the caller didn't request to create it. /* * Lookup the hash table entry for the specified table. If no hash * table entry exists, initialize it, if the create parameter is true. * Else, return NULL. */ /* ---------- * pgstat_send() - * * Send out one statistics message to the collector * ---------- */ /* * PostPrepare_PgStat * Clean up after successful PREPARE. * * Note: AtEOXact_PgStat is not called during PREPARE. */ ---- or not. Summary indented with two tabs. Longer comment indented with a tab. The function name in the comment or not. Function parens or not. And quite a few more differences. I find these a *pain* to maintain. Most function comments have to be touched to remove references to the stats collector and invariably such changes end up changing formatting as well. Whenever adding a new function I have an internal debate about which comment style to follow. I've already spent a considerable amount of time going through the patch to reduce incidental "format" changes, but there's quite a few more instances that need to be cleaned up. I'm inclined to just do a pass through the files and normalize the comment styles, in a separate commit. Personally I'd go for no ---, no copy of the function name, no tabs - but I don't really care as long as it's consistent. Greetings, Andres Freund [1] https://postgr.es/m/20220329072417.er26q5pxc4pbldn7%40alap3.anarazel.de
On Tue, Mar 29, 2022 at 3:17 PM Andres Freund <andres@anarazel.de> wrote: > 1) How to react to corrupted statsfiles? > > IMO it'd make more sense to just throw away all stats > in that case. That seems reasonable to me. I think there's some downside, in that stats are important, and having some of them might be better than having none of them. On the other hand, stats file corruption should be very rare, and if it's not, we need to fix that. I think what's actually most important here is the error reporting. We need to make it clear, at least via log messages, that something bad has happened. And maybe we should have, inside the stats system, something that keeps track of when the stats file was last recreated from scratch because of a corruption event, separately from when it was last intentionally reset. > 2) What do we want to do with stats when starting up in recovery? > > Today we throw out stats whenever crash recovery is needed. Which includes > starting up a replica DB_SHUTDOWNED_IN_RECOVERY. > > The shared memory stats patch loads the stats in the startup process (whether > there's recovery or not). It's obviously no problem to just mirror the current > behaviour, we just need to decide what we want? > > The reason we throw stats away before recovery seem to originate in concerns > around old stats being confused for new stats. Of course, "now" that we have > streaming replication, throwing away stats *before* recovery, doesn't provide > any sort of protection against that. In fact, currently there's no automatic > mechanism whatsoever to get rid of stats for dropped objects on a standby. Does redo update the stats? > 3) Does anybody care about preserving the mishmash style of function comments > present in pgstat? There's at least: I can't speak for everyone in the universe, but I think it should be fine to clean that up. > I'm inclined to just do a pass through the files and normalize the comment > styles, in a separate commit. Personally I'd go for no ---, no copy of the > function name, no tabs - but I don't really care as long as it's consistent. +1 for that style. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2022-03-29 16:24:02 -0400, Robert Haas wrote: > On Tue, Mar 29, 2022 at 3:17 PM Andres Freund <andres@anarazel.de> wrote: > > 1) How to react to corrupted statsfiles? > > > > IMO it'd make more sense to just throw away all stats > > in that case. > > That seems reasonable to me. I think there's some downside, in that > stats are important, and having some of them might be better than > having none of them. On the other hand, stats file corruption should > be very rare, and if it's not, we need to fix that. I think it's reasonably rare because in cases there'd be corruption, we'd typically not even have written them out / throw them away explicitly - we only read stats when starting without crash recovery. So the "expected" case of corruption afaicts solely is a OS crash just after the shutdown checkpoint completed? > I think what's actually most important here is the error reporting. We need > to make it clear, at least via log messages, that something bad has > happened. The message currently (on HEAD, but similarly on the path) is: ereport(pgStatRunningInCollector ? LOG : WARNING, (errmsg("corrupted statistics file \"%s\"", statfile))); > And maybe we should have, inside the stats system, something that > keeps track of when the stats file was last recreated from scratch because > of a corruption event, separately from when it was last intentionally reset. That would be easy to add. Don't think we have a good place to show the information right now - perhaps just new functions not part of any view? I can think of these different times: - Last time stats were removed due to starting up in crash recovery - Last time stats were created from scratch, because no stats data file was present at startup - Last time stats were thrown away due to corruption - Last time a subset of stats were reset using one of the pg_reset* functions Makes sense? > > 2) What do we want to do with stats when starting up in recovery? > > > > Today we throw out stats whenever crash recovery is needed. Which includes > > starting up a replica DB_SHUTDOWNED_IN_RECOVERY. > > > > The shared memory stats patch loads the stats in the startup process (whether > > there's recovery or not). It's obviously no problem to just mirror the current > > behaviour, we just need to decide what we want? > > > > The reason we throw stats away before recovery seem to originate in concerns > > around old stats being confused for new stats. Of course, "now" that we have > > streaming replication, throwing away stats *before* recovery, doesn't provide > > any sort of protection against that. In fact, currently there's no automatic > > mechanism whatsoever to get rid of stats for dropped objects on a standby. > > Does redo update the stats? With "update" do you mean generate new stats? In the shared memory stats patch it triggers stats to be dropped, on HEAD it just resets all stats at startup. Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do. Thanks, Andres
On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres@anarazel.de> wrote: > I think it's reasonably rare because in cases there'd be corruption, we'd > typically not even have written them out / throw them away explicitly - we > only read stats when starting without crash recovery. > > So the "expected" case of corruption afaicts solely is a OS crash just after > the shutdown checkpoint completed? Can we prevent that case from occurring, so that there are no expected cases? > > And maybe we should have, inside the stats system, something that > > keeps track of when the stats file was last recreated from scratch because > > of a corruption event, separately from when it was last intentionally reset. > > That would be easy to add. Don't think we have a good place to show the > information right now - perhaps just new functions not part of any view? I defer to you on where to put it. > I can think of these different times: > > - Last time stats were removed due to starting up in crash recovery > - Last time stats were created from scratch, because no stats data file was > present at startup > - Last time stats were thrown away due to corruption > - Last time a subset of stats were reset using one of the pg_reset* functions > > Makes sense? Yes. Possibly that last could be broken in to two: when all stats were last reset, when some stats were last reset. > > Does redo update the stats? > > With "update" do you mean generate new stats? In the shared memory stats patch > it triggers stats to be dropped, on HEAD it just resets all stats at startup. > > Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do. Well, I guess what I'm trying to figure out is what happens if we run in recovery for a long time -- say, a year -- and then get promoted. Do we have reasons to expect that the stats will be accurate enough to use at that point, or will they be way off? I don't have a great understanding of how this all works, but if running recovery for a long time is going to lead to a situation where the stats progressively diverge from reality, then preserving them doesn't seem as valuable as if they're going to be more or less accurate. -- Robert Haas EDB: http://www.enterprisedb.com
On 29.03.22 23:01, Andres Freund wrote: >> I think what's actually most important here is the error reporting. We need >> to make it clear, at least via log messages, that something bad has >> happened. > The message currently (on HEAD, but similarly on the path) is: > ereport(pgStatRunningInCollector ? LOG : WARNING, > (errmsg("corrupted statistics file \"%s\"", > statfile))); Corrupted how? How does it know? Is there a checksum, was the file the wrong length, what happened? I think this could use more detail.
Hi, On 2022-03-30 21:44:20 +0200, Peter Eisentraut wrote: > On 29.03.22 23:01, Andres Freund wrote: > > > I think what's actually most important here is the error reporting. We need > > > to make it clear, at least via log messages, that something bad has > > > happened. > > The message currently (on HEAD, but similarly on the path) is: > > ereport(pgStatRunningInCollector ? LOG : WARNING, > > (errmsg("corrupted statistics file \"%s\"", > > statfile))); > > Corrupted how? We can't parse it. Which can mean that it's truncated (we notice this because we expect an 'E' as the last byte), bits flipped in the wrong place (there's different bytes indicating different types of stats). Corruption within individual stats aren't detected. Note that this is very old code / behaviour, not meaningfully affected by shared memory stats patch. > How does it know? Is there a checksum, was the file the wrong length, what > happened? I think this could use more detail. I agree. But it's independent of the shared memory stats patch, so I don't want to tie improving it to that already huge patch. Greetings, Andres Freund
Hi, On 2022-03-30 14:42:23 -0400, Robert Haas wrote: > On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres@anarazel.de> wrote: > > I think it's reasonably rare because in cases there'd be corruption, we'd > > typically not even have written them out / throw them away explicitly - we > > only read stats when starting without crash recovery. > > > > So the "expected" case of corruption afaicts solely is a OS crash just after > > the shutdown checkpoint completed? > > Can we prevent that case from occurring, so that there are no expected cases? We likely can, at least for the causes of corruption I know of. We already write the statsfile into a temporary filename and then rename into place. I think all we'd need to do is to use durable_rename() to make sure it's durable once renamed into place. It's really unrelated to the shared memory stats patch though, so I'd prefer not to tie it to that. > > I can think of these different times: > > > > - Last time stats were removed due to starting up in crash recovery > > - Last time stats were created from scratch, because no stats data file was > > present at startup > > - Last time stats were thrown away due to corruption > > - Last time a subset of stats were reset using one of the pg_reset* functions > > > > Makes sense? > > Yes. Possibly that last could be broken in to two: when all stats were > last reset, when some stats were last reset. Believe it or not, we don't currently have a function to reset all stats. We should definitely add that though, because the invocation to reset all stats gets more ridiculous^Wcomplicated with each release. I think the minimal invocation currently is something like: -- reset all stats shared between databases SELECT pg_stat_reset_shared('archiver'); SELECT pg_stat_reset_shared('bgwriter'); SELECT pg_stat_reset_shared('wal'); SELECT pg_stat_reset_replication_slot(NULL); SELECT pg_stat_reset_slru(NULL); SELECT pg_stat_reset_subscription_stats(NULL); -- connect to each database and reset the stats in that database SELECT pg_stat_reset(); I've protested against replication slot, slru, subscription stats not being resettable via pg_stat_reset_shared(), nobody else seemed to care. > > > Does redo update the stats? > > > > With "update" do you mean generate new stats? In the shared memory stats patch > > it triggers stats to be dropped, on HEAD it just resets all stats at startup. > > > > Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do. > > Well, I guess what I'm trying to figure out is what happens if we run > in recovery for a long time -- say, a year -- and then get promoted. > Do we have reasons to expect that the stats will be accurate enough to > use at that point, or will they be way off? What do you mean with 'accurate enough'? With or without shared memory stats pg_stat_all_tables.{n_mod_since_analyze, n_ins_since_vacuum, n_live_tup, n_dead_tup ...} will be be zero. The replay process doesn't update them. In contrast to that, things like pg_stat_all_tables.{seq_scan, seq_tup_read, idx_tup_fetch, ...} will be accurate, with one exception below. pg_stat_bgwriter, pg_stat_wal, etc will always be accurate. On HEAD, there may be a lot of dead stats for dropped databases / tables / functions that have been dropped since the start of the cluster. They will eventually get removed, once autovacuum starts running in the respective database (i.e. pgstat_vacuum_stat() gets run). The exception noted above is that because pg_stat_all_tables contents are never removed during recovery, it becomes a lot more plausible for oid conflicts to occur. So the stats for two different tables might get added up accidentally - but that'll just affect the non-zero columns, of course. With the shared memory stats patch, stats for dropped objects (i.e. databases, tables, ... ) are removed shortly after they have been dropped, so that conflict risk doesn't exist anymore. So I don't think increasing inaccuracy is a reason to throw away stats on replica startup. Particularly because we already don't throw them away when promoting the replica, just when having started it last. > I don't have a great understanding of how this all works, but if > running recovery for a long time is going to lead to a situation where > the stats progressively diverge from reality, then preserving them > doesn't seem as valuable as if they're going to be more or less > accurate. Minus the oid wraparound risk on HEAD, the only way they increasingly diverge is that the '0' in a bunch of pg_stat_all_tables columns might get less and less accurate. But that's not the type of divergence you're talking about, I think. Greetings, Andres Freund
Hi, On 2022-03-29 12:17:27 -0700, Andres Freund wrote: > Separate from the minutia in [1] I'd like to discuss a few questions of more > general interest. I'll post another question or two later. 4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define / pg_stats_temp directory? With shared memory stats patch, the stats system itself doesn't need it anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store pgss_query_texts.stat. That file can be fairly hot, so there's benefit in having something like stats_temp_directory. I'm inclined to just leave the guc / define / directory around, with a note saying that it's just used by extensions? Greetings, Andres Freund
Hi, On 2022-03-30 16:35:50 -0700, Andres Freund wrote: > On 2022-03-29 12:17:27 -0700, Andres Freund wrote: > > Separate from the minutia in [1] I'd like to discuss a few questions of more > > general interest. I'll post another question or two later. > > 4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define / > pg_stats_temp directory? > > With shared memory stats patch, the stats system itself doesn't need it > anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store > pgss_query_texts.stat. That file can be fairly hot, so there's benefit in > having something like stats_temp_directory. > > I'm inclined to just leave the guc / define / directory around, with a > note saying that it's just used by extensions? I had searched before on codesearch.debian.net whether there are external extensions using it, without finding one (just a copy of pgstat.h). Now I searched using https://cs.github.com/ ([1]) and found https://github.com/powa-team/pg_sortstats https://github.com/uptimejp/sql_firewall https://github.com/legrandlegrand/pg_stat_sql_plans https://github.com/ossc-db/pg_store_plans Which seems to weigh in favor of at least keeping the directory and define. They all don't seem to use the guc, but just PG_STAT_TMP_DIR. We currently have code removing files both in pg_stat and the configured pg_stats_temp directory (defaulting to pg_stat_tmp). All files matching global.(stat|tmp), db_[0-9]+.(tmp|stat) are removed. With the shared memory stats patch there's only a single file, so we don't need that anymore. I guess some extension could rely on files being removed somehow, but it's hard to believe, because it'd conflict with the stats collector's files. Greetings, Andres Freund [1] https://cs.github.com/?scopeName=All+repos&scope=&q=PG_STAT_TMP_DIR+NOT+path%3Afilemap.c+NOT+path%3Apgstat.h+NOT+path%3Abasebackup.c+NOT+path%3Apg_stat_statements.c+NOT+path%3Aguc.c
At Wed, 30 Mar 2022 17:09:44 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2022-03-30 16:35:50 -0700, Andres Freund wrote: > > On 2022-03-29 12:17:27 -0700, Andres Freund wrote: > > > Separate from the minutia in [1] I'd like to discuss a few questions of more > > > general interest. I'll post another question or two later. > > > > 4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define / > > pg_stats_temp directory? > > > > With shared memory stats patch, the stats system itself doesn't need it > > anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store > > pgss_query_texts.stat. That file can be fairly hot, so there's benefit in > > having something like stats_temp_directory. > > > > I'm inclined to just leave the guc / define / directory around, with a > > note saying that it's just used by extensions? > > I had searched before on codesearch.debian.net whether there are external > extensions using it, without finding one (just a copy of pgstat.h). Now I > searched using https://cs.github.com/ ([1]) and found > > https://github.com/powa-team/pg_sortstats > https://github.com/uptimejp/sql_firewall > https://github.com/legrandlegrand/pg_stat_sql_plans > https://github.com/ossc-db/pg_store_plans > > Which seems to weigh in favor of at least keeping the directory and > define. They all don't seem to use the guc, but just PG_STAT_TMP_DIR. The varialbe is not being officially exposed to extensions not even in the core. That is, it is defined (non-static) in guc.c but does not appear in header files. I'm not sure why pg_stat_statements decided to use PG_STAT_TMP_DIR instead of trying to use stats_temp_directory (also known in the core as pgstast_temp_directory). I guess all extensions above are just following the pg_stat_statements' practice. At least pg_store_plans does. After moving to shared stats, we might want to expose the GUC variable itself. Then hide/remove the macro PG_STAT_TMP_DIR. This breaks the extensions but it is better than keeping using PG_STAT_TMP_DIR for uncertain reasons. The existence of the macro can be used as the marker of the feature change. This is the chance to break the (I think) bad practice shared among the extensions. At least I am okay with that. #ifdef PG_STAT_TMP_DIR #define PGSP_TEXT_FILE PG_STAT_TMP_DIR "/pgsp_plan_texts.stat" #endif > We currently have code removing files both in pg_stat and the configured > pg_stats_temp directory (defaulting to pg_stat_tmp). All files matching > global.(stat|tmp), db_[0-9]+.(tmp|stat) are removed. > > With the shared memory stats patch there's only a single file, so we don't > need that anymore. I guess some extension could rely on files being removed > somehow, but it's hard to believe, because it'd conflict with the stats > collector's files. Yes, I intentionally avoided using the file names that are removed by the logic in pg_store_plans. But, it is a kind of rare to use such names inadvertently, though.. > Greetings, > > Andres Freund > > > [1] https://cs.github.com/?scopeName=All+repos&scope=&q=PG_STAT_TMP_DIR+NOT+path%3Afilemap.c+NOT+path%3Apgstat.h+NOT+path%3Abasebackup.c+NOT+path%3Apg_stat_statements.c+NOT+path%3Aguc.c regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote: > After moving to shared stats, we might want to expose the GUC variable > itself. Then hide/remove the macro PG_STAT_TMP_DIR. This breaks the > extensions but it is better than keeping using PG_STAT_TMP_DIR for > uncertain reasons. The existence of the macro can be used as the > marker of the feature change. This is the chance to break the (I > think) bad practice shared among the extensions. At least I am okay > with that. I don't really understand why we'd want to do it that way round? Why not just leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses the GUC, we're not loosing anything, and we'd not break extensions unnecessarily? Obviously there's no strong demand for pg_stat_statements et al to use the user-configurable stats_temp_directory, given they've not done so for years without complaints? The code to support the configurable stats_temp_dir isn't huge, but it's not small either. Greetings, Andres Freund
At Thu, 31 Mar 2022 14:04:16 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote: > > After moving to shared stats, we might want to expose the GUC variable > > itself. Then hide/remove the macro PG_STAT_TMP_DIR. This breaks the > > extensions but it is better than keeping using PG_STAT_TMP_DIR for > > uncertain reasons. The existence of the macro can be used as the > > marker of the feature change. This is the chance to break the (I > > think) bad practice shared among the extensions. At least I am okay > > with that. > > I don't really understand why we'd want to do it that way round? Why not just > leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses the > GUC, we're not loosing anything, and we'd not break extensions unnecessarily? Yeah, I'm two-sided here. I think so and did so in the past versions of this patch. But when asked anew, I came to think I might want to keep (and make use more of) the configuarable aspect of the dbserver's dedicated temporary directory. The change is reliably detectable on extensions and the requried change is easy. > Obviously there's no strong demand for pg_stat_statements et al to use the > user-configurable stats_temp_directory, given they've not done so for years > without complaints? The code to support the configurable stats_temp_dir isn't > huge, but it's not small either. I even doubt anyone have stats_temp_directory changed in a cluster. Thus I agree that it is reasonable that we take advantage of the chance to remove the feature of little significance. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Mar 31, 2022 at 7:33 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Thu, 31 Mar 2022 14:04:16 -0700, Andres Freund <andres@anarazel.de> wrote in
> Hi,
>
> On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote:
> > After moving to shared stats, we might want to expose the GUC variable
> > itself. Then hide/remove the macro PG_STAT_TMP_DIR. This breaks the
> > extensions but it is better than keeping using PG_STAT_TMP_DIR for
> > uncertain reasons. The existence of the macro can be used as the
> > marker of the feature change. This is the chance to break the (I
> > think) bad practice shared among the extensions. At least I am okay
> > with that.
>
> I don't really understand why we'd want to do it that way round? Why not just
> leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses the
> GUC, we're not loosing anything, and we'd not break extensions unnecessarily?
Yeah, I'm two-sided here.
I think so and did so in the past versions of this patch. But when
asked anew, I came to think I might want to keep (and make use more
of) the configuarable aspect of the dbserver's dedicated temporary
directory. The change is reliably detectable on extensions and the
requried change is easy.
> Obviously there's no strong demand for pg_stat_statements et al to use the
> user-configurable stats_temp_directory, given they've not done so for years
> without complaints? The code to support the configurable stats_temp_dir isn't
> huge, but it's not small either.
I even doubt anyone have stats_temp_directory changed in a cluster.
Thus I agree that it is reasonable that we take advantage of the
chance to remove the feature of little significance.
Do we really think no one has taken our advice in the documentation and moved their stats_temp_directory to a RAM-based file system?
It doesn't seem right that extensions are making the decision of where to place their temporary statistics files. If the user has specified a directory for them the system should be placing them there.
The question is whether current uses of PG_STAT_TMP_DIR can be made to use the value of the GUC without them knowing or caring about the fact we changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever the current value of stats_temp_directory is. I take it from the compiler directive of #define that this isn't doable.
Given all that I'd say we remove stats_temp_directory (noting it as being obsolete as only core code was ever given leave to use it and that core code now uses shared memory instead - basically forcing the choice of RAM-based file system onto the user).
If we later want to coerce extensions (and even our own code) to use a user-supplied directory we can then remove the define and give them an API to use instead. I suspect such an API would look different than just "here is a directory name" anyway (e.g., force extensions to use a subdirectory under the base directory for their data). We would name the base directory GUC something like "stats_temp_base_directory" and be happy that stats_temp_directory isn't sitting there already giving us the stink eye.
David J.
Hi, On 2022-03-31 20:06:07 -0700, David G. Johnston wrote: > Do we really think no one has taken our advice in the documentation and > moved their stats_temp_directory to a RAM-based file system? I'm pretty sure some have, I've seen it in the field in the past. > The question is whether current uses of PG_STAT_TMP_DIR can be made to use > the value of the GUC without them knowing or caring about the fact we > changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever > the current value of stats_temp_directory is. I take it from the compiler > directive of #define that this isn't doable. Correct, we can't. > If we later want to coerce extensions (and even our own code) to use a > user-supplied directory we can then remove the define and give them an API > to use instead. FWIW, it's not quite there yet (as in, not a goal for 15), but with a bit further work, a number of such extensions could use the shared memory stats interface to store their additional stats. And they wouldn't have to care about where the files get stored. Greetings, Andres Freund
At Thu, 31 Mar 2022 22:12:25 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2022-03-31 20:06:07 -0700, David G. Johnston wrote: > > Do we really think no one has taken our advice in the documentation and > > moved their stats_temp_directory to a RAM-based file system? > > I'm pretty sure some have, I've seen it in the field in the past. Oh. But no problem if no extensions enumerated upthread are not used there. If one of them is used, the DBA saw some files generated under pg_stat_tmp.. > > The question is whether current uses of PG_STAT_TMP_DIR can be made to use > > the value of the GUC without them knowing or caring about the fact we > > changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever > > the current value of stats_temp_directory is. I take it from the compiler > > directive of #define that this isn't doable. > > Correct, we can't. > > > > If we later want to coerce extensions (and even our own code) to use a > > user-supplied directory we can then remove the define and give them an API > > to use instead. > > FWIW, it's not quite there yet (as in, not a goal for 15), but with a bit > further work, a number of such extensions could use the shared memory stats > interface to store their additional stats. And they wouldn't have to care > about where the files get stored. pg_stat_statements has moved stored queries from shared memory to file in comparison between memory efficiency and speed. So fast storage could give benefit. I'm not sure the files is flushed so frequently, though. And in the first place But, as Andres saying, currently it *is* stored in the data directory and no one seems complaing. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 2022-03-29 12:17:27 -0700, Andres Freund wrote: > Separate from the minutia in [1] I'd like to discuss a few questions of more > general interest. I'll post another question or two later. 5) What does track_counts = off mean? I just was trying to go through the shared memory stats patch to ensure pgstat_track_counts has the same effect as before. It's kinda hard to discern what exactly it supposed to be doing because it's quite inconsistently applied. - all "global" stats ignore it (archiver, bgwriter, checkpointer, slru wal) - subscription, replication slot stats ignore it - *some* database level stats pay heed - pgstat_report_autovac, pgstat_report_connect don't - pgstat_report_recovery_conflict, pgstat_report_deadlock, pgstat_report_checksum_failures_in_db do respect it - function stats have their own setting Unless we conclude something else I'll make sure the patch is "bug compatible". Greetings, Andres Freund
Hi, Alvaro, added you because you were the original author for a lot of that code. Fujii, you touched it last... 6) Should any part of the "reuse_stats" logic in table_recheck_autovac() be kept? With the shared memory stats patch, autovacuum can cheaply access individual stats, so the whole scheme for avoiding stats accesses is moot. So far the patchset had touched autovacuum.c a bit too lightly, removing the autovac_refresh_stats() call and rephrasing a few comments, but not removing e.g. the reuse_stats variable / branches in table_recheck_autovac. Which doesn't seem great. I've just tried to go through and update the autovacuum.c code and comments in light of the shared memory stats patch.. I don't really see a point in keeping any of it - but I was curious whether anybody else does? I'm still polishing, so I didn't want to send a whole new version with these adjustments to the list yet, but you can see the state as of the time of sending this email at [1]. Greetings, Andres Freund [1] https://github.com/anarazel/postgres/commit/276c053110cfe71bf134519e8e4ab053e6d2a7f0#diff-3035fb5dace7bcd77f0eeafe32458cd808c5adb83d62ebdf54f0170cf7db93e7
On 2022-Apr-02, Andres Freund wrote: > 6) Should any part of the "reuse_stats" logic in table_recheck_autovac() be > kept? > > With the shared memory stats patch, autovacuum can cheaply access individual > stats, so the whole scheme for avoiding stats accesses is moot. Agreed, I don't think there's need to keep any of that. > I don't really see a point in keeping any of it - but I was curious whether > anybody else does? I don't either. > I'm still polishing, so I didn't want to send a whole new version with these > adjustments to the list yet, but you can see the state as of the time of > sending this email at [1]. I'll have a look, thanks. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi, On 2022-03-30 14:08:41 -0700, Andres Freund wrote: > On 2022-03-30 14:42:23 -0400, Robert Haas wrote: > > On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres@anarazel.de> wrote: > > > I can think of these different times: > > > > > > - Last time stats were removed due to starting up in crash recovery > > > - Last time stats were created from scratch, because no stats data file was > > > present at startup > > > - Last time stats were thrown away due to corruption > > > - Last time a subset of stats were reset using one of the pg_reset* functions > > > > > > Makes sense? > > > > Yes. Possibly that last could be broken in to two: when all stats were > > last reset, when some stats were last reset. > > Believe it or not, we don't currently have a function to reset all stats. We > should definitely add that though, because the invocation to reset all stats > gets more ridiculous^Wcomplicated with each release. I assume we'd want all of these to be persistent (until the next time stats are lost, of course)? We currently have the following SQL functions showing reset times: pg_stat_get_bgwriter_stat_reset_time pg_stat_get_db_stat_reset_time and kind of related: pg_stat_get_snapshot_timestamp There's also a few functions showing the last time something has happened, like pg_stat_get_last_analyze_time(). Trying to fit those patterns we could add functions like: pg_stat_get_stats_last_cleared_recovery_time pg_stat_get_stats_last_cleared_corrupted_time pg_stat_get_stats_last_not_present_time pg_stat_get_stats_last_partially_reset_time Not great, but I don't immediately have a better idea. Maybe it'd look better as a view / SRF? pg_stat_stats / pg_stat_get_stats()? Potential column names: - cleared_recovery_time - cleared_corrupted_time - not_preset_time - partially_reset_time It's not a lot of time to code these up. However, it's late in cycle, and they're not suddenly needed due to shared memory stats, so I have a few doubts about adding them now? Greetings, Andres Freund