Thread: Location for pgstat.stat
Per this thread: http://archives.postgresql.org/pgsql-general/2007-12/msg00255.php
it was pretty much (again, IIRC) concluded that we want "some better way" to
transfer the stats data. But until we have that, how about we just move it into
its own subdirectory? AFAICS that would be a simple change of two #defines,
moving it from "global/pgstat.stat" to "pgstat/pgstat.stat" or something like
that. Might also need some code to create the directory if it doesn't exist,
but that shouldn't be hard.

This would make it possible to symlink or mount that directory off to a
ramdrive (for example). It's not a perfect solution, but it would at least
give a better tool than we have today, no?

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> But until we have that, how about we just move it into its own
> subdirectory?

> This would make it possible to symlink or mount that directory off to a
> ramdrive (for example).

Hmm ... that would almost certainly result in the stats being lost over
a system shutdown. How much do we care?

			regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> But until we have that, how about we just move it into its own
>> subdirectory?
>> This would make it possible to symlink or mount that directory off to a
>> ramdrive (for example).
>
> Hmm ... that would almost certainly result in the stats being lost over
> a system shutdown. How much do we care?

Only for those who put it on a ramdrive. The default, unless you move/sync
it off, would still be the same as it is today. While not perfect, the
performance difference of going to a ramdrive might easily be enough to
offset that in some cases, I think.

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> Hmm ... that would almost certainly result in the stats being lost over
>> a system shutdown. How much do we care?

> Only for those who put it on a ramdrive. The default, unless you
> move/sync it off, would still be the same as it is today. While not
> perfect, the performance difference of going to a ramdrive might easily
> be enough to offset that in some cases, I think.

Well, what I was wondering about is whether it'd be worth adding logic
to copy the file to/from a "safer" location at startup/shutdown.

			regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> Hmm ... that would almost certainly result in the stats being lost over
>>> a system shutdown. How much do we care?
>
>> Only for those who put it on a ramdrive. The default, unless you
>> move/sync it off, would still be the same as it is today. While not
>> perfect, the performance difference of going to a ramdrive might easily
>> be enough to offset that in some cases, I think.
>
> Well, what I was wondering about is whether it'd be worth adding logic
> to copy the file to/from a "safer" location at startup/shutdown.

Oh, I see. I should think more before I answer sometimes :-)

Not sure. I guess my own personal concern would be how badly is
autovacuum affected by having to start off a blank set of stats? Any
other uses I have I think are capable of dealing with reset-to-zero
states.

//Magnus
Magnus Hagander wrote:
> Not sure. I guess my own personal concern would be how badly is
> autovacuum affected by having to start off a blank set of stats? Any
> other uses I have I think are capable of dealing with reset-to-zero
> states.

Well, it doesn't :-) No database or table will be processed until stat
entries are created, and then I think it will first wait until enough
activity gathers to take any actions at all.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
> Magnus Hagander wrote:
>
>> Not sure. I guess my own personal concern would be how badly is
>> autovacuum affected by having to start off a blank set of stats? Any
>> other uses I have I think are capable of dealing with reset-to-zero
>> states.
>
> Well, it doesn't :-) No database or table will be processed until stat
> entries are created, and then I think it will first wait until enough
> activity gathers to take any actions at all.

So it's not actually unaffected, but it does seem like it wouldn't be
a very big issue. If one table was just about to be vacuumed or
analyzed, this would just push it up to twice the threshold, right?

//Magnus
On Tue, 1 Jul 2008, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> Hmm ... that would almost certainly result in the stats being lost over
>>> a system shutdown. How much do we care?
>
>> Only for those who put it on a ramdrive. The default, unless you
>> move/sync it off, would still be the same as it is today. While not
>> perfect, the performance difference of going to a ramdrive might easily
>> be enough to offset that in some cases, I think.
>
> Well, what I was wondering about is whether it'd be worth adding logic
> to copy the file to/from a "safer" location at startup/shutdown.

Anyone who needs fast stats storage enough that they're going to symlink
it to RAM should be perfectly capable of scripting server startup/shutdown
to shuffle that to/from a more permanent location. Compared to the admin
chores you're likely to encounter before reaching that scale it's a pretty
easy job, and it's not like losing that data file is a giant loss in any
case.

The only thing I could see putting into the server code to help support
this situation is rejecting an old stats file and starting from scratch
instead if they restored a previous version after a crash that didn't
save an updated copy.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
Magnus Hagander <magnus@hagander.net> writes:
> Alvaro Herrera wrote:
>> Well, it doesn't :-) No database or table will be processed until stat
>> entries are created, and then I think it will first wait until enough
>> activity gathers to take any actions at all.

> So it's not actually unaffected, but it does seem like it wouldn't be
> a very big issue. If one table was just about to be vacuumed or
> analyzed, this would just push it up to twice the threshold, right?

Except you could lather, rinse, repeat indefinitely.

The stats system started out with the idea that the stats were
disposable, but I don't really think that's an acceptable behavior
today. We don't even have stats_reset_on_server_start anymore.

It doesn't seem to me that it'd be hard to support two locations for the
stats file --- it'd just take another parameter to the read and write
routines. pgstat.c already knows the difference between a normal write
and a shutdown write ...

			regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Alvaro Herrera wrote:
>>> Well, it doesn't :-) No database or table will be processed until stat
>>> entries are created, and then I think it will first wait until enough
>>> activity gathers to take any actions at all.
>
>> So it's not actually unaffected, but it does seem like it wouldn't be
>> a very big issue. If one table was just about to be vacuumed or
>> analyzed, this would just push it up to twice the threshold, right?
>
> Except you could lather, rinse, repeat indefinitely.

Yeah, but if you do that, you certainly have other problems as well....

> The stats system started out with the idea that the stats were
> disposable, but I don't really think that's an acceptable behavior
> today. We don't even have stats_reset_on_server_start anymore.

Good point.

> It doesn't seem to me that it'd be hard to support two locations for the
> stats file --- it'd just take another parameter to the read and write
> routines. pgstat.c already knows the difference between a normal write
> and a shutdown write ...

Right. Should it be removed from the permanent location when the server
starts? Otherwise, if it crashes, we'll pick up the old, stale version of
the file, since it didn't have a chance to get saved away. Better to start
from an empty file, or to start from one that has old data in it?

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> It doesn't seem to me that it'd be hard to support two locations for the
>> stats file --- it'd just take another parameter to the read and write
>> routines. pgstat.c already knows the difference between a normal write
>> and a shutdown write ...

> Right. Should it be removed from the permanent location when the server
> starts?

Yes, I would say so. There are two possible exit paths: normal shutdown
(where we'd write a new file) and crash. In a crash we'd wish to delete
the file anyway for fear that it's corrupted.

Startup: read permanent file, then delete it.

Post-crash: remove any permanent file (same as now)

Shutdown: write permanent file.

Normal stats collector write: write temp file.

Backend stats fetch: read temp file.

			regards, tom lane
On Jul 1, 2008, at 3:02 PM, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Alvaro Herrera wrote:
>>> Well, it doesn't :-) No database or table will be processed until stat
>>> entries are created, and then I think it will first wait until enough
>>> activity gathers to take any actions at all.
>
>> So it's not actually unaffected, but it does seem like it wouldn't be
>> a very big issue. If one table was just about to be vacuumed or
>> analyzed, this would just push it up to twice the threshold, right?
>
> Except you could lather, rinse, repeat indefinitely.
>
> The stats system started out with the idea that the stats were
> disposable, but I don't really think that's an acceptable behavior
> today. We don't even have stats_reset_on_server_start anymore.
>
> It doesn't seem to me that it'd be hard to support two locations for the
> stats file --- it'd just take another parameter to the read and write
> routines. pgstat.c already knows the difference between a normal write
> and a shutdown write ...

Leaving the realm of "an easy change", what about periodically (once a
minute?) writing stats to a real table? That means we should never have
to suffer corrupted or lost stats on a crash.

Along the same lines, perhaps we can just keep updates in shared memory
instead of in a file, since that's proven to cause issues for some people.

-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Decibel! <decibel@decibel.org> writes:
> Leaving the realm of "an easy change", what about periodically (once
> a minute?) writing stats to a real table?

The ensuing vacuum overhead seems a sufficient reason why not.

			regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> It doesn't seem to me that it'd be hard to support two locations for the
>>> stats file --- it'd just take another parameter to the read and write
>>> routines. pgstat.c already knows the difference between a normal write
>>> and a shutdown write ...
>
>> Right. Should it be removed from the permanent location when the server
>> starts?
>
> Yes, I would say so. There are two possible exit paths: normal shutdown
> (where we'd write a new file) and crash. In a crash we'd wish to delete
> the file anyway for fear that it's corrupted.
>
> Startup: read permanent file, then delete it.
>
> Post-crash: remove any permanent file (same as now)
>
> Shutdown: write permanent file.
>
> Normal stats collector write: write temp file.
>
> Backend stats fetch: read temp file.

Attached is a patch that implements this. I went with the option of just
storing it in a temporary directory that can be symlinked, and not
bothering with a GUC for it. Comments? (Documentation updates are also
needed, but I'll wait with those until I hear patch comments :-P)

//Magnus

Index: backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -c -r1.176 pgstat.c
*** backend/postmaster/pgstat.c	30 Jun 2008 10:58:47 -0000	1.176
--- backend/postmaster/pgstat.c	4 Aug 2008 09:39:23 -0000
***************
*** 67,74 ****
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_FILENAME	"global/pgstat.stat"
! #define PGSTAT_STAT_TMPFILE		"global/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
--- 67,76 ----
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME	"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE	"global/pgstat.tmp"
! #define PGSTAT_STAT_FILENAME			"pgstat_tmp/pgstat.stat"
! #define PGSTAT_STAT_TMPFILE				"pgstat_tmp/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
***************
*** 218,225 ****
  static void pgstat_beshutdown_hook(int code, Datum arg);
  
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
! static void pgstat_write_statsfile(void);
! static HTAB *pgstat_read_statsfile(Oid onlydb);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
--- 220,227 ----
  static void pgstat_beshutdown_hook(int code, Datum arg);
  
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
***************
*** 509,514 ****
--- 511,517 ----
  pgstat_reset_all(void)
  {
  	unlink(PGSTAT_STAT_FILENAME);
+ 	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }
  
  #ifdef EXEC_BACKEND
***************
*** 2595,2601 ****
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid);
  
  	/*
  	 * Setup the descriptor set for select(2).  Since only one bit in the set
--- 2598,2604 ----
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
  
  	/*
  	 * Setup the descriptor set for select(2).  Since only one bit in the set
***************
*** 2635,2641 ****
  			if (!PostmasterIsAlive(true))
  				break;
  
! 			pgstat_write_statsfile();
  			need_statwrite = false;
  			need_timer = true;
  		}
--- 2638,2644 ----
  			if (!PostmasterIsAlive(true))
  				break;
  
! 			pgstat_write_statsfile(false);
  			need_statwrite = false;
  			need_timer = true;
  		}
***************
*** 2803,2809 ****
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile();
  
  	exit(0);
  }
--- 2806,2812 ----
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile(true);
  
  	exit(0);
  }
***************
*** 2891,2897 ****
   * ----------
   */
  static void
! pgstat_write_statsfile(void)
  {
  	HASH_SEQ_STATUS hstat;
  	HASH_SEQ_STATUS tstat;
--- 2894,2900 ----
   * ----------
   */
  static void
! pgstat_write_statsfile(bool permanent)
  {
  	HASH_SEQ_STATUS hstat;
  	HASH_SEQ_STATUS tstat;
***************
*** 2901,2917 ****
  	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
  
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
! 	fpout = fopen(PGSTAT_STAT_TMPFILE, PG_BINARY_W);
  	if (fpout == NULL)
  	{
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not open temporary statistics file \"%s\": %m",
! 						PGSTAT_STAT_TMPFILE)));
  		return;
  	}
--- 2904,2922 ----
  	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
+ 	const char *tmpfile = permanent?PGSTAT_STAT_PERMANENT_TMPFILE:PGSTAT_STAT_TMPFILE;
+ 	const char *statfile = permanent?PGSTAT_STAT_PERMANENT_FILENAME:PGSTAT_STAT_FILENAME;
  
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
! 	fpout = fopen(tmpfile, PG_BINARY_W);
  	if (fpout == NULL)
  	{
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not open temporary statistics file \"%s\": %m",
! 						tmpfile)));
  		return;
  	}
***************
*** 2978,3002 ****
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not write temporary statistics file \"%s\": %m",
! 						PGSTAT_STAT_TMPFILE)));
  		fclose(fpout);
! 		unlink(PGSTAT_STAT_TMPFILE);
  	}
  	else if (fclose(fpout) < 0)
  	{
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not close temporary statistics file \"%s\": %m",
! 						PGSTAT_STAT_TMPFILE)));
! 		unlink(PGSTAT_STAT_TMPFILE);
  	}
! 	else if (rename(PGSTAT_STAT_TMPFILE, PGSTAT_STAT_FILENAME) < 0)
  	{
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
! 						PGSTAT_STAT_TMPFILE, PGSTAT_STAT_FILENAME)));
! 		unlink(PGSTAT_STAT_TMPFILE);
  	}
  }
--- 2983,3007 ----
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not write temporary statistics file \"%s\": %m",
! 						tmpfile)));
  		fclose(fpout);
! 		unlink(tmpfile);
  	}
  	else if (fclose(fpout) < 0)
  	{
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not close temporary statistics file \"%s\": %m",
! 						tmpfile)));
! 		unlink(tmpfile);
  	}
! 	else if (rename(tmpfile, statfile) < 0)
  	{
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
! 						tmpfile, statfile)));
! 		unlink(tmpfile);
  	}
  }
***************
*** 3006,3015 ****
   *
   *	Reads in an existing statistics collector file and initializes the
   *	databases' hash table (whose entries point to the tables' hash tables).
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb)
  {
  	PgStat_StatDBEntry *dbentry;
  	PgStat_StatDBEntry dbbuf;
--- 3011,3025 ----
   *
   *	Reads in an existing statistics collector file and initializes the
   *	databases' hash table (whose entries point to the tables' hash tables).
+  *
+  *	If reading from the permanent file (which happens during collector
+  *	startup, but never from backends), the file is removed once it's been
+  *	successfully read. The temporary file is also removed at this time,
+  *	to make sure backends don't read data from previous runs.
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb, bool permanent)
  {
  	PgStat_StatDBEntry *dbentry;
  	PgStat_StatDBEntry dbbuf;
***************
*** 3024,3029 ****
--- 3034,3040 ----
  	FILE	   *fpin;
  	int32		format_id;
  	bool		found;
+ 	const char *statfile = permanent?PGSTAT_STAT_PERMANENT_FILENAME:PGSTAT_STAT_FILENAME;
  
  	/*
  	 * The tables will live in pgStatLocalContext.
***************
*** 3052,3058 ****
  	 * return zero for anything and the collector simply starts from scratch
  	 * with empty counters.
  	 */
! 	if ((fpin = AllocateFile(PGSTAT_STAT_FILENAME, PG_BINARY_R)) == NULL)
  		return dbhash;
  
  	/*
--- 3063,3069 ----
  	 * return zero for anything and the collector simply starts from scratch
  	 * with empty counters.
  	 */
! 	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
  		return dbhash;
  
  	/*
***************
*** 3241,3246 ****
--- 3252,3263 ----
  done:
  	FreeFile(fpin);
  
+ 	if (permanent)
+ 	{
+ 		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+ 		unlink(PGSTAT_STAT_FILENAME);
+ 	}
+ 
  	return dbhash;
  }
***************
*** 3259,3267 ****
  
  	/* Autovacuum launcher wants stats about all databases */
  	if (IsAutoVacuumLauncherProcess())
! 		pgStatDBHash = pgstat_read_statsfile(InvalidOid);
  	else
! 		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId);
  }
--- 3276,3284 ----
  
  	/* Autovacuum launcher wants stats about all databases */
  	if (IsAutoVacuumLauncherProcess())
! 		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
  	else
! 		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
  }
Index: bin/initdb/initdb.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.158
diff -c -r1.158 initdb.c
*** bin/initdb/initdb.c	19 Jul 2008 04:01:29 -0000	1.158
--- bin/initdb/initdb.c	4 Aug 2008 09:39:23 -0000
***************
*** 2461,2467 ****
  		"pg_multixact/offsets",
  		"base",
  		"base/1",
! 		"pg_tblspc"
  	};
  
  	progname = get_progname(argv[0]);
--- 2461,2468 ----
  		"pg_multixact/offsets",
  		"base",
  		"base/1",
! 		"pg_tblspc",
! 		"pgstat_tmp"
  	};
  
  	progname = get_progname(argv[0]);
Magnus Hagander <magnus@hagander.net> writes:
> Attached is a patch that implements this. I went with the option of just
> storing it in a temporary directory that can be symlinked, and not
> bothering with a GUC for it. Comments? (documentation updates are also
> needed, but I'll wait with those until I hear patch comments :-P)

Looks alright in a fast once-over (I didn't test it). Two comments:

Treating the directory as something to create in initdb means you'll
need to bump catversion when you apply it.

I'm not sure where you are planning to document, but there should at
least be a mention in the "database physical layout" chapter, since
that's supposed to enumerate all the subdirectories of $PGDATA.

			regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Attached is a patch that implements this. I went with the option of just
>> storing it in a temporary directory that can be symlinked, and not
>> bothering with a GUC for it. Comments? (documentation updates are also
>> needed, but I'll wait with those until I hear patch comments :-P)
>
> Looks alright in a fast once-over (I didn't test it).

That's what I was after. I tested it myself, obviously :-) Not promising
zero bugs, but I was looking for comments on the approach. So thanks!

> Two comments:
> Treating the directory as something to create in initdb means you'll
> need to bump catversion when you apply it.

Yeah, I meant to do that as part of the commit. But thanks for the
reminder anyway!

> I'm not sure where you are planning to document, but there should at
> least be a mention in the "database physical layout" chapter, since
> that's supposed to enumerate all the subdirectories of $PGDATA.

I'm putting it under "configuring the statistics collector". And I'll
add the directory in that section - had missed that.

//Magnus