Thread: Location for pgstat.stat

Location for pgstat.stat

From
Magnus Hagander
Date:
Per this thread:
http://archives.postgresql.org/pgsql-general/2007-12/msg00255.php

it was pretty much (again, IIRC) concluded that we want "some better
way" to transfer the stats data.

But pending that we have that, how about we just move it into it's own
subdirectory? AFAICS that would be a simple change of two #defines
moving it from "global/pgstat.stat" to "pgstat/pgstat.stat" or something
like that. Might also need some code to create the directory if it
doesn't exist, but that shouldn't be hard.

This would make it possible to symlink or mount that directory off to a
ramdrive (for example).

It's not a perfect solution, but it would at least give a better tool
than we have today, no?

//Magnus


Re: Location for pgstat.stat

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> But pending that we have that, how about we just move it into it's own
> subdirectory?
> This would make it possible to symlink or mount that directory off to a
> ramdrive (for example).

Hmm ... that would almost certainly result in the stats being lost over
a system shutdown.  How much do we care?
        regards, tom lane


Re: Location for pgstat.stat

From
Magnus Hagander
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> But pending that we have that, how about we just move it into it's own
>> subdirectory?
>> This would make it possible to symlink or mount that directory off to a
>> ramdrive (for example).
> 
> Hmm ... that would almost certainly result in the stats being lost over
> a system shutdown.  How much do we care?

Only for those who put it on a ramdrive. The default, unless you
move/sync it off, would still be the same as it is today. While not
perfect, the performance difference of going to a ramdrive might easily
be enough to offset that in some cases, I think.

//Magnus



Re: Location for pgstat.stat

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> Hmm ... that would almost certainly result in the stats being lost over
>> a system shutdown.  How much do we care?

> Only for those who put it on a ramdrive. The default, unless you
> move/sync it off, would still be the same as it is today. While not
> perfect, the performance difference of going to a ramdrive might easily
> be enough to offset that in some cases, I think.

Well, what I was wondering about is whether it'd be worth adding logic
to copy the file to/from a "safer" location at startup/shutdown.
        regards, tom lane


Re: Location for pgstat.stat

From
Magnus Hagander
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> Hmm ... that would almost certainly result in the stats being lost over
>>> a system shutdown.  How much do we care?
> 
>> Only for those who put it on a ramdrive. The default, unless you
>> move/sync it off, would still be the same as it is today. While not
>> perfect, the performance difference of going to a ramdrive might easily
>> be enough to offset that in some cases, I think.
> 
> Well, what I was wondering about is whether it'd be worth adding logic
> to copy the file to/from a "safer" location at startup/shutdown.

Oh, I see. I should think more before I answer sometimes :-)

Not sure. I guess my own personal concern would be how badly is
autovacuum affected by having to start off a blank set of stats? Any
other uses I have I think are capable of dealing with reset-to-zero states.

//Magnus



Re: Location for pgstat.stat

From
Alvaro Herrera
Date:
Magnus Hagander wrote:

> Not sure. I guess my own personal concern would be how badly is
> autovacuum affected by having to start off a blank set of stats? Any
> other uses I have I think are capable of dealing with reset-to-zero states.

Well, it doesn't :-)  No database or table will be processed until stat
entries are created, and then I think it will first wait until enough
activity gathers to take any actions at all.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Location for pgstat.stat

From
Magnus Hagander
Date:
Alvaro Herrera wrote:
> Magnus Hagander wrote:
> 
>> Not sure. I guess my own personal concern would be how badly is
>> autovacuum affected by having to start off a blank set of stats? Any
>> other uses I have I think are capable of dealing with reset-to-zero states.
> 
> Well, it doesn't :-)  No database or table will be processed until stat
> entries are created, and then I think it will first wait until enough
> activity gathers to take any actions at all.

That's not actualliy not affected, but it does seem like it wouldn't be
a very big issue. If one table was just about to be vacuumed or
analyzed, this would just push it up to twice the threshold, right?

//Magnus



Re: Location for pgstat.stat

From
Greg Smith
Date:
On Tue, 1 Jul 2008, Tom Lane wrote:

> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> Hmm ... that would almost certainly result in the stats being lost over
>>> a system shutdown.  How much do we care?
>
>> Only for those who put it on a ramdrive. The default, unless you
>> move/sync it off, would still be the same as it is today. While not
>> perfect, the performance difference of going to a ramdrive might easily
>> be enough to offset that in some cases, I think.
>
> Well, what I was wondering about is whether it'd be worth adding logic
> to copy the file to/from a "safer" location at startup/shutdown.

Anyone who needs fast stats storage enough that they're going to symlink 
it to RAM should be perfectly capable of scripting server startup/shutdown 
to shuffle that to/from a more permanent location.  Compared to the admin 
chores you're likely to encounter before reaching that scale it's a pretty 
easy job, and it's not like losing that data file is a giant loss in any 
case.  The only thing I could see putting into the server code to help 
support this situation is rejecting an old stats file and starting from 
scratch instead if they restored a previous version after a crash that 
didn't save an updated copy.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Location for pgstat.stat

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Alvaro Herrera wrote:
>> Well, it doesn't :-)  No database or table will be processed until stat
>> entries are created, and then I think it will first wait until enough
>> activity gathers to take any actions at all.

> That's not actualliy not affected, but it does seem like it wouldn't be
> a very big issue. If one table was just about to be vacuumed or
> analyzed, this would just push it up to twice the threshold, right?

Except you could lather, rinse, repeat indefinitely.

The stats system started out with the idea that the stats were
disposable, but I don't really think that's an acceptable behavior
today.  We don't even have stats_reset_on_server_start anymore.

It doesn't seem to me that it'd be hard to support two locations for the
stats file --- it'd just take another parameter to the read and write
routines.  pgstat.c already knows the difference between a normal write
and a shutdown write ...
        regards, tom lane


Re: Location for pgstat.stat

From
Magnus Hagander
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Alvaro Herrera wrote:
>>> Well, it doesn't :-)  No database or table will be processed until stat
>>> entries are created, and then I think it will first wait until enough
>>> activity gathers to take any actions at all.
> 
>> That's not actualliy not affected, but it does seem like it wouldn't be
>> a very big issue. If one table was just about to be vacuumed or
>> analyzed, this would just push it up to twice the threshold, right?
> 
> Except you could lather, rinse, repeat indefinitely.

Yeha, but if you do that, you certainly have other problems as well....


> The stats system started out with the idea that the stats were
> disposable, but I don't really think that's an acceptable behavior
> today.  We don't even have stats_reset_on_server_start anymore.

Good point.


> It doesn't seem to me that it'd be hard to support two locations for the
> stats file --- it'd just take another parameter to the read and write
> routines.  pgstat.c already knows the difference between a normal write
> and a shutdown write ...

Right. Should it be removed from the permanent location when the server
starts? Otherwise, if it crashes, we'll pick up the old, stale, version
of the file since it didn't have a chance to get saved away. Better to
start from an empty file, or to start from one that has old data in it?

//Magnus


Re: Location for pgstat.stat

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> It doesn't seem to me that it'd be hard to support two locations for the
>> stats file --- it'd just take another parameter to the read and write
>> routines.  pgstat.c already knows the difference between a normal write
>> and a shutdown write ...

> Right. Should it be removed from the permanent location when the server
> starts?

Yes, I would say so.  There are two possible exit paths: normal shutdown
(where we'd write a new file) and crash.  In a crash we'd wish to delete
the file anyway for fear that it's corrupted.
Startup: read permanent file, then delete it.
Post-crash: remove any permanent file (same as now)
Shutdown: write permanent file.
Normal stats collector write: write temp file.
Backend stats fetch: read temp file.
        regards, tom lane


Re: Location for pgstat.stat

From
Decibel!
Date:
On Jul 1, 2008, at 3:02 PM, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Alvaro Herrera wrote:
>>> Well, it doesn't :-)  No database or table will be processed  
>>> until stat
>>> entries are created, and then I think it will first wait until  
>>> enough
>>> activity gathers to take any actions at all.
>
>> That's not actualliy not affected, but it does seem like it  
>> wouldn't be
>> a very big issue. If one table was just about to be vacuumed or
>> analyzed, this would just push it up to twice the threshold, right?
>
> Except you could lather, rinse, repeat indefinitely.
>
> The stats system started out with the idea that the stats were
> disposable, but I don't really think that's an acceptable behavior
> today.  We don't even have stats_reset_on_server_start anymore.
>
> It doesn't seem to me that it'd be hard to support two locations  
> for the
> stats file --- it'd just take another parameter to the read and write
> routines.  pgstat.c already knows the difference between a normal  
> write
> and a shutdown write ...

Leaving the realm of "an easy change", what about periodically (once  
a minute?) writing stats to a real table? That means we should never  
have to suffer corrupted or lost stats on a crash. Along the same  
lines, perhaps we can just keep updates in shared memory instead of  
in a file, since that's proven to cause issues for some people.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: Location for pgstat.stat

From
Tom Lane
Date:
Decibel! <decibel@decibel.org> writes:
> Leaving the realm of "an easy change", what about periodically (once  
> a minute?) writing stats to a real table?

The ensuing vacuum overhead seems a sufficient reason why not.
        regards, tom lane


Re: Location for pgstat.stat

From
Magnus Hagander
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> It doesn't seem to me that it'd be hard to support two locations for the
>>> stats file --- it'd just take another parameter to the read and write
>>> routines.  pgstat.c already knows the difference between a normal write
>>> and a shutdown write ...
>
>> Right. Should it be removed from the permanent location when the server
>> starts?
>
> Yes, I would say so.  There are two possible exit paths: normal shutdown
> (where we'd write a new file) and crash.  In a crash we'd wish to delete
> the file anyway for fear that it's corrupted.
>
>     Startup: read permanent file, then delete it.
>
>     Post-crash: remove any permanent file (same as now)
>
>     Shutdown: write permanent file.
>
>     Normal stats collector write: write temp file.
>
>     Backend stats fetch: read temp file.

Attached is a patch that implements this. I went with the option of just
storing it in a temporary directory that can be symlinked, and not
bothering with a GUC for it. Comments? (documentation updates are also
needed, but I'll wait with those until I hear patch comments :-P)


//Magnus
Index: backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -c -r1.176 pgstat.c
*** backend/postmaster/pgstat.c    30 Jun 2008 10:58:47 -0000    1.176
--- backend/postmaster/pgstat.c    4 Aug 2008 09:39:23 -0000
***************
*** 67,74 ****
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_FILENAME    "global/pgstat.stat"
! #define PGSTAT_STAT_TMPFILE        "global/pgstat.tmp"

  /* ----------
   * Timer definitions.
--- 67,76 ----
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME        "global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE        "global/pgstat.tmp"
! #define PGSTAT_STAT_FILENAME                "pgstat_tmp/pgstat.stat"
! #define PGSTAT_STAT_TMPFILE                    "pgstat_tmp/pgstat.tmp"

  /* ----------
   * Timer definitions.
***************
*** 218,225 ****
  static void pgstat_beshutdown_hook(int code, Datum arg);

  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
! static void pgstat_write_statsfile(void);
! static HTAB *pgstat_read_statsfile(Oid onlydb);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);

--- 220,227 ----
  static void pgstat_beshutdown_hook(int code, Datum arg);

  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);

***************
*** 509,514 ****
--- 511,517 ----
  pgstat_reset_all(void)
  {
      unlink(PGSTAT_STAT_FILENAME);
+     unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }

  #ifdef EXEC_BACKEND
***************
*** 2595,2601 ****
       * zero.
       */
      pgStatRunningInCollector = true;
!     pgStatDBHash = pgstat_read_statsfile(InvalidOid);

      /*
       * Setup the descriptor set for select(2).    Since only one bit in the set
--- 2598,2604 ----
       * zero.
       */
      pgStatRunningInCollector = true;
!     pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);

      /*
       * Setup the descriptor set for select(2).    Since only one bit in the set
***************
*** 2635,2641 ****
              if (!PostmasterIsAlive(true))
                  break;

!             pgstat_write_statsfile();
              need_statwrite = false;
              need_timer = true;
          }
--- 2638,2644 ----
              if (!PostmasterIsAlive(true))
                  break;

!             pgstat_write_statsfile(false);
              need_statwrite = false;
              need_timer = true;
          }
***************
*** 2803,2809 ****
      /*
       * Save the final stats to reuse at next startup.
       */
!     pgstat_write_statsfile();

      exit(0);
  }
--- 2806,2812 ----
      /*
       * Save the final stats to reuse at next startup.
       */
!     pgstat_write_statsfile(true);

      exit(0);
  }
***************
*** 2891,2897 ****
   * ----------
   */
  static void
! pgstat_write_statsfile(void)
  {
      HASH_SEQ_STATUS hstat;
      HASH_SEQ_STATUS tstat;
--- 2894,2900 ----
   * ----------
   */
  static void
! pgstat_write_statsfile(bool permanent)
  {
      HASH_SEQ_STATUS hstat;
      HASH_SEQ_STATUS tstat;
***************
*** 2901,2917 ****
      PgStat_StatFuncEntry *funcentry;
      FILE       *fpout;
      int32        format_id;

      /*
       * Open the statistics temp file to write out the current values.
       */
!     fpout = fopen(PGSTAT_STAT_TMPFILE, PG_BINARY_W);
      if (fpout == NULL)
      {
          ereport(LOG,
                  (errcode_for_file_access(),
                   errmsg("could not open temporary statistics file \"%s\": %m",
!                         PGSTAT_STAT_TMPFILE)));
          return;
      }

--- 2904,2922 ----
      PgStat_StatFuncEntry *funcentry;
      FILE       *fpout;
      int32        format_id;
+     const char *tmpfile = permanent?PGSTAT_STAT_PERMANENT_TMPFILE:PGSTAT_STAT_TMPFILE;
+     const char *statfile = permanent?PGSTAT_STAT_PERMANENT_FILENAME:PGSTAT_STAT_FILENAME;

      /*
       * Open the statistics temp file to write out the current values.
       */
!     fpout = fopen(tmpfile, PG_BINARY_W);
      if (fpout == NULL)
      {
          ereport(LOG,
                  (errcode_for_file_access(),
                   errmsg("could not open temporary statistics file \"%s\": %m",
!                         tmpfile)));
          return;
      }

***************
*** 2978,3002 ****
          ereport(LOG,
                  (errcode_for_file_access(),
                 errmsg("could not write temporary statistics file \"%s\": %m",
!                       PGSTAT_STAT_TMPFILE)));
          fclose(fpout);
!         unlink(PGSTAT_STAT_TMPFILE);
      }
      else if (fclose(fpout) < 0)
      {
          ereport(LOG,
                  (errcode_for_file_access(),
                 errmsg("could not close temporary statistics file \"%s\": %m",
!                       PGSTAT_STAT_TMPFILE)));
!         unlink(PGSTAT_STAT_TMPFILE);
      }
!     else if (rename(PGSTAT_STAT_TMPFILE, PGSTAT_STAT_FILENAME) < 0)
      {
          ereport(LOG,
                  (errcode_for_file_access(),
                   errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
!                         PGSTAT_STAT_TMPFILE, PGSTAT_STAT_FILENAME)));
!         unlink(PGSTAT_STAT_TMPFILE);
      }
  }

--- 2983,3007 ----
          ereport(LOG,
                  (errcode_for_file_access(),
                 errmsg("could not write temporary statistics file \"%s\": %m",
!                       tmpfile)));
          fclose(fpout);
!         unlink(tmpfile);
      }
      else if (fclose(fpout) < 0)
      {
          ereport(LOG,
                  (errcode_for_file_access(),
                 errmsg("could not close temporary statistics file \"%s\": %m",
!                       tmpfile)));
!         unlink(tmpfile);
      }
!     else if (rename(tmpfile, statfile) < 0)
      {
          ereport(LOG,
                  (errcode_for_file_access(),
                   errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
!                         tmpfile, statfile)));
!         unlink(tmpfile);
      }
  }

***************
*** 3006,3015 ****
   *
   *    Reads in an existing statistics collector file and initializes the
   *    databases' hash table (whose entries point to the tables' hash tables).
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb)
  {
      PgStat_StatDBEntry *dbentry;
      PgStat_StatDBEntry dbbuf;
--- 3011,3025 ----
   *
   *    Reads in an existing statistics collector file and initializes the
   *    databases' hash table (whose entries point to the tables' hash tables).
+  *
+  *    If reading from the permanent file (which happens during collector
+  *    startup, but never from backends), the file is removed once it's been
+  *    successfully read. The temporary file is also removed at this time,
+  *    to make sure backends don't read data from previous runs.
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb, bool permanent)
  {
      PgStat_StatDBEntry *dbentry;
      PgStat_StatDBEntry dbbuf;
***************
*** 3024,3029 ****
--- 3034,3040 ----
      FILE       *fpin;
      int32        format_id;
      bool        found;
+     const char *statfile = permanent?PGSTAT_STAT_PERMANENT_FILENAME:PGSTAT_STAT_FILENAME;

      /*
       * The tables will live in pgStatLocalContext.
***************
*** 3052,3058 ****
       * return zero for anything and the collector simply starts from scratch
       * with empty counters.
       */
!     if ((fpin = AllocateFile(PGSTAT_STAT_FILENAME, PG_BINARY_R)) == NULL)
          return dbhash;

      /*
--- 3063,3069 ----
       * return zero for anything and the collector simply starts from scratch
       * with empty counters.
       */
!     if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
          return dbhash;

      /*
***************
*** 3241,3246 ****
--- 3252,3263 ----
  done:
      FreeFile(fpin);

+     if (permanent)
+     {
+         unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+         unlink(PGSTAT_STAT_FILENAME);
+     }
+
      return dbhash;
  }

***************
*** 3259,3267 ****

      /* Autovacuum launcher wants stats about all databases */
      if (IsAutoVacuumLauncherProcess())
!         pgStatDBHash = pgstat_read_statsfile(InvalidOid);
      else
!         pgStatDBHash = pgstat_read_statsfile(MyDatabaseId);
  }


--- 3276,3284 ----

      /* Autovacuum launcher wants stats about all databases */
      if (IsAutoVacuumLauncherProcess())
!         pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
      else
!         pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
  }


Index: bin/initdb/initdb.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.158
diff -c -r1.158 initdb.c
*** bin/initdb/initdb.c    19 Jul 2008 04:01:29 -0000    1.158
--- bin/initdb/initdb.c    4 Aug 2008 09:39:23 -0000
***************
*** 2461,2467 ****
          "pg_multixact/offsets",
          "base",
          "base/1",
!         "pg_tblspc"
      };

      progname = get_progname(argv[0]);
--- 2461,2468 ----
          "pg_multixact/offsets",
          "base",
          "base/1",
!         "pg_tblspc",
!         "pgstat_tmp"
      };

      progname = get_progname(argv[0]);

Re: Location for pgstat.stat

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Attached is a patch that implements this. I went with the option of just
> storing it in a temporary directory that can be symlinked, and not
> bothering with a GUC for it. Comments? (documentation updates are also
> needed, but I'll wait with those until I hear patch comments :-P)

Looks alright in a fast once-over (I didn't test it).  Two comments:
Treating the directory as something to create in initdb means you'll
need to bump catversion when you apply it.  I'm not sure where you are
planning to document, but there should at least be a mention in the
"database physical layout" chapter, since that's supposed to enumerate
all the subdirectories of $PGDATA.
        regards, tom lane


Re: Location for pgstat.stat

From
Magnus Hagander
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Attached is a patch that implements this. I went with the option of just
>> storing it in a temporary directory that can be symlinked, and not
>> bothering with a GUC for it. Comments? (documentation updates are also
>> needed, but I'll wait with those until I hear patch comments :-P)
> 
> Looks alright in a fast once-over (I didn't test it). 

That's what I was after. I tested it myself, obviously :-) Not promising
zero bugs, but I was looking for the comment on the approach. So thanks!


> Two comments:
> Treating the directory as something to create in initdb means you'll
> need to bump catversion when you apply it. 

Yeah, i meant to do that as part of the commit. But thanks for the
reminder anyway!

> I'm not sure where you are
> planning to document, but there should at least be a mention in the
> "database physical layout" chapter, since that's supposed to enumerate
> all the subdirectories of $PGDATA.

I'm putting it under "configuring the statistics collector". And I'll
add a directory in that section - had missed that.

//Magnus