Thread: Moving pgstat.stat and pgstat.tmp

Moving pgstat.stat and pgstat.tmp

From
Erik Jones
Date:
Hi, I'm currently doctoring a situation wherein we've got a table
inheritance scheme that has ballooned over the years like only
in your nightmares (think well over 100K tables + indexes on those).
The obvious solution is to re-design the schema with a better
partitioning scheme in mind (see another msg from me later today on
that) but that's a big project that's just getting underway and an
immediate concern is the I/O on our data partition due in large part
to the stats file(s) getting hammered.  We can verify this by looking
at our write volume of 45+ Mbits/s and watching it drop to well below 10
on average when we disable stat_row_level, as well as by watching the
insane amounts of writes to pgstat.tmp when running the rwsnoop
dtrace script.

So, for the interim we're looking to move where the stats files are
written to.  I've made the changes to the file paths for pgstat.stat
and pgstat.tmp in src/backend/postmaster/pgstat.c, recompiled and
verified that everything seems to be working ok on our test machine.
However, seeing as how I'm not all that familiar with the code base,
I'm asking here:  is that all I need to do?  Is there anything I've
missed?

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Moving pgstat.stat and pgstat.tmp

From
Tom Lane
Date:
Erik Jones <erik@myemma.com> writes:
> Hi, I'm currently doctoring a situation wherein we've got a table
> inheritance scheme that has ballooned over the years like only
> in your nightmares (think well over 100K tables + indexes on those).
> The obvious solution is to re-design the schema with a better
> partitioning scheme in mind (see another msg from me later today on
> that) but that's a big project that's just getting underway and an
> immediate concern is the I/O on our data partition due in large part
> to the stats file(s) getting hammered.

Which PG version?  Early 8.2.x releases had a nasty bug that caused
excessive stats file writes.

            regards, tom lane

Re: Moving pgstat.stat and pgstat.tmp

From
Erik Jones
Date:
On Dec 3, 2007, at 4:16 PM, Tom Lane wrote:

> Erik Jones <erik@myemma.com> writes:
>> Hi, I'm currently doctoring a situation wherein we've got a table
>> inheritance scheme that has ballooned over the years like only
>> in your nightmares (think well over 100K tables + indexes on those).
>> The obvious solution is to re-design the schema with a better
>> partitioning scheme in mind (see another msg from me later today on
>> that) but that's a big project that's just getting underway and an
>> immediate concern is the I/O on our data partition due in large part
>> to the stats file(s) getting hammered.
>
> Which PG version?  Early 8.2.x releases had a nasty bug that caused
> excessive stats file writes.

8.2.5 on Solaris 10.  Before we upgraded to 8.2.4 it was doing about
65 Mbs/sec.  Interestingly, a while back we were running with the
data directory mounted with forcedirectio and saw none of this, I'm
guessing that fsync calls would have something to do with that?

Erik Jones




Re: Moving pgstat.stat and pgstat.tmp

From
Tom Lane
Date:
Erik Jones <erik@myemma.com> writes:
> 8.2.5 on Solaris 10.  Before we upgraded to 8.2.4 it was doing about
> 65 Mbs/sec.  Interestingly, a while back we were running with the
> data directory mounted with forcedirectio and saw none of this, I'm
> guessing that fsync calls would have something to do with that?

Hmm ... no, because the stats file never gets fsync'd.  I should think
that forcedirectio would have made things worse.

            regards, tom lane

Re: Moving pgstat.stat and pgstat.tmp

From
Erik Jones
Date:
On Dec 3, 2007, at 6:10 PM, Tom Lane wrote:

> Erik Jones <erik@myemma.com> writes:
>> 8.2.5 on Solaris 10.  Before we upgraded to 8.2.4 it was doing about
>> 65 Mbs/sec.  Interestingly, a while back we were running with the
>> data directory mounted with forcedirectio and saw none of this, I'm
>> guessing that fsync calls would have something to do with that?
>
> Hmm ... no, because the stats file never gets fsync'd.  I should think
> that forcedirectio would have made things worse.

Interesting.  If this is anything you'd like to look into I can
provide whatever diagnostic output you need (iostat, vmstat, dtrace
script outputs, etc...) but I do have to reiterate that we are an
extreme corner case due to our schema size.  For now, is renaming the
#define'd paths for the stats file and temp file sufficient for
moving them?  Basically, we'd like to move them onto a RAM disk to
give our disks a break.

Erik Jones




Re: Moving pgstat.stat and pgstat.tmp

From
Tom Lane
Date:
Erik Jones <erik@myemma.com> writes:
> For now, is renaming the
> #define'd paths for the stats file and temp file sufficient for
> moving them?

I would think so, but haven't tried it.  There definitely shouldn't be
anything outside pgstat.c that's touching them.

            regards, tom lane

Re: Moving pgstat.stat and pgstat.tmp

From
Robert Treat
Date:
On Monday 03 December 2007 20:22, Erik Jones wrote:
> On Dec 3, 2007, at 6:10 PM, Tom Lane wrote:
> > Erik Jones <erik@myemma.com> writes:
> >> 8.2.5 on Solaris 10.  Before we upgraded to 8.2.4 it was doing about
> >> 65 Mbs/sec.  Interestingly, a while back we were running with the
> >> data directory mounted with forcedirectio and saw none of this, I'm
> >> guessing that fsync calls would have something to do with that?
> >
> > Hmm ... no, because the stats file never gets fsync'd.  I should think
> > that forcedirectio would have made things worse.
>
> Interesting.  If this is anything you'd like to look into I can
> provide whatever diagnostic output you need (iostat, vmstat, dtrace
> script outputs, etc...) but I do have to reiterate that we are an
> extreme corner case due to our schema size.  For now, is renaming the
> #define'd paths for the stats file and temp file sufficient for
> moving them?  Basically, we'd like to move them onto a RAM  disk to
> give our disks a break.
>

Yeah, we've noticed the same problem (pgstat is the most active file on the
system... uncovered in much the same way... go solaris).  Actually I was
wondering if it could be done with symlinks, a la moving xlogs. Since we do
custom builds, that's not a real issue, but I was curious.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

Re: Moving pgstat.stat and pgstat.tmp

From
Alvaro Herrera
Date:
Robert Treat wrote:
> On Monday 03 December 2007 20:22, Erik Jones wrote:

> > Interesting.  If this is anything you'd like to look into I can
> > provide whatever diagnostic output you need (iostat, vmstat, dtrace
> > script outputs, etc...) but I do have to reiterate that we are an
> > extreme corner case due to our schema size.  For now, is renaming the
> > #define'd paths for the stats file and temp file sufficient for
> > moving them?  Basically, we'd like to move them onto a RAM  disk to
> > give our disks a break.
>
> Yeah, we've noticed the same problem (pgstat is the most active file on the
> system... uncovered in much the same way... go solaris).  Actually I was
> wondering if it could be done with symlinks, a la moving xlogs.

Not really, because a new file is created and renamed in place each time
it's going to be rewritten.  So the symlink would be lost in the first
file rewrite.
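
A minimal sketch of why the symlink is lost, assuming a POSIX system. It
mirrors the write-a-new-file-then-rename pattern described above but is
not the pgstat.c code itself; the function names are made up for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write `data` to tmp_path, then rename() it over final_path.
 * Hypothetical helper: illustrates the pattern, not PostgreSQL's code.
 * Returns 0 on success, -1 on failure.
 */
int
rewrite_via_rename(const char *tmp_path, const char *final_path,
                   const char *data)
{
    FILE *fp = fopen(tmp_path, "w");

    if (fp == NULL)
        return -1;
    if (fputs(data, fp) == EOF)
    {
        fclose(fp);
        unlink(tmp_path);
        return -1;
    }
    if (fclose(fp) != 0)
    {
        unlink(tmp_path);
        return -1;
    }
    /*
     * rename() operates on the directory entry at final_path.  If that
     * entry is a symlink, the link itself is replaced; it is not followed.
     */
    return rename(tmp_path, final_path);
}

/* Returns 1 if path is itself a symlink, 0 if not, -1 on error. */
int
is_symlink(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

If final_path starts out as a symlink, is_symlink(final_path) returns 1
before the first rewrite_via_rename() call and 0 afterwards, and the old
link target is left untouched.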

The first idea that comes to mind is to make the path configurable via
GUC, so the user could set it to be written to an in-memory filesystem
(/tmp in Solaris?).  But then I thought, why do we need it to be a file
at all?  Why not use a mmap'ed memory area or something like that, and
only write it to a file on postmaster shutdown?  (Losing the file on
unclean shutdown is not a problem, because the file is removed anyway.)
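
A rough sketch of the configurable-path idea, assuming the directory
arrives as a plain string. The `dir` argument stands in for a
hypothetical setting and build_stats_paths() is an invented name;
neither exists in the PostgreSQL version discussed here.

```c
#include <stddef.h>
#include <stdio.h>

#define STAT_BASENAME "pgstat.stat"
#define TMP_BASENAME  "pgstat.tmp"

/*
 * Build the stats file path and the temp file path under `dir`
 * (a hypothetical configuration value, not an existing parameter).
 * Returns 0 on success, -1 if either path does not fit in `len` bytes.
 */
int
build_stats_paths(const char *dir, char *stat_path, char *tmp_path,
                  size_t len)
{
    int n;

    n = snprintf(stat_path, len, "%s/%s", dir, STAT_BASENAME);
    if (n < 0 || (size_t) n >= len)
        return -1;
    n = snprintf(tmp_path, len, "%s/%s", dir, TMP_BASENAME);
    if (n < 0 || (size_t) n >= len)
        return -1;
    return 0;
}
```

Both paths are built under the same directory on purpose: rename()
cannot move a file across filesystems, so the temp file has to live on
the same filesystem as the file it replaces.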

--
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"Prefiero omelette con amigos que caviar con tontos"
                                                  (Alain Nonnet)

Re: Moving pgstat.stat and pgstat.tmp

From
Robert Treat
Date:
On Wednesday 05 December 2007 07:22, Alvaro Herrera wrote:
> Robert Treat wrote:
> > On Monday 03 December 2007 20:22, Erik Jones wrote:
> > > Interesting.  If this is anything you'd like to look into I can
> > > provide whatever diagnostic output you need (iostat, vmstat, dtrace
> > > script outputs, etc...) but I do have to reiterate that we are an
> > > extreme corner case due to our schema size.  For now, is renaming the
> > > #define'd paths for the stats file and temp file sufficient for
> > > moving them?  Basically, we'd like to move them onto a RAM  disk to
> > > give our disks a break.
> >
> > Yeah, we've noticed the same problem (pgstat is the most active file on
> > the system... uncovered in much the same way... go solaris).  Actually I
> > was wondering if it could be done with symlinks, a la moving xlogs.
>
> Not really, because a new file is created and renamed in place each time
> it's going to be rewritten.  So the symlink would be lost in the first
> file rewrite.
>

Ah yeah, that's what I concluded back then.

> The first idea that comes to mind is to make the path configurable via
> GUC, so the user could set it to be written to an in-memory filesystem
> (/tmp in Solaris?).

Yep, thought of that too, though it was after feature freeze so I didn't
propose it. Course if someone wants to sneak that in it would be cool :-)

> But then I thought, why do we need it to be a file
> at all?  Why not use a mmap'ed memory area or something like that, and
> only write it to a file on postmaster shutdown?  (Losing the file on
> unclean shutdown is not a problem, because the file is removed anyway.)

I suppose you need some facility to spill to disk, so maybe being in a file is
better? Seems it might not be in most cases... I wonder how big a memory
space we (or Erik) need.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

Re: Moving pgstat.stat and pgstat.tmp

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> But then I thought, why do we need it to be a file
> at all?  Why not use a mmap'ed memory area or something like that, and
> only write it to a file on postmaster shutdown?

Yeah, we definitely need some other technology for this.  The difficulty
is in dealing with a highly variably sized chunk of data --- our
existing shmem approach won't work well, and once you get away from that
the old portability question raises its head.

There's also a synchronization issue: how can the stats collector make
updates appear atomic?  mmap by itself doesn't solve that AFAIK.

            regards, tom lane

Re: Moving pgstat.stat and pgstat.tmp

From
Erik Jones
Date:
On Dec 5, 2007, at 7:50 AM, Robert Treat wrote:

> On Wednesday 05 December 2007 07:22, Alvaro Herrera wrote:
>> Robert Treat wrote:
>>> On Monday 03 December 2007 20:22, Erik Jones wrote:
>>>> Interesting.  If this is anything you'd like to look into I can
>>>> provide whatever diagnostic output you need (iostat, vmstat, dtrace
>>>> script outputs, etc...) but I do have to reiterate that we are an
>>>> extreme corner case due to our schema size.  For now, is
>>>> renaming the
>>>> #define'd paths for the stats file and temp file sufficient for
>>>> moving them?  Basically, we'd like to move them onto a RAM  disk to
>>>> give our disks a break.
>>>
>>> Yeah, we've noticed the same problem (pgstat is the most active
>>> file on
>>> the system... uncovered in much the same way... go solaris).
>>> Actually I
>>> was wondering if it could be done with symlinks, a la moving xlogs.
>>
>> Not really, because a new file is created and renamed in place
>> each time
>> it's going to be rewritten.  So the symlink would be lost in the
>> first
>> file rewrite.
>>
>
> Ah yeah, that's what I concluded back then.
>
>> The first idea that comes to mind is to make the path configurable
>> via
>> GUC, so the user could set it to be written to an in-memory
>> filesystem
>> (/tmp in Solaris?).
>
> Yep, thought of that too, though it was after feature freeze so I
> didn't
> propose it. Course if someone wants to sneak that in it would be
> cool :-)
>
>> But then I thought, why do we need it to be a file
>> at all?  Why not use a mmap'ed memory area or something like that,
>> and
>> only write it to a file on postmaster shutdown?  (Losing the file on
>> unclean shutdown is not a problem, because the file is removed
>> anyway.)
>
> I suppose you need some facility to spill to disk, so maybe being
> in a file is
> better? Seems it might not be in most cases... I wonder how big a
> memory
> space we (or Erik) need.

What I've done and tested on our test db server is to change lines 65
& 66 in pgstat.c from

#define PGSTAT_STAT_FILENAME    "global/pgstat.stat"
#define PGSTAT_STAT_TMPFILE     "global/pgstat.tmp"

to

#define PGSTAT_STAT_FILENAME    "global/pg_stats/pgstat.stat"
#define PGSTAT_STAT_TMPFILE     "global/pg_stats/pgstat.tmp"

recompile and then create that pg_stats directory as a symlink to a
directory with a swapfs mounted in it.  Everything seems to be
kosher.  Of course, this adds a bit to our shutdown procedure in the
case where we're going to bounce the actual server in that we need to
make sure to copy the stats file(s) out of the swapfs directory in
order to preserve stats in that case (and back in afterwards, of
course).
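
The copy-out and copy-back step could be sketched as below. This is
only an illustration under stated assumptions: copy_file() is an
invented helper, the paths passed to it are up to the admin, and it
should only be run while the postmaster is stopped. In practice a plain
cp in the shutdown and startup scripts does the same job.

```c
#include <stdio.h>

/*
 * Copy src to dst byte for byte.  Hypothetical helper for preserving
 * the stats file across a reboot that wipes the swapfs mount.
 * Returns 0 on success, -1 on failure.
 */
int
copy_file(const char *src, const char *dst)
{
    char    buf[8192];
    size_t  n;
    int     rc = 0;
    FILE   *in = fopen(src, "rb");
    FILE   *out;

    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    /* fclose() flushes buffered data, so its result matters for dst */
    if (fclose(out) != 0)
        rc = -1;
    fclose(in);
    return rc;
}
```

Before the reboot, copy pgstat.stat from the swapfs directory to
persistent storage; after the swapfs is mounted again and before
starting the postmaster, copy it back.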

Erik Jones




Re: Moving pgstat.stat and pgstat.tmp

From
Erik Jones
Date:
On Dec 5, 2007, at 7:50 AM, Robert Treat wrote:

> I suppose you need some facility to spill to disk, so maybe being
> in a file is
> better? Seems it might not be in most cases... I wonder how big a
> memory
> space we (or Erik) need.

We made the swapfs 300MB which is actually way more than we need as I
don't think I've seen our pgstat.stat file crack 10MB using the
entirely scientific method of spot-checking :)

Erik Jones




Re: Moving pgstat.stat and pgstat.tmp

From
Bruce Momjian
Date:
Added to TODO:

* Reduce file system activity overhead of statistics file pgstat.stat

  http://archives.postgresql.org/pgsql-general/2007-12/msg00106.php


---------------------------------------------------------------------------

Erik Jones wrote:
> Hi, I'm currently doctoring a situation wherein we've got a table
> inheritance scheme that has ballooned over the years like only
> in your nightmares (think well over 100K tables + indexes on those).
> The obvious solution is to re-design the schema with a better
> partitioning scheme in mind (see another msg from me later today on
> that) but that's a big project that's just getting underway and an
> immediate concern is the I/O on our data partition due in large part
> to the stats file(s) getting hammered.  We can verify this by looking
> at our write volume of 45+ Mbits/s and watching it drop to well below 10
> on average when we disable stat_row_level as well as watching the
> insane amounts of writes to pgstat.tmp when running the rwsnoop
> dtrace script.
>
> So, for the interim we're looking to move where the stats files are
> written to.  I've made the changes to the file paths for pgstat.stat
> and pgstat.tmp in src/backend/postmaster/pgstat.c, recompiled and
> verified that everything seems to be working ok on our test machine.
> However, seeing as how I'm not all that familiar with the code base,
> I'm asking here:  is that all I need to do?  Is there anything I've
> missed?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +