Pgstat.tmp file activity - Mailing list pgsql-admin

From Dan Austin
Subject Pgstat.tmp file activity
Date
Msg-id ED242F3A9037824A952B71B480A5D7ED6DFAA4@ausyd-exmb01.au.ad.westfield.com
Whole thread Raw
Responses Re: Pgstat.tmp file activity
List pgsql-admin
Hi,

Have a major reliability problem with out postgres installation here.
Basically the backends seem to "hang" when the server is under
non-trivial load. It can cope with ~200 concurrent connections (max is
1024) but anything over that and all the backends start to hang. By hang
I mean they answer connection attempts and accept a SQL statement but
take a LONG time to reply.

During this time if I do a "df" on the server the df command will block
as well when it comes to list the partition that postgres's database is
stored on. The box does not appear to be CPU bound and there is little
or no ioswap activity. There is however a LOT of disk writing activity
on the postgres partition.

Vitals:
* Solaris 10 (SPARC)
* Postgres 8.0.2

Solaris 10 has a tool called dtrace which reveals that pid #687 (a
postgres backend) is doing a LOT of writing to a single file (iosnoop.d
for you dtrace hackers):

  UID   PID D    BLOCK   SIZE       COMM PATHNAME
  103   687 W   120384  98304   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W   120384  98304   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W   120384  98304   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521728  65536   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521728  65536   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521728  65536   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521904 131072   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4522160  81920   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521904 131072   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4522160  81920   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521904 212992   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
[...]

# ps -fp 687
     UID   PID  PPID   C    STIME TTY         TIME CMD
   pgsql   687   686   1   Dec 20 ?          57:09
/usr/local/pgsql/bin/postmaster -D /export/data/pgsql/data

(Hmmm... Interesting. The PID of the process doing all that writing is
always "+1" of its parent PID. I take it that the first child spawned by
postmaster is responsible for updating the pg_stats_activity table and
others?)

This database serves a website and most of its queries (almost all) will
be read-only. So it's not clear why so much disk writing is going on. So
we started to look at exactly what was doing the writing and that's when
we found this file. We have postgres running on two other boxes, both
Solaris, same version of Postgres, and they show similar activity.

We think that maybe Postgres is saturating the disk and this is causing
the "blocking". Mind you the iostat.d program that produced the output
above was running at a time when there was low load and things were
fine.

So my questions are:

1. What is the pgstat.tmp file for and why does a single PID on each
server write almost continuously to it?

2. Why does that file not really exist in the file system according to
"ls"?

3. Where should I look to determine why Postgres is writing so much to
that disk?

Dan

--
Daniel Austin (+61 2 9358 7983)
Internet Technology Manager


--
Confidential: This email is intended only for the person or entity to
which it is addressed. The information contained in this email may be
confidential and privileged. Any retransmission, review, dissemination or
other use of, or taking of any action in reliance upon, this information
by persons or entities other than the intended recipient is prohibited.
If you received this email in error, please let us know by reply and
destroy all copies of it. Thank you.


pgsql-admin by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: How to get OID return value(pls help me)
Next
From: Benjamin Arai
Date:
Subject: Running with fsync=off