Thread: WARNING: pgstat wait timeout

WARNING: pgstat wait timeout

From: Таир Сабыргалиев (Tair Sabirgaliev)
Date: 02 November 2011, 15:29:44

Hi all!

First of all, thank you for this awesome database! We have been successfully using PG on Linux
in our projects for two years now, and we are very happy with its performance and observability
features.

Recently we deployed PG 9.0.5 (x64) on Windows Server 2008 R2 and noticed very strange behavior:
some time after the DB starts up, it begins logging many 'pgstat wait timeout' warnings.

We monitor the IO, and it never goes higher than 10MB/s, whereas the total throughput of DB disks is ~200MB/s.

I wouldn't bother, but performance degrades soon after the warnings start showing up.

After some digging, I found that the cause is that $PGDATA/pg_stat_tmp/pgstat.stat stops being updated.
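Since the symptom is that pgstat.stat stops being rewritten, one way to catch it early is to watch the file's modification time. The sketch below is a minimal, hypothetical monitor: the fallback data directory path and the 60-second staleness threshold are illustrative assumptions, not PostgreSQL settings, so tune the threshold to what is normal on your own server.

```python
import os
import time


def stats_file_age_seconds(path, now=None):
    """Return how many seconds ago `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stats_file_stale(path, max_age_seconds=60.0, now=None):
    """True if the file has not been rewritten within max_age_seconds.

    The 60-second default is an illustrative assumption, not a PostgreSQL setting.
    """
    return stats_file_age_seconds(path, now) > max_age_seconds


if __name__ == "__main__":
    # Hypothetical fallback path; normally taken from the PGDATA environment variable.
    pgdata = os.environ.get("PGDATA", r"C:\PostgreSQL\9.0\data")
    stat_file = os.path.join(pgdata, "pg_stat_tmp", "pgstat.stat")
    if os.path.exists(stat_file):
        age = stats_file_age_seconds(stat_file)
        print(f"{stat_file} last rewritten {age:.0f} s ago")
        if is_stats_file_stale(stat_file):
            print("stats file looks stale")
    else:
        print(f"{stat_file} not found")
```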

The workaround for me was to send SIGHUP to the DB (i.e. a reload): pgstat.stat starts updating again and the warnings stop.
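Windows has no `kill -HUP`; one way to deliver the same reload there is `pg_ctl reload -D <datadir>`. The sketch below wraps that call from Python. The install and data directory paths in it are hypothetical placeholders. From a superuser SQL session, `SELECT pg_reload_conf();` should have the same effect. Either way, this only reproduces the workaround; it does not address the root cause.

```python
import subprocess


def reload_command(pgdata, pg_ctl="pg_ctl"):
    """Build the argv that asks a running server to reload (the SIGHUP equivalent)."""
    return [pg_ctl, "reload", "-D", pgdata]


def reload_server(pgdata, pg_ctl="pg_ctl"):
    """Run `pg_ctl reload`; raises CalledProcessError if pg_ctl exits non-zero."""
    subprocess.run(reload_command(pgdata, pg_ctl), check=True)


if __name__ == "__main__":
    # Hypothetical paths; point these at your own installation before calling reload_server().
    print(reload_command(r"C:\PostgreSQL\9.0\data", r"C:\PostgreSQL\9.0\bin\pg_ctl.exe"))
```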

Is this expected behavior?

If this is a bug, what information would be helpful to include in the bug report?

Thanks!

Tair Sabirgaliev

Re: WARNING: pgstat wait timeout

From: "Jean-Yves F. Barbier"
Date: 02 November 2011, 16:07:41

On Thu, 3 Nov 2011 00:05:58 +0600
Таир Сабыргалиев <tair.sabirgaliev@bee.kz> wrote:

> We monitor the IO, and it never goes higher than 10MB/s, whereas the total
> throughput of DB disks is ~200MB/s.

This isn't normal: it should be around 190MB/s.

...
> If this is a bug what information will be helpful in this particular
> situation to include in the bug-report?

This is a w$ (Windows) feature.

--
When a girl marries she exchanges the attentions of many men for the
inattentions of one.
        -- Helen Rowland

Re: WARNING: pgstat wait timeout

From: Tair Sabirgaliev
Date: 04 November 2011, 07:49:11

Sorry for replying to my own message! I'm a complete novice, not only with PG
but also with mailing lists.

> On Thu, 3 Nov 2011 00:05:58 +0600
> Таир Сабыргалиев <tair(dot)sabirgaliev(at)bee(dot)kz> wrote:
>
>> We monitor the IO, and it never goes higher than 10MB/s, whereas the total
>> throughput of DB disks is ~200MB/s.
>
> This isn't normal: it should be around 190MB/s.

Do you mean that my real throughput is actually lower than what I've measured?
Anyway, I don't think the warning is a result of too much IO.

>
> ...
>> If this is a bug what information will be helpful in this particular
>> situation to include in the bug-report?
>
> This is a w$ feature.

Re: WARNING: pgstat wait timeout

From: "Jean-Yves F. Barbier"
Date: 04 November 2011, 11:09:28

On Fri, 4 Nov 2011 16:49:02 +0600
Tair Sabirgaliev <tair.sabirgaliev@bee.kz> wrote:

> Sorry for replying to my own message! I'm a complete novice, not only with PG
> but also with mailing lists.

Everybody needs a beginning :)

> > On Thu, 3 Nov 2011 00:05:58 +0600
> > Таир Сабыргалиев <tair(dot)sabirgaliev(at)bee(dot)kz> wrote:
> >
> >> We monitor the IO, and it never goes higher than 10MB/s, whereas the total
> >> throughput of DB disks is ~200MB/s.
> >
> > This isn't normal: it should be around 190MB/s.
>
> Do you mean that my real throughput is actually lower than what I've measured?
> Anyway, I don't think the warning is a result of too much IO.

No, I was only being ironic (toward w$, i.e. Windows): the problem you face isn't
easy to fix, because w$ lacks the usual *nix tools.
You should search the web for equivalent tools (something like iotop, or other
tools to analyse system I/O) in order to identify which program(s) are generating
this disk traffic.

As a first step, you could take a look at Task Manager (taskmgr): the program may
also be using some CPU, so you may be able to spot it while it writes to the disk.

--
"I'd love to go out with you, but I'm converting my calendar watch from
Julian to Gregorian."

Re: WARNING: pgstat wait timeout

From: Tair Sabirgaliev
Date: 05 November 2011, 11:10:57

On Fri, Nov 4, 2011 at 8:09 PM, Jean-Yves F. Barbier <12ukwn@gmail.com> wrote:
> No, I was only being ironic (toward w$, i.e. Windows): the problem you face isn't
> easy to fix, because w$ lacks the usual *nix tools.
> You should search the web for equivalent tools (something like iotop, or other
> tools to analyse system I/O) in order to identify which program(s) are generating
> this disk traffic.
>
> As a first step, you could take a look at Task Manager (taskmgr): the program may
> also be using some CPU, so you may be able to spot it while it writes to the disk.

Thanks!
That's indeed how we found that the system's overall disk IO never
exceeds 10MB/s, even at peak times.

The server is a 32-core Xeon X7550 machine with 64GB RAM;
storage is 140GB internal SAS + 1TB FC SAN, all dedicated to PG.

postgresql.conf modifications:
max_connections = 500
effective_cache_size = 32GB
maintenance_work_mem = 64MB
shared_buffers = 512MB
temp_buffers = 16MB
work_mem = 8MB
shared_preload_libraries = $libdir/pg_stat_statements
checkpoint_segments = 30
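To compare these memory settings with the machine's 64GB of RAM, it can help to normalize their units. This is a simplified, hypothetical parser, not PostgreSQL's own: it handles only `name = value` lines, the kB/MB/GB suffixes (which PostgreSQL treats as 1024-based), and `#` comments, and it does not understand quoted strings that contain `#`.

```python
# Memory units accepted in postgresql.conf, all 1024-based.
MEMORY_UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_setting_value(raw):
    """Convert '512MB' to bytes and '500' to an int; leave anything else as a string."""
    raw = raw.strip()
    for unit, factor in MEMORY_UNITS.items():
        number = raw[: -len(unit)].strip()
        if raw.endswith(unit) and number.isdigit():
            return int(number) * factor
    if raw.isdigit():
        return int(raw)
    return raw


def parse_conf(text):
    """Parse simple `name = value` lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # naive: breaks on '#' inside quotes
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = parse_setting_value(value)
    return settings


if __name__ == "__main__":
    sample = """
    shared_buffers = 512MB
    work_mem = 8MB
    max_connections = 500
    """
    for name, value in parse_conf(sample).items():
        print(f"{name} = {value!r}")
```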


We also used SQLIO to do some benchmarking. I'm no expert; I chose SQLIO
because it was simple and didn't need any DB setup. The problem
is that there's no PG-specific SQLIO guidance around, so I'm not sure
my results are valid at all :)

Here are my SQLIO results for writing 8kB blocks using 32 threads,
each thread writing sequentially to its own file for 60 seconds:
$ sqlio.exe -kW -t32 -s60 -b8  -fsequential -Ffiles32.txt
.. snip initialization ..
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 10031.31
MBs/sec: 78.36

These results convinced me that the problem is not disk IO.
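The SQLIO figures are self-consistent: with a fixed block size, throughput is simply IOs per second times block size. The sketch below assumes SQLIO counts 1 MB as 1024 KiB (the close match to its reported 78.36 suggests so) and also inverts the formula to show how many 8 kB IOs per second the observed ~10MB/s peak corresponds to. 8 kB is also PostgreSQL's default page size.

```python
def mb_per_sec(ios_per_sec, block_kib):
    """Throughput in MB/s for a fixed block size, assuming 1 MB = 1024 KiB."""
    return ios_per_sec * block_kib / 1024


def ios_per_sec_for(mb_per_second, block_kib):
    """Inverse: IOs per second needed to sustain a given MB/s at block_kib."""
    return mb_per_second * 1024 / block_kib


if __name__ == "__main__":
    # Figures from the SQLIO run above: 8 kB blocks, 10031.31 IOs/sec.
    print(f"SQLIO run: {mb_per_sec(10031.31, 8):.2f} MB/s")
    # The ~10MB/s peak observed on the production server, expressed in 8 kB IOs.
    print(f"10MB/s at 8 kB: {ios_per_sec_for(10, 8):.0f} IOs/sec")
```

Run as a script, it prints about 78.37 MB/s (SQLIO's 78.36 differs only in rounding) and 1280 IOs/sec, roughly an eighth of what the SQLIO run sustained.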





--
Best regards,
Tair Sabirgaliev
"BEE Software" Ltd.
Republic of Kazakhstan, 010000
Astana, Sarayshyk str. 34, sect. 27
Tel.: +7 (7172) 56-89-31
Mob.: +7 (702) 2173359
e-mail: tair.sabirgaliev@bee.kz