Thread: No space left on device log message storm

No space left on device log message storm

From: Bryan Henderson
Date: 05 July 2011, 11:21:00

When the filesystem containing my database fills up, the server repeats
the following log message about as fast as it can log:

  Jun 29 23:00:55 src@giraffe postgres: LOG:  could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device

Is this an infinite loop, or is the server just attempting a lot of
independent temporary statistics file updates?  This server handles about
5 transactions per minute.

If it's an infinite loop, does anyone have a feeling whether it's recursive
(the act of logging the error stimulates another attempt) or iterative (the
server just can't take no for an answer)?

--
Bryan Henderson                                   San Jose, California

No space left on device log message storm

From: Bryan Henderson
Date: 01 August 2011, 02:37:07

>When the filesystem containing my database fills up, the server repeats
>the following log message about as fast as it can log:
>
>  Jun 29 23:00:55 src@giraffe postgres: LOG:  could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device

In case anyone finds this in the archives, here is what I have since
found out:

This is not recursive and it doesn't repeat as fast as the program can run
either.  The failed open attempt and syslog message happen 100 times a
second ad infinitum.

The problem is that something wants the statistics file pg_stat_tmp/pgstat.tmp
updated, so it calls backend_read_statsfile() in pgstat.c.
backend_read_statsfile() signals the statistics collector process to do it,
waits a while, and then looks at the file to see if it got updated and is
still close enough to current to use.  If not, the requester signals the
statistics collector again.  It does this up to 500 times 10 milliseconds
apart (5 seconds).  After that, it returns.  It has no way of returning a
failure.
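
To make the timing concrete, here is a minimal sketch of that
signal-wait-check loop.  It is written in Python rather than the server's C,
it is not the code in pgstat.c, and every name in it (wait_for_fresh_stats,
request_update, file_is_fresh) is hypothetical.  Only the figures of 500
attempts and 10 milliseconds come from the description above.

```python
import time

POLL_LOOP_COUNT = 500   # maximum number of attempts, as described above
RETRY_DELAY_S = 0.010   # 10 milliseconds between attempts


def wait_for_fresh_stats(request_update, file_is_fresh,
                         attempts=POLL_LOOP_COUNT, delay=RETRY_DELAY_S,
                         sleep=time.sleep):
    """Signal the collector, wait, then check the file; repeat if stale.

    Returns the number of update requests sent.  As in the behaviour
    described above, nothing here reports failure: when every attempt is
    used up, the function simply returns.
    """
    requests = 0
    for _ in range(attempts):
        request_update()        # stands in for signalling the collector
        requests += 1
        sleep(delay)            # give the collector time to rewrite the file
        if file_is_fresh():     # updated and still recent enough to use?
            break
    return requests
```

With a full filesystem the file never becomes fresh, so each call sends all
500 requests at 10 millisecond intervals, which is where the rate of 100
failed attempts a second comes from.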

And for reasons I don't understand, backend_read_statsfile() gets called ad
infinitum.

The statistics collector process updates the statistics file by creating a new
one, then atomically renaming.  With the filesystem full, it can't ever create
the new one, so all the signalling of the statistics collector to update is in
vain.
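
The create-then-rename pattern, and the way it fails on a full filesystem,
can be sketched as follows.  This is an illustration in Python under my own
assumptions, not the collector's actual code; the function name
replace_file_atomically and its parameters are hypothetical.

```python
import os


def replace_file_atomically(final_path, data, tmp_path=None):
    """Write data to a temporary file, then rename it over final_path.

    Returns True on success.  If the temporary file cannot be created or
    written (for example ENOSPC on a full filesystem), returns False and
    leaves the existing final_path untouched.
    """
    if tmp_path is None:
        tmp_path = final_path + ".tmp"
    try:
        with open(tmp_path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())          # push the bytes to disk before renaming
        os.replace(tmp_path, final_path)  # atomic when both are on one filesystem
        return True
    except OSError:
        # Clean up a partial temporary file if one was left behind.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        return False
```

Readers never see a half-written file, but the old file also never changes
while the write keeps failing, so a requester polling for a newer file gets
nothing no matter how often it asks.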

Though there's evidence in the code that failure to update the statistics is
not a critical problem, I can't see any way to make the server stop trying in
vain to update it, so I plan just to change the code to make the statistics
collector terminate the server when it can't update the statistics file.

--
Bryan Henderson                                   San Jose, California