Thread: Notify system doesn't recover from "No space" error
On a development system which wasn't being monitored very closely I think we've uncovered a bug in the listen/notify behavior. The system was struggling with lack of disk space for about a week before this surfaced, so it's probably not the most critical sort of bug we could have, but it does seem to merit a fix.

We were getting all sorts of logging about the space problem, and PostgreSQL soldiered bravely on, doing the best it could under such adverse conditions. (That's pretty impressive, really.) A lot of messages like these:

[2012-02-05 01:24:36.816 CST] 14950 <cc cc 127.0.0.1(35321)> ERROR: could not extend file "base/16401/46507734": wrote only 4096 of 8192 bytes at block 0
[2012-02-05 01:24:36.816 CST] 14950 <cc cc 127.0.0.1(35321)> HINT: Check free disk space.
[2012-02-05 01:25:40.643 CST] 5323 LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
[2012-02-05 01:26:38.971 CST] 15065 <cc cc 127.0.0.1(53258)> ERROR: could not write to hash-join temporary file: No space left on device

I don't know how much of a clue this is, but less than a minute before our first message about pg_notify problems, the statistics messages went from complaining about not being able to *write* the file to also complaining about not being able to *close* the file -- like this:

[2012-02-05 01:26:41.159 CST] 5323 LOG: could not close temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
[2012-02-05 01:26:44.663 CST] 5323 LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
[2012-02-05 01:26:45.705 CST] 16252 WARNING: pgstat wait timeout

We got three errors in the log with exactly this DETAIL (file, offset and message):

[2012-02-05 01:27:36.878 CST] 14892 <cc cc 127.0.0.1(35320)> ERROR: could not access status of transaction 0
[2012-02-05 01:27:36.878 CST] 14892 <cc cc 127.0.0.1(35320)> DETAIL: Could not write to file "pg_notify/03A5" at offset 188416: No space left on device.
[2012-02-05 01:27:36.878 CST] 14892 <cc cc 127.0.0.1(35320)> STATEMENT: COMMIT
[2012-02-05 01:27:36.882 CST] 14892 <cc cc 127.0.0.1(35320)> LOG: disconnection: session time: 0:07:00.600 user=cc database=cc host=127.0.0.1 port=35320

After that the message changed to a new offset and a message of "Success." for the failure:

[2012-02-05 01:30:36.952 CST] 16383 <cc cc 127.0.0.1(38931)> ERROR: could not access status of transaction 0
[2012-02-05 01:30:36.952 CST] 16383 <cc cc 127.0.0.1(38931)> DETAIL: Could not read from file "pg_notify/03A5" at offset 253952: Success.
[2012-02-05 01:30:36.952 CST] 16383 <cc cc 127.0.0.1(38931)> STATEMENT: COMMIT
[2012-02-05 01:30:36.956 CST] 16383 <cc cc 127.0.0.1(38931)> LOG: disconnection: session time: 0:01:59.537 user=cc database=cc host=127.0.0.1 port=38931

There were 4591 messages over the course of five days with exactly that filename and offset and a message of "Success.", all for transaction 0. About midway through that time disk space was freed and all other symptoms cleared.

It really is pretty impressive that PostgreSQL survived that and was able to resume normal operations in all other respects once disk space was freed for it. Only this one feature stayed in a bad state.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > We got three errors in the log with exactly this DETAIL (file, > offset and message): > [2012-02-05 01:27:36.878 CST] 14892 <cc cc 127.0.0.1(35320)> ERROR: > could not access status of transaction 0 > [2012-02-05 01:27:36.878 CST] 14892 <cc cc 127.0.0.1(35320)> DETAIL: > Could not write to file "pg_notify/03A5" at offset 188416: No space > left on device. > After that the message changed to a new offset and a message of > "Success." for the failure: > [2012-02-05 01:30:36.952 CST] 16383 <cc cc 127.0.0.1(38931)> ERROR: > could not access status of transaction 0 > [2012-02-05 01:30:36.952 CST] 16383 <cc cc 127.0.0.1(38931)> DETAIL: > Could not read from file "pg_notify/03A5" at offset 253952: > Success. IIRC, the "could not access status of transaction" bits are artifacts stemming from the use of the SLRU infrastructure for pg_notify access. Sometime we ought to think a bit harder about how to specialize SLRU's error messages for the cases where what it's manipulating aren't XIDs. Also, the "Could not read ...: Success" bit presumably ought to be spelled "Could not read ...: unexpected EOF". I thought we'd fixed that most places, but we must've missed a spot in SLRU. However, both of those points are just cosmetic. The substantive issue is that you say it didn't resume normal behavior once space became available again. Can you provide more info about that? In particular, what behavior was being seen by the application? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> [2012-02-05 01:30:36.952 CST] 16383 <cc cc 127.0.0.1(38931)>
>> ERROR: could not access status of transaction 0
>> [2012-02-05 01:30:36.952 CST] 16383 <cc cc 127.0.0.1(38931)>
>> DETAIL: Could not read from file "pg_notify/03A5" at offset
>> 253952: Success.

> The substantive issue is that you say it didn't resume normal
> behavior once space became available again. Can you provide more
> info about that? In particular, what behavior was being seen by
> the application?

The application LISTENs on channel tcn and a trigger function is attached to most permanent tables to NOTIFY for DML on that channel. (See the recently committed triggered_change_notification patch if you want to see the trigger function.)

At this point it might be hard to determine (at least without re-creating the problem) whether it was only transactions which issued a NOTIFY, but the application never really got very far because of COMMIT statements failing with the above message. The report to us was that testers were unable to start the application. I believe that the above error on COMMIT kept the application from getting past initial tests that the connection was good.

-Kevin
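For context, the listening side of such a setup can be as small as the libpq sketch below. Only the channel name tcn comes from the message above; the connection string, polling loop, and output are illustrative assumptions, not the reported application.

#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=cc");
    PGresult   *res;
    PGnotify   *note;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Subscribe to the channel the triggers NOTIFY on */
    res = PQexec(conn, "LISTEN tcn");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "LISTEN failed: %s", PQerrorMessage(conn));
    PQclear(res);

    for (;;)
    {
        /* Pull pending input from the socket, then drain notifications */
        if (!PQconsumeInput(conn))
        {
            fprintf(stderr, "lost connection: %s", PQerrorMessage(conn));
            break;
        }
        while ((note = PQnotifies(conn)) != NULL)
        {
            printf("NOTIFY on \"%s\" from backend %d: %s\n",
                   note->relname, note->be_pid, note->extra);
            PQfreemem(note);
        }
        sleep(1);               /* a real client would wait on PQsocket(conn) with select() */
    }

    PQfinish(conn);
    return 0;
}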
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > The application LISTENs on channel tcn and a trigger function is > attached to most permanent tables to NOTIFY for DML on that channel. > ... > The report to us was that testers were unable to start the > application. I believe that the above error on COMMIT kept the > application from getting past initial tests that the connection was > good. OK, so it was issuing a LISTEN and the LISTEN was failing at commit? (It's a bit hard to believe that you'd be triggering one of the NOTIFYs during "initial connection tests".) regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> The application LISTENs on channel tcn and a trigger function is
>> attached to most permanent tables to NOTIFY for DML on that
>> channel.
>> ...
>> The report to us was that testers were unable to start the
>> application. I believe that the above error on COMMIT kept the
>> application from getting past initial tests that the connection
>> was good.
>
> OK, so it was issuing a LISTEN and the LISTEN was failing at
> commit? (It's a bit hard to believe that you'd be triggering one
> of the NOTIFYs during "initial connection tests".)

In continued digging through logs I found something to indicate the transaction on which COMMIT failed for 2528 of the failures. In all cases the transaction made a change which would have fired a NOTIFY at commit (in a deferred trigger). So that initial assumption about the cause doesn't hold up. I'm not sure at this point why the tester perception was that they couldn't get in.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > In continued digging through logs I found something to indicate > the transaction on which COMMIT failed for 2528 of the failures. > In all cases the transaction made a change which would have fired > a NOTIFY at commit (in a deferred trigger). Arg. Sorry, I got confused there. No deferred trigger involved in the triggered change notification; that is only in the replication code. I haven't seen anything to suggest that the replication trigger is contributing to this, but I guess I couldn't rule it out, either. -Kevin
Christoph Berg <christoph.berg@credativ.de> writes:
> Warming up an old thread here - we ran into the same problem.

Thanks for the example proving it is repeatable. I dug into this a bit and found the problem.

When a page of pg_notify is filled, asyncQueueAddEntries() advances the global QUEUE_HEAD to point to the next page before it calls SimpleLruZeroPage() to make that page exist in slru.c. But SimpleLruZeroPage could fail --- even though it doesn't attempt to write out the new page, it might need to dump some existing dirty page from shared memory before it can allocate a slot to the new page. If it does fail, then QUEUE_HEAD is pointing at a page number that doesn't exist either on disk or in slru.c buffers. Subsequent attempts to execute asyncQueueAddEntries() will fail because SimpleLruReadPage() says "page? what page?". And there is noplace else in async.c that would try to create the missing page. This error will persist until the server is restarted (thus resetting the state of pg_notify), even if the underlying disk-full condition is cleared.

The solution is to make sure we don't advance QUEUE_HEAD into the new page until SimpleLruZeroPage has created it. There are a couple of ways the code could be fixed, but what I chose to do in the attached proposed patch is to make asyncQueueAddEntries work with a local queue_head pointer, which is only copied back to shared memory on successful exit.

Comments, objections?

			regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index fcb087ed15dfbb4d567ee0c742e1db8ec1353a2d..6fcd9093268e7bc5aaf6102fa9d6adcdcced1d6d 100644
*** a/src/backend/commands/async.c
--- b/src/backend/commands/async.c
*************** static ListCell *
*** 1285,1290 ****
--- 1285,1291 ----
  asyncQueueAddEntries(ListCell *nextNotify)
  {
  	AsyncQueueEntry qe;
+ 	QueuePosition queue_head;
  	int			pageno;
  	int			offset;
  	int			slotno;
*************** asyncQueueAddEntries(ListCell *nextNotif
*** 1292,1299 ****
  	/* We hold both AsyncQueueLock and AsyncCtlLock during this operation */
  	LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
  
  	/* Fetch the current page */
! 	pageno = QUEUE_POS_PAGE(QUEUE_HEAD);
  	slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
  	/* Note we mark the page dirty before writing in it */
  	AsyncCtl->shared->page_dirty[slotno] = true;
--- 1293,1313 ----
  	/* We hold both AsyncQueueLock and AsyncCtlLock during this operation */
  	LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
  
+ 	/*
+ 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
+ 	 * memory upon exiting.  The reason for this is that if we have to advance
+ 	 * to a new page, SimpleLruZeroPage might fail (out of disk space, for
+ 	 * instance), and we must not advance QUEUE_HEAD if it does.  (Otherwise,
+ 	 * subsequent insertions would try to put entries into a page that slru.c
+ 	 * thinks doesn't exist yet.)  So, use a local position variable.  Note
+ 	 * that if we do fail, any already-inserted queue entries are forgotten;
+ 	 * this is okay, since they'd be useless anyway after our transaction
+ 	 * rolls back.
+ 	 */
+ 	queue_head = QUEUE_HEAD;
+ 
  	/* Fetch the current page */
! 	pageno = QUEUE_POS_PAGE(queue_head);
  	slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
  	/* Note we mark the page dirty before writing in it */
  	AsyncCtl->shared->page_dirty[slotno] = true;
*************** asyncQueueAddEntries(ListCell *nextNotif
*** 1305,1311 ****
  		/* Construct a valid queue entry in local variable qe */
  		asyncQueueNotificationToEntry(n, &qe);
  
! 		offset = QUEUE_POS_OFFSET(QUEUE_HEAD);
  
  		/* Check whether the entry really fits on the current page */
  		if (offset + qe.length <= QUEUE_PAGESIZE)
--- 1319,1325 ----
  		/* Construct a valid queue entry in local variable qe */
  		asyncQueueNotificationToEntry(n, &qe);
  
! 		offset = QUEUE_POS_OFFSET(queue_head);
  
  		/* Check whether the entry really fits on the current page */
  		if (offset + qe.length <= QUEUE_PAGESIZE)
*************** asyncQueueAddEntries(ListCell *nextNotif
*** 1331,1338 ****
  				 &qe,
  				 qe.length);
  
! 		/* Advance QUEUE_HEAD appropriately, and note if page is full */
! 		if (asyncQueueAdvance(&(QUEUE_HEAD), qe.length))
  		{
  			/*
  			 * Page is full, so we're done here, but first fill the next page
--- 1345,1352 ----
  				 &qe,
  				 qe.length);
  
! 		/* Advance queue_head appropriately, and detect if page is full */
! 		if (asyncQueueAdvance(&(queue_head), qe.length))
  		{
  			/*
  			 * Page is full, so we're done here, but first fill the next page
*************** asyncQueueAddEntries(ListCell *nextNotif
*** 1342,1353 ****
  			 * asyncQueueIsFull() ensured that there is room to create this
  			 * page without overrunning the queue.
  			 */
! 			slotno = SimpleLruZeroPage(AsyncCtl, QUEUE_POS_PAGE(QUEUE_HEAD));
  			/* And exit the loop */
  			break;
  		}
  	}
  
  	LWLockRelease(AsyncCtlLock);
  
  	return nextNotify;
--- 1356,1370 ----
  			 * asyncQueueIsFull() ensured that there is room to create this
  			 * page without overrunning the queue.
  			 */
! 			slotno = SimpleLruZeroPage(AsyncCtl, QUEUE_POS_PAGE(queue_head));
  			/* And exit the loop */
  			break;
  		}
  	}
  
+ 	/* Success, so update the global QUEUE_HEAD */
+ 	QUEUE_HEAD = queue_head;
+ 
  	LWLockRelease(AsyncCtlLock);
  
  	return nextNotify;
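The essence of the fix is a common pattern: mutate a private copy of the shared state through every step that can fail, and publish it only once all of them have succeeded. Here is a standalone sketch of that idea with invented names (QueuePos, alloc_page, append_entry); it is not the PostgreSQL code, just the shape of it.

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    int     page;           /* page number the queue head is on */
    int     offset;         /* byte offset within that page */
} QueuePos;

static QueuePos shared_head;    /* shared queue-head position (lock assumed held) */

/* Stand-in for a page allocator that can fail (e.g. on ENOSPC). */
static bool
alloc_page(int pageno)
{
    /* pretend allocation of page 3 fails, to exercise the error path */
    return pageno != 3;
}

static bool
append_entry(int entry_len, int page_size)
{
    QueuePos    head = shared_head;     /* mutate a local copy only */

    head.offset += entry_len;
    if (head.offset >= page_size)
    {
        /* Create the next page *before* publishing the advanced position. */
        if (!alloc_page(head.page + 1))
            return false;               /* shared_head is untouched on failure */
        head.page++;
        head.offset = 0;
    }

    shared_head = head;                 /* publish only after every step succeeded */
    return true;
}

int
main(void)
{
    for (int i = 0; i < 40; i++)
        if (!append_entry(100, 1000))
            printf("append failed; head still at page %d offset %d\n",
                   shared_head.page, shared_head.offset);
    return 0;
}

On failure the shared position still points at a page that really exists, so later attempts are not wedged the way the unpatched async.c was.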
[Resending as the original post didn't get through to the list]

Warming up an old thread here - we ran into the same problem. Database is 9.1.4/x86_64 from Debian/testing. The client application is bucardo hammering the database with NOTIFYs (including some master-master replication conflicts, which might add to the parallel NOTIFY load).

The problem is reproducible with the attached instructions (several ENOSPC cycles might be required). When the filesystem is filled using dd, the bucardo and psql processes will die with this error (the server runs with a German locale; translated, the error is ENOSPC):

ERROR: 53100: could not access status of transaction 0
DETAIL: Could not write to file "pg_notify/0000" at offset 180224: No space left on device.
LOCATION: SlruReportIOError, slru.c:861

The line number might be different, sometimes it's ENOENT, sometimes even "Success". Even after disk space is available again, subsequent "NOTIFY foobar" calls will die, without any other clients connected:

ERROR: XX000: could not access status of transaction 0
DETAIL: Could not read from file "pg_notify/0000" at offset 245760: Success.
LOCATION: SlruReportIOError, slru.c:854

Here's a backtrace, caught at slru.c:430:

430             SlruReportIOError(ctl, pageno, xid);
(gdb) bt
#0  SimpleLruReadPage (ctl=ctl@entry=0xb192a0, pageno=30, write_ok=write_ok@entry=1 '\001', xid=xid@entry=0)
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/slru.c:430
#1  0x0000000000520d2f in asyncQueueAddEntries (nextNotify=nextNotify@entry=0x29b60c8)
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/commands/async.c:1318
#2  0x000000000052187f in PreCommit_Notify ()
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/commands/async.c:869
#3  0x00000000004973d3 in CommitTransaction ()
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/xact.c:1827
#4  0x0000000000497a8d in CommitTransactionCommand ()
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/xact.c:2562
#5  0x0000000000649497 in finish_xact_command ()
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:2452
#6  finish_xact_command ()
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:2441
#7  0x000000000064c875 in exec_simple_query (query_string=0x2a99d70 "notify foobar;")
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:1037
#8  PostgresMain (argc=<optimized out>, argv=argv@entry=0x29b1df8, username=<optimized out>)
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:3968
#9  0x000000000060e731 in BackendRun (port=0x2a14800)
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:3611
#10 BackendStartup (port=0x2a14800)
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:3296
#11 ServerLoop ()
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:1460
#12 0x000000000060f451 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x29b1170)
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:1121
#13 0x0000000000464bc9 in main (argc=5, argv=0x29b1170)
    at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/main/main.c:199

Restarting the cluster seems to fix the condition in some cases, but I've seen the error persist over restarts, or reappear after some time even without disk full. (That's also what the customer on the live system is seeing.)

Christoph

--
cb@df7cb.de | http://www.df7cb.de/
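To make it concrete how this surfaces to a client: the notification is only pushed into pg_notify at commit time (PreCommit_Notify in the backtrace above), so the statement that fails is the COMMIT, not the NOTIFY itself. Below is a minimal libpq sketch of that, with an assumed connection string and simplified error handling.

#include <stdio.h>
#include <libpq-fe.h>

static int
run(PGconn *conn, const char *sql)
{
    PGresult   *res = PQexec(conn, sql);
    int         ok = (PQresultStatus(res) == PGRES_COMMAND_OK);

    if (!ok)
        fprintf(stderr, "\"%s\" failed: %s", sql, PQresultErrorMessage(res));
    PQclear(res);
    return ok;
}

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=cc");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    run(conn, "BEGIN");
    run(conn, "NOTIFY foobar");     /* the NOTIFY statement itself returns OK */
    /* With the pg_notify page missing, this is the statement that errors out */
    if (!run(conn, "COMMIT"))
        fprintf(stderr, "notification was not delivered\n");

    PQfinish(conn);
    return 0;
}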
Re: Tom Lane 2012-06-29 <8308.1340933751@sss.pgh.pa.us>
> in async.c that would try to create the missing page. This error will
> persist until the server is restarted (thus resetting the state of
> pg_notify), even if the underlying disk-full condition is cleared.

Thanks for digging into this and providing a patch. I've applied it to 9.1.4-2 here, and now the server recovers more gracefully from the error (the log is from a German locale; translated, the error is ENOSPC):

2012-07-02 17:45:59 CEST ERROR: could not access status of transaction 0
2012-07-02 17:45:59 CEST DETAIL: Could not write to file "pg_notify/0000" at offset 32768: No space left on device.
2012-07-02 17:45:59 CEST STATEMENT: commit

A subsequent "notify foobar" command did succeed after clearing the disk space.

What is still puzzling me is that the customer is repeatedly reporting these issues, even after rebooting the system. I am not 100% sure at which point the disk-full situation was resolved, but at least it sounds like the problem would reappear after restarts. I will need to get more detailed reports on the order of events there to make sure.

Christoph

--
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <christoph.berg@credativ.de> writes:
> What is still puzzling me is that the customer is repeatedly reporting
> these issues, even after rebooting the system.

Hm. A server restart really ought to clear any problem in this area, since pg_notify gets reset to starting conditions. Please check into that more closely. (It might be worth looking at $PGDATA/pg_notify/ after a restart just to verify there are no files there except 0000.)

			regards, tom lane
Re: Tom Lane 2012-07-02 <18088.1341245035@sss.pgh.pa.us>
> Christoph Berg <christoph.berg@credativ.de> writes:
> > What is still puzzling me is that the customer is repeatedly reporting
> > these issues, even after rebooting the system.
>
> Hm. A server restart really ought to clear any problem in this area,
> since pg_notify gets reset to starting conditions. Please check into
> that more closely. (It might be worth looking at $PGDATA/pg_notify/
> after a restart just to verify there are no files there except 0000.)

Hi,

we deployed an updated 9.1.4 package with your patch there. After restarting the server, pg_notify was empty except for 0000. fsck.ext3 reported no errors.

Now the error has happened again: 58P01, could not read status of transaction 0, could not open pg_notify/0016, ENOENT, location SlruReportIOError, slru.c:840. (Roughly translated from the German message.)

We'll try to dig into the code here, maybe we can spot something.

Kind regards,
Christoph Berg

--
Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz