Re: Notify system doesn't recover from "No space" error - Mailing list pgsql-hackers
From | Christoph Berg |
---|---|
Subject | Re: Notify system doesn't recover from "No space" error |
Date | |
Msg-id | 20120629082430.GA905@msgid.df7cb.de |
In response to | Notify system doesn't recover from "No space" error ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
List | pgsql-hackers |
[Resending as the original post didn't get through to the list]

Warming up an old thread here - we ran into the same problem.

Database is 9.1.4/x86_64 from Debian/testing. The client application is bucardo hammering the database with NOTIFYs (including some master-master replication conflicts, which might add to the parallel NOTIFY load).

The problem is reproducible with the attached instructions (several ENOSPC cycles might be required). When the filesystem is filled using dd, the bucardo and psql processes will die with this error (translated from the German server locale):

    ERROR: 53100: could not access status of transaction 0
    DETAIL: Could not write to file "pg_notify/0000" at offset 180224: No space left on device.
    LOCATION: SlruReportIOError, slru.c:861

The line number may differ; sometimes the error is ENOENT, sometimes even "Success".

Even after disk space is available again, subsequent "NOTIFY foobar" calls will die, without any other clients connected:

    ERROR: XX000: could not access status of transaction 0
    DETAIL: Could not read from file "pg_notify/0000" at offset 245760: Success.
    LOCATION: SlruReportIOError, slru.c:854

Here's a backtrace, caught at slru.c:430:

    430             SlruReportIOError(ctl, pageno, xid);
    (gdb) bt
    #0  SimpleLruReadPage (ctl=ctl@entry=0xb192a0, pageno=30, write_ok=write_ok@entry=1 '\001', xid=xid@entry=0)
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/slru.c:430
    #1  0x0000000000520d2f in asyncQueueAddEntries (nextNotify=nextNotify@entry=0x29b60c8)
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/commands/async.c:1318
    #2  0x000000000052187f in PreCommit_Notify ()
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/commands/async.c:869
    #3  0x00000000004973d3 in CommitTransaction ()
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/xact.c:1827
    #4  0x0000000000497a8d in CommitTransactionCommand ()
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/xact.c:2562
    #5  0x0000000000649497 in finish_xact_command ()
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:2452
    #6  finish_xact_command ()
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:2441
    #7  0x000000000064c875 in exec_simple_query (query_string=0x2a99d70 "notify foobar;")
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:1037
    #8  PostgresMain (argc=<optimized out>, argv=argv@entry=0x29b1df8, username=<optimized out>)
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:3968
    #9  0x000000000060e731 in BackendRun (port=0x2a14800)
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:3611
    #10 BackendStartup (port=0x2a14800)
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:3296
    #11 ServerLoop ()
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:1460
    #12 0x000000000060f451 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x29b1170)
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:1121
    #13 0x0000000000464bc9 in main (argc=5, argv=0x29b1170)
        at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/main/main.c:199

Restarting the cluster seems to fix the condition in some cases, but I've seen the error persist over restarts, or reappear after some time even without a full disk. (That's also what the customer on the live system is seeing.)

Christoph
--
cb@df7cb.de | http://www.df7cb.de/
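
A quick sanity check on the offsets reported above may help when reading the backtrace. The sketch below maps a segment file name and byte offset to a page number. It assumes the default 8 kB block size (BLCKSZ = 8192) and 32 pages per SLRU segment file; neither value is stated in the report, and the helper name `locate` is made up for illustration.

```python
# Assumed build-time defaults; the report itself does not state them.
BLCKSZ = 8192                 # default PostgreSQL block size in bytes
SLRU_PAGES_PER_SEGMENT = 32   # pages stored in one SLRU segment file


def locate(segment_name, offset):
    """Map a segment file name (hex) and byte offset to
    (page within segment, absolute page number).

    Hypothetical helper for illustration only.
    """
    segno = int(segment_name, 16)
    page_in_segment, remainder = divmod(offset, BLCKSZ)
    if remainder:
        raise ValueError("offset is not page-aligned")
    if page_in_segment >= SLRU_PAGES_PER_SEGMENT:
        raise ValueError("offset lies beyond the end of a segment file")
    return page_in_segment, segno * SLRU_PAGES_PER_SEGMENT + page_in_segment


if __name__ == "__main__":
    # Offsets taken from the two error messages in the report.
    for offset in (180224, 245760):
        print(offset, locate("0000", offset))
```

Under these assumptions the failed write at offset 180224 falls on page 22, and the failed read at offset 245760 falls on page 30, which agrees with pageno=30 in frame #0 of the backtrace.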