Thread: Out of space
I've been running Postgres for 2 or 3 years without a problem. This morning the disk holding the database filled up. I need to know which transaction/log files I can truncate or delete without compromising the system. These files are located under /var/lib/pgsql/data/

Many of them have dates of more than a year ago. I'm kind of rusty with this. Postgres works too well to keep me fluent with troubleshooting.

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development
Tom Bakken wrote:
> I've been running a postgres for 2 or 3 years without a problem.
> This morning my disk space for the database filled up. I need to
> know what transaction/log files I can truncate or delete without
> compromising the system. These files are located under
> /var/lib/pgsql/data/

The answer is normally "none" unless you have experienced crashes or other problems that might have left stale files lying around. But you say you have had no problem ... If you have any suspicion in that direction, please show us the exact files you're thinking about. A note about which PG version you are running would also help.
"Tom Bakken" <tom.bakken@tx.usda.gov> writes:
> I've been running a postgres for 2 or 3 years without a problem. This
> morning my disk space for the database filled up. I need to know what
> transaction/log files I can truncate or delete without compromising the
> system. These files are located under /var/lib/pgsql/data/

I wouldn't recommend deleting *any* files manually --- unless you find core files or old files underneath a pgsql_tmp subdirectory. Those you could zap at little risk.

The best approach is to free up a small amount of space elsewhere, enough so you can get through a CHECKPOINT without failing. The checkpoint will hopefully free up some space in pg_xlog. After that you can look at dropping tables you don't need any more, VACUUM FULL, etc.

regards, tom lane
I'm running version 7.1.2. I was able to drop several tables. That cleared up some disk space, but for some reason the database now won't restart. How can I determine where the problem is when running /etc/rc.d/init.d/postgresql restart? Any ideas on that would be appreciated. I've got database dumps, so I can always start over.

Here's a listing of /var/lib/pgsql/data:

.:
total 1494316
-rw------- 1 postgres postgres 4 Jun 21 2001 PG_VERSION
drwx------ 6 postgres postgres 4096 Sep 17 2003 base
drwx------ 2 postgres postgres 4096 Oct 27 13:40 global
-rw-r--r-- 1 root root 7640 Jun 29 2001 h
-rw------- 1 postgres postgres 9070 Mar 2 10:56 pg_hba.conf
-rw------- 1 postgres postgres 1118 Jun 21 2001 pg_ident.conf
-rw------- 1 postgres postgres 1528627890 Apr 7 12:32 pg_log
drwx------ 2 postgres postgres 4096 Apr 7 12:32 pg_xlog
-rw------- 1 postgres postgres 3137 Jun 21 2001 postgresql.conf
-rw------- 1 postgres postgres 52 Apr 7 12:32 postmaster.opts

As far as log files to delete, here are some under ./base that I thought might be safe to delete:

total 20
drwx------ 2 postgres postgres 4096 Jul 13 2001 1
drwx------ 2 postgres postgres 8192 Apr 7 12:11 185174
drwx------ 2 postgres postgres 4096 Jun 21 2001 18719
drwx------ 2 postgres postgres 4096 Apr 7 12:11 213304

./base/1:
total 1556
-rw------- 1 postgres postgres 0 Jun 21 2001 1215
-rw------- 1 postgres postgres 0 Jun 21 2001 1216
-rw------- 1 postgres postgres 8192 Jun 21 2001 1219
-rw------- 1 postgres postgres 16384 Jul 16 2001 1247
-rw------- 1 postgres postgres 73728 Jul 13 2001 1249
-rw------- 1 postgres postgres 229376 Jun 21 2001 1255
-rw------- 1 postgres postgres 16384 Jul 16 2001 1259
-rw------- 1 postgres postgres 0 Jun 21 2001 16567
-rw------- 1 postgres postgres 8192 Jun 21 2001 16579
-rw------- 1 postgres postgres 16384 Jun 21 2001 16600
-rw------- 1 postgres postgres 73728 Jun 21 2001 16617
-rw------- 1 postgres postgres 8192 Jun 21 2001 16642
-rw------- 1 postgres postgres 8192 Jun 21 2001 16653
-rw------- 1 postgres postgres 16384 Jun 21 2001 16685
-rw------- 1 postgres postgres 8192 Jun 21 2001 16867
-rw------- 1 postgres postgres 8192 Jun 21 2001 16934
-rw------- 1 postgres postgres 0 Jun 21 2001 16948
-rw------- 1 postgres postgres 8192 Jun 21 2001 16960
-rw------- 1 postgres postgres 0 Jun 21 2001 17033
-rw------- 1 postgres postgres 0 Jun 21 2001 17045
-rw------- 1 postgres postgres 8192 Jun 21 2001 17058
. . . .

I've got a couple of directories that I suspect have stale files. One of them, ./base/185174, contains what appears to be current information.

Thanks

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Peter Eisentraut
Sent: Wednesday, April 07, 2004 12:20 PM
To: Tom Bakken; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out of space
Tom,

I'm not finding any mention of CHECKPOINT in my references. Is that something from a version newer than 7.1.2?

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 07, 2004 12:22 PM
To: Tom Bakken
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out of space
I looked in the pg_log file, and it says xlogtemp.1091 is missing:

[root@linux04 data]# tail pg_log
DEBUG: Redo record at (1, 516646732); Undo record at (0, 0); Shutdown TRUE
DEBUG: NextTransactionId: 28728439; NextOid: 9098648
FATAL 2: ZeroFill(/var/lib/pgsql/data/pg_xlog/xlogtemp.1033) failed: No such file or directory
/usr/bin/postmaster: Startup proc 1033 exited with status 512 - abort
DEBUG: database system was shut down at 2004-04-07 12:14:38 CDT
DEBUG: CheckPoint record at (1, 516646732)
DEBUG: Redo record at (1, 516646732); Undo record at (0, 0); Shutdown TRUE
DEBUG: NextTransactionId: 28728439; NextOid: 9098648
FATAL 2: ZeroFill(/var/lib/pgsql/data/pg_xlog/xlogtemp.1091) failed: No such file or directory
/usr/bin/postmaster: Startup proc 1091 exited with status 512 - abort

I'm sure I didn't delete it. Regardless, I hope one of you can suggest something based on this.

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 07, 2004 12:22 PM
To: Tom Bakken
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out of space
"Tom Bakken" <tom.bakken@tx.usda.gov> writes:
> I'm not finding any mention of CHECKPOINT in my references. Is that
> something from a version newer than 7.1.2?

You're running 7.1.2? My, that *is* an old installation.

You really ought to think about an update, particularly if you might be approaching the 4-billion-transaction event horizon. You do not want to suffer XID wraparound in a 7.1 installation :-(. See this link for explanations:
http://www.postgresql.org/docs/7.4/static/maintenance.html#VACUUM-FOR-WRAPAROUND

7.1 does have the CHECKPOINT command, though, whether you see it documented or not.

regards, tom lane
"Tom Bakken" <tom.bakken@tx.usda.gov> writes:
> FATAL 2: ZeroFill(/var/lib/pgsql/data/pg_xlog/xlogtemp.1091) failed: No
> such file or directory
> /usr/bin/postmaster: Startup proc 1091 exited with status 512 - abort

> I'm sure I didn't delete it.

This is just trying to make a new, empty xlog file. I don't quite understand why the errno is "No such file or directory" --- you wouldn't think that write() could return that errno. But the most likely bet is that you don't yet have enough free space on the disk. These files are 16MB each, and it could be that more than one needs to be made.

How much stuff is there in /var/lib/pgsql/data/pg_xlog anyway? I think that 7.1.2 predates some changes we made to keep down the number of xlog files that would be kept around.

regards, tom lane
Of course, I was planning to upgrade, but as with most things, too little, too late... At this point, I just want to keep it running until I can move to my planned new platform. Can you tell me where to start with CHECKPOINT?

If it's any help, my problem appears to be a missing file. This is from my pg_log:

DEBUG: database system was shut down at 2004-04-07 12:14:38 CDT
DEBUG: CheckPoint record at (1, 516646732)
DEBUG: Redo record at (1, 516646732); Undo record at (0, 0); Shutdown TRUE
DEBUG: NextTransactionId: 28728439; NextOid: 9098648
FATAL 2: ZeroFill(/var/lib/pgsql/data/pg_xlog/xlogtemp.1091) failed: No such file or directory
/usr/bin/postmaster: Startup proc 1091 exited with status 512 - abort

Thanks

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 07, 2004 2:46 PM
To: Tom Bakken
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out of space
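To put the wraparound warning in this thread into numbers, here is a rough sketch of the arithmetic. It uses the NextTransactionId value from the pg_log excerpt and reads the "4-billion-transaction event horizon" as a 32-bit counter (2**32); that reading, and the helper name xid_headroom, are assumptions made for illustration only.

```python
# Figures from the thread: the startup log reports NextTransactionId: 28728439.
# Assumption: the "4-billion-transaction" horizon is a 32-bit XID counter.
XID_SPACE = 2 ** 32
NEXT_XID = 28728439


def xid_headroom(next_xid, xid_space=XID_SPACE):
    """Return (fraction of the XID space used, XIDs left before the counter wraps)."""
    used_fraction = next_xid / xid_space
    remaining = xid_space - next_xid
    return used_fraction, remaining


if __name__ == "__main__":
    used, remaining = xid_headroom(NEXT_XID)
    print(f"used: {used:.2%} of the XID space, {remaining} transactions remaining")
```

On these numbers the installation has used well under one percent of the counter, so wraparound looks like a long-term concern rather than the cause of the startup failure.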
Tom,

Here's the situation:

[root@linux04 init.d]# cd /var/lib/pgsql/data/
[root@linux04 data]# ls -l
total 1494316
-rw------- 1 postgres postgres 4 Jun 21 2001 PG_VERSION
drwx------ 6 postgres postgres 4096 Sep 17 2003 base
drwx------ 2 postgres postgres 4096 Oct 27 13:40 global
-rw-r--r-- 1 root root 7640 Jun 29 2001 h
-rw------- 1 postgres postgres 9070 Mar 2 10:56 pg_hba.conf
-rw------- 1 postgres postgres 1118 Jun 21 2001 pg_ident.conf
-rw------- 1 postgres postgres 1528630320 Apr 7 14:26 pg_log
drwx------ 2 postgres postgres 4096 Apr 7 14:26 pg_xlog
-rw------- 1 postgres postgres 3137 Jun 21 2001 postgresql.conf
-rw------- 1 postgres postgres 52 Apr 7 14:26 postmaster.opts
[root@linux04 data]# ls -l pg_xlog/
total 16404
-rw------- 1 postgres postgres 16777216 Apr 7 12:14 000000010000001E

I do have a limited amount of free space in the partition, but I'd like to free up more. I'm just not sure what to delete, if anything.

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 07, 2004 2:57 PM
To: Tom Bakken
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out of space
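Since the one file in pg_xlog is 16777216 bytes (16 MB), a quick way to judge whether the partition can hold another such file is to express the free space in segment-sized units. The sketch below is only an illustration: the function name free_space_report is made up, and it relies on os.statvfs, which is available on Unix-like systems only. It reports the space available to an unprivileged account separately from the total free space, because filesystems commonly reserve some blocks for root.

```python
import os

# Size of the single file shown in the pg_xlog listing (16 MB).
XLOG_SEGMENT_BYTES = 16 * 1024 * 1024


def free_space_report(path, segment_bytes=XLOG_SEGMENT_BYTES):
    """Report free space on the filesystem holding `path`, in bytes and in 16 MB segments."""
    st = os.statvfs(path)  # Unix-only call
    bytes_for_user = st.f_bavail * st.f_frsize  # usable by an unprivileged account
    bytes_for_root = st.f_bfree * st.f_frsize   # includes blocks reserved for root
    return {
        "bytes_for_user": bytes_for_user,
        "bytes_for_root": bytes_for_root,
        "segments_for_user": bytes_for_user // segment_bytes,
    }


if __name__ == "__main__":
    # Hypothetical usage: point it at the data directory from the thread.
    print(free_space_report("/var/lib/pgsql/data"))
```

If segments_for_user comes back as 0 or 1, there is probably not enough room for the server to create the new xlog files it wants at startup.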
"Tom Bakken" <tom.bakken@tx.usda.gov> writes:
> Here's the situation:
> [root@linux04 init.d]# cd /var/lib/pgsql/data/
> [root@linux04 data]# ls -l
> total 1494316
> -rw------- 1 postgres postgres 4 Jun 21 2001 PG_VERSION
> drwx------ 6 postgres postgres 4096 Sep 17 2003 base
> drwx------ 2 postgres postgres 4096 Oct 27 13:40 global
> -rw-r--r-- 1 root root 7640 Jun 29 2001 h
> -rw------- 1 postgres postgres 9070 Mar 2 10:56 pg_hba.conf
> -rw------- 1 postgres postgres 1118 Jun 21 2001 pg_ident.conf
> -rw------- 1 postgres postgres 1528630320 Apr 7 14:26 pg_log
> drwx------ 2 postgres postgres 4096 Apr 7 14:26 pg_xlog
> -rw------- 1 postgres postgres 3137 Jun 21 2001 postgresql.conf
> -rw------- 1 postgres postgres 52 Apr 7 14:26 postmaster.opts
> [root@linux04 data]# ls -l pg_xlog/
> total 16404
> -rw------- 1 postgres postgres 16777216 Apr 7 12:14 000000010000001E

> I do have a limited amount of space in the partition but I'd like to get
> rid of more. Just not sure what to delete if anything.

Hm, what is that pg_log file? It's not part of the normal Postgres fileset. Is it perhaps just the postmaster's stderr output? If so, you're in luck: truncate that as you see fit, and you'll have some breathing room.

regards, tom lane
Tom,

Doh!! I created that log file and let it get out of hand. OK, it's truncated and now I've got plenty of space, but it's still complaining that it can't find xlogtemp.1405:

DEBUG: database system was shut down at 2004-04-07 12:14:38 CDT
DEBUG: CheckPoint record at (1, 516646732)
DEBUG: Redo record at (1, 516646732); Undo record at (0, 0); Shutdown TRUE
DEBUG: NextTransactionId: 28728439; NextOid: 9098648
FATAL 2: ZeroFill(/var/lib/pgsql/data/pg_xlog/xlogtemp.1405) failed: No such file or directory
/usr/bin/postmaster: Startup proc 1405 exited with status 512 - abort

Again, I know I didn't delete it, but regardless, I'm unsure where to go from here.

Thanks for all your help. I hope we're close to a fix.

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 07, 2004 3:57 PM
To: Tom Bakken
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out of space
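The culprit in this case was one file of about 1.5 GB sitting at the top of the data directory, which is easy to miss among years-old files. A small sketch like the following can surface the biggest files under a directory before anything is deleted. The function name largest_files and the default limit are illustrative choices, not anything taken from Postgres itself.

```python
import os


def largest_files(root, limit=5):
    """Return up to `limit` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are measured themselves, not their targets
                sizes.append((os.lstat(full).st_size, full))
            except OSError:
                # File vanished or is unreadable; skip it rather than abort the scan
                continue
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    # Hypothetical usage against the data directory discussed in the thread.
    for size, path in largest_files("/var/lib/pgsql/data"):
        print(f"{size:>14}  {path}")
```

This only reports sizes; whether a file is safe to truncate or remove still depends on what it is, as the replies in this thread stress.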
"Tom Bakken" <tom.bakken@tx.usda.gov> writes:
> OK, it's truncated and now I've got plenty of space, but it's still
> complaining that it can't find the xlogtemp.1405:

> FATAL 2: ZeroFill(/var/lib/pgsql/data/pg_xlog/xlogtemp.1405) failed: No
> such file or directory

I think the "no such file" errno is probably actively misleading. I took another look at the CVS logs and realized that in 7.1.2, there is no guarantee that that message actually reflects the cause of the write failure --- if write() indicates it couldn't write all the bytes, but does not set errno, then the reported errno will be left over from the last failed operation. We had patched this by 7.1.3, which is the version I was looking at locally. Since ENOENT can't be returned by write() AFAIK, it seems certain that this is indeed a leftover errno setting.

In short, I still think you are running into some kind of out-of-disk-space failure. I'm not sure what, but you might look to whether you've exceeded the postgres user's disk space quota, or anything along that line. Keep in mind also that an unprivileged user account normally can't fill the disk as full as root can.

regards, tom lane
Tom,

I'm not sure how to check the postgres user's disk space limit. I tried storing a rather large file on the postgres partition as the postgres user and had no problem. I'm more suspicious of the file stored in pg_xlog:

[root@linux04 data]# ls -l /var/lib/pgsql/data/pg_xlog/
total 16404
-rw------- 1 postgres postgres 16777216 Apr 7 12:14 000000010000001E

Is that typical? It doesn't look like it. Anyway, I've got plenty of space but am unsure of the next step.

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 07, 2004 5:25 PM
To: Tom Bakken
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out of space
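The stale-errno explanation given in this thread is easier to follow with a small model of a zero-fill routine. The sketch below is not the Postgres code; it is a Python illustration, with a made-up zero_fill function, of the general pattern: when a write returns fewer bytes than requested without raising an error, the caller has no meaningful error code, so it reports "out of space" explicitly instead of whatever error happened to be left over. Treating a short write as ENOSPC is an assumption of this sketch.

```python
import errno
import os

SEGMENT_BYTES = 16 * 1024 * 1024  # 16 MB, the xlog file size mentioned in the thread
BLOCK_BYTES = 8192                # illustrative write size


def zero_fill(path, size=SEGMENT_BYTES, block_size=BLOCK_BYTES):
    """Create `path` and fill it with `size` zero bytes; return the bytes written."""
    block = bytes(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    written = 0
    try:
        while written < size:
            chunk = block[: min(block_size, size - written)]
            n = os.write(fd, chunk)
            if n != len(chunk):
                # Short write with no exception: there is no error code to trust,
                # so report the most likely cause explicitly (assumption: disk full).
                raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC), path)
            written += n
        os.fsync(fd)
    except OSError:
        # Do not leave a partial file behind
        os.close(fd)
        os.unlink(path)
        raise
    else:
        os.close(fd)
    return written
```

The design point is that the error is chosen at the place where the short write is detected, so an unrelated earlier failure such as "No such file or directory" cannot leak into the message.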
Why not just get a bigger disk?

Warmest regards,
Ericson Smith
Tracking Specialist/DBA

+-----------------------+---------------------------------+
| http://www.did-it.com | "When you have to shoot, shoot, |
| eric@did-it.com       | don't talk! - Tuco              |
| 516-255-0500          |                                 |
+-----------------------+---------------------------------+
Just a brief summary of my problem and its resolution:

The postgres log file was gobbling up all my disk space and I wasn't paying attention. Postgres shut down and wouldn't restart. I truncated the log file and then had plenty of space, but Postgres still wouldn't start. I noticed it wasn't posting to the log file. When I truncated the log file, I had inadvertently changed its ownership to root. I corrected that, but it was still not writing to the log file. I changed the permissions, but again, it still wasn't starting up.

I had turned on more extensive debugging. When I removed the flag, I was surprised to see Postgres start normally. I must have had it set wrong. Boy do I feel dumb. I think the quote is, "Man proposes, but God disposes."

It's been a humbling experience to air my ignorance. I really appreciate Postgres and this forum. Tom, you've helped me (and others) more than once. Many thanks.

Tom Bakken
Information Resource Manager
Texas USDA, Rural Development
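One way to avoid the ownership mix-up described in this summary is to empty the log in place rather than recreate it. The following is a minimal sketch, assuming the log is a regular file; the helper name truncate_in_place is invented for illustration. Because the file itself is kept, its owner, group and mode are not touched even when the command is run as root.

```python
import os


def truncate_in_place(path):
    """Empty `path` without replacing it; return (new_size, metadata_unchanged)."""
    before = os.stat(path)
    os.truncate(path, 0)  # same inode, so owner, group and mode are preserved
    after = os.stat(path)
    unchanged = (
        (before.st_uid, before.st_gid, before.st_mode, before.st_ino)
        == (after.st_uid, after.st_gid, after.st_mode, after.st_ino)
    )
    return after.st_size, unchanged


if __name__ == "__main__":
    # Hypothetical usage on the log file from this thread.
    print(truncate_in_place("/var/lib/pgsql/data/pg_log"))
```

One caveat: if a running process still has the file open and is not writing in append mode, its next write lands at the old offset and the file becomes sparse, so it may be simpler to truncate while the server is stopped.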