Thread: full data disk -- any chance of recovery
An enthusiastic person in our content department went and did a silly thing ... Well, he went and fired off an update that consumed all of the remaining disk space on two runtime servers. We've fallen back to a hot spare and I am faced with trying to retrieve these machines by Tuesday morning, when we expect some increase in traffic.

Postgres version is 7.4; the only thing in the /data directory is postgres data and related files:

$ du
3632      ./gex_runtime/base/1
4468      ./gex_runtime/base/17141
0         ./gex_runtime/base/138602992/pgsql_tmp
32682348  ./gex_runtime/base/138602992
32690448  ./gex_runtime/base
340       ./gex_runtime/global
492120    ./gex_runtime/pg_xlog
7660      ./gex_runtime/pg_clog
33190592  ./gex_runtime
0         ./bkup
33190592  .

The log is saying:

HINT: In a moment you should be able to reconnect to the database and repeat your command.
2006-01-01 23:20:19 WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
2006-01-01 23:20:19 LOG: could not close temporary statistics file "/data/postgres/gex_runtime/global/pgstat.tmp.1413": No space left on device

Available space is:

$ df
Filesystem  1K-blocks      Used  Available  Use%  Mounted on
/dev/sda1    32850580   3137552   28044280   11%  /
/dev/sdb1    35001508  33223500         16  100%  /data

Any suggestions? Falling back to the last known state is fine, but just in case I am making a backup of the remaining database to build a replacement.

And yes, I did foresee this and did warn management repeatedly, and yet somehow the advice falls on deaf ears. Go figure. I guess maybe because it isn't management that gets a hole kicked in a 3 day weekend.

Greg Williamson
DBA (for now at least)
GlobeXplorer LLC
Greg,

I'm not sure what you're looking for in the way of suggestions. Do you just want to be able to start this postgres server up and remove some data? The easiest way I see to accomplish that, given the information you provided, is to move pg_xlog to the sda disk and symlink it into the data dir. In general terms, it would go like this:

  Stop postmaster
  cd /data/gex_runtime
  mv pg_xlog /
  ln -s /pg_xlog
  Start postmaster

The commands may vary depending on OS. That would also give you better performance if sda and sdb are actually separate physical disks.

However, that's only going to give you about 500MB of free space, so I see bigger disks in your future. A vacuum full might recover a bit of space as well if you've got any bloat.

The question I have is this: Is your database read-only? Otherwise, bringing these machines back up probably isn't too useful, as they are now out of sync with the new primary (your old hot spare).

Good luck!
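For anyone following along, the move-and-symlink steps above can be sketched as a small shell function. This is only a sketch under stated assumptions: `relocate_xlog` is a hypothetical helper name (not a PostgreSQL tool), the postmaster must already be stopped (the function does not check), and it should be run as the user that owns the data directory.

```shell
#!/bin/sh
# Sketch only: move pg_xlog to a directory on another disk and leave a
# symlink behind in the data directory.
# ASSUMPTIONS: the postmaster is already stopped; relocate_xlog is a
# hypothetical helper, not part of PostgreSQL.
relocate_xlog() {
    datadir=$1   # data directory that currently contains pg_xlog
    target=$2    # existing directory on the disk that has free space

    # Refuse to run twice: a symlink means the move was already done.
    if [ -L "$datadir/pg_xlog" ]; then
        echo "pg_xlog is already a symlink" >&2
        return 1
    fi
    if [ ! -d "$datadir/pg_xlog" ]; then
        echo "no pg_xlog directory in $datadir" >&2
        return 1
    fi
    # Never overwrite something that is already at the destination.
    if [ -e "$target/pg_xlog" ]; then
        echo "$target/pg_xlog already exists" >&2
        return 1
    fi

    mv "$datadir/pg_xlog" "$target/pg_xlog" || return 1
    ln -s "$target/pg_xlog" "$datadir/pg_xlog"
}
```

A call might look like `relocate_xlog /data/gex_runtime /somewhere/on/sda`, where the second path is a placeholder for wherever the spare space actually lives.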
Jeff --

Thanks for the suggestion -- I think this fills the bill, except that the postmaster won't quit because it has no space (at least that is how I interpret it).

These are all linux boxes with the same architecture (2 CPUs, 2 gigs of RAM, disks not adequate for a database: QED). I had an urgent priority in November to upgrade these beasts, but the best laid plans o' mice and men, etc. etc.

These servers are mostly read-only for spatial data, so falling back to the last known state (e.g. before the current transaction) would work perfectly. But I'm still making a copy o' one of the two hot spares (one of which is now in play), just in case.

Have a good {day|afternoon|evening|night}!

Greg
Greg,

Does pg_ctl stop -m immediate stop the postmaster for you?

--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
You wrote:
> Greg,
>
> Does pg_ctl stop -m immediate stop the postmaster for you?

I tried

su - postgres -c '/apps/pgsql-7.4/bin/pg_ctl stop -D /data/postgres/gex_runtime -m immediate'

on one of the two hozed servers and that's (I think) what got this:

2006-01-01 23:20:19 WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.

And it's been like that for a while.

The other server (unstopped) shows only:

2006-01-02 00:30:01 LOG: could not close temporary statistics file "/data/postgres/gex_runtime/global/pgstat.tmp.1453": No space left on device
2006-01-02 00:33:54 ERROR: could not access status of transaction 0
DETAIL: could not write to file "/data/postgres/gex_runtime/pg_clog/0AFA" at offset 196608: No space left on device

G
Seems like you're going to have to kill -9. On Mon, 2 Jan 2006, Gregory S. Williamson wrote: > I tried > su - postgres -c '/apps/pgsql-7.4/bin/pg_ctl stop -D /data/postgres/gex_runtime -m immediate' > > on one of the two hozed servers and that's (I think) what got this:
"Gregory S. Williamson" <gsw@globexplorer.com> writes: > 2006-01-02 00:30:01 LOG: could not close temporary statistics file "/data/postgres/gex_runtime/global/pgstat.tmp.1453":No space left on device > 2006-01-02 00:33:54 ERROR: could not access status of transaction 0 > DETAIL: could not write to file "/data/postgres/gex_runtime/pg_clog/0AFA" at offset 196608: No space left on device Just kill -9 all the postgres processes; everything you need should be safely down in the WAL files. You might not have to move pg_xlog --- the first thing to do is see if there are any large temp files hanging about in the pgsql_tmp subdirectories. Anything you see in there can be shot on sight once the postmaster is stopped (actually, recent versions of the postmaster will do it for you on restart, but don't remember about 7.4). Which PG release is this exactly (7.4.what)? This misbehavior reminds me of a bug that we fixed in 7.4.2. regards, tom lane
Just curious: I guess the problem here is no longer simply the full disk, but supposing a full disk were the only problem, what would happen if we moved some old files temporarily from pg_xlog/* to somewhere else to free up some disk space? (On mine, I guess I could get about 75 MB, leaving the most recent ones: say, those dated today.)

Regards,
Ben Kim
Developer
http://benix.tamu.edu
Ben Kim wrote:
> Just curious, I guess the problem is not simply the disk full now, but
> supposing the disk full is the only problem, what would happen if we move
> some old files temporarily from pg_xlog/* to somewhere else and free up
> some disk space? (On mine, I guess I can get about 75 MB, leaving the most
> recent ones: say, dated today.)

Uhmmm don't do that :). You need to find something else. The pg_xlog directory holds your transaction logs.

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/
I'll check into the temp files and the like in a bit -- the output from version() says:

PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk)
(1 row)

so I am not sure if this is 7.4.2 -- I have some documentation, though, that says it is 7.4.2, so I think this beast may be of that flavor.

Thanks,
Greg
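A minimal sketch of the temp-file check suggested earlier in the thread, assuming a POSIX shell and the data directory layout shown in the du output of the original post. `list_pgsql_tmp_files` is a hypothetical helper name; it only lists files with their sizes in KB and removes nothing. Anything it reports should be left alone until the postmaster is stopped.

```shell
#!/bin/sh
# Sketch only: list leftover files in every pgsql_tmp directory under
# DATADIR/base, one per line, as "<size in KB><TAB><path>".
# list_pgsql_tmp_files is a hypothetical helper name. Nothing is deleted.
list_pgsql_tmp_files() {
    find "$1/base" -type f -path '*/pgsql_tmp/*' -exec du -k {} \;
}
```

For the servers in this thread the call would presumably be `list_pgsql_tmp_files /data/postgres/gex_runtime`, and piping the output through `sort -n` puts the largest files last.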
"Gregory S. Williamson" <gsw@globexplorer.com> writes: > I'll check into the temp files and the like in a bit -- the output from version() says: > PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk) > (1 row) > so I am not sure if this 7.4.2 -- If it were 7.4.2 it would say so. You are in desperate need of an update, as there are half a dozen known data-loss issues that are corrected in the 7.4.x update series. The one that I now think bit you is just one of them. regards, tom lane
Ah well, figures. If only ops had listened to me, we'd be on 8.1 right now.

Thanks anyway, as always, for the sage advice.

G
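On the version() point, here is a small sketch of telling "7.4" apart from "7.4.2" from the string that version() returns. It assumes the x.y.z numbering of that era and that the release number is the second word of the string, as in the output quoted in this thread. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: inspect a version() string such as
#   "PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC ..."
# ASSUMPTION: the release number is the second whitespace-separated word.

# Print the release token, e.g. "7.4" or "7.4.2".
pg_release_of() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

# Succeed only if the release has three dot-separated parts (x.y.z),
# i.e. it names a specific update release rather than the initial x.y.
has_update_number() {
    case $(pg_release_of "$1") in
        *.*.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

With the string from this thread, `pg_release_of` prints `7.4` and `has_update_number` fails, which is consistent with a server that has never had an update release applied.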
Jeff Frost wrote:
> Seems like you're going to have to kill -9.

Yeah, this is bad :( Seems like kill -9 is needed when the disk is full. Tested on *BSD jails.

Tomaž
Tomaz Borstnar <tomaz.borstnar@over.net> writes:
> Jeff Frost wrote:
>> Seems like you're going to have to kill -9.
> Yeah, this is bad :( Seems like kill -9 is needed when disk is full. Tested on *BSD jails.

With what PG version? And what behavior did you see exactly?

regards, tom lane
On Mon, Jan 02, 2006 at 12:45:29PM -0500, Tom Lane wrote:
> "Gregory S. Williamson" <gsw@globexplorer.com> writes:
> > 2006-01-02 00:30:01 LOG: could not close temporary statistics file "/data/postgres/gex_runtime/global/pgstat.tmp.1453": No space left on device
> > 2006-01-02 00:33:54 ERROR: could not access status of transaction 0
> > DETAIL: could not write to file "/data/postgres/gex_runtime/pg_clog/0AFA" at offset 196608: No space left on device
>
> Just kill -9 all the postgres processes; everything you need should be
> safely down in the WAL files.

Another alternative: most unix filesystems actually set things up so that there is still some free space left even when they report 100% full. On FreeBSD, you can change the amount of reserved space with tunefs -m, but you should read the caveats in man tunefs.

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
FWIW,

I can at least report the resolution of the original problem.

I went sleuthing and found some core files in the ./base/13860299 directory. Deleting those freed up some gigabytes of space (each core was 1-2 gigs).

The server that I had tried to stop with the "-m immediate" command did in fact then go offline; it came up with a few complaints. I ran a vacuum on all of the databases in that instance and our content manager was able to do his update (the update was such that reapplying it to any given row didn't hurt anything; these were massive updates changing copyright-related info and the like). So far the database has passed all sanity checks and is back online.

The server that I left alone was responsive, i.e. psql could connect and do queries, but there were a few tables it refused to have anything to do with, complaining about missing xlog files. I brought it down with "-m fast" mode, restarted it, and it also seems now to be fine. (Knock on simulated woodgrain.)

Lessons learned:
a) upgrade to current revisions whenever possible -- old software is a hand grenade waiting to go off.
b) look for core files and delete them if you don't need them -- I was not expecting to find them in a data directory, so this was a bit of a surprise.
c) don't run out of disk space (duh)

Thanks to all who helped me. I might be able to get a server to test on with a different release of postgres if that would be useful, although we are strictly a linux shop and Dell x86 servers are what I mostly can get my hands on (running 2.4.21-0.13mdkenterprise).

Greg W.
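A minimal sketch of hunting for core files inside a data directory, as described in the resolution above. It assumes cores are named `core` or `core.<something>`, which depends on kernel settings and may not hold everywhere. `find_core_files` is a hypothetical helper name; it only lists candidates with their sizes in KB and deletes nothing.

```shell
#!/bin/sh
# Sketch only: list files that look like core dumps under a directory,
# one per line, as "<size in KB><TAB><path>".
# ASSUMPTION: cores are named "core" or "core.*"; find_core_files is a
# hypothetical helper name. Nothing is deleted.
find_core_files() {
    find "$1" -type f \( -name core -o -name 'core.*' \) -exec du -k {} \;
}
```

Since the relation files under base/ have purely numeric names, this pattern should not match them; `find_core_files /data/postgres/gex_runtime | sort -n` would show the biggest offenders last.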
On Tue, 3 Jan 2006, Jim C. Nasby wrote:
> Another alternative: most unix filesystems actually set it up so that
> there is still some free space left even if it's reporting 100%. On
> FreeBSD, you can change the amount of reserved space with tunefs -m, but
> you should read the caveats in man tunefs.

Jim, excellent thought! And on Linux at least you can change it with the filesystem still mounted:

tune2fs -m 0 /dev/sdb1

would probably do the trick. You might want to set it back after you're done though. :-) The default appears to be 5% on my machine.

--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
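Before changing the reserve it may help to see what it currently is. The sketch below reads the output of `tune2fs -l` on stdin and prints the reserved blocks as a percentage of the total. It assumes an ext2/ext3-style filesystem whose listing contains the "Block count:" and "Reserved block count:" lines; `reserved_pct` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: compute the reserved-block percentage from `tune2fs -l`
# output supplied on stdin.
# ASSUMPTION: the listing contains "Block count:" and
# "Reserved block count:" lines; reserved_pct is a hypothetical helper.
reserved_pct() {
    LC_ALL=C awk -F: '
        /^Block count:/          { total = $2 }
        /^Reserved block count:/ { reserved = $2 }
        END {
            # Fail rather than divide by zero if the fields were missing.
            if (total > 0) printf "%.1f\n", 100 * reserved / total
            else exit 1
        }'
}
```

Usage would be along the lines of `tune2fs -l /dev/sdb1 | reserved_pct`, which typically needs root to read the device.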
On Tue, Jan 03, 2006 at 05:17:45PM -0800, Gregory S. Williamson wrote:
> FWIW,
>
> I can at least report the resolution of the original problem.
>
> I went sleuthing and found some core files in the ./base/13860299 directory. Deleting those freed up some gigabytes of space (each core was 1-2 gigs).

Might want to turn off dumping of core files; I believe man ulimit is the place to look.

> a) upgrade to current revisions whenever possible -- old software is a hand grenade waiting to go off.

Well, at least in the case of PostgreSQL, it's generally not critical to upgrade major (x.y) versions quickly. But you do want to keep up with minor (x.y.z) versions, as they often contain bug fixes. That said, 7.4.x is getting pretty old.

> c) don't run out of disk space (duh)

There have actually been fixes to make it less of an issue when you do run out of disk space. See item a. :)

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
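For reference, a hedged sketch of what the ulimit knob looks like from a shell. `ulimit -c` is the standard shell builtin for the core-file size limit; the two wrapper names below are hypothetical. Whether disabling cores is a good idea is a judgment call, since it also throws away debugging evidence.

```shell
#!/bin/sh
# Sketch only: inspect and locally disable core dumps.
# core_limit and without_cores are hypothetical helper names.

# Print the current core-file size limit for this shell
# ("unlimited", or a block count; 0 means core dumps are disabled).
core_limit() {
    ulimit -c
}

# Run a command with core dumps disabled. The function body is a
# subshell, so the limit of the calling shell is left unchanged.
without_cores() (
    ulimit -c 0 && "$@"
)
```

The limit is inherited by child processes, so it would have to be set in whatever script starts the postmaster for it to have any effect on the server.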
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Tue, Jan 03, 2006 at 05:17:45PM -0800, Gregory S. Williamson wrote: >> I went sleuthing and found some core files in the ./base/13860299 directory. Deleteing those freed up some gigabytes ofspace (each core was 1-2 gigs). > Might want to turn off dumping of core files; I believe man ulimit is > the place to look. Actually, as a developer I would've first wanted to look into the core files and try to see why they showed up in the first place. A gdb stack trace would often tell something useful (... if not to you, then to someone on the -hackers list ...). Cleaning up after a problem is fine, but don't destroy the evidence until you've learned as much as you can towards preventing the problem from happening again. I spend a remarkably large fraction of my time advising people to enable core-dumping on platforms that disable it by default, so you'll certainly not ever see me advising anyone to turn it off on a platform where it is default ;-) Having said all that, +1 to the point about staying up-to-date in whichever PG release series you are using. We do not spend time on making dot-releases because we have nothing to do on a Saturday afternoon ... an update is put out because it fixes one or more pretty serious bugs. Sure, there is some risk of a regression in a dot-release, but it's small. As best I recall at the moment, we've had only one or two regressions in dot-releases in the eight or so years I've been around the project. regards, tom lane
Tom Lane conjured forth the following characters:
> > Might want to turn off dumping of core files; I believe man ulimit is
> > the place to look.
>
> Actually, as a developer I would've first wanted to look into the core
> files and try to see why they showed up in the first place. A gdb stack
> trace would often tell something useful (... if not to you, then to
> someone on the -hackers list ...). Cleaning up after a problem is fine,
> but don't destroy the evidence until you've learned as much as you can
> towards preventing the problem from happening again.

We're a month or so from switching to 8.1, so I am sure that we'll have another core file which can be kept. To tell the truth, I haven't pursued this much because
(a) it's an old revision, and if time is to be spent swatting bugs it is better spent on current software; besides, it may be a result of something already fixed;
(b) we're almost certain that this is a result of catastrophic failures in postGIS/GEOS under load. We typically see a few connections go crazy and eat up all the RAM and CPU time; sometimes we've had to reboot to get things calm again. Our testing of 8.0 w/ postGIS 1.0 led us to conclude that we will see far less of this: when we replayed a day's traffic to the databases we saw no errors, versus dozens from the same traffic on 7.4.

When I do find another core I'll let people know, if you care; the chances are quite good that we find the opportunity.

Alas, moving large systems in a company sometimes requires the subtle skills of a cat herder combined with the social tact of an offensive linebacker.

Thanks again,
G
Tom Lane wrote:
> Tomaz Borstnar <tomaz.borstnar@over.net> writes:
>> Jeff Frost wrote:
>>> Seems like you're going to have to kill -9.
>
>> Yeah, this is bad :( Seems like kill -9 is needed when disk is full. Tested on *BSD jails.
>
> With what PG version?

8.0.x for sure.

> And what behavior did you see exactly?

PostgreSQL was running inside a jail. All was fine until the partition filled up, and at that point kill -9 was the only option to stop PostgreSQL in the jail. It logged something about stopping by administrative command, but it did not exit; kill -9 was the only solution short of rebooting.

Tomaž