Thread: Detecting database corruption
My company is developing a PostgreSQL 7.4 application. We don't want our customers to have to manage the database. If the database ever becomes corrupt, we have procedures for restoring the database from backup data. The question we've been wondering about is how to detect a corrupt database. False positives are acceptable (database isn't really corrupt but we think it is); false negatives are not, (database is corrupt but we can't tell). - Are any cases of corruption detected in the normal operation of PostgreSQL? - If there are, then how are applications notified when corruption is detected? - What symptoms should prompt us to suspect and check for corruption? - Are there any tools we can run to determine whether a database is corrupt? Jack Orenstein
On Tue, Jan 13, 2004 at 06:06:03PM -0500, Jack Orenstein wrote: > - Are any cases of corruption detected in the normal operation of > PostgreSQL? If this means, "Does the database usually check for corruption?" the answer is, "Not as a matter of course." If this means, "Will I know if the database is corrupt?" the answer is, "Probably." When database corruption occurs, it's serious enough that things stop working pretty fast. Note that this is an infrequent enough occurrence that it's not part of the standard toolkit of DBAs; each case tends to be unusual. It seems usually to be caused by faulty hardware. > - If there are, then how are applications notified when corruption > is detected? They get errors. > - What symptoms should prompt us to suspect and check for corruption? Refusal to start. Random crashes of the server. Strange errors during tuple access, VACUUM operations, or pg_dump. > - Are there any tools we can run to determine whether a database is > corrupt? This depends on the case. There is a utility called pgfsck floating around; it's dangerous, and for use where things are really totally hosed up so badly that you can't recover. The real question is, what have you been using that makes database corruption such a grave concern? If I had to worry that much about Postgres database corruption, I'd use something else. A -- Andrew Sullivan | ajs@crankycanuck.ca Those who know history get to watch other people repeat it. --Bruce Sterling
Andrew Sullivan wrote: > On Tue, Jan 13, 2004 at 06:06:03PM -0500, Jack Orenstein wrote: > >>- Are any cases of corruption detected in the normal operation of >>PostgreSQL? > > > If this means, "Does the database usually check for corruption?" the > answer is, "Not as a matter of course." Do you mean that this happens in a few select situations? Or that there are configuration flags that can be used to enable such checks? > ... > >>- Are there any tools we can run to determine whether a database is >>corrupt? > > > This depends on the case. There is a utility called pgfsck floating > around; it's dangerous, and for use where things are really totally > hosed up so badly that you can't recover. > > The real question is, what have you been using that makes database > corruption such a grave concern? If I had to worry that much about > Postgres database corruption, I'd use something else. The database is part of our product and we'd like it to be as unobtrusive as possible. Our product has an admin console, but our expectation is that it will be ignored most of the time. In general, we'd rather kick off a rebuild of the database more often that might really be necessary, if this meant that we could minimize the involvement of an admin. Database corruption is a concern for two reasons. First, if it ever does occur, we have to be able to deal with the situation gracefully, even if that means nothing beyond a clean shutdown of the application. Second, we are struggling with the IDE vs. fsync issue, that has come up on this mailing list. We definitely have to support IDE drives, and we're trying to determine how to balance performance against other concerns. If we do end up leaving IDE caching enabled, then my understanding is that corruption is a real possibility, (or have I drawn the wrong conclusion on this point?) Jack Orenstein
> Do you mean that this happens in a few select situations? Or that > there are configuration flags that can be used to enable such checks? > It is a few select situations and frankly I haven't had database corruption because of PostgreSQL since the 7.1 days. I have had however database corruption due to bad memory, and bad Linux kernels/filesystems. > >>- Are there any tools we can run to determine whether a database is > >>corrupt? Typically PostgreSQL will tell you via an error message pointing to a relation id. Also if you perform regular vacuums if vacuum fails it will typically tell you where. > > > The real question is, what have you been using that makes database > > corruption such a grave concern? If I had to worry that much about > > Postgres database corruption, I'd use something else. > > even if that means nothing beyond a clean shutdown of the > application. Second, we are struggling with the IDE vs. fsync issue, > that has come up on this mailing list. We definitely have to support > IDE drives, and we're trying to determine how to balance performance > against other concerns. If we do end up leaving IDE caching enabled, > then my understanding is that corruption is a real possibility, (or > have I drawn the wrong conclusion on this point?) > I think (at least personally) that you are placing a little too much emphasis on this problem. We have successfully done power plug tests over and over and over with IDE drives and not had the issue come about. Of course this entirely depends on many things, but that is what a UPS is for. Sincerely, Joshua D. Drake > Jack Orenstein > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
On Mon, Jan 19, 2004 at 02:45:27PM -0500, Jack Orenstein wrote: > > If this means, "Does the database usually check for corruption?" the > > answer is, "Not as a matter of course." > > Do you mean that this happens in a few select situations? Or that > there are configuration flags that can be used to enable such checks? There have been occasional reports of such corruption, but it seems always to come down to bad hardware. There are no flags to check for this as a part of regular operations, although you'd certainly get an error if you tried to retrieve bad data. > Database corruption is a concern for two reasons. First, if it ever > does occur, we have to be able to deal with the situation gracefully, > even if that means nothing beyond a clean shutdown of the > application. In the cases where people experience it, what usually shows up is some sort of inability to access data that is supposed to be in a place on the disk, but turns out not to be. You get error messages about missing tuples, mangled data, or a core dump. I think in such cases you probably would indeed want to shut down your application. > Second, we are struggling with the IDE vs. fsync issue, > that has come up on this mailing list. We definitely have to support > IDE drives, and we're trying to determine how to balance performance > against other concerns. If we do end up leaving IDE caching enabled, > then my understanding is that corruption is a real possibility, (or > have I drawn the wrong conclusion on this point?) This is a different problem. My best advice is, "get a UPS with a brain." A UPS which will keep your system up for 10 minutes and which will shut it down as soon as the battery kicks in is pretty cheap. That and some regular testing and maintenance of it is likely to prevent most problematic cases you might run into here. Most fsync worries actually have to do with losing data rather than data corruption: fsync is called when a transaction commits, and if the hardware is lying about whether the bits are actually on the disk, you might lose some things you think are committed. You can apparently tolerate some data loss anyway, so in this case it's not too big a deal. A -- Andrew Sullivan | ajs@crankycanuck.ca Music is no business of mine. --Marge Simpson