Thread: Durability?
Hi,

I got an error like this:

ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440
CONTEXT: writing block 529 of relation 1663/233690/1247
WARNING: could not write block 529 of 1663/233690/1247
DETAIL: Multiple failures --- write error might be permanent.

The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing.

I am not sure I clearly understand the consequences of such an error, since Postgres continues to accept new transactions. If my WAL is corrupted, are my transactions still durable? If this is a violation of durability, is there a way to force Postgres to terminate on such an error?

Thanks in advance for the clarification.
Emmanuel

Emmanuel Cecchet <manu@frogthinker.org> writes:
> I got an error like this:
> ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440
> CONTEXT: writing block 529 of relation 1663/233690/1247
> WARNING: could not write block 529 of 1663/233690/1247
> DETAIL: Multiple failures --- write error might be permanent.

> The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing.

It looks like you've got a corrupted page in shared buffers, and every time the system tries to flush it to disk for a checkpoint, it fails.

What I'd try for getting out of this is to kill -9 some backend in order to force a database restart. Of course, if you want to investigate what caused it, you should dig around in shared memory first and try to get a copy of that buffer's contents.

regards, tom lane

Tom Lane wrote:
> Emmanuel Cecchet <manu@frogthinker.org> writes:
>> I got an error like this:
>> ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440
>> CONTEXT: writing block 529 of relation 1663/233690/1247
>> WARNING: could not write block 529 of 1663/233690/1247
>> DETAIL: Multiple failures --- write error might be permanent.
>>
>> The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing.
>
> It looks like you've got a corrupted page in shared buffers, and every
> time the system tries to flush it to disk for a checkpoint, it fails.
>
> What I'd try for getting out of this is to kill -9 some backend in order
> to force a database restart. Of course, if you want to investigate
> what caused it, you should dig around in shared memory first and try
> to get a copy of that buffer's contents.

Will the database be able to restart with a corrupted WAL?

If the database restarts, what transactions will be missing:
- just the block that couldn't be flushed?
- all transactions that were committed after the faulty block?
- more?

Thanks
Emmanuel

Emmanuel Cecchet <manu@frogthinker.org> writes:
> Tom Lane wrote:
>> It looks like you've got a corrupted page in shared buffers, and every
>> time the system tries to flush it to disk for a checkpoint, it fails.

> Will the database be able to restart with a corrupted WAL?

I don't think you have a corrupted WAL.

regards, tom lane
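
For anyone reading the error message later, here is a minimal sketch in Python of how the two positions in the ERROR line compare. It assumes each position is an xlogid/xrecoff pair of hexadecimal numbers, as the original post suggests. The function name parse_xlog_position is made up for illustration and is not part of PostgreSQL.

```python
def parse_xlog_position(text):
    """Split 'xlogid/xrecoff' (both assumed hexadecimal) into a tuple of ints."""
    xlogid, xrecoff = text.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


# Values copied from the ERROR line in the original post.
requested = parse_xlog_position("1/C121E998")
flushed = parse_xlog_position("1/BCBCB440")

# Tuples compare by xlogid first, then by xrecoff.
request_is_ahead = requested > flushed

# The byte gap is only computed when both positions share the same xlogid,
# so nothing is assumed about how offsets carry over between log ids.
gap_bytes = requested[1] - flushed[1] if requested[0] == flushed[0] else None

print(request_is_ahead, gap_bytes)
```

That the request is ahead of what has been flushed is what the ERROR line itself reports. The sketch only makes the size of the gap visible.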