Thread: BUG #8347: PANIC: heap_insert_redo: failed to add tuple when applying WAL
BUG #8347: PANIC: heap_insert_redo: failed to add tuple when applying WAL
From
maciek@heroku.com
Date:
The following bug has been logged on the website:

Bug reference: 8347
Logged by: Maciek Sakrejda
Email address: maciek@heroku.com
PostgreSQL version: 9.2.4
Operating system: Ubuntu 12.04.2 LTS 64-bit
Description:

Running into a recovery failure on a customer's replica:

Jul 31 00:11:55: LOG: restored log file "00000001000000E200000067" from archive
Jul 31 00:11:55: WARNING: will not overwrite a used ItemId
Jul 31 00:11:55: CONTEXT: xlog redo insert: rel 1663/16385/16619; tid 25260/37
Jul 31 00:11:55: PANIC: heap_insert_redo: failed to add tuple
Jul 31 00:11:55: CONTEXT: xlog redo insert: rel 1663/16385/16619; tid 25260/37

I see a similar bug filed [1], but no replies. This happens repeatedly when attempting to apply this segment.

[1]: http://www.postgresql.org/message-id/CANbzriT3h1kf2EaKEBcDqwu4AYwUjCuKcrDkjdxJ0CTjNeGnFQ@mail.gmail.com
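For readers who want to find the page named in the CONTEXT lines above, the following is a minimal sketch of decoding `rel 1663/16385/16619; tid 25260/37`. It reads the three `rel` numbers as tablespace/database/relfilenode and the `tid` pair as block number/line pointer offset. The block size (8192 bytes) and segment size (131072 blocks, i.e. 1 GB) are assumed defaults, not values confirmed for this installation, and the names `RedoTarget`, `parse_redo_context`, and `locate_block` are hypothetical helpers invented for this illustration.

```python
import re
from typing import NamedTuple, Tuple

BLOCK_SIZE = 8192            # assumption: default PostgreSQL block size
BLOCKS_PER_SEGMENT = 131072  # assumption: default 1 GB segment files


class RedoTarget(NamedTuple):
    tablespace: int   # first number after "rel"
    database: int     # second number after "rel"
    relfilenode: int  # third number after "rel"
    block: int        # first number after "tid"
    offset: int       # second number after "tid" (line pointer index)


_CONTEXT_RE = re.compile(r"rel (\d+)/(\d+)/(\d+); tid (\d+)/(\d+)")


def parse_redo_context(line: str) -> RedoTarget:
    """Extract the relation and tuple identifiers from a redo CONTEXT line."""
    match = _CONTEXT_RE.search(line)
    if match is None:
        raise ValueError("no 'rel a/b/c; tid x/y' pattern found in line")
    return RedoTarget(*(int(group) for group in match.groups()))


def locate_block(target: RedoTarget) -> Tuple[int, int]:
    """Return (segment file number, byte offset of the block in that file)."""
    segment, block_in_segment = divmod(target.block, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment * BLOCK_SIZE


line = ("Jul 31 00:11:55: CONTEXT: xlog redo insert: "
        "rel 1663/16385/16619; tid 25260/37")
target = parse_redo_context(line)
segment, byte_offset = locate_block(target)
```

Under those assumptions, block 25260 sits in the first segment file of the relation, which is where one would look when inspecting the page that already has line pointer 37 in use.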
Re: BUG #8347: PANIC: heap_insert_redo: failed to add tuple when applying WAL
From
Andres Freund
Date:
On 2013-07-31 01:27:39 +0000, maciek@heroku.com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8347
> Logged by: Maciek Sakrejda
> Email address: maciek@heroku.com
> PostgreSQL version: 9.2.4
> Operating system: Ubuntu 12.04.2 LTS 64-bit
> Description:
>
> Running into a recovery failure on a customer's replica:
>
> Jul 31 00:11:55: LOG: restored log file "00000001000000E200000067" from archive
> Jul 31 00:11:55: WARNING: will not overwrite a used ItemId
> Jul 31 00:11:55: CONTEXT: xlog redo insert: rel 1663/16385/16619; tid 25260/37
> Jul 31 00:11:55: PANIC: heap_insert_redo: failed to add tuple
> Jul 31 00:11:55: CONTEXT: xlog redo insert: rel 1663/16385/16619; tid 25260/37
>
> I see a similar bug filed [1], but no replies. This happens repeatedly when
> attempting to apply this segment.

Any chance you could https://github.com/snaga/xlogdump that and the neighbouring segments? That might tell us whether we're dealing with broken locking or possibly disk corruption (doesn't sound too likely).

Just to be sure, you're not running with full_page_writes = off or something?

Could you possibly run a patched postgres against that, to get more info?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #8347: PANIC: heap_insert_redo: failed to add tuple when applying WAL
From
Maciek Sakrejda
Date:
On Tue, Jul 30, 2013 at 9:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:

> Any chance you could https://github.com/snaga/xlogdump that and the
> neighbouring segments? That might tell us whether we're dealing with
> broken locking or possibly disk corruption (doesn't sound too likely).

Actually, we did find what looks like some pretty crazy disk corruption after I reported this (heap tuple data in pg_clog files). I'm surprised Postgres did not wig out more, actually. I can run xlogdump later this week if it's still of interest, but I'm pretty satisfied that this was not Postgres' fault.

Incidentally, the system performed admirably in the course of the recovery, considering the severely compromised state of heap and clog data. I'm really glad we're using Postgres.
Re: BUG #8347: PANIC: heap_insert_redo: failed to add tuple when applying WAL
From
Klaus Ita
Date:
Isn't it a funny coincidence that we also had a corruption of that same/similar type?

My disk was, I am quite confident, not tampered with. I am wondering: does PG sign or checksum WAL files? Is the integrity of WAL files ensured by any mechanism? Because if it IS, then, in our case, it's a corruption caused BY the postgres master server. I can replay the WALs and re-create the same error over and over.

lg,k

On Thu, Aug 1, 2013 at 11:13 PM, Maciek Sakrejda <maciek@heroku.com> wrote:

> On Tue, Jul 30, 2013 at 9:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
>> Any chance you could https://github.com/snaga/xlogdump that and the
>> neighbouring segments? That might tell us whether we're dealing with
>> broken locking or possibly disk corruption (doesn't sound too likely).
>
> Actually, we did find what looks like some pretty crazy disk corruption
> after I reported this (heap tuple data in pg_clog files). I'm surprised
> Postgres did not wig out more, actually. I can run xlogdump later this week
> if it's still of interest, but I'm pretty satisfied that this was not
> Postgres' fault.
>
> Incidentally, the system performed admirably in the course of the
> recovery, considering the severely compromised state of heap and clog data.
> I'm really glad we're using Postgres.
Re: BUG #8347: PANIC: heap_insert_redo: failed to add tuple when applying WAL
From
Daniel Farina
Date:
On Fri, Aug 2, 2013 at 12:51 AM, Klaus Ita <klaus@worstofall.com> wrote:
> Isn't it a funny coincidence, that we also had a corruption of that
> same/similar type?
>
> my disk was quite confidently not tampered. I am wondering: Does PG sign, or
> checksum wal_files? Is the integrity of wal_files ensured by any mechanism?
> Because if it IS, then - in our case - it's a corruption caused BY the
> postgres master server. I can replay the wal's and re-create the same error
> over and over.

Corruption can hitch a ride on a WAL full page image without much difficulty, as long as the page header looks legit (from what I've seen so far, a bad page header will prevent the system from doing much with it, so no FPIs will be generated).
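The point about full page images can be illustrated with a toy model. This is a sketch only, not PostgreSQL's WAL format: `make_record` and `verify_record` are hypothetical helpers, the 4-byte prefix layout is invented, and `zlib.crc32` stands in for whatever checksum a real record carries. It shows why a record-level checksum catches damage that happens after the record is written, yet says nothing about a page that was already damaged when its image was copied into the record.

```python
import zlib


def make_record(page_image: bytes) -> bytes:
    """Toy record: a 4-byte CRC followed by the page image it covers."""
    crc = zlib.crc32(page_image)
    return crc.to_bytes(4, "little") + page_image


def verify_record(record: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored one."""
    stored = int.from_bytes(record[:4], "little")
    return zlib.crc32(record[4:]) == stored


good_page = bytes(64)  # stand-in for an intact page

# Case 1: the record is damaged after it was written (e.g. in the archive).
# The stored CRC no longer matches the payload, so verification fails.
record = bytearray(make_record(good_page))
record[10] ^= 0xFF
damaged_in_transit_ok = verify_record(bytes(record))

# Case 2: the page was already corrupt when its image was captured.
# The CRC is computed over the bad bytes, so the record verifies cleanly.
bad_page = b"\xde\xad" + good_page[2:]
damaged_at_source_ok = verify_record(make_record(bad_page))
```

In this model `damaged_in_transit_ok` is false and `damaged_at_source_ok` is true, which matches the observation that a WAL segment can replay the same failure repeatedly while still passing its own integrity checks.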