BUG #10432: failed to re-find parent key in index

From m.sakrejda@gmail.com
Subject BUG #10432: failed to re-find parent key in index
Msg-id 20140523175217.2692.50568@wrigleys.postgresql.org
Responses Re: BUG #10432: failed to re-find parent key in index  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      10432
Logged by:          Maciek Sakrejda
Email address:      m.sakrejda@gmail.com
PostgreSQL version: 9.3.4
Operating system:   Ubuntu 12.04 LTS 64-bit
Description:

Hi,

I just got the error `failed to re-find parent key in index "1665279" for
split pages 175193/193740`. From the list archives [1], it looks like
there's no easy recovery, but this happened on a replica right after the
recovery trigger file was touched and recovery completed, so that's not a
huge concern (unless it also happens to the next replica). Is there
something to be learned from the failed cluster? The only odd thing I see in
the logs is that it looks like we tried to take a base backup before
recovery actually completed, though the attempt appears to have failed
cleanly. Here are the logs from around the time of the failure, including
WAL-E logs:

May 23 16:21:28 postgres[10]: [1244-1]  LOG:  trigger file found:
/etc/postgresql/wal-e.d/pull-env/STANDBY_OFF
May 23 16:21:28 postgres[7545]: [5-1]  FATAL:  terminating walreceiver
process due to administrator command
May 23 16:21:28 wal_e.operator.backup INFO     MSG: begin wal restore#012
    STRUCTURED: time=2014-05-23T16:21:28.169498-00 pid=7757 action=wal-fetch
key=s3://.../wal_005/000000020000033900000066.lzo prefix=.../
seg=000000020000033900000066 state=begin
May 23 16:21:28 wal_e.blobstore.s3.s3_util WARNING  MSG: could no longer
locate object while performing wal restore#012        DETAIL: The absolute
URI that could not be located is
s3://.../wal_005/000000020000033900000066.lzo.#012        HINT: This can be
normal when Postgres is trying to detect what timelines are available during
restoration.#012        STRUCTURED: time=2014-05-23T16:21:28.364020-00
pid=7757
May 23 16:21:28 wal_e.operator.backup INFO     MSG: complete wal restore#012
       STRUCTURED: time=2014-05-23T16:21:28.365570-00 pid=7757
action=wal-fetch key=s3://.../wal_005/000000020000033900000066.lzo
prefix=.../ seg=000000020000033900000066 state=complete
May 23 16:21:28 postgres[10]: [1245-1]  LOG:  redo done at 339/650000C8
May 23 16:21:28 postgres[10]: [1246-1]  LOG:  last completed transaction was
at log time 2014-05-23 14:43:31.251167+00
May 23 16:21:28 wal_e.operator.backup INFO     MSG: begin wal restore#012
    STRUCTURED: time=2014-05-23T16:21:28.535075-00 pid=7763 action=wal-fetch
key=s3://.../wal_005/000000020000033900000065.lzo prefix=.../
seg=000000020000033900000065 state=begin
May 23 16:21:29 wal_e.blobstore.s3.s3_util INFO     MSG: completed download
and decompression#012        DETAIL: Downloaded and decompressed
"s3://.../wal_005/000000020000033900000065.lzo" to
"pg_xlog/RECOVERYXLOG"#012        STRUCTURED:
time=2014-05-23T16:21:29.094826-00 pid=7763
May 23 16:21:29 wal_e.operator.backup INFO     MSG: complete wal restore#012
       STRUCTURED: time=2014-05-23T16:21:29.096962-00 pid=7763
action=wal-fetch key=s3://.../wal_005/000000020000033900000065.lzo
prefix=.../ seg=000000020000033900000065 state=complete
May 23 16:21:29 postgres[10]: [1247-1]  LOG:  restored log file
"000000020000033900000065" from archive
May 23 16:21:29 wal_e.operator.backup INFO     MSG: begin wal restore#012
    STRUCTURED: time=2014-05-23T16:21:29.269909-00 pid=7769 action=wal-fetch
key=s3://.../wal_005/00000003.history.lzo prefix=.../ seg=00000003.history
state=begin
May 23 16:21:29 wal_e.blobstore.s3.s3_util WARNING  MSG: could no longer
locate object while performing wal restore#012        DETAIL: The absolute
URI that could not be located is s3://.../wal_005/00000003.history.lzo.#012
      HINT: This can be normal when Postgres is trying to detect what
timelines are available during restoration.#012        STRUCTURED:
time=2014-05-23T16:21:29.424936-00 pid=7769
May 23 16:21:29 wal_e.operator.backup INFO     MSG: complete wal restore#012
       STRUCTURED: time=2014-05-23T16:21:29.426295-00 pid=7769
action=wal-fetch key=s3://.../wal_005/00000003.history.lzo prefix=.../
seg=00000003.history state=complete
May 23 16:21:29 postgres[10]: [1248-1]  LOG:  selected new timeline ID: 3
May 23 16:21:29 wal_e.operator.backup INFO     MSG: begin wal restore#012
    STRUCTURED: time=2014-05-23T16:21:29.593188-00 pid=7775 action=wal-fetch
key=s3://.../wal_005/00000002.history.lzo prefix=.../ seg=00000002.history
state=begin
May 23 16:21:29 postgres[7779]: [4-1] [unknown] LOG:  connection received:
host=[local]
May 23 16:21:29 postgres[7779]: [5-1] postgres LOG:  connection authorized:
user=postgres database=postgres
May 23 16:21:29 postgres[7781]: [4-1] [unknown] LOG:  connection received:
host=[local]
May 23 16:21:29 postgres[7781]: [5-1] postgres LOG:  connection authorized:
user=postgres database=postgres
May 23 16:21:29 postgres[7781]: [6-1] postgres ERROR:  recovery is in
progress
May 23 16:21:29 postgres[7781]: [6-2] postgres HINT:  WAL control functions
cannot be executed during recovery.
May 23 16:21:29 postgres[7781]: [6-3] postgres STATEMENT:  COPY (SELECT
file_name,   lpad(file_offset::text, 8, '0') AS file_offset FROM
pg_xlogfile_name_offset(
pg_start_backup('freeze_start_2014-05-23T16:21:29.681458+00:00'))) TO STDOUT
WITH CSV HEADER;
May 23 16:21:29 wal_e.operator.backup WARNING  MSG: blocking on sending WAL
segments#012        DETAIL: The backup was not completed successfully, but
we have to wait anyway.  See README: TODO about pg_cancel_backup#012
STRUCTURED: time=2014-05-23T16:21:29.764625-00 pid=18790
May 23 16:21:29 postgres[7782]: [4-1] [unknown] LOG:  connection received:
host=[local]
May 23 16:21:29 postgres[7782]: [5-1] postgres LOG:  connection authorized:
user=postgres database=postgres
May 23 16:21:29 postgres[7782]: [6-1] postgres ERROR:  recovery is in
progress
May 23 16:21:29 postgres[7782]: [6-2] postgres HINT:  WAL control functions
cannot be executed during recovery.
May 23 16:21:29 postgres[7782]: [6-3] postgres STATEMENT:  COPY (SELECT
file_name,   lpad(file_offset::text, 8, '0') AS file_offset FROM
pg_xlogfile_name_offset(  pg_stop_backup())) TO STDOUT WITH CSV HEADER;
May 23 16:21:29 wal_e.main   ERROR    MSG: Could not stop hot backup#012
   STRUCTURED: time=2014-05-23T16:21:29.844455-00 pid=18790
May 23 16:21:29 wal_e.blobstore.s3.s3_util INFO     MSG: completed download
and decompression#012        DETAIL: Downloaded and decompressed
"s3://.../wal_005/00000002.history.lzo" to "pg_xlog/RECOVERYHISTORY"#012
   STRUCTURED: time=2014-05-23T16:21:29.866686-00 pid=7775
May 23 16:21:29 wal_e.operator.backup INFO     MSG: complete wal restore#012
       STRUCTURED: time=2014-05-23T16:21:29.868954-00 pid=7775
action=wal-fetch key=s3://.../wal_005/00000002.history.lzo prefix=.../
seg=00000002.history state=complete
May 23 16:21:29 postgres[10]: [1249-1]  LOG:  restored log file
"00000002.history" from archive
May 23 16:21:30 postgres[10]: [1250-1]  LOG:  archive recovery complete
May 23 16:21:30 postgres[10]: [1251-1]  FATAL:  failed to re-find parent key
in index "1665279" for split pages 175193/193740
May 23 16:21:31 postgres[8]: [4-1]  LOG:  startup process (PID 10) exited
with exit code 1
May 23 16:21:31 postgres[8]: [5-1]  LOG:  terminating any other active
server processes


[1]: http://www.postgresql.org/message-id/7753.1291075332@sss.pgh.pa.us
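
The two "recovery is in progress" errors in the logs above come from the base-backup job calling WAL control functions (`pg_start_backup()`, `pg_stop_backup()`) while the server was still a standby; PostgreSQL rejects these until promotion completes. A minimal sketch of a guard a backup job could use (the backup label here is hypothetical; `pg_is_in_recovery()` and the two-argument `pg_start_backup()` are the standard 9.3 functions):

```sql
-- WAL control functions raise "recovery is in progress" on a standby,
-- so a backup job can poll the recovery state first.
SELECT pg_is_in_recovery();  -- 't' while still in recovery; wait until 'f'

-- Safe to call only once pg_is_in_recovery() returns false
-- ('my_base_backup' is a hypothetical label):
SELECT pg_start_backup('my_base_backup', true);
```

In this incident the guard would only narrow the race, not eliminate it: the trigger file was found at 16:21:28 and the backup attempt ran at 16:21:29, inside the window where redo had finished but archive recovery had not yet completed.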
