Re: Failing start-up archive recovery at Standby mode in PG9.2.4 - Mailing list pgsql-hackers

From Mitsumasa KONDO
Subject Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Msg-id CADupcHWjBsozhZFZctpvxQryA=ikKL84m2th+B0wgomS3GpMBQ@mail.gmail.com
In response to Failing start-up archive recovery at Standby mode in PG9.2.4  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List pgsql-hackers
Let me explain this problem in more detail.

The problem occurs because a restartpoint creates an illegal WAL file during archive recovery. However, I have not yet worked out why CreateRestartPoint() creates the illegal WAL file. My attached patch is really plain…

In the problem case, the first check in XLogFileReadAnyTLI() does not get an fd for the WAL file, because the proper WAL file does not exist in the archive directory.

XLogFileReadAnyTLI()
>     if (sources & XLOG_FROM_ARCHIVE)
>     {
>         fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_ARCHIVE, true);
>         if (fd != -1)
>         {
>             elog(DEBUG1, "got WAL segment from archive");
>             return fd;
>         }
>     }

Next, it searches for the WAL file in pg_xlog. There is an illegal WAL file in pg_xlog, so the function returns the fd of that illegal WAL file.

XLogFileReadAnyTLI()
>     if (sources & XLOG_FROM_PG_XLOG)
>     {
>         fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_PG_XLOG, true);
>         if (fd != -1)
>             return fd;
>     }
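
To make the fallback order concrete, here is a minimal toy model of the two checks quoted above. It is only a sketch under stated assumptions: ToySegment, ToyCluster, toy_read_any_source() and the SRC_* flags are hypothetical names invented for illustration and are not PostgreSQL code. The point it shows is that the first source that yields an fd wins, and nothing at this stage looks at whether the file contents are valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two WAL sources; not PostgreSQL's API. */
enum { SRC_ARCHIVE = 1 << 0, SRC_PG_XLOG = 1 << 1 };

typedef struct ToySegment
{
    int fd;        /* -1 means the segment is missing from this source */
    int is_valid;  /* would the contents pass header validation later? */
} ToySegment;

typedef struct ToyCluster
{
    ToySegment archive;
    ToySegment pg_xlog;
} ToyCluster;

/*
 * Try the archive first, then pg_xlog, and return the first fd that opens.
 * is_valid is deliberately never consulted here: an illegal file that
 * merely exists in pg_xlog is handed back to the caller like any other.
 */
int toy_read_any_source(const ToyCluster *c, int sources, int *used_source)
{
    assert(c != NULL && used_source != NULL);

    if ((sources & SRC_ARCHIVE) && c->archive.fd != -1)
    {
        *used_source = SRC_ARCHIVE;
        return c->archive.fd;
    }
    if ((sources & SRC_PG_XLOG) && c->pg_xlog.fd != -1)
    {
        *used_source = SRC_PG_XLOG;
        return c->pg_xlog.fd;
    }
    *used_source = 0;
    return -1;
}
```

With a cluster whose archive slot is missing (fd -1) and whose pg_xlog slot holds an invalid file with fd 7, the function returns 7 and reports SRC_PG_XLOG, which mirrors the problem case.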

The returned fd becomes the value of readFile. Of course, this value is 0 or greater, so we break out of the for-loop.

XLogPageRead()
>     readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
>                                   sources);
>     switched_segment = true;
>     if (readFile >= 0)
>         break;

Next comes the point where the problem actually appears. The illegal WAL file is read, and an error occurs.

XLogPageRead()
>     if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
>     {
>         ereport(emode_for_corrupt_record(emode, *RecPtr),
>                 (errcode_for_file_access(),
>                  errmsg("could not seek in log file %u, segment %u to offset %u: %m",
>                         readId, readSeg, readOff)));
>         goto next_record_is_invalid;
>     }
>     if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
>     {
>         ereport(emode_for_corrupt_record(emode, *RecPtr),
>                 (errcode_for_file_access(),
>                  errmsg("could not read from log file %u, segment %u, offset %u: %m",
>                         readId, readSeg, readOff)));
>         goto next_record_is_invalid;
>     }
>     if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
>         goto next_record_is_invalid;
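
The seek, read and header check above can be imitated with plain POSIX calls. Again, this is only a sketch: toy_read_page(), TOY_BLCKSZ and TOY_PAGE_MAGIC are hypothetical, the magic value is made up, and the real ValidXLOGHeader() checks more than a single field. It illustrates why a usable fd says nothing about the contents: the file opens fine, and the failure only surfaces when the page is read or validated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ     8192     /* assumed page size for this sketch */
#define TOY_PAGE_MAGIC 0x1234u  /* made-up value, not a real WAL magic */

typedef enum
{
    TOY_PAGE_OK,
    TOY_SEEK_FAILED,
    TOY_SHORT_READ,
    TOY_BAD_HEADER
} ToyPageStatus;

/* Seek to offset, read one full page into buf, then check its magic. */
ToyPageStatus toy_read_page(int fd, off_t offset, unsigned char *buf)
{
    uint16_t magic;

    assert(buf != NULL);

    if (lseek(fd, offset, SEEK_SET) < 0)
    {
        fprintf(stderr, "could not seek to offset %ld\n", (long) offset);
        return TOY_SEEK_FAILED;
    }
    if (read(fd, buf, TOY_BLCKSZ) != TOY_BLCKSZ)
    {
        fprintf(stderr, "could not read a full page at offset %ld\n",
                (long) offset);
        return TOY_SHORT_READ;
    }

    /* The first two bytes stand in for the page header's magic field. */
    memcpy(&magic, buf, sizeof(magic));
    if (magic != TOY_PAGE_MAGIC)
        return TOY_BAD_HEADER;

    return TOY_PAGE_OK;
}
```

In this model, an empty file fails with TOY_SHORT_READ and a zero-filled page fails with TOY_BAD_HEADER, even though both were opened successfully and have an fd of 0 or greater.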


I think that the point Horiguchi discovered comes after this one.
We must fix CreateRestartPoint() so that it does not create an illegal WAL file.

Best regards,

--
Mitsumasa KONDO 
