Re: Why standby restores some WALs many times from archive? - Mailing list pgsql-hackers

From	Sergey Burladyan
Subject	Re: Why standby restores some WALs many times from archive?
Date	January 10, 2018 22:07:14
Msg-id	87shbd5y8t.fsf@gmail.com
In response to	Why standby restores some WALs many times from archive? (Victor Yagofarov <xnasx@yandex.ru>)
List	pgsql-hackers

I think I found what happened here.

One WAL record can be split between WAL files.
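
For illustration, a minimal sketch of when this happens, assuming the default 16 MB segment size and ignoring the page headers that WAL adds at each page boundary (those only push the end of the record further out). The helper names are hypothetical, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: default WAL segment size of 16 MB. */
#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Segment number that contains the given WAL byte position. */
static uint64_t
segment_of(uint64_t lsn)
{
    return lsn / WAL_SEGMENT_SIZE;
}

/*
 * True if a record of rec_len bytes starting at start_lsn ends in a later
 * segment than the one it starts in.  Simplified: page headers are ignored.
 */
static bool
record_crosses_segment(uint64_t start_lsn, uint32_t rec_len)
{
    assert(rec_len > 0);
    return segment_of(start_lsn + rec_len - 1) != segment_of(start_lsn);
}
```

Under these assumptions, a 100-byte record starting at 0/2BFFFFF0 begins in segment 0x2B (file 00000001000000000000002B on timeline 1) and ends in segment 0x2C, so replaying it needs both files.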

In XLogReadRecord, if the last WAL record in a file is incomplete, it tries to read the next WAL file:
        /* Copy the first fragment of the record from the first page. */
        memcpy(state->readRecordBuf,
               state->readBuf + RecPtr % XLOG_BLCKSZ, len);
        buffer = state->readRecordBuf + len;
        gotlen = len;

        do
        {
            /* Calculate pointer to beginning of next page */
            targetPagePtr += XLOG_BLCKSZ;

            /* Wait for the next page to become available */
            readOff = ReadPageInternal(state, targetPagePtr,
                                 Min(total_len - gotlen + SizeOfXLogShortPHD,
                                     XLOG_BLCKSZ));

            if (readOff < 0)
                goto err;

But in my case the next WAL file is not yet in the archive (archive_timeout=0
on the master), so the read fails and it invalidates the cached page:
err:

    /*
     * Invalidate the xlog page we've cached. We might read from a different
     * source after failure.
     */
    state->readSegNo = 0;
    state->readOff = 0;
    state->readLen = 0;

PostgreSQL then switches to streaming while the last WAL file
(00000001000000000000002B, for example) is still not completely replayed.
We do not use streaming, so it switches back to the archive:
LOG:  restored log file "00000001000000000000002B" from archive
...
DEBUG:  could not restore file "00000001000000000000002C" from archive: child process exited with exit code 1
DEBUG:  switched WAL source from archive to stream after failure
DEBUG:  switched WAL source from stream to archive after failure

Now it must reread the first part of the last WAL record from
00000001000000000000002B, but XLogFileReadAnyTLI _always_ reads
from the archive first, even if the file is already in pg_xlog:
        if (source == XLOG_FROM_ANY || source == XLOG_FROM_ARCHIVE)
        {
            fd = XLogFileRead(segno, emode, tli,
                              XLOG_FROM_ARCHIVE, true);
            if (fd != -1)
            {
                elog(DEBUG1, "got WAL segment from archive");
                if (!expectedTLEs)
                    expectedTLEs = tles;
                return fd;
            }
        }

        if (source == XLOG_FROM_ANY || source == XLOG_FROM_PG_XLOG)
        {
            fd = XLogFileRead(segno, emode, tli,
                              XLOG_FROM_PG_XLOG, true);
            if (fd != -1)
            {
                if (!expectedTLEs)
                    expectedTLEs = tles;
                return fd;
            }
        }

Well, I think we can cache the last WAL file locally in restore_command
if needed :-)
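
A rough sketch of that caching idea, written as C helpers that a restore_command wrapper could call. In practice this would more likely be a small shell script. It assumes the archive is a plain directory of segment files; the function names and the cache file names (last_wal, last_wal.name) are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_LEN 1024

/* Copy src to dst; returns 0 on success, -1 on failure (dst is removed). */
static int
copy_file(const char *src, const char *dst)
{
    char    buf[8192];
    size_t  n;
    int     rc = 0;
    FILE   *in = fopen(src, "rb");
    FILE   *out;

    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    if (rc != 0)
        remove(dst);
    return rc;
}

/* Returns 1 if the one-file cache currently holds segment fname. */
static int
cache_holds(const char *name_path, const char *fname)
{
    char    cached[256] = "";
    FILE   *f = fopen(name_path, "r");

    if (f == NULL)
        return 0;
    if (fgets(cached, sizeof(cached), f) == NULL)
        cached[0] = '\0';
    fclose(f);
    return strcmp(cached, fname) == 0;
}

/*
 * Restore fname to dest, using the cached copy when it matches.
 * Returns 0 on success and nonzero if the segment is not available,
 * which is what restore_command has to report through its exit status.
 */
static int
restore_with_cache(const char *archive_dir, const char *cache_dir,
                   const char *fname, const char *dest)
{
    char    src[PATH_LEN], data[PATH_LEN], name[PATH_LEN];
    FILE   *f;

    assert(archive_dir && cache_dir && fname && dest);
    snprintf(src, sizeof(src), "%s/%s", archive_dir, fname);
    snprintf(data, sizeof(data), "%s/last_wal", cache_dir);
    snprintf(name, sizeof(name), "%s/last_wal.name", cache_dir);

    /* Cache hit: no trip to the archive. */
    if (cache_holds(name, fname) && copy_file(data, dest) == 0)
        return 0;

    /* Cache miss: fetch from the archive; on failure keep the cache as is. */
    if (copy_file(src, dest) != 0)
        return 1;

    /* Refresh the cache; drop the name first so it never labels stale data. */
    remove(name);
    if (copy_file(dest, data) == 0 && (f = fopen(name, "w")) != NULL)
    {
        fputs(fname, f);
        fclose(f);
    }
    return 0;
}
```

The cache is left untouched when a fetch fails. That matters here because the failed request for 00000001000000000000002C is followed by another request for 00000001000000000000002B, which can then be served locally.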

-- 
Sergey Burladyan
