On Fri, 23 Jan 2004, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Tom's answer will undoubtedly be better ...
>
> Nope, I think you got all the relevant points.
>
> The only thing I'd add after having had more time to think about it is
> that this seems very much like the problem we noticed recently with
> recovery-from-WAL being broken by the new code in bufmgr.c that tries to
> validate the header fields of any page it reads in. We had to add an
> escape hatch to disable that check while InRecovery, and I expect what
> we will end up with here is a few lines added to slru.c to make it treat
> read-past-EOF as a non-error condition when InRecovery. Now the clog
> code has always had all that paranoid error checking, but because it
> deals in such tiny volumes of data (only 2 bits per transaction), it's
> unlikely to suffer an out-of-disk-space condition. That's why we hadn't
> seen this failure mode before.
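
For a sense of scale, here is a small sketch of the arithmetic behind "tiny volumes". It assumes the default BLCKSZ of 8192 bytes. The macro and helper names are for illustration only and may not match clog.c exactly. At 2 bits per transaction, one byte holds 4 transactions and one page holds 32768.

```c
#include <assert.h>

#define BLCKSZ 8192                  /* assumed: PostgreSQL's default page size */
#define CLOG_BITS_PER_XACT 2         /* "only 2 bits per transaction" */
#define CLOG_XACTS_PER_BYTE (8 / CLOG_BITS_PER_XACT)
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

/* Hypothetical helper: bytes of clog needed for nxacts transactions. */
static long
clog_bytes_for_xacts(long nxacts)
{
    return (nxacts + CLOG_XACTS_PER_BYTE - 1) / CLOG_XACTS_PER_BYTE;
}

/* Hypothetical helper: clog pages touched by nxacts transactions, rounded up. */
static long
clog_pages_for_xacts(long nxacts)
{
    return (nxacts + CLOG_XACTS_PER_PAGE - 1) / CLOG_XACTS_PER_PAGE;
}

/* 8192 bytes * 4 transactions per byte = 32768 transactions per page. */
_Static_assert(CLOG_XACTS_PER_PAGE == 32768, "one page covers 32768 xacts");
```

So a million transactions add only 250,000 bytes (31 pages) to pg_clog. That is why an out-of-disk-space failure while extending it is so unlikely.
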

It seems that by changing SlruPhysicalReadPage() as follows we can
recover in a reasonable way here. Instead of:

    if (lseek(fd, (off_t) offset, SEEK_SET) < 0)
    {
        slru_errcause = SLRU_SEEK_FAILED;
        slru_errno = errno;
        return false;
    }

we have:

    if (lseek(fd, (off_t) offset, SEEK_SET) < 0)
    {
        if (!InRecovery)
        {
            slru_errcause = SLRU_SEEK_FAILED;
            slru_errno = errno;
            return false;
        }
        ereport(LOG,
                (errmsg("Short read from file \"%s\", reading as zeroes",
                        path)));
        MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
        return true;
    }

This is exactly how we already recover from a missing pg_clog file.
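
One caveat is worth checking. POSIX allows lseek() to set the file offset beyond the end of a file, so on a truncated segment the lseek() test above would normally succeed. The short read only shows up when read() returns fewer than BLCKSZ bytes. The following is a minimal standalone sketch of the same escape hatch placed on the read instead. It assumes an 8192-byte BLCKSZ. The hypothetical read_page_or_zeroes() takes an in_recovery flag in place of the backend's InRecovery global and uses fprintf() in place of ereport(LOG, ...). It is not the actual slru.c code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed page size (PostgreSQL's default) */

/*
 * Hypothetical stand-in for the seek-and-read part of SlruPhysicalReadPage().
 * A seek failure is always an error.  A short read is an error normally,
 * but when in_recovery is true the page is logged and returned as zeroes.
 */
static bool
read_page_or_zeroes(int fd, off_t offset, char *buf, bool in_recovery)
{
    ssize_t nread;

    if (lseek(fd, offset, SEEK_SET) < 0)
        return false;           /* genuine seek failure */

    nread = read(fd, buf, BLCKSZ);
    if (nread == BLCKSZ)
        return true;
    if (nread < 0 || !in_recovery)
        return false;           /* I/O error, or short read outside recovery */

    fprintf(stderr, "short read at offset %lld, reading as zeroes\n",
            (long long) offset);
    memset(buf, 0, BLCKSZ);
    return true;
}

/* Hypothetical helper: anonymous temp file holding npages pages of 0x55. */
static int
make_segment(int npages)
{
    char path[] = "/tmp/slru_demoXXXXXX";
    static char page[BLCKSZ];
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */
    memset(page, 0x55, BLCKSZ);
    for (int i = 0; i < npages; i++)
    {
        if (write(fd, page, BLCKSZ) != BLCKSZ)
        {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```

With a one-page segment, seeking to page 3 succeeds. Reading page 1 fails outside recovery and comes back as zeroes inside it, so normal operation keeps all of the existing error checking.
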
>
> regards, tom lane
Gavin