Re: WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: WAL logging problem in 9.4.3?
Date
Msg-id CAHGQGwGzxAyKLUm+z2n43-ee6u976jLmQAz=pPBVzSueD3CAig@mail.gmail.com
Whole thread Raw
In response to Re: WAL logging problem in 9.4.3?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WAL logging problem in 9.4.3?  (Andres Freund <andres@anarazel.de>)
Re: WAL logging problem in 9.4.3?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Jul 3, 2015 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> The optimization of TRUNCATE opereation that we can use when
>> CREATE TABLE and TRUNCATE are executed in the same transaction block
>> seems to cause the problem. In this case, only index file truncation is
>> logged, and index creation in btbuild() is not logged because wal_level
>> is minimal. Then at the subsequent crash recovery, index file is truncated
>> to 0 byte... Very simple fix is to log an index creation in that case,
>> but not sure if that's ok to do..
>
>> In 9.2 or before, this problem doesn't occur because no such error is thrown
>> even if an index file size is zero. But in 9.3 or later, since the planner
>> tries to read a meta page of an index to get the height of the btree tree,
>> an empty index file causes such error. The planner was changed that way by
>> commit 31f38f28, and the problem seems to be an oversight of that commit.
>
> What?  You want to blame the planner for failing because the index was
> left corrupt by broken WAL replay?  A failure would occur anyway at
> execution.

Yep, right. I was not thinking that such index with file size 0 is corrupted
because the reported problem didn't happen before that commit was added.
But that's my fault. Such index can cause an error even in other code paths.

Okay, so probably we need to change WAL replay of TRUNCATE so that
the index file is truncated to one containing only meta page instead of
empty one. That is, the WAL replay of TRUNCATE would need to call
index_build() after smgrtruncate() maybe.

Then how should we implement that? Invent new WAL record type that
calls smgrtruncate() and index_build() during WAL replay? Or add the
special flag to XLOG_SMGR_TRUNCATE record, and make WAL replay
call index_build() only if the flag is found? Any other good idea?
Anyway ISTM that we might need to add or modify WAL record.

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: psql :: support for \ev viewname and \sv viewname
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench - allow backslash-continuations in custom scripts