Re: WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: WAL logging problem in 9.4.3?
Date
Msg-id 28415.1436197794@sss.pgh.pa.us
In response to Re: WAL logging problem in 9.4.3?  (Andres Freund <andres@anarazel.de>)
Responses Re: WAL logging problem in 9.4.3?  (Fujii Masao <masao.fujii@gmail.com>)
Re: WAL logging problem in 9.4.3?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2015-07-06 11:14:40 -0400, Tom Lane wrote:
>> The COUNT() correctly says 11 rows, but after crash-and-recover,
>> only the row with -1 is there.  This is because the INSERT writes
>> out an INSERT+INIT WAL record, which we happily replay, clobbering
>> the data added later by COPY.

> ISTM any WAL logged action that touches a relfilenode essentially needs
> to disable further optimization based on the knowledge that the relation
> is new.

After a bit more thought, I think it's not so much "any WAL logged action"
as "any unconditionally-replayed action".  INSERT+INIT breaks this
example because heap_xlog_insert will unconditionally replay the action,
even if the page is valid and has the same or a newer LSN.  Similarly,
TRUNCATE is problematic because we redo it unconditionally (and in that
case it's hard to see an alternative).
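
To make the distinction concrete, here is a toy Python model of the two
replay behaviours.  It is only a sketch, not PostgreSQL code: the Page
class, replay_insert, the LSN value 100 and the COPY row values are all
made up for illustration, and the real redo logic lives in C.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy heap page: an LSN plus a list of row values (hypothetical model)."""
    lsn: int = 0
    rows: list = field(default_factory=list)


def replay_insert(page, record_lsn, row, init_page):
    """Toy redo routine for a heap insert record."""
    if init_page:
        # INSERT+INIT: the page is re-initialized no matter what it holds,
        # so anything written to it without WAL is lost.
        page.rows = []
    elif page.lsn >= record_lsn:
        # Ordinary record: the page already reflects this change, skip it.
        return
    page.rows.append(row)
    page.lsn = record_lsn


def disk_page_after_crash():
    """On-disk state: INSERT of -1 was WAL-logged at LSN 100, then COPY
    added ten rows without WAL and the file was synced at commit."""
    page = Page()
    page.rows.append(-1)
    page.lsn = 100
    page.rows.extend(range(10))  # assumed COPY payload, not WAL-logged
    return page


# Unconditional replay (INSERT+INIT) clobbers the COPY'd rows.
clobbered = disk_page_after_crash()
replay_insert(clobbered, 100, -1, init_page=True)

# LSN-conditional replay leaves the page alone.
preserved = disk_page_after_crash()
replay_insert(preserved, 100, -1, init_page=False)

print(len(clobbered.rows), len(preserved.rows))
```

In this model the unconditional path ends with only the -1 row, matching
the symptom reported upthread, while the LSN-checked path keeps all 11.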

> It'd not be impossible to add more state to the relcache entry for the
> relation. Whether it's likely that we'd find all the places that'd need
> updating that state, I'm not sure.

Yeah, the sticking point is mainly being sure that the state is correctly
tracked, both now and after future changes.  We'd need to identify a state
invariant that we could be pretty confident we'd not break.

One idea I had was to allow the COPY optimization only if the heap file is
physically zero-length at the time the COPY starts.  That would still be
able to optimize in all the cases we care about making COPY fast for.
Rather than reverting cab9a0656c36739f, which would re-introduce a
different performance problem, perhaps we could have COPY create a new
relfilenode when it does this.  That should be safe if the table was
previously empty.
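
As a rough illustration of the zero-length rule, the check could look
like the following Python sketch.  The function name, its parameters and
the temporary file standing in for a heap file are all hypothetical; an
actual patch would do this in C against the storage manager.

```python
import os
import tempfile


def copy_may_skip_wal(heap_path: str, new_in_this_transaction: bool) -> bool:
    """Allow the WAL-skipping COPY path only for a relation created (or
    given a new relfilenode) in this transaction whose file is still
    physically zero-length.  Hypothetical helper, for illustration only."""
    if not new_in_this_transaction:
        return False
    return os.path.getsize(heap_path) == 0


with tempfile.TemporaryDirectory() as tmpdir:
    heap_path = os.path.join(tmpdir, "toy_heap_file")

    # Freshly created, nothing written yet: the optimization is allowed.
    open(heap_path, "wb").close()
    empty_ok = copy_may_skip_wal(heap_path, True)

    # An earlier INSERT has already put one 8kB page in the file.
    with open(heap_path, "wb") as f:
        f.write(b"\0" * 8192)
    nonempty_ok = copy_may_skip_wal(heap_path, True)

    # A pre-existing relation never qualifies, whatever its size.
    old_table_ok = copy_may_skip_wal(heap_path, False)

print(empty_ok, nonempty_ok, old_table_ok)
```

The point of keying on physical file length is that it is an invariant
checked at COPY start, so it does not depend on every code path
remembering to update a flag.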
        regards, tom lane


