On 10/10/2016 05:25 PM, Michael Paquier wrote:
> On Fri, Oct 7, 2016 at 2:59 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>> I believe the fix is very simple. The FSM change during truncation is
>> critical and the buffer must be marked by MarkBufferDirty() i.e. those
>> changes must make to the disk. I think it's alright not to WAL log them
>> because XLOG_SMGR_TRUNCATE will redo() them if a crash occurs. But it must
>> not be lost across a checkpoint. Also, since it happens only during relation
>> truncation, I don't see any problem from performance perspective.
>
> Agreed. I happen to notice that VM is similalry careful when it comes
> to truncate it (visibilitymap_truncate).
visibilitymap_truncate is actually also wrong, in a different way. The
truncation WAL record is written only after the VM (and FSM) are
truncated. But visibilitymap_truncate() has already modified and dirtied
the page. If the VM page change is flushed to disk before the WAL
record, and you crash, you might have a torn VM page and a checksum failure.
Simply replacing the MarkBufferDirtyHint() call with MarkBufferDirty()
in FreeSpaceMapTruncateRel would have the same issue. If you call
MarkBufferDirty(), you must WAL-log the change, and also set the page's
LSN to make sure the WAL record is flushed first.
I think we need something like the attached.
- Heikki