On Fri, Aug 25, 2023 at 03:47:27PM +0300, Heikki Linnakangas wrote:
> That 'copying_initfork' condition is wrong. The first hint of that is that
> 'use_wal' is always true for an init fork. I believe this was meant to check
> for 'relpersistence == RELPERSISTENCE_UNLOGGED'. Otherwise, this bad thing
> can happen:
>
> 1. Create an unlogged table
> 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls
> RelationCopyStorage
> 3. a checkpoint happens while the command is running
> 4. After the ALTER TABLE has finished, shut down postgres cleanly.
> 5. Lose power
>
> When you recover, the unlogged table is not reset, because it was a clean
> postgres shutdown. But the part of the file that was copied after the
> checkpoint at step 3 was never fsync'd, so part of the file is lost. I was
> able to reproduce with a VM that I kill to simulate step 5.
Oops.
The comment at the top of smgrimmedsync() looks incorrect now to me
now. When copying the data from something else than an init fork, the
relation pages are not WAL-logged for an unlogged relation.
> Fortunately the fix is trivial, see attached. I suspect we have similar
> problems in a few other places, though. end_heap_rewrite(), _bt_load(), and
> gist_indexsortbuild have a similar-looking smgrimmedsync() calls, with no
> consideration for unlogged relations at all. I haven't tested those, but
> they look wrong to me.
Oops ^ N. And end_heap_rewrite() mentions directly
RelationCopyStorage()..
--
Michael