On Thu, May 23, 2019 at 2:43 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, May 21, 2019 at 08:39:18AM -0400, Robert Haas wrote:
> > Yes. I thought I had described it. You create an unlogged table,
> > with an index of a type that does not smgrimmedsync(), your
> > transaction commits, and then the system crashes, losing the _init
> > fork for the index.
>
> The init forks won't magically go away, except in one case for empty
> routines not going through shared buffers.
No magic is required. If you haven't called fsync(), the file might
not be there after a system crash.
Going through shared_buffers guarantees that the file will be
fsync()'d before the next checkpoint, but I'm talking about a scenario
where you crash before the next checkpoint.
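
To make the fsync() point concrete, here is a minimal Python sketch of
what "the file is guaranteed to be there after a crash" takes at the OS
level. It is not PostgreSQL code: create_durably() and the file name
are hypothetical, and it assumes a POSIX system where a directory can
be opened read-only and fsync()'d. The file's contents and its
directory entry both have to be flushed.

```python
import os
import tempfile


def create_durably(path, data):
    """Create a new file and fsync both the file and its parent directory.

    Without the first fsync the contents may be lost in a crash; without
    the second, the directory entry (the file's existence) may be lost.
    """
    with open(path, "xb") as f:          # "x": fail if the file already exists
        f.write(data)
        f.flush()                        # push Python's buffer to the kernel
        os.fsync(f.fileno())             # push the kernel's buffer to disk

    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)                 # persist the new directory entry
    finally:
        os.close(dir_fd)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # "demo_init" is a made-up name standing in for an init fork.
        print(create_durably(os.path.join(d, "demo_init"), b"\0" * 8192))
```

A file written without those two calls is in exactly the state
described above: present in the kernel's view, but not necessarily on
disk if the machine goes down before the next checkpoint.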
> Then, empty routines going through shared buffers fill in one or more
> buffers, mark it/them as empty, dirty it/them, log the page(s) and then
> unlock the buffer(s). If a crash happens after the transaction
> commits, so we would still have the init page in WAL, and at the end
> of recovery we would know about it.
Yeah, but the problem is that the current system requires us to know
about it at the *beginning* of recovery. See my earlier remarks:

Suppose we create an unlogged table and then crash. The main fork
makes it to disk, and the init fork does not. Before WAL replay, we
remove any main forks that have init forks, but because the init fork
was lost, that does not happen. Recovery recreates the init fork.
After WAL replay, we try to copy_file() each _init fork to the
corresponding main fork. That fails, because copy_file() expects to be
able to create the target file, and here it can't do that because it
already exists.
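
The sequence above can be walked through with a small Python
simulation. This is only a sketch of the ordering problem, not the
server's code: the function names, the relfilenode "16384", and the use
of exclusive-create mode to stand in for copy_file() refusing an
existing target are all assumptions made for illustration.

```python
import os
import tempfile

INIT_SUFFIX = "_init"


def reset_before_replay(dbdir):
    """Before WAL replay: remove each main fork that has an _init fork."""
    for name in sorted(os.listdir(dbdir)):
        if name.endswith(INIT_SUFFIX):
            main = os.path.join(dbdir, name[:-len(INIT_SUFFIX)])
            if os.path.exists(main):
                os.remove(main)


def copy_file_exclusive(src, dst):
    """Stand-in for copy_file(): must be able to create dst itself."""
    with open(src, "rb") as s, open(dst, "xb") as d:  # "x" fails if dst exists
        d.write(s.read())


def reset_after_replay(dbdir):
    """After WAL replay: copy each _init fork to its main fork."""
    for name in sorted(os.listdir(dbdir)):
        if name.endswith(INIT_SUFFIX):
            copy_file_exclusive(os.path.join(dbdir, name),
                                os.path.join(dbdir, name[:-len(INIT_SUFFIX)]))


def simulate(init_fork_survived):
    """Run the crash-recovery sequence and report how it ends."""
    with tempfile.TemporaryDirectory() as dbdir:
        main = os.path.join(dbdir, "16384")       # hypothetical relfilenode
        init = main + INIT_SUFFIX
        with open(main, "wb") as f:               # main fork made it to disk
            f.write(b"stale")
        if init_fork_survived:
            with open(init, "wb") as f:
                f.write(b"\0" * 8192)

        reset_before_replay(dbdir)                # sees no init fork if lost
        with open(init, "wb") as f:               # WAL replay recreates it
            f.write(b"\0" * 8192)
        try:
            reset_after_replay(dbdir)
        except FileExistsError:
            return "copy failed: main fork already exists"
        return "ok"


if __name__ == "__main__":
    print(simulate(init_fork_survived=True))
    print(simulate(init_fork_survived=False))
```

With the init fork present at the start, the stale main fork is removed
first and the copy succeeds. With the init fork lost, nothing is
removed before replay, and the copy after replay trips over the
leftover main fork.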
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company