Re: [HACKERS] Unlogged tables cleanup - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Unlogged tables cleanup
Msg-id CA+TgmoarmtbAPFj=tCT4Tm4LqQnwHxJRFCSG2Bm=m6cE-nc=fQ@mail.gmail.com
In response to Re: [HACKERS] Unlogged tables cleanup  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Unlogged tables cleanup  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: [HACKERS] Unlogged tables cleanup  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, May 13, 2019 at 12:50 PM Andres Freund <andres@anarazel.de> wrote:
> > AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
> > so it would be sufficient to have the init fork be recovered from WAL
> > for that to work.  However, we also do ResetUnloggedRelations *before*
> > replaying WAL in order to remove leftover not-init-fork files, and that
> > process requires that the init fork is present at that time.
>
> What scenario are you precisely wondering about? That
> ResetUnloggedRelations() could overwrite the main fork, while not yet
> having valid contents (due to the lack of smgrimmedsync())? Shouldn't
> that only be possible while still in an inconsistent state? A checkpoint
> would have serialized the correct contents, and we'd not reach HS
> consistency before having replayed that WAL records resetting the table
> and the init fork consistency?

I think I see what Alvaro is talking about, or at least I think I see
*a* possible problem based on his remarks.

Suppose we create an unlogged table and then crash. The main fork
makes it to disk, and the init fork does not.  Before WAL replay, we
remove any main forks that have init forks, but because the init fork
was lost, that does not happen for this table, so the stale main fork
survives.  Recovery then recreates the init fork.
After WAL replay, we try to copy_file() each _init fork to the
corresponding main fork. That fails, because copy_file() expects to be
able to create the target file, and here it can't do that because it
already exists.
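
To make that sequence concrete, here is a rough simulation of it. It is
a toy model in Python, not PostgreSQL code: the helper names, the file
name "16384" and the file contents are all made up for illustration,
and copy_file()'s refusal to overwrite is mimicked with
O_CREAT | O_EXCL.

```python
import os
import tempfile

INIT_SUFFIX = "_init"  # init forks are modeled as "<relfilenode>_init"


def copy_file_strict(src, dst):
    """Hypothetical stand-in for copy_file(): refuses to overwrite dst."""
    with open(src, "rb") as f:
        data = f.read()
    # O_EXCL makes the open fail if the target already exists.
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(data)


def remove_mains_with_init(datadir):
    """Pre-replay pass: delete every main fork that has an init fork."""
    removed = []
    for name in sorted(os.listdir(datadir)):
        if name.endswith(INIT_SUFFIX):
            main = os.path.join(datadir, name[: -len(INIT_SUFFIX)])
            if os.path.exists(main):
                os.remove(main)
                removed.append(main)
    return removed


def copy_inits_to_mains(datadir):
    """Post-replay pass: copy each init fork over its main fork."""
    for name in sorted(os.listdir(datadir)):
        if name.endswith(INIT_SUFFIX):
            src = os.path.join(datadir, name)
            dst = os.path.join(datadir, name[: -len(INIT_SUFFIX)])
            copy_file_strict(src, dst)


def simulate_crash_recovery():
    with tempfile.TemporaryDirectory() as datadir:
        main = os.path.join(datadir, "16384")
        init = main + INIT_SUFFIX

        # Crash: the main fork reached disk, the init fork did not.
        with open(main, "wb") as f:
            f.write(b"stale")

        # Before replay there is no init fork, so nothing is removed.
        removed = remove_mains_with_init(datadir)

        # WAL replay recreates the init fork.
        with open(init, "wb") as f:
            f.write(b"")

        # After replay, the copy trips over the leftover main fork.
        try:
            copy_inits_to_mains(datadir)
        except FileExistsError:
            return removed, "copy failed: main fork already exists"
        return removed, "copy succeeded"
```

Calling simulate_crash_recovery() in this model removes nothing in the
first pass and hits the "already exists" failure in the second.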

If that's the scenario, I'm not sure the smgrimmedsync() call is
sufficient.  Suppose we log_smgrcreate() but then crash before
smgrimmedsync(): the WAL record exists, but the init fork itself may
never have reached disk.  It seems like we'd need to do them in the
other order, or else maybe just pass a flag to copy_file() telling it
not to be so picky.
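
As a rough illustration of the "flag" option, here is a toy Python
model (again, not PostgreSQL code). The overwrite parameter is a
hypothetical name for such a flag, and the strict behavior is mimicked
with O_CREAT | O_EXCL.

```python
import os
import tempfile


def copy_file(src, dst, overwrite=False):
    """Toy copy: strict by default, tolerant when overwrite=True.

    'overwrite' is a hypothetical flag used only in this sketch.
    """
    flags = os.O_WRONLY | os.O_CREAT
    # Strict mode fails on an existing target; tolerant mode truncates it.
    flags |= os.O_TRUNC if overwrite else os.O_EXCL
    with open(src, "rb") as f:
        data = f.read()
    fd = os.open(dst, flags, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(data)


def demo():
    with tempfile.TemporaryDirectory() as datadir:
        init = os.path.join(datadir, "16384_init")
        main = os.path.join(datadir, "16384")
        with open(init, "wb") as f:
            f.write(b"init-contents")
        with open(main, "wb") as f:
            f.write(b"stale")  # leftover main fork from before the crash

        # The strict copy fails because the main fork already exists.
        try:
            copy_file(init, main)
            strict_ok = True
        except FileExistsError:
            strict_ok = False

        # The tolerant copy replaces the stale main fork.
        copy_file(init, main, overwrite=True)
        with open(main, "rb") as f:
            result = f.read()
    return strict_ok, result
```

The flag only papers over the leftover file at copy time; reordering
the sync and the WAL logging would instead aim to keep the init fork
from being lost in the first place.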

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


