Re: BUG #17731: Server doesn't start after abnormal shutdown while creating unlogged tables - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17731: Server doesn't start after abnormal shutdown while creating unlogged tables
Date
Msg-id ZE9rSxi0BCHfUH0x@paquier.xyz
In response to Re: BUG #17731: Server doesn't start after abnormal shutdown while creating unlogged tables  (Karina Litskevich <litskevichkarina@gmail.com>)
Responses Re: BUG #17731: Server doesn't start after abnormal shutdown while creating unlogged tables  (Karina Litskevich <litskevichkarina@gmail.com>)
List pgsql-bugs
On Mon, Apr 24, 2023 at 03:59:38PM +0300, Karina Litskevich wrote:
> For unlogged tables and indexes init forks are created to simulate truncate on
> server startup. In StartupXLOG() every main fork, for which corresponding init
> fork exists, is deleted before replaying WAL, and then new main fork is created
> by copying init fork:
>
> ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
> ...
> PerformWalRecovery();
> ...
> ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
>
> So in case before WAL recovery main fork exists and init fork isn't, and during
> recovery init fork is created, we get this problem. The second
> ResetUnloggedRelations() call sees just created init fork and tries to create a
> main fork from it expecting that the old main fork was already deleted by the
> first ResetUnloggedRelations() call, but it wasn't because the main fork hasn't
> corresponding init fork at that moment yet.
>
> If you try to start server again, it will start successfully, as this time both
> init and main forks will present from the beginning.

So, from what I read, what you basically mean is a sequence like this:
1) create unlogged table.
2) drop it.
3) Stop the server in immediate mode before the next checkpoint has
had time to finish cleaning up the main fork still lying around.  At
this point the server has the truncated main fork, but not the init
fork, as it has already been removed.
4) Restart server, recovery begins.
5) ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP) happens, sees
only what looks like a main fork, thinks there is nothing to do
because there is no init fork.
6) Begin WAL redo.
7) Replay the record that created the init fork.
8) Finish recovery.
9) ResetUnloggedRelations(UNLOGGED_RELATION_INIT) sees both the init
fork and the main fork.  We would do a copy_file() from the init fork
to the main fork, which fails on EEXIST.
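
To make the sequence above concrete, here is a rough toy model of it in
Python.  It is only a sketch, not the code in reinit.c: the function names
reset_unlogged_cleanup(), reset_unlogged_init() and simulate_startup(), the
relfilenode 16384 and the replay_drop switch are all made up for
illustration.  The only things taken from the discussion are the ordering of
the two reset passes around redo and the fact that the copy refuses to
overwrite an existing main fork.

```python
import errno
import os
import tempfile

INIT_SUFFIX = "_init"  # init forks are stored as <relfilenode>_init


def reset_unlogged_cleanup(dbdir):
    """Toy CLEANUP pass: remove each main fork that has a matching init fork."""
    names = set(os.listdir(dbdir))
    for name in names:
        if name.endswith(INIT_SUFFIX):
            main = name[: -len(INIT_SUFFIX)]
            if main in names:
                os.unlink(os.path.join(dbdir, main))


def reset_unlogged_init(dbdir):
    """Toy INIT pass: copy each init fork to a freshly created main fork."""
    for name in sorted(os.listdir(dbdir)):
        if name.endswith(INIT_SUFFIX):
            src = os.path.join(dbdir, name)
            dst = os.path.join(dbdir, name[: -len(INIT_SUFFIX)])
            # O_EXCL makes the create fail with EEXIST if a main fork
            # is still present at this point.
            fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
            with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
                out.write(inp.read())


def simulate_startup(replay_drop):
    """Run steps 4 to 9 against a temporary directory.

    replay_drop is a hypothetical switch: when True, redo also replays
    the removal of both forks before recovery finishes.
    """
    with tempfile.TemporaryDirectory() as dbdir:
        main = os.path.join(dbdir, "16384")  # hypothetical relfilenode
        init = main + INIT_SUFFIX

        # State left by the immediate shutdown: main fork only.
        open(main, "wb").close()

        reset_unlogged_cleanup(dbdir)  # step 5: no init fork, nothing to do
        open(init, "wb").close()       # step 7: redo recreates the init fork
        if replay_drop:
            os.unlink(init)
            os.unlink(main)

        try:
            reset_unlogged_init(dbdir)  # step 9
        except FileExistsError as exc:
            return errno.errorcode[exc.errno]
        return "ok"
```

Calling simulate_startup(replay_drop=False) walks through steps 4 to 9 as
listed and ends in the EEXIST failure.  With replay_drop=True the model
assumes the removal of the forks is also replayed during redo, in which case
the second pass has nothing left to copy.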

Between points 7 and 8, there is something I am not really following,
though.  The deletion of all the forks of an unlogged table should be
replayed as well until we reach consistency, no?  At redo, the cleanup
of the forks is done when the COMMIT record of the transaction that
dropped the table is replayed, rather than delayed at checkpoint as a
sync request.  Hence, the init fork previously created should not
exist to begin with, no?  Am I missing something?
--
Michael
