Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id CAFiTN-uCLPQf_3JHZwuMdUbMR0+tC8W5fG5-kQUKJR2b=-w-rQ@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
List pgsql-hackers
On Thu, Aug 4, 2022 at 12:18 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Aug 03, 2022 at 11:26:43AM -0700, Andres Freund wrote:
> > Hm. This looks more like an issue of DROP DATABASE not being interruptible. I
> > suspect this isn't actually related to STRATEGY wal_log and could likely be
> > reproduced in older versions too.
>
> I couldn't reproduce it with file_copy, but my recipe isn't exactly reliable.
> That may just mean that it's easier to hit now.

I think this looks like a problem with drop db but IMHO you are seeing
this behavior only when a database is created using WAL LOG because in
this strategy we are using buffers to write the destination database
pages and some of the dirty buffers and sync requests might still be
pending.  And now when we try to drop the database it drops all the
dirty buffers and all pending sync requests and then before it
actually removes the directory it gets interrupted and now you see the
database directory on disk which is partially corrupted.  See below
sequence of drop database


dropdb()
{
...
DropDatabaseBuffers(db_id);
...
ForgetDatabaseSyncRequests(db_id);
...
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
 -- Inside this it can process the cancel query and get interrupted
remove_dbtablespaces(db_id);
..
}

I reproduced the same error by inducing error just before
WaitForProcSignalBarrier.

postgres[14968]=# CREATE DATABASE a STRATEGY WAL_LOG ; drop database a;
CREATE DATABASE
ERROR:  XX000: test error
LOCATION:  dropdb, dbcommands.c:1684
postgres[14968]=# \c a
connection to server on socket "/tmp/.s.PGSQL.5432" failed: PANIC:
could not open critical system index 2662
Previous connection kept
postgres[14968]=#


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Cleaning up historical portability baggage
Next
From: Amul Sul
Date:
Subject: Re: Refactoring postgres_fdw/connection.c