Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id CAFiTN-v9M6myyLqZt0u5+52dqdpGkGbdjNcM3WMB5KOWcGD2tA@mail.gmail.com
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
List pgsql-hackers
On Thu, Aug 4, 2022 at 9:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Aug 4, 2022 at 12:18 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Wed, Aug 03, 2022 at 11:26:43AM -0700, Andres Freund wrote:
> > > Hm. This looks more like an issue of DROP DATABASE not being interruptible. I
> > > suspect this isn't actually related to STRATEGY wal_log and could likely be
> > > reproduced in older versions too.
> >
> > I couldn't reproduce it with file_copy, but my recipe isn't exactly reliable.
> > That may just mean that it's easier to hit now.
>
> I think this looks like a problem with drop db but IMHO you are seeing
> this behavior only when a database is created using WAL LOG because in
> this strategy we are using buffers to write the destination database
> pages and some of the dirty buffers and sync requests might still be
> pending.  And now when we try to drop the database it drops all the
> dirty buffers and all pending sync requests and then before it
> actually removes the directory it gets interrupted and now you see the
> database directory on disk which is partially corrupted.  See below
> sequence of drop database
>
>
> dropdb()
> {
> ...
> DropDatabaseBuffers(db_id);
> ...
> ForgetDatabaseSyncRequests(db_id);
> ...
> RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
>
> WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
>  -- Inside this it can process the cancel query and get interrupted
> remove_dbtablespaces(db_id);
> ..
> }
>
> I reproduced the same error by inducing error just before
> WaitForProcSignalBarrier.
>
> postgres[14968]=# CREATE DATABASE a STRATEGY WAL_LOG ; drop database a;
> CREATE DATABASE
> ERROR:  XX000: test error
> LOCATION:  dropdb, dbcommands.c:1684
> postgres[14968]=# \c a
> connection to server on socket "/tmp/.s.PGSQL.5432" failed: PANIC:
> could not open critical system index 2662
> Previous connection kept
> postgres[14968]=#

So basically, from this we can say that this is entirely a problem with
DROP DATABASE: I can produce all sorts of broken states just by
interrupting it at different points.
1. If we created some tables or inserted data and the DROP DATABASE then
got cancelled, the database directory may still be there but those
objects are lost.
2. Or the database directory can even be removed and the command then
cancelled before the pg_database entry is deleted; in that case too you
end up with a corrupted database, and it doesn't matter whether you
created it with WAL_LOG or FILE_COPY.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com