Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id CA+TgmoYOR+nkaVNreJyGxUqDh7NnVp6DYM3G=w3zcQOvuvNahA@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
List pgsql-hackers
On Mon, Mar 21, 2022 at 11:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I tried to debug the case but I realized that somehow
> CHECK_FOR_INTERRUPTS() is not calling the
> AcceptInvalidationMessages() and I could not find the same while
> looking into the code as well.   While debugging I noticed that
> AcceptInvalidationMessages() is called multiple times but that is only
> through LockRelationId() but while locking the relation we had already
> closed the previous smgr because at a time we keep only one smgr open.
> And that's the reason it is not hitting the issue which we think it
> could. Is there any condition under which it will call
> AcceptInvalidationMessages() through CHECK_FOR_INTERRUPTS() ? because
> I could not see while debugging as well as in code.

Yeah, I think the reason you can't find it is that it's not there. I
was confused in what I wrote earlier. I think we only process sinval
catchups when we're idle, not at every CHECK_FOR_INTERRUPTS(). And I
think the reason for that is precisely that it would be hard to write
correct code otherwise, since invalidations might then get processed
in a lot more places. So ... I guess all we really need to do here is
avoid assuming that the results of smgropen() are valid across any
code that might acquire relation locks. Which I think the code already
does.

But on a related note, why doesn't CreateDatabaseUsingWalLog() acquire
locks on both the source and destination relations? It looks like
you're only taking locks for the source template database ... but I
thought the intention here was to make sure that we didn't pull pages
into shared_buffers without holding a lock on the relation and/or the
database? I suppose the point is that while the template database
might be concurrently dropped, nobody can be doing anything
concurrently to the target database because nobody knows that it
exists yet. Still, I think that this would be the only case where we
let pages into shared_buffers without a relation or database lock,
though maybe I'm confused about this point, too. If not, perhaps we
should consider locking the target database OID and each relation OID
as we are copying it?

I guess I'm imagining that there might be more code pathways in the
future that want to ensure that there are no remaining buffers for
some particular database or relation OID. It seems natural to want to
be able to take some lock that prevents buffers from being added, and
then go and get rid of all the ones that are there already. But I
admit I can't quite think of a concrete case where we'd want to do
something like this where the patch as coded would be a problem. I'm
just thinking perhaps taking locks is fairly harmless and might avoid
some hypothetical problem later.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: refactoring basebackup.c (zstd workers)
Next
From: Robert Haas
Date:
Subject: Re: refactoring basebackup.c (zstd workers)