Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id CA+TgmoZ=KWLxO4VE7e4bLsUEXyJJy7EF0QT8dvvXLTRh228VBQ@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Feb 15, 2022 at 6:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Here is the updated version of the patch, the changes are 1) Fixed
> review comments given by Robert and one open comment from Ashutosh.
> 2) Preserved the old create db method. 3) As agreed upthread for now
> we are using the new strategy only for createdb not for movedb so I
> have removed the changes in ForgetDatabaseSyncRequests() and
> DropDatabaseBuffers().  3) Provided a database creation strategy
> option as of now I have kept it as below.
>
> CREATE DATABASE ... WITH (STRATEGY = WAL_LOG);      -- default if
> option is omitted
> CREATE DATABASE ... WITH (STRATEGY = FILE_COPY);

All right. I think there have been two design-level objections to this
patch, and this resolves one of them. The other one is trickier,
because AFAICT it's basically an opinion question: is accessing
pg_class in the template database from some backend that is connected
to another database too ugly to be acceptable? Several people have
expressed concerns about that, but it's not clear to me whether they
are essentially saying "that is not what I would do if I were doing
this project" or more like "if you commit something that does it that
way I will be enraged and demand an immediate revert and the removal
of your commit bit." If it's the former, I think it's possible to
clean up various details of these patches to make them look nicer than
they do at present and get something committed for PostgreSQL 15. But
if it is the latter then there's really no point to that kind of
cleanup work and we should probably just give up now. So, Andres,
Heikki, and anybody else with a strong opinion, can you clarify how
vigorously you hate this design, or don't?

My own opinion is that this is actually rather elegant. It just makes
sense to me that the right way to figure out what relations are in a
database is to get that list from the database rather than from the
filesystem. Nobody would accept the idea of making \d work by listing
out the directory contents rather than by walking pg_class, and so the
only reason we ought to accept that in the case of cloning a database
is if the code is too ugly any other way. But is it really? It's true
that this patch set does some refactoring of interfaces in order to
make that work, and there's a few things about how it does that that I
think could be improved, but on the whole, it's seems like a
remarkably small amount of code to do something that we've long
considered absolutely taboo. Now, it's nowhere close to being
something that could be used to allow fully general cross-database
access, and there are severe problems with the idea of allowing any
such thing. In particular, there are various places that test for
connections to a database, and aren't going to be expected processes
not connected to the database to be touching it. My belief is that a
heavyweight lock on the database is a suitable surrogate, because we
actually take such a lock when connecting to a database, and that
forms part of the regular interlock. Taking such locks routinely for
short periods would be expensive and might create other problems, but
doing it for a maintenance operation seems OK. Also, if we wanted to
actually support full cross-database access, locking wouldn't be the
only problem by far. We'd have to deal with things like the relcache
and the catcache, which would be hard, and might increase the cost of
very common things that we need to be cheap. But none of that is
implicated in this patch, which only generalizes code paths that are
not so commonly taken as to pose a problem, and manages to reuse quite
a bit of code rather than introducing entirely new code to do the same
things.
.
It does introduce some new code here and there, though: there isn't
zero duplication. The biggest chunk of that FWICS is in 0006, in
GetDatabaseRelationList and GetRelListFromPage. I just can't get
excited about that. It's literally about two screens worth of code.
We're not talking about duplicating src/backend/access/heapam or
something like that. I do think it would be a good idea to split it up
just a bit more: I think the code inside GetRelListFromPage that is
guarded by HeapTupleSatisfiesVisibility() could be moved into a
separate subroutine, and I think that would likely look a big nicer.
But fundamentally I just don't see a huge issue here. That is not to
say there isn't a huge issue here: just that I don't see it.

Comments?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: samay sharma
Date:
Subject: Proposal: Support custom authentication methods using hooks
Next
From: Tom Lane
Date:
Subject: Re: buildfarm warnings