Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date | |
Msg-id | CA+TgmoZ=KWLxO4VE7e4bLsUEXyJJy7EF0QT8dvvXLTRh228VBQ@mail.gmail.com Whole thread Raw |
In response to | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
|
List | pgsql-hackers |
On Tue, Feb 15, 2022 at 6:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > Here is the updated version of the patch, the changes are 1) Fixed > review comments given by Robert and one open comment from Ashutosh. > 2) Preserved the old create db method. 3) As agreed upthread for now > we are using the new strategy only for createdb not for movedb so I > have removed the changes in ForgetDatabaseSyncRequests() and > DropDatabaseBuffers(). 3) Provided a database creation strategy > option as of now I have kept it as below. > > CREATE DATABASE ... WITH (STRATEGY = WAL_LOG); -- default if > option is omitted > CREATE DATABASE ... WITH (STRATEGY = FILE_COPY); All right. I think there have been two design-level objections to this patch, and this resolves one of them. The other one is trickier, because AFAICT it's basically an opinion question: is accessing pg_class in the template database from some backend that is connected to another database too ugly to be acceptable? Several people have expressed concerns about that, but it's not clear to me whether they are essentially saying "that is not what I would do if I were doing this project" or more like "if you commit something that does it that way I will be enraged and demand an immediate revert and the removal of your commit bit." If it's the former, I think it's possible to clean up various details of these patches to make them look nicer than they do at present and get something committed for PostgreSQL 15. But if it is the latter then there's really no point to that kind of cleanup work and we should probably just give up now. So, Andres, Heikki, and anybody else with a strong opinion, can you clarify how vigorously you hate this design, or don't? My own opinion is that this is actually rather elegant. It just makes sense to me that the right way to figure out what relations are in a database is to get that list from the database rather than from the filesystem. Nobody would accept the idea of making \d work by listing out the directory contents rather than by walking pg_class, and so the only reason we ought to accept that in the case of cloning a database is if the code is too ugly any other way. But is it really? It's true that this patch set does some refactoring of interfaces in order to make that work, and there's a few things about how it does that that I think could be improved, but on the whole, it's seems like a remarkably small amount of code to do something that we've long considered absolutely taboo. Now, it's nowhere close to being something that could be used to allow fully general cross-database access, and there are severe problems with the idea of allowing any such thing. In particular, there are various places that test for connections to a database, and aren't going to be expected processes not connected to the database to be touching it. My belief is that a heavyweight lock on the database is a suitable surrogate, because we actually take such a lock when connecting to a database, and that forms part of the regular interlock. Taking such locks routinely for short periods would be expensive and might create other problems, but doing it for a maintenance operation seems OK. Also, if we wanted to actually support full cross-database access, locking wouldn't be the only problem by far. We'd have to deal with things like the relcache and the catcache, which would be hard, and might increase the cost of very common things that we need to be cheap. But none of that is implicated in this patch, which only generalizes code paths that are not so commonly taken as to pose a problem, and manages to reuse quite a bit of code rather than introducing entirely new code to do the same things. . It does introduce some new code here and there, though: there isn't zero duplication. The biggest chunk of that FWICS is in 0006, in GetDatabaseRelationList and GetRelListFromPage. I just can't get excited about that. It's literally about two screens worth of code. We're not talking about duplicating src/backend/access/heapam or something like that. I do think it would be a good idea to split it up just a bit more: I think the code inside GetRelListFromPage that is guarded by HeapTupleSatisfiesVisibility() could be moved into a separate subroutine, and I think that would likely look a big nicer. But fundamentally I just don't see a huge issue here. That is not to say there isn't a huge issue here: just that I don't see it. Comments? -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: