Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date | |
Msg-id | CA+TgmoZVBcEFZ+8DKOm5qf0X_SrXqKfTL5JK2O2=BA=KFkwdvw@mail.gmail.com Whole thread Raw |
In response to | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
|
List | pgsql-hackers |
On Thu, Mar 3, 2022 at 11:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > The new version of the patch fixes these 2 comments pointed by Ashutosh and also splits the GetRelListFromPage() functionas suggested by Robert and uses the latest snapshot for scanning the pg_class instead of active snapshot as pointedout by Robert. I haven't yet added the test case to create a database using this new strategy option. So if we areokay with these two options FILE_COPY and WAL_LOG then I will add test cases for the same. Reviewing 0001, the boundaries of the critical section move slightly, but only over a memcpy, which can't fail, so that seems fine. But this comment looks ominous: * Note: we're cheating a little bit here by assuming that mapped files * are either in pg_global or the database's default tablespace. It's not clear to me how the code that follows relies on this assumption, but the overall patch set would make that not true any more, so there's some kind of an issue to think about there. It's a little asymmetric that load_relmap_file() gets a subroutine read_relmap_file() while write_relmap_file() gets a subroutine write_relmap_file_internal(). Perhaps we could call the functions {load,write}_named_relmap_file() or something of that sort. Reviewing 0002, your comment updates in relmap_redo() are not complete. Note that there's an unmodified comment that says "Write out the new map and send sinval" just above where you modify the code to only conditionally send sinval. I'm somewhat uncomfortable with the shape of this logic, too. It looks weird to be sometimes calling write_relmap_file and sometimes write_relmap_file_internal. You'd expect functions with those names to be called at different abstraction levels, rather than at parallel call sites. The renaming I proposed would help with this but it's not just a cosmetic issue: the logic to construct mapfilename is in this function in one case, but in the called function in the other case. I can't help but think that the write_relmap_file()/write_relmap_file_internal() split isn't entirely the right thing. I think part of the confusion here is that, pre-patch, write_relmap_file() gets called during both recovery and normal running, and the updates to shared_map or local_map are actually nonsense during recovery, because the local map at least is local to whatever our database is, and we don't have a database connection if we're the startup process. After your patch, we're still going through write_relmap_file in recovery in some cases, but really those map updates don't seem like things that should be happening at all. And on the other hand it's not clear to me why the CRC stuff isn't needed in all cases, but that's only going to happen when we go through the non-internal version of the function. You've probably spent more time looking at this code than I have, but I'm wondering if the division should be like this: we have one function that does the actual update, and another function that does the update plus sets global variables. Recovery always uses the first one, and outside of recovery we use the first one for the create-database case and the second one otherwise. Thoughts? Regarding 0003, my initial thought was to like the fact that you'd avoided duplicating code by using a function parameter, but as I look at it a bit more, it's not clear to me that it's enough code that we really care about not duplicating it. I would not expect to find a function called RelationCopyAllFork() in tablecmds.c. I'd expect to find it in storage.c, I think. And I think I'd be surprised to find out that it doesn't actually know anything about copying; it's basically just a loop over the forks to which you can supply your own copy-function. And the fact that it's got an argument of type copy_relation_storage and the argument name is copy_storage and the value is sometimes RelationCopyStorageis a terminological muddle, too. So I feel like perhaps this needs more thought. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: