Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date | |
Msg-id | CA+TgmobDOdteCQ-SUwntRykbuP59uqzFc6pTir10vd_mu07cTQ@mail.gmail.com |
In response to | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Andres Freund <andres@anarazel.de>) |
Responses | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
List | pgsql-hackers |
On Fri, Sep 3, 2021 at 5:54 PM Andres Freund <andres@anarazel.de> wrote:
> > I think we already have such a code in multiple places where we bypass the
> > shared buffers for copying the relation
> > e.g. index_copy_data(), heapam_relation_copy_data().
>
> That's not at all comparable. We hold an exclusive lock on the relation at
> that point, and we don't have a separate implementation of reading tuples from
> the table or something like that.

I don't think there's a way to do this that is perfectly clean, so the discussion here is really about finding the least unpleasant alternative. I *really* like the idea of using pg_class to figure out what relations to copy. As far as I'm concerned, pg_class is the canonical list of what's in the database, and to the extent that the filesystem happens to agree, that's good luck. From that perspective, using the filesystem to figure out what to copy is by definition a hack.

Now, having to use dedicated tuple-reading code is also a hack, but to me that's largely an accident of questionable design decisions elsewhere. You can't read a buffer with just the minimal amount of information that you need to read a buffer; you have to have a relcache entry, so we have things like ReadBufferWithoutRelcache and CreateFakeRelcacheEntry. It's a little crazy to me that someone saw that ReadBuffer() needed a thing which some callers might not have, and instead of saying "hmm, maybe we ought to change the arguments so that anyone with enough information to call this function can do so," they said "hmm, let's create a fake object that is not really the same as a real one but good enough to fool the function into doing the right thing, probably." I think the code layering here is just flat-out broken and ought to be fixed. A layer whose job it is to read and write blocks should not know that relations are even a thing.
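To make the layering argument concrete, here is a minimal C sketch, under stated assumptions, of a block layer that takes only a block's physical identity, with a thin wrapper for callers that happen to hold a relation object. All type and function names (BlockAddress, RelationLike, read_block, read_block_for_relation) are hypothetical and are not PostgreSQL's actual buffer manager API; the "read" just formats the block's address into a string instead of doing I/O.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: the minimal physical identity of a block. Nothing here
 * refers to a relation object; it only says where the bytes live. */
typedef struct BlockAddress {
    uint32_t spc_oid;    /* tablespace */
    uint32_t db_oid;     /* database */
    uint32_t rel_number; /* on-disk relation file number */
    int      fork;       /* fork number (main, FSM, VM, ...) */
    uint32_t block;      /* block number within the fork */
} BlockAddress;

/* Hypothetical stand-in for a relcache entry: a higher-level object that
 * contains the physical identity among other things. */
typedef struct RelationLike {
    const char *name;
    uint32_t    spc_oid;
    uint32_t    db_oid;
    uint32_t    rel_number;
    /* ...a real relcache entry carries far more state... */
} RelationLike;

/* Block layer: needs only the physical address. Instead of doing I/O, this
 * sketch writes a textual key for the block into `out` and returns the
 * snprintf() result (the length of the full key). */
int read_block(const BlockAddress *addr, char *out, size_t outlen)
{
    assert(addr != NULL && out != NULL);
    return snprintf(out, outlen,
                    "%" PRIu32 "/%" PRIu32 "/%" PRIu32 ".%d#%" PRIu32,
                    addr->spc_oid, addr->db_oid, addr->rel_number,
                    addr->fork, addr->block);
}

/* Convenience wrapper for callers that do have a relation: it extracts the
 * physical identity and delegates, so the block layer never sees the
 * relation object at all. */
int read_block_for_relation(const RelationLike *rel, int fork, uint32_t block,
                            char *out, size_t outlen)
{
    assert(rel != NULL);
    BlockAddress addr = { rel->spc_oid, rel->db_oid, rel->rel_number,
                          fork, block };
    return read_block(&addr, out, outlen);
}
```

In this arrangement a caller that has only the physical identity, such as code copying a database's relations during CREATE DATABASE, can call read_block() directly, with no need to build a fake relation object just to satisfy the function's signature.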
(The widespread use of global variables in the relcache code, the catcache code, and many other places in lieu of explicit parameter-passing just makes everything a lot worse.)

So I think if we commit to the hackiness of the sort that this patch introduces, there is some hope of things getting better in the future. I don't think it's a real easy path forward, but maybe it's possible. If on the other hand we commit to using the filesystem, I don't see how it ever gets any better. Unlogged tables are a great example of a feature that depended on the filesystem, and that dependency now seems to me to be, by far, the worst thing about the feature. I have no idea how to get rid of that dependency or all of the associated problems without reverting the feature. But in this case, we seem to have another option, and so I think we should take it.

Your (or other people's) mileage may vary ... this is just my view of it.

--
Robert Haas
EDB: http://www.enterprisedb.com