Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Msg-id CA+TgmobDOdteCQ-SUwntRykbuP59uqzFc6pTir10vd_mu07cTQ@mail.gmail.com
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Andres Freund <andres@anarazel.de>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
List pgsql-hackers
On Fri, Sep 3, 2021 at 5:54 PM Andres Freund <andres@anarazel.de> wrote:
> > I think we already have such a code in multiple places where we bypass the
> > shared buffers for copying the relation
> > e.g. index_copy_data(), heapam_relation_copy_data().
>
> That's not at all comparable. We hold an exclusive lock on the relation at
> that point, and we don't have a separate implementation of reading tuples from
> the table or something like that.

I don't think there's a way to do this that is perfectly clean, so the
discussion here is really about finding the least unpleasant
alternative. I *really* like the idea of using pg_class to figure out
what relations to copy. As far as I'm concerned, pg_class is the
canonical list of what's in the database, and to the extent that the
filesystem happens to agree, that's good luck. From that perspective,
using the filesystem to figure out what to copy is by definition a
hack.

Now, having to use dedicated tuple-reading code is also a hack, but to
me that's largely an accident of questionable design decisions
elsewhere. You can't read a buffer with just the minimal amount of
information that you need to read a buffer; you have to have a
relcache entry, so we have things like ReadBufferWithoutRelcache and
CreateFakeRelcacheEntry. It's a little crazy to me that someone saw
that ReadBuffer() needed a thing which some callers might not have and
instead of saying "hmm, maybe we ought to change the arguments so that
anyone with enough information to call this function can do so," they
said "hmm, let's create a fake object that is not really the same as a
real one but good enough to fool the function into doing the right
thing, probably." I think the code layering here is just flat-out
broken and ought to be fixed. A layer whose job it is to read and
write blocks should not know that relations are even a thing. (The
widespread use of global variables in the relcache code, the catcache
code, and many other places in lieu of explicit parameter-passing just
makes everything a lot worse.)
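To make the layering point concrete, here is a minimal C sketch, under stated assumptions, of what a relation-free block-read interface could look like. BlockLocator, read_block() and write_block() are hypothetical names, not PostgreSQL's actual API. The in-memory array only stands in for the storage manager. The point is that the function's arguments are exactly the physical identity of a block, so a caller never needs a relcache entry, real or fake.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for PostgreSQL types; illustration only. */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef enum { MAIN_FORKNUM = 0, FSM_FORKNUM, VISIBILITYMAP_FORKNUM, INIT_FORKNUM } ForkNumber;

#define BLCKSZ 8192

/* Everything the block layer needs to find a block: physical identity only,
 * with no Relation or relcache entry involved. */
typedef struct BlockLocator
{
    Oid         spcOid;     /* tablespace */
    Oid         dbOid;      /* database */
    Oid         relNumber;  /* relation file number */
    ForkNumber  forkNum;    /* which fork of the relation */
    BlockNumber blockNum;   /* block within that fork */
} BlockLocator;

/* Toy in-memory "storage" standing in for the real storage manager. */
#define MAX_PAGES 4
static BlockLocator stored_keys[MAX_PAGES];
static char stored_pages[MAX_PAGES][BLCKSZ];
static int n_stored = 0;

/* Field-by-field comparison, so struct padding never matters. */
static int
locator_equal(const BlockLocator *a, const BlockLocator *b)
{
    return a->spcOid == b->spcOid && a->dbOid == b->dbOid &&
           a->relNumber == b->relNumber && a->forkNum == b->forkNum &&
           a->blockNum == b->blockNum;
}

/* Hypothetical write, used only to populate the toy storage. */
static void
write_block(const BlockLocator *loc, const char *page)
{
    assert(n_stored < MAX_PAGES);
    stored_keys[n_stored] = *loc;
    memcpy(stored_pages[n_stored], page, BLCKSZ);
    n_stored++;
}

/* Hypothetical read: anyone who knows where the block lives can call this.
 * Copies the page into page_out and returns 1 if found, else returns 0. */
static int
read_block(const BlockLocator *loc, char *page_out)
{
    for (int i = 0; i < n_stored; i++)
    {
        if (locator_equal(&stored_keys[i], loc))
        {
            memcpy(page_out, stored_pages[i], BLCKSZ);
            return 1;
        }
    }
    return 0;
}
```

With an interface shaped like this, code that has only catalog data (for example, a copy loop driven by pg_class) could build a locator and read blocks directly. Whether this is the right shape for PostgreSQL's buffer layer is exactly the open design question.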

So I think if we commit to the kind of hackiness that this patch
introduces, there is some hope of things getting better in the future.
I don't think it's a real easy path forward, but maybe it's possible.
If on the other hand we commit to using the filesystem, I don't see
how it ever gets any better. Unlogged tables are a great example of a
feature that depends on the filesystem, and that dependency now seems
to me to be, by far, the worst thing about the feature. I have no idea how to get
rid of that dependency or all of the associated problems without
reverting the feature. But in this case, we seem to have another
option, and so I think we should take it.

Your (or other people's) mileage may vary ... this is just my view of it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com