On Wed, Jun 16, 2021 at 6:13 PM Andres Freund <andres@anarazel.de> wrote:
> I don't think the main issue is the speed of checkpointing itself? The reason
> to maintain the old paths is that the "new approach" is bloating WAL volume,
> no? Right now cloning a 1TB database costs a few hundred bytes of WAL and about
> 1TB of write IO. With the proposed approach, the write volume approximately
> doubles, because there'll also be about 1TB in WAL.
This is a good point, but on the other hand, I think this smells a lot
like the wal_level=minimal optimization, where we skip WAL-logging
data being bulk-loaded into a table created in the same transaction.
In theory, that optimization has a lot of value,
but in practice it gets a lot of bad press on this list, because (1)
sometimes doing the fsync is more expensive than writing the extra WAL
would have been and (2) most people want to run with
wal_level=replica/logical so it ends up being a code path that isn't
used much and is therefore more likely than average to have bugs
nobody's terribly interested in fixing (except Noah ... thanks Noah!).
If we add features in the future, like TDE or perhaps incremental
backup, that rely on new pages getting new LSNs instead of recycled
ones, this may turn into the same kind of wart. And as with that
optimization, you're probably not even better off unless the database
is pretty big, and you might be worse off if you have to do fsyncs or
flush buffers synchronously. I'm not severely opposed to keeping both
methods around, so if that's really what people want to do, OK, but I
guess I wonder whether we're really going to be happy with that
decision down the road.
--
Robert Haas
EDB: http://www.enterprisedb.com