Re: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Disable WAL logging to speed up data loading
Date	November 2, 2020 15:28:22
Msg-id	CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRWdmRevDnQ@mail.gmail.com
In response to	Re: Disable WAL logging to speed up data loading (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses	Re: Disable WAL logging to speed up data loading RE: Disable WAL logging to speed up data loading
List	pgsql-hackers

On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Yes. What I meant was that such a safeguard needs to be implemented.
>
> This may mean that if we want to recover the database from that backup,
> we need to specify the recovery target so that the archive recovery stops
> just before the WAL record indicating wal_level change.

Yeah, I think we need these kinds of safeguards, for sure.

I'm also concerned about the way that this proposed feature interacts
with incremental backup capabilities that already exist in tools like
pgBackRest, EDB's BART, pg_probackup, and future things we might want
to introduce into core, along the lines of what I have previously
proposed. Now, I think pgBackRest uses only timestamps and checksums,
so it probably doesn't care, but some of the other solutions rely on
WAL-scanning to gather a list of changed blocks. I guess there's no
reason that they can't notice the wal_level being changed and do the
right thing; they should probably have that kind of capability
already. Still, it strikes me that it might be useful if we had a
stronger mechanism.
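
As an illustration of "notice the wal_level being changed and do the
right thing", here is a minimal Python sketch of a WAL-scanning
changed-block collector. It is not taken from any of the tools named
above. The record type, its fields, and the choice to treat both
"minimal" and the proposed "none" as unsafe are assumptions made for the
example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

# Assumption: levels below "replica" can skip WAL for some changes, so a
# changed-block list built from WAL alone cannot be trusted across them.
# "none" is the level proposed in this thread, not a released setting.
UNSAFE_WAL_LEVELS = {"minimal", "none"}

Block = Tuple[str, int]  # (relation name, block number), both hypothetical


@dataclass
class WalRecordSketch:
    """Hypothetical, heavily simplified stand-in for a decoded WAL record."""
    kind: str                        # "block_change" or "parameter_change"
    block: Optional[Block] = None    # set for "block_change" records
    wal_level: Optional[str] = None  # set for "parameter_change" records


def collect_changed_blocks(
    records: Iterable[WalRecordSketch],
) -> Optional[Set[Block]]:
    """Return the set of changed blocks, or None if a full backup is needed."""
    changed: Set[Block] = set()
    for rec in records:
        if rec.kind == "parameter_change" and rec.wal_level in UNSAFE_WAL_LEVELS:
            # WAL after this point may be missing changes: give up on the
            # incremental approach instead of returning a partial list.
            return None
        if rec.kind == "block_change" and rec.block is not None:
            changed.add(rec.block)
    return changed
```

A caller would fall back to a full backup whenever the function returns
None. The catch is that this only works if the tool actually reads every
WAL record between the two backups.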

I'm not exactly sure what that would look like, but suppose we had a
feature where every time wal_level drops below replica, a counter gets
incremented by 1, and that counter is saved in the control file. Or
maybe only when wal_level drops below minimal, to none. Or maybe there are
two counters. Anyway, the idea is that if you have a snapshot of the
cluster at one time and a snapshot at another time, you can see
whether anything scary has happened in the middle without needing all
of the WAL in between.
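
To make the counter idea concrete, here is a minimal Python sketch of
the two-counter variant. Everything in it is hypothetical: the class
stands in for the control file, the counter names are invented, and
"none" is the wal_level value proposed in this thread rather than an
existing setting.

```python
from dataclasses import dataclass, replace

# Ordering of wal_level values; "none" is the proposed, not-yet-existing level.
WAL_LEVELS = {"none": 0, "minimal": 1, "replica": 2, "logical": 3}


@dataclass
class ControlFileSketch:
    """Stand-in for the control file; both counters are hypothetical fields."""
    wal_level: str = "replica"
    below_replica_count: int = 0  # bumped when wal_level drops below replica
    to_none_count: int = 0        # bumped when wal_level drops to none

    def set_wal_level(self, new_level: str) -> None:
        old = WAL_LEVELS[self.wal_level]
        new = WAL_LEVELS[new_level]
        if old >= WAL_LEVELS["replica"] and new < WAL_LEVELS["replica"]:
            self.below_replica_count += 1
        if old > WAL_LEVELS["none"] and new == WAL_LEVELS["none"]:
            self.to_none_count += 1
        self.wal_level = new_level

    def snapshot(self) -> "ControlFileSketch":
        """Copy of the current state, as a backup tool might record it."""
        return replace(self)


def scary_change_between(earlier: ControlFileSketch,
                         later: ControlFileSketch) -> bool:
    """True if wal_level dropped at least once between the two snapshots."""
    return (earlier.below_replica_count != later.below_replica_count
            or earlier.to_none_count != later.to_none_count)
```

If wal_level goes from replica to minimal and back to replica between
two snapshots, both snapshots report wal_level = replica, but the
counters differ. The comparison therefore flags the gap without reading
any of the WAL in between.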

Maybe this is off-topic for this thread or not really needed, but I'm
not sure. I don't think wal_level=none is a bad idea intrinsically,
but I think it would be easy to implement it poorly and end up harming
a lot of users. I have no problem with giving people a way to do
dangerous things, but we should do our best to let people know how
much danger they've incurred.

By the way, another problem here is that some AMs - e.g. GiST, IIRC -
use LSNs to figure out whether a block has changed. For temporary and
unlogged tables, we use "fake" LSNs that are generated using a
counter, but that approach only works because such relations are never
really WAL-logged. Mixing fake LSNs and real LSNs will break stuff,
and not bumping the LSN when the page changes probably will, too.
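
The two failure modes can be shown with a toy model in Python. This is
not PostgreSQL's GiST code. The page class, the "changed if the LSN went
up" check, and the starting values of the two LSN sources are
simplifying assumptions chosen only to show the problem.

```python
from typing import List, Optional


class LsnSource:
    """Monotonic counter standing in for either real or fake LSNs."""

    def __init__(self, start: int) -> None:
        self._last = start

    def next(self) -> int:
        self._last += 1
        return self._last


class Page:
    def __init__(self) -> None:
        self.lsn = 0
        self.items: List[str] = []


def modify(page: Page, item: str, lsn_source: Optional[LsnSource]) -> None:
    """Change the page; stamp a new LSN only if a source is supplied."""
    page.items.append(item)
    if lsn_source is not None:
        page.lsn = lsn_source.next()


def changed_since(page: Page, remembered_lsn: int) -> bool:
    """Simplified AM-style check: the page changed if its LSN advanced."""
    return page.lsn > remembered_lsn


real = LsnSource(start=1_000_000)  # assumed: real WAL positions are large
fake = LsnSource(start=0)          # assumed: the fake counter starts small

page = Page()
modify(page, "a", real)

# Consistent use of one source: the change is detected.
seen = page.lsn
modify(page, "b", real)
detected_real = changed_since(page, seen)

# Mixing sources: the page moves from a real LSN to a smaller fake one,
# so the check concludes nothing changed.
seen = page.lsn
modify(page, "c", fake)
detected_mixed = changed_since(page, seen)

# Not bumping the LSN at all: the change is invisible to the check.
seen = page.lsn
modify(page, "d", None)
detected_unbumped = changed_since(page, seen)
```

In this model only the first change is detected. The mixed-source change
and the unstamped change both go unnoticed, even though the page
contents differ each time.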

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
