Re: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Disable WAL logging to speed up data loading |
Date | |
Msg-id | CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRWdmRevDnQ@mail.gmail.com |
In response to | Re: Disable WAL logging to speed up data loading (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Disable WAL logging to speed up data loading
RE: Disable WAL logging to speed up data loading |
List | pgsql-hackers |
On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Yes. What I meant was such a safe guard needs to be implemented.
>
> This may mean that if we want to recover the database from that backup,
> we need to specify the recovery target so that the archive recovery stops
> just before the WAL record indicating wal_level change.

Yeah, I think we need these kinds of safeguards, for sure.

I'm also concerned about the way that this proposed feature interacts with incremental backup capabilities that already exist in tools like pgBackRest, EDB's BART, pg_probackup, and future things we might want to introduce into core, along the lines of what I have previously proposed. Now, I think pgBackRest uses only timestamps and checksums, so it probably doesn't care, but some of the other solutions rely on WAL-scanning to gather a list of changed blocks. I guess there's no reason that they can't notice the wal_level being changed and do the right thing; they should probably have that kind of capability already.

Still, it strikes me that it might be useful if we had a stronger mechanism. I'm not exactly sure what that would look like, but suppose we had a feature where every time wal_level drops below replica, a counter gets incremented by 1, and that counter is saved in the control file. Or maybe when wal_level drops below minimal to none. Or maybe there are two counters. Anyway, the idea is that if you have a snapshot of the cluster at one time and a snapshot at another time, you can see whether anything scary has happened in the middle without needing all of the WAL in between. Maybe this is off-topic for this thread or not really needed, but I'm not sure.

I don't think wal_level=none is a bad idea intrinsically, but I think it would be easy to implement it poorly and end up harming a lot of users. I have no problem with giving people a way to do dangerous things, but we should do our best to let people know how much danger they've incurred.
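
To make the counter idea above a bit more concrete, here is a rough sketch in Python rather than PostgreSQL C. Everything in it is an assumption made for illustration: `ControlFile`, `below_replica_count`, `wal_none_count`, `change_wal_level` and `unsafe_between` are hypothetical names, not existing pg_control fields or functions, and `none` is only the wal_level value proposed in this thread. It shows the two-counter variant, where a tool compares two snapshots of the control data without reading the WAL in between.

```python
from dataclasses import dataclass
from enum import IntEnum


class WalLevel(IntEnum):
    # NONE is the level proposed in this thread, not an existing setting.
    NONE = 0
    MINIMAL = 1
    REPLICA = 2
    LOGICAL = 3


@dataclass
class ControlFile:
    """Hypothetical stand-in for values that would be saved in the control file."""
    wal_level: WalLevel = WalLevel.REPLICA
    below_replica_count: int = 0  # bumped each time wal_level drops below replica
    wal_none_count: int = 0       # bumped each time wal_level drops to none


def change_wal_level(cf: ControlFile, new_level: WalLevel) -> None:
    """Apply a wal_level change and bump the counters on a downward crossing."""
    old = cf.wal_level
    if old >= WalLevel.REPLICA and new_level < WalLevel.REPLICA:
        cf.below_replica_count += 1
    if old > WalLevel.NONE and new_level == WalLevel.NONE:
        cf.wal_none_count += 1
    cf.wal_level = new_level


def unsafe_between(earlier: ControlFile, later: ControlFile) -> bool:
    """True if wal_level dropped at some point between the two snapshots."""
    return (earlier.below_replica_count != later.below_replica_count
            or earlier.wal_none_count != later.wal_none_count)
```

A backup tool following this sketch would keep the counters from its previous snapshot and refuse an incremental backup (or fall back to a full one) when `unsafe_between` returns true, even if wal_level is back at replica by the time it looks. A real design would also need to decide what to do when a snapshot is taken while wal_level is still below replica, which this sketch does not address.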
By the way, another problem here is that some AMs - e.g. GiST, IIRC - use LSNs to figure out whether a block has changed. For temporary and unlogged tables, we use "fake" LSNs that are generated using a counter, but that approach only works because such relations are never really WAL-logged. Mixing fake LSNs and real LSNs will break stuff, and not bumping the LSN when the page changes probably will, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company