Re: trying again to get incremental backup - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: trying again to get incremental backup |
Date | |
Msg-id | CA+Tgmoa-Ow_JdDd0K3hxY5U+k0aoTg=SfTH3gtiMQVkb_YosAQ@mail.gmail.com |
In response to | Re: trying again to get incremental backup (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
List | pgsql-hackers |
On Wed, Nov 1, 2023 at 8:57 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> Thanks for answering! It all sounds like this
> resync-standby-using-primary-incrbackup idea isn't fit for the current
> pg_combinebackup, but rather for a new tool, hopefully in the future.
> It could take the current LSN from the stuck standby, calculate a
> manifest on the lagged and offline standby (do we need to calculate
> manifest checksums in that case? I cannot find code for it), deliver
> it via "UPLOAD_MANIFEST" to the primary, and start fetching and
> applying the differences while doing some form of copy-on-write from
> the old & incoming incrbackup data to "$relfilenodeid.new", and then
> durable_unlink() the old one and durable_rename("$relfilenodeid.new",
> "$relfilenodeid"). Would it still be possible in theory? (It could use
> additional safeguards, like renaming the control file when starting
> and just before ending, to additionally block startup if it hasn't
> finished.) Also, as per the comment near struct
> IncrementalBackupInfo.manifest_files, it looks like even checksums are
> more for safeguarding than part of the core implementation (?)
>
> What I meant in the initial idea is not to hinder current efforts, but
> to ask whether the current design would stand in the way of such a
> cool new addition in the future?

Hmm, interesting idea. I think something like that could be made to
work. My first thought was that it would sort of suck to have to
compute a manifest as a precondition of doing this, but then I started
to think maybe it wouldn't, really. I mean, you'd have to scan the
local directory tree and collect all the filenames so that you could
remove any files that are no longer present in the current version of
the data directory which the incremental backup would send to you. If
you're already doing that, the additional cost of generating a
manifest isn't that high, at least if you don't include checksums,
which aren't required. On the other hand, if you didn't need to send
the server a manifest and just needed to send the required WAL ranges,
that would be even cheaper. I'll spend some more time thinking about
this next week.

> As per the earlier test [1], I've already tried to simulate that in
> incrbackuptests-0.1.tgz/test_across_wallevelminimal.sh, but that
> worked (that was with the CTAS wal_level=minimal optimization: a new
> relfilenode OID is used for the CTAS, which got included in the
> incremental backup as a new file). I even retested that with your v7
> patch with asserts, same result. When simulating with "BEGIN; TRUNCATE
> nightmare; COPY nightmare FROM '/tmp/copy.out'; COMMIT;" on
> wal_level=minimal it still recovers using the incremental backup
> because the WAL contains:

TRUNCATE itself is always WAL-logged, but data added to the relation
in the same transaction as the TRUNCATE isn't always WAL-logged (but
sometimes it is, depending on the relation size). So the failure case
wouldn't be a missing TRUNCATE but missing data-containing blocks
within the relation shortly after it was created or truncated.

I think what I need to do here is avoid summarizing WAL that was
generated under wal_level=minimal. The walsummarizer process should
just refuse to emit summaries for any such WAL.

--
Robert Haas
EDB: http://www.enterprisedb.com
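For illustration, here is a minimal standalone sketch of the "write to
"$relfilenodeid.new", then durably swap it in" step that Jakub describes
above. It uses plain POSIX open()/fsync()/rename() rather than PostgreSQL's
actual durable_rename()/durable_unlink(); the function names, buffer sizes,
and error handling are assumptions made for the sketch, not the real
implementation.

```c
/*
 * Illustrative only: durably replace "path" with a fully written
 * "path.new".  PostgreSQL's durable_rename() does the equivalent with
 * proper error reporting; this sketch just shows the ordering of
 * fsync-new-file, rename, and fsync-directory.
 */
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <unistd.h>

static int
fsync_path(const char *p)
{
    int fd = open(p, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

static int
swap_in_new_file(const char *path)
{
    char newpath[1024];
    char dirbuf[1024];

    snprintf(newpath, sizeof(newpath), "%s.new", path);

    /* Make sure the replacement file's contents reach disk first. */
    if (fsync_path(newpath) != 0)
        return -1;

    /* rename() atomically replaces the old file on POSIX filesystems. */
    if (rename(newpath, path) != 0)
        return -1;

    /* Persist the directory entry change; dirname() may modify its arg. */
    snprintf(dirbuf, sizeof(dirbuf), "%s", path);
    return fsync_path(dirname(dirbuf));
}
```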