Re: trying again to get incremental backup - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: trying again to get incremental backup |
Date | |
Msg-id | 202311161021.nisg7imt7kyf@alvherre.pgsql Whole thread Raw |
In response to | Re: trying again to get incremental backup (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: trying again to get incremental backup
|
List | pgsql-hackers |
On 2023-Nov-13, Robert Haas wrote: > On Mon, Nov 13, 2023 at 11:25 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Also, it would be good to provide and use a > > function to initialize a BlockRefTableKey from the RelFileNode and > > forknum components, and ensure that any padding bytes are zeroed. > > Otherwise it's not going to be a great hash key. On my platform there > > aren't any (padding bytes), but I think it's unwise to rely on that. > > I'm having trouble understanding the second part of this suggestion. > Note that in a frontend context, SH_RAW_ALLOCATOR is pg_malloc0, and > in a backend context, we get the default, which is > MemoryContextAllocZero. Maybe there's some case this doesn't cover, > though? I meant code like this memcpy(&key.rlocator, rlocator, sizeof(RelFileLocator)); key.forknum = forknum; entry = blockreftable_lookup(brtab->hash, key); where any padding bytes in "key" could have arbitrary values, because they're not initialized. So I'd have a (maybe static inline) function BlockRefTableKeyInit(&key, rlocator, forknum) that fills it in for you. Note: #define SH_EQUAL(tb, a, b) (memcmp(&a, &b, sizeof(BlockRefTableKey)) == 0) AFAICT the new simplehash uses in this patch series are the only ones that use memcmp() as SH_EQUAL, so we don't necessarily have precedent on lack of padding bytes initialization in existing uses of simplehash. > > These forward struct declarations are not buying you anything, I'd > > remove them: > > I've had problems from time to time when I don't do this. I'll remove > it here, but I'm not convinced that it's always useless. Well, certainly there are places where they are necessary. > > I don't much like the way header files in src/bin/pg_combinebackup files > > are structured. Particularly, causing a simplehash to be "instantiated" > > just because load_manifest.h is included seems poised to cause pain. I > > think there should be a file with the basic struct declarations (no > > simplehash); and then maybe since both pg_basebackup and > > pg_combinebackup seem to need the same simplehash, create a separate > > header file containing just that.. But, did you notice that anything > > that includes reconstruct.h will instantiate the simplehash stuff, > > because it includes load_manifest.h? It may be unwise to have the > > simplehash in a header file. Maybe just declare it in each .c file that > > needs it. The duplicity is not that large. > > I think that I did this correctly. Oh, I hadn't grokked that we had this SH_SCOPE thing and a separate SH_DECLARE for it being extern. OK, please ignore that. > > Why leave unnamed arguments in function declarations? > > I mean, I've changed it now, but I don't think it's worth getting too > excited about. Well, we did get into consistency arguments on this point previously. I agree it's not *terribly* important, but on thread https://www.postgresql.org/message-id/flat/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg%40mail.gmail.com people got really worked up about this stuff. > > In GetFileBackupMethod(), which arguments are in and which are out? > > The comment doesn't say, and it's not obvious why we pass both the file > > path as well as the individual constituent pieces for it. > > The header comment does document which values are potentially set on > return. I guess I thought it was clear enough that the stuff not > documented to be output parameters was input parameters. Most of them > aren't even pointers, so they have to be input parameters. The only > exception is 'path', which I have some difficulty thinking that anyone > is going to imagine to be an input pointer. An output pointer, you mean :-) (Should it be const?) When the return value is BACK_UP_FILE_FULLY, it's not clear what happens to these output values; we modify some, but why? Maybe they should be left alone? In that case, the "if size == 0" test should move a couple of lines up, in the brtentry == NULL block. BTW, you could do the qsort() after deciding to backup the file fully if more than 90% needs to be replaced. BTW, in sendDir() why do lookup_path = pstrdup(pathbuf + basepathlen + 1); when you could do lookup_path = pstrdup(tarfilename); ? > > There are two functions named record_manifest_details_for_file() in > > different programs. > > I had trouble figuring out how to name this stuff. I did notice the > awkwardness, but surely nobody can think that two functions with the > same name in different binaries can be actually the same function. Of course not, but when cscope-jumping around, it is weird. > If we want to inject more underscores here, my vote is to go all the > way and make it per_wal_range_cb. +1 > > In walsummarizer.c, HandleWalSummarizerInterrupts is called in > > summarizer_read_local_xlog_page but SummarizeWAL() doesn't do that. > > Maybe it should? > > I replaced all the CHECK_FOR_INTERRUPTS() in that file with > HandleWalSummarizerInterrupts(). Does that seem right? Looks to be what walwriter.c does at least, so I guess it's OK. > > I think this path is not going to be very human-likeable. > > snprintf(final_path, MAXPGPATH, > > XLOGDIR "/summaries/%08X%08X%08X%08X%08X.summary", > > tli, > > LSN_FORMAT_ARGS(summary_start_lsn), > > LSN_FORMAT_ARGS(summary_end_lsn)); > > Why not add a dash between the TLI and between both LSNs, or something > > like that? > But I have a hard time arguing that it wouldn't be more readable still > if we put some separator characters in there. I didn't do that because > then they'd look less like WAL file names, but maybe that's not really > a problem. A possible reason not to bother is that these files are > less necessary for humans to care about than WAL files, since they > don't need to be archived or transported between nodes in any way. > Basically I think this is probably fine the way it is, but if you or > others think it's really important to change it, I can do that. Just > as long as we don't spend 50 emails arguing about which separator > character to use. Yeah, I just think that endless stream of hex chars are hard to read, and I've found myself following digits in the screen with my fingers in order to parse file names. I guess you could say thousands separators for regular numbers aren't needed either, but we do have them for readability sake. I think a new section in chapter 30 "Reliability and the Write-Ahead Log" is warranted. It would explain the summarization process, what the summary files are used for, and the deletion mechanism. I can draft something if you want. It's not clear to me if WalSummarizerCtl->pending_lsn if fulfilling some purpose or it's just a leftover from prior development. I see it's only read in an assertion ... Maybe if we think this cross-check is important, it should be turned into an elog? Otherwise, I'd remove it. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No me acuerdo, pero no es cierto. No es cierto, y si fuera cierto, no me acuerdo." (Augusto Pinochet a una corte de justicia)
pgsql-hackers by date: