On Thu, Apr 11, 2024 at 12:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 10, 2024 at 6:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Could we just write the blocks directly into the output array, and
> > then transpose them directly in place if start_blkno > 0? See
> > attached. I may be missing something, but the only downside I can
> > think of is that the output array is still clobbered even if we decide
> > to return BACK_UP_FILE_FULLY because of the 90% rule, but that just
> > requires a warning in the comment at the top.
>
> Yeah. This approach makes the name "relative_block_numbers" a bit
> confusing, but not running out of memory is nice, too.
Pushed. That fixes the stack problem.
Of course we still have this:
/*
* Since this array is relatively large, avoid putting it on the stack.
* But we don't need it at all if this is not an incremental backup.
*/
if (ib != NULL)
relative_block_numbers = palloc(sizeof(BlockNumber) * RELSEG_SIZE);
To rescue my initdb --rel-segsize project[1] for v18, I will have a go
at making that dynamic. It looks like we don't actually need to
allocate that until we get to the GetFileBackupMethod() call, and at
that point we have the file size. If I understand correctly,
statbuf.st_size / BLCKSZ would be enough, so we could embiggen our
block number buffer there if required, right? That way, you would
only need shedloads of virtual memory if you used initdb
--rel-segsize=shedloads and you actually have shedloads of data in a
table/segment. For example, with initdb --rel-segsize=1TB and a table
that contains 1TB+ of data, that'd allocate 512MB. It sounds
borderline OK when put that way. It sounds not OK with
--rel-segsize=infinite and 32TB of data -> palloc(16GB). I suppose
one weasel-out would be to say we only support segments up to (say)
1TB, until eventually we figure out how to break this code's
dependency on segments. I guess we'll have to do that one day to
support incremental backups of other smgr implementations that don't
even have segments (segments are a private detail of md.c after all,
not part of the smgr abstraction).
[1] https://commitfest.postgresql.org/48/4305/