On Sun, Mar 31, 2024 at 5:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I'm on 2.2.2 (on Linux). But there's something wrong, because the
> pg_combinebackup that took ~150s on xfs/btrfs, takes ~900s on ZFS.
>
> I'm not sure it's a ZFS config issue, though, because it's not CPU or
> I/O bound, and I see this on both machines. And some simple dd tests
> show the zpool can do 10x the throughput. Could this be due to the file
> header / pool alignment?
Could ZFS recordsize > 8kB be making it worse, by repeatedly dealing
with the same 128kB record as you copy_file_range() 16 x 8kB blocks?
(I'm guessing you might be using the default recordsize, which is 128kB?)
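To spell out the access pattern I have in mind, here's a rough sketch
(not the actual pg_combinebackup code; the helper name and structure
are made up purely for illustration):

#define _GNU_SOURCE             /* for copy_file_range() on glibc */
#include <unistd.h>

#define BLCKSZ 8192

/*
 * Hypothetical helper, for illustration only: copy nblocks consecutive
 * 8kB blocks with one copy_file_range() call each.  With the default
 * 128kB recordsize, sixteen consecutive calls land in the same ZFS
 * record.
 */
static int
copy_blocks_one_at_a_time(int src_fd, int dst_fd, off_t start, int nblocks)
{
    for (int i = 0; i < nblocks; i++)
    {
        off_t       off_in = start + (off_t) i * BLCKSZ;
        off_t       off_out = off_in;

        if (copy_file_range(src_fd, &off_in, dst_fd, &off_out,
                            BLCKSZ, 0) != (ssize_t) BLCKSZ)
            return -1;          /* error or short copy; caller would handle */
    }
    return 0;
}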
> I admit I'm not very familiar with the format, but you're probably right
> there's a header, and header_length does not seem to consider alignment.
> make_incremental_rfile simply does this:
>
> /* Remember length of header. */
> rf->header_length = sizeof(magic) + sizeof(rf->num_blocks) +
>     sizeof(rf->truncation_block_length) +
>     sizeof(BlockNumber) * rf->num_blocks;
>
> and sendFile() does the same thing when creating incremental basebackup.
> I guess it wouldn't be too difficult to make sure to align this to
> BLCKSZ or something like this. I wonder if the file format is documented
> somewhere ... It'd certainly be nicer to tweak before v18, if necessary.
>
> Anyway, is that really a problem? I mean, in my tests the CoW stuff
> seemed to work quite fine - at least on the XFS/BTRFS. Although, maybe
> that's why it took longer on XFS ...
Yeah, I'm not sure; I assume it did more allocating and copying because
of that. It wouldn't matter much if a first version weren't as good as
possible, and it would be fine to tune the format later once we know
more, i.e. leaving improvements on the table for now. I just wanted to
share the observation. I wouldn't be surprised if the block-at-a-time
coding makes it slower and maybe makes the on-disk data structures
worse, but I'm just guessing.
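Just to illustrate the alignment idea mentioned above, something like
this is what I'd imagine (a sketch only, not a worked-out format
change; the writer in sendFile() and the reader in
make_incremental_rfile() would both need to agree on the padding):

    /* Natural header size, as today. */
    rf->header_length = sizeof(magic) + sizeof(rf->num_blocks) +
        sizeof(rf->truncation_block_length) +
        sizeof(BlockNumber) * rf->num_blocks;

    /*
     * Sketch: round the header up to the next BLCKSZ boundary, so the
     * data blocks that follow it start block-aligned.  BLCKSZ is a
     * power of two, so the usual round-up trick works.
     */
    rf->header_length = (rf->header_length + BLCKSZ - 1) &
        ~((unsigned) (BLCKSZ - 1));

Whether that's worth changing the format for is exactly the sort of
thing we could figure out later.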
It would also be interesting, though not required right now, to figure
out how to tune ZFS well for this purpose...