On Sun, Nov 17, 2024 at 01:50:53PM -0500, Greg Sabino Mullane wrote:
> On Wed, Nov 6, 2024 at 5:07 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> Therefore, it can be much faster to instead move the entire data directory
>> from the old cluster
>> to the new cluster and to then swap the catalog relation files.
>
> Thank you for breaking this up so clearly into separate commits. I think it
> is a very interesting idea, and anything to speed up pg_upgrade is always
> welcome. Some minor thoughts:
Thank you for reviewing!
>> .. we don't really expect there to be directories within database
> directories,
>> so perhaps it would be better to either unconditionally rename or to fail.
> Failure seems the best option here, so we can cleanly handle any future
> cases in which we decide to put dirs in this directory.
Good point.
>> if (RelFileNumberIsValid(rfn))
>> {
>> FileNameMap key;
>>
>> key.relfilenumber = (RelFileNumber) rfn;
>> if (bsearch(&key, context->maps, context->size,
>> sizeof(FileNameMap), FileNameMapCmp))
>> return 0;
>> }
>>
>> snprintf(dst, sizeof(dst), "%s/%s", context->target, filename);
>> if (rename(fname, dst) != 0)
>
> I'm not quite clear what we are doing here with falling through
> for InvalidOid entries, could you explain?
The idea is that if it looks like a data file that we might want to
transfer (i.e., it starts with a RelFileNumber), we should consult our map
to determine whether to move it. Otherwise, we want to unconditionally
transfer it so that we always use the files generated during pg_restore in
the new cluster (e.g., PG_VERSION and pg_filenode.map). In theory, this
should result in the same end state as what --link mode does today (for the
new cluster, at least).
>> .. vm_must_add_frozenbit isn't handled yet. We could either disallow
>> using catalog-swap mode if the upgrade involves versions older than v9.6
>
> Yes, this. No need for more code to handle super old versions when other
> options exist.
I'm inclined to agree.
>> with this problem is to introduce a special mode for "initdb --sync-only"
>> that calls fsync() for everything _except_ the actual data files. If we
>> fsync() the new catalog files as we move them into place, and if we assume
>> that the old catalog files will have been properly synchronized before
>> upgrading, there's no reason to synchronize them again at the end.
>
> Very cool approach!
:)
--
nathan