Thread: Re: optimize file transfer in pg_upgrade
On Wed, Nov 6, 2024 at 5:07 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Therefore, it can be much faster to instead move the entire data directory from the old cluster
> to the new cluster and to then swap the catalog relation files.
Thank you for breaking this up so clearly into separate commits. I think it is a very interesting idea, and anything to speed up pg_upgrade is always welcome. Some minor thoughts:
> [PATCH v1 3/8] Introduce catalog-swap mode for pg_upgrade.
> .. we don't really expect there to be directories within database directories,
> so perhaps it would be better to either unconditionally rename or to fail.
Failure seems the best option here, so we can cleanly handle any future cases in which we decide to put dirs in this directory.
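(As an aside, a minimal standalone sketch of that fail-fast behavior, using a hypothetical helper name rather than anything from the patch:)

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Hypothetical check: refuse to transfer anything in a database directory
 * that isn't a regular file, so any future subdirectory fails loudly
 * instead of being silently skipped or mishandled.
 */
static void
check_transferable(const char *path)
{
	struct stat st;

	if (lstat(path, &st) != 0)
	{
		fprintf(stderr, "could not stat \"%s\"\n", path);
		exit(1);
	}
	if (S_ISDIR(st.st_mode))
	{
		fprintf(stderr, "unexpected directory \"%s\" in database directory\n", path);
		exit(1);
	}
}
```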
> if (RelFileNumberIsValid(rfn))
> {
>     FileNameMap key;
>
>     key.relfilenumber = (RelFileNumber) rfn;
>     if (bsearch(&key, context->maps, context->size,
>                 sizeof(FileNameMap), FileNameMapCmp))
>         return 0;
> }
>
> snprintf(dst, sizeof(dst), "%s/%s", context->target, filename);
> if (rename(fname, dst) != 0)
I'm not quite clear what we are doing here with falling through for InvalidOid entries, could you explain?
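(For reference, my reading of the quoted logic as a simplified standalone sketch; FileNameMap and the map array here are stand-ins for the real pg_upgrade structures, and should_transfer is a hypothetical name:)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned int RelFileNumber;
#define InvalidRelFileNumber ((RelFileNumber) 0)

typedef struct
{
	RelFileNumber relfilenumber;
} FileNameMap;

static int
FileNameMapCmp(const void *a, const void *b)
{
	const FileNameMap *m1 = (const FileNameMap *) a;
	const FileNameMap *m2 = (const FileNameMap *) b;

	if (m1->relfilenumber < m2->relfilenumber)
		return -1;
	if (m1->relfilenumber > m2->relfilenumber)
		return 1;
	return 0;
}

/*
 * Decide whether a file should be moved unconditionally.  A name that
 * begins with digits is treated as a relation file and consulted against
 * the sorted map (and skipped if found, since mapped files are handled
 * separately).  A name that doesn't parse as a RelFileNumber (strtoul
 * yields 0, i.e. InvalidRelFileNumber) falls through and is always moved.
 */
static bool
should_transfer(const char *filename, const FileNameMap *maps, size_t size)
{
	RelFileNumber rfn = (RelFileNumber) strtoul(filename, NULL, 10);

	if (rfn != InvalidRelFileNumber)
	{
		FileNameMap key;

		key.relfilenumber = rfn;
		if (bsearch(&key, maps, size, sizeof(FileNameMap), FileNameMapCmp))
			return false;		/* in the map: leave it for the swap step */
	}
	return true;				/* e.g. PG_VERSION: transfer unconditionally */
}
```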
> .. vm_must_add_frozenbit isn't handled yet. We could either disallow
> using catalog-swap mode if the upgrade involves versions older than v9.6
Yes, this. No need for more code to handle super old versions when other options exist.
> with this problem is to introduce a special mode for "initdb --sync-only"
> that calls fsync() for everything _except_ the actual data files. If we
> fsync() the new catalog files as we move them into place, and if we assume
> that the old catalog files will have been properly synchronized before
> upgrading, there's no reason to synchronize them again at the end.
Very cool approach!
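For my own understanding, I sketched what that might look like. This is a rough standalone illustration, not the patch's code: walk the data directory and fsync only paths whose base name does not look like a RelFileNumber (and note the real initdb also syncs directories, which this skips for brevity).

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <ftw.h>
#include <string.h>
#include <unistd.h>

/* Crude test: relation data file names begin with a nonzero digit. */
static int
looks_like_data_file(const char *path)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;
	return base[0] >= '1' && base[0] <= '9';
}

static int
sync_walker(const char *path, const struct stat *st, int type, struct FTW *ftw)
{
	int			fd;

	(void) st;
	(void) ftw;

	/* skip non-files and data files; assume the latter are already durable */
	if (type != FTW_F || looks_like_data_file(path))
		return 0;

	if ((fd = open(path, O_RDONLY)) < 0)
		return 1;
	if (fsync(fd) != 0)
	{
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}

/* fsync everything under pgdata except the relation data files */
static int
sync_all_but_data_files(const char *pgdata)
{
	return nftw(pgdata, sync_walker, 64, FTW_PHYS);
}
```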
Cheers,
Greg
On Sun, Nov 17, 2024 at 01:50:53PM -0500, Greg Sabino Mullane wrote:
> On Wed, Nov 6, 2024 at 5:07 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Therefore, it can be much faster to instead move the entire data directory
>> from the old cluster to the new cluster and to then swap the catalog
>> relation files.
>
> Thank you for breaking this up so clearly into separate commits. I think it
> is a very interesting idea, and anything to speed up pg_upgrade is always
> welcome. Some minor thoughts:

Thank you for reviewing!

>> .. we don't really expect there to be directories within database directories,
>> so perhaps it would be better to either unconditionally rename or to fail.
>
> Failure seems the best option here, so we can cleanly handle any future
> cases in which we decide to put dirs in this directory.

Good point.

>> if (RelFileNumberIsValid(rfn))
>> {
>>     FileNameMap key;
>>
>>     key.relfilenumber = (RelFileNumber) rfn;
>>     if (bsearch(&key, context->maps, context->size,
>>                 sizeof(FileNameMap), FileNameMapCmp))
>>         return 0;
>> }
>>
>> snprintf(dst, sizeof(dst), "%s/%s", context->target, filename);
>> if (rename(fname, dst) != 0)
>
> I'm not quite clear what we are doing here with falling through
> for InvalidOid entries, could you explain?

The idea is that if it looks like a data file that we might want to
transfer (i.e., it starts with a RelFileNumber), we should consult our map
to determine whether to move it. Otherwise, we want to unconditionally
transfer it so that we always use the files generated during pg_restore in
the new cluster (e.g., PG_VERSION and pg_filenode.map). In theory, this
should result in the same end state as what --link mode does today (for
the new cluster, at least).

>> .. vm_must_add_frozenbit isn't handled yet. We could either disallow
>> using catalog-swap mode if the upgrade involves versions older than v9.6
>
> Yes, this. No need for more code to handle super old versions when other
> options exist.

I'm inclined to agree.
>> with this problem is to introduce a special mode for "initdb --sync-only"
>> that calls fsync() for everything _except_ the actual data files. If we
>> fsync() the new catalog files as we move them into place, and if we assume
>> that the old catalog files will have been properly synchronized before
>> upgrading, there's no reason to synchronize them again at the end.
>
> Very cool approach!

:)

-- 
nathan