Thread: Re: optimize file transfer in pg_upgrade

Re: optimize file transfer in pg_upgrade

From: Bruce Momjian
On Wed, Nov  6, 2024 at 04:07:35PM -0600, Nathan Bossart wrote:
> For clusters with many relations, the file transfer step of pg_upgrade can
> take the longest.  This step clones, copies, or links the user relation
> files from the older cluster to the new cluster, so the amount of time it
> takes is closely related to the number of relations.  However, since v15,
> we've preserved the relfilenodes during pg_upgrade, which means that all of
> these user relation files will have the same name.  Therefore, it can be
> much faster to instead move the entire data directory from the old cluster
> to the new cluster and to then swap the catalog relation files.
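
To make the contrast in the quoted paragraph concrete, here is a minimal toy
sketch in Python.  It is not pg_upgrade code: the function names, the flat
directory layout, the staging directory, and the file names (16384, 16385,
1259) are all assumptions made for illustration, and catalog files are assumed
to have the same names in both clusters.  It only shows why one directory
rename plus a few catalog swaps does less work than one link per file.

```python
import os
import shutil
import tempfile


def make_cluster(root, name, files):
    """Create a toy 'data directory' holding the given name -> content files."""
    path = os.path.join(root, name)
    os.mkdir(path)
    for fname, content in files.items():
        with open(os.path.join(path, fname), "w") as f:
            f.write(content)
    return path


def transfer_by_linking(old_dir, new_dir, catalog_files):
    """Per-file transfer: one hard link per user file (cost grows with file count)."""
    linked = 0
    for name in sorted(os.listdir(old_dir)):
        if name in catalog_files:
            continue  # the new cluster keeps its own catalog files
        os.link(os.path.join(old_dir, name), os.path.join(new_dir, name))
        linked += 1
    return linked


def transfer_by_swapping(old_dir, new_dir, catalog_files):
    """Whole-directory transfer: move the old directory, then swap in new catalogs."""
    staging = new_dir + ".catalogs"  # hypothetical holding area
    os.mkdir(staging)
    for name in catalog_files:
        os.rename(os.path.join(new_dir, name), os.path.join(staging, name))
    shutil.rmtree(new_dir)
    os.rename(old_dir, new_dir)  # a single rename carries every user file along
    for name in catalog_files:
        # Overwrite the old cluster's catalog file with the new cluster's copy.
        os.replace(os.path.join(staging, name), os.path.join(new_dir, name))
    os.rmdir(staging)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old = make_cluster(root, "old", {"16384": "t1", "16385": "t2", "1259": "old catalog"})
        new = make_cluster(root, "new", {"1259": "new catalog"})
        transfer_by_swapping(old, new, ["1259"])
        print(sorted(os.listdir(new)))
```

Note that the swap variant destroys the old directory as a side effect, which
is the rollback concern raised in the reply below.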

That is certainly a creative idea.  I am surprised the links take so
long.  Obviously rollback would be hard, as you mentioned, whereas today
you can roll back a --link upgrade until you start the new cluster.  I
think it clearly should be considered.  The patch is smaller than I
expected.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"



Re: optimize file transfer in pg_upgrade

From: Nathan Bossart
On Mon, Nov 18, 2024 at 10:34:00PM -0500, Bruce Momjian wrote:
> On Wed, Nov  6, 2024 at 04:07:35PM -0600, Nathan Bossart wrote:
>> For clusters with many relations, the file transfer step of pg_upgrade can
>> take the longest.  This step clones, copies, or links the user relation
>> files from the older cluster to the new cluster, so the amount of time it
>> takes is closely related to the number of relations.  However, since v15,
>> we've preserved the relfilenodes during pg_upgrade, which means that all of
>> these user relation files will have the same name.  Therefore, it can be
>> much faster to instead move the entire data directory from the old cluster
>> to the new cluster and to then swap the catalog relation files.
> 
> That is certainly a creative idea.  I am surprised the links take so
> long.  Obviously rollback would be hard, as you mentioned, whereas today
> you can roll back a --link upgrade until you start the new cluster.  I
> think it clearly should be considered.

I've yet to try, but I'm cautiously optimistic that it will be possible to
generate simple scripts that can unwind things by just looking at the
directory entries, even if pg_upgrade crashed halfway through the linking
stage.
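
As a rough illustration of what "just looking at the directory entries" could
mean, here is a hedged Python sketch.  It assumes, purely for the example, that
the interrupted step created hard links in the new directory with the same
names as the old files; the function names and layout are hypothetical and do
not come from the patch.  A file counts as "already transferred" if the
same-named entry in the new directory refers to the same inode as the old one.

```python
import os
import tempfile


def plan_unwind(old_dir, new_dir):
    """List new_dir paths that are hard links to same-named files in old_dir."""
    to_remove = []
    for name in sorted(os.listdir(old_dir)):
        old_path = os.path.join(old_dir, name)
        new_path = os.path.join(new_dir, name)
        # samefile() compares device and inode numbers, so an unrelated file
        # that merely shares the name is left alone.
        if os.path.exists(new_path) and os.path.samefile(old_path, new_path):
            to_remove.append(new_path)
    return to_remove


def unwind(old_dir, new_dir):
    """Remove the links found by plan_unwind(); the old files stay untouched."""
    removed = plan_unwind(old_dir, new_dir)
    for path in removed:
        os.unlink(path)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "old")
        new = os.path.join(root, "new")
        os.mkdir(old)
        os.mkdir(new)
        for name in ("16384", "16385", "16386"):
            with open(os.path.join(old, name), "w") as f:
                f.write(name)
        # Simulate a crash after two of the three files were linked.
        for name in ("16384", "16385"):
            os.link(os.path.join(old, name), os.path.join(new, name))
        print(unwind(old, new))
```

Because the plan is derived only from what is on disk, it does not depend on
any state that a crashed pg_upgrade run might have failed to record.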

> The patch is smaller than I expected.

I was surprised by this, too.  Obviously, this one is a bit smaller than
the "real" patches will be because it's just a proof-of-concept, but it
should still be pretty manageable.

-- 
nathan