Thread: Re: optimize file transfer in pg_upgrade
On Wed, Nov 6, 2024 at 5:07 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> these user relation files will have the same name. Therefore, it can be
> much faster to instead move the entire data directory from the old cluster
> to the new cluster and to then swap the catalog relation files.

This is a cool idea.

> Another interesting problem is that pg_upgrade currently doesn't transfer
> the sequence data files. Since v10, we've restored these via pg_restore.
> I believe this was originally done for the introduction of the pg_sequence
> catalog, which changed the format of sequence tuples. In the new
> catalog-swap mode I am proposing, this means we need to transfer all the
> pg_restore-generated sequence data files. If there are many sequences, it
> can be difficult to determine which transfer mode and synchronization
> method will be faster. Since sequence tuple modifications are very rare, I
> think the new catalog-swap mode should just use the sequence data files
> from the old cluster whenever possible.

Maybe we should rethink the decision not to transfer relfilenodes for
sequences. Or have more than one way to do it. pg_upgrade
--binary-upgrade --binary-upgrade-even-for-sequences, or whatever.

--
Robert Haas
EDB: http://www.enterprisedb.com

On Fri, Feb 28, 2025 at 2:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Maybe we should rethink the decision not to transfer relfilenodes for
> sequences. Or have more than one way to do it. pg_upgrade
> --binary-upgrade --binary-upgrade-even-for-sequences, or whatever.

Sorry, I meant: pg_dump --binary-upgrade --binary-upgrade-even-for-sequences

i.e. pg_upgrade could decide which way to ask pg_dump to do it,
depending on versions and flags.

--
Robert Haas
EDB: http://www.enterprisedb.com

On Fri, Feb 28, 2025 at 02:41:22PM -0500, Robert Haas wrote:
> On Fri, Feb 28, 2025 at 2:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> Maybe we should rethink the decision not to transfer relfilenodes for
>> sequences. Or have more than one way to do it. pg_upgrade
>> --binary-upgrade --binary-upgrade-even-for-sequences, or whatever.
>
> Sorry, I meant: pg_dump --binary-upgrade --binary-upgrade-even-for-sequences
>
> i.e. pg_upgrade could decide which way to ask pg_dump to do it,
> depending on versions and flags.

That's exactly where I landed (see v3-0002). I haven't measured whether
transferring relfilenodes or dumping the sequence data is faster for the
existing modes, but for now I've left those alone, i.e., they still dump
sequence data. The new "swap" mode just uses the old cluster's sequence
files, and I've disallowed using swap mode for upgrades from <v10 to avoid
the sequence tuple format change (along with other incompatible changes).

I'll admit I'm a bit concerned that this will cause problems if and when
someone wants to change the sequence tuple format again. But that hasn't
happened for a while, AFAIK nobody's planning to change it, and even if it
does happen, we just need to have my proposed new mode transfer the
sequence files like it transfers the catalog files. That will make this
mode slower, especially if you have a ton of sequences, but maybe it'll
still be a win in most cases. Of course, we probably will need to have
pg_upgrade handle other kinds of format changes, too, but IMHO it's still
worth trying to speed up pg_upgrade despite the potential future
complexities.

--
nathan

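For readers skimming the thread, the mode-dependent handling of sequences described in the message above can be summarized in a few lines. This is only a rough model of the stated behavior, not code from pg_upgrade or from the v3 patches: the function name, the constant, and the returned strings are all hypothetical, and any mode other than "swap" is simply treated as one of the existing modes.

```python
# Hypothetical model of the behavior described in the thread. None of these
# names come from the pg_upgrade source or from the proposed patches.

# The thread says the sequence tuple format changed in v10 (introduction of
# the pg_sequence catalog), so swap mode is disallowed for older clusters.
SWAP_MIN_OLD_MAJOR = 10


def sequence_strategy(transfer_mode: str, old_major: int) -> str:
    """Return how sequence data would reach the new cluster.

    transfer_mode: "swap" for the proposed mode; anything else is treated
                   as an existing mode (an assumption of this sketch).
    old_major:     major version of the old cluster.
    """
    if transfer_mode == "swap":
        if old_major < SWAP_MIN_OLD_MAJOR:
            # Upgrades from before v10 are rejected in swap mode.
            raise ValueError("swap mode is not allowed for upgrades from <v10")
        # Swap mode reuses the old cluster's sequence files as-is.
        return "reuse-old-files"
    # Existing modes are left alone: sequence data is still dumped and restored.
    return "dump-and-restore"
```

If the sequence tuple format changed again, the "swap" branch in this model would need a third outcome (transferring the sequence files the way catalog files are transferred), which is the slower fallback discussed in the message.
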
On Fri, Feb 28, 2025 at 3:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> That's exactly where I landed (see v3-0002). I haven't measured whether
> transferring relfilenodes or dumping the sequence data is faster for the
> existing modes, but for now I've left those alone, i.e., they still dump
> sequence data. The new "swap" mode just uses the old cluster's sequence
> files, and I've disallowed using swap mode for upgrades from <v10 to avoid
> the sequence tuple format change (along with other incompatible changes).

Ah. Perhaps I should have read the thread more carefully before
commenting. Sounds good, at any rate.

> I'll admit I'm a bit concerned that this will cause problems if and when
> someone wants to change the sequence tuple format again. But that hasn't
> happened for a while, AFAIK nobody's planning to change it, and even if it
> does happen, we just need to have my proposed new mode transfer the
> sequence files like it transfers the catalog files. That will make this
> mode slower, especially if you have a ton of sequences, but maybe it'll
> still be a win in most cases. Of course, we probably will need to have
> pg_upgrade handle other kinds of format changes, too, but IMHO it's still
> worth trying to speed up pg_upgrade despite the potential future
> complexities.

I think it's fine. If somebody comes along and says "hey, when v23
came out Nathan's feature only sped up pg_upgrade by 2x instead of 3x
like it did for v22, so Nathan is a bad person," I think we can fairly
reply "thanks for sharing your opinion, feel free not to use the
feature and run at 1x speed". There's no rule saying that every
optimization must always produce the maximum possible benefit in every
scenario. We're just concerned about regressions, and "only delivers
some of the speedup if the sequence format has changed on disk" is not
a regression.

--
Robert Haas
EDB: http://www.enterprisedb.com

On Fri, Feb 28, 2025 at 03:37:49PM -0500, Robert Haas wrote:
> On Fri, Feb 28, 2025 at 3:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> That's exactly where I landed (see v3-0002). I haven't measured whether
>> transferring relfilenodes or dumping the sequence data is faster for the
>> existing modes, but for now I've left those alone, i.e., they still dump
>> sequence data. The new "swap" mode just uses the old cluster's sequence
>> files, and I've disallowed using swap mode for upgrades from <v10 to avoid
>> the sequence tuple format change (along with other incompatible changes).
>
> Ah. Perhaps I should have read the thread more carefully before
> commenting. Sounds good, at any rate.

On the contrary, I'm glad you independently came to the same conclusion.

>> I'll admit I'm a bit concerned that this will cause problems if and when
>> someone wants to change the sequence tuple format again. But that hasn't
>> happened for a while, AFAIK nobody's planning to change it, and even if it
>> does happen, we just need to have my proposed new mode transfer the
>> sequence files like it transfers the catalog files. That will make this
>> mode slower, especially if you have a ton of sequences, but maybe it'll
>> still be a win in most cases. Of course, we probably will need to have
>> pg_upgrade handle other kinds of format changes, too, but IMHO it's still
>> worth trying to speed up pg_upgrade despite the potential future
>> complexities.
>
> I think it's fine. If somebody comes along and says "hey, when v23
> came out Nathan's feature only sped up pg_upgrade by 2x instead of 3x
> like it did for v22, so Nathan is a bad person," I think we can fairly
> reply "thanks for sharing your opinion, feel free not to use the
> feature and run at 1x speed". There's no rule saying that every
> optimization must always produce the maximum possible benefit in every
> scenario. We're just concerned about regressions, and "only delivers
> some of the speedup if the sequence format has changed on disk" is not
> a regression.

Cool. I appreciate the design feedback.

--
nathan