Thread: improve performance of pg_dump --binary-upgrade
While examining pg_upgrade on a cluster with many tables (created with the command in [0]), I noticed that a huge amount of pg_dump time goes towards the binary_upgrade_set_pg_class_oids() function. This function executes a rather expensive query for a single row, and this function appears to be called for most of the rows in pg_class. The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade' for this case. Instead of executing the query in every call to the function, we can execute it once during the first call and store all the required information in a sorted array that we can bsearch() in future calls. For the aformentioned test, pg_dump on my machine goes from ~2 minutes to ~18 seconds, which is much closer to the ~14 seconds it takes without --binary-upgrade. One downside of this approach is the memory usage. This was more-or-less the first approach that crossed my mind, so I wouldn't be surprised if there's a better way. I tried to keep the pg_dump output the same, but if that isn't important, maybe we could dump all the pg_class OIDs at once instead of calling binary_upgrade_set_pg_class_oids() for each one. [0] https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
One downside of this approach is the memory usage. This was more-or-less
Bar-napkin math tells me in a worst-case architecture and braindead byte alignment, we'd burn 64 bytes per struct, so the 100K tables cited would be about 6.25MB of memory.
The obvious low-memory alternative would be to make a prepared statement, though that does nothing to cut down on the roundtrips.
I think this is a good trade off.
I think this is a good trade off.
On Thu, Apr 18, 2024 at 02:08:28AM -0400, Corey Huinker wrote: > Bar-napkin math tells me in a worst-case architecture and braindead byte > alignment, we'd burn 64 bytes per struct, so the 100K tables cited would be > about 6.25MB of memory. > > The obvious low-memory alternative would be to make a prepared statement, > though that does nothing to cut down on the roundtrips. > > I think this is a good trade off. I've not checked the patch in details or tested it, but caching this information to gain this speed sounds like a very good thing. -- Michael
Attachment
> On 18 Apr 2024, at 06:17, Nathan Bossart <nathandbossart@gmail.com> wrote: > The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade' > for this case. Instead of executing the query in every call to the > function, we can execute it once during the first call and store all the > required information in a sorted array that we can bsearch() in future > calls. That does indeed seem like a saner approach. Since we look up the relkind we can also remove the is_index parameter to binary_upgrade_set_pg_class_oids since we already know that without the caller telling us? > One downside of this approach is the memory usage. I'm not too worried about the worst-case performance of this. > This was more-or-less > the first approach that crossed my mind, so I wouldn't be surprised if > there's a better way. I tried to keep the pg_dump output the same, but if > that isn't important, maybe we could dump all the pg_class OIDs at once > instead of calling binary_upgrade_set_pg_class_oids() for each one. Without changing the backend handling of the Oid's we can't really do that AFAICT, the backend stores the Oid for the next call so it needs to be per relation like now? For Greenplum we moved this to the backend by first dumping all Oids which were read into backend cache, and during relation creation the Oid to use was looked up in the backend. (This wasn't a performance change, it was to allow multiple shared-nothing clusters to have a unified view of Oids, so I never benchmarked it all that well.) The upside of that is that the magic Oid variables in the backend can be removed, but it obviously adds slight overhead in others. -- Daniel Gustafsson
On Thu, Apr 18, 2024 at 02:08:28AM -0400, Corey Huinker wrote: > Bar-napkin math tells me in a worst-case architecture and braindead byte > alignment, we'd burn 64 bytes per struct, so the 100K tables cited would be > about 6.25MB of memory. That doesn't seem too terrible. > The obvious low-memory alternative would be to make a prepared statement, > though that does nothing to cut down on the roundtrips. > > I think this is a good trade off. Cool. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote: >> On 18 Apr 2024, at 06:17, Nathan Bossart <nathandbossart@gmail.com> wrote: > >> The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade' >> for this case. Instead of executing the query in every call to the >> function, we can execute it once during the first call and store all the >> required information in a sorted array that we can bsearch() in future >> calls. > > That does indeed seem like a saner approach. Since we look up the relkind we > can also remove the is_index parameter to binary_upgrade_set_pg_class_oids > since we already know that without the caller telling us? Yeah. It looks like that's been possible since commit 9a974cb, so I can write a prerequisite patch for this. >> One downside of this approach is the memory usage. > > I'm not too worried about the worst-case performance of this. Cool. That seems to be the general sentiment. >> This was more-or-less >> the first approach that crossed my mind, so I wouldn't be surprised if >> there's a better way. I tried to keep the pg_dump output the same, but if >> that isn't important, maybe we could dump all the pg_class OIDs at once >> instead of calling binary_upgrade_set_pg_class_oids() for each one. > > Without changing the backend handling of the Oid's we can't really do that > AFAICT, the backend stores the Oid for the next call so it needs to be per > relation like now? Right, this would require additional changes. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Apr 18, 2024 at 10:23:01AM -0500, Nathan Bossart wrote: > On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote: >> That does indeed seem like a saner approach. Since we look up the relkind we >> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids >> since we already know that without the caller telling us? > > Yeah. It looks like that's been possible since commit 9a974cb, so I can > write a prerequisite patch for this. Here's a new patch set with this change. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
> On 18 Apr 2024, at 22:28, Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Thu, Apr 18, 2024 at 10:23:01AM -0500, Nathan Bossart wrote: >> On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote: >>> That does indeed seem like a saner approach. Since we look up the relkind we >>> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids >>> since we already know that without the caller telling us? >> >> Yeah. It looks like that's been possible since commit 9a974cb, so I can >> write a prerequisite patch for this. > > Here's a new patch set with this change. From a read-through they look good, a very nice performance improvement in an important path. I think it would be nice with some comments on the BinaryUpgradeClassOids struct (since the code using it is thousands of lines away), and a comment on the if (oids == NULL) block explaining the caching. -- Daniel Gustafsson
On Thu, Apr 18, 2024 at 10:33:08PM +0200, Daniel Gustafsson wrote: > From a read-through they look good, a very nice performance improvement in an > important path. I think it would be nice with some comments on the > BinaryUpgradeClassOids struct (since the code using it is thousands of lines > away), and a comment on the if (oids == NULL) block explaining the caching. Added. Thanks for reviewing! Unfortunately, this one will have to sit for a couple months... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
I noticed that there are some existing examples of this sort of thing in pg_dump (e.g., commit d5e8930), so I adjusted the patch to match the surrounding style a bit better. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Committed. -- nathan