improve performance of pg_dump --binary-upgrade

From: Nathan Bossart
While examining pg_upgrade on a cluster with many tables (created with the
command in [0]), I noticed that a huge amount of pg_dump time goes towards
the binary_upgrade_set_pg_class_oids() function.  This function executes a
rather expensive query for a single row, and this function appears to be
called for most of the rows in pg_class.

The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade'
for this case.  Instead of executing the query in every call to the
function, we can execute it once during the first call and store all the
required information in a sorted array that we can bsearch() in future
calls.  For the aforementioned test, pg_dump on my machine goes from ~2
minutes to ~18 seconds, which is much closer to the ~14 seconds it takes
without --binary-upgrade.

One downside of this approach is the memory usage.  This was more-or-less
the first approach that crossed my mind, so I wouldn't be surprised if
there's a better way.  I tried to keep the pg_dump output the same, but if
that isn't important, maybe we could dump all the pg_class OIDs at once
instead of calling binary_upgrade_set_pg_class_oids() for each one.

[0] https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


Re: improve performance of pg_dump --binary-upgrade

From: Corey Huinker
> One downside of this approach is the memory usage.
Bar-napkin math tells me that, on a worst-case architecture with braindead byte alignment, we'd burn 64 bytes per struct, so the 100K tables cited would be about 6.25MB of memory.

The obvious low-memory alternative would be to make a prepared statement, though that does nothing to cut down on the roundtrips.

I think this is a good trade off.


Re: improve performance of pg_dump --binary-upgrade

From: Michael Paquier
On Thu, Apr 18, 2024 at 02:08:28AM -0400, Corey Huinker wrote:
> Bar-napkin math tells me in a worst-case architecture and braindead byte
> alignment, we'd burn 64 bytes per struct, so the 100K tables cited would be
> about 6.25MB of memory.
>
> The obvious low-memory alternative would be to make a prepared statement,
> though that does nothing to cut down on the roundtrips.
>
> I think this is a good trade off.

I've not checked the patch in detail or tested it, but caching this
information to gain this speed sounds like a very good thing.
--
Michael


Re: improve performance of pg_dump --binary-upgrade

From: Daniel Gustafsson
> On 18 Apr 2024, at 06:17, Nathan Bossart <nathandbossart@gmail.com> wrote:

> The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade'
> for this case.  Instead of executing the query in every call to the
> function, we can execute it once during the first call and store all the
> required information in a sorted array that we can bsearch() in future
> calls.

That does indeed seem like a saner approach.  Since we look up the relkind, we
can also remove the is_index parameter to binary_upgrade_set_pg_class_oids,
since we already know that without the caller telling us?

> One downside of this approach is the memory usage.

I'm not too worried about the worst-case performance of this.

> This was more-or-less
> the first approach that crossed my mind, so I wouldn't be surprised if
> there's a better way.  I tried to keep the pg_dump output the same, but if
> that isn't important, maybe we could dump all the pg_class OIDs at once
> instead of calling binary_upgrade_set_pg_class_oids() for each one.

Without changing the backend handling of the Oids we can't really do that
AFAICT; the backend stores the Oid for the next call, so it needs to be per
relation like now?

For Greenplum we moved this to the backend by first dumping all Oids, which were
read into a backend cache, and during relation creation the Oid to use was looked
up in the backend.  (This wasn't a performance change, it was to allow multiple
shared-nothing clusters to have a unified view of Oids, so I never benchmarked
it all that well.)  The upside of that is that the magic Oid variables in the
backend can be removed, but it obviously adds slight overhead elsewhere.

--
Daniel Gustafsson




Re: improve performance of pg_dump --binary-upgrade

From: Nathan Bossart
On Thu, Apr 18, 2024 at 02:08:28AM -0400, Corey Huinker wrote:
> Bar-napkin math tells me in a worst-case architecture and braindead byte
> alignment, we'd burn 64 bytes per struct, so the 100K tables cited would be
> about 6.25MB of memory.

That doesn't seem too terrible.

> The obvious low-memory alternative would be to make a prepared statement,
> though that does nothing to cut down on the roundtrips.
> 
> I think this is a good trade off.

Cool.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: improve performance of pg_dump --binary-upgrade

From: Nathan Bossart
On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote:
>> On 18 Apr 2024, at 06:17, Nathan Bossart <nathandbossart@gmail.com> wrote:
> 
>> The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade'
>> for this case.  Instead of executing the query in every call to the
>> function, we can execute it once during the first call and store all the
>> required information in a sorted array that we can bsearch() in future
>> calls.
> 
> That does indeed seem like a saner approach.  Since we look up the relkind we
> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids
> since we already know that without the caller telling us?

Yeah.  It looks like that's been possible since commit 9a974cb, so I can
write a prerequisite patch for this.

>> One downside of this approach is the memory usage.
> 
> I'm not too worried about the worst-case performance of this.

Cool.  That seems to be the general sentiment.

>> This was more-or-less
>> the first approach that crossed my mind, so I wouldn't be surprised if
>> there's a better way.  I tried to keep the pg_dump output the same, but if
>> that isn't important, maybe we could dump all the pg_class OIDs at once
>> instead of calling binary_upgrade_set_pg_class_oids() for each one.
> 
> Without changing the backend handling of the Oid's we can't really do that
> AFAICT, the backend stores the Oid for the next call so it needs to be per
> relation like now?

Right, this would require additional changes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: improve performance of pg_dump --binary-upgrade

From: Nathan Bossart
On Thu, Apr 18, 2024 at 10:23:01AM -0500, Nathan Bossart wrote:
> On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote:
>> That does indeed seem like a saner approach.  Since we look up the relkind we
>> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids
>> since we already know that without the caller telling us?
> 
> Yeah.  It looks like that's been possible since commit 9a974cb, so I can
> write a prerequisite patch for this.

Here's a new patch set with this change.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


Re: improve performance of pg_dump --binary-upgrade

From: Daniel Gustafsson
> On 18 Apr 2024, at 22:28, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Apr 18, 2024 at 10:23:01AM -0500, Nathan Bossart wrote:
>> On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote:
>>> That does indeed seem like a saner approach.  Since we look up the relkind we
>>> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids
>>> since we already know that without the caller telling us?
>>
>> Yeah.  It looks like that's been possible since commit 9a974cb, so I can
>> write a prerequisite patch for this.
>
> Here's a new patch set with this change.

From a read-through they look good, a very nice performance improvement in an
important path.  I think it would be nice to have some comments on the
BinaryUpgradeClassOids struct (since the code using it is thousands of lines
away), and a comment on the if (oids == NULL) block explaining the caching.

--
Daniel Gustafsson




Re: improve performance of pg_dump --binary-upgrade

From: Nathan Bossart
On Thu, Apr 18, 2024 at 10:33:08PM +0200, Daniel Gustafsson wrote:
> From a read-through they look good, a very nice performance improvement in an
> important path.  I think it would be nice with some comments on the
> BinaryUpgradeClassOids struct (since the code using it is thousands of lines
> away), and a comment on the if (oids == NULL) block explaining the caching.

Added.  Thanks for reviewing!  Unfortunately, this one will have to sit for
a couple months...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


Re: improve performance of pg_dump --binary-upgrade

From: Nathan Bossart
I noticed that there are some existing examples of this sort of thing in
pg_dump (e.g., commit d5e8930), so I adjusted the patch to match the
surrounding style a bit better.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


Re: improve performance of pg_dump --binary-upgrade

From: Nathan Bossart
Committed.

-- 
nathan