Thread: More performance improvements for pg_dump in binary upgrade mode

More performance improvements for pg_dump in binary upgrade mode

From
Daniel Gustafsson
Date:
Prompted by an off-list bugreport of pg_upgrade hanging (which turned out to be
slow enough to be perceived to hang) for large schemas I had a look at pg_dump
performance during --binary-upgrade mode today.  My initial take was to write
more or less exactly what Nathan did in [0], only to realize that it was a)
already proposed and b) I had even reviewed it.  Doh.

The next attempt was to reduce more per-object queries from binary upgrade, and
the typarray lookup binary_upgrade_set_type_oids_by_type_oid seemed like a good
candidate for a cache lookup.  Since already cache type TypeInfo objects, if we
add typarray to TypeInfo we can use the existing lookup code.

As a baseline, pg_dump dumps a synthetic workload of 10,000 (empty) relations
with a width of 1-10 columns:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --format=custom \
  --file a postgres > /dev/null

real    0m1.256s
user    0m0.273s
sys    0m0.059s

The same dump in binary upgrade mode runs significantly slower:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
  --format=custom --file a postgres > /dev/null

real    1m9.921s
user    0m0.782s
sys    0m0.436s

With the typarray caching from the patch attached here added:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
  --format=custom --file b postgres > /dev/null

real    0m45.210s
user    0m0.655s
sys    0m0.299s

With the typarray caching from the patch attached here added *and* Nathan's
patch from [0] added:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
  --format=custom --file a postgres > /dev/null

real    0m1.566s
user    0m0.309s
sys    0m0.080s

The combination of these patches thus puts binary uphrade mode almost on par
with a plain dump, which has the potential to make upgrades of large schemas
faster.  Parallel-parking this patch with Nathan's in the July CF, just wanted
to type it up while it was fresh in my mind.

--
Daniel Gustafsson

[0] https://commitfest.postgresql.org/48/4936/


Attachment
On Wed, May 15, 2024 at 10:15:13PM +0200, Daniel Gustafsson wrote:
> With the typarray caching from the patch attached here added *and* Nathan's
> patch from [0] added:
> 
> $ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
>   --format=custom --file a postgres > /dev/null
> 
> real    0m1.566s
> user    0m0.309s
> sys    0m0.080s
> 
> The combination of these patches thus puts binary uphrade mode almost on par
> with a plain dump, which has the potential to make upgrades of large schemas
> faster.  Parallel-parking this patch with Nathan's in the July CF, just wanted
> to type it up while it was fresh in my mind.

Nice!  I'll plan on taking a closer look at this one.  I have a couple
other ideas in-flight (e.g., parallelizing the once-in-each-database
operations with libpq's asynchronous APIs) that I'm hoping to post soon,
too.  v18 should have a lot of good stuff for pg_upgrade...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: More performance improvements for pg_dump in binary upgrade mode

From
Nathan Bossart
Date:
On Wed, May 15, 2024 at 03:21:36PM -0500, Nathan Bossart wrote:
> Nice!  I'll plan on taking a closer look at this one.

LGTM.  I've marked the commitfest entry as ready-for-committer.

-- 
nathan



Re: More performance improvements for pg_dump in binary upgrade mode

From
Daniel Gustafsson
Date:
> On 5 Jun 2024, at 04:39, Nathan Bossart <nathandbossart@gmail.com> wrote:
> 
> On Wed, May 15, 2024 at 03:21:36PM -0500, Nathan Bossart wrote:
>> Nice!  I'll plan on taking a closer look at this one.
> 
> LGTM.  I've marked the commitfest entry as ready-for-committer.

Thanks for review, committed.

--
Daniel Gustafsson