Re: Statistics Import and Export - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Statistics Import and Export
Date
Msg-id Z9c1rbzZegYQTOQE@nathan
Whole thread Raw
In response to Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: Statistics Import and Export
List pgsql-hackers
Thanks for working on this, Corey.

On Fri, Mar 14, 2025 at 04:03:16PM -0400, Corey Huinker wrote:
> 0003 -
> 
> Storing the restore function calls in the archive entry hogged a lot of
> memory and made people nervous. This introduces a new function pointer that
> generates those restore SQL calls right before they're written to disk,
> thus reducing the memory load from "stats for every object to be dumped" to
> just one object. Thanks to Nathan for diagnosing some weird quirks with
> various formats.
> 
> 0004 -
> 
> This replaces the query in the prepared statement with one that batches
> them 100 relations at a time, and then maintains that result set until it
> is consumed. It seems to have obvious speedups.

I've been doing a variety of tests with my toy database of 100K relations
[0], and I'm seeing around 20% less memory usage.  That's still 20% more
than without stats, but that's still a pretty nice improvement.

I'd propose two small changes to the design:

* I tested a variety of batch sizes, and to my suprise, I saw the best
  results with around 64 relations per batch.  I imagine the absolute best
  batch size will vary greatly depending on the workload.  It might also
  depend on work_mem and friends.

* The custom format actually does two WriteToc() calls, and since these
  patches move the queries to this part of pg_dump, it means we'll run all
  the queries twice.  The comments around this code suggest that the second
  pass isn't strictly necessary and that it is really only useful for
  data/parallel restore, so we could probably skip it for no-data dumps.

With those two changes, a pg_upgrade-style dump of my test database goes
from ~21.6 seconds without these patches to ~11.2 seconds with them.  For
reference, the same dump without stats takes ~7 seconds on HEAD.

[0] https://postgr.es/m/Z9R9-mFbxukqKmg4%40nathan

-- 
nathan



pgsql-hackers by date:

Previous
From: Shayon Mukherjee
Date:
Subject: Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)
Next
From: Peter Smith
Date:
Subject: Re: Tidy recent code bloat in pg_creatersubscriber::cleanup_objects_atexit