Hi,
On 2025-02-21 15:49:10 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Which to me rather strongly suggests pg_dump has gotten a *lot* slower with
> > this change.
>
> Well, it's doing strictly more work, so somewhat slower is to be
> expected.
Yea, if we had been talking about a few percent, I'd not have balked. It's
more like 2-4x though, and it'll probably be worse when not connecting over a
local TCP connection.
This is a slowdown to the point that the downtime for pg_upgrade will be
substantially lengthened compared to before. But I think we should be able to
address that to a large degree.
> In a quick look at the committed patch, it doesn't seem to have
> used any of the speedup strategies we applied to pg_dump a couple
> of years ago. One or the other of these should help:
>
> * Issue a single query to fetch stats from every table we're dumping
> * Set up a prepared query to avoid re-planning the per-table query
> (compare be85727a3)
>
> I'm not sure how workable the first of these would be though.
> It's not hard to imagine it blowing out pg_dump's memory usage
> for a DB with a lot of tables and high default_statistics_target.
We could presumably do the one-query approach for the relation stats; that's
just three integers per table. That way we'd at least not end up with two
queries for each table (one for pg_class.reltuples etc. and one for pg_stats).
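Untested, but roughly the shape I'm thinking of for the relation-level stats.
The function name, the unnest-over-an-OID-array trick and the column list are
just placeholders to illustrate the idea, not the committed code; pg_dump
could equally build a VALUES list or piggyback on an existing catalog query:

#include <libpq-fe.h>

/*
 * Fetch the relation-level stats for a whole batch of tables with a single
 * query, keyed by OID, instead of issuing one query per table.
 */
static PGresult *
fetch_relation_stats(PGconn *conn, const char *oid_array)
{
	/* oid_array is an array literal like "{16384,16390,...}" */
	const char *values[1] = {oid_array};

	return PQexecParams(conn,
						"SELECT c.oid, c.reltuples, c.relpages, c.relallvisible "
						"FROM pg_catalog.pg_class c "
						"JOIN unnest($1::pg_catalog.oid[]) AS t(reloid) "
						"ON c.oid = t.reloid",
						1, NULL, values, NULL, NULL, 0);
}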
I guess the memory usage could also be addressed by using COPY, but that's
probably unrealistically complicated.
> The second one should be relatively downside-free.
Yea. And at least with pg_dump running locally, that's where a lot of the CPU
time is spent.
Doing lots of one-by-one queries against a remote server will still hurt,
though, even with prepared statements.
One way to largely address that would be to combine a prepared statement with
libpq pipelining. That still incurs separate executor startup etc. for every
table, but I think it should reduce the cost to the point where we don't care
anymore.
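Untested sketch of what I mean; the statement name, the query text and the
function are placeholders, and the real thing would obviously live in pg_dump
and reuse its query-building machinery. The point is just: prepare once, queue
one execution per table in pipeline mode, and only then read the results back,
so we pay one round trip per batch rather than per table:

#include <stdio.h>
#include <libpq-fe.h>

static void
dump_stats_pipelined(PGconn *conn,
					 const char *const *schemanames,
					 const char *const *tablenames,
					 int ntables)
{
	PGresult   *res;

	/* Prepare once, so we don't re-plan the query for every table. */
	res = PQprepare(conn, "getAttributeStats",
					"SELECT * FROM pg_catalog.pg_stats "
					"WHERE schemaname = $1 AND tablename = $2",
					2, NULL);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "prepare failed: %s", PQerrorMessage(conn));
		PQclear(res);
		return;
	}
	PQclear(res);

	if (PQenterPipelineMode(conn) != 1)
		return;

	/*
	 * Queue all executions without waiting for any results.  A real
	 * implementation would likely bound the batch size (or use a
	 * nonblocking socket) so both sides can't block on full buffers.
	 */
	for (int i = 0; i < ntables; i++)
	{
		const char *values[2] = {schemanames[i], tablenames[i]};

		PQsendQueryPrepared(conn, "getAttributeStats",
							2, values, NULL, NULL, 0);
	}
	PQpipelineSync(conn);

	/* Drain the results: one result set per queued query ... */
	for (int i = 0; i < ntables; i++)
	{
		while ((res = PQgetResult(conn)) != NULL)
		{
			if (PQresultStatus(res) == PGRES_TUPLES_OK)
			{
				/* ... emit the stats for this table ... */
			}
			PQclear(res);
		}
	}

	/* ... then the PGRES_PIPELINE_SYNC marker for the sync point. */
	res = PQgetResult(conn);
	PQclear(res);
	PQexitPipelineMode(conn);
}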
Greetings,
Andres Freund