
From Andres Freund
Subject Re: Statistics Import and Export
Date
Msg-id qzwh4yv62shsxkkifluvvcgfxb6vxp7zydrhd5v4yk7bl7kl2f@3qg6xjsubtyh
In response to Re: Statistics Import and Export  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2025-02-21 15:49:10 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Which to me rather strongly suggests pg_dump has gotten a *lot* slower with
> > this change.
>
> Well, it's doing strictly more work, so somewhat slower is to be
> expected.

Yea, if we were talking a few percent, I'd not have balked.  It's more like
2-4x though, and it'll probably be worse when not connecting over local TCP
connections.

This is a slowdown to the point that the downtime for pg_upgrade will be
substantially lengthened compared to before.  But I think we should be able to
address that to a large degree.


> In a quick look at the committed patch, it doesn't seem to have
> used any of the speedup strategies we applied to pg_dump a couple
> of years ago.  One or the other of these should help:
>
> * Issue a single query to fetch stats from every table we're dumping
> * Set up a prepared query to avoid re-planning the per-table query
>   (compare be85727a3)
>
> I'm not sure how workable the first of these would be though.
> It's not hard to imagine it blowing out pg_dump's memory usage
> for a DB with a lot of tables and high default_statistics_target.

We could presumably do the one-query approach for the relation stats, that's
just three integers.  That way we'd at least not end up with two queries for
each table (for pg_class.reltuples etc and for pg_stats).
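For illustration, a single round trip for those per-relation counters could
look something like this (column names as in pg_catalog.pg_class; the
array-parameter shape is just a hypothetical sketch):

SELECT c.oid, c.relpages, c.reltuples, c.relallvisible
FROM pg_catalog.pg_class c
WHERE c.oid = ANY ($1::pg_catalog.oid[]);

pg_dump would collect the OIDs of the dumpable relations into the array
parameter and read all the counters back in one result set.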

I guess the memory usage could also be addressed by using COPY, but that's
probably unrealistically complicated.


> The second one should be relatively downside-free.

Yea. And at least with pg_dump running locally that's where a lot of the CPU
time is spent.

Remotely, doing lots of one-by-one queries will hurt even with prepared
statements, though.

One way to largely address that would be to use a prepared statement combined
with libpq pipelining.  That still has separate executor startup etc, but I
think it should still reduce the cost to a point where we don't care anymore.
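As a rough sketch of that combination (PG 14+ libpq pipeline API; the
statement name, query, and function shape are hypothetical, and error
handling is omitted):

/* Prepare once, then queue all per-table executions before reading results. */
static void
fetch_stats_pipelined(PGconn *conn,
                      const char *const *schemas,
                      const char *const *tables,
                      int nrels)
{
    PGresult   *res;

    /* planned once, reused for every relation */
    res = PQprepare(conn, "getstats",
                    "SELECT s.* FROM pg_catalog.pg_stats s "
                    "WHERE s.schemaname = $1 AND s.tablename = $2",
                    2, NULL);
    PQclear(res);

    PQenterPipelineMode(conn);

    /* queue everything without waiting for individual round trips */
    for (int i = 0; i < nrels; i++)
    {
        const char *params[2] = {schemas[i], tables[i]};

        PQsendQueryPrepared(conn, "getstats", 2, params, NULL, NULL, 0);
    }
    PQpipelineSync(conn);

    /* drain: each queued execution yields its result(s), then a NULL */
    for (int i = 0; i < nrels; i++)
    {
        while ((res = PQgetResult(conn)) != NULL)
        {
            /* emit the stats for this relation here */
            PQclear(res);
        }
    }
    res = PQgetResult(conn);    /* the PGRES_PIPELINE_SYNC marker */
    PQclear(res);
    PQexitPipelineMode(conn);
}

The point being that all the PQsendQueryPrepared() calls go out back-to-back,
so the per-table cost is mostly executor startup on the server rather than a
full network round trip.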

Greetings,

Andres Freund


