Hi,
On 2025-03-06 12:04:25 -0500, Corey Huinker wrote:
> > > If there's value in freeing them, why isn't it being done already? What
> > > other thing would consume this freed memory?
> >
> > I'm not saying that they can be freed; they can't right now. My point is
> > just that we *already* keep all the stats in memory, so the fact that
> > fetching all stats in a single query would also require keeping them in
> > memory is not an issue.
> >
>
> That's true in cases where we're not filtering schemas or tables. We fetch
> the pg_class stats as a part of getTables, but those are small, and not a
> part of the query in question.
>
> Fetching all the pg_stats for a db when we only want one table could be a
> nasty performance regression

I don't think anybody argued that we should fetch all stats regardless of
filtering for the to-be-dumped tables.

> and we can't just filter on the oids of the tables we want, because those
> tables can have expression indexes, so the oid filter would get complicated
> quickly.

I don't follow. We already have the tablenames, schemanames and oids of the
to-be-dumped tables/indexes collected in pg_dump; all that's needed is to send
a list of those to the server and do the filtering there?
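
Just to illustrate the shape of the filter I have in mind, something along
these lines (an untested sketch; the schema/relation names are placeholders
for the list pg_dump has already collected, and the exact column list and
source view would need to match what the stats dump code actually reads):

    -- 'public'/'some_table'/'some_expr_index' are placeholders for the
    -- schema/relation names pg_dump has already collected
    SELECT s.*
    FROM pg_catalog.pg_stats s
    JOIN (VALUES ('public', 'some_table'),
                 ('public', 'some_expr_index')) AS wanted(nspname, relname)
      ON s.schemaname = wanted.nspname
     AND s.tablename = wanted.relname;

Expression indexes don't seem to make that harder, since their schema and
relation names are known to pg_dump just like the tables'.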
> > But TBH, I do wonder how much the current memory usage of the statistics
> > dump/restore support is going to bite us. In some cases this will
> > dramatically increase pg_dump/pg_upgrade's memory usage; my tests were
> > with tiny amounts of data and very simple scalar datatypes, and you could
> > already see a substantial increase. With something like postgis or even
> > just a lot of jsonb columns this is going to be way worse.
> >
>
> Yes, it will cost us in pg_dump, but it will save customers from some long
> ANALYZE operations.

My concern is that it might prevent some upgrades from *ever* completing,
because of pg_dump running out of memory.

Greetings,
Andres Freund