Re: Statistics Import and Export - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Statistics Import and Export
Date
Msg-id vhlnlrhgqaqxcjioyeyjoxuqzaccvsrbgeuqclalzqg5sp4u2k@42sqfxto23gg
In response to Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
List pgsql-hackers
Hi,

On 2025-03-06 12:04:25 -0500, Corey Huinker wrote:
> > > If there's value in freeing them, why isn't it being done already? What
> > > other thing would consume this freed memory?
> >
> > I'm not saying that they can be freed, they can't right now. My point is
> > just that we *already* keep all the stats in memory, so the fact that
> > fetching all stats in a single query would also require keeping them in
> > memory is not an issue.
> >
> 
> That's true in cases where we're not filtering schemas or tables. We fetch
> the pg_class stats as a part of getTables, but those are small, and not a
> part of the query in question.
> 
> Fetching all the pg_stats for a db when we only want one table could be a
> nasty performance regression

I don't think anybody argued that we should fetch all stats regardless of
filtering for the to-be-dumped tables.


> and we can't just filter on the oids of the tables we want, because those
> tables can have expression indexes, so the oid filter would get complicated
> quickly.

I don't follow. pg_dump already collects the table names, schema names and
oids of the to-be-dumped tables/indexes, so isn't all that's needed to send a
list of those to the server and filter there?
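
A minimal sketch of that kind of server-side filtering, under stated
assumptions: the helper name `stats_batches`, the batch size and the query
text are hypothetical and not taken from pg_dump; only the `pg_stats` view
and its `schemaname`/`tablename` columns come from PostgreSQL itself. Since
`pg_stats` lists statistics for an expression index under the index's own
name, index names can go in the same list as table names. The sketch only
builds the query parameters; executing it is left to whatever driver is used.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical query text (NOT the query pg_dump uses). $1 and $2 are
# PostgreSQL's native positional parameters, each bound to a text[] array.
STATS_QUERY = """
SELECT s.*
FROM pg_catalog.pg_stats AS s
JOIN unnest($1::text[], $2::text[]) AS u(schemaname, tablename)
  ON s.schemaname = u.schemaname AND s.tablename = u.tablename
ORDER BY s.schemaname, s.tablename, s.attname
"""

Relation = Tuple[str, str]  # (schema name, table or index name)


def stats_batches(
    relations: Sequence[Relation], batch_size: int = 100
) -> Iterator[Tuple[str, Tuple[List[str], List[str]]]]:
    """Yield (query, (schemas, names)) pairs, one per batch of relations.

    Only the relations selected for dumping are sent to the server, and
    splitting them into batches bounds how many rows come back at once.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(relations), batch_size):
        chunk = relations[start:start + batch_size]
        schemas = [schema for schema, _ in chunk]
        names = [name for _, name in chunk]
        yield STATS_QUERY, (schemas, names)
```

The two arrays are kept parallel so that each (schema, name) pair stays
matched after `unnest`; filtering on names alone would pick up same-named
relations in other schemas.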


> > But TBH, I do wonder how much the current memory usage of the statistics
> > dump/restore support is going to bite us. In some cases this will
> > dramatically increase pg_dump/pg_upgrade's memory usage, my tests were
> > with tiny amounts of data and very simple scalar datatypes and you already
> > could see a substantial increase. With something like postgis or even just
> > a lot of jsonb columns this is going to be way worse.
> >
> 
> Yes, it will cost us in pg_dump, but it will save customers from some long
> ANALYZE operations.

My concern is that it might prevent some upgrades from *ever* completing,
because of pg_dump running out of memory.

Greetings,

Andres Freund


