On Wed, Mar 5, 2025 at 9:18 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-03-05 20:54:35 -0500, Corey Huinker wrote: > It's been considered and not ruled out, with a "let's see how the simple > thing works, first" approach. Considerations are: > > * pg_stats is keyed on schemaname + tablename (which can also be indexes) > and we need to use that because of the security barrier
I don't think that has to be a big issue, you can just make the the query query multiple tables at once using an = ANY(ARRAY[]) expression or such.
I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are you suggesting we back the joins from pg_stats to pg_namespace and pg_class and then filter by oids?
> * The stats data is kinda heavy (most common value lists, most common > elements lists, esp for high stattargets), which would be a considerable > memory impact and some of those stats might not even be needed (example, > index stats for a table that is filtered out)
Doesn't the code currently have this problem already? Afaict the stats are currently all stored in memory inside pg_dump.
Each call to getAttributeStats() fetches the pg_stats for one and only one relation and then writes the SQL call to fout, then discards the result set once all the attributes of the relation are done.
I don't think the query itself would be a problem, a query querying all the required stats should probably use PQsetSingleRowMode() or PQsetChunkedRowsMode().
That makes sense if we get the attribute stats from the result set in the order that we need them, and I don't know how we could possibly do that. We'd still need a table to bsearch() and that would be huge.