Hi,
On 2025-03-05 20:54:35 -0500, Corey Huinker wrote:
> It's been considered and not ruled out, with a "let's see how the simple
> thing works, first" approach. Considerations are:
>
> * pg_stats is keyed on schemaname + tablename (which can also be indexes)
> and we need to use that because of the security barrier
I don't think that has to be a big issue, you can just make the the query
query multiple tables at once using an = ANY(ARRAY[]) expression or such.
> * The stats data is kinda heavy (most common value lists, most common
> elements lists, esp for high stattargets), which would be a considerable
> memory impact and some of those stats might not even be needed (example,
> index stats for a table that is filtered out)
Doesn't the code currently have this problem already? Afaict the stats are
currently all stored in memory inside pg_dump.
$ for opt in '' --no-statistics; do echo "using option $opt"; for dbname in pgbench_part_100 pgbench_part_1000
pgbench_part_10000;do echo $dbname; /usr/bin/time -f 'Max RSS kB: %M' ./src/bin/pg_dump/pg_dump --no-data
--quote-all-identifiers--no-sync --no-data $opt $dbname -Fp > /dev/null;done;done
using option
pgbench_part_100
Max RSS kB: 12780
pgbench_part_1000
Max RSS kB: 22700
pgbench_part_10000
Max RSS kB: 124224
using option --no-statistics
pgbench_part_100
Max RSS kB: 12648
pgbench_part_1000
Max RSS kB: 19124
pgbench_part_10000
Max RSS kB: 85068
I don't think the query itself would be a problem, a query querying all the
required stats should probably use PQsetSingleRowMode() or
PQsetChunkedRowsMode().
Greetings,
Andres Freund