Re: Statistics Import and Export - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Statistics Import and Export |
Date | |
Msg-id | CAExHW5srxP5MOYD8brXxQqYfjWEUwyxVR0-rA505aN4UjH+KmA@mail.gmail.com Whole thread Raw |
In response to | Re: Statistics Import and Export (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
List | pgsql-hackers |
Hi Corey,
On Mon, Mar 25, 2024 at 3:38 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Hi Corey,On Sat, Mar 23, 2024 at 7:21 AM Corey Huinker <corey.huinker@gmail.com> wrote:v12 attached.
0001 -Some random comments+SELECT
+ format('SELECT pg_catalog.pg_set_attribute_stats( '
+ || 'relation => %L::regclass::oid, attname => %L::name, '
+ || 'inherited => %L::boolean, null_frac => %L::real, '
+ || 'avg_width => %L::integer, n_distinct => %L::real, '
+ || 'most_common_vals => %L::text, '
+ || 'most_common_freqs => %L::real[], '
+ || 'histogram_bounds => %L::text, '
+ || 'correlation => %L::real, '
+ || 'most_common_elems => %L::text, '
+ || 'most_common_elem_freqs => %L::real[], '
+ || 'elem_count_histogram => %L::real[], '
+ || 'range_length_histogram => %L::text, '
+ || 'range_empty_frac => %L::real, '
+ || 'range_bounds_histogram => %L::text) ',
+ 'stats_export_import.' || s.tablename || '_clone', s.attname,
+ s.inherited, s.null_frac,
+ s.avg_width, s.n_distinct,
+ s.most_common_vals, s.most_common_freqs, s.histogram_bounds,
+ s.correlation, s.most_common_elems, s.most_common_elem_freqs,
+ s.elem_count_histogram, s.range_length_histogram,
+ s.range_empty_frac, s.range_bounds_histogram)
+FROM pg_catalog.pg_stats AS s
+WHERE s.schemaname = 'stats_export_import'
+AND s.tablename IN ('test', 'is_odd')
+\gexec
Why do we need to construct the command and execute? Can we instead execute the function directly? That would also avoid ECHO magic.
Addressed in v15
+ <table id="functions-admin-statsimport">
+ <title>Database Object Statistics Import Functions</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ Function
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
COMMENT: The functions throw many validation errors. Do we want to list the acceptable/unacceptable input values in the documentation corresponding to those? I don't expect one line per argument validation. Something like "these, these and these arguments can not be NULL" or "both arguments in each of the pairs x and y, a and b, and c and d should be non-NULL or NULL respectively".
Addressed in v15.
+ /* Statistics are dependent on the definition, not the data */
+ /* Views don't have stats */
+ if ((tbinfo->dobj.dump & DUMP_COMPONENT_STATISTICS) &&
+ (tbinfo->relkind == RELKIND_VIEW))
+ dumpRelationStats(fout, &tbinfo->dobj, reltypename,
+ tbinfo->dobj.dumpId);
+
Statistics are about data. Whenever pg_dump dumps some filtered data, the
statistics collected for the whole table are uselss. We should avoide dumping
statistics in such a case. E.g. when only schema is dumped what good is
statistics? Similarly the statistics on a partitioned table may not be useful
if some its partitions are not dumped. Said that dumping statistics on foreign
table makes sense since they do not contain data but the statistics still makes sense.
Dumping statistics without data is required for pg_upgrade. This is being discussed in the same thread. But I don't see some of the suggestions e.g. using binary-mode switch being used in v15.
Also, should we handle sequences, composite types the same way? THe latter is probably not dumped, but in case.
Whether or not I pass --no-statistics, there is no difference in the dump output. Am I missing something?$ pg_dump -d postgres > /tmp/dump_no_arguments.out
$ pg_dump -d postgres --no-statistics > /tmp/dump_no_statistics.out
$ diff /tmp/dump_no_arguments.out /tmp/dump_no_statistics.out$IIUC, pg_dump includes statistics by default. That means all our pg_dump related tests will have statistics output by default. That's good since the functionality will always be tested. 1. We need additional tests to ensure that the statistics is installed after restore. 2. Some of those tests compare dumps before and after restore. In case the statistics is changed because of auto-analyze happening post-restore, these tests will fail.
Fixed.
Thanks for addressing those comments.
Best Wishes,
Ashutosh Bapat
pgsql-hackers by date: