Re: Statistics Import and Export - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Statistics Import and Export
Date
Msg-id CAExHW5srxP5MOYD8brXxQqYfjWEUwyxVR0-rA505aN4UjH+KmA@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
Hi Corey,


On Mon, Mar 25, 2024 at 3:38 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Hi Corey,


On Sat, Mar 23, 2024 at 7:21 AM Corey Huinker <corey.huinker@gmail.com> wrote:
v12 attached.

0001 - 


Some random comments

+SELECT
+    format('SELECT pg_catalog.pg_set_attribute_stats( '
+            || 'relation => %L::regclass::oid, attname => %L::name, '
+            || 'inherited => %L::boolean, null_frac => %L::real, '
+            || 'avg_width => %L::integer, n_distinct => %L::real, '
+            || 'most_common_vals => %L::text, '
+            || 'most_common_freqs => %L::real[], '
+            || 'histogram_bounds => %L::text, '
+            || 'correlation => %L::real, '
+            || 'most_common_elems => %L::text, '
+            || 'most_common_elem_freqs => %L::real[], '
+            || 'elem_count_histogram => %L::real[], '
+            || 'range_length_histogram => %L::text, '
+            || 'range_empty_frac => %L::real, '
+            || 'range_bounds_histogram => %L::text) ',
+        'stats_export_import.' || s.tablename || '_clone', s.attname,
+        s.inherited, s.null_frac,
+        s.avg_width, s.n_distinct,
+        s.most_common_vals, s.most_common_freqs, s.histogram_bounds,
+        s.correlation, s.most_common_elems, s.most_common_elem_freqs,
+        s.elem_count_histogram, s.range_length_histogram,
+        s.range_empty_frac, s.range_bounds_histogram)
+FROM pg_catalog.pg_stats AS s
+WHERE s.schemaname = 'stats_export_import'
+AND s.tablename IN ('test', 'is_odd')
+\gexec

Why do we need to construct the command and execute? Can we instead execute the function directly? That would also avoid ECHO magic.

Addressed in v15
 

+   <table id="functions-admin-statsimport">
+    <title>Database Object Statistics Import Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>

COMMENT: The functions throw many validation errors. Do we want to list the acceptable/unacceptable input values in the documentation corresponding to those? I don't expect one line per argument validation. Something like "these, these and these arguments can not be NULL" or "both arguments in each of the pairs x and y, a and b, and c and d should be non-NULL or NULL respectively".

Addressed in v15.
 
+ /* Statistics are dependent on the definition, not the data */
+ /* Views don't have stats */
+ if ((tbinfo->dobj.dump & DUMP_COMPONENT_STATISTICS) &&
+ (tbinfo->relkind == RELKIND_VIEW))
+ dumpRelationStats(fout, &tbinfo->dobj, reltypename,
+  tbinfo->dobj.dumpId);
+

Statistics are about data. Whenever pg_dump dumps some filtered data, the
statistics collected for the whole table are uselss. We should avoide dumping
statistics in such a case. E.g. when only schema is dumped what good is
statistics? Similarly the statistics on a partitioned table may not be useful
if some its partitions are not dumped. Said that dumping statistics on foreign
table makes sense since they do not contain data but the statistics still makes sense.

Dumping statistics without data is required for pg_upgrade. This is being discussed in the same thread. But I don't see some of the suggestions e.g. using binary-mode switch being used in v15.

Also, should we handle sequences, composite types the same way? THe latter is probably not dumped, but in case.
 

Whether or not I pass --no-statistics, there is no difference in the dump output. Am I missing something?
$ pg_dump -d postgres > /tmp/dump_no_arguments.out
$ pg_dump -d postgres --no-statistics > /tmp/dump_no_statistics.out
$ diff /tmp/dump_no_arguments.out /tmp/dump_no_statistics.out
$

IIUC, pg_dump includes statistics by default. That means all our pg_dump related tests will have statistics output by default. That's good since the functionality will always be tested. 1. We need additional tests to ensure that the statistics is installed after restore. 2. Some of those tests compare dumps before and after restore. In case the statistics is changed because of auto-analyze happening post-restore, these tests will fail.

Fixed.

Thanks for addressing those comments.
 
--
Best Wishes,
Ashutosh Bapat

pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Re: Fix parameters order for relation_copy_for_cluster
Next
From: Amit Kapila
Date:
Subject: Re: Synchronizing slots from primary to standby