pgsql: Fix assorted errors in pg_dump's handling of extendedstatistics - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Fix assorted errors in pg_dump's handling of extendedstatistics
Date
Msg-id E1ekwIl-0005ov-Cg@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix assorted errors in pg_dump's handling of extended statistics objects.

pg_dump supposed that a stats object necessarily shares the same schema
as its underlying table, and that it doesn't have a separate owner.
These things may have been true during early development of the feature,
but they are not true as of v10 release.

Failure to track the object's schema separately turns out to have only
limited consequences, because pg_get_statisticsobjdef() always schema-
qualifies the target object name in the generated CREATE STATISTICS command
(a decision out of step with the rest of ruleutils.c, but I digress).
Therefore the restored object would be in the right schema, so that the
only problem is that the TOC entry would be mislabeled as to schema.  That
could lead to wrong decisions for schema-selective restores, for example.

The ownership issue is a bit more serious: not only was the TOC entry
potentially mislabeled as to owner, but pg_dump didn't bother to issue an
ALTER OWNER command at all, so that after restore the stats object would
continue to be owned by the restoring superuser.

A final point is that decisions as to whether to dump a stats object or
not were driven by whether the underlying table was dumped or not.  While
that's not wrong on its face, it won't scale nicely to the planned future
extension to cross-table statistics.  Moreover, that design decision comes
out of the view of stats objects as being auxiliary to a particular table,
like a rule or trigger, which is exactly where the above problems came
from.  Since we're now treating stats objects more like independent objects
in their own right, they ought to behave like standalone objects for this
purpose too.  So change to using the generic selectDumpableObject() logic
for them (which presently amounts to "dump if containing schema is to be
dumped").

Along the way to fixing this, restructure so that getExtendedStatistics
collects the identity info (only) for all extended stats objects in one
query, and then for each object actually being dumped, we retrieve the
definition in dumpStatisticsExt.  This is necessary to ensure that
schema-qualification in the generated CREATE STATISTICS command happens
with respect to the search path that pg_dump will now be using at restore
time (ie, the schema the stats object is in, not that of the underlying
table).  It's probably also significantly faster in the typical scenario
where only a minority of tables have extended stats.

Back-patch to v10 where extended stats were introduced.

Discussion: https://postgr.es/m/18272.1518328606@sss.pgh.pa.us

Branch
------
REL_10_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/1298fccefc9cf8090f57aba72244e26ddbc62f84

Modified Files
--------------
src/bin/pg_dump/common.c             |   2 +-
src/bin/pg_dump/pg_backup_archiver.c |   5 +-
src/bin/pg_dump/pg_dump.c            | 132 ++++++++++++++++-------------------
src/bin/pg_dump/pg_dump.h            |   5 +-
src/bin/pg_dump/t/002_pg_dump.pl     |   2 +-
5 files changed, 68 insertions(+), 78 deletions(-)


pgsql-committers by date:

Previous
From: Tom Lane
Date:
Subject: pgsql: Avoid premature free of pass-by-reference CALL arguments.
Next
From: Bruce Momjian
Date:
Subject: pgsql: psql: give ^D hint for \q in place where \q is ignored