[PROPOSAL] Backup and recovery of pg_statistic - Mailing list pgsql-hackers

From Dmitry Ivanov
Subject [PROPOSAL] Backup and recovery of pg_statistic
Date
Msg-id 724322880.K8vzik8zPz@abook
Whole thread Raw
Responses Re: [PROPOSAL] Backup and recovery of pg_statistic
Re: [PROPOSAL] Backup and recovery of pg_statistic
List pgsql-hackers
Hi hackers,

This is my proposal for 'pg_dump' utility enhancements which enable backup and
recovery of statistical data stored in the 'pg_statistic' table. Statistical
data is very valuable for the query planner, so support engineers have to
manually generate it using ANALYZE whenever they migrate or restore a database
from backup, which is unpleasant. This proposal aims to improve the overall
situation.


Problem description
===================

Currently there is no way to backup 'pg_statistic' because columns of
'anyarray' type cannot be reconstructed solely with their textual
representation. Meanwhile, all that is needed to solve this problem is a small
extension capable of retrieving the element type of an 'anyarray' object and
recreating this particular 'anyarray' object using the 'array_in' procedure.
Another vital feature is the ability to transform various object identificators
that are stored in this table to textual representations in order to make them
portable. Such functionality could be easily implemented, which is why I've
made up a tiny proof of concept extension that is able to demonstrate the
possibility of recovery.


Design
======

Several things come to mind when we think of the 'pg_statistic' recovery:

* The procedure written in the C language is needed in order to determine the
element type of a composite 'anyarray' type. The returned 'oid' will be used
to reconstruct the 'anyarray' object using the 'array_in' procedure.

    arr := array_in('{1,2,3}', 'text'::regtype::oid, -1);
    anyarray_elemtype(arr) -> 25 ('text'::regtype::oid)

* The columns 'starelid' (relation identifier) and 'staop' (operator identifier)
have type 'oid', so their values could be invalid within a new DB, because the
object IDs of newly recovered relations and operators might have changed
during recovery. These kinds of values should be substituted with proper casts
to internal types, for example:

    65554 (relid) -> 'public.test'::regclass::oid
    15 (staopN) -> 'public.=(pg_catalog.=(pg_catalog.int4, pg_catalog.int8)'

Note that every type is schema-qualified in order to avoid naming conflicts.

* The type of a column which is referenced by the 'staattnum' column also
needs to be checked for the sake of consistency. It should remain the same,
otherwise collected stats won't be of any use.

* The main procedure will simply generate a bunch of INSERT queries (one query
per each row of the 'pg_statistic') which can be saved to a text file.


Proof of concept
================

Interface
---------

Currently there's a PoC extension which contains several functions:

dump_statistic() - returns a set of INSERT queries;

anyarray_elemtype(anyarray) - returns the object identificator of an element
type;

to_schema_qualified_operator(opid oid) - converts the 'opid' (operator ID) to a
schema-qualified operator name;
to_schema_qualified_relname(relid oid) - converts the 'relid' (relation ID) to
a schema-qualified relation name;
to_schema_qualified_type(typid oid) - converts the 'typid' (type ID) to a
schema-qualified type name;

to_attname(rel text, colnum smallint) - returns the name of the Nth column of
the specified table 'rel';
to_attnum(rel text, col text) - converts the table name 'rel' and the column
name 'col' to a column number;

to_atttype(rel text, col text) - returns the type of the column 'col';
to_atttype(rel text, colnum smallint) - overloaded for the column number
'colnum';

The extension is compatible with versions 9.4 and above.

Usage example
-------------

DB=# \copy (select dump_statistic()) to 'stat_backup.sql'
$ psql DB < stat_backup.sql


Proposed changes to pg_dump
===========================

Now that the approach has been developed, it may be applied to improve the
'pg_dump' utility. Some minor code changes would make the 'pg_dump' emit
specially-formed recovery INSERTS for 'pg_statistic' in the 'binary-upgrade'
mode, thus allowing us to restore saved stats after an upgrade.


Conclusion
==========

I've attached a tarball with sources so that anyone could try the extension.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: Use pg_rewind when target timeline was switched
Next
From: Tom Lane
Date:
Subject: Re: Errors in our encoding conversion tables