pgsql: Improve pg_dump's dependency-sorting logic to enforce section du - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Improve pg_dump's dependency-sorting logic to enforce section du
Date
Msg-id E1SjKTJ-0003bW-3w@gemulon.postgresql.org
List pgsql-committers
Improve pg_dump's dependency-sorting logic to enforce section dump order.

As of 9.2, with the --section option, it is very important that the concept
of "pre data", "data", and "post data" sections of the output be honored
strictly; else a dump divided into separate sectional files might be
unrestorable.  However, the dependency-sorting logic knew nothing of
sections and would happily select output orderings that didn't fit that
structure.  Doing so was mostly harmless before 9.2, but now we need to be
sure it doesn't do that.  To fix, create dummy objects representing the
section boundaries and add dependencies between them and all the normal
objects.  (This might sound expensive but it seems to only add a percent or
two to pg_dump's runtime.)
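
As a rough illustration of the boundary idea (not pg_dump's actual code, which is C and has its own sorter), the sketch below uses Python's graphlib.  The function name, section labels, and boundary node names are hypothetical.  Pre-data objects must precede the first boundary, data objects sit between the two boundaries, and post-data objects follow the second one; an ordinary topological sort then enforces section order.

```python
from graphlib import TopologicalSorter

# Hypothetical boundary node names; not taken from pg_dump.
PRE_END = "<end of pre-data>"
POST_START = "<start of post-data>"


def sort_with_boundaries(objects, deps):
    """Order objects so dependencies and section order both hold.

    objects: dict mapping object name -> "pre-data", "data" or "post-data"
    deps:    dict mapping object name -> set of names it depends on
    """
    # graphlib expects node -> set of nodes that must come before it.
    graph = {name: set(deps.get(name, ())) for name in objects}
    graph[PRE_END] = set()
    graph[POST_START] = {PRE_END}  # keeps boundaries ordered even with no data

    for name, section in objects.items():
        if section == "pre-data":
            graph[PRE_END].add(name)      # boundary waits for the object
        elif section == "data":
            graph[name].add(PRE_END)      # object waits for first boundary
            graph[POST_START].add(name)   # second boundary waits for object
        else:
            graph[name].add(POST_START)   # object waits for second boundary

    ordered = TopologicalSorter(graph).static_order()
    return [name for name in ordered if name in objects]


objects = {
    "index t_idx": "post-data",
    "table data t": "data",
    "table t": "pre-data",
}
deps = {"index t_idx": {"table t"}, "table data t": {"table t"}}
print(sort_with_boundaries(objects, deps))
```

The extra cost is one edge per object plus two dummy nodes, which fits the small runtime overhead mentioned above.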

This also fixes a problem introduced in 9.1 by the feature that allows
incomplete GROUP BY lists when a primary key is given in GROUP BY.
That means that views can depend on primary key constraints.  Previously,
pg_dump would deal with that by simply emitting the primary key constraint
before the view definition (and hence before the data section of the
output).  That's bad enough for simple serial restores, where creating an
index before the data is loaded works but slows the restore.  Worse, it
could lead to outright failure of parallel restores, as
seen in bug #6699 from Joe Van Dyk.  That happened because pg_restore would
switch into parallel mode as soon as it reached the constraint, and then
very possibly would try to emit the view definition before the primary key
was committed (as a consequence of another bug that causes the view not to
be correctly marked as depending on the constraint).  Adding the section
boundary constraints forces the dependency-sorting code to break the view
into separate table and rule declarations, allowing the rule, and hence the
primary key constraint it depends on, to revert to their intended location
in the post-data section.  This also somewhat accidentally works around the
bogus-dependency-marking problem, because the rule will be correctly shown
as depending on the constraint, so parallel pg_restore will now do the
right thing.  (We will fix the bogus-dependency problem for real in a
separate patch, but that patch is not easily back-portable to 9.1, so the
fact that this patch is enough to dodge the only known symptom is
fortunate.)
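
The view case can be sketched the same way.  This is an illustrative model only, with hypothetical names and section ranks: a view kept whole is a pre-data object that needs a post-data constraint, which forms a loop through the section boundaries, while splitting it into a pre-data shell and a post-data rule sorts cleanly.

```python
from graphlib import CycleError, TopologicalSorter

PRE, DATA, POST = 0, 1, 2  # hypothetical section ranks


def dump_order(objects, deps):
    """objects: name -> section rank; deps: name -> set of prerequisites."""
    graph = {name: set(deps.get(name, ())) for name in objects}
    pre_end, post_start = "<pre-data end>", "<post-data start>"
    # Each boundary waits for everything in the section before it.
    graph[pre_end] = {n for n, s in objects.items() if s == PRE}
    graph[post_start] = {pre_end} | {n for n, s in objects.items() if s == DATA}
    for name, section in objects.items():
        if section == DATA:
            graph[name].add(pre_end)
        elif section == POST:
            graph[name].add(post_start)
    return [n for n in TopologicalSorter(graph).static_order() if n in objects]


base = {"table t": PRE, "data for t": DATA, "pkey t_pkey": POST}
base_deps = {"data for t": {"table t"}, "pkey t_pkey": {"table t"}}

# View kept whole: a pre-data object that needs a post-data constraint.
try:
    dump_order({**base, "view v": PRE},
               {**base_deps, "view v": {"table t", "pkey t_pkey"}})
    whole_view_sortable = True
except CycleError:
    whole_view_sortable = False

# View split: the shell stays in pre-data, the rule moves to post-data.
split_order = dump_order(
    {**base, "view v (table shell)": PRE, "rule for v": POST},
    {**base_deps,
     "rule for v": {"view v (table shell)", "table t", "pkey t_pkey"}},
)
print(whole_view_sortable, split_order)
```

In the split ordering the rule lands after the primary key constraint, and both land after the data, which is the placement described above.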

Back-patch to 9.1, except for the hunk that adds verification that the
finished archive TOC list is in correct section order; the place where
it was convenient to add that doesn't exist in 9.1.
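
A check of this kind can be sketched in a few lines.  The following is an assumption-laden illustration, not the code added to pg_backup_archiver.c: the entry layout, the function name, and the choice to raise an error rather than report the problem some other way are all hypothetical.

```python
# Hypothetical section ranks, in the order sections must appear.
SECTION_RANK = {"pre-data": 0, "data": 1, "post-data": 2}


def check_section_order(toc):
    """Raise ValueError if any entry belongs to an earlier section than
    an entry that precedes it.

    toc: list of (description, section) tuples in archive order.
    """
    highest = 0
    for description, section in toc:
        rank = SECTION_RANK[section]
        if rank < highest:
            raise ValueError(
                f"{description!r} is in section {section!r} "
                "but follows an entry from a later section"
            )
        highest = rank


good_toc = [
    ("TABLE t", "pre-data"),
    ("TABLE DATA t", "data"),
    ("CONSTRAINT t_pkey", "post-data"),
]
check_section_order(good_toc)  # passes silently
```

Running such a check on the finished list catches ordering mistakes before an archive is split into sectional files.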

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/a1ef01fe163b304760088e3e30eb22036910a495

Modified Files
--------------
src/bin/pg_dump/pg_backup_archiver.c |   34 +++++++++
src/bin/pg_dump/pg_dump.c            |  135 +++++++++++++++++++++++++++++++++-
src/bin/pg_dump/pg_dump.h            |    7 +-
src/bin/pg_dump/pg_dump_sort.c       |   92 +++++++++++++++++------
4 files changed, 238 insertions(+), 30 deletions(-)

