pg_dump and premature optimizations for objects not to be dumped - Mailing list pgsql-hackers

From Tom Lane
Subject pg_dump and premature optimizations for objects not to be dumped
Date
Msg-id 19767.1452279786@sss.pgh.pa.us
Responses Re: pg_dump and premature optimizations for objects not to be dumped  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I looked into Karsten Hilbert's report here
http://www.postgresql.org/message-id/20160108114529.GB22446@hermes.hilbert.loc
of being unable to run pg_upgrade on a database containing the pg_trgm
extension.  After some investigation, the problem is explained like this:

1. Karsten had installed pg_trgm into pg_catalog rather than some user
schema.

2. When getTypes() processes the gtrgm type created by pg_trgm, it
decides the type won't be dumped (cf selectDumpableType, which quite
properly rejects types in pg_catalog).  This causes it not to generate
a ShellTypeInfo subsidiary object.

3. Later, getExtensionMembership decides that we *should* dump gtrgm
and associated I/O routines, since we're in binary-upgrade mode.

4. Later still, repairTypeFuncLoop breaks the dependency loops between
gtrgm and its I/O routines, but it doesn't mark the shell type as dumpable
because there isn't one.

5. So we end up with no shell type being dumped, which does not work
in binary-upgrade mode because we can't create the shell type when
we see the first I/O function; we don't know what OID to give it.


The core of the problem, therefore, is the choice in getTypes to not
create a ShellTypeInfo if it thinks the type is not to be dumped.  That
was probably fine when it was written, but now that we can change our
minds about object dumpability later, it's clearly broken.  It's really
just unnecessary optimization anyway, since if we create the object and
don't use it, we've not wasted much but cycles.  I've verified that a
patch like this:

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 56c0528..6acbf4a 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** getTypes(Archive *fout, int *numTypes)
*** 3664,3671 ****
           * should copy the base type's catId, but then it might capture the
           * pg_depend entries for the type, which we don't want.
           */
  
!         if (tyinfo[i].dobj.dump && (tyinfo[i].typtype == TYPTYPE_BASE ||
!                                     tyinfo[i].typtype == TYPTYPE_RANGE))
          {
              stinfo = (ShellTypeInfo *) pg_malloc(sizeof(ShellTypeInfo));
              stinfo->dobj.objType = DO_SHELL_TYPE;
  
--- 3664,3671 ----
           * should copy the base type's catId, but then it might capture the
           * pg_depend entries for the type, which we don't want.
           */
  
!         if (tyinfo[i].typtype == TYPTYPE_BASE ||
!             tyinfo[i].typtype == TYPTYPE_RANGE)
          {
              stinfo = (ShellTypeInfo *) pg_malloc(sizeof(ShellTypeInfo));
              stinfo->dobj.objType = DO_SHELL_TYPE;

fixes Karsten's symptom without obvious other problems.

However ... there are a whole bunch of *other* places where pg_dump
skips work on objects that are not to be dumped, and some of them
would add quite a bit more cost than this one.  An example is just
above this code, where we skip getDomainConstraints() for domains
we think aren't to be dumped.  A fix for that wouldn't be a one-liner
either, because if we do pull in the domain's constraints unconditionally,
we'd have to do something to cause the domain's own dumpability flag
(and updates thereof) to get propagated to them.

Even if we fix all these issues today, it's not hard to imagine new bugs
of similar ilk sneaking in.  Complicating matters, some of those checks
are actually fine, because they occur after getExtensionMembership(), at
which point the dump-or-not decisions can no longer change.  But in most
places in pg_dump, it's not instantly obvious whether you're looking at
a flag that is still subject to change or not.

The thing that seems possibly most robust to me is to pull in the
extension membership data *first* and incorporate it into the initial
selectDumpableObject tests, thus getting us back to the pre-9.1 state
of affairs where the initial settings of the object dump flags could
be trusted.  This might be expensive though; and it would certainly add
a good chunk of work to the race-condition window where we've taken
pg_dump's transaction snapshot but not yet acquired lock on any of
the tables.

Thoughts?  Anybody have another idea about what to do about this?
        regards, tom lane


