pg_dump versus views and opclasses - Mailing list pgsql-hackers

From Brendan Jurd
Subject pg_dump versus views and opclasses
Date
Msg-id 37ed240d0901151305k18bd98aciabf78a7207dbab33@mail.gmail.com
Whole thread Raw
Responses Re: pg_dump versus views and opclasses  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hey all,

I recently had pg_dump produce a non-restorable dump for one of my
databases.  I can't share the dump itself, but I can describe what
went wrong.

This database had a user-defined composite type in it.  I wanted to be
able to GROUP BY this type, so I added a family of comparison
functions and operators, and created an opclass for btree.  Then I
created a view which grouped on the composite type.  All was well.

Then I tried to create a copy of the database.  Nothing out of the
ordinary, I just did a pg_dump and piped the results into a psql
connecting to a different database name.  The restore failed on
creation of the view; the opclass hadn't been created yet and
therefore the view couldn't figure out how to group on that composite
type.

I had a closer look at the output of pg_dump, and yes indeed, the
opclass and all its operators were listed after the view which needed
them to live.

Then I poked around in the pg_dump source code to get a better feel
for how it orders objects.  That lead me to poke around in pg_depend.
A couple of observations that might be relevant to this case:
* It seems there's no pg_depend entry for
types/functions/operators/opclasses that the view depends on, unless
they are part of the SELECT list.  I don't yet know enough about the
dependency system to understand if this is really a defect (and if so
whether there is a pratical fix for it), but it does seem a bit
limited.  If, for example, you call a user-defined operator in the
WHERE clause of a view, Postgres has no idea about that dependency.
* pg_dump queries the information from pg_depend sorting first by
classid -- the oid of the system catalog the dependent object lives
in, then by the oid of the dependent object.  I ran a similar query in
psql and the results were reminiscent of the order of objects produced
by pg_dump.  pg_opclass and pg_operator have larger oids than
pg_class, and in the absence of dependency information to guide it,
pg_dump output the view (pg_class) first, followed by the opclass.
Coincidence?

Assuming for the moment that I'm on the right track with these
hunches, it seems that pg_dump needs some help.  We could try to make
the dependency information about views more complete, in which case
everything should Just Work in pg_dump.  If that's not practical (and
perhaps even if it is practical) we could also try to give opclasses
and operators a higher default priority in the dump order than views.
This seems logical since a view can depend on an opclass, but an
opclass can't depend on a view.  At least, not directly.

So, am I on to something here, or is it just the ridiculously hot .au
weather talking?

If it would be helpful, I could write up a proper minimal test case to
demonstrate the error.  Let me know.

Cheers,
BJ


pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Updated backslash consistency patch
Next
From: Robert Haas
Date:
Subject: Re: FWD: Re: Updated backslash consistency patch