Thread: pg_dump tries to do too much per query

pg_dump tries to do too much per query

From
Tom Lane
Date:
I was experimenting today with pg_dump's reaction to missing
dependencies, such as a rule that refers to a no-longer-existing
table.  It's pretty bad.  For example:

create table test (f1 int);
create view v_test as select f1+1 as f11 from test;
drop table test;

then run pg_dump:

getTables(): SELECT failed.  Explanation from backend: 'ERROR:  cache lookup of attribute 1 in relation 400384 failed
'.

This is just about entirely useless as an error message, wouldn't you
say?

The immediate cause of this behavior is the initial data fetch in
getTables():
   appendPQExpBuffer(query,
                     "SELECT pg_class.oid, relname, relkind, relacl, usename, "
                     "relchecks,reltriggers, relhasindex, pg_get_viewdef(relname) as viewdef "
                     "from pg_class, pg_user "
                     "where relowner = usesysid and relname !~ '^pg_' "
                     "and relkind in ('%c', '%c', '%c') "
                     "order by oid",
                     RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW);

   res = PQexec(g_conn, query->data);
   if (!res ||
       PQresultStatus(res) != PGRES_TUPLES_OK)
   {
       fprintf(stderr, "getTables(): SELECT failed.  Explanation from backend: '%s'.\n",
               PQerrorMessage(g_conn));
       exit_nicely(g_conn);
   }

This can be criticized on a couple of points:

1. It invokes pg_get_viewdef() on every table and sequence, which is a
big waste of time even when it doesn't fail outright.  When it does fail
outright, as above, you have no way to identify which view it failed
for.  pg_get_viewdef() should be invoked retail, for one view at a time,
and only for things you have determined are indeed views.

2. As somebody pointed out a few days ago, pg_dump silently loses tables
whose owners can't be identified.  The cause is the inner join being
done here against pg_user --- pg_dump will never even notice that a
table exists if there's not a matching pg_user row for it.  This is not
robust.

You should be able to fix the latter problem by doing an outer join,
though it doesn't quite work yet in current sources.  pg_get_userbyid()
offers a different solution, although it won't return NULL for unknown
IDs, which might be an easier failure case to check for.
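
In rough form, the two fixes might look like this (untested sketches, not actual pg_dump code; the LEFT JOIN is the part that doesn't quite work yet in current sources):

```sql
-- Pass 1: fetch the relation list with no per-row function calls that
-- can fail; get the owner via an outer join, so a table whose pg_user
-- row is missing still shows up, with a NULL usename.
SELECT pg_class.oid, relname, relkind, relacl, usename,
       relchecks, reltriggers, relhasindex
FROM pg_class LEFT JOIN pg_user ON relowner = usesysid
WHERE relname !~ '^pg_'
  AND relkind IN ('r', 'S', 'v')
ORDER BY pg_class.oid;

-- Pass 2: only for rows with relkind = 'v', fetch the definition one
-- view at a time; if this query fails, pg_dump knows exactly which
-- view is broken (here, the example above).
SELECT pg_get_viewdef('v_test') AS viewdef;
```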

More generally I think there are comparable problems elsewhere in
pg_dump, caused by trying to do too much per query and not thinking
about what will happen if there's a failure.  It looks like the join-
against-pg_user problem exists for all object types, not just tables.
It'd be worth examining all the queries closely with an eye to failure
modes and whether you can give a usefully specific error message when
something is wrong.
        regards, tom lane


Re: pg_dump tries to do too much per query

From
Philip Warner
Date:
At 16:29 17/09/00 -0400, Tom Lane wrote:
>
>getTables(): SELECT failed.  Explanation from backend: 'ERROR:  cache
lookup of attribute 1 in relation 400384 failed
>'.
>
>This is just about entirely useless as an error message, wouldn't you
>say?

I agree, but this is representative of the error handling throughout pg_dump
(e.g. the notorious 'you are hosed' error message). Over time I will try to
clean it up where possible.

There are a number of different kinds of errors to deal with; ones
resulting from a corrupt database seem to be low on the list. In fact I
would argue that 'DROP TABLE' should not work on a table that a view
depends on. Secondly, your comments probably highlight the need for a
database verification utility.

That said, I will address your points below.


>
>1. It invokes pg_get_viewdef() on every table and sequence, which is a
>big waste of time even when it doesn't fail outright. 

Either pg_get_viewdef is a lot less efficient than I expected, or this is
an exaggeration. If it helps, I can replace it with a case statement:

   case
       when relkind = 'v' then pg_get_viewdef()
       else ''
   end

but this seems a little pointless and won't prevent errors when the db is
corrupt.

Being forced to break up SQL statements because the backend produces
unclear errors from a function seems to be a case of the tail wagging the
dog: perhaps pg_get_viewdef should at least identify itself as the source
of the error, if that is what is happening.


> When it does fail
>outright, as above, you have no way to identify which view it failed
>for.

Good point. This is going to affect anybody who calls get_viewdef. Maybe it
can be modified to indicate (a) that the error occurred in get_viewdef, and
(b) which view is corrupt.

Try:
   select * from pg_views;

Same error.


>pg_get_viewdef() should be invoked retail, for one view at a time,
>and only for things you have determined are indeed views.

Do you truly, ruly believe the first part?


>2. As somebody pointed out a few days ago, pg_dump silently loses tables
>whose owners can't be identified.  The cause is the inner join being
>done here against pg_user --- pg_dump will never even notice that a
>table exists if there's not a matching pg_user row for it.  This is not
>robust.
>
>You should be able to fix the latter problem by doing an outer join,
>though it doesn't quite work yet in current sources.  pg_get_userbyid()
>offers a different solution, although it won't return NULL for unknown
>IDs, which might be an easier failure case to check for.

This sounds sensible; and I think you are right - pg_dump crosses with user
info relations all the time. I'll look at using pg_get_userbyid, LOJ and/or
column selects now that they are available.

Based on this suggestion, maybe pg_get_viewdef should return NULL if the
view table does not exist. But I would still prefer a meaningful error
message, since it really does reflect DB corruption.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: pg_dump tries to do too much per query

From
Philip Warner
Date:
At 12:48 18/09/00 +1000, Philip Warner wrote:
>>
>>You should be able to fix the latter problem by doing an outer join,
>>though it doesn't quite work yet in current sources.  pg_get_userbyid()
>>offers a different solution, although it won't return NULL for unknown
>>IDs, which might be an easier failure case to check for.
>
>This sounds sensible; and I think you are right - pg_dump crosses with user
>info relations all the time. I'll look at using pg_get_userbyid, LOJ and/or
>column selects now that they are available.
>

I've just made these changes, and will commit them once I put in some
warnings about NULL usernames - my guess is that this should not stop
pg_dump, just warn the user.

It's a real pity about pg_get_userbyid - the output for non-existent users
is pretty near useless. I presume it's there for the convenience of a
specific piece of code.

Would you see any value in creating a pg_get_usernamebyid() or similar that
returns NULL when there is no match?



Re: pg_dump tries to do too much per query

From
Philip Warner
Date:
At 16:29 17/09/00 -0400, Tom Lane wrote:
>As somebody pointed out a few days ago, pg_dump silently loses tables
>whose owners can't be identified.  

This is now fixed in CVS. The owners of all objects are now retrieved
using column select expressions.
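
By 'column select expression' I mean roughly this shape (a sketch, not the committed query): the owner lookup moves into the select list as a subquery, so a missing pg_user row yields a NULL username instead of silently dropping the table from the result.

```sql
SELECT pg_class.oid, relname, relkind,
       -- Scalar subquery: returns NULL (rather than eliminating the
       -- row, as the old inner join did) when no pg_user row matches.
       (SELECT usename FROM pg_user
         WHERE usesysid = pg_class.relowner) AS usename
FROM pg_class
WHERE relname !~ '^pg_';
```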

If you can recall where it was, I'd be interested in seeing the report
since I have no recollection of it, and I try to keep abreast of pg_dump
issues...





Re: pg_dump tries to do too much per query

From
Philip Warner
Date:
At 16:29 17/09/00 -0400, Tom Lane wrote:
>
>create table test (f1 int);
>create view v_test as select f1+1 as f11 from test;
>drop table test;
>
>then run pg_dump:
>
>getTables(): SELECT failed.  Explanation from backend: 'ERROR:  cache
lookup of attribute 1 in relation 400384 failed
>'.
>

FWIW, the patch below causes get_viewdef to produce:
   ERROR:  pg_get_viewdef: cache lookup of attribute 1 in relation 19136
failed for rule v_test

when a table has been deleted.

----------------------------------------------------
--- ruleutils.c.orig    Wed Sep 13 22:08:04 2000
+++ ruleutils.c    Mon Sep 18 20:59:25 2000
@@ -72,6 +72,7 @@
  * ----------
  */
 static char *rulename = NULL;
+static char *toproutinename = NULL;
 static void *plan_getrule = NULL;
 static char *query_getrule = "SELECT * FROM pg_rewrite WHERE rulename = $1";
 static void *plan_getview = NULL;
@@ -134,6 +135,12 @@
     int            len;

     /* ----------
+     * We use this in reporting errors.
+     * ----------
+     */
+    toproutinename = "pg_get_ruledef";
+
+    /* ----------
      * We need the rules name somewhere deep down: rulename is global
      * ----------
      */
@@ -234,6 +241,12 @@
     char       *name;

     /* ----------
+     * We use this in reporting errors.
+     * ----------
+     */
+    toproutinename = "pg_get_viewdef";
+
+    /* ----------
      * We need the view name somewhere deep down
      * ----------
      */
@@ -337,6 +350,13 @@
     char       *sep;

     /* ----------
+     * We use this in reporting errors.
+     * ----------
+     */
+    toproutinename = "pg_get_indexdef";
+    rulename = NULL;
+
+    /* ----------
      * Connect to SPI manager
      * ----------
      */
@@ -554,6 +574,13 @@
     Form_pg_shadow user_rec;

     /* ----------
+     * We use this in reporting errors.
+     * ----------
+     */
+    toproutinename = "pg_get_userbyid";
+    rulename = NULL;
+
+    /* ----------
      * Allocate space for the result
      * ----------
      */
@@ -2014,8 +2041,16 @@
                                  ObjectIdGetDatum(relid), (Datum) attnum,
                                  0, 0);

     if (!HeapTupleIsValid(atttup))
-        elog(ERROR, "cache lookup of attribute %d in relation %u failed",
-             attnum, relid);
+    {
+        if (rulename != NULL)
+        {
+            elog(ERROR, "%s: cache lookup of attribute %d in relation %u failed for rule %s",
+                 toproutinename, attnum, relid, rulename);
+        } else {
+            elog(ERROR, "%s: cache lookup of attribute %d in relation %u failed",
+                 toproutinename, attnum, relid);
+        }
+    }
     attStruct = (Form_pg_attribute) GETSTRUCT(atttup);
     return pstrdup(NameStr(attStruct->attname));



Re: Re: pg_dump tries to do too much per query

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> FWIW, the patch below causes get_viewdef to produce:
>     ERROR:  pg_get_viewdef: cache lookup of attribute 1 in relation 19136
> failed for rule v_test
> when a table has been deleted.

Not much of a solution --- or do you propose to go through and hack up
every elog in every routine that could potentially be called during
pg_get_ruledef?

The reason that changing pg_dump is a superior solution for this problem
is that there's only one place to change, not umpteen dozen ...
        regards, tom lane


Re: Re: pg_dump tries to do too much per query

From
Philip Warner
Date:
At 11:37 18/09/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> FWIW, the patch below causes get_viewdef to produce:
>>     ERROR:  pg_get_viewdef: cache lookup of attribute 1 in relation 19136
>> failed for rule v_test
>> when a table has been deleted.
>
>Not much of a solution --- or do you propose to go through and hack up
>every elog in every routine that could potentially be called during
>pg_get_ruledef?

Well, I had assumed that most routines/subsystems would identify themselves
as sources of errors, and that the behaviour of pg_get_viewdef was unusual.
Even the existing code for get_viewdef tries to output its name in most
places, just not all.




Re: Re: pg_dump tries to do too much per query

From
Philip Warner
Date:
At 11:37 18/09/00 -0400, Tom Lane wrote:
>
>The reason that changing pg_dump is a superior solution for this problem
>is that there's only one place to change, not umpteen dozen ...
>

Well at least two, unless you like the following:
   zzz=# select * from pg_views;
   ERROR:  cache lookup of attribute 1 in relation 3450464 failed




Re: Re: pg_dump tries to do too much per query

From
Philip Warner
Date:
At 10:42 19/09/00 +1000, Philip Warner wrote:
>At 11:37 18/09/00 -0400, Tom Lane wrote:
>>
>>The reason that changing pg_dump is a superior solution for this problem
>>is that there's only one place to change, not umpteen dozen ...
>>
>
>Well at least two, unless you like the following:
>
>    zzz=# select * from pg_views;
>    ERROR:  cache lookup of attribute 1 in relation 3450464 failed
>

Apologies - I just noticed you fixed this in CVS, so it now manages
(somehow!) to output a valid view definition even without the underlying
table. A little scary, though.




Cascade delete views?

From
Philip Warner
Date:
Doing the following:
   create table test (f1 int);
   create view v_test as select f1+1 as f11 from test;
   drop table test;

then selecting from the view results in:
   ERROR:  Relation 'test' does not exist

which is fine.

Then I try:
   create table test (f1 int);
   select * from v_test;

and I get:
   ERROR:  has_subclass: Relation 19417 not found

which is not very helpful for the person who does not know the history, and
leads me to believe that there may be a few issues here.

Should a 'DROP TABLE' drop the views, fail, or be recoverable from by
recreating the table?




Re: Cascade delete views?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>     create table test (f1 int);
>     create view v_test as select f1+1 as f11 from test;
>     drop table test;
>     create table test (f1 int);
>     select * from v_test;
>     ERROR:  has_subclass: Relation 19417 not found

> which is not very helpful for the person who does not know the history, and
> leads me to believe that there may be a few issues here. 

Yes, this mistake needs to be detected earlier.  The stored view
contains both the name and the OID of the referenced table.  It should
*not* accept a new table with same name and different OID, since there's
no guarantee that the new table has anything like the same column set.
(ALTER TABLE has some issues here too...)

> Should a 'DROP TABLE' drop the views, fail, or be recoverable from by
> recreating the table?

Yes ;-).

Any of those behaviors would be better than what we have now.  However,
none of them is going to be easy to implement.  There will need to be
more info stored about views than there is now.
        regards, tom lane


Re: Re: pg_dump tries to do too much per query

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>>> The reason that changing pg_dump is a superior solution for this problem
>>> is that there's only one place to change, not umpteen dozen ...
>> 
>> Well at least two, unless you like the following:
>> 
>> zzz=# select * from pg_views;
>> ERROR:  cache lookup of attribute 1 in relation 3450464 failed

> Apologies - I just noticed you fixed this in CVS, so it now manages
> (somehow!) to output a valid view definition even without the underlying
> table. A little scary, though.

Say what?  (... tries it ...)  Fascinating.  I wouldn't rely on this
behavior however; the fact that it works today is a totally unintended
consequence of a change I made for column alias support.  Next week
ruleutils.c might try to access the underlying tables again.

The general issue still remains: if a database contains an inconsistency
or error, introduced by whatever means (and there'll always be bugs),
a pg_dump failure is likely to be the first notice a dbadmin has about it.
So it behooves us to make sure that pg_dump issues error messages that
are as specific as possible.  In particular, if there is a specific
object such as a view or rule that's broken, pg_dump should take care
that it can finger that particular object, not have to report a generic
"SELECT failed" error message.

This problem has been around for a long time, of course, but now that
we have someone who's taking an active interest in fixing pg_dump ;-)
I'm hoping something will get done about it...
        regards, tom lane


Re: Re: Cascade delete views?

From
Stephan Szabo
Date:
On Tue, 19 Sep 2000, Tom Lane wrote:

> Yes, this mistake needs to be detected earlier.  The stored view
> contains both the name and the OID of the referenced table.  It should
> *not* accept a new table with same name and different OID, since there's
> no guarantee that the new table has anything like the same column set.
> (ALTER TABLE has some issues here too...)
> 
> > Should a 'DROP TABLE' drop the views, fail, or be recoverable from by
> > recreating the table?
> 
> Yes ;-).
> 
> Any of those behaviors would be better than what we have now.  However,
> none of them is going to be easy to implement.  There will need to be
> more info stored about views than there is now.

This is an example of a place where the dependencies chart will come
in handy. :)  I do actually hope to get to it (if no one else does)
after my day job has its official release and I get a chance to take
time off, and after I've figured out MATCH PARTIAL for the referential
integrity stuff (which is more of a pain than I could ever have imagined).



RE: Re: pg_dump tries to do too much per query

From
Matthew
Date:
......

> The general issue still remains: if a database contains an inconsistency
> or error, introduced by whatever means (and there'll always be bugs),
> a pg_dump failure is likely to be the first notice a dbadmin has about it.
> So it behooves us to make sure that pg_dump issues error messages that
> are as specific as possible.  In particular, if there is a specific
> object such as a view or rule that's broken, pg_dump should take care
> that it can finger that particular object, not have to report a generic
> "SELECT failed" error message.
> 
> This problem has been around for a long time, of course, but now that
> we have someone who's taking an active interest in fixing pg_dump ;-)
> I'm hoping something will get done about it...
> 
>             regards, tom lane
I can't agree with this more.  On several occasions I have had
databases that apparently were working fine with no problems, and the only
indication of the problem was that pg_dump would fail.  While this is nice
in that I then knew there was a problem, I had very little information on
where to start looking.


Re: Re: pg_dump tries to do too much per query

From
Philip Warner
Date:
At 12:13 19/09/00 -0400, Tom Lane wrote:
>
>Say what?  (... tries it ...)  Fascinating.  I wouldn't rely on this
>behavior however; the fact that it works today is a totally unintended
>consequence of a change I made for column alias support.  Next week
>ruleutils.c might try to access the underlying tables again.

In this particular instance, I'd still be much happier to see the code that
fails identify itself properly, identify the cause, and explain the error.
In the case in point, all of this information is available. I always prefer
to fix two bugs with one code change where possible (ie. 'select * from
pg_views' as well as pg_dump).

To carry this to extremes: at one point pg_dump crosses pg_index, pg_class
and pg_am; somewhere else it crosses pg_rewrite, pg_class, pg_rules, etc.

If we want to allow for database corruption, pg_dump should really be doing
this sort of cross internally to verify that each part is present (or use
LOJ and test for NULLs - but you hinted that there was a problem with this).


>This problem has been around for a long time, of course, but now that
>we have someone who's taking an active interest in fixing pg_dump ;-)
>I'm hoping something will get done about it...

Part of my interest in pg_dump is in trying to make it know less about the
database structure, and so make it a lot more independent of versions. If I
could use one pg_* view or function for each entity type, I'd be very
happy. The not-so-recent discussions about definition schemas should
indicate where pg_dump wants to go, and the growing use of functions like
'format_type' makes it increasingly hard to know why an error occurred - I
haven't looked to see what format_type does when it encounters an internal
inconsistency, but I hope it returns NULL.

[Thought: What would be the impact of pg_get_ruledef etc returning NULL
when it can't find the relevant data? This sort of approach seems to me
more consistent with DB-ish things, and pg_dump could easily test for a
NULL defn of a view relation]
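
If that were done, pg_dump could flag the broken object itself with something like this (hypothetical; it assumes the functions were actually changed to return NULL on missing data):

```sql
-- Find views whose definition can no longer be reconstructed,
-- e.g. because an underlying table was dropped:
SELECT relname
FROM pg_class
WHERE relkind = 'v'
  AND pg_get_viewdef(relname) IS NULL;
```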

I'd prefer to see a 'pg_verify' utility (or similar). The idea being that
it would know about, and be totally anal about, interrelationships - and
successfully report a missing view table - unlike the current (or proposed)
situation. In some ways it would be like vacuum, but for metadata.

I'm not totally opposed to your suggestion, and I certainly agree that
making pg_dump more careful about the data it retrieves is a good idea.
But I still need more convincing about the calls to get_viewdef.

In the current situation, I think ruleutils.c might need to be looked at
more closely: eg. several error messages prepend the main routine name,
some do not. Some display the problem rule name, others do not. There are
only three or four external routines defined, so by using (more) globals it
is easily possible for all routines to indicate the primary routine
responsible. Similarly, the rule/view name is already a global, and is
available to all routines in ruleutils.c.




Re: pg_dump tries to do too much per query

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> It's a real pity about pg_get_userbyid - the output for non-existent users
> is pretty near useless. I presume it's there for convenience of a specific
> piece of code.

No, I imagine it's just that way because it seemed like a good idea at
the time.  Returning NULL for an unknown userid seems at least as
plausible as what it does now.  Comments anyone?
        regards, tom lane


Re: Cascade delete views?

From
Peter Eisentraut
Date:
Philip Warner writes:

> Doing the following:
> 
>     create table test (f1 int);
>     create view v_test as select f1+1 as f11 from test;
>     drop table test;
> 
> then selecting from the view results in:
> 
>     ERROR:  Relation 'test' does not exist
> 
> which is fine.

If you peek into the standard, all DROP commands have a trailing
RESTRICT/CASCADE clause (mandatory, btw.), which tells you what to do. But
it's extremely hard to implement and keep up to date.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/