Thread: pg_dump 8.0.3 failing against PostgreSQL 7.3.2
I am experiencing a problem with dumping one specific database on a cluster. All 143 other databases dump without giving errors. The server runs PostgreSQL 7.3.2 on OpenBSD (I know :). pg_dump is version 8.0.3 on Windows (upgraded from 8.0.1 which had the same problem). The error message is: pg_dump: SQL command failed pg_dump: Error message from server: ERROR: Cannot cast type "unknown" to text pg_dump: The command was: SELECT proretset, prosrc, probin, null::text as proargnames, provolatile, proisstrict, prosecdef, (SELECT lanname FROM pg_catalog.pg_language WHERE oid = prolang) as lanname FROM pg_catalog.pg_proc WHERE oid = '8331054'::pg_catalog.oid Below I have copy pasted the console log which has some additional information. This bug, or a related one, seems to have been registered previously as bug #1455 http://archives.postgresql.org/pgsql-bugs/2005-02/msg00021.php Jochem > C:\Program Files\PostgreSQL\8.0\bin>psql --version > psql (PostgreSQL) 8.0.3 > > C:\Program Files\PostgreSQL\8.0\bin>pg_dump wedstrijdzeilen > "z:\backup\databases\2005-06-06\wedstrijdzeilen.sql" > Password: > pg_dump: SQL command failed > pg_dump: Error message from server: ERROR: Cannot cast type "unknown" to text > pg_dump: The command was: SELECT proretset, prosrc, probin, null::text as proargnames, provolatile, proisstrict, prosecd > ef, (SELECT lanname FROM pg_catalog.pg_language WHERE oid = prolang) as lanname FROM pg_catalog.pg_proc WHERE oid = '833 > 1054'::pg_catalog.oid > > C:\Program Files\PostgreSQL\8.0\bin>psql wedstrijdzeilen > Password: > Welcome to psql 8.0.3, the PostgreSQL interactive terminal. > > Type: \copyright for distribution terms > \h for help with SQL commands > \? for help with psql commands > \g or terminate with semicolon to execute query > \q to quit > > Warning: Console code page (437) differs from Windows code page (1252) > 8-bit characters may not work correctly. See psql reference > page "Notes for Windows users" for details. > > oli=> select version(); > version > --------------------------------------------------------------------- > PostgreSQL 7.3.2 on i386-unknown-openbsd3.2, compiled by GCC 2.95.3 > (1 row) > > wedstrijdzeilen=> SELECT proretset, prosrc, probin, null::text as proargnames, provolatile, proisstrict, prosecdef, (SEL > ECT lanname FROM pg_catalog.pg_language WHERE oid = prolang) as lanname FROM pg_catalog.pg_proc WHERE oid = '8331054'::p > g_catalog.oid; > proretset | > > prosrc > > | probin | proargnames | provolatile | proisstrict > | prosecdef | lanname > -----------+------------------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------------------------------ > ----------------------------------------------------------------------+--------+-------------+-------------+------------ > -+-----------+--------- > t | > SELECT j.parent_id as id, > (SELECT singular FROM dswz.translations WHERE id = t.name AND language_id = (SELECT dswz.check_creator_i > d ($1, $2, $3, $4))), > street, > house_number, > zipcode, > city, > (SELECT singular FROM dswz.translations WHERE id = nations.name AND language_id = (SELECT dswz.check_cre > ator_id ($1, $2, $3, $4))) > FROM dswz.view_addresses a LEFT JOIN dswz.nations ON country = nations.id, > dswz.joins($1, $2,$3,$4) j, > dswz.object_types t > WHERE j.child_id = a.id > AND t.id = a.type > AND j.cancel_time > now(); > | - | | v | f | t | sql > (1 row)
Jochem van Dieten <jochemd@oli.tudelft.nl> writes: > Below I have copy pasted the console log which has some > additional information. This bug, or a related one, seems to have > been registered previously as bug #1455 > http://archives.postgresql.org/pgsql-bugs/2005-02/msg00021.php Yeah. We never heard back from that person, so I thought he'd resolved his problem, but maybe there's something in common. It's *really* odd that you can do the same command by hand and it doesn't fail. Just to get the Windows version out of the loop: if you run 7.3's own pg_dump directly on the BSD box, does it fail the same way? The only way I can think of to get more info is to try to get a stack trace for the error --- that is, attach to the backend process with gdb, set a breakpoint at elog, let it run till the error occurs, and then "bt" to see how control got to the failure. Can you do that? It will help if you do something like export PGOPTIONS="-W 30" on the client side before starting pg_dump --- that will provide a 30-second delay during connection to give you time to identify the connected backend and attach to it with gdb. regards, tom lane
Tom Lane wrote: > Jochem van Dieten <jochemd@oli.tudelft.nl> writes: >> >> Below I have copy pasted the console log which has some >> additional information. This bug, or a related one, seems to have >> been registered previously as bug #1455 >> http://archives.postgresql.org/pgsql-bugs/2005-02/msg00021.php > > Yeah. We never heard back from that person, so I thought he'd resolved > his problem, but maybe there's something in common. > > It's *really* odd that you can do the same command by hand and it > doesn't fail. Just to get the Windows version out of the loop: > if you run 7.3's own pg_dump directly on the BSD box, does it fail > the same way? That fails due to a pg_dump issue that was fixed in 7.4: http://archives.postgresql.org/pgsql-general/2003-01/msg00006.php > The only way I can think of to get more info is to try to get a > stack trace for the error --- that is, attach to the backend process > with gdb, set a breakpoint at elog, let it run till the error occurs, > and then "bt" to see how control got to the failure. Can you do that? OpenBSD man pages are said to be very good :) > It will help if you do something like > export PGOPTIONS="-W 30" > on the client side before starting pg_dump --- that will provide a > 30-second delay during connection to give you time to identify the > connected backend and attach to it with gdb. On Windows that would be 'set PGOPTIONS="-W 30"'. Once I do that, I get the following error: > C:\Program Files\PostgreSQL\8.0\bin>set PGOPTIONS="-W 30" > > "z:\backup\databases\2005-06-06\wedstrijdzeilen.sql" > WARNING: postgres: invalid command line arguments > Try -? for help. > pg_dump: [archiver (db)] connection to database "wedstrijdzeilen" failed: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. (pg_dump -? appears to suggest that -W is used for the "force password prompt" connection option.) Exclusively locking pg_proc from another connection is also a nice way to slow down pg_dump :) pgsql@hector:/usr/local/pgsql/bin> gdb attach 7761 warning: failed to install memory consistency checks; configuration should define NO_MMCHECK or MMCHECK_FORCE GNU gdb 4.16.1 Copyright 1996 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-unknown-openbsd3.2"... attach: No such file or directory. /usr/local/pgsql/bin/7761: No such file or directory. Attaching to process 7761 0x403ca553 in ?? () (gdb) symbol-file postmaster Reading symbols from postmaster...done. (gdb) break elog Breakpoint 1 at 0x16d8f8 (gdb) bt #0 0x403ca553 in ?? () from /usr/lib/libc.so.28.5 #1 0x10e604 in mdread () #2 0x10f31f in smgrread () #3 0x103827 in ReadBufferInternal () #4 0x1035ea in ReadBuffer () #5 0x16fc5 in _bt_getbuf () #6 0x19da0 in _bt_step () #7 0x1941d in _bt_next () #8 0x17573 in btgettuple () #9 0x170523 in FunctionCall2 () #10 0x1330d in index_getnext () #11 0x12dac in systable_getnext () #12 0xa5a3b in RelationBuildTriggers () #13 0x167849 in RelationBuildDesc () #14 0x168837 in RelationIdGetRelation () #15 0xd378 in relation_open () #16 0x992d8 in LockTableCommand () #17 0x114c11 in ProcessUtility () #18 0x110a3a in pg_exec_query_string () #19 0x112a91 in PostgresMain () #20 0xf4eae in DoBackend () #21 0xf463d in BackendStartup () #22 0xf3040 in ServerLoop () #23 0xf2502 in PostmasterMain () #24 0xc9926 in main () (gdb) detach Detaching from program: process 7761 (gdb) quit pgsql@hector:/usr/local/pgsql/bin> Jochem
Jochem van Dieten <jochemd@oli.tudelft.nl> writes: > I get the following error: >> C:\Program Files\PostgreSQL\8.0\bin>set PGOPTIONS="-W 30" >> >> "z:\backup\databases\2005-06-06\wedstrijdzeilen.sql" >> WARNING: postgres: invalid command line arguments Hm, that's odd ... but never mind, you thought of an alternative. > (gdb) break elog > Breakpoint 1 at 0x16d8f8 > (gdb) bt > #0 0x403ca553 in ?? () from /usr/lib/libc.so.28.5 > #1 0x10e604 in mdread () > #2 0x10f31f in smgrread () You forgot to "continue" until the breakpoint is reached --- this trace just indicates where the backend happened to be when you stopped it by attaching. regards, tom lane
Tom Lane wrote: > Jochem van Dieten <jochemd@oli.tudelft.nl> writes: >> >> (gdb) break elog >> Breakpoint 1 at 0x16d8f8 >> (gdb) bt >> #0 0x403ca553 in ?? () from /usr/lib/libc.so.28.5 >> #1 0x10e604 in mdread () >> #2 0x10f31f in smgrread () > > You forgot to "continue" until the breakpoint is reached --- this trace > just indicates where the backend happened to be when you stopped it by > attaching. That's different from backtracing a core dump :) This better? Attaching to process 22733 0x403827df in ?? () (gdb) symbol-file postmaster Reading symbols from postmaster...done. (gdb) break elog Breakpoint 1 at 0x16d8f8 (gdb) continue Continuing. Breakpoint 1, 0x16d8f8 in elog () (gdb) bt #0 0x16d8f8 in elog () #1 0x110abb in pg_exec_query_string () #2 0x112a91 in PostgresMain () #3 0xf4eae in DoBackend () #4 0xf463d in BackendStartup () #5 0xf3040 in ServerLoop () #6 0xf2502 in PostmasterMain () #7 0xc9926 in main () (gdb) detach Jochem
Jochem van Dieten <jochemd@oli.tudelft.nl> writes: > That's different from backtracing a core dump :) This better? > Breakpoint 1, 0x16d8f8 in elog () > (gdb) bt > #0 0x16d8f8 in elog () > #1 0x110abb in pg_exec_query_string () > #2 0x112a91 in PostgresMain () > #3 0xf4eae in DoBackend () > #4 0xf463d in BackendStartup () > #5 0xf3040 in ServerLoop () > #6 0xf2502 in PostmasterMain () > #7 0xc9926 in main () Drat. I forgot that in 7.3, elog will be entered even if the message ends up being too low-priority to be output. What you've got here is just a routine elog(DEBUG) or elog(LOG) call ... and there are probably a whole bunch more that will happen before the one with ERROR. [ studies 7.3 code a bit ] It might work better to set the breakpoint at elog_message_prefix, assuming you've got logging dialed down to the point where only actual ERRORs go to the log. However, I'm not certain gdb will let you do that in a backend compiled without debug support; it may not know about elog_message_prefix since that's just a static routine. Give it a try and see. If that doesn't work ... can anyone think of another way to isolate the elog(ERROR) call in a backend compiled without debug support? regards, tom lane
Tom Lane wrote: > > [ studies 7.3 code a bit ] It might work better to set the breakpoint at > elog_message_prefix, assuming you've got logging dialed down to the > point where only actual ERRORs go to the log. Attaching to process 10284 0x403827df in ?? () (gdb) symbol-file postmaster Reading symbols from postmaster...done. (gdb) break elog_message_prefix Breakpoint 1 at 0x16e73f (gdb) continue Continuing. Breakpoint 1, 0x16e73f in elog_message_prefix () (gdb) bt #0 0x16e73f in elog_message_prefix () #1 0x16da26 in elog () #2 0x82b3d in typecast_expression () #3 0x818d9 in transformExpr () #4 0x89d4d in transformTargetEntry () #5 0x8a021 in transformTargetList () #6 0x3cb78 in transformSelectStmt () #7 0x3ab6f in transformStmt () #8 0x3a79c in parse_analyze () #9 0x110574 in pg_analyze_and_rewrite () #10 0x110923 in pg_exec_query_string () #11 0x112a91 in PostgresMain () #12 0xf4eae in DoBackend () #13 0xf463d in BackendStartup () #14 0xf3040 in ServerLoop () #15 0xf2502 in PostmasterMain () #16 0xc9926 in main () (gdb) detach Detaching from program: process 10284 (gdb) quit pgsql@hector:/usr/local/pgsql/bin> Jochem
Jochem van Dieten <jochemd@oli.tudelft.nl> writes: > (gdb) bt > #0 0x16e73f in elog_message_prefix () > #1 0x16da26 in elog () > #2 0x82b3d in typecast_expression () > #3 0x818d9 in transformExpr () > #4 0x89d4d in transformTargetEntry () > #5 0x8a021 in transformTargetList () > #6 0x3cb78 in transformSelectStmt () > #7 0x3ab6f in transformStmt () > #8 0x3a79c in parse_analyze () Well, that trace makes it look like it's unhappy about the "null::text" in the command, because there is no other typecast in the SELECT target statement. Looking at the 7.3 code, the only very plausible reason for the failure is if either "unknown" or "text" has disappeared from pg_type, so that one of the typeidIsValid tests in can_coerce_type fails. But that doesn't explain why you don't see the failure interactively --- seems like "select null::text" should always fail in that database, if that's where the problem is. I confess to bewilderment ... anyone have a clue? regards, tom lane
Tom Lane wrote: > > Well, that trace makes it look like it's unhappy about the "null::text" > in the command, because there is no other typecast in the SELECT target > statement. Looking at the 7.3 code, the only very plausible reason for > the failure is if either "unknown" or "text" has disappeared from > pg_type, so that one of the typeidIsValid tests in can_coerce_type > fails. But that doesn't explain why you don't see the failure > interactively --- seems like "select null::text" should always fail in > that database, if that's where the problem is. > > I confess to bewilderment ... anyone have a clue? If I check pg_type in the database with the problem for typname unknown or text I get 3 rows. In other databases I get only 2 rows. Jochem Problem database: > wedstrijdzeilen=> select * from pg_type where typname in ('unknown','text') order by typname; > typname | typnamespace | typowner | typlen | typbyval | typtype | typisdefined | typdelim | typrelid | typelem | typinput | typoutput | t > ypalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typdefaultbin | typdefault > ---------+--------------+----------+--------+----------+---------+--------------+----------+----------+---------+-----------+------------+-- > --------+------------+------------+-------------+-----------+----------+---------------+------------ > text | 11 | 1 | -1 | f | b | t | , | 0 | 0 | textin | textout | i > | x | f | 0 | -1 | 0 | | > text | 7965528 | 152 | 4 | t | c | t | , | 8330945 | 0 | record_in| record_out | i > | p | f | 0 | -1 | 0 | | > unknown | 11 | 1 | -1 | f | b | t | , | 0 | 0 | unknownin| unknownout | i > | p | f | 0 | -1 | 0 | | > (3 rows) > > wedstrijdzeilen=> select * from pg_class where oid = 8330945; > relname | relnamespace | reltype | relowner | relam | relfilenode | relpages | reltuples | reltoastrelid | reltoastidxid| relhasindex | re > lisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules| relhassub > class | relacl > ---------+--------------+---------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+--- > ----------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+---------- > ------+---------------------------------------------------------- > text | 7965528 | 8330946 | 152 | 0 | 8330945 | 0 | 0 | 8330947 | 0 | t | f > | r | 2 | 0 | 0 | 0 | 0 | 0 | f | t | f | f > | {=,wedstrijdzeilen_admin=arwdRxt,wedstrijdzeilen_user=r} > (1 row) Reference database: > jochemd=> select * from pg_type where typname in ('unknown','text') order by typname; > typname | typnamespace | typowner | typlen | typbyval | typtype | typisdefined | typdelim | typrelid | typelem | typinput | typoutput | t > ypalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typdefaultbin | typdefault > ---------+--------------+----------+--------+----------+---------+--------------+----------+----------+---------+-----------+------------+-- > --------+------------+------------+-------------+-----------+----------+---------------+------------ > text | 11 | 1 | -1 | f | b | t | , | 0 | 0 | textin | textout | i > | x | f | 0 | -1 | 0 | | > unknown | 11 | 1 | -1 | f | b | t | , | 0 | 0 | unknownin| unknownout | i > | p | f | 0 | -1 | 0 | | > (2 rows)
Jochem van Dieten <jochemd@oli.tudelft.nl> writes: > Tom Lane wrote: >> Well, that trace makes it look like it's unhappy about the "null::text" >> in the command, because there is no other typecast in the SELECT target >> statement. > Problem database: >> wedstrijdzeilen=> select * from pg_type where typname in ('unknown','text') order by typname; >> typname | typnamespace | typowner | typlen | typbyval | typtype | typisdefined | typdelim | typrelid | typelem | typinput | typoutput | t >> ypalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typdefaultbin | typdefault >> ---------+--------------+----------+--------+----------+---------+--------------+----------+----------+---------+-----------+------------+-- >> --------+------------+------------+-------------+-----------+----------+---------------+------------ >> text | 11 | 1 | -1 | f | b | t | , | 0 | 0 | textin | textout | i >> | x | f | 0 | -1 | 0 | | >> text | 7965528 | 152 | 4 | t | c | t | , | 8330945 | 0 | record_in| record_out | i >> | p | f | 0 | -1 | 0 | | [ light dawns ] You've created a table named "text", haven't you? regards, tom lane
I wrote: > [ light dawns ] You've created a table named "text", haven't you? You need this patch. Thanks for the report! regards, tom lane Index: pg_dump.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v retrieving revision 1.400.4.4 diff -c -r1.400.4.4 pg_dump.c *** pg_dump.c 30 Apr 2005 08:19:44 -0000 1.400.4.4 --- pg_dump.c 7 Jun 2005 14:01:42 -0000 *************** *** 5115,5121 **** { appendPQExpBuffer(query, "SELECT proretset, prosrc, probin, " ! "null::text as proargnames, " "provolatile, proisstrict, prosecdef, " "(SELECT lanname FROM pg_catalog.pg_language WHERE oid = prolang) as lanname " "FROM pg_catalog.pg_proc " --- 5115,5121 ---- { appendPQExpBuffer(query, "SELECT proretset, prosrc, probin, " ! "null as proargnames, " "provolatile, proisstrict, prosecdef, " "(SELECT lanname FROM pg_catalog.pg_language WHERE oid = prolang) as lanname " "FROM pg_catalog.pg_proc " *************** *** 5126,5132 **** { appendPQExpBuffer(query, "SELECT proretset, prosrc, probin, " ! "null::text as proargnames, " "case when proiscachable then 'i' else 'v' end as provolatile, " "proisstrict, " "'f'::boolean as prosecdef, " --- 5126,5132 ---- { appendPQExpBuffer(query, "SELECT proretset, prosrc, probin, " ! "null as proargnames, " "case when proiscachable then 'i' else 'v' end as provolatile, " "proisstrict, " "'f'::boolean as prosecdef, " *************** *** 5139,5145 **** { appendPQExpBuffer(query, "SELECT proretset, prosrc, probin, " ! "null::text as proargnames, " "case when proiscachable then 'i' else 'v' end as provolatile, " "'f'::boolean as proisstrict, " "'f'::boolean as prosecdef, " --- 5139,5145 ---- { appendPQExpBuffer(query, "SELECT proretset, prosrc, probin, " ! "null as proargnames, " "case when proiscachable then 'i' else 'v' end as provolatile, " "'f'::boolean as proisstrict, " "'f'::boolean as prosecdef, "
Tom Lane wrote: > I wrote: >> >> [ light dawns ] You've created a table named "text", haven't you? Yes, there is such a table. But even if I put the schema with that table in the search_path I can't reproduce the error from psql. > You need this patch. I prefer the interpretation "My customer needs to change his schema if he wants backups" :) I will test the patch, but it will take a few days because I haven't got the toolchain set up. Thanx, Jochem
Jochem van Dieten <jochemd@oli.tudelft.nl> writes: > Tom Lane wrote: > [ light dawns ] You've created a table named "text", haven't you? > Yes, there is such a table. But even if I put the schema with > that table in the search_path I can't reproduce the error from psql. You can if you duplicate pg_dump's search path: regression=# create schema s1 regression-# create table text (f1 int); CREATE SCHEMA regression=# set search_path = s1, pg_catalog; SET regression=# select null::text; ERROR: Cannot cast type "unknown" to text regression=# Note that the above is using a 7.3 server --- more recent servers will take this, not because they parse the "text" type reference differently, but because they don't have a problem with null composite values. > I prefer the interpretation "My customer needs to change his > schema if he wants backups" :) > I will test the patch, but it will take a few days because I > haven't got the toolchain set up. A short-term workaround is to use 7.3 or 7.4 pg_dump --- this pg_dump error was introduced in 8.0. regards, tom lane