Thread: Unable to dump database using pg_dump
I am migrating to a new server. I am trying to get a dump of my database to import into postgres 8.1.11 on my new server. When I run the pg_dump commmand: pg_dump --clean --file=madison.sql --schema=public -U madisoncounty madisoncounty I get this message: pg_dump: query to obtain list of schemas failed: ERROR: more than one row returned by a subquery used as an expression I'm pretty new to postgres, and I'm not sure how to debug / fix this issue. I did a VACUUM FULL on the server before I tried to do the dump.
Adam Dear wrote: > pg_dump: query to obtain list of schemas failed: ERROR: more than one > row returned by a subquery used as an expression Is the pg_dump command from the old postgresql install or the new one? -- Craig Ringer
The pg_dump command is from version 7.4.11. This is the version of postgres I am running on my old server. Craig Ringer wrote: > Adam Dear wrote: > >> pg_dump: query to obtain list of schemas failed: ERROR: more than one >> row returned by a subquery used as an expression > > Is the pg_dump command from the old postgresql install or the new one? > > -- > Craig Ringer >
Adam Dear <adear@usnx.net> writes: > pg_dump: query to obtain list of schemas failed: ERROR: more than one > row returned by a subquery used as an expression > The pg_dump command is from version 7.4.11. This is the version of > postgres I am running on my old server. Just FYI, when doing a version upgrade it's considered better practice to dump with the newer version's pg_dump, if possible. Sometimes the newer pg_dump knows things that will smooth the migration. However, that's not related to your immediate problem. A look at the pg_dump source code shows that the subquery it's complaining about is trying to fetch the name of each view's owner. Apparently you've got multiple rows in pg_shadow with the same usesysid --- or perhaps just one row but the index on usesysid is corrupt and is returning it more than once. Try select ctid, usename, usesysid from pg_shadow; and see if you see any duplicate usesysid's. If so, try removing the extra copies (use WHERE ctid = '...' if there isn't any other difference between the rows). If you see no dups in a query like this, then the index is at fault --- try reindexing. regards, tom lane
I ran the select as instructed, and this is the output: madisoncounty=# select ctid, usename, usesysid from pg_shadow; ctid | usename | usesysid -------+---------------+---------- (0,1) | postgres | 1 (0,2) | postgres | 1 (0,5) | madisoncounty | 100 (3 rows) I then did: delete from pg_shadow where ctid='(0,2)'; DELETE 1 I had already reindexed the tables. Now when I run the command, I get this: pg_dump: [archiver (db)] connection to database "madisoncounty" failed: FATAL: user "postgres" does not exist Also, I have been connecting to the db as the postgres user. Now I am unable to connect to the database with the postgres user, or with the actual user for the site. Tom Lane wrote: > Adam Dear <adear@usnx.net> writes: >> pg_dump: query to obtain list of schemas failed: ERROR: more than one >> row returned by a subquery used as an expression > >> The pg_dump command is from version 7.4.11. This is the version of >> postgres I am running on my old server. > > Just FYI, when doing a version upgrade it's considered better practice > to dump with the newer version's pg_dump, if possible. Sometimes the > newer pg_dump knows things that will smooth the migration. However, > that's not related to your immediate problem. > > A look at the pg_dump source code shows that the subquery it's > complaining about is trying to fetch the name of each view's owner. > Apparently you've got multiple rows in pg_shadow with the same usesysid > --- or perhaps just one row but the index on usesysid is corrupt and is > returning it more than once. Try > select ctid, usename, usesysid from pg_shadow; > and see if you see any duplicate usesysid's. If so, try removing the > extra copies (use WHERE ctid = '...' if there isn't any other difference > between the rows). If you see no dups in a query like this, then the > index is at fault --- try reindexing. > > regards, tom lane >
Adam Dear <adear@usnx.net> writes: > I ran the select as instructed, and this is the output: > madisoncounty=# select ctid, usename, usesysid from pg_shadow; > ctid | usename | usesysid > -------+---------------+---------- > (0,1) | postgres | 1 > (0,2) | postgres | 1 > (0,5) | madisoncounty | 100 > (3 rows) > I then did: > delete from pg_shadow where ctid='(0,2)'; > DELETE 1 > I had already reindexed the tables. Now when I run the command, I get this: > pg_dump: [archiver (db)] connection to database "madisoncounty" failed: > FATAL: user "postgres" does not exist How annoying :-(. And I suppose madisoncounty isn't a superuser, so you're now stuck with no working superuser. What you'll need to do is shut down the database and start up a standalone backend (read the "postgres" reference page about how to work in this mode). What I'd then do is delete the other postgres row (if you can still see it) and do CREATE USER postgres WITH SYSID 1 CREATEUSER CREATEDB. That should get you back to having a working postgres account. It's hard to be sure how you got into this state, although one possible theory is you got burnt by not vacuuming pg_shadow on a regular basis. The newer version should keep you out of that error, if so. regards, tom lane
I really appreciate you help so far. here is what I am seeing . I did: select * from pg_shadow; 1: usename (typeid = 19, len = 64, typmod = -1, byval = f) 2: usesysid (typeid = 23, len = 4, typmod = -1, byval = t) 3: usecreatedb (typeid = 16, len = 1, typmod = -1, byval = t) 4: usesuper (typeid = 16, len = 1, typmod = -1, byval = t) 5: usecatupd (typeid = 16, len = 1, typmod = -1, byval = t) 6: passwd (typeid = 25, len = -1, typmod = -1, byval = f) 7: valuntil (typeid = 702, len = 4, typmod = -1, byval = t) 8: useconfig (typeid = 1009, len = -1, typmod = -1, byval = f) ---- 1: usename = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f) 2: usesysid = "1" (typeid = 23, len = 4, typmod = -1, byval = t) 3: usecreatedb = "t" (typeid = 16, len = 1, typmod = -1, byval = t) 4: usesuper = "t" (typeid = 16, len = 1, typmod = -1, byval = t) 5: usecatupd = "t" (typeid = 16, len = 1, typmod = -1, byval = t) ---- I'm not seeing the madisoncounty user in there. Also, I tried starting the db using /etc/init.d/postgres start, and it fails. Is that the proper way to get the service going, or should I be doing something else? Tom Lane wrote: > Adam Dear <adear@usnx.net> writes: >> I ran the select as instructed, and this is the output: >> madisoncounty=# select ctid, usename, usesysid from pg_shadow; >> ctid | usename | usesysid >> -------+---------------+---------- >> (0,1) | postgres | 1 >> (0,2) | postgres | 1 >> (0,5) | madisoncounty | 100 >> (3 rows) > >> I then did: > >> delete from pg_shadow where ctid='(0,2)'; >> DELETE 1 > >> I had already reindexed the tables. Now when I run the command, I get this: > >> pg_dump: [archiver (db)] connection to database "madisoncounty" failed: >> FATAL: user "postgres" does not exist > > How annoying :-(. And I suppose madisoncounty isn't a superuser, > so you're now stuck with no working superuser. What you'll need > to do is shut down the database and start up a standalone backend > (read the "postgres" reference page about how to work in this mode). > What I'd then do is delete the other postgres row (if you can still > see it) and do CREATE USER postgres WITH SYSID 1 CREATEUSER CREATEDB. > That should get you back to having a working postgres account. > > It's hard to be sure how you got into this state, although one possible > theory is you got burnt by not vacuuming pg_shadow on a regular basis. > The newer version should keep you out of that error, if so. > > regards, tom lane >
Adam Dear <adear@usnx.net> writes: > I'm not seeing the madisoncounty user in there. Odder and odder. It might be worth trying "vacuum freeze pg_shadow". > Also, I tried starting > the db using /etc/init.d/postgres start, and it fails. Fails how? In particular, what shows up in the postmaster log? > Is that the > proper way to get the service going, or should I be doing something else? The usual way to manually start/stop daemons on Linux is sudo /sbin/service postgresql start sudo /sbin/service postgresql stop (omitting sudo if you're already root). I'm not sure offhand if there's any real difference between that and just calling the init.d script directly, but I believe that's how you're Supposed To Do It. regards, tom lane
I managed to get the server started again. It was failing because the permissions on the data folder were not right. Now, though, I'm back to where I started. There are two postgres users in the pg_shadow table. One of them has a password, the other doesn't, but they both have the same sysid. I tried to delete the one without the password, I couldn't get it deleted. I tried: Delete from pg_shadow where ctid = '(0,1)'; DELETE 0 and Delete from pg_shadow where passwd = ''; DELETE 0 Tom Lane wrote: > Adam Dear <adear@usnx.net> writes: >> I'm not seeing the madisoncounty user in there. > > Odder and odder. It might be worth trying "vacuum freeze pg_shadow". > >> Also, I tried starting >> the db using /etc/init.d/postgres start, and it fails. > > Fails how? In particular, what shows up in the postmaster log? > >> Is that the >> proper way to get the service going, or should I be doing something else? > > The usual way to manually start/stop daemons on Linux is > > sudo /sbin/service postgresql start > sudo /sbin/service postgresql stop > > (omitting sudo if you're already root). I'm not sure offhand if there's > any real difference between that and just calling the init.d script > directly, but I believe that's how you're Supposed To Do It. > > regards, tom lane > >
Adam Dear <adear@usnx.net> writes: > I tried: > Delete from pg_shadow where ctid = '(0,1)'; > DELETE 0 This is looking more and more like a transaction ID wraparound problem. Did you try the vacuum freeze suggestion? regards, tom lane
I tried: Vacuum Freeze pg_shadow; Then Reindex database madisoncounty; Then Delete from pg_shadow where ctid='(0,1)'; The record is still there. Tom Lane wrote: > Adam Dear <adear@usnx.net> writes: >> I tried: >> Delete from pg_shadow where ctid = '(0,1)'; >> DELETE 0 > > This is looking more and more like a transaction ID wraparound problem. > Did you try the vacuum freeze suggestion? > > regards, tom lane >
Adam Dear <adear@usnx.net> writes: > The record is still there. Hmph. Could we see all the system columns from that table? select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow; regards, tom lane
madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow; ctid | xmin | xmax | cmin | cmax | usename -------+------+------+------+------+--------------- (0,1) | 1 | 596 | 596 | 1 | postgres (0,2) | 2 | 1 | 1 | 0 | postgres (0,5) | 2 | 0 | 0 | 0 | madisoncounty (3 rows) Tom Lane wrote: > Adam Dear <adear@usnx.net> writes: >> The record is still there. > > Hmph. Could we see all the system columns from that table? > select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow; > > regards, tom lane > >
Adam Dear <adear@usnx.net> writes: > madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow; > ctid | xmin | xmax | cmin | cmax | usename > -------+------+------+------+------+--------------- > (0,1) | 1 | 596 | 596 | 1 | postgres > (0,2) | 2 | 1 | 1 | 0 | postgres > (0,5) | 2 | 0 | 0 | 0 | madisoncounty > (3 rows) Hm, I thought you deleted the (0,2) tuple ... did you restore a physical backup or something? Anyway, as far as I can see the way that you got into this state must have been 1. The (0,1) tuple must have been the one originally inserted by initdb; there's no other way it could have xmin=1. 2. Shortly after initdb (at transaction 596 to be exact) this tuple was updated --- probably by a password-assignment operation --- creating the tuple at (0,2), which must originally have had xmin = 596, cmin = 1 (the xmax overlays cmin in 7.4, so we can assume that column value is bogus). 3. Much time passes, and pg_shadow never gets vacuumed so the dead tuple at (0,1) is never cleaned up. Eventually the XID counter passes 2 billion + 596, and suddenly transaction 596 appears to be in the future, so the tuple at (0,1) starts to be seen by SELECTs again. 4. At this point you ran VACUUM FREEZE, which replaced the xmins of the second and third tuples with 2 (FrozenTransactionId) ... but 7.4 does not think it could ever need to freeze xmax, and at this point VACUUM wouldn't touch the (0,1) tuple anyway because it considers the tuple as RECENTLY_DEAD. So VACUUM won't help you, at least not for another 2 billion transactions. And the DELETE doesn't work either because it correctly perceives (0,1) as an updated tuple that's been superseded by (0,2), which doesn't meet the WHERE clause so DELETE doesn't touch it. You could delete (0,2) but that leaves you with no working postgres user (since the system's SnapshotNow rules consider (0,1) as dead), and if you create another one you're back to having 2 entries in pg_shadow. Nasty :-( I can't think of any way out of this using plain 7.4 SQL operations. You could maybe hack a special case into VACUUM to make it nuke the dead tuple, but what's probably going to be easier is to manipulate the data on disk. Are you comfortable enough with editing binary data to find the "596" and replace it with "2"? It'd be somewhere near the end of the first (and probably only) block of pg_shadow, and a few bytes before one of the occurrences of the string "postgres". BTW, pg_shadow is $PGDATA/global/1260. (If you try this, do the editing while the postmaster is stopped, else you might have problems with it buffering the old data.) regards, tom lane
I'm comfortable enough with it that I can edit it if I can find exactly what to edit. Whats the best way to edit the file? I've opened it using: vim -b 1260 Here is what is at the end of the block in the file. If you could point me in the right direction, I'd appreciate it. @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^E^@^H^@^C)^X?madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@'^@^@^@md58fd917bc348d399f31056d33330ba74f^@<98>^B^@^@<81>^E^@^@^@^@^@^@^@^@^@^@ ^E^@^H^@^C%^X?madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@'^@^@ ^@md58fd917bc348d399f31056d33330ba74f^@<94>^B^@^@<98>^B^@^@^@^@^@^@^@^@^@^@^D^@^H^@^A^E^X^_madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@^B^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^B^@^H^@^C)^X?postgres^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^A^A^A^@'^@^@^@md506464ceceb2b5b44a27417bf6ac59c8a^@^A^@^@ ^@T^B^@^@^A^@^@^@^@^@^@^@^B^@^H^@^A^E^X^_postgres^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^A^@^@^@^A^A^A^@ Tom Lane wrote: > Adam Dear <adear@usnx.net> writes: >> madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow; >> ctid | xmin | xmax | cmin | cmax | usename >> -------+------+------+------+------+--------------- >> (0,1) | 1 | 596 | 596 | 1 | postgres >> (0,2) | 2 | 1 | 1 | 0 | postgres >> (0,5) | 2 | 0 | 0 | 0 | madisoncounty >> (3 rows) > > Hm, I thought you deleted the (0,2) tuple ... did you restore a physical > backup or something? > > Anyway, as far as I can see the way that you got into this state must > have been > > 1. The (0,1) tuple must have been the one originally inserted by initdb; > there's no other way it could have xmin=1. > > 2. Shortly after initdb (at transaction 596 to be exact) this tuple was > updated --- probably by a password-assignment operation --- creating the > tuple at (0,2), which must originally have had xmin = 596, cmin = 1 (the > xmax overlays cmin in 7.4, so we can assume that column value is bogus). > > 3. Much time passes, and pg_shadow never gets vacuumed so the dead > tuple at (0,1) is never cleaned up. Eventually the XID counter passes 2 > billion + 596, and suddenly transaction 596 appears to be in the future, > so the tuple at (0,1) starts to be seen by SELECTs again. > > 4. At this point you ran VACUUM FREEZE, which replaced the xmins of the > second and third tuples with 2 (FrozenTransactionId) ... but 7.4 does > not think it could ever need to freeze xmax, and at this point VACUUM > wouldn't touch the (0,1) tuple anyway because it considers the tuple as > RECENTLY_DEAD. > > So VACUUM won't help you, at least not for another 2 billion > transactions. And the DELETE doesn't work either because it correctly > perceives (0,1) as an updated tuple that's been superseded by (0,2), > which doesn't meet the WHERE clause so DELETE doesn't touch it. > You could delete (0,2) but that leaves you with no working postgres user > (since the system's SnapshotNow rules consider (0,1) as dead), and if > you create another one you're back to having 2 entries in pg_shadow. > Nasty :-( > > I can't think of any way out of this using plain 7.4 SQL operations. > You could maybe hack a special case into VACUUM to make it nuke the > dead tuple, but what's probably going to be easier is to manipulate the > data on disk. Are you comfortable enough with editing binary data > to find the "596" and replace it with "2"? It'd be somewhere near > the end of the first (and probably only) block of pg_shadow, and a > few bytes before one of the occurrences of the string "postgres". > BTW, pg_shadow is $PGDATA/global/1260. > > (If you try this, do the editing while the postmaster is stopped, > else you might have problems with it buffering the old data.) > > regards, tom lane >
Adam Dear wrote: > I'm comfortable enough with it that I can edit it if I can find exactly > what to edit. Whats the best way to edit the file? I've opened it > using: > > vim -b 1260 > > Here is what is at the end of the block in the file. If you could point > me in the right direction, I'd appreciate it. Huh, you really need a proper hex editor. Try "hexedit" for example. And you need to convert the values to hexadecimal. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Adam Dear <adear@usnx.net> writes: > I'm comfortable enough with it that I can edit it if I can find exactly > what to edit. Whats the best way to edit the file? I've opened it using: > vim -b 1260 > Here is what is at the end of the block in the file. If you could point > me in the right direction, I'd appreciate it. I think you need to change T^B to ^B^@ ... though I agree with Alvaro that a hex editor would be a more comfortable tool. regards, tom lane
I downloaded a hex editor, but honestly, I can't make heads or tails of what I'm looking at. I'm not sure what needs changing. Also, FYI, you mentioned that you thought I had deleted the record with ctid=(0,2). I did delete that, but thats what caused the database to not work. I restored a backup copy of the data directory I had made yesterday morning before I started messing with it. If I restored the data directory again wouldn't that undo the Vacuum Freeze command that I did that has locked the database into the state that it is in now? Tom Lane wrote: > Adam Dear <adear@usnx.net> writes: >> I'm comfortable enough with it that I can edit it if I can find exactly >> what to edit. Whats the best way to edit the file? I've opened it using: > >> vim -b 1260 > >> Here is what is at the end of the block in the file. If you could point >> me in the right direction, I'd appreciate it. > > I think you need to change T^B to ^B^@ ... though I agree with Alvaro > that a hex editor would be a more comfortable tool. > > regards, tom lane >
Adam Dear <adear@usnx.net> writes: > If I restored the data directory again wouldn't that undo the Vacuum > Freeze command that I did that has locked the database into the state > that it is in now? It'd undo the freeze, but that doesn't get you any closer to a solution. What you need is to get rid of the wrapped-around xmax in the (0,1) tuple. regards, tom lane
Let me ask this question. Is there any other option for moving data from one server to another that doesn't involve pg_dump? If I can get the data out of this server, and onto my new one I don't care if the old server gets fixed. I just need the data. The new server already has the table structure, I just need the most current data to put into the server. Tom Lane wrote: > Adam Dear <adear@usnx.net> writes: >> If I restored the data directory again wouldn't that undo the Vacuum >> Freeze command that I did that has locked the database into the state >> that it is in now? > > It'd undo the freeze, but that doesn't get you any closer to a solution. > What you need is to get rid of the wrapped-around xmax in the (0,1) > tuple. > > regards, tom lane >
On Thu, 2008-06-12 at 11:33 -0500, Adam Dear wrote: > Let me ask this question. Is there any other option for moving data > from one server to another that doesn't involve pg_dump? If I can get > the data out of this server, and onto my new one I don't care if the old > server gets fixed. I just need the data. The new server already has > the table structure, I just need the most current data to put into the > server. Slony but you need a super user to install it. Joshua D. Drake
Adam Dear wrote: > Let me ask this question. Is there any other option for moving data > from one server to another that doesn't involve pg_dump? If I can get > the data out of this server, and onto my new one I don't care if the old > server gets fixed. I just need the data. The new server already has > the table structure, I just need the most current data to put into the > server. Yes, you can use COPY table TO STDOUT. This can probably be piped directly from one server to the other. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Adam Dear <adear@usnx.net> writes: > Let me ask this question. Is there any other option for moving data > from one server to another that doesn't involve pg_dump? If I can get > the data out of this server, and onto my new one I don't care if the old > server gets fixed. I just need the data. The new server already has > the table structure, I just need the most current data to put into the > server. Well, you could just manually COPY data out of the tables you care about. regards, tom lane
What is Slony? I have a super user account for the server and the database. Joshua D. Drake wrote: > > On Thu, 2008-06-12 at 11:33 -0500, Adam Dear wrote: >> Let me ask this question. Is there any other option for moving data >> from one server to another that doesn't involve pg_dump? If I can get >> the data out of this server, and onto my new one I don't care if the old >> server gets fixed. I just need the data. The new server already has >> the table structure, I just need the most current data to put into the >> server. > > Slony but you need a super user to install it. > > Joshua D. Drake > > > >
Adam Dear wrote: > Let me ask this question. Is there any other option for moving data > from one server to another that doesn't involve pg_dump? If I can get > the data out of this server, and onto my new one I don't care if the old > server gets fixed. I just need the data. The new server already has > the table structure, I just need the most current data to put into the > server. If that's all you need you may be able to use COPY (server side files) or psql's \copy command (client side files) to dump and load each table. Since your database is pretty messed up you might not even be able to connect with psql - so I'd go for using server side COPY. I don't know if it's safe to run COPY on a server in recovery mode, but I don't really see why it wouldn't be. You have already made a backup of your failed data directory anyway, after all, so the risk should be pretty minimal anyway. If you haven't taken a backup, I VERY strongly suggest doing so before doing anything more. The database server should not be running when you make the backup of the pg data directory (if you're doing a simple file copy, that is). For how to use COPY see "\h COPY" in psql or "man copy". In its simplest form it's just (eg): COPY customer TO '/some/path/on/server'; ... where the postgres user has the right to write to /some/path . If you wanted to save the data into /home/myname, you might do something like the following in a root shell: mkdir /home/myname/pgdump chmod 700 /home/myname/pgdump chown postgres /home/myname/pgdump ... then use: COPY mytable TO '/home/myname/pgdump/mytable'; COPY myothertable TO '/home/myname/pgdump/myothertable'; ... etc. The target path can even be on NFS mount or similar, so if you want you can dump your data directly onto your new server for easy loading. For loading, COPY FROM works much like COPY TO. It might be necessary to disable/drop triggers when you load the data, then re-enable/recreate them afterwards. For decent performance you might also want to drop foreign key constraints and indexes during bulk load with COPY, then recreate them afterwards. pg_dump and pg_restore do all this for you, but nothing stops you from doing it manually. -- Craig Ringer