Thread: pg_dump and LOs (another proposal)
Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
I find myself thinking it may be a waste of effort for the moment. A
modified version of the system used by Pavel Janik in pg_dumplo may be
substantially more reliable than my previous proposal:

To Dump
-------
Dump all LOs by looking in pg_class for relkind = 'l'. Don't bother
cross-referencing with actual table entries, since we are trying to do a
backup rather than a consistency check. The dump will consist of the LO and
its original OID.

To Load
-------
Create a temporary table, lo_xref, with appropriate indexes. Reload the
LOs, storing old & new OID in lo_xref. Now, disable triggers and
sequentially search through all tables that have one or more oid columns:
for each oid column, see if the column value is in lo_xref; if it is,
update it with the new value.

For large databases, this system will rely heavily on lo_xref, so my main
worries are:

1. How are temp tables stored? (eg. if in memory this is a problem -
   Dec/Rdb stores temp tables in memory).

2. Are there any limitations on indexes of temp tables? (I seem to be able
   to create them at least - Dec/Rdb won't even let you do that.)

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. | http://www.rhyme.com.au
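For concreteness, here is a minimal libpq sketch (not from the original
mail) of the lo_xref bookkeeping described above; the table, column, and
function names are illustrative assumptions, and error handling is omitted:

    /* Create the temporary cross-reference table once per restore.
     * Whether a temp table (and its index) performs well enough for a
     * large database is exactly the question raised above. */
    #include <stdio.h>
    #include "libpq-fe.h"

    void create_lo_xref(PGconn *conn)
    {
        PQclear(PQexec(conn,
            "CREATE TEMP TABLE lo_xref (oldoid oid, newoid oid)"));
        PQclear(PQexec(conn,
            "CREATE INDEX lo_xref_oldoid ON lo_xref (oldoid)"));
    }

    /* Record one reloaded LO: the OID it had in the dump and the OID
     * it was given on reload. */
    void record_lo_mapping(PGconn *conn, Oid old_oid, Oid new_oid)
    {
        char sql[128];

        snprintf(sql, sizeof(sql),
                 "INSERT INTO lo_xref VALUES (%u, %u)",
                 (unsigned int) old_oid, (unsigned int) new_oid);
        PQclear(PQexec(conn, sql));
    }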
Philip Warner <pjw@rhyme.com.au> writes:
> Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
> I find myself thinking it's a waste of effort for the moment. A modified
> version of the system used by Pavel Janik in pg_dumplo may be substantially
> more reliable than my previous proposal:

I like this a lot better than trying to restore the original OIDs. For
one thing, the restore-original-OIDs idea cannot be made to work if what
we want to do is load additional tables into an existing database.

> For large databases, this system will rely heavily on lo_xref, so my main
> worries are:
> 1. How are temp tables stored? (eg. if in memory this is a problem -
> Dec/Rdb stores temp tables in memory).
> 2. Are there any limitation on indexes of temp tables (I seem to be able to
> create them at least - Dec/Rdb won't even let you do that).

No problem. A temp table is a table, it's just got a unique name under
the hood. (So do its indices, IIRC...)

			regards, tom lane
At 11:09 5/07/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
>> I find myself thinking it's a waste of effort for the moment. A modified
>> version of the system used by Pavel Janik in pg_dumplo may be substantially
>> more reliable than my previous proposal:
>
>I like this a lot better than trying to restore the original OIDs. For
>one thing, the restore-original-OIDs idea cannot be made to work if what
>we want to do is load additional tables into an existing database.
>

The thing that bugs me about this is that for 30,000 rows, I do 30,000
updates after the restore. It seems *really* inefficient, not to mention
slow.

I'll also have to modify pg_restore to talk to the database directly (for
lo import). As a result I will probably send the entire script directly
from within pg_restore. Do you know if comment parsing ('--') is done in
the backend, or psql?

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. | http://www.rhyme.com.au
Philip Warner <pjw@rhyme.com.au> writes:
> The thing that bugs me about this if for 30,000 rows, I do 30,000 updates
> after the restore. It seems *really* inefficient, not to mention slow.

Shouldn't be a problem. For one thing, I can assure you there are no
databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO
infrastructure won't support it. (I think Denis Perchine has started
to work on a replacement one-table-for-all-LOs solution, btw.) Possibly
more to the point, there's no reason for pg_restore to grovel through
the individual rows for itself. Having identified a column that
contains (or might contain) LO OIDs, you can do something like

	UPDATE userTable SET oidcolumn = tmptable.newLOoid WHERE
	oidcolumn = tmptable.oldLOoid;

which should be quick enough, especially given indexes.

> I'll also have to modify pg_restore to talk to the database directly (for
> lo import). As a result I will probably send the entire script directly
> from withing pg_restore. Do you know if comment parsing ('--') is done in
> the backend, or psql?

Both, I believe --- psql discards comments, but so will the backend.
Not sure you really need to abandon use of psql, though.

			regards, tom lane
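One runnable form of that idea, assuming the lo_xref(oldoid, newoid) table
from the earlier proposal; this is a sketch only, with identifier quoting
and error handling left out, and the helper name is an assumption:

    /* Sketch: rewrite one oid column of one user table with a single
     * set-oriented UPDATE against the lo_xref mapping table. */
    #include <stdio.h>
    #include "libpq-fe.h"

    void fixup_lo_column(PGconn *conn, const char *table, const char *column)
    {
        char sql[1024];

        snprintf(sql, sizeof(sql),
                 "UPDATE %s SET %s = "
                 "(SELECT newoid FROM lo_xref WHERE lo_xref.oldoid = %s.%s) "
                 "WHERE %s IN (SELECT oldoid FROM lo_xref)",
                 table, column, table, column, column);
        PQclear(PQexec(conn, sql));
    }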
Hello Tom,

Wednesday, July 05, 2000, 9:06:33 PM, you wrote:

TL> Philip Warner <pjw@rhyme.com.au> writes:
>> The thing that bugs me about this if for 30,000 rows, I do 30,000 updates
>> after the restore. It seems *really* inefficient, not to mention slow.

TL> Shouldn't be a problem. For one thing, I can assure you there are no
TL> databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO

Hmmm... I have 127865 LOs at the moment. :-))) But with my patch where all
LOs are usual files on the FS. I will move it to one-table-for-all-LOs
after my holidays.

TL> infrastructure won't support it. (I think Denis Perchine has started
TL> to work on a replacement one-table-for-all-LOs solution, btw.) Possibly

You can try it. I sent it to pgsql-patches some time ago.

TL> more to the point, there's no reason for pg_restore to grovel through
TL> the individual rows for itself. Having identified a column that
TL> contains (or might contain) LO OIDs, you can do something like

-- 
Best regards,
Denis                            mailto:dyp@perchine.com
At 13:06 5/07/00 -0400, Tom Lane wrote:
>Shouldn't be a problem. For one thing, I can assure you there are no
>databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO
>infrastructure won't support it. (I think Denis Perchine has started

Eeek! Not so long ago I was going to use PG for a database with far more
than that many documents (mainly because of no backup, horrible storage
etc). Glad I didn't.

>> I'll also have to modify pg_restore to talk to the database directly (for
>> lo import). As a result I will probably send the entire script directly
>> from withing pg_restore. Do you know if comment parsing ('--') is done in
>> the backend, or psql?
>
>Both, I believe --- psql discards comments, but so will the backend.
>Not sure you really need to abandon use of psql, though.

Don't plan to abandon it, but I did plan to use lo_creat, lo_write to add
the LOs, and that requires no psql, I think. I want this utility to run
direct from tape, without lots of temp files.

I'll probably just have a new arg, --blobs, and another --db, which makes
a direct DB connection, and --blobs without --db will not be supported.
Does this sound OK?

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. | http://www.rhyme.com.au
At 13:06 5/07/00 -0400, Tom Lane wrote:
>
>	UPDATE userTable SET oidcolumn = tmptable.newLOoid WHERE
>	oidcolumn = tmptable.oldLOoid;
>

It's actually nastier than this since there could be multiple oid columns,
implying, potentially, multiple scans of the table. I suppose

    update userTable set
        oidCol1 = Coalesce( (Select newLOoid from oidxref
                             where oldLOoid = oidCol1), oidCol1 ),
        oidCol2 = Coalesce( (Select newLOoid from oidxref
                             where oldLOoid = oidCol2), oidCol2 ),
        ...

would work, or at least only update each row once, but it looks slow.

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. | http://www.rhyme.com.au
Philip Warner <pjw@rhyme.com.au> writes:
> At 13:06 5/07/00 -0400, Tom Lane wrote:
>> UPDATE userTable SET oidcolumn = tmptable.newLOoid WHERE
>> oidcolumn = tmptable.oldLOoid;

> It's actually nastier than this since there could be multiple oid columns,
> implying, potentially, multiple scans of the table.

So?

> I suppose
> update userTable set
> oidCol1 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol1
> ), oidCol1 ),
> oidCol2 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol2
> ), oidCol2 ),
> ...
> would work, or at least only update each row once, but it looks slow.

Almost certainly slower than processing each column in a separate UPDATE.
It does not pay to try to be smarter than the planner is ;-)

			regards, tom lane
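A sketch of the column-at-a-time approach suggested above: locate every
user-table column of type oid in the system catalogs and issue one UPDATE
per column, reusing the hypothetical fixup_lo_column() helper from the
earlier sketch. The catalog query is illustrative, not taken from
pg_restore, and may need adjustment for the catalogs of that era:

    #include "libpq-fe.h"

    extern void fixup_lo_column(PGconn *conn,
                                const char *table, const char *column);

    /* One UPDATE per oid column of each user table. */
    void fixup_all_lo_columns(PGconn *conn)
    {
        const char *find_cols =
            "SELECT c.relname, a.attname "
            "  FROM pg_class c, pg_attribute a, pg_type t "
            " WHERE a.attrelid = c.oid AND a.atttypid = t.oid "
            "   AND t.typname = 'oid' AND a.attnum > 0 "
            "   AND c.relkind = 'r' AND c.relname NOT LIKE 'pg_%'";
        PGresult *res = PQexec(conn, find_cols);
        int       i;

        for (i = 0; i < PQntuples(res); i++)
            fixup_lo_column(conn, PQgetvalue(res, i, 0),
                            PQgetvalue(res, i, 1));
        PQclear(res);
    }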
Philip Warner writes:

> I'll also have to modify pg_restore to talk to the database directly (for
> lo import).

psql has \lo_import.

-- 
Peter Eisentraut                  peter_e@gmx.net
http://yi.org/peter-e/
At 18:12 6/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> I'll also have to modify pg_restore to talk to the database directly (for
>> lo import).
>
>psql has \lo_import.
>

This is true, but if there are 30000 blobs on an archive tape, I can't dump
them into /tmp and wait for the user to run the script. At the current time
pg_restore just sends a script to a file or stdout - it has no guarantee of
when a \lo_import command will be run, so dumping blobs into the same file
between lo_import calls would not be appropriate, since I would in effect
be requiring a psql attachment.

So the plan is, in the first pass, to make BLOB restoration dependent on
having a DB connection.

Does this make more sense?

P.S. I have only half-written the lo dumping code, so this is all quite
open...

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. | http://www.rhyme.com.au
At 18:12 6/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> I'll also have to modify pg_restore to talk to the database directly (for
>> lo import).
>
>psql has \lo_import.
>

P.S. Another, possibly minor, advantage of using a direct db connection is
that I can allow the user to stop restoring the database on the first
error, unlike a script file fed to psql.

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. | http://www.rhyme.com.au
> P.S. I have only half-written the lo dumping code, so this is all quite
> open...

A "blasphemy" question: is an LO dump really needed if we will have TOAST
and LOs will soon be a thing of the past? If anyone still needs to dump LOs
(for example, me), it is possible to use pg_dumplo from the contrib tree,
which (as some users say) works very well. Isn't working on an LO dump,
after several years and during the LO funeral, a waste of time? (sorry)

			Karel
At 09:30 7/07/00 +0200, Karel Zak wrote:
>
>> P.S. I have only half-written the lo dumping code, so this is all quite
>> open...
>
> A "blasphemy" question: is an LO dump really needed if we will have TOAST
>and LOs will soon be a thing of the past? If anyone still needs to dump LOs
>(for example, me), it is possible to use pg_dumplo from the contrib tree,
>which (as some users say) works very well. Isn't working on an LO dump,
>after several years and during the LO funeral, a waste of time? (sorry)

There are three reasons why I continue:

1. To learn.

2. Because I believe that BLOBs will exist after TOAST, although the
   implementation will have changed. The code to handle the current format
   will be at least 70% reusable (assuming a similar set of
   lo_open/read/write/close calls).

3. We will need a way of exporting old BLOBs and importing them as TOAST
   BLOBs.

I could be wrong about (2), but I think binary data cannot be easily loaded
from a pure text file, and (3) could ultimately be handled by pg_dumplo,
but I like the idea of an integrated tool.

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. | http://www.rhyme.com.au
Philip Warner writes:

> >psql has \lo_import.

> This is true, but if there are 30000 blobs on an archive tape, I cant dump
> them into /tmp and wait for the user to run the script. At the current time
> pg_restore just sends a script to a file or stdout - it has no guarantee of
> when a \lo_import command will be run, so dumping blobs into the same file
> between lo_import calls would not be appropriate, since I am in effect
> requiring a psql attachment.

I don't understand. How else would you restore a large object if not using
libpq's lo_import() call?

-- 
Peter Eisentraut                  peter_e@gmx.net
http://yi.org/peter-e/
Philip Warner writes:

> >psql has \lo_import.

> P.S. Another, possibly minor, advantage of using a direct db connection is
> I can allow the user to stop restoring the database on the first error,
> unlike a script file to psql.

\set ON_ERROR_STOP

-- 
Peter Eisentraut                  peter_e@gmx.net
http://yi.org/peter-e/
At 18:15 7/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> >psql has \lo_import.
>
>> This is true, but if there are 30000 blobs on an archive tape, I cant dump
>> them into /tmp and wait for the user to run the script. At the current time
>> pg_restore just sends a script to a file or stdout - it has no guarantee of
>> when a \lo_import command will be run, so dumping blobs into the same file
>> between lo_import calls would not be appropriate, since I am in effect
>> requiring a psql attachment.
>
>I don't understand. How else would you restore a large object if not using
>libpq's lo_import() call?
>

Direct connection to the DB and use lo_creat, lo_open, lo_write & lo_close
- ie. what lo_import does under the hood.

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. | http://www.rhyme.com.au
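A minimal sketch of that sequence, roughly what lo_import() does internally
but reading from the archive instead of a temp file. The
read_archive_chunk() callback is purely hypothetical, error handling is
omitted, and the lo_* calls must run inside a transaction block:

    #include "libpq-fe.h"
    #include "libpq/libpq-fs.h"   /* INV_READ, INV_WRITE */

    /* Recreate one large object from archive data without a temp file.
     * Returns the new OID so the caller can record (old, new) in lo_xref. */
    Oid restore_one_blob(PGconn *conn,
                         size_t (*read_archive_chunk)(char *buf, size_t len))
    {
        char   buf[8192];
        size_t n;
        Oid    new_oid;
        int    fd;

        new_oid = lo_creat(conn, INV_READ | INV_WRITE);
        fd = lo_open(conn, new_oid, INV_WRITE);

        while ((n = read_archive_chunk(buf, sizeof(buf))) > 0)
            lo_write(conn, fd, buf, n);

        lo_close(conn, fd);
        return new_oid;
    }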