Thread: pg_dump and LOs (another proposal)

pg_dump and LOs (another proposal)

From
Philip Warner
Date:
Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
I find myself thinking it may be a waste of effort for the moment. A
modified version of the system used by Pavel Janik in pg_dumplo may be
substantially more reliable than my previous proposal:

To Dump
-------

Dump all LOs by looking in pg_class for relkind='l'. 

Don't bother cross-referencing with actual table entries, since we are
trying to do a backup rather than a consistency check. 

The dump will consist of the LO and its original OID.
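
In libpq terms, the dump loop might look something like this (an
illustrative sketch only - the "BLOB <oid>" framing and the trimmed error
handling are placeholders, not pg_dump's actual archiver code):

    /* Enumerate all LOs via pg_class (relkind = 'l') and emit each one's
     * original OID followed by its bytes.  LO reads must happen inside a
     * transaction. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "libpq-fe.h"
    #include "libpq/libpq-fs.h"     /* INV_READ */

    static void dump_blobs(PGconn *conn, FILE *archive)
    {
        PGresult *res;
        int       i;

        PQclear(PQexec(conn, "BEGIN"));
        res = PQexec(conn, "SELECT oid FROM pg_class WHERE relkind = 'l'");

        for (i = 0; i < PQntuples(res); i++)
        {
            Oid  loOid = (Oid) strtoul(PQgetvalue(res, i, 0), NULL, 10);
            int  fd = lo_open(conn, loOid, INV_READ);
            char buf[8192];
            int  nread;

            fprintf(archive, "BLOB %u\n", loOid);  /* keep the original OID */
            while ((nread = lo_read(conn, fd, buf, sizeof(buf))) > 0)
                fwrite(buf, 1, nread, archive);
            lo_close(conn, fd);
        }
        PQclear(res);
        PQclear(PQexec(conn, "COMMIT"));
    }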


To Load
-------

Create a temporary table, lo_xref, with appropriate indexes.

Reload the LOs, storing the old & new OIDs in lo_xref.

Now, disable triggers and sequentially search through all tables that have
one or more OID columns: for each OID column, see if the column value is in
lo_xref; if it is, update it with the new value.
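
In libpq terms, the load side might look something like the sketch below
(next_blob() is a hypothetical archive reader; transaction handling and
error checks are omitted, and the fix-up UPDATE at the end is illustrative):

    /* Rebuild each LO under a fresh OID, record old->new in the temporary
     * lo_xref table, then rewrite the referencing columns. */
    #include <stdio.h>
    #include "libpq-fe.h"
    #include "libpq/libpq-fs.h"     /* INV_READ, INV_WRITE */

    extern int next_blob(Oid *oldOid, char **data, size_t *len);

    static void reload_blobs(PGconn *conn)
    {
        Oid    oldOid;
        char  *data;
        size_t len;
        char   sql[128];

        PQclear(PQexec(conn,
            "CREATE TEMP TABLE lo_xref (oldoid oid, newoid oid)"));
        PQclear(PQexec(conn,
            "CREATE UNIQUE INDEX lo_xref_old ON lo_xref (oldoid)"));

        while (next_blob(&oldOid, &data, &len))
        {
            Oid newOid = lo_creat(conn, INV_READ | INV_WRITE);
            int fd     = lo_open(conn, newOid, INV_WRITE);

            lo_write(conn, fd, data, len);
            lo_close(conn, fd);

            snprintf(sql, sizeof(sql),
                     "INSERT INTO lo_xref VALUES (%u, %u)", oldOid, newOid);
            PQclear(PQexec(conn, sql));
        }

        /* Then, for each OID column found in the user tables, something
         * like:
         *   UPDATE usertable SET oidcol = x.newoid
         *     FROM lo_xref x WHERE usertable.oidcol = x.oldoid;
         */
    }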

For large databases, this system will rely heavily on lo_xref, so my main
worries are:

1. How are temp tables stored? (e.g. if they are held in memory this is a
problem - Dec/Rdb stores temp tables in memory).

2. Are there any limitations on indexes of temp tables? (I seem to be able
to create them at least - Dec/Rdb won't even let you do that).


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: pg_dump and LOs (another proposal)

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
> I find myself thinking it's a waste of effort for the moment. A modified
> version of the system used by Pavel Janik in pg_dumplo may be substantially
> more reliable than my previous proposal:

I like this a lot better than trying to restore the original OIDs.  For
one thing, the restore-original-OIDs idea cannot be made to work if what
we want to do is load additional tables into an existing database.

> For large databases, this system will rely heavily on lo_xref, so my main
> worries are:

> 1. How are temp tables stored? (e.g. if they are held in memory this is a
> problem - Dec/Rdb stores temp tables in memory).

> 2. Are there any limitations on indexes of temp tables? (I seem to be able
> to create them at least - Dec/Rdb won't even let you do that).

No problem.  A temp table is a table, it's just got a unique name under
the hood.  (So do its indices, IIRC...)
        regards, tom lane


Re: Re: pg_dump and LOs (another proposal)

From
Philip Warner
Date:
At 11:09 5/07/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> Having now flirted with recreating BLOBs (and even DBs) with matching OIDs,
>> I find myself thinking it's a waste of effort for the moment. A modified
>> version of the system used by Pavel Janik in pg_dumplo may be substantially
>> more reliable than my previous proposal:
>
>I like this a lot better than trying to restore the original OIDs.  For
>one thing, the restore-original-OIDs idea cannot be made to work if what
>we want to do is load additional tables into an existing database.
>

The thing that bugs me about this is that, for 30,000 rows, I do 30,000
updates after the restore. It seems *really* inefficient, not to mention slow.

I'll also have to modify pg_restore to talk to the database directly (for
lo import). As a result I will probably send the entire script directly
from within pg_restore. Do you know if comment parsing ('--') is done in
the backend, or psql?






Re: Re: pg_dump and LOs (another proposal)

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> The thing that bugs me about this is that, for 30,000 rows, I do 30,000
> updates after the restore. It seems *really* inefficient, not to mention slow.

Shouldn't be a problem.  For one thing, I can assure you there are no
databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO
infrastructure won't support it.  (I think Denis Perchine has started
to work on a replacement one-table-for-all-LOs solution, btw.)  Possibly
more to the point, there's no reason for pg_restore to grovel through
the individual rows for itself.  Having identified a column that
contains (or might contain) LO OIDs, you can do something like
    UPDATE userTable SET oidcolumn = tmptable.newLOoid WHERE
        oidcolumn = tmptable.oldLOoid;

which should be quick enough, especially given indexes.
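
As quoted, the join with tmptable is left implicit; spelled out with
PostgreSQL's UPDATE ... FROM extension, and issued directly over libpq as
pg_restore might do it, the command would presumably read:

    /* Remap one OID column in one pass; names follow the example above. */
    #include <stdio.h>
    #include "libpq-fe.h"

    static void remap_column(PGconn *conn)
    {
        PGresult *res = PQexec(conn,
            "UPDATE userTable SET oidcolumn = tmptable.newLOoid "
            "  FROM tmptable "
            " WHERE userTable.oidcolumn = tmptable.oldLOoid");

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "OID remap failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }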

> I'll also have to modify pg_restore to talk to the database directly (for
> lo import). As a result I will probably send the entire script directly
> from within pg_restore. Do you know if comment parsing ('--') is done in
> the backend, or psql?

Both, I believe --- psql discards comments, but so will the backend.
Not sure you really need to abandon use of psql, though.
        regards, tom lane


Re[2]: Re: pg_dump and LOs (another proposal)

From
Denis Perchine
Date:
Hello Tom,

Wednesday, July 05, 2000, 9:06:33 PM, you wrote:

TL> Philip Warner <pjw@rhyme.com.au> writes:
>> The thing that bugs me about this is that, for 30,000 rows, I do 30,000
>> updates after the restore. It seems *really* inefficient, not to mention slow.

TL> Shouldn't be a problem.  For one thing, I can assure you there are no
TL> databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO

Hmmm... I have 127865 LOs at the moment. :-))) But that is with my patch,
where all LOs are ordinary files on the filesystem. I will move it to the
one-table-for-all-LOs scheme after my holidays.

TL> infrastructure won't support it.  (I think Denis Perchine has started
TL> to work on a replacement one-table-for-all-LOs solution, btw.)  Possibly

You can try it. I sent it to pgsql-patches some time ago.

TL> more to the point, there's no reason for pg_restore to grovel through
TL> the individual rows for itself.  Having identified a column that
TL> contains (or might contain) LO OIDs, you can do something like

-- 
Best regards,
Denis                            mailto:dyp@perchine.com




Re: Re: pg_dump and LOs (another proposal)

From
Philip Warner
Date:
At 13:06 5/07/00 -0400, Tom Lane wrote:
>Shouldn't be a problem.  For one thing, I can assure you there are no
>databases with 30,000 LOs in them ;-) --- the existing two-tables-per-LO
>infrastructure won't support it.  (I think Denis Perchine has started

Eeek! Not so long ago I was going to use PG for a database with far more
than that many documents. Glad I didn't (mainly because of no backup,
horrible storage, etc.).


>> I'll also have to modify pg_restore to talk to the database directly (for
>> lo import). As a result I will probably send the entire script directly
>> from within pg_restore. Do you know if comment parsing ('--') is done in
>> the backend, or psql?
>
>Both, I believe --- psql discards comments, but so will the backend.
>Not sure you really need to abandon use of psql, though.

Don't plan to abandon it, but I did plan to use lo_creat and lo_write to add
the LOs, and that requires no psql, I think. I want this utility to run
directly from tape, without lots of temp files.

I'll probably just have a new arg, --blobs, and another, --db, which makes a
direct DB connection; --blobs without --db will not be supported. Does this
sound OK?




Re: Re: pg_dump and LOs (another proposal)

From
Philip Warner
Date:
At 13:06 5/07/00 -0400, Tom Lane wrote:
>
>    UPDATE userTable SET oidcolumn = tmptable.newLOoid WHERE
>        oidcolumn = tmptable.oldLOoid;
>

It's actually nastier than this since there could be multiple oid columns,
implying, potentially, multiple scans of the table.

I suppose

update userTable set
    oidCol1 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol1), oidCol1 ),
    oidCol2 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol2), oidCol2 ),
    ...

would work, or at least only update each row once, but it looks slow.








Re: Re: pg_dump and LOs (another proposal)

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 13:06 5/07/00 -0400, Tom Lane wrote:
>> UPDATE userTable SET oidcolumn = tmptable.newLOoid WHERE
>> oidcolumn = tmptable.oldLOoid;

> It's actually nastier than this since there could be multiple oid columns,
> implying, potentially, multiple scans of the table.

So?

> I suppose

> update userTable set
>     oidCol1 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol1
> ), oidCol1 ),
>     oidCol2 = Coalesce( (Select newLOoid from oidxref where oldLOoid = oidCol2
> ), oidCol2 ),
>     ...

> would work, or at least only update each row once, but it looks slow.

Almost certainly slower than processing each column in a separate
UPDATE.  It does not pay to try to be smarter than the planner is ;-)
        regards, tom lane


Re: Re: pg_dump and LOs (another proposal)

From
Peter Eisentraut
Date:
Philip Warner writes:

> I'll also have to modify pg_restore to talk to the database directly (for
> lo import).

psql has \lo_import.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: Re: pg_dump and LOs (another proposal)

From
Philip Warner
Date:
At 18:12 6/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> I'll also have to modify pg_restore to talk to the database directly (for
>> lo import).
>
>psql has \lo_import.
>

This is true, but if there are 30,000 blobs on an archive tape, I can't dump
them into /tmp and wait for the user to run the script. At the current time
pg_restore just sends a script to a file or stdout - it has no guarantee of
when a \lo_import command will be run, so dumping blobs into the same file
between \lo_import calls would not be appropriate, since I am in effect
requiring a psql attachment.

So the plan is, in the first pass, to make BLOB restoration dependent on
having a DB connection.

Does this make more sense?

P.S. I have only half-written the lo dumping code, so this is all quite
open...






Re: Re: pg_dump and LOs (another proposal)

From
Philip Warner
Date:
At 18:12 6/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> I'll also have to modify pg_restore to talk to the database directly (for
>> lo import).
>
>psql has \lo_import.
>

P.S. Another, possibly minor, advantage of using a direct db connection is
that I can allow the user to stop restoring the database on the first error,
unlike a script file fed to psql.





Re: Re: pg_dump and LOs (another proposal)

From
Karel Zak
Date:
> P.S. I have only half-written the lo dumping code, so this is all quite
> open...
A "blasphemy" question, is really needful LO dump if we will have TOAST and
LO will lonely past? If anyone still need dump LO (for example I) is possible 
use pg_dumplo from contrib tree, that (as some users say) works very well. 
Not is work on LO dump, after several years and during LO funeral loss of
time? (sorry).
                    Karel



Re: Re: pg_dump and LOs (another proposal)

From
Philip Warner
Date:
At 09:30 7/07/00 +0200, Karel Zak wrote:
>
>> P.S. I have only half-written the lo dumping code, so this is all quite
>> open...
>
> A "blasphemy" question, is really needful LO dump if we will have TOAST and
>LO will lonely past? If anyone still need dump LO (for example I) is
possible 
>use pg_dumplo from contrib tree, that (as some users say) works very well. 
>Not is work on LO dump, after several years and during LO funeral loss of
>time? (sorry).

There are three reasons why I continue:

1. To learn

2. Because I believe that BLOBs will exist after TOAST, although the
implementation will have changed. The code to handle the current format
will be at least 70% reusable (assuming a similar set of
lo_open/read/write/close calls).

3. We will need a way of exporting old BLOBs and importing them as TOAST
BLOBs.

I could be wrong about (2), but I think binary data cannot be easily loaded
from a pure text file, and (3) could ultimately be handled by pg_dumplo, but
I like the idea of an integrated tool.




Re: Re: pg_dump and LOs (another proposal)

From
Peter Eisentraut
Date:
Philip Warner writes:

> >psql has \lo_import.

> This is true, but if there are 30,000 blobs on an archive tape, I can't dump
> them into /tmp and wait for the user to run the script. At the current time
> pg_restore just sends a script to a file or stdout - it has no guarantee of
> when a \lo_import command will be run, so dumping blobs into the same file
> between \lo_import calls would not be appropriate, since I am in effect
> requiring a psql attachment.

I don't understand. How else would you restore a large object if not using
libpq's lo_import() call?





Re: Re: pg_dump and LOs (another proposal)

From
Peter Eisentraut
Date:
Philip Warner writes:

> >psql has \lo_import.

> P.S. Another, possibly minor, advantage of using a direct db connection is
> that I can allow the user to stop restoring the database on the first error,
> unlike a script file fed to psql.

\set ON_ERROR_STOP





Re: Re: pg_dump and LOs (another proposal)

From
Philip Warner
Date:
At 18:15 7/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> >psql has \lo_import.
>
>> This is true, but if there are 30,000 blobs on an archive tape, I can't dump
>> them into /tmp and wait for the user to run the script. At the current time
>> pg_restore just sends a script to a file or stdout - it has no guarantee of
>> when a \lo_import command will be run, so dumping blobs into the same file
>> between \lo_import calls would not be appropriate, since I am in effect
>> requiring a psql attachment.
>
>I don't understand. How else would you restore a large object if not using
>libpq's lo_import() call?
>

Direct connection to the DB, using lo_creat, lo_open, lo_write & lo_close -
i.e. what lo_import does under the hood.
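
For reference, this is roughly the client-side sequence lo_import() runs
(simplified; transaction handling and error checks omitted, and the
file-based framing is illustrative, since pg_restore would read from the
archive instead):

    /* Approximately what lo_import() does under the hood: create a new LO,
     * open it for writing, and copy the bytes in. */
    #include <stdio.h>
    #include "libpq-fe.h"
    #include "libpq/libpq-fs.h"     /* INV_READ, INV_WRITE */

    static Oid import_one_blob(PGconn *conn, FILE *in)
    {
        char   buf[8192];
        size_t nread;
        Oid    loOid = lo_creat(conn, INV_READ | INV_WRITE);
        int    fd    = lo_open(conn, loOid, INV_WRITE);

        while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
            lo_write(conn, fd, buf, nread);
        lo_close(conn, fd);
        return loOid;          /* the new OID, to be recorded in lo_xref */
    }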

