Thread: RE: pg_dump/restore to convert BLOBs to LZTEXT (optional!)

RE: pg_dump/restore to convert BLOBs to LZTEXT (optional!)

From
Peter Mount
Date:
See below...

--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council


-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Friday, August 04, 2000 2:29 AM
To: Tom Lane
Cc: pgsql-hackers@postgreSQL.org; pgsql-general@postgreSQL.org
Subject: Re: [HACKERS] pg_dump/restore to convert BLOBs to LZTEXT
(optional!)


At 21:10 3/08/00 -0400, Tom Lane wrote:
>As well as break the semantics: if you have a multiply-referenced BLOB
>then you can update it through any reference and the changes are visible
>through all the references.  Not so after you convert the data into
>non-BLOB values.

That's what I meant. People *shouldn't* expect BLOB fields to be updated in
more than one table, but the implementation currently allows it (since BLOBs
are not implemented as fields).

Peter: I disagree. There are dozens of instances where you would use a
single BLOB but refer to it in more than one table. If you have a 1 MB blob
referred to in 3 different tables, you don't want to store 3 instances of it.
Say you were implementing some form of DIP system (Document Image
Processing), then you only want one copy of the document stored, so that if
that document changes, then every instance is changed.

>I don't see that pg_dump can help meaningfully,
>and I'd just as soon resist feature bloat in pg_dump.

Fine. Thinking about it, even *if* it was implemented as a utility, I
suspect (for the reasons you outlined), conversion would be a multi-step
process. And a more useful utility would be one that converted an existing
database, rather than trying to do everything in the 'restore'...

Peter: It might be useful to have the utility and put it under contrib. It
would then save people from reinventing the wheel.

Forget I even mentioned it.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: pg_dump/restore to convert BLOBs to LZTEXT (optional!)

From
"Ross J. Reedstrom"
Date:
On Fri, Aug 04, 2000 at 07:55:52AM +0100, Peter Mount wrote:
> See below...
>
> Peter: I disagree. There are dozens of instances where you would use a
> single BLOB but refer to it in more than one table. If you have a 1 MB blob
> referred to in 3 different tables, you don't want to store 3 instances of it.
> Say you were implementing some form of DIP system (Document Image
> Processing), then you only want one copy of the document stored, so that if
> that document changes, then every instance is changed.
>

But Peter, the relational way to avoid redundant storage should apply. For
every other type, one does this by storing the data in one place, with
a unique ID, and using the ID to refer to the data item, and joining when
you need the item itself.

So, once large data items are promoted to first class types, they should
act just like every other first class type. Otherwise, we violate the
principle of least surprise. Having software that tries to second guess
the developer is always frustrating.
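
A minimal sketch of the pattern described above: store the large item once
under a unique ID, refer to it by that ID, and join when the item itself is
needed. This is only an illustration under stated assumptions. It uses
Python's sqlite3 module as a stand-in for a PostgreSQL database, and the
table and column names (documents, invoices, letters, doc_id, body) are
hypothetical, not taken from any real schema.

```python
import sqlite3

# Hypothetical schema: one shared table holds each document exactly once.
# SQLite is used here only as a convenient stand-in database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (
        doc_id INTEGER PRIMARY KEY,
        body   BLOB NOT NULL
    );
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY,
        doc_id     INTEGER NOT NULL REFERENCES documents(doc_id)
    );
    CREATE TABLE letters (
        letter_id INTEGER PRIMARY KEY,
        doc_id    INTEGER NOT NULL REFERENCES documents(doc_id)
    );
    """
)

# Store the document once, then refer to it by ID from two other tables.
conn.execute("INSERT INTO documents (doc_id, body) VALUES (?, ?)", (1, b"scan v1"))
conn.execute("INSERT INTO invoices (invoice_id, doc_id) VALUES (10, 1)")
conn.execute("INSERT INTO letters (letter_id, doc_id) VALUES (20, 1)")


def invoice_body(invoice_id):
    """Join from an invoice to the single stored copy of its document."""
    row = conn.execute(
        "SELECT d.body FROM invoices i "
        "JOIN documents d ON d.doc_id = i.doc_id "
        "WHERE i.invoice_id = ?",
        (invoice_id,),
    ).fetchone()
    return row[0]


def letter_body(letter_id):
    """Join from a letter to the single stored copy of its document."""
    row = conn.execute(
        "SELECT d.body FROM letters l "
        "JOIN documents d ON d.doc_id = l.doc_id "
        "WHERE l.letter_id = ?",
        (letter_id,),
    ).fetchone()
    return row[0]


# Updating the one stored row changes what every referring table sees.
conn.execute("UPDATE documents SET body = ? WHERE doc_id = ?", (b"scan v2", 1))
```

After the update, both invoice_body(10) and letter_body(20) read the new
contents, while only one copy of the document is stored.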

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: pg_dump/restore to convert BLOBs to LZTEXT (optional!)

From
Bruce Momjian
Date:
> On Fri, Aug 04, 2000 at 07:55:52AM +0100, Peter Mount wrote:
> > See below...
> >
> > Peter: I disagree. There are dozens of instances where you would use a
> > single BLOB but refer to it in more than one table. If you have a 1 MB blob
> > referred to in 3 different tables, you don't want to store 3 instances of it.
> > Say you were implementing some form of DIP system (Document Image
> > Processing), then you only want one copy of the document stored, so that if
> > that document changes, then every instance is changed.
> >
>
> But Peter, the relational way to avoid redundant storage should apply. For
> every other type, one does this by storing the data in one place, with
> a unique ID, and using the ID to refer to the data item, and joining when
> you need the item itself.
>
> So, once large data items are promoted to first class types, they should
> act just like every other first class type. Otherwise, we violate the
> principle of least surprise. Having software that tries to second guess
> the developer is always frustrating.

I totally agree.  Because large objects exist as separate files, this
was required, but after TOAST, the proper relational way should be used.


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026