Thread: RE: pg_dump/restore to convert BLOBs to LZTEXT (optional!)
See below...

--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council

-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Friday, August 04, 2000 2:29 AM
To: Tom Lane
Cc: pgsql-hackers@postgreSQL.org; pgsql-general@postgreSQL.org
Subject: Re: [HACKERS] pg_dump/restore to convert BLOBs to LZTEXT (optional!)

At 21:10 3/08/00 -0400, Tom Lane wrote:
>As well as break the semantics: if you have a multiply-referenced BLOB
>then you can update it through any reference and the changes are visible
>through all the references. Not so after you convert the data into
>non-BLOB values.

That's what I meant. People *shouldn't* expect BLOB fields to be updated in more than one table, but the implementation currently allows it (since BLOBs are not implemented as fields).

Peter: I disagree. There are dozens of instances where you would use a single BLOB but refer to it in more than one table. If you have a 1Mb blob referred to in 3 different tables, you don't want to store 3 instances of it. Say you were implementing some form of DIP system (Document Image Processing), then you only want one copy of the document stored, so that if that document changes, then every instance is changed.

>I don't see that pg_dump can help meaningfully,
>and I'd just as soon resist feature bloat in pg_dump.

Fine. Thinking about it, even *if* it was implemented as a utility, I suspect (for the reasons you outlined), conversion would be a multi-step process. And a more useful utility would be one that converted an existing database, rather than trying to do everything in the 'restore'...

Peter: It might be useful to have the utility and put it under contrib. It would then save people from reinventing the wheel.

Forget I even mentioned it.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
On Fri, Aug 04, 2000 at 07:55:52AM +0100, Peter Mount wrote:
> See below...
>
> Peter: I disagree. There are dozens of instances where you would use a
> single BLOB but refer to it in more than one table. If you have a 1Mb blob
> referred to in 3 different tables, you don't want to store 3 instances of it.
> Say you were implementing some form of DIP system (Document Image
> Processing), then you only want one copy of the document stored, so that if
> that document changes, then every instance is changed.
>

But Peter, the relational way to avoid redundant storage should apply. For every other type, one does this by storing the data in one place, with a unique ID, and using the ID to refer to the data item, and joining when you need the item itself.

So, once large data items are promoted to first class types, they should act just like every other first class type. Otherwise, we violate the principle of least surprise. Having software that tries to second guess the developer is always frustrating.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
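
A minimal sketch of the pattern Ross describes: the large item is stored once under a unique ID, other tables hold only that ID, and a join fetches the item when it is needed. This is an illustration under stated assumptions, not anything posted in the thread. SQLite (through Python's standard sqlite3 module) stands in for PostgreSQL, and the table names documents, invoices and contracts, along with the helpers build_demo and body_via, are hypothetical.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with one stored document and two referrers.

    All table and column names are hypothetical, chosen for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- The large item lives in exactly one place, keyed by a unique ID.
        CREATE TABLE documents (
            id   INTEGER PRIMARY KEY,
            body BLOB NOT NULL
        );
        -- Other tables store only the ID, never a copy of the data.
        CREATE TABLE invoices (
            id          INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL REFERENCES documents(id)
        );
        CREATE TABLE contracts (
            id          INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL REFERENCES documents(id)
        );
        """
    )
    conn.execute("INSERT INTO documents (id, body) VALUES (1, ?)", (b"scan v1",))
    conn.execute("INSERT INTO invoices (id, document_id) VALUES (10, 1)")
    conn.execute("INSERT INTO contracts (id, document_id) VALUES (20, 1)")
    return conn


def body_via(conn, table, row_id):
    """Fetch the document body by joining from a referring table."""
    # The table name is interpolated into the SQL, so restrict it to known names.
    if table not in ("invoices", "contracts"):
        raise ValueError(f"unknown referring table: {table}")
    row = conn.execute(
        f"SELECT d.body FROM {table} AS t "
        "JOIN documents AS d ON d.id = t.document_id "
        "WHERE t.id = ?",
        (row_id,),
    ).fetchone()
    return row[0]
```

With this layout a single UPDATE on documents is seen through both invoices and contracts, which is the shared-update behaviour Peter wants for a DIP system, obtained without multiply-referenced BLOBs.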
> On Fri, Aug 04, 2000 at 07:55:52AM +0100, Peter Mount wrote:
> > See below...
> >
> > Peter: I disagree. There are dozens of instances where you would use a
> > single BLOB but refer to it in more than one table. If you have a 1Mb blob
> > referred to in 3 different tables, you don't want to store 3 instances of it.
> > Say you were implementing some form of DIP system (Document Image
> > Processing), then you only want one copy of the document stored, so that if
> > that document changes, then every instance is changed.
> >
>
> But Peter, the relational way to avoid redundant storage should apply. For
> every other type, one does this by storing the data in one place, with
> a unique ID, and using the ID to refer to the data item, and joining when
> you need the item itself.
>
> So, once large data items are promoted to first class types, they should
> act just like every other first class type. Otherwise, we violate the
> principle of least surprise. Having software that tries to second guess
> the developer is always frustrating.

I totally agree. Because large objects exist as separate files, this was required, but after TOAST, the proper relational way should be used.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026