Re: Storing large files in multiple schemas: BLOB or BYTEA - Mailing list pgsql-general

From Chris Travers
Subject Re: Storing large files in multiple schemas: BLOB or BYTEA
Date
Msg-id CAKt_Zft9Aqwc2pWoFr1BBWyt8GqugYX9u=ERhqD0YOd7icGzHw@mail.gmail.com
In response to Re: Storing large files in multiple schemas: BLOB or BYTEA  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-general


On Wed, Oct 10, 2012 at 10:56 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 10/11/2012 01:35 PM, tigran2-postgres@riatest.com wrote:
>> Using files stored outside the database creates all sorts of problems.
>> For starters you lose ACID guarantees. I would prefer to keep them in
>> the database. We did a lot of experiments with Large Objects and they really
>> worked fine (we stored hundreds of LOBs ranging from a few MB up to 1GB).
>> Postgres does a really good job with Large Objects. If it were not for the
>> pg_dump problem I would not hesitate to use LOBs.

> Yeah, a pg_dump mode that dumped everything but large objects would be nice.

I remember when that was the only way pg_dump worked, and it caused plenty of problems.

But yeah, --only-lobs and --no-lobs might be nice switches.
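
For what it's worth, here is a rough sketch of how one might approximate that split with pg_dump as it stands (the database name mydb and schema public are just placeholders): large objects are dumped by default only for whole-database dumps, and are skipped once -n/--schema is used unless -b/--blobs is added back.

    # Proposed switches (do not exist yet):
    #   pg_dump --no-lobs mydb        # hypothetical
    #   pg_dump --only-lobs mydb      # hypothetical
    #
    # Approximation with current pg_dump: selecting a schema with -n
    # suppresses large objects unless -b is given explicitly.
    pg_dump -Fc -n public -f mydb_no_lobs.dump mydb       # schema + data, no LOBs
    pg_dump -Fc -n public -b -f mydb_with_lobs.dump mydb  # same, plus LOBs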

> Right now I find storing large objects in the DB such a pain from a backup management point of view that I avoid it where possible.


> I'm now wondering about the idea of implementing a pg_dump option that dumped large objects into a directory tree like
>   lobs/[loid]/[lob_md5]
> and wrote out a restore script that loaded them using `lo_import`.
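
To make that concrete, here is a sketch of what such a generated restore script might boil down to (the paths, OIDs and md5 names are invented for illustration). It relies on the server-side lo_import(filename, oid) form so each object keeps its original OID, which means the dump tree would have to be readable by the server process:

    psql mydb <<'SQL'
    -- hypothetical entries written out by the dump step
    SELECT lo_import('/backup/lobs/16397/9e107d9d372bb6826bd81d3542a419d6', 16397);
    SELECT lo_import('/backup/lobs/16401/e4d909c290d0fb1ca068ffaddf22cbd0', 16401);
    SQL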

I'm thinking of the problems that occurred when we used to require LOBs to be backed up to binary archive formats....


> During dumping temporary copies could be written to something like lobs/[loid]/.tmp.nnnn with the md5 being calculated on the fly as the byte stream is read. If the dumped file had the same md5 as the existing one it'd just delete the tempfile; otherwise the tempfile would be renamed to the calculated md5.
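
A rough client-side sketch of that scheme (not an actual pg_dump patch; it assumes psql, md5sum and mktemp are available, a database named mydb, and it hashes after export rather than on the fly):

    for loid in $(psql -Atc "SELECT oid FROM pg_largeobject_metadata" mydb); do
        mkdir -p "lobs/$loid"
        tmp=$(mktemp "lobs/$loid/.tmp.XXXXXX")
        psql -c "\lo_export $loid '$tmp'" mydb
        sum=$(md5sum "$tmp" | awk '{print $1}')
        if [ -e "lobs/$loid/$sum" ]; then
            rm "$tmp"                     # content unchanged since last dump
        else
            mv "$tmp" "lobs/$loid/$sum"   # new or modified large object
        fi
    done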


> That way incremental backup systems could manage the dumped LOB tree without quite the same horrible degree of duplication as is currently faced when using lo in the database with pg_dump.

How do incremental backup systems work with lots of data and pg_dump anyway? I would think that's not the approach I would take to incremental backups with PostgreSQL....

> A last_modified timestamp on `pg_largeobject_metadata` would be even better, allowing the cost of reading and discarding rarely-changed large objects to be avoided.
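
Purely for illustration, assuming that hypothetical column existed (it does not today), an incremental run of the sketch above could then restrict itself to recently modified objects:

    # hypothetical: pg_largeobject_metadata has no last_modified column today
    psql -Atc "SELECT oid FROM pg_largeobject_metadata
               WHERE last_modified > now() - interval '1 day'" mydb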

It might be interesting to look at the issue of large objects from a total backup perspective. I do wonder, though, where the end would be: you could have 500MB text fields, and those might pose backup issues as well. I suppose better LOB support in backups would give additional options for management.

Best Wishes,
Chris Travers
