Thread: pg_dump & blobs - editable dump?
I have been speaking to Pavel about pg_dump support of blobs, and he thinks it is important to allow for some kind of human-readable version of the dump to be created.

My guess is that this will involve a plain text schema dump, followed by all BLOBs in separate files, and a script to load them. To implement this I'll obviously need to be passed a directory/file location for the script, since I can't pipe separate files to stdout.

I'd be interested in knowing what features people think are important in this kind of format: what do you need to do with the blob files, what do people want to edit, etc. etc.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 0500 83 82 82
http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
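The layout Philip sketches might look something like the following; all file names and the loader's contents are assumptions for illustration, not pg_dump's actual output, and no server is assumed (the loader only prints what it would run):

```shell
# Hypothetical layout for the proposed "editable" dump: a plain-text schema
# file, each BLOB as its own file, plus a loader script. Names are invented.
out=$(mktemp -d)
echo "CREATE TABLE staff (id int4, photo oid);" > "$out/schema.sql"
printf 'binary blob payload' > "$out/blob_1234"      # stand-in BLOB contents

# A real loader would re-import each blob file (e.g. via psql's \lo_import
# and then relink the resulting OID); this sketch only echoes the command.
cat > "$out/load_blobs.sh" <<'EOF'
#!/bin/sh
for b in blob_*; do
    echo "would run: psql -c \"\\lo_import $b\" dbname"
done
EOF
chmod +x "$out/load_blobs.sh"
( cd "$out" && ./load_blobs.sh )
```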
Why not have it use something like tar, with the first file stored in ASCII? That way, you could easily extract the human-readable SQL but still pipe the blobs to stdout.

Peter

--
Peter Mount
Enterprise Support, Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council
Philip Warner wrote:
> My guess is that this will involve a plain text schema dump, followed by
> all BLOBs in separate files, and a script to load them. To implement this
> I'll obviously need to be passed a directory/file location for the script
> since I can't pipe separate files to stdout.

uuencode the blobs, perhaps, using a shar-like format?

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
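As a rough illustration of Lamar's idea (using base64 here as a widely-available stand-in for uuencode, since the encoder choice is open): the blob travels inline in an otherwise plain-text, self-extracting script. All names are invented for the sketch.

```shell
# Sketch of a shar-style dump: editable text with each blob encoded inline,
# extractable by running the file through sh, as shars usually are.
work=$(mktemp -d)
printf 'binary blob payload' > "$work/blob_1234"     # stand-in BLOB

{
  echo "# (plain-text schema would go here; a real shar would feed it to psql)"
  echo "base64 -d > blob_1234 <<'END_OF_BLOB'"       # uuencode in the proposal
  base64 "$work/blob_1234"
  echo "END_OF_BLOB"
} > "$work/dump.shar"

# "Self-extracting": running the archive through sh recreates the blob files.
mkdir "$work/restore"
( cd "$work/restore" && sh "$work/dump.shar" )
cmp -s "$work/blob_1234" "$work/restore/blob_1234" && echo "blob intact"
```

This is also where Lamar's later warning applies: a self-extracting archive run through sh is exactly the shape of a shar trojan, so a real pg_restore would parse rather than execute it.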
At 10:13 12/07/00 -0400, Lamar Owen wrote:
>Philip Warner wrote:
>> My guess is that this will involve a plain text schema dump, followed by
>> all BLOBs in separate files, and a script to load them. To implement this
>> I'll obviously need to be passed a directory/file location for the script
>> since I can't pipe separate files to stdout.
>
>uuencode the blobs, perhaps, using a shar-like format?

For the human-readable version, the request was to make it editable and sendable to psql. As a result the BLOBs need to be in their binary format OR psql needs to support BLOB import from stdin. As a first pass I was hoping for the simple 'dump them into files' solution.

What I am confused by is what people actually want to do with a load of BLOBs sitting in a directory; if there are specific needs, then I'd also like to cater for them in the custom file formats.
Which is why having them on stdout is still a nice option to have. You can pipe the lot through your favourite compressor (gzip, bzip2, etc.) and straight on to tape, or whatever.

I don't know why you would want them as separate files - just think what would happen to directory search times!!

How about this as an idea:

 * option to dump sql to stdout and blobs to a designated file
 * option to dump sql & blobs to stdout
 * option to dump just sql to stdout
 * option to dump just blobs to stdout

That way (depending on the database design), you could handle the sql & blobs separately but still have everything backed up.

PS: Backups are foremost on my mind at the moment - had an NT one blow up in my face on Monday and it wasn't nice :-(

Peter
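The streaming pattern Peter describes can be sketched with an ordinary file standing in for the tape device, and a fixed string standing in for pg_dump's output (both are assumptions; no server or tape drive is needed to see the shape of the pipeline):

```shell
# Pipe a dump through a compressor and onto "tape" (a temp file here) --
# the same shape as: pg_dump ... | gzip > /dev/nst0
TAPE=$(mktemp)
dump_text="-- schema and data would stream here"   # stand-in for pg_dump output

printf '%s\n' "$dump_text" | gzip -c > "$TAPE"     # dump | compressor > tape
gzip -dc "$TAPE"                                   # and back off the tape
```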
Philip Warner wrote:
> At 10:13 12/07/00 -0400, Lamar Owen wrote:
> >Philip Warner wrote:
> >> I'll obviously need to be passed a directory/file location for the script
> >> since I can't pipe separate files to stdout.
> >
> >uuencode the blobs, perhaps, using a shar-like format?
>
> For the human readable version, the request was to make it editable and
> sendable to psql. As a result the BLOBs need to be in their binary format
> OR psql needs to support BLOB import from stdin. As a first pass I was
> hoping for the simple 'dump them into files' solution.

If in a shell archive format, shouldn't it be easy enough for pg_restore to be made to do the stdin-to-blob thing (through whatever mechanisms you're already using to get the blob back in in the first place, combined with some steering/de-shar-ing/uudecoding logic)? The backup could even be made 'self-extracting', as shars usually are... :-) Of course, you then have to be on the watch for the usual shar trojans...

If we simply know that the backup cannot be sent to psql, but a de-shar-ed version can have the schema sent to psql, would that ameliorate most concerns?

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
At 15:32 12/07/00 +0100, Peter Mount wrote:
>Which is why having them on stdout is still a nice option to have. You can
>pipe the lot through your favourite compressor (gzip, bzip2 etc) and
>straight on to tape, or whatever.

Well, the custom format does that; it also does compression and can go to stdout.

>I don't know why you would want them as separate files - just think what
>would happen to directory search times!!

I agree; the request was based on a desire to do something like pg_dump_lo, which puts them all in a directory, I think.

>How about this as an idea:
> * option to dump sql to stdout and blobs to a designated file
> * option to dump sql & blobs to stdout
> * option to dump just sql to stdout
> * option to dump just blobs to stdout

The sql is *tiny* compared to most BLOB contents. The new pg_dump currently supports:

 * schema, table data, & blobs
 * schema, table data
 * schema
 * table data & blobs
 * table data

BLOBs without table data are not recommended, since the process of relinking the BLOBs to the tables is *only* performed on tables that are restored. This is to allow import of BLOBs & tables into existing DBs. As a result your fourth option is not really an option. The other three are already covered. Any single-file format (tar would be one of those) can be sent to stdout, and BLOBs are not supported in plain-text output (for obvious reasons).

>That way (depending on the database design), you could handle the sql &
>blobs separately but still have everything backed up.

Unfortunately the data and BLOBs need to go together.

>PS: Backups are foremost on my mind at the moment - had an NT one blow up in
>my face on Monday and it wasn't nice :-(

With the current version you should be able to do:

    pg_dump -Fc --blobs > /dev/myfavoritetapedrive

to back up the entire database, with compressed data, to tape.
And:

    cat /dev/mt | pg_restore --db=dbname

to restore the entire db into the specified database. Or:

    pg_dump -Fc --blobs | pg_restore --db=dbname

to copy a database with blobs...

So, in summary, I think most of what you want is already there. It's just the human-readable part that's a problem. *Please* let me know if there is some issue I have not considered...
At 10:38 12/07/00 -0400, Lamar Owen wrote:
>If we simply know that the backup cannot be sent to psql, but a
>de-shar-ed version can have the schema sent to psql, would that
>ameliorate most concerns?

In the current version,

    pg_restore --schema

will send the schema to stdout. Is that sufficient? Or are you strictly interested in the text output side of things?
Philip Warner wrote:
> will send the schema to stdout
>
> Is that sufficient? Or are you strictly interested in the text output side
> of things?

Strictly interested in the text output side of things, for various not-necessarily-good reasons (:-)).

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
[This is a multi-reply; CCs and -general removed.]

From: Peter Mount <petermount@it.maidstone.gov.uk>
Date: Wed, 12 Jul 2000 14:58:50 +0100

Hi,

> Why not have it using something like tar, and the first file being
> stored in ascii?

Some filesystems do not allow you to have files bigger than 2G :-( I do not think that one file (even a gzipped tar file) is good.

From: Peter Mount <petermount@it.maidstone.gov.uk>
Date: Wed, 12 Jul 2000 15:32:10 +0100

> I don't know why you would want them as separate files - just think what
> would happen to directory search times!!

No problem, you can use one index file and hashes in it, so files are then stored as:

    AA/AA/AA/00
    AA/AA/AA/01

See how squid (http://www.squid-proxy.org/) does its job here. No problem, I think. I really prefer this solution over one big file. You can easily swap files with other databases; you can even create an md5sum of the md5sums of each file, so you can have a multi-md5sum of your database (you can be really sure that your backup is OK, etc. :-).

> That way (depending on the database design), you could handle the sql
> & blobs separately but still have everything backed up.
>
> PS: Backups are foremost on my mind at the moment - had an NT one blow
> up in my face on Monday and it wasn't nice :-(

No one (I hope) is arguing about the need for backing up BLOBs from the DB :-)

--
Pavel Janík ml.
Pavel.Janik@linux.cz
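Pavel's two ideas can be sketched together: bucket each blob file under directories derived from its checksum (the squid-style layout), then take an md5sum of the sorted per-file md5sums to get one checksum for the whole backup. Bucket depth and names here are illustrative assumptions:

```shell
# Squid-style hash-bucketed storage plus a "multi-md5sum" over the store.
blobs=$(mktemp -d); store=$(mktemp -d)
printf 'first blob'  > "$blobs/a"
printf 'second blob' > "$blobs/b"

for f in "$blobs"/*; do
    h=$(md5sum "$f" | cut -c1-32)                # per-file checksum
    # Nest under two 2-character buckets, e.g. store/ab/cd/abcd1234...
    dir="$store/$(printf '%s' "$h" | cut -c1-2)/$(printf '%s' "$h" | cut -c3-4)"
    mkdir -p "$dir"                              # small dirs, fast lookups
    cp "$f" "$dir/$h"
done

# One checksum for the whole backup: md5sum of the sorted per-file md5sums.
multi=$(find "$store" -type f -exec md5sum {} \; | cut -c1-32 | sort | md5sum | cut -c1-32)
echo "multi-md5sum: $multi"
```

Sorting before the final md5sum matters: find's traversal order is not guaranteed, and the multi-checksum should not depend on it.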