Thread: designing tables for blobs - what are the guidelines?
Hi there, I've been reading documentation and manuals and things and haven't been able to find a discussion of this in a clear manner. (If anyone can point me to a reference I've missed I'd be grateful.) Here are some questions I have: Can I put an arbitrary binary file in a bytea field? And should I? Is this the best way to store arbitrary binaries? My motivating example: I have a largish table, with potentially tens of thousands of rows. Each one has some simple numeric or text data, and an OPTIONAL associated binary archive (.tar.gz) containing truly arbitrary types of files. Sometimes a set of images in one format, sometimes a single image in another format, sometimes a proprietary file produced by certain lab equipment - and who knows where it might go. So I thought maybe it would make retrievals slow if I packed the binaries in there. Or is that only if I'm lazy and "select * from table"? Perhaps I should make a special archive table and refer to it by some key. Say, id, [array of archive content file names], bytea for the archive. Is there a recommended way? Is there something else I should try? thanks for any feedback, Cath Cath Lawrence, Cath.Lawrence@anu.edu.au Senior Scientific Programmer, Centre for Bioinformation Science, John Curtin School of Medical Research (room 4088) Australian National University, Canberra ACT 0200 ph: (02) 61257959 mobile: 0421-902694 fax: (02) 61252595
OK, a followup to my question now before anyone replies... On Wednesday, July 30, 2003, at 04:01 PM, Cath Lawrence wrote: > Perhaps I should make a special archive table and refer to it by some > key. Say, id, [array of archive content file names], bytea for the > archive. Actually this now strikes me as a bad idea, since I think I can't search my table by filename to find what archive it's in, and retrieve that archive. With a plaintext list I could do "select archive from bin_archives where file_name_list like ('%my_wanted_file%')" Or am I being clueless about arrays, too? thanks, Cath Cath Lawrence, Cath.Lawrence@anu.edu.au Senior Scientific Programmer, Centre for Bioinformation Science, John Curtin School of Medical Research (room 4088) Australian National University, Canberra ACT 0200 ph: (02) 61257959 mobile: 0421-902694 fax: (02) 61252595
Cath Lawrence <Cath.Lawrence@anu.edu.au> writes: > Can I put an arbitrary binary file in a bytea field? > And should I? > Is this the best way to store arbitrary binaries? You could, but it doesn't scale very well to large binary files, because there's not good provisions for fetching or storing portions of a large bytea field. If the files in question average no more than a megabyte or two, with an upper limit of perhaps 100meg, then I'd recommend this. Otherwise you should do something different. Reasonable values of "something different" include: 1. put the data in a "large object", and store the LO's OID as a reference. 2. keep the data in a plain filesystem file outside the database, and store its pathname in the database. (2) is only workable if your clients are on the same machine as the database, or at least can NFS-mount the area where you're keeping the files. Also, it doesn't give you any help with protection or transaction-safety issues for the files. On the plus side, it'll probably allow clients to read and write the files faster than if they went through the database. regards, tom lane
On Thursday, July 31, 2003, at 01:13 AM, Tom Lane wrote: > Cath Lawrence <Cath.Lawrence@anu.edu.au> writes: >> Can I put an arbitrary binary file in a bytea field? >> And should I? >> Is this the best way to store arbitrary binaries? > You could, but it doesn't scale very well to large binary files, > because > there's not good provisions for fetching or storing portions of a large > bytea field. If the files in question average no more than a megabyte > or two, with an upper limit of perhaps 100meg, then I'd recommend this. > Otherwise you should do something different. > > Reasonable values of "something different" include: > 1. put the data in a "large object", and store the LO's OID as a > reference. > 2. keep the data in a plain filesystem file outside the database, and > store > its pathname in the database. Thanks Tom, that's a good start. I'm getting the idea, I think. My problem comes from finding references to large objects scattered all over the place in the postgresql docs - there are oids and the pg_largeobject table, and there are byteas - and when you should use which I don't really know. Why would you choose to use pg_largeobject over a bytea field, or vice versa? cheers Cath Cath Lawrence, Cath.Lawrence@anu.edu.au Senior Scientific Programmer, Centre for Bioinformation Science, John Curtin School of Medical Research (room 4088) Australian National University, Canberra ACT 0200 ph: (02) 61257959 mobile: 0421-902694 fax: (02) 61252595
On Wednesday, July 30, 2003, at 12:14 AM, Cath Lawrence wrote: > OK, a followup to my question now before anyone replies... > > On Wednesday, July 30, 2003, at 04:01 PM, Cath Lawrence wrote: >> Perhaps I should make a special archive table and refer to it by some >> key. Say, id, [array of archive content file names], bytea for the >> archive. > > Actually this now strikes me as a bad idea, since I think I can't > search my table by filename to find what archive it's in, and retrieve > that archive. With a plaintext list I could do > "select archive from bin_archives where file_name_list like > ('%my_wanted_file%')" > > Or am I being clueless about arrays, too? This sounds like you want to have a one-to-many relationship between your record and its associated files. What you need to do is create a separate table that will reference your record id: CREATE TABLE associated_files ( fileid serial PRIMARY KEY, recordid int REFERENCES master_records (id), filepath text NOT NULL ); Does that help? --Andrew Vit