Thread: designing tables for blobs - what are the guidelines?

designing tables for blobs - what are the guidelines?

From
Cath Lawrence
Date:
Hi there,

I've been reading documentation and manuals and things and haven't been
able to find a discussion of this in a clear manner. (If anyone can
point me to a reference I've missed I'd be grateful.)

Here are some questions I have:
Can I put an arbitrary binary file in a bytea field?
And should I?
Is this the best way to store arbitrary binaries?

My motivating example: I have a largish table, with potentially tens of
thousands of rows. Each one has some simple numeric or text data, and
an OPTIONAL associated binary archive (.tar.gz) containing truly
arbitrary types of files. Sometimes a set of images in one format,
sometimes a single image in another format, sometimes a proprietary
file produced by certain lab equipment - and who knows where it might
go.

So I thought maybe it would make retrievals slow if I packed the
binaries in there. Or is that only if I'm lazy and "select * from
table"? Perhaps I should make a special archive table and refer to it
by some key. Say, id, [array of archive content file names], bytea for
the archive.

Is there a recommended way?
Is there something else I should try?

thanks for any feedback,
Cath
Cath Lawrence,                       Cath.Lawrence@anu.edu.au
Senior Scientific Programmer,  Centre for Bioinformation Science,
John Curtin School of Medical Research (room 4088)
Australian National University,  Canberra ACT 0200
ph: (02) 61257959   mobile: 0421-902694   fax: (02) 61252595


Re: designing tables - sidetrack into arrays

From
Cath Lawrence
Date:
OK, a followup to my question now before anyone replies...

On Wednesday, July 30, 2003, at 04:01  PM, Cath Lawrence wrote:
> Perhaps I should make a special archive table and refer to it by some
> key. Say, id, [array of archive content file names], bytea for the
> archive.

Actually this now strikes me as a bad idea, since I think I can't
search my table by filename to find what archive it's in, and retrieve
that archive. With a plaintext list I could do
"select archive from bin_archives where file_name_list like
('%my_wanted_file%')"

Or am I being clueless about arrays, too?

thanks,
Cath
Cath Lawrence,                       Cath.Lawrence@anu.edu.au
Senior Scientific Programmer,  Centre for Bioinformation Science,
John Curtin School of Medical Research (room 4088)
Australian National University,  Canberra ACT 0200
ph: (02) 61257959   mobile: 0421-902694   fax: (02) 61252595


Re: designing tables for blobs - what are the guidelines?

From
Tom Lane
Date:
Cath Lawrence <Cath.Lawrence@anu.edu.au> writes:
> Can I put an arbitrary binary file in a bytea field?
> And should I?
> Is this the best way to store arbitrary binaries?

You could, but it doesn't scale very well to large binary files, because
there's not good provisions for fetching or storing portions of a large
bytea field.  If the files in question average no more than a megabyte
or two, with an upper limit of perhaps 100meg, then I'd recommend this.
Otherwise you should do something different.

Reasonable values of "something different" include:
1. put the data in a "large object", and store the LO's OID as a reference.
2. keep the data in a plain filesystem file outside the database, and store
   its pathname in the database.

(2) is only workable if your clients are on the same machine as the
database, or at least can NFS-mount the area where you're keeping the
files.  Also, it doesn't give you any help with protection or
transaction-safety issues for the files.  On the plus side, it'll
probably allow clients to read and write the files faster than if they
went through the database.

            regards, tom lane

Re: designing tables for blobs - what are the guidelines?

From
Cath Lawrence
Date:
On Thursday, July 31, 2003, at 01:13  AM, Tom Lane wrote:
> Cath Lawrence <Cath.Lawrence@anu.edu.au> writes:
>> Can I put an arbitrary binary file in a bytea field?
>> And should I?
>> Is this the best way to store arbitrary binaries?
> You could, but it doesn't scale very well to large binary files,
> because
> there's not good provisions for fetching or storing portions of a large
> bytea field.  If the files in question average no more than a megabyte
> or two, with an upper limit of perhaps 100meg, then I'd recommend this.
> Otherwise you should do something different.
>
> Reasonable values of "something different" include:
> 1. put the data in a "large object", and store the LO's OID as a
> reference.
> 2. keep the data in a plain filesystem file outside the database, and
> store
>    its pathname in the database.

Thanks Tom, that's a good start.

I'm getting the idea, I think. My problem comes from finding references
to large objects scattered all over the place in the postgresql docs -
there are oids and the pg_largeobject table, and there are byteas - and
when you should use which I don't really know. Why would you choose to
use pg_largeobject over a bytea field, or vice versa?

cheers
Cath
Cath Lawrence,                       Cath.Lawrence@anu.edu.au
Senior Scientific Programmer,  Centre for Bioinformation Science,
John Curtin School of Medical Research (room 4088)
Australian National University,  Canberra ACT 0200
ph: (02) 61257959   mobile: 0421-902694   fax: (02) 61252595


Re: designing tables - sidetrack into arrays

From
Andrew Vit
Date:
On Wednesday, July 30, 2003, at 12:14  AM, Cath Lawrence wrote:

> OK, a followup to my question now before anyone replies...
>
> On Wednesday, July 30, 2003, at 04:01  PM, Cath Lawrence wrote:
>> Perhaps I should make a special archive table and refer to it by some
>> key. Say, id, [array of archive content file names], bytea for the
>> archive.
>
> Actually this now strikes me as a bad idea, since I think I can't
> search my table by filename to find what archive it's in, and retrieve
> that archive. With a plaintext list I could do
> "select archive from bin_archives where file_name_list like
> ('%my_wanted_file%')"
>
> Or am I being clueless about arrays, too?

This sounds like you want to have a one-to-many relationship between
your record and its associated files. What you need to do is create a
separate table that will reference your record id:

CREATE TABLE associated_files (
   fileid    serial  PRIMARY KEY,
   recordid  int     REFERENCES master_records (id),
   filepath  text    NOT NULL
);

Does that help?

--Andrew Vit