Re: How does TOAST compare to other databases' mechanisms? - Mailing list pgsql-general

From Tom Lane
Subject Re: How does TOAST compare to other databases' mechanisms?
Date
Msg-id 21428.971065297@sss.pgh.pa.us
Whole thread Raw
In response to Re: How does TOAST compare to other databases' mechanisms?  (Frank Joerdens <frank@joerdens.de>)
Responses Re: How does TOAST compare to other databases' mechanisms?  (Philip Warner <pjw@rhyme.com.au>)
List pgsql-general
Frank Joerdens <frank@joerdens.de> writes:
> another recent thread:

>>>>> For more than 10,000 BLOBs, the database was a clear winner.
>>>>> That is, it took less time to locate and read an 8k-16k BLOB
>>>>> from the MySQL database than from the local disk.  For smaller
>>>>> numbers of BLOBs, the filesystem was faster.
>>
>> This is undoubtedly right for existing Postgres releases, because
>> currently PG stores each BLOB as an independent table --- and just to
>> add insult to injury, creates an index for it :-(.  So you actually have
>> *two* added files in the DB directory per BLOB.  Needless to say, this
>> does not scale real well to large numbers of BLOBs.

> . . . Tom goes on to say that TOAST is a solution to most of these
> issues whilst a better BLOB implementation is waiting in the wings
> (scheduled for 7.1 also)

BTW, just in case I wasn't being perfectly clear: TOAST doesn't depend
on the existing table-per-BLOB implementation of "large objects".
All TOAST-created BLOBs are stored in auxiliary tables, where there is
one auxiliary table per normal table that contains TOASTable columns.
So TOAST shouldn't create problems with directory bloat, even if we
don't get around to fixing the large-object problem for 7.1.  (I still
hope to check over and commit Denis Perchine's proposed fix for LOs,
though.)

> Which leads to another question in the context of my little content
> management app: Part of it is composed of smaller and larger texts,
> for which TOAST will be great, because I can still search them,
> they're not opaque for a, for instance, LIKE query parameter in a
> SELECT statement (right?), like BLOBs, if I want to search for texts
> that contain particular words or statements (this will probably be
> slow if the texts are big and a lot of them are included in the
> query). But I also have images, most of which will _probably_ not
> exceed a few hundred K. At which point do I cross the line where TOAST
> becomes silly and BLOBs are The Way?

Good question.  The answer probably depends on operational experience
that we don't have yet (in other words, please try it both ways and
report back ;-)).  However, I'd venture the following observation:
using special BLOB syntax makes the most sense if you either need to
access sub-sections of a BLOB (because your BLOB is so big you really
don't want to fetch/store it as a unit) or need to link to the same
physical BLOB from multiple rows (again, this is only critical for
really big BLOBs).  It doesn't sound like either of these issues
applies to your problem.

            regards, tom lane

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: snapshot: dot or dash?
Next
From: John Tsombakos
Date:
Subject: Re: Trying to use PGSql with PHP