Re: to BLOB or not to BLOB - Mailing list pgsql-general

From KuroiNeko
Subject Re: to BLOB or not to BLOB
Date
Msg-id 3AFFB834.nail8N11QH45@ed.ed
Whole thread Raw
In response to to BLOB or not to BLOB  (Aristide Aragon <aristide@lionking.org>)
List pgsql-general
> I am going  to write an application tht writes big  amounts of plain text
into a database.

 What kind of data are those? Articles? News/mail messages? Other?

> I thought of using  the text type for this, but I don't  know if it has a
maxlenght, and given that these will  be very long texts I started wondered
if these would have to be blobs... but they aren't binary.

 Well,  plain text  is a  `subset' of  `binary,' so  storing it  as a  BLOB
shouldn't be an issue. My home-brewn email archive stores message bodies as
BLOBs with absolutely no problem.
 On length limit of text type field,  for older versions of PGSQL it should
be less  that page size  (default 8 KB,  configurable at compile  time). In
TOASTed PGSQL there's probably no limit set by DBMS itself.

> So, does the text daya type have a maxium lenght? Do BLOBs? What should I
use? For the time being, at least,  I won't be searching these texts with a
search  engine or  anything... but  if I  were to  be, what  considerations
should I take into account when designing the tables?

 As you already figured it, the answer  to your question depends on what do
you want from  your system. Maybe you'll  be OK with BLOBs but  you have to
keep in  mind that  pg_dump couldn't  handle them and  you need  to perform
somewhat tricky things to backup and restore PGSQL DB with BLOBs.
 Another  thing to  consider is  searchability and  indexes. If  you really
don't want (and never going to) search  you big fields, your best way would
be to  store just file names  and to keep  long chunks of text  in external
files. Sure,  you can  store them  in TOASTed fields  if you  need indexes,
but  _IMHO_,  this is  yet  to  be proven  that  simplicity  of design  and
implementation is  worth _possiblie_ preformance degradation.  I believe we
still have to see good and fast index built on a 100 KB text field.
 Maybe what  you need  is FTS, because  IMNSHO, there's no  much use  of an
attribute that can't be indexed  and searched (updated, joined, grouped) on
swiftly. Everything relatively  small and fast to process goes  to DB, huge
portions of not-easily-indexable data should stay outside.
 Actually, FTS is a  last resort, kind of. You may wish  to split your text
into smaller fragments with similar semantical (logical, whatever) load.


--

 ������������������


pgsql-general by date:

Previous
From: Alexander Lohse
Date:
Subject: Speeding up Query
Next
From: Vince Vielhaber
Date:
Subject: Re: Invoices