Thread: Toast, Text, blob bytea Huh?

Toast, Text, blob bytea Huh?

From
jason.ory@ndchealth.com
Date:
I'm trying my best to convert from MySQL to PgSQL but I cant get a good
clear answer about 
certian issures.Mainly  TEXT, TOAST,BLOB , BYTEA etc.
It was an easy task in mysql but everything in the archives about , text ,
toast and bytea is just
confusing me with postgresql. I have Bruces's book and I've searched the
archives years back with all the right keywords with not luck.Here is my
situation-->


WHAT I WAS DOING IN MYSQL
Via the web my clients are uploading basic  text/data files, sometimes >
than 30MB. In the past ,via CGI I have been parsing the file
into one STL string, using mysql_escape_string to escape it and then using
an INSERT  to place the   ,\'+"stlstring+"\' ,    into a BLOB column. 
"dont want to use a temp. file or files in general anywhere. The data will
always be passed via the database and buffers for certian reasons."Thus no
OID's


THIS IS WHAT I CANT SEEM TO FIGURE OUT IN POSTGRESQL
1. I cant get a clear answer on what kind of data type to use for my large
text string?  TEXT, ???, ??? or something about TOAST
I have seen in the e-mail archive but cant find any documentaion?

2. I've written my own escape method ,("cant find one for Pgsql") , BUT i
don't know what 
to escape and not to escape. So it keeps failing. I cand find any docs. on
what to escape either?


SUMMARY
What is the best datatype to use, for large raw text and/or  binary if i
choose? 
Once I know this,how Im a supposed to escape my string and get it through
the parser correctly so i can retrieve it correctly?

Thanks for your time.

PS: Using RedHat.



Jason H. Ory
Medprint+
Software Developer
jason.ory@ndchealth.com



Re: Toast, Text, blob bytea Huh?

From
Rene Pijlman
Date:
On Thu, 23 Aug 2001 13:47:01 -0400, you wrote:
>1. I cant get a clear answer on what kind of data type to use for my large
>text string?  TEXT, ???, ??? or something about TOAST
>I have seen in the e-mail archive but cant find any documentaion?

TOAST is not a data type, but a project that extended the
capacity of PostgreSQL from version 7.1 onwards to support
fields up to 1 GB in length and rows of (practically) unlimited
length. This makes TEXT and BYTEA good data types for storing
large character and binary data. In the hands of a marketing
department toasted data would be a great feature that justified
the release of version 8 all by itself :-)

However, unlike SQL3 Lobs, values of type TEXT and BYTEA are
always transfered as one unit to the client. As a Java
programmer I like that feature. A string is a string and a byte
array is a byte array, no matter what it's length may be.

Large Objects in PostgreSQL
(http://www.ca.postgresql.org/users-lounge/docs/7.1/programmer/largeobjects.html)
support file oriented access to large data. This may be a more
convenient programming model, depending on the client interface
that you use and the requirements of your application.

However, Large Objects in PostgreSQL are objects that exist
independently from the rows in which you hold a reference to
them. If you delete or update a row, your application may need
to delete certain Large Objects as well. In this respect,
PostgreSQL Large Objects differ from SQL3 Lobs.

>2. I've written my own escape method ,("cant find one for Pgsql") , BUT i
>don't know what 
>to escape and not to escape. So it keeps failing. I cand find any docs. on
>what to escape either?

I'm not sure what you mean by that. What characters would you
want to escape in what way and why?

And by the way, what client interface are you using?

Regards,
René Pijlman


Re: Toast, Text, blob bytea Huh?

From
"Joe Conway"
Date:
> I'm trying my best to convert from MySQL to PgSQL but I cant get a good
> clear answer about
> certian issures.Mainly  TEXT, TOAST,BLOB , BYTEA etc.
> It was an easy task in mysql but everything in the archives about , text ,
> toast and bytea is just
> confusing me with postgresql. I have Bruces's book and I've searched the

Quick glossary:

TEXT is a datatype which stores character data of unspecified length (up to
the max value of a 4 byte integer in length, although I've seen comments
indicating that the practical limit is closer to 1 GB -- not sure why). TEXT
is not intended to hold arbitrary binary data. If you want to store binary
in a text column, encode it to hex or base64 or something first.

TOAST is an internal database concept. If I understand it correctly, it
refers to a combination of compression and out-of-line storage for large
length values of a charater datatype. This happens transparently to you.

BLOB is otherwise known as LO or Large Object datatype in PostgreSQL. These
are always stored out-of-line, I don't believe they are compressed, and they
have their own special access methods (for dealing with data a "chunk" at a
time).

BYTEA is very similar to TEXT, except that it is intended for binary data. I
recently posted a PHP function which escapes binary in order to allow
inserting it into a bytea column (on the pgsql-general list). At a minimum
there are 4 characters which must be escaped. They are ACSII 0, 10, 39, and
92. They must reach PostgreSQL looking like \\000, \\012, \\047, and \\134
respectively (actually 39 could be \' and 92 could be \\\\, but it is
simpler to be consistent).

> THIS IS WHAT I CANT SEEM TO FIGURE OUT IN POSTGRESQL
> 1. I cant get a clear answer on what kind of data type to use for my large
> text string?  TEXT, ???, ??? or something about TOAST
> I have seen in the e-mail archive but cant find any documentaion?

So, you can use TEXT if you encode to hex or base64 in your app first, or
you bytea if you escape as I described above in your app. Or you can use the
LO functions to manipulate large objects (see
http://www.postgresql.org/idocs/index.php?lo-interfaces.html).

>
> 2. I've written my own escape method ,("cant find one for Pgsql") , BUT i
> don't know what
> to escape and not to escape. So it keeps failing. I cand find any docs. on
> what to escape either?

See above.

>
>
> SUMMARY
> What is the best datatype to use, for large raw text and/or  binary if i
> choose?
> Once I know this,how Im a supposed to escape my string and get it through
> the parser correctly so i can retrieve it correctly?

If you use TEXT, you will have to decode the hex/base64 back into binary in
your app. Similarly, if you use bytea, although stored as binary, the data
is returned with "unprintable" values escaped as octals*. Your app will have
to decode the octals back into binary.

*NOTE to hackers: is there a good reason for this? ISTM that the client
should be responsible for any encoding needed when bytea is returned. The
server should return bytea as straight varlena.

If you use LO, you have to use the interface functions instead of standard
SQL.

Hope this helps,

-- Joe



Re: Toast, Text, blob bytea Huh?

From
Peter T Mount
Date:
Quoting Joe Conway <joseph.conway@home.com>:

> TEXT is a datatype which stores character data of unspecified length (up
> to
> the max value of a 4 byte integer in length, although I've seen
> comments
> indicating that the practical limit is closer to 1 GB -- not sure why).

It may be something to do with the 1Gb splitting of the physical files 
representing a table... Unless it changed recently, a table was split over 
multiple files at the 1Gb mark.

Peter

-- 
Peter Mount peter@retep.org.uk
PostgreSQL JDBC Driver: http://www.retep.org.uk/postgres/
RetepPDF PDF library for Java: http://www.retep.org.uk/pdf/


Re: Toast, Text, blob bytea Huh?

From
Tom Lane
Date:
Peter T Mount <peter@retep.org.uk> writes:
> Quoting Joe Conway <joseph.conway@home.com>:
>> indicating that the practical limit is closer to 1 GB -- not sure why).

> It may be something to do with the 1Gb splitting of the physical files 
> representing a table...

No, that's just a coincidence.  The reason that TOAST limits fields to
1Gb is that the high-order two bits of the varlena length word were
commandeered as TOAST state indicators.  There are now only thirty bits
available to represent the length of a variable-length Datum; hence the
hard limit on field width is 1Gb.

I'd think that the "practical" limit is quite a bit less than that, at
least until we devise an API that lets you read and write toasted values
in sections.
        regards, tom lane


Re: Toast, Text, blob bytea Huh?

From
Bruce Momjian
Date:
> No, that's just a coincidence.  The reason that TOAST limits fields to
> 1Gb is that the high-order two bits of the varlena length word were
> commandeered as TOAST state indicators.  There are now only thirty bits
> available to represent the length of a variable-length Datum; hence the
> hard limit on field width is 1Gb.
> 
> I'd think that the "practical" limit is quite a bit less than that, at
> least until we devise an API that lets you read and write toasted values
> in sections.

Yes, passing around multi-gigabytes memory chunks in a process is pretty
slow.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Toast, Text, blob bytea Huh?

From
Jan Wieck
Date:
Peter T Mount wrote:
> Quoting Joe Conway <joseph.conway@home.com>:
>
> > TEXT is a datatype which stores character data of unspecified length (up
> > to
> > the max value of a 4 byte integer in length, although I've seen
> > comments
> > indicating that the practical limit is closer to 1 GB -- not sure why).
>
> It may be something to do with the 1Gb splitting of the physical files
> representing a table... Unless it changed recently, a table was split over
> multiple files at the 1Gb mark.
   No,  it's  because  the  upper  two bits of the variable size   field are used as flags.
   But in practice there are other limits that force you to keep   the objects you throw into text or bytea fields alot
smaller.  When your INSERT query is received,  parsed,  planned  and  a   heap  tuple  created,  there are at least
fourcopies of that   object in the backends memory. How much virtual  memory  does   your OS support for one single
process?
   And  by  the  way,  TOAST is not only used for character data   types.  All variable size data types in the base
system are   toastable. Well, arrays might be considered sort of pop-tarts   here, but anyway, they get toasted.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com



Re: Toast, Text, blob bytea Huh?

From
ryan_rs@c4.com (Ryan)
Date:
joseph.conway@home.com ("Joe Conway") wrote in message news:<02a101c12c0e$37856d60$48d210ac@jecw2k1>...
... snip
> BYTEA is very similar to TEXT, except that it is intended for binary data. I
> recently posted a PHP function which escapes binary in order to allow
> inserting it into a bytea column (on the pgsql-general list). At a minimum
> there are 4 characters which must be escaped. They are ACSII 0, 10, 39, and
> 92. They must reach PostgreSQL looking like \\000, \\012, \\047, and \\134
> respectively (actually 39 could be \' and 92 could be \\\\, but it is
> simpler to be consistent).
... snip

Is it actually necessary to escape \012 (linefeed) in a query?  My
brief testing using psql or python pygresql seemed to work ok with
only \000, \', and \\ escaped.  Gosh, maybe all my data is corrupted
(!!)

Ryan


Re: Toast, Text, blob bytea Huh?

From
"Joe Conway"
Date:
> joseph.conway@home.com ("Joe Conway") wrote in message
news:<02a101c12c0e$37856d60$48d210ac@jecw2k1>...
> ... snip
> > BYTEA is very similar to TEXT, except that it is intended for binary
data. I
> > recently posted a PHP function which escapes binary in order to allow
> > inserting it into a bytea column (on the pgsql-general list). At a
minimum
> > there are 4 characters which must be escaped. They are ACSII 0, 10, 39,
and
> > 92. They must reach PostgreSQL looking like \\000, \\012, \\047, and
\\134
> > respectively (actually 39 could be \' and 92 could be \\\\, but it is
> > simpler to be consistent).
> ... snip
>
> Is it actually necessary to escape \012 (linefeed) in a query?  My
> brief testing using psql or python pygresql seemed to work ok with
> only \000, \', and \\ escaped.  Gosh, maybe all my data is corrupted
> (!!)
>

Sorry the response is a week late, but your post just hit the list (at least
I just got it). I found after sending this, that the problem with linefeeds
was in my PHP code, so you should be OK :-)

Sorry for the confusion I may have caused!

Joe




Re: Toast, Text, blob bytea Huh?

From
Bruce Momjian
Date:
> joseph.conway@home.com ("Joe Conway") wrote in message news:<02a101c12c0e$37856d60$48d210ac@jecw2k1>...
> ... snip
> > BYTEA is very similar to TEXT, except that it is intended for binary data. I
> > recently posted a PHP function which escapes binary in order to allow
> > inserting it into a bytea column (on the pgsql-general list). At a minimum
> > there are 4 characters which must be escaped. They are ACSII 0, 10, 39, and
> > 92. They must reach PostgreSQL looking like \\000, \\012, \\047, and \\134
> > respectively (actually 39 could be \' and 92 could be \\\\, but it is
> > simpler to be consistent).
> ... snip
> 
> Is it actually necessary to escape \012 (linefeed) in a query?  My
> brief testing using psql or python pygresql seemed to work ok with
> only \000, \', and \\ escaped.  Gosh, maybe all my data is corrupted
> (!!)

The linefeed escape was reported by a PHP users and perhaps there is an
issue with PHP only.  Not sure.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026