Re: Toast, Text, blob bytea Huh? - Mailing list pgsql-hackers

From Joe Conway
Subject Re: Toast, Text, blob bytea Huh?
Date
Msg-id 02a101c12c0e$37856d60$48d210ac@jecw2k1
Whole thread Raw
In response to Toast, Text, blob bytea Huh?  (jason.ory@ndchealth.com)
Responses Re: Toast, Text, blob bytea Huh?
List pgsql-hackers
> I'm trying my best to convert from MySQL to PgSQL but I cant get a good
> clear answer about
> certian issures.Mainly  TEXT, TOAST,BLOB , BYTEA etc.
> It was an easy task in mysql but everything in the archives about , text ,
> toast and bytea is just
> confusing me with postgresql. I have Bruces's book and I've searched the

Quick glossary:

TEXT is a datatype which stores character data of unspecified length (up to
the max value of a 4 byte integer in length, although I've seen comments
indicating that the practical limit is closer to 1 GB -- not sure why). TEXT
is not intended to hold arbitrary binary data. If you want to store binary
in a text column, encode it to hex or base64 or something first.

TOAST is an internal database concept. If I understand it correctly, it
refers to a combination of compression and out-of-line storage for large
length values of a charater datatype. This happens transparently to you.

BLOB is otherwise known as LO or Large Object datatype in PostgreSQL. These
are always stored out-of-line, I don't believe they are compressed, and they
have their own special access methods (for dealing with data a "chunk" at a
time).

BYTEA is very similar to TEXT, except that it is intended for binary data. I
recently posted a PHP function which escapes binary in order to allow
inserting it into a bytea column (on the pgsql-general list). At a minimum
there are 4 characters which must be escaped. They are ACSII 0, 10, 39, and
92. They must reach PostgreSQL looking like \\000, \\012, \\047, and \\134
respectively (actually 39 could be \' and 92 could be \\\\, but it is
simpler to be consistent).

> THIS IS WHAT I CANT SEEM TO FIGURE OUT IN POSTGRESQL
> 1. I cant get a clear answer on what kind of data type to use for my large
> text string?  TEXT, ???, ??? or something about TOAST
> I have seen in the e-mail archive but cant find any documentaion?

So, you can use TEXT if you encode to hex or base64 in your app first, or
you bytea if you escape as I described above in your app. Or you can use the
LO functions to manipulate large objects (see
http://www.postgresql.org/idocs/index.php?lo-interfaces.html).

>
> 2. I've written my own escape method ,("cant find one for Pgsql") , BUT i
> don't know what
> to escape and not to escape. So it keeps failing. I cand find any docs. on
> what to escape either?

See above.

>
>
> SUMMARY
> What is the best datatype to use, for large raw text and/or  binary if i
> choose?
> Once I know this,how Im a supposed to escape my string and get it through
> the parser correctly so i can retrieve it correctly?

If you use TEXT, you will have to decode the hex/base64 back into binary in
your app. Similarly, if you use bytea, although stored as binary, the data
is returned with "unprintable" values escaped as octals*. Your app will have
to decode the octals back into binary.

*NOTE to hackers: is there a good reason for this? ISTM that the client
should be responsible for any encoding needed when bytea is returned. The
server should return bytea as straight varlena.

If you use LO, you have to use the interface functions instead of standard
SQL.

Hope this helps,

-- Joe



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Assessment on namespace clean include file names
Next
From: Ned Wolpert
Date:
Subject: Re: [JDBC] New backend functions? [was Re: JDBC changes for 7.2.