Thread: Bytea vs. BLOB (what's the motivation behind the former?)

Bytea vs. BLOB (what's the motivation behind the former?)

From
Frank Joerdens
Date:
Leading up to 7.2 there was quite some noise on both pgsql-general and
hackers regarding the new bytea type. It *appears* that this is now the
recommended choice for binary data over BLOBs. I didn't manage to dig up
an explanation though why bytea would be better than BLOB - besides that
the interface to deal with large objects is somewhat more convoluted.

Is this all about the cleaner interface? I also saw that bytea is a
proprietary Postgres type, a replacement(?) for the SQL99 BLOB type.
Does this mean that bytea will eventually supersede the BLOB type in
Postgres? Is bytea faster?

This probably has all been explained before somewhere but I dug through the
archives till about mid-2001 and couldn't find a clear explanation.

Regards, Frank

Re: Bytea vs. BLOB (what's the motivation behind the former?)

From
Joe Conway
Date:
Frank Joerdens wrote:
> Leading up to 7.2 there was quite some noise on both pgsql-general and
> hackers regarding the new bytea type. It *appears* that this is now the
> recommended choice for binary data over BLOBs. I didn't manage to dig up
> an explanation though why bytea would be better than BLOB - besides that
> the interface to deal with large objects is somewhat more convoluted.
>
> Is this all about the cleaner interface? I also saw that bytea is a
> proprietary Postgres type, a replacement(?) for the SQL99 BLOB type.
> Does this mean that bytea will eventually supersede the BLOB type in
> Postgres? Is bytea faster?
>
> This probably has all been explained before somewhere but I dug through the
> archives till about mid-2001 and couldn't find a clear explanation.
>

Bytea is not actually new, but it was poorly documented and lacked a
good bit of functionality before 7.2. My take is that bytea is much
closer to SQL99 BLOB than PostgreSQL large objects. See table 3.9 at:
http://www.us.postgresql.org/users-lounge/docs/7.2/postgres/datatype-binary.html

Also see function and operator support at:
http://www.us.postgresql.org/users-lounge/docs/7.2/postgres/functions-binarystring.html

Each type (bytea and LO) has its own advantages. For instance, bytea
manipulates the entire string in memory (up to 4 copies from what I
understand), which may be a problem if your data is very large (I've
only tried to store 10s of MB in bytea fields myself). But I do find the
LO interface a bit more convoluted and prefer the bytea approach myself.

I haven't done any testing wrt speed, but would be very interested in
hearing from others if they have.

Hope this helps,

Joe




Bytea, ASCII-only encoding and pg_dumpall (Was: Bytea vs. BLOB)

From
Frank Joerdens
Date:
On Thu, Mar 28, 2002 at 07:00:30AM -0800, Joe Conway wrote:
> Frank Joerdens wrote:
[ . . . ]
> Bytea is not actually new, but it was poorly documented and lacked a
> good bit of functionality before 7.2. My take is that bytea is much
> closer to SQL99 BLOB than PostgreSQL large objects. See table 3.9 at:
> http://www.us.postgresql.org/users-lounge/docs/7.2/postgres/datatype-binary.html
>
> Also see function and operator support at:
> http://www.us.postgresql.org/users-lounge/docs/7.2/postgres/functions-binarystring.html

Ah. I hadn't seen that. Doesn't that fact that you can encode the entire
binary string to ASCII-only mean then that you don't have to rely on the
-b option with pg_dump, i.e. that you can dump to text files? This would
mean that we'd have a very clear advantage over LO in that you can use
pg_dumpall again to backup your entire server (pg_dumpall can't deal with
blobs because it needs to create text output which is then
concatenated). What I don't understand yet is if that also applies if
you don't use ASCII-only encoding (how is data representet internally
without it?) and when you'd decide to encode or not, or if the fact that
you can dump to text would be sufficient grounds to decide to encode
everything bytea to ASCII-only.

>
> Each type (bytea and LO) has its own advantages. For instance, bytea
> manipulates the entire string in memory

This probably means that it's faster for data that fits into physical
memory at least because the OS, and Postgres, can do they their caching
properly (I would presume that any sort of caching mechanism is voided
if the app requires data to be written to disk straight away while it's
fed into it, as it appears to be the case with LOs).

The approach sounds more friendly on the whole (provided the data
doesn't get too large, as you mention), so I'll give it a spin.

Regards, Frank

Re: Bytea, ASCII-only encoding and pg_dumpall (Was: Bytea vs. BLOB)

From
Joe Conway
Date:
Frank Joerdens wrote:
 > concatenated). What I don't understand yet is if that also applies if
 > you don't use ASCII-only encoding (how is data representet internally
 > without it?) and when you'd decide to encode or not, or if the fact
 > that you can dump to text would be sufficient grounds to decide to
 > encode everything bytea to ASCII-only.
 >

Actually, since bytea is truly treating the data as a string of bytes,
there is no notion of encoding at all. An input octet, say '\000' is
turned into exactly one byte with a value of 0. In fact, the major
diffenence between the "normal" (text, varchar, etc) functions and the
bytea ones is that the bytea ones have the multi-byte encoding specific
code removed.

A small experiment shows what the dump output would look like:

test=# CREATE TABLE foo2 (f1 bytea);
CREATE
test=# insert into foo2 values('\\003\\002\\001\\000abcdefg\\377');
INSERT 16594 1
test=# select f1 from foo2;
              f1
-----------------------------
  \003\002\001\000abcdefg\377
(1 row)

test=# select length(f1) from foo2;
  length
--------
      12
(1 row)

test=# \q
[postgres@jec-linux postgres]$ pg_dump -t foo2 test
--
-- Selected TOC Entries:
--
\connect - postgres

--
-- TOC Entry ID 2 (OID 16589)
--
-- Name: foo2 Type: TABLE Owner: postgres
--

CREATE TABLE "foo2" (
         "f1" bytea
);

--
-- Data for TOC Entry ID 3 (OID 16589)
--
-- Name: foo2 Type: TABLE DATA Owner: postgres
--


COPY "foo2" FROM stdin;
\\003\\002\\001\\000abcdefg\\377
\.


So it seems you can avoid a pg_dump -b by using bytea.

Regards,
Joe