Thread: Bytea vs. BLOB (what's the motivation behind the former?)
Leading up to 7.2 there was quite some noise on both pgsql-general and
hackers regarding the new bytea type. It *appears* that this is now the
recommended choice for binary data over BLOBs. I didn't manage to dig up
an explanation, though, of why bytea would be better than BLOBs - besides
the fact that the interface for dealing with large objects is somewhat
more convoluted.

Is this all about the cleaner interface? I also saw that bytea is a
proprietary Postgres type, a replacement(?) for the SQL99 BLOB type. Does
this mean that bytea will eventually supersede the BLOB type in Postgres?
Is bytea faster?

This has probably all been explained somewhere before, but I dug through
the archives back to about mid-2001 and couldn't find a clear explanation.

Regards,

Frank
Frank Joerdens wrote:
> Leading up to 7.2 there was quite some noise on both pgsql-general and
> hackers regarding the new bytea type. It *appears* that this is now the
> recommended choice for binary data over BLOBs. I didn't manage to dig up
> an explanation though why bytea would be better than BLOB - besides that
> the interface to deal with large objects is somewhat more convoluted.
>
> Is this all about the cleaner interface? I also saw that bytea is a
> proprietary Postgres type, a replacement(?) for the SQL99 BLOB type.
> Does this mean that bytea will eventually supersede the BLOB type in
> Postgres? Is bytea faster?
>
> This probably has all been explained before somewhere but I dug through the
> archives till about mid-2001 and couldn't find a clear explanation.

Bytea is not actually new, but it was poorly documented and lacked a good
bit of functionality before 7.2. My take is that bytea is much closer to
the SQL99 BLOB than PostgreSQL large objects are. See table 3.9 at:
http://www.us.postgresql.org/users-lounge/docs/7.2/postgres/datatype-binary.html

Also see function and operator support at:
http://www.us.postgresql.org/users-lounge/docs/7.2/postgres/functions-binarystring.html

Each type (bytea and LO) has its own advantages. For instance, bytea
manipulates the entire string in memory (making up to 4 copies, from what
I understand), which may be a problem if your data is very large (I've
only tried storing tens of MB in bytea fields myself). But I do find the
LO interface a bit more convoluted and prefer the bytea approach myself.

I haven't done any testing with respect to speed, but I would be very
interested in hearing from others who have.

Hope this helps,

Joe
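Since bytea treats a value as a plain string of bytes, the textual form it
uses for non-printable octets is just an escaping convention. As a rough
sketch (a hypothetical helper, not part of PostgreSQL or any driver), the
old-style "escape" representation can be approximated in Python: printable
ASCII passes through, a backslash is doubled, and everything else becomes a
three-digit octal escape:

```python
def bytea_escape(data: bytes) -> str:
    """Render raw bytes in a form resembling PostgreSQL's old-style
    bytea "escape" output: printable ASCII passes through, a backslash
    is doubled, all other bytes become \\ooo octal escapes."""
    out = []
    for b in data:
        if b == 0x5C:                  # backslash -> doubled backslash
            out.append('\\\\')
        elif 0x20 <= b <= 0x7E:        # printable ASCII passes through
            out.append(chr(b))
        else:                          # everything else -> 3-digit octal
            out.append('\\%03o' % b)
    return ''.join(out)

# The example value used later in this thread:
print(bytea_escape(b'\x03\x02\x01\x00abcdefg\xff'))
# -> \003\002\001\000abcdefg\377
```

Note that when such a string is embedded in a SQL literal, each backslash
has to be doubled again by the SQL string parser, which is why the INSERT
statements in this thread write '\\003' for a single byte.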
On Thu, Mar 28, 2002 at 07:00:30AM -0800, Joe Conway wrote:
> Frank Joerdens wrote:

[ . . . ]

> Bytea is not actually new, but it was poorly documented and lacked a
> good bit of functionality before 7.2. My take is that bytea is much
> closer to SQL99 BLOB than PostgreSQL large objects. See table 3.9 at:
> http://www.us.postgresql.org/users-lounge/docs/7.2/postgres/datatype-binary.html
>
> Also see function and operator support at:
> http://www.us.postgresql.org/users-lounge/docs/7.2/postgres/functions-binarystring.html

Ah. I hadn't seen that. Doesn't the fact that you can encode the entire
binary string as ASCII-only mean that you don't have to rely on the -b
option with pg_dump, i.e. that you can dump to text files? This would give
us a very clear advantage over LOs, in that you could use pg_dumpall again
to back up your entire server (pg_dumpall can't deal with blobs because it
needs to create text output, which is then concatenated).

What I don't understand yet is whether that also applies if you don't use
ASCII-only encoding (how is the data represented internally without it?),
when you'd decide to encode or not, and whether the fact that you can dump
to text is sufficient grounds for encoding all bytea data as ASCII-only.

> Each type (bytea and LO) has its own advantages. For instance, bytea
> manipulates the entire string in memory

This probably means that it's faster, at least for data that fits into
physical memory, because the OS and Postgres can do their caching properly
(I would presume that any sort of caching mechanism is defeated if the
application requires data to be written to disk straight away while it's
being fed in, as appears to be the case with LOs).

The approach sounds friendlier on the whole (provided the data doesn't get
too large, as you mention), so I'll give it a spin.

Regards,

Frank
Frank Joerdens wrote:
> concatenated). What I don't understand yet is if that also applies if
> you don't use ASCII-only encoding (how is data representet internally
> without it?) and when you'd decide to encode or not, or if the fact
> that you can dump to text would be sufficient grounds to decide to
> encode everything bytea to ASCII-only.

Actually, since bytea truly treats the data as a string of bytes, there is
no notion of encoding at all. An input octet, say '\\000', is turned into
exactly one byte with a value of 0. In fact, the major difference between
the "normal" functions (text, varchar, etc.) and the bytea ones is that
the bytea ones have the multibyte-encoding-specific code removed.

A small experiment shows what the dump output would look like:

test=# CREATE TABLE foo2 (f1 bytea);
CREATE
test=# insert into foo2 values('\\003\\002\\001\\000abcdefg\\377');
INSERT 16594 1
test=# select f1 from foo2;
             f1
-----------------------------
 \003\002\001\000abcdefg\377
(1 row)

test=# select length(f1) from foo2;
 length
--------
     12
(1 row)

test=# \q
[postgres@jec-linux postgres]$ pg_dump -t foo2 test
--
-- Selected TOC Entries:
--
\connect - postgres
--
-- TOC Entry ID 2 (OID 16589)
--
-- Name: foo2 Type: TABLE Owner: postgres
--
CREATE TABLE "foo2" (
    "f1" bytea
);
--
-- Data for TOC Entry ID 3 (OID 16589)
--
-- Name: foo2 Type: TABLE DATA Owner: postgres
--
COPY "foo2" FROM stdin;
\\003\\002\\001\\000abcdefg\\377
\.

So it seems you can avoid pg_dump -b by using bytea.

Regards,

Joe
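For the curious, the escaped text in a dump like the one in this thread can
be mapped back to raw bytes by reversing the convention. A minimal decoder
sketch in Python (a hypothetical helper name, assuming well-formed escape
input with only doubled backslashes and three-digit octal escapes):

```python
def bytea_unescape(text: str) -> bytes:
    """Decode old-style bytea "escape" text back to raw bytes:
    '\\\\' yields one backslash byte, '\\ooo' yields one octal-coded
    byte, and any other character is a literal ASCII byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == '\\':
            if text[i + 1] == '\\':            # doubled backslash
                out.append(0x5C)
                i += 2
            else:                              # three octal digits
                out.append(int(text[i + 1:i + 4], 8))
                i += 4
        else:                                  # plain ASCII character
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

data = bytea_unescape('\\003\\002\\001\\000abcdefg\\377')
print(len(data))   # 12, matching the length(f1) result in the session
```

The 12-byte result corresponds to the length(f1) output in Joe's psql
session: four escaped octets, seven literal characters, and one more
escaped octet.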