Thread: storing binary data
Reply-To: sender Hi. I was surprised to discover today that postgres's character types don't support zero bytes. That is, Postgres isn't 8-bit clean. Why is that? More to the point, I need to store about 1k bytes per row of varying-length 8-bit binary data. I have a few options: + BLOBs. PostgreSQL BLOBs make me nervous. I worry about the BLOB not being deleted when the corresponding row in thetable is deleted. The documentation is vague. + What I really need is a binary *short* object type. I have heard rumors of a legendary "bytea" type that might help me,but it doesn't appear to be documented anywhere, so I hesitate to use it. + I can base64-encode the data and store it in a "text" field. But postgres is a great big data-storage system; surelyit can store binary data without resorting to this kind of hack. What should I do? Please help. Thanks! -- Jason Orendorff P.S. I would love to help improve PostgreSQL's documentation. Whom should I contact?
"Jason Orendorff" <jason@jorendorff.com> writes: > Reply-To: sender Just to be nice, I'll do this. ;) > Hi. I was surprised to discover today that postgres's > character types don't support zero bytes. That is, > Postgres isn't 8-bit clean. Why is that? As I understand it, the storage system itself is 8-bit clean; it's the parser layer that isn't (as it uses C strings everywhere). > More to the point, I need to store about 1k bytes per row > of varying-length 8-bit binary data. I have a few options: > > + BLOBs. PostgreSQL BLOBs make me nervous. I worry about > the BLOB not being deleted when the corresponding row in > the table is deleted. The documentation is vague. This is an issue. There is definitely no automatic deletion of LOs. There is a 'vacuumlo' program in contrib/ that may be useful, or you can roll your own, or you can use triggers to make sure LOs get deleted. FWIW, I've been using LOBs in a couple of applications and haven't had too much trouble. > + What I really need is a binary *short* object type. > I have heard rumors of a legendary "bytea" type that might > help me, but it doesn't appear to be documented anywhere, > so I hesitate to use it. It is in 7.1, but is more fully documented in 7.2 (which is entering beta). See: http://candle.pha.pa.us/main/writings/pgsql/sgml/datatype-binary.html > + I can base64-encode the data and store it in a "text" > field. But postgres is a great big data-storage system; > surely it can store binary data without resorting to > this kind of hack. Since the only way to store or retrieve non-LOB data is to go through the SQL parser, you always have to do some escaping. The link above tells you how to do it for 'bytea' without having to go the base64 route. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Jason, BLOBs as you have correctly inferred do not get automatically deleted. You can add triggers to your tables to delete them automatically if you so desire. However 'bytea' is the datatype that is most appropriate for your needs. It has been around for a long time, but not welldocumented. I have been using it in my code since 7.0 of postgres and it works fine. In fact many of the internal postgres tables use it. The problem with bytea is that many of the client interfaces don't support it well or at all. So depending on how you intend to access the data you may not be able to use the bytea datatype. The situation is much improved in 7.2 with bytea documented and better support for it in the client interfaces (jdbc especially). Encoding the data into a text format will certainly work, if you can't work around the current limitations of the above two options. And I believe there is some contrib code to help in this area. thanks, --Barry Jason Orendorff wrote: > Reply-To: sender > > Hi. I was surprised to discover today that postgres's > character types don't support zero bytes. That is, > Postgres isn't 8-bit clean. Why is that? > > More to the point, I need to store about 1k bytes per row > of varying-length 8-bit binary data. I have a few options: > > + BLOBs. PostgreSQL BLOBs make me nervous. I worry about > the BLOB not being deleted when the corresponding row in > the table is deleted. The documentation is vague. > > + What I really need is a binary *short* object type. > I have heard rumors of a legendary "bytea" type that might > help me, but it doesn't appear to be documented anywhere, > so I hesitate to use it. > > + I can base64-encode the data and store it in a "text" > field. But postgres is a great big data-storage system; > surely it can store binary data without resorting to > this kind of hack. > > What should I do? Please help. Thanks! > >
Use bytea. Search archives. On Sun, 21 Oct 2001, Jason Orendorff wrote: > Reply-To: sender > > Hi. I was surprised to discover today that postgres's > character types don't support zero bytes. That is, > Postgres isn't 8-bit clean. Why is that? > > More to the point, I need to store about 1k bytes per row > of varying-length 8-bit binary data. I have a few options: > > + BLOBs. PostgreSQL BLOBs make me nervous. I worry about > the BLOB not being deleted when the corresponding row in > the table is deleted. The documentation is vague. > > + What I really need is a binary *short* object type. > I have heard rumors of a legendary "bytea" type that might > help me, but it doesn't appear to be documented anywhere, > so I hesitate to use it. > > + I can base64-encode the data and store it in a "text" > field. But postgres is a great big data-storage system; > surely it can store binary data without resorting to > this kind of hack. > > What should I do? Please help. Thanks! > >
"Jason Orendorff" <jason@jorendorff.com> writes: > Hi. I was surprised to discover today that postgres's > character types don't support zero bytes. That is, > Postgres isn't 8-bit clean. Why is that? (a) because all our datatype I/O interfaces are based on C-style (null terminated) strings (b) because comparison of character datatypes is based on strcoll() (at least if you compiled with locale support) Fixing either of these is far more pain than is justified to allow people to store non-textual data in textual datatypes. I don't foresee it happening. > + What I really need is a binary *short* object type. > I have heard rumors of a legendary "bytea" type that might > help me, but it doesn't appear to be documented anywhere, > so I hesitate to use it. It's real and it's not going away. It is pretty poorly documented and doesn't have a wide variety of functions ... but hey, you can help improve that situation. This is an open source project after all ;-) regards, tom lane
>> + What I really need is a binary *short* object type. >> I have heard rumors of a legendary "bytea" type that might >> help me, but it doesn't appear to be documented anywhere, >> so I hesitate to use it. >> > > It's real and it's not going away. It is pretty poorly documented > and doesn't have a wide variety of functions ... but hey, you can help > improve that situation. This is an open source project after all ;-) > > regards, tom lane I'll take a shot at improving the documentation for bytea. I'm hoping documentation patches are accepted during beta though ;-) Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE, ||, trim(), substring(), position(), length(), indexing, and various comparators. Joe
... > I'll take a shot at improving the documentation for bytea. I'm hoping > documentation patches are accepted during beta though ;-) Always. At least up until a week or so before release, when we need to firm up the docs and work on final cleanup etc. There are several announcements leading up to that point, so it will not be a suprise. - Thomas
Joe Conway <joseph.conway@home.com> writes: > I'll take a shot at improving the documentation for bytea. I'm hoping > documentation patches are accepted during beta though ;-) Of course. The only limitation we place during beta is "no new features added". I plan to spend a good deal of time on the docs during beta myself. regards, tom lane
Jason Orendorff writes: > Hi. I was surprised to discover today that postgres's > character types don't support zero bytes. That is, > Postgres isn't 8-bit clean. Why is that? PostgreSQL is 8-bit clean. The character types don't support zero bytes because the character types store characters, not bytes. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Quick question - I couldn't find this in the docs: What exactly is the advantage in using VIEWs? I get the impression that the SELECT query it is based on is cached (ie. a cached query plan). But, is this cached between db restarts, between connections, etc. Is it cached upon the first use of the view for a db instance for a particular connection, etc? Chris > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane > Sent: Thursday, 25 October 2001 1:00 AM > To: Joe Conway > Cc: Jason Orendorff; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] storing binary data > > > Joe Conway <joseph.conway@home.com> writes: > > I'll take a shot at improving the documentation for bytea. I'm hoping > > documentation patches are accepted during beta though ;-) > > Of course. The only limitation we place during beta is "no new features > added". I plan to spend a good deal of time on the docs during beta > myself. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > What exactly is the advantage in using VIEWs? A level of logical indirection between the application and the physical data schema. There are no performance benefits. > I get the impression that the > SELECT query it is based on is cached (ie. a cached query plan). Nope. If there's something in the docs that makes you think so, point out so I can fix it ;-) regards, tom lane
> > I get the impression that the > > SELECT query it is based on is cached (ie. a cached query plan). > > Nope. If there's something in the docs that makes you think so, > point out so I can fix it ;-) Hmmm...I could have sworn that you mentioned in passing something about cached query plans and VIEWs - I must have been in dream land. Chris
> >I'll take a shot at improving the documentation for bytea. I'm hoping >documentation patches are accepted during beta though ;-) > >Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE, >||, trim(), substring(), position(), length(), indexing, and various >comparators. > Cool! Would it be practical to use substring for retrieving chunks of binary data in manageable sizes? Or would the overheads be too high? Cheerio, Link.
Christopher Kings-Lynne wrote: > What exactly is the advantage in using VIEWs? I get the impression that the > SELECT query it is based on is cached (ie. a cached query plan). I had the same impression but I've been told (with explanations) that the query plan for a view is not cached in any way. -- Alessio F. Bragadini alessio@albourne.com APL Financial Services http://village.albourne.com Nicosia, Cyprus phone: +357-2-755750 "It is more complicated than you think" -- The Eighth Networking Truth from RFC 1925
Lincoln Yeoh wrote: >>Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE, >>||, trim(), substring(), position(), length(), indexing, and various >>comparators. >> >> > > Cool! > > Would it be practical to use substring for retrieving chunks of binary data > in manageable sizes? Or would the overheads be too high? > > Cheerio, > Link. I haven't done any performance testing, but it should be no different than the substring function used on TEXT fields. Try it out and let us know ;-) -- Joe