Thread: storing binary data

storing binary data

From
"Jason Orendorff"
Date:
Reply-To: sender

Hi.  I was surprised to discover today that postgres's
character types don't support zero bytes.  That is,
Postgres isn't 8-bit clean.  Why is that?

More to the point, I need to store about 1k bytes per row
of varying-length 8-bit binary data.  I have a few options:
+ BLOBs.  PostgreSQL BLOBs make me nervous.  I worry about  the BLOB not being deleted when the corresponding row in
thetable is deleted.  The documentation is vague.
 
+ What I really need is a binary *short* object type.  I have heard rumors of a legendary "bytea" type that might  help
me,but it doesn't appear to be documented anywhere,  so I hesitate to use it.
 
+ I can base64-encode the data and store it in a "text"  field.  But postgres is a great big data-storage system;
surelyit can store binary data without resorting to  this kind of hack.
 

What should I do?  Please help.  Thanks!

-- 
Jason Orendorff

P.S.  I would love to help improve PostgreSQL's documentation.     Whom should I contact?



Re: storing binary data

From
Doug McNaught
Date:
"Jason Orendorff" <jason@jorendorff.com> writes:

> Reply-To: sender

Just to be nice, I'll do this. ;)

> Hi.  I was surprised to discover today that postgres's
> character types don't support zero bytes.  That is,
> Postgres isn't 8-bit clean.  Why is that?

As I understand it, the storage system itself is 8-bit clean; it's the
parser layer that isn't (as it uses C strings everywhere).  

> More to the point, I need to store about 1k bytes per row
> of varying-length 8-bit binary data.  I have a few options:
> 
>  + BLOBs.  PostgreSQL BLOBs make me nervous.  I worry about
>    the BLOB not being deleted when the corresponding row in
>    the table is deleted.  The documentation is vague.

This is an issue.  There is definitely no automatic deletion of
LOs. There is a 'vacuumlo' program in contrib/ that may be useful, or
you can roll your own, or you can use triggers to make sure LOs get
deleted.

FWIW, I've been using LOBs in a couple of applications and haven't had 
too much trouble. 

>  + What I really need is a binary *short* object type.
>    I have heard rumors of a legendary "bytea" type that might
>    help me, but it doesn't appear to be documented anywhere,
>    so I hesitate to use it.

It is in 7.1, but is more fully documented in 7.2 (which is entering
beta).  See:

http://candle.pha.pa.us/main/writings/pgsql/sgml/datatype-binary.html

>  + I can base64-encode the data and store it in a "text"
>    field.  But postgres is a great big data-storage system;
>    surely it can store binary data without resorting to
>    this kind of hack.

Since the only way to store or retrieve non-LOB data is to go through
the SQL parser, you always have to do some escaping.  The link above
tells you how to do it for 'bytea' without having to go the base64
route. 

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863


Re: storing binary data

From
Barry Lind
Date:
Jason,

BLOBs as you have correctly inferred do not get automatically deleted. 
You can add triggers to your tables to delete them automatically if you 
so desire.

However 'bytea' is the datatype that is most appropriate for your needs.  It has been around for a long time, but not
welldocumented.  I have 
 
been using it in my code since 7.0 of postgres and it works fine.  In 
fact many of the internal postgres tables use it.

The problem with bytea is that many of the client interfaces don't 
support it well or at all.  So depending on how you intend to access the 
data you may not be able to use the bytea datatype.  The situation is 
much improved in 7.2 with bytea documented and better support for it in 
the client interfaces (jdbc especially).

Encoding the data into a text format will certainly work, if you can't 
work around the current limitations of the above two options.  And I 
believe there is some contrib code to help in this area.

thanks,
--Barry



Jason Orendorff wrote:

> Reply-To: sender
> 
> Hi.  I was surprised to discover today that postgres's
> character types don't support zero bytes.  That is,
> Postgres isn't 8-bit clean.  Why is that?
> 
> More to the point, I need to store about 1k bytes per row
> of varying-length 8-bit binary data.  I have a few options:
> 
>  + BLOBs.  PostgreSQL BLOBs make me nervous.  I worry about
>    the BLOB not being deleted when the corresponding row in
>    the table is deleted.  The documentation is vague.
> 
>  + What I really need is a binary *short* object type.
>    I have heard rumors of a legendary "bytea" type that might
>    help me, but it doesn't appear to be documented anywhere,
>    so I hesitate to use it.
> 
>  + I can base64-encode the data and store it in a "text"
>    field.  But postgres is a great big data-storage system;
>    surely it can store binary data without resorting to
>    this kind of hack.
> 
> What should I do?  Please help.  Thanks!
> 
> 




Re: storing binary data

From
Alex Pilosov
Date:
Use bytea. Search archives.

On Sun, 21 Oct 2001, Jason Orendorff wrote:

> Reply-To: sender
> 
> Hi.  I was surprised to discover today that postgres's
> character types don't support zero bytes.  That is,
> Postgres isn't 8-bit clean.  Why is that?
> 
> More to the point, I need to store about 1k bytes per row
> of varying-length 8-bit binary data.  I have a few options:
> 
>  + BLOBs.  PostgreSQL BLOBs make me nervous.  I worry about
>    the BLOB not being deleted when the corresponding row in
>    the table is deleted.  The documentation is vague.
> 
>  + What I really need is a binary *short* object type.
>    I have heard rumors of a legendary "bytea" type that might
>    help me, but it doesn't appear to be documented anywhere,
>    so I hesitate to use it.
> 
>  + I can base64-encode the data and store it in a "text"
>    field.  But postgres is a great big data-storage system;
>    surely it can store binary data without resorting to
>    this kind of hack.
> 
> What should I do?  Please help.  Thanks!
> 
> 



Re: storing binary data

From
Tom Lane
Date:
"Jason Orendorff" <jason@jorendorff.com> writes:
> Hi.  I was surprised to discover today that postgres's
> character types don't support zero bytes.  That is,
> Postgres isn't 8-bit clean.  Why is that?

(a) because all our datatype I/O interfaces are based on C-style   (null terminated) strings

(b) because comparison of character datatypes is based on strcoll()   (at least if you compiled with locale support)

Fixing either of these is far more pain than is justified to allow
people to store non-textual data in textual datatypes.  I don't foresee
it happening.

>  + What I really need is a binary *short* object type.
>    I have heard rumors of a legendary "bytea" type that might
>    help me, but it doesn't appear to be documented anywhere,
>    so I hesitate to use it.

It's real and it's not going away.  It is pretty poorly documented
and doesn't have a wide variety of functions ... but hey, you can help
improve that situation.  This is an open source project after all ;-)
        regards, tom lane


Re: storing binary data

From
Joe Conway
Date:
>> + What I really need is a binary *short* object type.
>>   I have heard rumors of a legendary "bytea" type that might
>>   help me, but it doesn't appear to be documented anywhere,
>>   so I hesitate to use it.
>>
> 
> It's real and it's not going away.  It is pretty poorly documented
> and doesn't have a wide variety of functions ... but hey, you can help
> improve that situation.  This is an open source project after all ;-)
> 
>             regards, tom lane

I'll take a shot at improving the documentation for bytea. I'm hoping 
documentation patches are accepted during beta though ;-)

Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE, 
||, trim(), substring(), position(), length(), indexing, and various 
comparators.

Joe



Re: storing binary data

From
Thomas Lockhart
Date:
...
> I'll take a shot at improving the documentation for bytea. I'm hoping
> documentation patches are accepted during beta though ;-)

Always. At least up until a week or so before release, when we need to
firm up the docs and work on final cleanup etc. There are several
announcements leading up to that point, so it will not be a suprise.
                          - Thomas


Re: storing binary data

From
Tom Lane
Date:
Joe Conway <joseph.conway@home.com> writes:
> I'll take a shot at improving the documentation for bytea. I'm hoping 
> documentation patches are accepted during beta though ;-)

Of course.  The only limitation we place during beta is "no new features
added".  I plan to spend a good deal of time on the docs during beta
myself.
        regards, tom lane


Re: storing binary data

From
Peter Eisentraut
Date:
Jason Orendorff writes:

> Hi.  I was surprised to discover today that postgres's
> character types don't support zero bytes.  That is,
> Postgres isn't 8-bit clean.  Why is that?

PostgreSQL is 8-bit clean.  The character types don't support zero bytes
because the character types store characters, not bytes.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: storing binary data

From
"Christopher Kings-Lynne"
Date:
Quick question - I couldn't find this in the docs:

What exactly is the advantage in using VIEWs?  I get the impression that the
SELECT query it is based on is cached (ie. a cached query plan).

But, is this cached between db restarts, between connections, etc.  Is it
cached upon the first use of the view for a db instance for a particular
connection, etc?

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Thursday, 25 October 2001 1:00 AM
> To: Joe Conway
> Cc: Jason Orendorff; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] storing binary data
>
>
> Joe Conway <joseph.conway@home.com> writes:
> > I'll take a shot at improving the documentation for bytea. I'm hoping
> > documentation patches are accepted during beta though ;-)
>
> Of course.  The only limitation we place during beta is "no new features
> added".  I plan to spend a good deal of time on the docs during beta
> myself.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>



Re: storing binary data

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> What exactly is the advantage in using VIEWs?

A level of logical indirection between the application and the physical
data schema.  There are no performance benefits.

> I get the impression that the
> SELECT query it is based on is cached (ie. a cached query plan).

Nope.  If there's something in the docs that makes you think so,
point out so I can fix it ;-)
        regards, tom lane


Re: storing binary data

From
"Christopher Kings-Lynne"
Date:
> > I get the impression that the
> > SELECT query it is based on is cached (ie. a cached query plan).
>
> Nope.  If there's something in the docs that makes you think so,
> point out so I can fix it ;-)

Hmmm...I could have sworn that you mentioned in passing something about
cached query plans and VIEWs - I must have been in dream land.

Chris



Re: storing binary data

From
Lincoln Yeoh
Date:
>
>I'll take a shot at improving the documentation for bytea. I'm hoping 
>documentation patches are accepted during beta though ;-)
>
>Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE, 
>||, trim(), substring(), position(), length(), indexing, and various 
>comparators.
>

Cool!

Would it be practical to use substring for retrieving chunks of binary data
in manageable sizes? Or would the overheads be too high?

Cheerio,
Link.



Re: storing binary data

From
Alessio Bragadini
Date:
Christopher Kings-Lynne wrote:

> What exactly is the advantage in using VIEWs?  I get the impression that the
> SELECT query it is based on is cached (ie. a cached query plan).

I had the same impression but I've been told (with explanations) that
the query plan for a view is not cached in any way.

-- 
Alessio F. Bragadini        alessio@albourne.com
APL Financial Services        http://village.albourne.com
Nicosia, Cyprus             phone: +357-2-755750

"It is more complicated than you think"    -- The Eighth Networking Truth from RFC 1925


Re: storing binary data

From
Joe Conway
Date:
Lincoln Yeoh wrote:

>>Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE, 
>>||, trim(), substring(), position(), length(), indexing, and various 
>>comparators.
>>
>>
> 
> Cool!
> 
> Would it be practical to use substring for retrieving chunks of binary data
> in manageable sizes? Or would the overheads be too high?
> 
> Cheerio,
> Link.

I haven't done any performance testing, but it should be no different 
than the substring function used on TEXT fields. Try it out and let us 
know ;-)

-- Joe