Thread: Bytea/Base64 encoders for libpq - interested?

Bytea/Base64 encoders for libpq - interested?

From

Joerg Hessdoerfer

Date:

28 August 2001, 05:07:47

Hi!

Please find attached some very simple encoders/decoders for bytea and base64.
Bytea encoder is very picky about what it leaves unescaped - basically the
base64
char set ;-)

Since this seems to be a very poorly documented but much asked-for thing, I
thought
you would maybe like to add this code to libpq (so that everyone benefits).

I'm aware that function renames might be necessary, though.
If you like, I could make the code fit into libpq, and send diffs.

Any comments/interests?

Greetings,
    Joerg

Attachment

simple-enc.tar.gz

Re: Bytea/Base64 encoders for libpq - interested?

From

Karel Zak

Date:

28 August 2001, 05:55:56

On Tue, Aug 28, 2001 at 11:07:32AM +0200, Joerg Hessdoerfer wrote:
> Hi!
> 
> Please find attached some very simple encoders/decoders for bytea and base64.
> Bytea encoder is very picky about what it leaves unescaped - basically the 
> base64
> char set ;-)
> 
> Since this seems to be a very poorly documented but much asked-for thing, I 
> thought
> you would maybe like to add this code to libpq (so that everyone benefits).
> 
> I'm aware that function renames might be necessary, though.
> If you like, I could make the code fit into libpq, and send diffs.
> 
> Any comments/interests?
What implement base64 PostgreSQL datetype that use externaly base64 and
internaly same things as bytea. It prevent FE and parser problems with
"bad" chars and internaly for data storage save less space than text
with base64. Of course it doesn't solve a problem with encoding/decoding 
data in your application to/from base64. May be implement for this
datetype cast to/from bytea too.
SELECT my_bytea::base64 FROM foo;
INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea);
And you can still fetch all data directly in batea by binary cursor. 
Comments?
    Karel
-- Karel Zak  <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz,
http://mape.jcu.cz

Re: Bytea/Base64 encoders for libpq - interested?

From

Lincoln Yeoh

Date:

28 August 2001, 21:51:48

At 11:55 AM 28-08-2001 +0200, Karel Zak wrote:
>
> What implement base64 PostgreSQL datetype that use externaly base64 and
>internaly same things as bytea. It prevent FE and parser problems with
>"bad" chars and internaly for data storage save less space than text
>with base64. Of course it doesn't solve a problem with encoding/decoding 
>data in your application to/from base64. May be implement for this
>datetype cast to/from bytea too.
>
> SELECT my_bytea::base64 FROM foo;
>
> INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea);
>
> And you can still fetch all data directly in batea by binary cursor. 
>
> Comments?

Sounds good to me. Even better if the base64 parser is bulletproof and
tolerant of junk. That way base64 email attachments may not even need to be
processed much - just filter a bit and shove it in :).

But shouldn't there be a ::base64 somewhere in the insert statement?

Cheerio,
Link.

Re: Bytea/Base64 encoders for libpq - interested?

From

Lincoln Yeoh

Date:

28 August 2001, 22:16:11

At 11:55 AM 28-08-2001 +0200, Karel Zak wrote:
> What implement base64 PostgreSQL datetype that use externaly base64 and
>internaly same things as bytea. It prevent FE and parser problems with

Another point:

I have no problems with base64[1]. However I was thinking that it might be
far easier for the C/C++/Java (and other low level languages) bunch to do
hexadecimal. e.g. zero zero for null, zero A for line feed. 

It expands things in the input/output stream, but it might be worth some
consideration. Simplicity, cpu usage etc.

Cheerio,
Link.

[1] OK, I can't convert base64 to ASCII mentally yet. But I don't think
that should really a factor.

Re: Bytea/Base64 encoders for libpq - interested?

From

Bruce Momjian

Date:

03 September 2001, 20:17:17

Where did we leave this?

> On Tue, Aug 28, 2001 at 11:07:32AM +0200, Joerg Hessdoerfer wrote:
> > Hi!
> > 
> > Please find attached some very simple encoders/decoders for bytea and base64.
> > Bytea encoder is very picky about what it leaves unescaped - basically the 
> > base64
> > char set ;-)
> > 
> > Since this seems to be a very poorly documented but much asked-for thing, I 
> > thought
> > you would maybe like to add this code to libpq (so that everyone benefits).
> > 
> > I'm aware that function renames might be necessary, though.
> > If you like, I could make the code fit into libpq, and send diffs.
> > 
> > Any comments/interests?
> 
>  What implement base64 PostgreSQL datetype that use externaly base64 and
> internaly same things as bytea. It prevent FE and parser problems with
> "bad" chars and internaly for data storage save less space than text
> with base64. Of course it doesn't solve a problem with encoding/decoding 
> data in your application to/from base64. May be implement for this
> datetype cast to/from bytea too.
> 
>  SELECT my_bytea::base64 FROM foo;
> 
>  INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea);
> 
>  And you can still fetch all data directly in batea by binary cursor. 
> 
>  Comments?
> 
>         Karel
> -- 
>  Karel Zak  <zakkr@zf.jcu.cz>
>  http://home.zf.jcu.cz/~zakkr/
>  
>  C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Bytea/Base64 encoders for libpq - interested?

From

Tom Lane

Date:

03 September 2001, 20:48:57

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Where did we leave this?

I don't think adding a datatype just to provide base64 encoding is
a wise approach.  The overhead of a new datatype (in the sense of
providing operators/functions for it) will be much more than the
benefit.  I think providing encode/decode functions is sufficient...
and we have those already, don't we?
        regards, tom lane

Re: Bytea/Base64 encoders for libpq - interested?

From

Bruce Momjian

Date:

03 September 2001, 22:15:11

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Where did we leave this?
> 
> I don't think adding a datatype just to provide base64 encoding is
> a wise approach.  The overhead of a new datatype (in the sense of
> providing operators/functions for it) will be much more than the
> benefit.  I think providing encode/decode functions is sufficient...
> and we have those already, don't we?

Agreed.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Bytea/Base64 encoders for libpq - interested?

From

"Joe Conway"

Date:

03 September 2001, 23:33:44

> I don't think adding a datatype just to provide base64 encoding is
> a wise approach.  The overhead of a new datatype (in the sense of
> providing operators/functions for it) will be much more than the
> benefit.  I think providing encode/decode functions is sufficient...
> and we have those already, don't we?
>

It might be nice to have a PQbyteaEscape or some such function available in
the libpq client library so that arbitrary binary could be escaped on the
client side and used in a sql statement. I actually wrote this already as an
addition to the PHP PostgreSQL extension, but it would make more sense, now
that I think about it, for it to be in libpq and called from PHP (or
whatever). Comments?

On a related note, are there any other bytea functions we should have in the
backend before freezing for 7.2? I was thinking it would be nice to have a
way to cast bytea into text and vice-versa, so that the normal text
functions could be used for things like LIKE and concatenation. Any interest
in this? If so, any guidance WRT how it should be implemented?

-- Joe

Re: Bytea/Base64 encoders for libpq - interested?

From

Bruce Momjian

Date:

04 September 2001, 00:13:08

> > I don't think adding a datatype just to provide base64 encoding is
> > a wise approach.  The overhead of a new datatype (in the sense of
> > providing operators/functions for it) will be much more than the
> > benefit.  I think providing encode/decode functions is sufficient...
> > and we have those already, don't we?
> >
> 
> It might be nice to have a PQbyteaEscape or some such function available in
> the libpq client library so that arbitrary binary could be escaped on the
> client side and used in a sql statement. I actually wrote this already as an
> addition to the PHP PostgreSQL extension, but it would make more sense, now
> that I think about it, for it to be in libpq and called from PHP (or
> whatever). Comments?

Good idea.  I will commit the non-bytea escape in a day and you can base
a bytea one on that.  You will have to pass in the length of the field
because of course it is not null terminated.

> On a related note, are there any other bytea functions we should have in the
> backend before freezing for 7.2? I was thinking it would be nice to have a
> way to cast bytea into text and vice-versa, so that the normal text
> functions could be used for things like LIKE and concatenation. Any interest
> in this? If so, any guidance WRT how it should be implemented?

I can't see why you can't do that.  The only problem is passing a \0
(null byte) back to the client.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Bytea/Base64 encoders for libpq - interested?

From

Tom Lane

Date:

04 September 2001, 00:45:26

"Joe Conway" <joseph.conway@home.com> writes:
> I was thinking it would be nice to have a
> way to cast bytea into text and vice-versa,

How will you handle a null byte in bytea data?  Transforming it directly
into an embedded null in a text object is NOT an acceptable answer,
because too many of the text functions will misbehave on such data.
        regards, tom lane

Re: Bytea/Base64 encoders for libpq - interested?

From

"Joe Conway"

Date:

04 September 2001, 01:14:33

> > It might be nice to have a PQbyteaEscape or some such function available
in
> > the libpq client library so that arbitrary binary could be escaped on
the
> > client side and used in a sql statement. I actually wrote this already
as an
> > addition to the PHP PostgreSQL extension, but it would make more sense,
now
> > that I think about it, for it to be in libpq and called from PHP (or
> > whatever). Comments?
>
> Good idea.  I will commit the non-bytea escape in a day and you can base
> a bytea one on that.  You will have to pass in the length of the field
> because of course it is not null terminated.

OK.

>
> > On a related note, are there any other bytea functions we should have in
the
> > backend before freezing for 7.2? I was thinking it would be nice to have
a
> > way to cast bytea into text and vice-versa, so that the normal text
> > functions could be used for things like LIKE and concatenation. Any
interest
> > in this? If so, any guidance WRT how it should be implemented?
>
> I can't see why you can't do that.  The only problem is passing a \0
> (null byte) back to the client.

Well, ISTM the simplest (if not the most efficient) way to do bytea-to-text
would be a function that takes the escaped string value from byteaout, and
creates a text value directly from it. The only danger I can think of is
that very long strings might need to be truncated in length, since the
escaped string could be significantly longer than the binary.

Text-to-bytea should be a straight copy, since nothing that can be
represented as text cannot be represented as bytea.

Any comments or concerns?

-- Joe

Re: Bytea/Base64 encoders for libpq - interested?

From

Karel Zak

Date:

04 September 2001, 04:22:19

On Mon, Sep 03, 2001 at 08:48:22PM -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Where did we leave this?
> 
> I don't think adding a datatype just to provide base64 encoding is
> a wise approach.  The overhead of a new datatype (in the sense of
> providing operators/functions for it) will be much more than the
> benefit.  I think providing encode/decode functions is sufficient...
> and we have those already, don't we?
Agree too. But 1000 "bad" chars encoded by base64 vs. encoded by 
escape, what is longer and more expensive for transfer between FE 
and BE?
A base64 problem is that encode all chars in string, but in the 
real usage some data contains "bad" chars occasional only. 
        Karel

-- Karel Zak  <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz,
http://mape.jcu.cz

Re: Bytea/Base64 encoders for libpq - interested?

From

Peter Eisentraut

Date:

04 September 2001, 06:21:53

Joe Conway writes:

> On a related note, are there any other bytea functions we should have in the
> backend before freezing for 7.2?

The SQL standards has a lot of functions for BLOB...

> I was thinking it would be nice to have a
> way to cast bytea into text and vice-versa, so that the normal text
> functions could be used for things like LIKE and concatenation.

Better write a native LIKE function for bytea, now that some parts are
threatening to make the text-LIKE function use the locale collating
sequence.  (Multibyte aware text could also have interesting effects.)

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter

Re: Bytea/Base64 encoders for libpq - interested?

From

Tom Lane

Date:

04 September 2001, 10:08:37

"Joe Conway" <joseph.conway@home.com> writes:
> Well, ISTM the simplest (if not the most efficient) way to do bytea-to-text
> would be a function that takes the escaped string value from byteaout, and
> creates a text value directly from it. The only danger I can think of is
> that very long strings might need to be truncated in length, since the
> escaped string could be significantly longer than the binary.

> Text-to-bytea should be a straight copy, since nothing that can be
> represented as text cannot be represented as bytea.

Ugh ... if the conversion functions are not inverses then I think they
lose much of their value.  I could see doing either of these:

1. Conversion functions based on byteaout/byteain.

2. Bytea to text escapes *only* null bytes, text to bytea treats only
"\0" as an escape sequence.

Or maybe both, with two pairs of conversion functions.

In any case, we have to decide whether these coercion functions should
be named after the types --- ie, should they be made invokable as
implicit coercions?  I'm dubious that that's a good idea; if we do it
then all sorts of textual operations will suddenly be allowed for bytea
without any explicit conversion, which is likely to do more harm than
good.  The reason for having a separate bytea type is exactly so that
you *can't* apply text ops to it without thinking.
        regards, tom lane

Re: Bytea/Base64 encoders for libpq - interested?

From

"Joe Conway"

Date:

04 September 2001, 11:55:18

> > On a related note, are there any other bytea functions we should have in
the
> > backend before freezing for 7.2?
>
> The SQL standards has a lot of functions for BLOB...
>

OK - thanks. I'll take a look.

> > I was thinking it would be nice to have a
> > way to cast bytea into text and vice-versa, so that the normal text
> > functions could be used for things like LIKE and concatenation.
>
> Better write a native LIKE function for bytea, now that some parts are
> threatening to make the text-LIKE function use the locale collating
> sequence.  (Multibyte aware text could also have interesting effects.)
>

Sounds like good advice. I'll try to get both the cast functions and a
native bytea LIKE function done.

-- Joe

Re: Bytea/Base64 encoders for libpq - interested?

From

"Joe Conway"

Date:

04 September 2001, 12:25:52

> Ugh ... if the conversion functions are not inverses then I think they
> lose much of their value.  I could see doing either of these:
>
> 1. Conversion functions based on byteaout/byteain.
>
> 2. Bytea to text escapes *only* null bytes, text to bytea treats only
> "\0" as an escape sequence.
>
> Or maybe both, with two pairs of conversion functions.
>
> In any case, we have to decide whether these coercion functions should
> be named after the types --- ie, should they be made invokable as
> implicit coercions?  I'm dubious that that's a good idea; if we do it
> then all sorts of textual operations will suddenly be allowed for bytea
> without any explicit conversion, which is likely to do more harm than
> good.  The reason for having a separate bytea type is exactly so that
> you *can't* apply text ops to it without thinking.
>
> regards, tom lane

You're right, as usual (I was tired when I wrote this last night ;). But I
think we have to escape/unescape both null and '\', don't we?

I agree that it would be better to *not* allow implicit coercions. Given
that, any preferences on function names? Are text_to_bytea() and
bytea_to_text() too ugly?

-- Joe

Re: Bytea/Base64 encoders for libpq - interested?

From

Tom Lane

Date:

04 September 2001, 12:44:05

"Joe Conway" <joseph.conway@home.com> writes:
> You're right, as usual (I was tired when I wrote this last night ;). But I
> think we have to escape/unescape both null and '\', don't we?

Yeah, you're right.  My turn to have not thought hard enough.

> I agree that it would be better to *not* allow implicit coercions. Given
> that, any preferences on function names? Are text_to_bytea() and
> bytea_to_text() too ugly?

They're pretty ugly, but more importantly they're only suitable if we
have exactly one conversion function each way.  If we have two, what
will we call the second one?

I think it's okay to let the argument type be implicit in the function
argument list.  Something like text_escaped(bytea) and text_direct(bytea)
(with inverses bytea_escaped(text) and bytea_direct(text)) might do.
I'm not totally happy with "direct" to suggest minimum escaping, though.
Better ideas anyone?
        regards, tom lane

Re: Bytea/Base64 encoders for libpq - interested?

From

Bruce Momjian

Date:

04 September 2001, 13:19:53

> You're right, as usual (I was tired when I wrote this last night ;). But I
> think we have to escape/unescape both null and '\', don't we?

Yes, I think backslashes need special escapes too.

Let me ask a bigger question.  We have the length of the text string in
the varlena header.  Are we concerned about backend code not handling
NULL in text fields, or frontend code returning strings with embedded
nulls?

I see problems in the text() functions for nulls, but is such a
limitation required for text types?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Bytea/Base64 encoders for libpq - interested?

From

Larry Rosenman

Date:

04 September 2001, 13:31:05

* Tom Lane <tgl@sss.pgh.pa.us> [010904 12:01]:
> They're pretty ugly, but more importantly they're only suitable if we
> have exactly one conversion function each way.  If we have two, what
> will we call the second one?
> 
> I think it's okay to let the argument type be implicit in the function
> argument list.  Something like text_escaped(bytea) and text_direct(bytea)
> (with inverses bytea_escaped(text) and bytea_direct(text)) might do.
> I'm not totally happy with "direct" to suggest minimum escaping, though.
> Better ideas anyone?
Cooked vs raw? 

LER

> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Re: Bytea/Base64 encoders for libpq - interested?

From

Tom Lane

Date:

04 September 2001, 13:54:31

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Let me ask a bigger question.  We have the length of the text string in
> the varlena header.  Are we concerned about backend code not handling
> NULL in text fields, or frontend code returning strings with embedded
> nulls?

The former.

> I see problems in the text() functions for nulls, but is such a
> limitation required for text types?

Unless you want to re-implement strcoll() and friends from scratch.
        regards, tom lane

Re: Bytea/Base64 encoders for libpq - interested?

From

Bruce Momjian

Date:

04 September 2001, 14:13:48

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Let me ask a bigger question.  We have the length of the text string in
> > the varlena header.  Are we concerned about backend code not handling
> > NULL in text fields, or frontend code returning strings with embedded
> > nulls?
> 
> The former.
> 
> > I see problems in the text() functions for nulls, but is such a
> > limitation required for text types?
> 
> Unless you want to re-implement strcoll() and friends from scratch.

Yes, I saw strcoll().

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Bytea/Base64 encoders for libpq - interested?

From

Tom Lane

Date:

05 September 2001, 21:18:30

Peter Eisentraut <peter_e@gmx.net> writes:
> Why not just stick these things into encode() and name them
> "my-cool-encoding" or whatever.

Sounds good to me ...
        regards, tom lane

Re: Bytea/Base64 encoders for libpq - interested?

From

"Joe Conway"

Date:

05 September 2001, 22:02:32

> Peter Eisentraut <peter_e@gmx.net> writes:
> > Why not just stick these things into encode() and name them
> > "my-cool-encoding" or whatever.
> 
> Sounds good to me ...
> 
> regards, tom lane
> 

Sounds good to me too. Patch forthcoming . . .

-- Joe

Re: Bytea/Base64 encoders for libpq - interested?

From

Peter Eisentraut

Date:

05 September 2001, 22:17:49

Tom Lane writes:

> > I agree that it would be better to *not* allow implicit coercions. Given
> > that, any preferences on function names? Are text_to_bytea() and
> > bytea_to_text() too ugly?
>
> They're pretty ugly, but more importantly they're only suitable if we
> have exactly one conversion function each way.  If we have two, what
> will we call the second one?

Why not just stick these things into encode() and name them
"my-cool-encoding" or whatever.  There is no truly natural conversion
between text and bytea, so encode/decode seem like the proper place.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter