Thread: [PATCH] by request: base64 for bytea

[PATCH] by request: base64 for bytea

From
Alex Pilosov
Date:
Attached is a patch (including documentation this time :) containing two
functions, base64_encode(bytea) and base64_decode(text) with obvious
functionality.

Code was initially taken from public-domain base64.c by John Walker but
much simplified (such as, breaking up long string into multiple lines is
not done, EBCDIC support removed). 

--
Alex Pilosov            | http://www.acedsl.com/home.html
CTO - Acecape, Inc.     | AceDSL:The best ADSL in the world
325 W 38 St. Suite 1005 | (Stealth Marketing Works! :)
New York, NY 10018      |

Re: [PATCH] by request: base64 for bytea

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

> Attached is a patch (including documentation this time :) containing two
> functions, base64_encode(bytea) and base64_decode(text) with obvious
> functionality.
> 
> Code was initially taken from public-domain base64.c by John Walker but
> much simplified (such as, breaking up long string into multiple lines is
> not done, EBCDIC support removed). 
> 
> --
> Alex Pilosov            | http://www.acedsl.com/home.html
> CTO - Acecape, Inc.     | AceDSL:The best ADSL in the world
> 325 W 38 St. Suite 1005 | (Stealth Marketing Works! :)
> New York, NY 10018      |

Content-Description: 

[ Attachment, skipping... ]

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [PATCH] by request: base64 for bytea

From
Alex Pilosov
Date:
On Sat, 23 Jun 2001, Marko Kreen wrote:

> On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote:
> > Your patch has been added to the PostgreSQL unapplied patches list at:
> 
> > > Attached is a patch (including documentation this time :) containing two
> > > functions, base64_encode(bytea) and base64_decode(text) with obvious
> > > functionality.
> 
> Btw, there are functions in form encode(data, 'base64'),
> decode(data, 'base64') in contrib/pgcrypto.  They do also
> encode(data, 'hex').  In the future I need to do probably
> encode(data, 'pgp-armor') too...
> 
> I agree those functionality should be in core code, and if
> the Alex ones get there, maybe he could use same interface?
Oy, I didn't notice them in contrib/pgcrypt.

Bruce, you can take my patch out of queue, stuff in pgcrypt is far more
comprehensive than what I done.

> Or I can extract it out from pgcrypto and submit to core ;)
> I simply had not a need for it because I used those with
> pgcrypto, but Alex seems to hint that there would be bit of
> interest otherwise too.
I think encode/decode should be part of core, as they are basic functions
to manipulate bytea data...

-alex



Re: [PATCH] by request: base64 for bytea

From
Bruce Momjian
Date:
Patch removed.
> Attached is a patch (including documentation this time :) containing two
> functions, base64_encode(bytea) and base64_decode(text) with obvious
> functionality.
> 
> Code was initially taken from public-domain base64.c by John Walker but
> much simplified (such as, breaking up long string into multiple lines is
> not done, EBCDIC support removed). 
> 
> --
> Alex Pilosov            | http://www.acedsl.com/home.html
> CTO - Acecape, Inc.     | AceDSL:The best ADSL in the world
> 325 W 38 St. Suite 1005 | (Stealth Marketing Works! :)
> New York, NY 10018      |

Content-Description: 

[ Attachment, skipping... ]

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [PATCH] by request: base64 for bytea

From
Bruce Momjian
Date:
> On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote:
> > On Sat, 23 Jun 2001, Marko Kreen wrote:
> > > Or I can extract it out from pgcrypto and submit to core ;)
> > > I simply had not a need for it because I used those with
> > > pgcrypto, but Alex seems to hint that there would be bit of
> > > interest otherwise too.
> > I think encode/decode should be part of core, as they are basic functions
> > to manipulate bytea data...
> 
> Ok, I think I look into it.  I am anyway preparing a big update
> to pgcrypto.
> 
> Question to -hackers: currently there is not possible to cast
> bytea to text and vice-versa.  Is this intentional or bug?
> 
> It is weird because internal representation is exactly the same.
> As I want my funtions to operate on both, do I need to create
> separate funtion entries to every combination of parameters?
> It gets crazy on encrypt_iv(data, key, iv, type) which has 3
> parameters that can be both bytea or text...

We just need to mark them as binary compatible.  I will do that now and
commit.  We really weren't sure what bytea was for in the past (or
forgot) so I am sure it was an oversight.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [PATCH] by request: base64 for bytea

From
Bruce Momjian
Date:
> On Sat, 23 Jun 2001, Marko Kreen wrote:
> 
> > On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote:
> > > Your patch has been added to the PostgreSQL unapplied patches list at:
> > 
> > > > Attached is a patch (including documentation this time :) containing two
> > > > functions, base64_encode(bytea) and base64_decode(text) with obvious
> > > > functionality.
> > 
> > Btw, there are functions in form encode(data, 'base64'),
> > decode(data, 'base64') in contrib/pgcrypto.  They do also
> > encode(data, 'hex').  In the future I need to do probably
> > encode(data, 'pgp-armor') too...
> > 
> > I agree those functionality should be in core code, and if
> > the Alex ones get there, maybe he could use same interface?
> Oy, I didn't notice them in contrib/pgcrypt.
> 
> Bruce, you can take my patch out of queue, stuff in pgcrypt is far more
> comprehensive than what I done.

Sure.  Done.  Funny we didn't need them as much for crypto but we do
need them for binary insertion into the database.

> 
> > Or I can extract it out from pgcrypto and submit to core ;)
> > I simply had not a need for it because I used those with
> > pgcrypto, but Alex seems to hint that there would be bit of
> > interest otherwise too.
> I think encode/decode should be part of core, as they are basic functions
> to manipulate bytea data...

Agreed.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [PATCH] by request: base64 for bytea

From
Marko Kreen
Date:
On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote:
> On Sat, 23 Jun 2001, Marko Kreen wrote:
> > Or I can extract it out from pgcrypto and submit to core ;)
> > I simply had not a need for it because I used those with
> > pgcrypto, but Alex seems to hint that there would be bit of
> > interest otherwise too.
> I think encode/decode should be part of core, as they are basic functions
> to manipulate bytea data...

Ok, I think I look into it.  I am anyway preparing a big update
to pgcrypto.

Question to -hackers: currently there is not possible to cast
bytea to text and vice-versa.  Is this intentional or bug?

It is weird because internal representation is exactly the same.
As I want my funtions to operate on both, do I need to create
separate funtion entries to every combination of parameters?
It gets crazy on encrypt_iv(data, key, iv, type) which has 3
parameters that can be both bytea or text...


-- 
marko



Re: [PATCH] by request: base64 for bytea

From
Marko Kreen
Date:
On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote:
> Your patch has been added to the PostgreSQL unapplied patches list at:

> > Attached is a patch (including documentation this time :) containing two
> > functions, base64_encode(bytea) and base64_decode(text) with obvious
> > functionality.

Btw, there are functions in form encode(data, 'base64'),
decode(data, 'base64') in contrib/pgcrypto.  They do also
encode(data, 'hex').  In the future I need to do probably
encode(data, 'pgp-armor') too...

I agree those functionality should be in core code, and if
the Alex ones get there, maybe he could use same interface?

Or I can extract it out from pgcrypto and submit to core ;)
I simply had not a need for it because I used those with
pgcrypto, but Alex seems to hint that there would be bit of
interest otherwise too.

-- 
marko



Re: [PATCH] by request: base64 for bytea

From
Bruce Momjian
Date:
> On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote:
> > On Sat, 23 Jun 2001, Marko Kreen wrote:
> > > Or I can extract it out from pgcrypto and submit to core ;)
> > > I simply had not a need for it because I used those with
> > > pgcrypto, but Alex seems to hint that there would be bit of
> > > interest otherwise too.
> > I think encode/decode should be part of core, as they are basic functions
> > to manipulate bytea data...
> 
> Ok, I think I look into it.  I am anyway preparing a big update
> to pgcrypto.
> 
> Question to -hackers: currently there is not possible to cast
> bytea to text and vice-versa.  Is this intentional or bug?
> 
> It is weird because internal representation is exactly the same.
> As I want my funtions to operate on both, do I need to create
> separate funtion entries to every combination of parameters?
> It gets crazy on encrypt_iv(data, key, iv, type) which has 3
> parameters that can be both bytea or text...

I have commited code to CVS to make bytea binary compatible with text.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [PATCH] by request: base64 for bytea

From
Bruce Momjian
Date:
> On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote:
> > On Sat, 23 Jun 2001, Marko Kreen wrote:
> > > Or I can extract it out from pgcrypto and submit to core ;)
> > > I simply had not a need for it because I used those with
> > > pgcrypto, but Alex seems to hint that there would be bit of
> > > interest otherwise too.
> > I think encode/decode should be part of core, as they are basic functions
> > to manipulate bytea data...
> 
> Ok, I think I look into it.  I am anyway preparing a big update
> to pgcrypto.
> 
> Question to -hackers: currently there is not possible to cast
> bytea to text and vice-versa.  Is this intentional or bug?
> 
> It is weird because internal representation is exactly the same.
> As I want my funtions to operate on both, do I need to create
> separate funtion entries to every combination of parameters?
> It gets crazy on encrypt_iv(data, key, iv, type) which has 3
> parameters that can be both bytea or text...

Sorry, backed out bytea binary compatibility code.  Tom says it will not
work.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [PATCH] by request: base64 for bytea

From
Tom Lane
Date:
Marko Kreen <marko@l-t.ee> writes:
> Question to -hackers: currently there is not possible to cast
> bytea to text and vice-versa.  Is this intentional or bug?

Intentional.  text and friends do not like embedded nulls.

If there were a cast it would have to be one that implies
an I/O conversion, just like any other type that contains
non-textual data.
        regards, tom lane


Re: [PATCH] by request: base64 for bytea

From
Marko Kreen
Date:
On Sat, Jun 23, 2001 at 10:46:46PM -0400, Tom Lane wrote:
> Marko Kreen <marko@l-t.ee> writes:
> > Question to -hackers: currently there is not possible to cast
> > bytea to text and vice-versa.  Is this intentional or bug?
> 
> Intentional.  text and friends do not like embedded nulls.
> 
> If there were a cast it would have to be one that implies
> an I/O conversion, just like any other type that contains
> non-textual data.

Well, I have functions that should work on both - encode(),
digest(), hmac().  Probably should do then several entries.  Ok.

But what should be return type of decrypt()?  I imagine well
situations where user wants to crypt both bytea and text data.
When there is even not a way to cast them to each other, then
he is stuck for no good reason.

-- 
marko



Re: [PATCH] by request: base64 for bytea

From
Tom Lane
Date:
Alex Pilosov <alex@pilosoft.com> writes:
> Function to cast bytea as text, I think, should do proper checking that
> input did not contain nulls, and return text data back.

That is most definitely not good enough.  In MULTIBYTE installations
you'd have to also check that there were no illegal multibyte sequences.

The whole approach seems misguided to me anyway.  bytea isn't equivalent
to text and conversion functions based on providing incomplete binary
equivalence are fundamentally wrong.  hex or base64 encode/decode
functions seem like reasonable conversion paths, or you could provide
a function that mimics the existing I/O conversions for bytea, ugly as
they are.

In the case that Marko is describing, it seems to me he is providing
two independent sets of encryption functions, one for text and one
for bytea.  That they happen to share code under the hood is an
implementation detail of his code, not a reason to contort the type
system.  If someone wanted to add functions to encrypt, say, polygons,
would you start looking for ways to create a binary equivalence between
polygon and text?  I sure hope not.
        regards, tom lane


Re: [PATCH] by request: base64 for bytea

From
Alex Pilosov
Date:
On Sun, 24 Jun 2001, Marko Kreen wrote:

> On Sat, Jun 23, 2001 at 10:46:46PM -0400, Tom Lane wrote:
> > Marko Kreen <marko@l-t.ee> writes:
> > > Question to -hackers: currently there is not possible to cast
> > > bytea to text and vice-versa.  Is this intentional or bug?
> > 
> > Intentional.  text and friends do not like embedded nulls.
> > 
> > If there were a cast it would have to be one that implies
> > an I/O conversion, just like any other type that contains
> > non-textual data.
> 
> Well, I have functions that should work on both - encode(),
> digest(), hmac().  Probably should do then several entries.  Ok.
> 
> But what should be return type of decrypt()?  I imagine well
> situations where user wants to crypt both bytea and text data.
> When there is even not a way to cast them to each other, then
> he is stuck for no good reason.
There SHOULD be a text_bytea function to cast a text as bytea, as it is
always safe. (It doesn't exist yet, but its a trivial patch)

Function to cast bytea as text, I think, should do proper checking that
input did not contain nulls, and return text data back.

Your encrypt/decrypt should take bytea and return bytea. Its user's
responsibility to cast the things to bytea when needed.



Re: [PATCH] by request: base64 for bytea

From
Tom Lane
Date:
Marko Kreen <marko@l-t.ee> writes:
> But what should be return type of decrypt()?

You'll need more than one name: decrypt to text, decrypt to bytea, etc.
Think about what happens when you need to support additional types.
Relying on implicit conversions or binary equivalence will not scale.
        regards, tom lane


Re: [PATCH] by request: base64 for bytea

From
Bruce Momjian
Date:
> Alex Pilosov <alex@pilosoft.com> writes:
> > Function to cast bytea as text, I think, should do proper checking that
> > input did not contain nulls, and return text data back.
> 
> That is most definitely not good enough.  In MULTIBYTE installations
> you'd have to also check that there were no illegal multibyte sequences.
> 
> The whole approach seems misguided to me anyway.  bytea isn't equivalent
> to text and conversion functions based on providing incomplete binary
> equivalence are fundamentally wrong.  hex or base64 encode/decode
> functions seem like reasonable conversion paths, or you could provide
> a function that mimics the existing I/O conversions for bytea, ugly as
> they are.

He can create an output function just to text, and varchar, etc will work
OK, right?

I think the main issue is that char(), varchar(), text all input/output
strings of the same format while bytea has special backslash handling
for binary/null values.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [PATCH] by request: base64 for bytea

From
Alex Pilosov
Date:
On Sun, 24 Jun 2001, Tom Lane wrote:

> Alex Pilosov <alex@pilosoft.com> writes:
> > Function to cast bytea as text, I think, should do proper checking that
> > input did not contain nulls, and return text data back.
> 
> That is most definitely not good enough.  In MULTIBYTE installations
> you'd have to also check that there were no illegal multibyte sequences.
True, but see below.

> The whole approach seems misguided to me anyway.  bytea isn't equivalent
> to text and conversion functions based on providing incomplete binary
> equivalence are fundamentally wrong.  hex or base64 encode/decode
> functions seem like reasonable conversion paths, or you could provide
> a function that mimics the existing I/O conversions for bytea, ugly as
> they are.
>
> In the case that Marko is describing, it seems to me he is providing
> two independent sets of encryption functions, one for text and one
> for bytea.  That they happen to share code under the hood is an
> implementation detail of his code, not a reason to contort the type
> system.  If someone wanted to add functions to encrypt, say, polygons,
> would you start looking for ways to create a binary equivalence between
> polygon and text?  I sure hope not.

Well, encrypt/decrypt are special kinds of functions. When the data is
decrypted, its type is not known, as it is not stored anywhere in the
data. Caller is responsible to casting the result to whatever he needs to,
thus, there must be some way to cast output of decrypted data to any type.

I may be going a bit too far, but, if you think about it, if one wanted to
encrypt a generic type t, these ar e the alternatives: 

a) to encrypt, caller must use encrypt(t_out(val)) and to decrypt
t_in(decrypt(val)).

Problem with that is non-existance of CSTRING datatype as of yet, and a
possible inefficiency of it compared to b).

b) make encrypt operate on 'opaque' type, and just encrypt raw data in
memory, as many as there are, and store the original varlen separately.
(most encrypt-decrypt algorithms do not preserve data length anyway, they
operate in blocks of n bytes).  Question in this situation what to do with
decrypt, options are:

b1) make decrypt return opaque and to allow conversion from opaque to any
datatype, (by blindly setting the oid of return type), I'm not sure how
hard is this one to do with current type system, and do not like safety of
this since an ordinary user would be able to put garbage data into type
that may not be prepared to handle it.

b2) make encrypt store the name of original type in encrypted data. make
decrypt return opaque which would contain (type,data,length) triple, and
allow to cast opaque into any type but _checking_ that opaque has correct
format and that type stored in opaque matches type its being cast to.

This has additional benefit of being able to serialize/deserialize data,
preserving type, which may be used by something else...

In my opinion, a) is probably the easiest option to implement. b2) is
(IMHO) the most correct one, but it may be a bit too much work for not
that much of benefit?

This may be going a bit too far, since original question only dealt with
text-bytea conversions, but maybe its time to look at 'generic' functions
which return generic types.



Re: [PATCH] by request: base64 for bytea

From
Marko Kreen
Date:
On Sun, Jun 24, 2001 at 06:20:39PM -0400, Alex Pilosov wrote:
> On Sun, 24 Jun 2001, Tom Lane wrote:
> > In the case that Marko is describing, it seems to me he is providing
> > two independent sets of encryption functions, one for text and one
> > for bytea.  That they happen to share code under the hood is an
> > implementation detail of his code, not a reason to contort the type
> > system.  If someone wanted to add functions to encrypt, say, polygons,
> > would you start looking for ways to create a binary equivalence between
> > polygon and text?  I sure hope not.
> 
> Well, encrypt/decrypt are special kinds of functions. When the data is
> decrypted, its type is not known, as it is not stored anywhere in the
> data. Caller is responsible to casting the result to whatever he needs to,
> thus, there must be some way to cast output of decrypted data to any type.
> 
> I may be going a bit too far, but, if you think about it, if one wanted to
> encrypt a generic type t, these ar e the alternatives: 

[ ... bunch of good ideas ... ]

I do not want to go that far and imagine current encrypt() as
something low-level, that encrypts a unstructured array of 8bit
values.  That makes bytea as 'natural' type to use for it.
I now took the Tom suggestion that all functions do not operate
well on 8bit values - so now I declared that all my funtions
that _do_ operate on 8bit values, get data as bytea.
Btw, the length is preserved - I use padding if needed.  But no
additional info is preserved.

Now, if you want to do something higher-level, in POV of
PostgreSQL - to attach type data or something else, you can
very well build some higher-level functions on encrypt() that
add some additional structure for it.  This is easy - you can
do it in SQL level if you want, but I also tried to make
all crypto stuff accesible from C level too.  I do not think it
belongs to current encrypt() - this is 'next level'.  So I do
not worry about encrypting polygons yet.

Tho' current encrypt() has some 'negative' points on crypto POV.
As it does basically pure cipher, and has no structure I cant
use some higher features as key generation, attaching algorithm
info to data and checksums.  (Actually it _does_ support
attaching a MD or HMAC to encrypted data, but I consider it as
too hackish).  So, ee, someday, when I have more time I would like
to use current code as building block and do a minimal OpenPGP
implementation that does support all of it.

This again does not offer anything for 'generic types', but
again I do not consider it job for that level.

> This may be going a bit too far, since original question only dealt with
> text-bytea conversions, but maybe its time to look at 'generic' functions
> which return generic types.

I did want to encrypt() etc. to operate on 'text' too, as it
would be _very_ convinient, and they really are similar on POV
of encrypt().

Hmm, on the other hand -

Idea for 'generic types', taking account of PostgreSQL current
type system - functions:
pack(data::whatever)::bytea,unpack_text(data::bytea)::text,unpack_polygon(data::bytea)::polygon...

pack() does a compact representation of data, with type attached
unpack*() checks if it is of correct type and sane.  It may be
textual but this takes much room, binary is probably not
portable.  Eg. it could be done using *in(), *out() functions,
maybe even keep the '\0', and prepends type info (oid/name).
So later it could be given to encrypt()... ?

-- 
marko