Thread: [PATCH] by request: base64 for bytea
Attached is a patch (including documentation this time :) containing two functions, base64_encode(bytea) and base64_decode(text) with obvious functionality. Code was initially taken from public-domain base64.c by John Walker but much simplified (such as, breaking up long string into multiple lines is not done, EBCDIC support removed). -- Alex Pilosov | http://www.acedsl.com/home.html CTO - Acecape, Inc. | AceDSL:The best ADSL in the world 325 W 38 St. Suite 1005 | (Stealth Marketing Works! :) New York, NY 10018 |
Your patch has been added to the PostgreSQL unapplied patches list at: http://candle.pha.pa.us/cgi-bin/pgpatches I will try to apply it within the next 48 hours. > Attached is a patch (including documentation this time :) containing two > functions, base64_encode(bytea) and base64_decode(text) with obvious > functionality. > > Code was initially taken from public-domain base64.c by John Walker but > much simplified (such as, breaking up long string into multiple lines is > not done, EBCDIC support removed). > > -- > Alex Pilosov | http://www.acedsl.com/home.html > CTO - Acecape, Inc. | AceDSL:The best ADSL in the world > 325 W 38 St. Suite 1005 | (Stealth Marketing Works! :) > New York, NY 10018 | Content-Description: [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Sat, 23 Jun 2001, Marko Kreen wrote: > On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote: > > Your patch has been added to the PostgreSQL unapplied patches list at: > > > > Attached is a patch (including documentation this time :) containing two > > > functions, base64_encode(bytea) and base64_decode(text) with obvious > > > functionality. > > Btw, there are functions in form encode(data, 'base64'), > decode(data, 'base64') in contrib/pgcrypto. They do also > encode(data, 'hex'). In the future I need to do probably > encode(data, 'pgp-armor') too... > > I agree those functionality should be in core code, and if > the Alex ones get there, maybe he could use same interface? Oy, I didn't notice them in contrib/pgcrypt. Bruce, you can take my patch out of queue, stuff in pgcrypt is far more comprehensive than what I done. > Or I can extract it out from pgcrypto and submit to core ;) > I simply had not a need for it because I used those with > pgcrypto, but Alex seems to hint that there would be bit of > interest otherwise too. I think encode/decode should be part of core, as they are basic functions to manipulate bytea data... -alex
Patch removed. > Attached is a patch (including documentation this time :) containing two > functions, base64_encode(bytea) and base64_decode(text) with obvious > functionality. > > Code was initially taken from public-domain base64.c by John Walker but > much simplified (such as, breaking up long string into multiple lines is > not done, EBCDIC support removed). > > -- > Alex Pilosov | http://www.acedsl.com/home.html > CTO - Acecape, Inc. | AceDSL:The best ADSL in the world > 325 W 38 St. Suite 1005 | (Stealth Marketing Works! :) > New York, NY 10018 | Content-Description: [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote: > > On Sat, 23 Jun 2001, Marko Kreen wrote: > > > Or I can extract it out from pgcrypto and submit to core ;) > > > I simply had not a need for it because I used those with > > > pgcrypto, but Alex seems to hint that there would be bit of > > > interest otherwise too. > > I think encode/decode should be part of core, as they are basic functions > > to manipulate bytea data... > > Ok, I think I look into it. I am anyway preparing a big update > to pgcrypto. > > Question to -hackers: currently there is not possible to cast > bytea to text and vice-versa. Is this intentional or bug? > > It is weird because internal representation is exactly the same. > As I want my funtions to operate on both, do I need to create > separate funtion entries to every combination of parameters? > It gets crazy on encrypt_iv(data, key, iv, type) which has 3 > parameters that can be both bytea or text... We just need to mark them as binary compatible. I will do that now and commit. We really weren't sure what bytea was for in the past (or forgot) so I am sure it was an oversight. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> On Sat, 23 Jun 2001, Marko Kreen wrote: > > > On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote: > > > Your patch has been added to the PostgreSQL unapplied patches list at: > > > > > > Attached is a patch (including documentation this time :) containing two > > > > functions, base64_encode(bytea) and base64_decode(text) with obvious > > > > functionality. > > > > Btw, there are functions in form encode(data, 'base64'), > > decode(data, 'base64') in contrib/pgcrypto. They do also > > encode(data, 'hex'). In the future I need to do probably > > encode(data, 'pgp-armor') too... > > > > I agree those functionality should be in core code, and if > > the Alex ones get there, maybe he could use same interface? > Oy, I didn't notice them in contrib/pgcrypt. > > Bruce, you can take my patch out of queue, stuff in pgcrypt is far more > comprehensive than what I done. Sure. Done. Funny we didn't need them as much for crypto but we do need them for binary insertion into the database. > > > Or I can extract it out from pgcrypto and submit to core ;) > > I simply had not a need for it because I used those with > > pgcrypto, but Alex seems to hint that there would be bit of > > interest otherwise too. > I think encode/decode should be part of core, as they are basic functions > to manipulate bytea data... Agreed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote: > On Sat, 23 Jun 2001, Marko Kreen wrote: > > Or I can extract it out from pgcrypto and submit to core ;) > > I simply had not a need for it because I used those with > > pgcrypto, but Alex seems to hint that there would be bit of > > interest otherwise too. > I think encode/decode should be part of core, as they are basic functions > to manipulate bytea data... Ok, I think I look into it. I am anyway preparing a big update to pgcrypto. Question to -hackers: currently there is not possible to cast bytea to text and vice-versa. Is this intentional or bug? It is weird because internal representation is exactly the same. As I want my funtions to operate on both, do I need to create separate funtion entries to every combination of parameters? It gets crazy on encrypt_iv(data, key, iv, type) which has 3 parameters that can be both bytea or text... -- marko
On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote: > Your patch has been added to the PostgreSQL unapplied patches list at: > > Attached is a patch (including documentation this time :) containing two > > functions, base64_encode(bytea) and base64_decode(text) with obvious > > functionality. Btw, there are functions in form encode(data, 'base64'), decode(data, 'base64') in contrib/pgcrypto. They do also encode(data, 'hex'). In the future I need to do probably encode(data, 'pgp-armor') too... I agree those functionality should be in core code, and if the Alex ones get there, maybe he could use same interface? Or I can extract it out from pgcrypto and submit to core ;) I simply had not a need for it because I used those with pgcrypto, but Alex seems to hint that there would be bit of interest otherwise too. -- marko
> On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote: > > On Sat, 23 Jun 2001, Marko Kreen wrote: > > > Or I can extract it out from pgcrypto and submit to core ;) > > > I simply had not a need for it because I used those with > > > pgcrypto, but Alex seems to hint that there would be bit of > > > interest otherwise too. > > I think encode/decode should be part of core, as they are basic functions > > to manipulate bytea data... > > Ok, I think I look into it. I am anyway preparing a big update > to pgcrypto. > > Question to -hackers: currently there is not possible to cast > bytea to text and vice-versa. Is this intentional or bug? > > It is weird because internal representation is exactly the same. > As I want my funtions to operate on both, do I need to create > separate funtion entries to every combination of parameters? > It gets crazy on encrypt_iv(data, key, iv, type) which has 3 > parameters that can be both bytea or text... I have commited code to CVS to make bytea binary compatible with text. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote: > > On Sat, 23 Jun 2001, Marko Kreen wrote: > > > Or I can extract it out from pgcrypto and submit to core ;) > > > I simply had not a need for it because I used those with > > > pgcrypto, but Alex seems to hint that there would be bit of > > > interest otherwise too. > > I think encode/decode should be part of core, as they are basic functions > > to manipulate bytea data... > > Ok, I think I look into it. I am anyway preparing a big update > to pgcrypto. > > Question to -hackers: currently there is not possible to cast > bytea to text and vice-versa. Is this intentional or bug? > > It is weird because internal representation is exactly the same. > As I want my funtions to operate on both, do I need to create > separate funtion entries to every combination of parameters? > It gets crazy on encrypt_iv(data, key, iv, type) which has 3 > parameters that can be both bytea or text... Sorry, backed out bytea binary compatibility code. Tom says it will not work. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Marko Kreen <marko@l-t.ee> writes: > Question to -hackers: currently there is not possible to cast > bytea to text and vice-versa. Is this intentional or bug? Intentional. text and friends do not like embedded nulls. If there were a cast it would have to be one that implies an I/O conversion, just like any other type that contains non-textual data. regards, tom lane
On Sat, Jun 23, 2001 at 10:46:46PM -0400, Tom Lane wrote: > Marko Kreen <marko@l-t.ee> writes: > > Question to -hackers: currently there is not possible to cast > > bytea to text and vice-versa. Is this intentional or bug? > > Intentional. text and friends do not like embedded nulls. > > If there were a cast it would have to be one that implies > an I/O conversion, just like any other type that contains > non-textual data. Well, I have functions that should work on both - encode(), digest(), hmac(). Probably should do then several entries. Ok. But what should be return type of decrypt()? I imagine well situations where user wants to crypt both bytea and text data. When there is even not a way to cast them to each other, then he is stuck for no good reason. -- marko
Alex Pilosov <alex@pilosoft.com> writes: > Function to cast bytea as text, I think, should do proper checking that > input did not contain nulls, and return text data back. That is most definitely not good enough. In MULTIBYTE installations you'd have to also check that there were no illegal multibyte sequences. The whole approach seems misguided to me anyway. bytea isn't equivalent to text and conversion functions based on providing incomplete binary equivalence are fundamentally wrong. hex or base64 encode/decode functions seem like reasonable conversion paths, or you could provide a function that mimics the existing I/O conversions for bytea, ugly as they are. In the case that Marko is describing, it seems to me he is providing two independent sets of encryption functions, one for text and one for bytea. That they happen to share code under the hood is an implementation detail of his code, not a reason to contort the type system. If someone wanted to add functions to encrypt, say, polygons, would you start looking for ways to create a binary equivalence between polygon and text? I sure hope not. regards, tom lane
On Sun, 24 Jun 2001, Marko Kreen wrote: > On Sat, Jun 23, 2001 at 10:46:46PM -0400, Tom Lane wrote: > > Marko Kreen <marko@l-t.ee> writes: > > > Question to -hackers: currently there is not possible to cast > > > bytea to text and vice-versa. Is this intentional or bug? > > > > Intentional. text and friends do not like embedded nulls. > > > > If there were a cast it would have to be one that implies > > an I/O conversion, just like any other type that contains > > non-textual data. > > Well, I have functions that should work on both - encode(), > digest(), hmac(). Probably should do then several entries. Ok. > > But what should be return type of decrypt()? I imagine well > situations where user wants to crypt both bytea and text data. > When there is even not a way to cast them to each other, then > he is stuck for no good reason. There SHOULD be a text_bytea function to cast a text as bytea, as it is always safe. (It doesn't exist yet, but its a trivial patch) Function to cast bytea as text, I think, should do proper checking that input did not contain nulls, and return text data back. Your encrypt/decrypt should take bytea and return bytea. Its user's responsibility to cast the things to bytea when needed.
Marko Kreen <marko@l-t.ee> writes: > But what should be return type of decrypt()? You'll need more than one name: decrypt to text, decrypt to bytea, etc. Think about what happens when you need to support additional types. Relying on implicit conversions or binary equivalence will not scale. regards, tom lane
> Alex Pilosov <alex@pilosoft.com> writes: > > Function to cast bytea as text, I think, should do proper checking that > > input did not contain nulls, and return text data back. > > That is most definitely not good enough. In MULTIBYTE installations > you'd have to also check that there were no illegal multibyte sequences. > > The whole approach seems misguided to me anyway. bytea isn't equivalent > to text and conversion functions based on providing incomplete binary > equivalence are fundamentally wrong. hex or base64 encode/decode > functions seem like reasonable conversion paths, or you could provide > a function that mimics the existing I/O conversions for bytea, ugly as > they are. He can create an output function just to text, and varchar, etc will work OK, right? I think the main issue is that char(), varchar(), text all input/output strings of the same format while bytea has special backslash handling for binary/null values. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Sun, 24 Jun 2001, Tom Lane wrote: > Alex Pilosov <alex@pilosoft.com> writes: > > Function to cast bytea as text, I think, should do proper checking that > > input did not contain nulls, and return text data back. > > That is most definitely not good enough. In MULTIBYTE installations > you'd have to also check that there were no illegal multibyte sequences. True, but see below. > The whole approach seems misguided to me anyway. bytea isn't equivalent > to text and conversion functions based on providing incomplete binary > equivalence are fundamentally wrong. hex or base64 encode/decode > functions seem like reasonable conversion paths, or you could provide > a function that mimics the existing I/O conversions for bytea, ugly as > they are. > > In the case that Marko is describing, it seems to me he is providing > two independent sets of encryption functions, one for text and one > for bytea. That they happen to share code under the hood is an > implementation detail of his code, not a reason to contort the type > system. If someone wanted to add functions to encrypt, say, polygons, > would you start looking for ways to create a binary equivalence between > polygon and text? I sure hope not. Well, encrypt/decrypt are special kinds of functions. When the data is decrypted, its type is not known, as it is not stored anywhere in the data. Caller is responsible to casting the result to whatever he needs to, thus, there must be some way to cast output of decrypted data to any type. I may be going a bit too far, but, if you think about it, if one wanted to encrypt a generic type t, these ar e the alternatives: a) to encrypt, caller must use encrypt(t_out(val)) and to decrypt t_in(decrypt(val)). Problem with that is non-existance of CSTRING datatype as of yet, and a possible inefficiency of it compared to b). b) make encrypt operate on 'opaque' type, and just encrypt raw data in memory, as many as there are, and store the original varlen separately. (most encrypt-decrypt algorithms do not preserve data length anyway, they operate in blocks of n bytes). Question in this situation what to do with decrypt, options are: b1) make decrypt return opaque and to allow conversion from opaque to any datatype, (by blindly setting the oid of return type), I'm not sure how hard is this one to do with current type system, and do not like safety of this since an ordinary user would be able to put garbage data into type that may not be prepared to handle it. b2) make encrypt store the name of original type in encrypted data. make decrypt return opaque which would contain (type,data,length) triple, and allow to cast opaque into any type but _checking_ that opaque has correct format and that type stored in opaque matches type its being cast to. This has additional benefit of being able to serialize/deserialize data, preserving type, which may be used by something else... In my opinion, a) is probably the easiest option to implement. b2) is (IMHO) the most correct one, but it may be a bit too much work for not that much of benefit? This may be going a bit too far, since original question only dealt with text-bytea conversions, but maybe its time to look at 'generic' functions which return generic types.
On Sun, Jun 24, 2001 at 06:20:39PM -0400, Alex Pilosov wrote: > On Sun, 24 Jun 2001, Tom Lane wrote: > > In the case that Marko is describing, it seems to me he is providing > > two independent sets of encryption functions, one for text and one > > for bytea. That they happen to share code under the hood is an > > implementation detail of his code, not a reason to contort the type > > system. If someone wanted to add functions to encrypt, say, polygons, > > would you start looking for ways to create a binary equivalence between > > polygon and text? I sure hope not. > > Well, encrypt/decrypt are special kinds of functions. When the data is > decrypted, its type is not known, as it is not stored anywhere in the > data. Caller is responsible to casting the result to whatever he needs to, > thus, there must be some way to cast output of decrypted data to any type. > > I may be going a bit too far, but, if you think about it, if one wanted to > encrypt a generic type t, these ar e the alternatives: [ ... bunch of good ideas ... ] I do not want to go that far and imagine current encrypt() as something low-level, that encrypts a unstructured array of 8bit values. That makes bytea as 'natural' type to use for it. I now took the Tom suggestion that all functions do not operate well on 8bit values - so now I declared that all my funtions that _do_ operate on 8bit values, get data as bytea. Btw, the length is preserved - I use padding if needed. But no additional info is preserved. Now, if you want to do something higher-level, in POV of PostgreSQL - to attach type data or something else, you can very well build some higher-level functions on encrypt() that add some additional structure for it. This is easy - you can do it in SQL level if you want, but I also tried to make all crypto stuff accesible from C level too. I do not think it belongs to current encrypt() - this is 'next level'. So I do not worry about encrypting polygons yet. Tho' current encrypt() has some 'negative' points on crypto POV. As it does basically pure cipher, and has no structure I cant use some higher features as key generation, attaching algorithm info to data and checksums. (Actually it _does_ support attaching a MD or HMAC to encrypted data, but I consider it as too hackish). So, ee, someday, when I have more time I would like to use current code as building block and do a minimal OpenPGP implementation that does support all of it. This again does not offer anything for 'generic types', but again I do not consider it job for that level. > This may be going a bit too far, since original question only dealt with > text-bytea conversions, but maybe its time to look at 'generic' functions > which return generic types. I did want to encrypt() etc. to operate on 'text' too, as it would be _very_ convinient, and they really are similar on POV of encrypt(). Hmm, on the other hand - Idea for 'generic types', taking account of PostgreSQL current type system - functions: pack(data::whatever)::bytea,unpack_text(data::bytea)::text,unpack_polygon(data::bytea)::polygon... pack() does a compact representation of data, with type attached unpack*() checks if it is of correct type and sane. It may be textual but this takes much room, binary is probably not portable. Eg. it could be done using *in(), *out() functions, maybe even keep the '\0', and prepends type info (oid/name). So later it could be given to encrypt()... ? -- marko