Thread: Bytea/Base64 encoders for libpq - interested?
Hi! Please find attached some very simple encoders/decoders for bytea and base64. Bytea encoder is very picky about what it leaves unescaped - basically the base64 char set ;-) Since this seems to be a very poorly documented but much asked-for thing, I thought you would maybe like to add this code to libpq (so that everyone benefits). I'm aware that function renames might be necessary, though. If you like, I could make the code fit into libpq, and send diffs. Any comments/interests? Greetings, Joerg
Attachment
On Tue, Aug 28, 2001 at 11:07:32AM +0200, Joerg Hessdoerfer wrote: > Hi! > > Please find attached some very simple encoders/decoders for bytea and base64. > Bytea encoder is very picky about what it leaves unescaped - basically the > base64 > char set ;-) > > Since this seems to be a very poorly documented but much asked-for thing, I > thought > you would maybe like to add this code to libpq (so that everyone benefits). > > I'm aware that function renames might be necessary, though. > If you like, I could make the code fit into libpq, and send diffs. > > Any comments/interests? What implement base64 PostgreSQL datetype that use externaly base64 and internaly same things as bytea. It prevent FE and parser problems with "bad" chars and internaly for data storage save less space than text with base64. Of course it doesn't solve a problem with encoding/decoding data in your application to/from base64. May be implement for this datetype cast to/from bytea too. SELECT my_bytea::base64 FROM foo; INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea); And you can still fetch all data directly in batea by binary cursor. Comments? Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
At 11:55 AM 28-08-2001 +0200, Karel Zak wrote: > > What implement base64 PostgreSQL datetype that use externaly base64 and >internaly same things as bytea. It prevent FE and parser problems with >"bad" chars and internaly for data storage save less space than text >with base64. Of course it doesn't solve a problem with encoding/decoding >data in your application to/from base64. May be implement for this >datetype cast to/from bytea too. > > SELECT my_bytea::base64 FROM foo; > > INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea); > > And you can still fetch all data directly in batea by binary cursor. > > Comments? Sounds good to me. Even better if the base64 parser is bulletproof and tolerant of junk. That way base64 email attachments may not even need to be processed much - just filter a bit and shove it in :). But shouldn't there be a ::base64 somewhere in the insert statement? Cheerio, Link.
At 11:55 AM 28-08-2001 +0200, Karel Zak wrote: > What implement base64 PostgreSQL datetype that use externaly base64 and >internaly same things as bytea. It prevent FE and parser problems with Another point: I have no problems with base64[1]. However I was thinking that it might be far easier for the C/C++/Java (and other low level languages) bunch to do hexadecimal. e.g. zero zero for null, zero A for line feed. It expands things in the input/output stream, but it might be worth some consideration. Simplicity, cpu usage etc. Cheerio, Link. [1] OK, I can't convert base64 to ASCII mentally yet. But I don't think that should really a factor.
Where did we leave this? > On Tue, Aug 28, 2001 at 11:07:32AM +0200, Joerg Hessdoerfer wrote: > > Hi! > > > > Please find attached some very simple encoders/decoders for bytea and base64. > > Bytea encoder is very picky about what it leaves unescaped - basically the > > base64 > > char set ;-) > > > > Since this seems to be a very poorly documented but much asked-for thing, I > > thought > > you would maybe like to add this code to libpq (so that everyone benefits). > > > > I'm aware that function renames might be necessary, though. > > If you like, I could make the code fit into libpq, and send diffs. > > > > Any comments/interests? > > What implement base64 PostgreSQL datetype that use externaly base64 and > internaly same things as bytea. It prevent FE and parser problems with > "bad" chars and internaly for data storage save less space than text > with base64. Of course it doesn't solve a problem with encoding/decoding > data in your application to/from base64. May be implement for this > datetype cast to/from bytea too. > > SELECT my_bytea::base64 FROM foo; > > INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea); > > And you can still fetch all data directly in batea by binary cursor. > > Comments? > > Karel > -- > Karel Zak <zakkr@zf.jcu.cz> > http://home.zf.jcu.cz/~zakkr/ > > C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Where did we leave this? I don't think adding a datatype just to provide base64 encoding is a wise approach. The overhead of a new datatype (in the sense of providing operators/functions for it) will be much more than the benefit. I think providing encode/decode functions is sufficient... and we have those already, don't we? regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Where did we leave this? > > I don't think adding a datatype just to provide base64 encoding is > a wise approach. The overhead of a new datatype (in the sense of > providing operators/functions for it) will be much more than the > benefit. I think providing encode/decode functions is sufficient... > and we have those already, don't we? Agreed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> I don't think adding a datatype just to provide base64 encoding is > a wise approach. The overhead of a new datatype (in the sense of > providing operators/functions for it) will be much more than the > benefit. I think providing encode/decode functions is sufficient... > and we have those already, don't we? > It might be nice to have a PQbyteaEscape or some such function available in the libpq client library so that arbitrary binary could be escaped on the client side and used in a sql statement. I actually wrote this already as an addition to the PHP PostgreSQL extension, but it would make more sense, now that I think about it, for it to be in libpq and called from PHP (or whatever). Comments? On a related note, are there any other bytea functions we should have in the backend before freezing for 7.2? I was thinking it would be nice to have a way to cast bytea into text and vice-versa, so that the normal text functions could be used for things like LIKE and concatenation. Any interest in this? If so, any guidance WRT how it should be implemented? -- Joe
> > I don't think adding a datatype just to provide base64 encoding is > > a wise approach. The overhead of a new datatype (in the sense of > > providing operators/functions for it) will be much more than the > > benefit. I think providing encode/decode functions is sufficient... > > and we have those already, don't we? > > > > It might be nice to have a PQbyteaEscape or some such function available in > the libpq client library so that arbitrary binary could be escaped on the > client side and used in a sql statement. I actually wrote this already as an > addition to the PHP PostgreSQL extension, but it would make more sense, now > that I think about it, for it to be in libpq and called from PHP (or > whatever). Comments? Good idea. I will commit the non-bytea escape in a day and you can base a bytea one on that. You will have to pass in the length of the field because of course it is not null terminated. > On a related note, are there any other bytea functions we should have in the > backend before freezing for 7.2? I was thinking it would be nice to have a > way to cast bytea into text and vice-versa, so that the normal text > functions could be used for things like LIKE and concatenation. Any interest > in this? If so, any guidance WRT how it should be implemented? I can't see why you can't do that. The only problem is passing a \0 (null byte) back to the client. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
"Joe Conway" <joseph.conway@home.com> writes: > I was thinking it would be nice to have a > way to cast bytea into text and vice-versa, How will you handle a null byte in bytea data? Transforming it directly into an embedded null in a text object is NOT an acceptable answer, because too many of the text functions will misbehave on such data. regards, tom lane
> > It might be nice to have a PQbyteaEscape or some such function available in > > the libpq client library so that arbitrary binary could be escaped on the > > client side and used in a sql statement. I actually wrote this already as an > > addition to the PHP PostgreSQL extension, but it would make more sense, now > > that I think about it, for it to be in libpq and called from PHP (or > > whatever). Comments? > > Good idea. I will commit the non-bytea escape in a day and you can base > a bytea one on that. You will have to pass in the length of the field > because of course it is not null terminated. OK. > > > On a related note, are there any other bytea functions we should have in the > > backend before freezing for 7.2? I was thinking it would be nice to have a > > way to cast bytea into text and vice-versa, so that the normal text > > functions could be used for things like LIKE and concatenation. Any interest > > in this? If so, any guidance WRT how it should be implemented? > > I can't see why you can't do that. The only problem is passing a \0 > (null byte) back to the client. Well, ISTM the simplest (if not the most efficient) way to do bytea-to-text would be a function that takes the escaped string value from byteaout, and creates a text value directly from it. The only danger I can think of is that very long strings might need to be truncated in length, since the escaped string could be significantly longer than the binary. Text-to-bytea should be a straight copy, since nothing that can be represented as text cannot be represented as bytea. Any comments or concerns? -- Joe
On Mon, Sep 03, 2001 at 08:48:22PM -0400, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Where did we leave this? > > I don't think adding a datatype just to provide base64 encoding is > a wise approach. The overhead of a new datatype (in the sense of > providing operators/functions for it) will be much more than the > benefit. I think providing encode/decode functions is sufficient... > and we have those already, don't we? Agree too. But 1000 "bad" chars encoded by base64 vs. encoded by escape, what is longer and more expensive for transfer between FE and BE? A base64 problem is that encode all chars in string, but in the real usage some data contains "bad" chars occasional only. Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
Joe Conway writes: > On a related note, are there any other bytea functions we should have in the > backend before freezing for 7.2? The SQL standards has a lot of functions for BLOB... > I was thinking it would be nice to have a > way to cast bytea into text and vice-versa, so that the normal text > functions could be used for things like LIKE and concatenation. Better write a native LIKE function for bytea, now that some parts are threatening to make the text-LIKE function use the locale collating sequence. (Multibyte aware text could also have interesting effects.) -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
"Joe Conway" <joseph.conway@home.com> writes: > Well, ISTM the simplest (if not the most efficient) way to do bytea-to-text > would be a function that takes the escaped string value from byteaout, and > creates a text value directly from it. The only danger I can think of is > that very long strings might need to be truncated in length, since the > escaped string could be significantly longer than the binary. > Text-to-bytea should be a straight copy, since nothing that can be > represented as text cannot be represented as bytea. Ugh ... if the conversion functions are not inverses then I think they lose much of their value. I could see doing either of these: 1. Conversion functions based on byteaout/byteain. 2. Bytea to text escapes *only* null bytes, text to bytea treats only "\0" as an escape sequence. Or maybe both, with two pairs of conversion functions. In any case, we have to decide whether these coercion functions should be named after the types --- ie, should they be made invokable as implicit coercions? I'm dubious that that's a good idea; if we do it then all sorts of textual operations will suddenly be allowed for bytea without any explicit conversion, which is likely to do more harm than good. The reason for having a separate bytea type is exactly so that you *can't* apply text ops to it without thinking. regards, tom lane
> > On a related note, are there any other bytea functions we should have in the > > backend before freezing for 7.2? > > The SQL standards has a lot of functions for BLOB... > OK - thanks. I'll take a look. > > I was thinking it would be nice to have a > > way to cast bytea into text and vice-versa, so that the normal text > > functions could be used for things like LIKE and concatenation. > > Better write a native LIKE function for bytea, now that some parts are > threatening to make the text-LIKE function use the locale collating > sequence. (Multibyte aware text could also have interesting effects.) > Sounds like good advice. I'll try to get both the cast functions and a native bytea LIKE function done. -- Joe
> Ugh ... if the conversion functions are not inverses then I think they > lose much of their value. I could see doing either of these: > > 1. Conversion functions based on byteaout/byteain. > > 2. Bytea to text escapes *only* null bytes, text to bytea treats only > "\0" as an escape sequence. > > Or maybe both, with two pairs of conversion functions. > > In any case, we have to decide whether these coercion functions should > be named after the types --- ie, should they be made invokable as > implicit coercions? I'm dubious that that's a good idea; if we do it > then all sorts of textual operations will suddenly be allowed for bytea > without any explicit conversion, which is likely to do more harm than > good. The reason for having a separate bytea type is exactly so that > you *can't* apply text ops to it without thinking. > > regards, tom lane You're right, as usual (I was tired when I wrote this last night ;). But I think we have to escape/unescape both null and '\', don't we? I agree that it would be better to *not* allow implicit coercions. Given that, any preferences on function names? Are text_to_bytea() and bytea_to_text() too ugly? -- Joe
"Joe Conway" <joseph.conway@home.com> writes: > You're right, as usual (I was tired when I wrote this last night ;). But I > think we have to escape/unescape both null and '\', don't we? Yeah, you're right. My turn to have not thought hard enough. > I agree that it would be better to *not* allow implicit coercions. Given > that, any preferences on function names? Are text_to_bytea() and > bytea_to_text() too ugly? They're pretty ugly, but more importantly they're only suitable if we have exactly one conversion function each way. If we have two, what will we call the second one? I think it's okay to let the argument type be implicit in the function argument list. Something like text_escaped(bytea) and text_direct(bytea) (with inverses bytea_escaped(text) and bytea_direct(text)) might do. I'm not totally happy with "direct" to suggest minimum escaping, though. Better ideas anyone? regards, tom lane
> You're right, as usual (I was tired when I wrote this last night ;). But I > think we have to escape/unescape both null and '\', don't we? Yes, I think backslashes need special escapes too. Let me ask a bigger question. We have the length of the text string in the varlena header. Are we concerned about backend code not handling NULL in text fields, or frontend code returning strings with embedded nulls? I see problems in the text() functions for nulls, but is such a limitation required for text types? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
* Tom Lane <tgl@sss.pgh.pa.us> [010904 12:01]: > They're pretty ugly, but more importantly they're only suitable if we > have exactly one conversion function each way. If we have two, what > will we call the second one? > > I think it's okay to let the argument type be implicit in the function > argument list. Something like text_escaped(bytea) and text_direct(bytea) > (with inverses bytea_escaped(text) and bytea_direct(text)) might do. > I'm not totally happy with "direct" to suggest minimum escaping, though. > Better ideas anyone? Cooked vs raw? LER > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Let me ask a bigger question. We have the length of the text string in > the varlena header. Are we concerned about backend code not handling > NULL in text fields, or frontend code returning strings with embedded > nulls? The former. > I see problems in the text() functions for nulls, but is such a > limitation required for text types? Unless you want to re-implement strcoll() and friends from scratch. regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Let me ask a bigger question. We have the length of the text string in > > the varlena header. Are we concerned about backend code not handling > > NULL in text fields, or frontend code returning strings with embedded > > nulls? > > The former. > > > I see problems in the text() functions for nulls, but is such a > > limitation required for text types? > > Unless you want to re-implement strcoll() and friends from scratch. Yes, I saw strcoll(). -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Peter Eisentraut <peter_e@gmx.net> writes: > Why not just stick these things into encode() and name them > "my-cool-encoding" or whatever. Sounds good to me ... regards, tom lane
> Peter Eisentraut <peter_e@gmx.net> writes: > > Why not just stick these things into encode() and name them > > "my-cool-encoding" or whatever. > > Sounds good to me ... > > regards, tom lane > Sounds good to me too. Patch forthcoming . . . -- Joe
Tom Lane writes: > > I agree that it would be better to *not* allow implicit coercions. Given > > that, any preferences on function names? Are text_to_bytea() and > > bytea_to_text() too ugly? > > They're pretty ugly, but more importantly they're only suitable if we > have exactly one conversion function each way. If we have two, what > will we call the second one? Why not just stick these things into encode() and name them "my-cool-encoding" or whatever. There is no truly natural conversion between text and bytea, so encode/decode seem like the proper place. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter