Thread: [RFC] new digest datatypes, or generic fixed-len hex types?
Hi, We've developed some code to implement fixed-length datatypes for well known digest function output (MD5, SHA1 and the various SHA2 types). These types have minimal overhead and are quite complete, including btree and hash opclasses. We're wondering about proposing them for inclusion in pgcrypto. I asked Marko Kreen but he is not sure about it; according to him it would be better to have general fixed-length hex types. (I guess it would be possible to implement the digest types as domains over those.) So basically we have sha1, sha-256, sha-512 etc on one hand, and hex8, hex16, hex32 on the other hand. In both cases there is a single body of code that is compiled with a macro definition that provides the data length for every separate case. (Actually in the digest code we refactored the common routines so that each type has a light wrapper calling a function that works on any length; this could also be done to the fixed-len hex code as well -- that code is pretty grotty at the moment.) Of these two choices, which one is likely to have better acceptance around here? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > We've developed some code to implement fixed-length datatypes for well > known digest function output (MD5, SHA1 and the various SHA2 types). > These types have minimal overhead and are quite complete, including > btree and hash opclasses. > We're wondering about proposing them for inclusion in pgcrypto. Wasn't this proposed and rejected before? (Or more to the point, why'd you bother? The advantage over bytea seems negligible.) regards, tom lane
On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> We've developed some code to implement fixed-length datatypes for well >> known digest function output (MD5, SHA1 and the various SHA2 types). >> These types have minimal overhead and are quite complete, including >> btree and hash opclasses. > >> We're wondering about proposing them for inclusion in pgcrypto. > > Wasn't this proposed and rejected before? (Or more to the point, > why'd you bother? The advantage over bytea seems negligible.) well, one nice things about the fixed length types is that you can keep your table from needing a toast table when you have a bytea in it. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Wasn't this proposed and rejected before? �(Or more to the point, >> why'd you bother? �The advantage over bytea seems negligible.) > well, one nice things about the fixed length types is that you can > keep your table from needing a toast table when you have a bytea in > it. If you don't actually use the toast table, it doesn't cost anything very noticeable ... regards, tom lane
Merlin Moncure wrote: > On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > >> Alvaro Herrera <alvherre@commandprompt.com> writes: >> >>> We've developed some code to implement fixed-length datatypes for well >>> known digest function output (MD5, SHA1 and the various SHA2 types). >>> These types have minimal overhead and are quite complete, including >>> btree and hash opclasses. >>> >>> We're wondering about proposing them for inclusion in pgcrypto. >>> >> Wasn't this proposed and rejected before? (Or more to the point, >> why'd you bother? The advantage over bytea seems negligible.) >> > > well, one nice things about the fixed length types is that you can > keep your table from needing a toast table when you have a bytea in > it. > > > Can't you just set storage on the column to MAIN to stop it being stored in a toast table? cheers andrew
On Mon, Jul 27, 2009 at 12:02 PM, Andrew Dunstan<andrew@dunslane.net> wrote: > Merlin Moncure wrote: >> >> On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> >>> >>> Alvaro Herrera <alvherre@commandprompt.com> writes: >>> >>>> >>>> We've developed some code to implement fixed-length datatypes for well >>>> known digest function output (MD5, SHA1 and the various SHA2 types). >>>> These types have minimal overhead and are quite complete, including >>>> btree and hash opclasses. >>>> We're wondering about proposing them for inclusion in pgcrypto. >>>> >>> >>> Wasn't this proposed and rejected before? (Or more to the point, >>> why'd you bother? The advantage over bytea seems negligible.) >>> >> >> well, one nice things about the fixed length types is that you can >> keep your table from needing a toast table when you have a bytea in >> it. > > Can't you just set storage on the column to MAIN to stop it being stored in > a toast table? of course. hm. would the input/output functions for the fixed length types be faster? what is the advantage of the proposal? merlin
On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote: > We've developed some code to implement fixed-length datatypes for well > known digest function output (MD5, SHA1 and the various SHA2 types). > These types have minimal overhead and are quite complete, including > btree and hash opclasses. > > We're wondering about proposing them for inclusion in pgcrypto. I asked > Marko Kreen but he is not sure about it; according to him it would be > better to have general fixed-length hex types. (I guess it would be > possible to implement the digest types as domains over those.) I think equipping bytea with a length restriction would be a very natural, simple, and useful addition. If we ever want to move the bytea type closer to the SQL standard blob type, this will need to happen anyway. The case for separate fixed-length data types seems very dubious, unless you can show very impressive performance numbers. For one thing, they would make the whole type system more complicated, or in the alternative, would have little function and operator support.
On Tue, Jul 28, 2009 at 7:15 AM, Peter Eisentraut<peter_e@gmx.net> wrote: > On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote: >> We've developed some code to implement fixed-length datatypes for well >> known digest function output (MD5, SHA1 and the various SHA2 types). >> These types have minimal overhead and are quite complete, including >> btree and hash opclasses. >> > I think equipping bytea with a length restriction would be a very natural, > simple, and useful addition. If we ever want to move the bytea type closer to > the SQL standard blob type, this will need to happen anyway. > +1 merlin
On Jul 28, 2009, at 6:15 AM, Peter Eisentraut wrote: > On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote: >> We've developed some code to implement fixed-length datatypes for >> well >> known digest function output (MD5, SHA1 and the various SHA2 types). >> These types have minimal overhead and are quite complete, including >> btree and hash opclasses. >> >> We're wondering about proposing them for inclusion in pgcrypto. I >> asked >> Marko Kreen but he is not sure about it; according to him it would be >> better to have general fixed-length hex types. (I guess it would be >> possible to implement the digest types as domains over those.) > > I think equipping bytea with a length restriction would be a very > natural, > simple, and useful addition. If we ever want to move the bytea > type closer to > the SQL standard blob type, this will need to happen anyway. > > The case for separate fixed-length data types seems very dubious, > unless you > can show very impressive performance numbers. For one thing, they > would make > the whole type system more complicated, or in the alternative, > would have > little function and operator support. bytea doesn't cast well to and from text when you're dealing with hex data; you end up using the same amount of space as a varchar. What would probably work well is a hex datatype that internally works like bytea but requires that the input data is hex (I know you can use encode/decode, but that added step is a pain). A similar argument could be made for base64 encoded data. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
On Wednesday 29 July 2009 20:16:48 decibel wrote: > bytea doesn't cast well to and from text when you're dealing with hex > data; you end up using the same amount of space as a varchar. What > would probably work well is a hex datatype that internally works like > bytea but requires that the input data is hex (I know you can use > encode/decode, but that added step is a pain). A similar argument > could be made for base64 encoded data. There is a patch in the queue that adds hex input and output to bytea.