Thread: Fixed length datatypes. WAS [GENERAL] UUID's as primary keys
Martijn van Oosterhout wrote: > On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote: >> A user that is trusted with installing a C-function in the backend is >> free to scan the process memory anyway so in what way did that increase >> the security? IMHO, the only relevant security in that context is to >> have trusted people install trusted modules. I'm surprised that >> something like that made you remove significant functionality. > > You're missing the point. The type output function is not generally a > priveledged function. Think bpcharout, text_out, numeric_out, etc... > These can be called by users directly and the input to those functions > cannot be trusted. > Ah, OK that makes sense. An alternative solution when the signature was changed could perhaps have been to pass one single argument, a structure appointing the data and its associated type. My idea would work if the data and its type lived together always from the moment its instantiated (read from disk or otherwise) and until death do them apart (or the data is stored on disk, in which case the tupledesc knows what it is). I guess that would imply a major rewrite and that my desire to have a RAW fixed length type isn't enough motivation to do that :-) Instead, I would like to humbly request the inclusion of a UUID datatype (or an opaque 128 bit datatype) in the core package. It's increasingly common and some databases (MS SQLServer) already have built in support for it. Regards, Thomas Hallgren
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote: > Instead, I would like to humbly request the inclusion of a UUID datatype > (or an opaque 128 bit datatype) in the core package. It's increasingly ISTM that we get enough requests for this that it's probably worth doing. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote: > Instead, I would like to humbly request the inclusion of a UUID datatype > (or an opaque 128 bit datatype) in the core package. It's increasingly > common and some databases (MS SQLServer) already have built in support for > it. We have it. We're just not putting in the effort required to have it included in core, as it's too much effort to convince people that the type has value, that is is generic, and would be widely used without being abused. All the geometric types that I'll never use in core, with few or no uses, including functions to operate on these types, and no UUID type... Hehe... To me, that's irony... :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote: > On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote: > >> Instead, I would like to humbly request the inclusion of a UUID >> datatype (or an opaque 128 bit datatype) in the core package. It's >> increasingly common and some databases (MS SQLServer) already have built >> in support for it. > > We have it. We're just not putting in the effort required to have it > included in core, as it's too much effort to convince people that the type > has value, that is is generic, and would be widely used without being > abused. All the geometric types that I'll never use in core, with few or > no uses, including functions to operate on these types, and no UUID > type... Hehe... To me, that's irony... :-) Is it on pgfoundry? From past discussions, the new criteria for getting something into core is to first determine if it is successful on pgfoundry.
On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote: > On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote: > > On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote: > > > >> Instead, I would like to humbly request the inclusion of a UUID > >> datatype (or an opaque 128 bit datatype) in the core package. It's > >> increasingly common and some databases (MS SQLServer) already have built > >> in support for it. > > > > We have it. We're just not putting in the effort required to have it > > included in core, as it's too much effort to convince people that the type > > has value, that is is generic, and would be widely used without being > > abused. All the geometric types that I'll never use in core, with few or > > no uses, including functions to operate on these types, and no UUID > > type... Hehe... To me, that's irony... :-) > > Is it on pgfoundry? From past discussions, the new criteria for getting > something into core is to first determine if it is successful on > pgfoundry. If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever downloaded it. But I find that exceptionally hard to believe... Looking back through the list archives I think you'd find this comes up at least every few months. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Jun 28, 2006 at 12:38:50PM -0500, Jim C. Nasby wrote: > On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote: > > On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote: > > > On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote: > > >> Instead, I would like to humbly request the inclusion of a UUID > > >> datatype (or an opaque 128 bit datatype) in the core package. It's > > >> increasingly common and some databases (MS SQLServer) already have built > > >> in support for it. > > > We have it. We're just not putting in the effort required to have it > > > included in core, as it's too much effort to convince people that the type > > > has value, that is is generic, and would be widely used without being > > > abused. All the geometric types that I'll never use in core, with few or > > > no uses, including functions to operate on these types, and no UUID > > > type... Hehe... To me, that's irony... :-) > > Is it on pgfoundry? From past discussions, the new criteria for getting > > something into core is to first determine if it is successful on > > pgfoundry. > If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever > downloaded it. But I find that exceptionally hard to believe... > > Looking back through the list archives I think you'd find this comes up > at least every few months. I've downloaded the version off pgfoundry.org. It is broken. It leaks memory, and if memory is correct it can cause the client to core dump. Two of us worked on a re-write based off a different UUID system library, and I've been happily using it in production for a year or so. I don't believe either of us have bothered to market it. Each time it comes up, a number of people on this list shut it down, and it doesn't seem worth the effort to convince them otherwise. They can have their ivory tower, and I can have my plugin. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
Jim C. Nasby wrote: >On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote: > > >>On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote: >> >> >>>On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote: >>> >>> >>> >>>>Instead, I would like to humbly request the inclusion of a UUID >>>>datatype (or an opaque 128 bit datatype) in the core package. It's >>>>increasingly common and some databases (MS SQLServer) already have built >>>>in support for it. >>>> >>>> >>>We have it. We're just not putting in the effort required to have it >>>included in core, as it's too much effort to convince people that the type >>>has value, that is is generic, and would be widely used without being >>>abused. All the geometric types that I'll never use in core, with few or >>>no uses, including functions to operate on these types, and no UUID >>>type... Hehe... To me, that's irony... :-) >>> >>> >>Is it on pgfoundry? From past discussions, the new criteria for getting >>something into core is to first determine if it is successful on >>pgfoundry. >> >> > >If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever >downloaded it. But I find that exceptionally hard to believe... > >Looking back through the list archives I think you'd find this comes up >at least every few months. > > That's because there is nothing there to download. See instead: http://gborg.postgresql.org/project/pguuid/projdisplay.php Personally I don't buy the misuse objection - we already have plenty of things that can be misused. As long as there is a reasonable valid use and we can make it portable enough, I think there is a good case for including it. cheers andrew
mark@mark.mielke.cc wrote: > I've downloaded the version off pgfoundry.org. It is broken. It leaks > memory, and if memory is correct it can cause the client to core dump. Also it couldn't possibly be included in core, since it's based on a GPL'ed UUID library. If you have a more appropiately licensed package, it could be considered for inclusion. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote: > Personally I don't buy the misuse objection - we already have plenty of > things that can be misused. As long as there is a reasonable valid use > and we can make it portable enough, I think there is a good case for > including it. Well, since Mark has one, how about we consider adding it in? If nothing else, can you please put your stuff on pgFoundry so others can find it, Mark? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote: > On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote: > > Personally I don't buy the misuse objection - we already have plenty of > > things that can be misused. As long as there is a reasonable valid use > > and we can make it portable enough, I think there is a good case for > > including it. > Well, since Mark has one, how about we consider adding it in? > If nothing else, can you please put your stuff on pgFoundry so others > can find it, Mark? It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and is based off the OSSP ( http://www.ossp.org/ ) UUID implementation. I'm not an expert on the license, but it seems acceptable to me: "Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, providedthat the above copyright notice and this permission notice appear in all copies." I haven't tested to see how portable the OSSP UUID implementation is. This is their words: "OSSP uuid was already written with maximum portability in mind, so there should be no great effort required to get it runningon any Unix platform with a reasonable POSIX API. Additionally, the portability was tested by successfully buildingand running it on the following particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"): alpha-tru644.0 (cc) alpha-tru645.1 (gcc, cc) hppa-hpux11.11 (cc) ia64-hpux11.23 (cc) ix86-debian2.2(gcc, icc) ix86-debian3.0 (gcc) ix86-debian3.1 (gcc) ix86-freebsd4.9 (gcc) ix86-freebsd5.2(gcc, icc) ix86-netbsd1.6 (gcc) ix86-qnx6.2 (gcc) ix86-solaris10 (gcc) ix86-unixware7.1.3(cc) mips64-irix6.5 (gcc) sparc64-solaris8 (gcc, forte) sparc64-solaris9 (gcc)" I've put it through a fair amount of testing, including using it within compound indexes, expecting the index to be used for at least '=', constructing many UUIDs quickly, in a sequence, and converting it to and from string form. We chose to implement our own encode / decode routines for performance reasons. With the exception of testing it on a wider range of platforms, I would call the module stable. If there is interest - I'm sure Nathan and I would be willing to put it on pgfoundry, and at some point give it up for inclusion into PostgreSQL. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Thu, Jun 29, 2006 at 02:02:32AM -0400, mark@mark.mielke.cc wrote: > It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and > is based off the OSSP ( http://www.ossp.org/ ) UUID implementation. > I'm not an expert on the license, but it seems acceptable to me: > ... > If there is interest - I'm sure Nathan and I would be willing to put > it on pgfoundry, and at some point give it up for inclusion into > PostgreSQL. This might require a little bit of research. It appears that the development version of OSSP UUID may provide its own PostgreSQL 'bindings'. I may try and contact the author of the OSSP UUID and see whether any changes we have that he does not, can be rolled into his version... Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Jun 28, 2006, at 10:14 AM, mark@mark.mielke.cc wrote: > All the geometric types that I'll never use in core, > with few or no uses, including functions to operate on these types, > and no UUID type... Hehe... To me, that's irony... :-) Interestingly, the superior geometry capability is driving a lot of recent migration from MySQL to PostgreSQL in my own experience, especially with PostGIS. The geometry parts may not get as much love as other parts, but they still get to leverage the very solid foundation they are built on top of. The geometry capability of MySQL is basically checklist in nature, as it lacks the more sophisticated indexing and query execution that is really required to get passable performance from queries with geometry in them. MySQL has similar geometry capability to PostgreSQL in theory if you don't look too closely, but in practice the engine is not up to the more rigorous demands of that kind of work. With the nascent rise of the geospatial web, it is going to become a lot more important than it has been. J. Andrew Rogers jrogers@neopolitan.com
mark@mark.mielke.cc wrote: > On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote: >> On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote: >>> Personally I don't buy the misuse objection - we already have plenty of >>> things that can be misused. As long as there is a reasonable valid use >>> and we can make it portable enough, I think there is a good case for >>> including it. >> Well, since Mark has one, how about we consider adding it in? >> If nothing else, can you please put your stuff on pgFoundry so others >> can find it, Mark? > > It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and > is based off the OSSP ( http://www.ossp.org/ ) UUID implementation. > I'm not an expert on the license, but it seems acceptable to me: > > "Permission to use, copy, modify, and distribute this software for > any purpose with or without fee is hereby granted, provided that > the above copyright notice and this permission notice appear in all > copies." > > I haven't tested to see how portable the OSSP UUID implementation is. > This is their words: > > "OSSP uuid was already written with maximum portability in mind, so > there should be no great effort required to get it running on any Unix > platform with a reasonable POSIX API. Additionally, the portability > was tested by successfully building and running it on the following > particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"): > > alpha-tru644.0 (cc) > alpha-tru645.1 (gcc, cc) > hppa-hpux11.11 (cc) > ia64-hpux11.23 (cc) > ix86-debian2.2 (gcc, icc) > ix86-debian3.0 (gcc) > ix86-debian3.1 (gcc) > ix86-freebsd4.9 (gcc) > ix86-freebsd5.2 (gcc, icc) > ix86-netbsd1.6 (gcc) > ix86-qnx6.2 (gcc) > ix86-solaris10 (gcc) > ix86-unixware7.1.3 (cc) > mips64-irix6.5 (gcc) > sparc64-solaris8 (gcc, forte) > sparc64-solaris9 (gcc)" > > I've put it through a fair amount of testing, including using it > within compound indexes, expecting the index to be used for at > least '=', constructing many UUIDs quickly, in a sequence, and > converting it to and from string form. We chose to implement our > own encode / decode routines for performance reasons. With the > exception of testing it on a wider range of platforms, I would > call the module stable. > > If there is interest - I'm sure Nathan and I would be willing to put > it on pgfoundry, and at some point give it up for inclusion into > PostgreSQL. > One requirement would be that it runs on Windows. Is that something you have tested? Regards, Thomas Hallgren
mark@mark.mielke.cc wrote: > On Thu, Jun 29, 2006 at 02:02:32AM -0400, mark@mark.mielke.cc wrote: >> It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and >> is based off the OSSP ( http://www.ossp.org/ ) UUID implementation. >> I'm not an expert on the license, but it seems acceptable to me: >> ... >> If there is interest - I'm sure Nathan and I would be willing to put >> it on pgfoundry, and at some point give it up for inclusion into >> PostgreSQL. > > This might require a little bit of research. It appears that the > development version of OSSP UUID may provide its own PostgreSQL > 'bindings'. I may try and contact the author of the OSSP UUID and > see whether any changes we have that he does not, can be rolled > into his version... > > Cheers, > mark > I'm thinking ahead on possible objections to inclusion in core. One objection might be that a fully blown UUID implementation is a lot of code. Code that needs to be maintained and it increases the size of the binary etc. A solution to that might be to break the whole thing up in two: 1 The actual type A plain scalar type that stores 16 bytes. It's complete with standard operators for comparison (natural order) and the text representation would be a 32 character hexadecimal string. This type should make no interpretation whatsoever on what it stores and its only association with UUID's is the storage size. 2 UUID utilities Various ways of representing, generating, and extract partial information from UUID's. Should have support for variants #0, #1, and #2 (the OSSP based code sounds like a good candidate). The split make sense since clients often have powerful UUID utilities handy and hence have limited or no use for such utilities in the database (true for all .NET and Java clients). Some PL's will also enable such packages out of the box. The actual type would be extremely generic, lightweight, and easy to implement. No portability issues whatsoever. The only difficulty that I see is naming it :-). Regards, Thomas Hallgren
>> If there is interest - I'm sure Nathan and I would be willing to put >> it on pgfoundry, and at some point give it up for inclusion into >> PostgreSQL. >> > One requirement would be that it runs on Windows. Is that something you > have tested? In case it influences anyone, MySQL 5 already has built-in UUID support: http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#id2899901 Chris
On Thu, Jun 29, 2006 at 09:12:32AM +0200, Thomas Hallgren wrote: > The split make sense since clients often have powerful UUID utilities handy > and hence have limited or no use for such utilities in the database (true > for all .NET and Java clients). Some PL's will also enable such packages > out of the box. I agree about splitting the utilities, except that I think the database should be able to generate UUIDs somehow. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim, > I agree about splitting the utilities, except that I think the database > should be able to generate UUIDs somehow. There is a GUID add-in, and someone is working on a 2nd one. UUIDs are not part of the SQL standard, and we've only seen sporadic demand for them (and different types each time) so I can't imagine one making it further than contrib real soon. Also, one could argue that UUIDs are a foot gun, so they're not exactly the type of thing we want to advocate in advance of demand. --Josh Berkus
Josh Berkus wrote:> Jim,>>> I agree about splitting the utilities, except that I think the database>> should be able to generateUUIDs somehow.>> There is a GUID add-in, and someone is working on a 2nd one. UUIDs are not part of the SQL standard, and we've only seen sporadic demand for them (and different types each time) so I can't imagine one making it further than contrib real soon.>> Also, one could argue that UUIDs are a foot gun, so they're not exactly the type of thing we want to advocate in advance of demand.> Martijn van Oosterhout wrote:> It seems to me that maybe the backend should include a 16-byte fixed> length object (afterall, we've got 1, 2, 4 and 8 bytes already) and> then people can use that to build whatever they like, using domains,>for example...> So how about the split? I.e. just add a 16 byte data type and forget all about UUID's for now. Regards, Thomas Hallgren
On Fri, Jun 30, 2006 at 08:53:28AM +0200, Thomas Hallgren wrote: > Josh Berkus wrote: > >> I agree about splitting the utilities, except that I think the database > >> should be able to generate UUIDs somehow. > > There is a GUID add-in, and someone is working on a 2nd one. UUIDs > > are not part of the SQL standard, and we've only seen sporadic demand > > for them (and different types each time) so I can't imagine one making > > it further than contrib real soon. > > > > Also, one could argue that UUIDs are a foot gun, so they're not > > exactly the type of thing we want to advocate in advance of demand. Josh: Although PostgreSQL is easy to extend - it is still beyond many people to put in the time required to learn how. The demand exists. It's the supply that doesn't. People work around the problem. Perhaps they are using one of the sequence number 'tricks' such as having each site be allocated a range, or modulus. I was willing to learn how to implement a UUID type, and not willing to use one of these sequence number hacks. If you want to call UUID a foot gun - then please call sequence numbers hacks a foot gun as well, to be fair, and then we can start to talk about how a theoretically perfect system should work. If UUID existed in core, people would use it. It would be used, and it would be abused - like most other PostgreSQL features. There would be nothing unique about this. And what's the real danger anyways? People who should be using 4 byte or 8 byte sequences, find that they lose a little performance, and that their databases are larger than they expected? UUID is designed not to collide. So what is the real danger you are speaking about? How bad can they shoot themselves? > Martijn van Oosterhout wrote: > > It seems to me that maybe the backend should include a 16-byte fixed > > length object (after all, we've got 1, 2, 4 and 8 bytes already) and > > then people can use that to build whatever they like, using domains, > > for example... > So how about the split? I.e. just add a 16 byte data type and forget all > about UUID's for now. Martijn: Were you thinking that it would look like a really big integer, displayed by default as a decimal string in the client? This makes sense to me. If it was a full data type - could it be passed around in memory by value, and not as a pointer? Or is 16 bytes too big to pass around by value? Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Fri, Jun 30, 2006 at 04:04:19AM -0400, mark@mark.mielke.cc wrote: > > Martijn van Oosterhout wrote: > > > It seems to me that maybe the backend should include a 16-byte fixed > > > length object (after all, we've got 1, 2, 4 and 8 bytes already) and > > > then people can use that to build whatever they like, using domains, > > > for example... > > So how about the split? I.e. just add a 16 byte data type and forget all > > about UUID's for now. > > Martijn: Were you thinking that it would look like a really big integer, > displayed by default as a decimal string in the client? > > This makes sense to me. Either that, or a hex string. My problem with displaying as integer is that not many clients will be able to parse (or print) a 16-byte integer (the C library doesn't do it), but anyone can write a hex-to-binary converter, or convince scanf/printf to do it for them. > If it was a full data type - could it be passed around in memory by > value, and not as a pointer? Or is 16 bytes too big to pass around by > value? You can't pass it by value (doesn't fit in a register on the CPU and there is no corrosponding C type), and I'm not sure you'd want to. A pointer is much easier and faster to pass around. The other thing I was thinking of is a type generator, like so: # select make_raw_hex_type(16,'uuid'); NOTICE: Created raw hex type 'uuid' of fixed length 16make_raw_hex_type ------------------- (0 rows) # select '1234FF'::uuid; ERROR: Bad length for type 'uuid' # select 'hello world'::uuid; ERROR: Invalid characters for type 'uuid' # select '1234567890abcdef'::uuid; ?column? ------------------1234567890ABCDEF (1 row) Only this could be used to create other types too, for cryptographic functions for example. PostgreSQL doesn't have any type generators yet, so I'm unsure whether a patch creating one would be accepted for core. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Fri, Jun 30, 2006 at 10:38:49AM +0200, Martijn van Oosterhout wrote: > On Fri, Jun 30, 2006 at 04:04:19AM -0400, mark@mark.mielke.cc wrote: > > > > It seems to me that maybe the backend should include a 16-byte fixed > > > > length object (after all, we've got 1, 2, 4 and 8 bytes already) and > > > > then people can use that to build whatever they like, using domains, > > > > for example... > > > So how about the split? I.e. just add a 16 byte data type and forget all > > > about UUID's for now. > > Martijn: Were you thinking that it would look like a really big integer, > > displayed by default as a decimal string in the client? > > This makes sense to me. > Either that, or a hex string. My problem with displaying as integer is > that not many clients will be able to parse (or print) a 16-byte > integer (the C library doesn't do it), but anyone can write a > hex-to-binary converter, or convince scanf/printf to do it for them. No real preference here. I'd be happy to have a native 16-byte type. > > If it was a full data type - could it be passed around in memory by > > value, and not as a pointer? Or is 16 bytes too big to pass around by > > value? > You can't pass it by value (doesn't fit in a register on the CPU and > there is no corrosponding C type), and I'm not sure you'd want to. A > pointer is much easier and faster to pass around. It depends how it is used. If the memory location needs to be allocated, for the value to be used only a few times, the overhead of allocation and redirection can be more expensive. If many though, than the reduction in value copying can make the pointer faster. 64-bytes, and 128-bytes are just on the line of not being clearly one or the other. It was just a thought though. The PostgreSQL API seemed pretty fixed the last time I looked at this stuff. > The other thing I was thinking of is a type generator, like so: > # select make_raw_hex_type(16,'uuid'); > NOTICE: Created raw hex type 'uuid' of fixed length 16 > make_raw_hex_type > ------------------- > (0 rows) > # select '1234FF'::uuid; > ERROR: Bad length for type 'uuid' > # select 'hello world'::uuid; > ERROR: Invalid characters for type 'uuid' > # select '1234567890abcdef'::uuid; > ?column? > ------------------ > 1234567890ABCDEF > (1 row) > Only this could be used to create other types too, for cryptographic > functions for example. PostgreSQL doesn't have any type generators yet, > so I'm unsure whether a patch creating one would be accepted for core. Not sure what I think of this. I suppose the intention would be for it to work for lengths other than 16? I can see people wanting to use such a generalized function for char as well as bytea, for at least latin1 characters... If people agree to a generic 16-byte type, or a hex type with defined fixed length with a set of standard functions and index operators that it should work for, but nobody more qualified wants to make the patch - I'll step up. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Fri, Jun 30, 2006 at 12:39:52PM -0400, mark@mark.mielke.cc wrote: > > Only this could be used to create other types too, for cryptographic > > functions for example. PostgreSQL doesn't have any type generators yet, > > so I'm unsure whether a patch creating one would be accepted for core. > > Not sure what I think of this. I suppose the intention would be for it > to work for lengths other than 16? I can see people wanting to use such > a generalized function for char as well as bytea, for at least latin1 > characters... > > If people agree to a generic 16-byte type, or a hex type with defined > fixed length with a set of standard functions and index operators that > it should work for, but nobody more qualified wants to make the patch > - I'll step up. I think it'd be extremely useful to have a means of defining fixed-length hex types, such as UUIDs and hashes (ie: SHA1). I usually only see people trying to do the same thing for CHAR in poorly-designed systems, but I suspect anyone dealing with legacy stuff might welcome that ability as well. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
mark@mark.mielke.cc writes: > It depends how it is used. If the memory location needs to be > allocated, for the value to be used only a few times, the overhead of > allocation and redirection can be more expensive. If many though, than > the reduction in value copying can make the pointer faster. 64-bytes, > and 128-bytes are just on the line of not being clearly one or the > other. It was just a thought though. The PostgreSQL API seemed pretty > fixed the last time I looked at this stuff. Yeah, changing the definition of Datum is probably out of the question ;-) as it'd break not only most of core but every external module in existence. Anything bigger than a "long" has to be pass-by-reference. The efficiency argument is wrong anyway, as we've optimized the heck out of those code paths; very little actual copying happens unless a new value is getting generated. Pushing pointers around is definitely faster than pushing multiword values around. regards, tom lane
On Fri, Jun 30, 2006 at 12:45:13PM -0500, Jim C. Nasby wrote: > > If people agree to a generic 16-byte type, or a hex type with defined > > fixed length with a set of standard functions and index operators that > > it should work for, but nobody more qualified wants to make the patch > > - I'll step up. > > I think it'd be extremely useful to have a means of defining > fixed-length hex types, such as UUIDs and hashes (ie: SHA1). I usually > only see people trying to do the same thing for CHAR in poorly-designed > systems, but I suspect anyone dealing with legacy stuff might welcome > that ability as well. It would also be possible to provide two functions called hex_raw_in() and hex_raw_out() that people could use like so: CREATE TYPE uuid ( input = hex_raw_in, output = hex_raw_out, INTERNALLENGTH = 16 ); Where these input/output functions would work for any given length, so the 16 could be replaced by any number, or even -1 to make a variable length type... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > It would also be possible to provide two functions called hex_raw_in() > and hex_raw_out() that people could use like so: > CREATE TYPE uuid ( > input = hex_raw_in, > output = hex_raw_out, > INTERNALLENGTH = 16 > ); > Where these input/output functions would work for any given length, so > the 16 could be replaced by any number, or even -1 to make a variable > length type... I believe you could make an input function that would support that, though it would have to do a catalog lookup to find out the desired type length. The output function, however, would be quite a trick. It's not going to receive anything except the Datum itself. regards, tom lane
On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote: > > Where these input/output functions would work for any given length, so > > the 16 could be replaced by any number, or even -1 to make a variable > > length type... > > I believe you could make an input function that would support that, > though it would have to do a catalog lookup to find out the desired > type length. The output function, however, would be quite a trick. > It's not going to receive anything except the Datum itself. Hmm, you're right. With the taggedtypes module I made it work by cloning the output function with a new OID each time and setting the arg type so that procLookupArgType() would work. Similarly, the input function would use procLookupRettype() to find the desired type. So the procedure would be slightly more complicated: CREATE FUNCTION uuid_in(cstring) RETURNS uuid AS 'hex_raw_in' LANGUAGE internal; CREATE FUNCTION uuid_out(uuid) RETURNS cstring AS 'hex_raw_out' LANGUAGE internal; CREATE TYPE uuid ( input = uuid_in, output = uuid_out, internallength = 16 ); The cat lookups are irritating, but that's what syscache is for, I guess :) Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Tom Lane wrote: >The output function, however, would be quite a trick. >It's not going to receive anything except the Datum itself. > > > > I understand the reasons for this limitation of output functions, but I have now seen it bite several times. Maybe we need a little out of the box thinking on this. I have spent a while taxing my meagre brain on it over the last few months, without much success ;-( cheers andrew
On Sat, Jul 01, 2006 at 10:58:05AM -0400, Andrew Dunstan wrote: > Tom Lane wrote: > > >The output function, however, would be quite a trick. > >It's not going to receive anything except the Datum itself. > > I understand the reasons for this limitation of output functions, but I > have now seen it bite several times. Maybe we need a little out of the > box thinking on this. I have spent a while taxing my meagre brain on it > over the last few months, without much success ;-( The thing is, in a lot of other contexts it can work easily because fcinfo->flinfo->fn_expr points the expression node for this function call, which means you can extract the relevent data out of that. This field is simply not filled in for type input/output functions. Something that has been discussed in the past is allowing non-strict type input/output functions to be evaluated at query execution time, rather than during parse time. This would give the type input/output functions the Expr node they need to extract this info. I have no idea how easy/hard this would be. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote: >> The output function, however, would be quite a trick. >> It's not going to receive anything except the Datum itself. > Hmm, you're right. With the taggedtypes module I made it work by > cloning the output function with a new OID each time and setting the > arg type so that procLookupArgType() would work. Similarly, the input > function would use procLookupRettype() to find the desired type. Oh, I see, you relied on flinfo->fn_oid and then did two cat lookups. That would work as long as nothing tried to call the function with DirectFunctionCall ... which is a pretty safe assumption I guess. regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > Something that has been discussed in the past is allowing non-strict > type input/output functions to be evaluated at query execution time, > rather than during parse time. This would give the type input/output > functions the Expr node they need to extract this info. We could make that happen for literals used in queries (see comment in coerce_type()), but it's not appealing to expect all of the ad-hoc I/O function calls in the whole system to supply dummy expression trees. That would be adding overhead to all cases that's only useful in a few. I think requiring the functions that need this info to do extra work is probably the right answer. (It's already possible to cache whatever lookups you have to do, cf array_in or record_in, so the overhead isn't *that* daunting.) regards, tom lane