Thread: Fixed length datatypes. WAS [GENERAL] UUID's as primary keys

Fixed length datatypes. WAS [GENERAL] UUID's as primary keys

From

Thomas Hallgren

Date:

28 June 2006, 13:39:24

Martijn van Oosterhout wrote:
> On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote:
>> A user that is trusted with installing a C-function in the backend is 
>> free to scan the process memory anyway so in what way did that increase 
>> the security? IMHO, the only relevant security in that context is to 
>> have trusted people install trusted modules. I'm surprised that 
>> something like that made you remove significant functionality.
> 
> You're missing the point. The type output function is not generally a
> priveledged function. Think bpcharout, text_out, numeric_out, etc...
> These can be called by users directly and the input to those functions
> cannot be trusted.
> 
Ah, OK that makes sense. An alternative solution when the signature was changed could 
perhaps have been to pass one single argument, a structure appointing the data and its 
associated type. My idea would work if the data and its type lived together always from the 
moment its instantiated (read from disk or otherwise) and until death do them apart (or the 
data is stored on disk, in which case the tupledesc knows what it is). I guess that would 
imply a major rewrite and that my desire to have a RAW fixed length type isn't enough 
motivation to do that :-)

Instead, I would like to humbly request the inclusion of a UUID datatype (or an opaque 128 
bit datatype) in the core package. It's increasingly common and some databases (MS 
SQLServer) already have built in support for it.

Regards,
Thomas Hallgren

Re: Fixed length datatypes. WAS [GENERAL] UUID's as primary keys

From

"Jim C. Nasby"

Date:

28 June 2006, 14:11:09

On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> Instead, I would like to humbly request the inclusion of a UUID datatype 
> (or an opaque 128 bit datatype) in the core package. It's increasingly 

ISTM that we get enough requests for this that it's probably worth
doing.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Fixed length datatypes. WAS [GENERAL] UUID's as primary keys

From

mark@mark.mielke.cc

Date:

28 June 2006, 14:15:10

On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> Instead, I would like to humbly request the inclusion of a UUID datatype 
> (or an opaque 128 bit datatype) in the core package. It's increasingly 
> common and some databases (MS SQLServer) already have built in support for 
> it.

We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the
type has value, that is is generic, and would be widely used without
being abused. All the geometric types that I'll never use in core,
with few or no uses, including functions to operate on these types,
and no UUID type... Hehe... To me, that's irony... :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...

                          http://mark.mielke.cc/

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

"A.M."

Date:

28 June 2006, 14:18:53

On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
> On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
>
>> Instead, I would like to humbly request the inclusion of a UUID
>> datatype (or an opaque 128 bit datatype) in the core package. It's
>> increasingly common and some databases (MS SQLServer) already have built
>> in support for it.
>
> We have it. We're just not putting in the effort required to have it
> included in core, as it's too much effort to convince people that the type
> has value, that is is generic, and would be widely used without being
> abused. All the geometric types that I'll never use in core, with few or
> no uses, including functions to operate on these types, and no UUID
> type... Hehe... To me, that's irony... :-)

Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

"Jim C. Nasby"

Date:

28 June 2006, 14:38:59

On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
> On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
> > On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> >
> >> Instead, I would like to humbly request the inclusion of a UUID
> >> datatype (or an opaque 128 bit datatype) in the core package. It's
> >> increasingly common and some databases (MS SQLServer) already have built
> >> in support for it.
> >
> > We have it. We're just not putting in the effort required to have it
> > included in core, as it's too much effort to convince people that the type
> > has value, that is is generic, and would be widely used without being
> > abused. All the geometric types that I'll never use in core, with few or
> > no uses, including functions to operate on these types, and no UUID
> > type... Hehe... To me, that's irony... :-)
> 
> Is it on pgfoundry? From past discussions, the new criteria for getting
> something into core is to first determine if it is successful on
> pgfoundry.

If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...

Looking back through the list archives I think you'd find this comes up
at least every few months.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

mark@mark.mielke.cc

Date:

28 June 2006, 14:48:20

On Wed, Jun 28, 2006 at 12:38:50PM -0500, Jim C. Nasby wrote:
> On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
> > On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
> > > On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> > >> Instead, I would like to humbly request the inclusion of a UUID
> > >> datatype (or an opaque 128 bit datatype) in the core package. It's
> > >> increasingly common and some databases (MS SQLServer) already have built
> > >> in support for it.
> > > We have it. We're just not putting in the effort required to have it
> > > included in core, as it's too much effort to convince people that the type
> > > has value, that is is generic, and would be widely used without being
> > > abused. All the geometric types that I'll never use in core, with few or
> > > no uses, including functions to operate on these types, and no UUID
> > > type... Hehe... To me, that's irony... :-)
> > Is it on pgfoundry? From past discussions, the new criteria for getting
> > something into core is to first determine if it is successful on
> > pgfoundry.
> If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
> downloaded it. But I find that exceptionally hard to believe...
> 
> Looking back through the list archives I think you'd find this comes up
> at least every few months.

I've downloaded the version off pgfoundry.org. It is broken. It leaks
memory, and if memory is correct it can cause the client to core dump.

Two of us worked on a re-write based off a different UUID system library,
and I've been happily using it in production for a year or so. I don't
believe either of us have bothered to market it. Each time it comes up,
a number of people on this list shut it down, and it doesn't seem worth
the effort to convince them otherwise. They can have their ivory tower,
and I can have my plugin.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Andrew Dunstan

Date:

28 June 2006, 14:50:10

Jim C. Nasby wrote:

>On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
>  
>
>>On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
>>    
>>
>>>On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
>>>
>>>      
>>>
>>>>Instead, I would like to humbly request the inclusion of a UUID
>>>>datatype (or an opaque 128 bit datatype) in the core package. It's
>>>>increasingly common and some databases (MS SQLServer) already have built
>>>>in support for it.
>>>>        
>>>>
>>>We have it. We're just not putting in the effort required to have it
>>>included in core, as it's too much effort to convince people that the type
>>>has value, that is is generic, and would be widely used without being
>>>abused. All the geometric types that I'll never use in core, with few or
>>>no uses, including functions to operate on these types, and no UUID
>>>type... Hehe... To me, that's irony... :-)
>>>      
>>>
>>Is it on pgfoundry? From past discussions, the new criteria for getting
>>something into core is to first determine if it is successful on
>>pgfoundry.
>>    
>>
>
>If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
>downloaded it. But I find that exceptionally hard to believe...
>
>Looking back through the list archives I think you'd find this comes up
>at least every few months.
>  
>

That's because there is nothing there to download. See instead: 
http://gborg.postgresql.org/project/pguuid/projdisplay.php

Personally I don't buy the misuse objection - we already have plenty of 
things that can be misused. As long as there is a reasonable valid use 
and we can make it portable enough, I think there is a good case for 
including it.

cheers

andrew

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Alvaro Herrera

Date:

28 June 2006, 14:53:22

mark@mark.mielke.cc wrote:

> I've downloaded the version off pgfoundry.org. It is broken. It leaks
> memory, and if memory is correct it can cause the client to core dump.

Also it couldn't possibly be included in core, since it's based on a
GPL'ed UUID library.  If you have a more appropiately licensed package,
it could be considered for inclusion.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

"Jim C. Nasby"

Date:

28 June 2006, 15:30:48

On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
> Personally I don't buy the misuse objection - we already have plenty of 
> things that can be misused. As long as there is a reasonable valid use 
> and we can make it portable enough, I think there is a good case for 
> including it.

Well, since Mark has one, how about we consider adding it in?

If nothing else, can you please put your stuff on pgFoundry so others
can find it, Mark?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

mark@mark.mielke.cc

Date:

29 June 2006, 03:03:18

On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote:
> On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
> > Personally I don't buy the misuse objection - we already have plenty of 
> > things that can be misused. As long as there is a reasonable valid use 
> > and we can make it portable enough, I think there is a good case for 
> > including it.
> Well, since Mark has one, how about we consider adding it in?
> If nothing else, can you please put your stuff on pgFoundry so others
> can find it, Mark?

It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and
is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
I'm not an expert on the license, but it seems acceptable to me:
"Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted,
providedthat the above copyright notice and this permission notice appear in all copies."

I haven't tested to see how portable the OSSP UUID implementation is.
This is their words:
"OSSP uuid was already written with maximum portability in mind, so there should be no great effort required to get it
runningon any Unix platform with a reasonable POSIX API. Additionally, the portability was tested by successfully
buildingand running it on the following particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"):

     alpha-tru644.0 (cc)     alpha-tru645.1 (gcc, cc)     hppa-hpux11.11 (cc)     ia64-hpux11.23 (cc)
ix86-debian2.2(gcc, icc)     ix86-debian3.0 (gcc)     ix86-debian3.1 (gcc)     ix86-freebsd4.9 (gcc)
ix86-freebsd5.2(gcc, icc)     ix86-netbsd1.6 (gcc)     ix86-qnx6.2 (gcc)     ix86-solaris10 (gcc)
ix86-unixware7.1.3(cc)     mips64-irix6.5 (gcc)     sparc64-solaris8 (gcc, forte)     sparc64-solaris9 (gcc)"

I've put it through a fair amount of testing, including using it
within compound indexes, expecting the index to be used for at
least '=', constructing many UUIDs quickly, in a sequence, and
converting it to and from string form. We chose to implement our
own encode / decode routines for performance reasons. With the
exception of testing it on a wider range of platforms, I would
call the module stable.

If there is interest - I'm sure Nathan and I would be willing to put
it on pgfoundry, and at some point give it up for inclusion into
PostgreSQL.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...

                          http://mark.mielke.cc/

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

mark@mark.mielke.cc

Date:

29 June 2006, 03:09:29

On Thu, Jun 29, 2006 at 02:02:32AM -0400, mark@mark.mielke.cc wrote:
> It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and
> is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
> I'm not an expert on the license, but it seems acceptable to me:
> ...
> If there is interest - I'm sure Nathan and I would be willing to put
> it on pgfoundry, and at some point give it up for inclusion into
> PostgreSQL.

This might require a little bit of research. It appears that the
development version of OSSP UUID may provide its own PostgreSQL
'bindings'. I may try and contact the author of the OSSP UUID and
see whether any changes we have that he does not, can be rolled
into his version...

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/

Re: Fixed length datatypes.

From

"J. Andrew Rogers"

Date:

29 June 2006, 03:21:21

On Jun 28, 2006, at 10:14 AM, mark@mark.mielke.cc wrote:
> All the geometric types that I'll never use in core,
> with few or no uses, including functions to operate on these types,
> and no UUID type... Hehe... To me, that's irony... :-)

Interestingly, the superior geometry capability is driving a lot of  
recent migration from MySQL to PostgreSQL in my own experience,  
especially with PostGIS.  The geometry parts may not get as much love  
as other parts, but they still get to leverage the very solid  
foundation they are built on top of.  The geometry capability of  
MySQL is basically checklist in nature, as it lacks the more  
sophisticated indexing and query execution that is really required to  
get passable performance from queries with geometry in them.  MySQL  
has similar geometry capability to PostgreSQL in theory if you don't  
look too closely, but in practice the engine is not up to the more  
rigorous demands of that kind of work.

With the nascent rise of the geospatial web, it is going to become a  
lot more important than it has been.

J. Andrew Rogers
jrogers@neopolitan.com

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Thomas Hallgren

Date:

29 June 2006, 04:02:54

mark@mark.mielke.cc wrote:
> On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote:
>> On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
>>> Personally I don't buy the misuse objection - we already have plenty of 
>>> things that can be misused. As long as there is a reasonable valid use 
>>> and we can make it portable enough, I think there is a good case for 
>>> including it.
>> Well, since Mark has one, how about we consider adding it in?
>> If nothing else, can you please put your stuff on pgFoundry so others
>> can find it, Mark?
> 
> It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and
> is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
> I'm not an expert on the license, but it seems acceptable to me:
> 
>  "Permission to use, copy, modify, and distribute this software for
>   any purpose with or without fee is hereby granted, provided that
>   the above copyright notice and this permission notice appear in all
>   copies."
> 
> I haven't tested to see how portable the OSSP UUID implementation is.
> This is their words:
> 
>  "OSSP uuid was already written with maximum portability in mind, so
>   there should be no great effort required to get it running on any Unix
>   platform with a reasonable POSIX API. Additionally, the portability
>   was tested by successfully building and running it on the following
>   particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"):
> 
>       alpha-tru644.0 (cc)
>       alpha-tru645.1 (gcc, cc)
>       hppa-hpux11.11 (cc)
>       ia64-hpux11.23 (cc)
>       ix86-debian2.2 (gcc, icc)
>       ix86-debian3.0 (gcc)
>       ix86-debian3.1 (gcc)
>       ix86-freebsd4.9 (gcc)
>       ix86-freebsd5.2 (gcc, icc)
>       ix86-netbsd1.6 (gcc)
>       ix86-qnx6.2 (gcc)
>       ix86-solaris10 (gcc)
>       ix86-unixware7.1.3 (cc)
>       mips64-irix6.5 (gcc)
>       sparc64-solaris8 (gcc, forte)
>       sparc64-solaris9 (gcc)"
> 
> I've put it through a fair amount of testing, including using it
> within compound indexes, expecting the index to be used for at
> least '=', constructing many UUIDs quickly, in a sequence, and
> converting it to and from string form. We chose to implement our
> own encode / decode routines for performance reasons. With the
> exception of testing it on a wider range of platforms, I would
> call the module stable.
> 
> If there is interest - I'm sure Nathan and I would be willing to put
> it on pgfoundry, and at some point give it up for inclusion into
> PostgreSQL.
> 
One requirement would be that it runs on Windows. Is that something you have tested?

Regards,
Thomas Hallgren

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Thomas Hallgren

Date:

29 June 2006, 04:12:54

mark@mark.mielke.cc wrote:
> On Thu, Jun 29, 2006 at 02:02:32AM -0400, mark@mark.mielke.cc wrote:
>> It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and
>> is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
>> I'm not an expert on the license, but it seems acceptable to me:
>> ...
>> If there is interest - I'm sure Nathan and I would be willing to put
>> it on pgfoundry, and at some point give it up for inclusion into
>> PostgreSQL.
> 
> This might require a little bit of research. It appears that the
> development version of OSSP UUID may provide its own PostgreSQL
> 'bindings'. I may try and contact the author of the OSSP UUID and
> see whether any changes we have that he does not, can be rolled
> into his version...
> 
> Cheers,
> mark
> 
I'm thinking ahead on possible objections to inclusion in core. One objection might be that 
a fully blown UUID implementation is a lot of code. Code that needs to be maintained and it 
increases the size of the binary etc. A solution to that might be to break the whole thing 
up in two:

1 The actual type
A plain scalar type that stores 16 bytes. It's complete with standard operators for 
comparison (natural order) and the text representation would be a 32 character hexadecimal 
string. This type should make no interpretation whatsoever on what it stores and its only 
association with UUID's is the storage size.

2 UUID utilities
Various ways of representing, generating, and extract partial information from UUID's. 
Should have support for variants #0, #1, and #2 (the OSSP based code sounds like a good 
candidate).

The split make sense since clients often have powerful UUID utilities handy and hence have 
limited or no use for such utilities in the database (true for all .NET and Java clients). 
Some PL's will also enable such packages out of the box.

The actual type would be extremely generic, lightweight, and easy to implement. No 
portability issues whatsoever. The only difficulty that I see is naming it :-).

Regards,
Thomas Hallgren

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Christopher Kings-Lynne

Date:

29 June 2006, 04:13:38

>> If there is interest - I'm sure Nathan and I would be willing to put
>> it on pgfoundry, and at some point give it up for inclusion into
>> PostgreSQL.
>>
> One requirement would be that it runs on Windows. Is that something you 
> have tested?

In case it influences anyone, MySQL 5 already has built-in UUID support:

http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#id2899901

Chris

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

"Jim C. Nasby"

Date:

29 June 2006, 13:33:11

On Thu, Jun 29, 2006 at 09:12:32AM +0200, Thomas Hallgren wrote:
> The split make sense since clients often have powerful UUID utilities handy 
> and hence have limited or no use for such utilities in the database (true 
> for all .NET and Java clients). Some PL's will also enable such packages 
> out of the box.

I agree about splitting the utilities, except that I think the database
should be able to generate UUIDs somehow.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Josh Berkus

Date:

30 June 2006, 03:30:20

Jim,

> I agree about splitting the utilities, except that I think the database
> should be able to generate UUIDs somehow.

There is a GUID add-in, and someone is working on a 2nd one.  UUIDs are 
not part of the SQL standard, and we've only seen sporadic demand for 
them (and different types each time) so I can't imagine one making it 
further than contrib real soon.

Also, one could argue that UUIDs are a foot gun, so they're not exactly 
the type of thing we want to advocate in advance of demand.

--Josh Berkus

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Thomas Hallgren

Date:

30 June 2006, 03:53:49

Josh Berkus wrote:> Jim,>>> I agree about splitting the utilities, except that I think the database>> should be able to
generateUUIDs somehow.>> There is a GUID add-in, and someone is working on a 2nd one.  UUIDs 
 
are not part of the SQL standard, and we've only seen sporadic demand 
for them (and different types each time) so I can't imagine one making 
it further than contrib real soon.>> Also, one could argue that UUIDs are a foot gun, so they're not 
exactly the type of thing we want to advocate in advance of demand.>

Martijn van Oosterhout wrote:> It seems to me that maybe the backend should include a 16-byte fixed> length object
(afterall, we've got 1, 2, 4 and 8 bytes already) and> then people can use that to build whatever they like, using
domains,>for example...>
 
So how about the split? I.e. just add a 16 byte data type and forget all 
about UUID's for now.

Regards,
Thomas Hallgren

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

mark@mark.mielke.cc

Date:

30 June 2006, 05:05:07

On Fri, Jun 30, 2006 at 08:53:28AM +0200, Thomas Hallgren wrote:
> Josh Berkus wrote:
> >> I agree about splitting the utilities, except that I think the database
> >> should be able to generate UUIDs somehow.
> > There is a GUID add-in, and someone is working on a 2nd one.  UUIDs 
> > are not part of the SQL standard, and we've only seen sporadic demand 
> > for them (and different types each time) so I can't imagine one making 
> > it further than contrib real soon.
> >
> > Also, one could argue that UUIDs are a foot gun, so they're not 
> > exactly the type of thing we want to advocate in advance of demand.

Josh: Although PostgreSQL is easy to extend - it is still beyond many
people to put in the time required to learn how. The demand exists.
It's the supply that doesn't. People work around the problem.
Perhaps they are using one of the sequence number 'tricks' such as
having each site be allocated a range, or modulus. I was willing to
learn how to implement a UUID type, and not willing to use one of
these sequence number hacks.

If you want to call UUID a foot gun - then please call sequence
numbers hacks a foot gun as well, to be fair, and then we can start
to talk about how a theoretically perfect system should work.

If UUID existed in core, people would use it. It would be used, and it
would be abused - like most other PostgreSQL features. There would be
nothing unique about this. And what's the real danger anyways? People
who should be using 4 byte or 8 byte sequences, find that they lose
a little performance, and that their databases are larger than they
expected? UUID is designed not to collide. So what is the real danger
you are speaking about? How bad can they shoot themselves?

> Martijn van Oosterhout wrote:
> > It seems to me that maybe the backend should include a 16-byte fixed
> > length object (after all, we've got 1, 2, 4 and 8 bytes already) and
> > then people can use that to build whatever they like, using domains,
> > for example...
> So how about the split? I.e. just add a 16 byte data type and forget all 
> about UUID's for now.

Martijn: Were you thinking that it would look like a really big integer,
displayed by default as a decimal string in the client?

This makes sense to me.

If it was a full data type - could it be passed around in memory by
value, and not as a pointer? Or is 16 bytes too big to pass around by
value?

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...

                          http://mark.mielke.cc/

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Martijn van Oosterhout

Date:

30 June 2006, 05:39:47

On Fri, Jun 30, 2006 at 04:04:19AM -0400, mark@mark.mielke.cc wrote:
> > Martijn van Oosterhout wrote:
> > > It seems to me that maybe the backend should include a 16-byte fixed
> > > length object (after all, we've got 1, 2, 4 and 8 bytes already) and
> > > then people can use that to build whatever they like, using domains,
> > > for example...
> > So how about the split? I.e. just add a 16 byte data type and forget all
> > about UUID's for now.
>
> Martijn: Were you thinking that it would look like a really big integer,
> displayed by default as a decimal string in the client?
>
> This makes sense to me.

Either that, or a hex string. My problem with displaying as integer is
that not many clients will be able to parse (or print) a 16-byte
integer (the C library doesn't do it), but anyone can write a
hex-to-binary converter, or convince scanf/printf to do it for them.

> If it was a full data type - could it be passed around in memory by
> value, and not as a pointer? Or is 16 bytes too big to pass around by
> value?

You can't pass it by value (doesn't fit in a register on the CPU and
there is no corrosponding C type), and I'm not sure you'd want to. A
pointer is much easier and faster to pass around.

The other thing I was thinking of is a type generator, like so:

# select make_raw_hex_type(16,'uuid');
NOTICE: Created raw hex type 'uuid' of fixed length 16make_raw_hex_type
-------------------
(0 rows)
# select '1234FF'::uuid;
ERROR: Bad length for type 'uuid'
# select 'hello world'::uuid;
ERROR: Invalid characters for type 'uuid'
# select '1234567890abcdef'::uuid;    ?column?
------------------1234567890ABCDEF
(1 row)

Only this could be used to create other types too, for cryptographic
functions for example. PostgreSQL doesn't have any type generators yet,
so I'm unsure whether a patch creating one would be accepted for core.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

mark@mark.mielke.cc

Date:

30 June 2006, 13:40:44

On Fri, Jun 30, 2006 at 10:38:49AM +0200, Martijn van Oosterhout wrote:
> On Fri, Jun 30, 2006 at 04:04:19AM -0400, mark@mark.mielke.cc wrote:
> > > > It seems to me that maybe the backend should include a 16-byte fixed
> > > > length object (after all, we've got 1, 2, 4 and 8 bytes already) and
> > > > then people can use that to build whatever they like, using domains,
> > > > for example...
> > > So how about the split? I.e. just add a 16 byte data type and forget all 
> > > about UUID's for now.
> > Martijn: Were you thinking that it would look like a really big integer,
> > displayed by default as a decimal string in the client?
> > This makes sense to me.
> Either that, or a hex string. My problem with displaying as integer is
> that not many clients will be able to parse (or print) a 16-byte
> integer (the C library doesn't do it), but anyone can write a
> hex-to-binary converter, or convince scanf/printf to do it for them.

No real preference here. I'd be happy to have a native 16-byte type.

> > If it was a full data type - could it be passed around in memory by
> > value, and not as a pointer? Or is 16 bytes too big to pass around by
> > value?
> You can't pass it by value (doesn't fit in a register on the CPU and
> there is no corrosponding C type), and I'm not sure you'd want to. A
> pointer is much easier and faster to pass around.

It depends how it is used. If the memory location needs to be
allocated, for the value to be used only a few times, the overhead of
allocation and redirection can be more expensive. If many though, than
the reduction in value copying can make the pointer faster. 64-bytes,
and 128-bytes are just on the line of not being clearly one or the
other. It was just a thought though. The PostgreSQL API seemed pretty
fixed the last time I looked at this stuff.

> The other thing I was thinking of is a type generator, like so:
> # select make_raw_hex_type(16,'uuid');
> NOTICE: Created raw hex type 'uuid' of fixed length 16
>  make_raw_hex_type
> -------------------
> (0 rows)
> # select '1234FF'::uuid;
> ERROR: Bad length for type 'uuid'
> # select 'hello world'::uuid;
> ERROR: Invalid characters for type 'uuid'
> # select '1234567890abcdef'::uuid;
>      ?column?
> ------------------
>  1234567890ABCDEF
> (1 row)

> Only this could be used to create other types too, for cryptographic
> functions for example. PostgreSQL doesn't have any type generators yet,
> so I'm unsure whether a patch creating one would be accepted for core.

Not sure what I think of this. I suppose the intention would be for it
to work for lengths other than 16? I can see people wanting to use such
a generalized function for char as well as bytea, for at least latin1
characters...

If people agree to a generic 16-byte type, or a hex type with defined
fixed length with a set of standard functions and index operators that
it should work for, but nobody more qualified wants to make the patch
- I'll step up.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

"Jim C. Nasby"

Date:

30 June 2006, 14:45:23

On Fri, Jun 30, 2006 at 12:39:52PM -0400, mark@mark.mielke.cc wrote:
> > Only this could be used to create other types too, for cryptographic
> > functions for example. PostgreSQL doesn't have any type generators yet,
> > so I'm unsure whether a patch creating one would be accepted for core.
> 
> Not sure what I think of this. I suppose the intention would be for it
> to work for lengths other than 16? I can see people wanting to use such
> a generalized function for char as well as bytea, for at least latin1
> characters...
> 
> If people agree to a generic 16-byte type, or a hex type with defined
> fixed length with a set of standard functions and index operators that
> it should work for, but nobody more qualified wants to make the patch
> - I'll step up.

I think it'd be extremely useful to have a means of defining
fixed-length hex types, such as UUIDs and hashes (ie: SHA1). I usually
only see people trying to do the same thing for CHAR in poorly-designed
systems, but I suspect anyone dealing with legacy stuff might welcome
that ability as well.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Tom Lane

Date:

30 June 2006, 14:59:42

mark@mark.mielke.cc writes:
> It depends how it is used. If the memory location needs to be
> allocated, for the value to be used only a few times, the overhead of
> allocation and redirection can be more expensive. If many though, than
> the reduction in value copying can make the pointer faster. 64-bytes,
> and 128-bytes are just on the line of not being clearly one or the
> other. It was just a thought though. The PostgreSQL API seemed pretty
> fixed the last time I looked at this stuff.

Yeah, changing the definition of Datum is probably out of the question ;-)
as it'd break not only most of core but every external module in existence.
Anything bigger than a "long" has to be pass-by-reference.  The
efficiency argument is wrong anyway, as we've optimized the heck out of
those code paths; very little actual copying happens unless a new value
is getting generated.  Pushing pointers around is definitely faster than
pushing multiword values around.
        regards, tom lane

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Martijn van Oosterhout

Date:

01 July 2006, 11:27:27

On Fri, Jun 30, 2006 at 12:45:13PM -0500, Jim C. Nasby wrote:
> > If people agree to a generic 16-byte type, or a hex type with defined
> > fixed length with a set of standard functions and index operators that
> > it should work for, but nobody more qualified wants to make the patch
> > - I'll step up.
>
> I think it'd be extremely useful to have a means of defining
> fixed-length hex types, such as UUIDs and hashes (ie: SHA1). I usually
> only see people trying to do the same thing for CHAR in poorly-designed
> systems, but I suspect anyone dealing with legacy stuff might welcome
> that ability as well.

It would also be possible to provide two functions called hex_raw_in()
and hex_raw_out() that people could use like so:

CREATE TYPE uuid ( input = hex_raw_in, output = hex_raw_out, INTERNALLENGTH = 16
);

Where these input/output functions would work for any given length, so
the 16 could be replaced by any number, or even -1 to make a variable
length type...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Tom Lane

Date:

01 July 2006, 11:38:15

Martijn van Oosterhout <kleptog@svana.org> writes:
> It would also be possible to provide two functions called hex_raw_in()
> and hex_raw_out() that people could use like so:

> CREATE TYPE uuid (
>   input = hex_raw_in,
>   output = hex_raw_out,
>   INTERNALLENGTH = 16
> );

> Where these input/output functions would work for any given length, so
> the 16 could be replaced by any number, or even -1 to make a variable
> length type...

I believe you could make an input function that would support that,
though it would have to do a catalog lookup to find out the desired
type length.  The output function, however, would be quite a trick.
It's not going to receive anything except the Datum itself.
        regards, tom lane

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Martijn van Oosterhout

Date:

01 July 2006, 11:48:27

On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote:
> > Where these input/output functions would work for any given length, so
> > the 16 could be replaced by any number, or even -1 to make a variable
> > length type...
>
> I believe you could make an input function that would support that,
> though it would have to do a catalog lookup to find out the desired
> type length.  The output function, however, would be quite a trick.
> It's not going to receive anything except the Datum itself.

Hmm, you're right. With the taggedtypes module I made it work by
cloning the output function with a new OID each time and setting the
arg type so that procLookupArgType() would work. Similarly, the input
function would use procLookupRettype() to find the desired type.

So the procedure would be slightly more complicated:

CREATE FUNCTION uuid_in(cstring) RETURNS uuid AS 'hex_raw_in' LANGUAGE internal;
CREATE FUNCTION uuid_out(uuid) RETURNS cstring AS 'hex_raw_out' LANGUAGE internal;
CREATE TYPE uuid ( input = uuid_in, output = uuid_out, internallength = 16
);

The cat lookups are irritating, but that's what syscache is for, I
guess :)

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Andrew Dunstan

Date:

01 July 2006, 11:59:18

Tom Lane wrote:

>The output function, however, would be quite a trick.
>It's not going to receive anything except the Datum itself.
>
>    
>  
>

I understand the reasons for this limitation of output functions, but I 
have now seen it bite several times. Maybe we need a little out of the 
box thinking on this. I have spent a while taxing my meagre brain on it 
over the last few months, without much success ;-(

cheers

andrew

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Martijn van Oosterhout

Date:

01 July 2006, 12:07:11

On Sat, Jul 01, 2006 at 10:58:05AM -0400, Andrew Dunstan wrote:
> Tom Lane wrote:
>
> >The output function, however, would be quite a trick.
> >It's not going to receive anything except the Datum itself.
>
> I understand the reasons for this limitation of output functions, but I
> have now seen it bite several times. Maybe we need a little out of the
> box thinking on this. I have spent a while taxing my meagre brain on it
> over the last few months, without much success ;-(

The thing is, in a lot of other contexts it can work easily because
fcinfo->flinfo->fn_expr points the expression node for this function
call, which means you can extract the relevent data out of that. This
field is simply not filled in for type input/output functions.

Something that has been discussed in the past is allowing non-strict
type input/output functions to be evaluated at query execution time,
rather than during parse time. This would give the type input/output
functions the Expr node they need to extract this info.

I have no idea how easy/hard this would be.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Tom Lane

Date:

01 July 2006, 12:15:43

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote:
>> The output function, however, would be quite a trick.
>> It's not going to receive anything except the Datum itself.

> Hmm, you're right. With the taggedtypes module I made it work by
> cloning the output function with a new OID each time and setting the
> arg type so that procLookupArgType() would work. Similarly, the input
> function would use procLookupRettype() to find the desired type.

Oh, I see, you relied on flinfo->fn_oid and then did two cat lookups.
That would work as long as nothing tried to call the function with
DirectFunctionCall ... which is a pretty safe assumption I guess.
        regards, tom lane

Re: Fixed length datatypes. WAS [GENERAL] UUID's as

From

Tom Lane

Date:

01 July 2006, 12:24:46

Martijn van Oosterhout <kleptog@svana.org> writes:
> Something that has been discussed in the past is allowing non-strict
> type input/output functions to be evaluated at query execution time,
> rather than during parse time. This would give the type input/output
> functions the Expr node they need to extract this info.

We could make that happen for literals used in queries (see comment in
coerce_type()), but it's not appealing to expect all of the ad-hoc I/O
function calls in the whole system to supply dummy expression trees.
That would be adding overhead to all cases that's only useful in a few.
I think requiring the functions that need this info to do extra work
is probably the right answer.  (It's already possible to cache whatever
lookups you have to do, cf array_in or record_in, so the overhead isn't
*that* daunting.)
        regards, tom lane