Thread: uuid type for postgres

uuid type for postgres

From
nathan wagner
Date:
I have been in need of a uuid type and ran across the pguuid download
by Xiongjian (Mike) Wang.  This wasn't really useful to me for two
reasons: first, it is GPLed and I would prefer a more liberal license,
secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a
SIOCGETIFHWADDR (? i think, i can get the exact name if you want it)
ioctl() under darwin.

While I could dike out the code that calls it, that seems like a suboptimal
solution.  So after a bit of poking around the interweb i ran across
Ralf Engelschall's ossp uuid library.  This compiled with minimal
effort on mac os.  Some reading, and an evening later, i've made
a server plugin with supporting SQL that implements an 'ossp_uuid'
type.

Now i have four questions:

1: Is it feasible for this to be included in the contrib section of
the regular download?  The uuid library is a "notice of copyright" style
license, and I am willing to put my own code into the public domain.

2: Would just calling the type 'uuid' be better than 'ossp_uuid'?  It's
certainly a nicer name.

3: Would it be possible to include such a type as a postgres extension
to the usual SQL types.  It seems to me that having an officially
supported type would be better than a user contributed type on the grounds
that you could then rely on it being avaiable if postgres was.
In particular, installing it as an extension would require the cooperation
of the DBA, which may be infeasible in some environments.

-- 
Nathan Wagner


Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
Hey Nathan.

I've started to make heavy use of pguuid. It had several bugs in it
that required fixing before I could use it. I have no preference on
pguuid. It was the only such PostgreSQL project I found that provided
a UUID type.

I'd be willing to work with you on ensuring that such a patch is in an
acceptable form for the core developers to accept, and help maintain
it.

My personal preference is that the type be called 'uuid' and accepted
into the core. Tom? Is their history on this issue?

Should it remain an extension, or can be get it built-in?

I find the UUID concept more useful than the SERIAL concept for data
that must now, or may in the future, be stored on multiple servers.
For example, submitting transactions to two different sites, where the
sites periodically synchronize up with each other, requiring a
relatively safe 'merge'. SERIAL can't do this at all. UUID can do
this as long as the rows are split into domains appropriately,
such as ensuring that unique column constraints do not need to
be enforced across sites.

UUID is also desirable over SERIAL for a unique identifier that will
be accessed from an outside source directly. Exposing the SERIAL
number to the outside isn't appealing for reasons including the
predictability of the sequence, the size of the database is exposed,
the internel implementation is exposed, and so on.

I suggest that UUID be recommended in place of SERIAL for certain
classes of applications, and that it therefore belongs in the core.
UUID and SERIAL can be used together (although, once you have a UUID,
it may not be useful to also have a SERIAL).

Ok, you can all tell me I'm wrong now... :-)

mark


On Tue, Sep 06, 2005 at 01:50:57PM +0000, nathan wagner wrote:
> 
> I have been in need of a uuid type and ran across the pguuid download
> by Xiongjian (Mike) Wang.  This wasn't really useful to me for two
> reasons: first, it is GPLed and I would prefer a more liberal license,
> secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a
> SIOCGETIFHWADDR (? i think, i can get the exact name if you want it)
> ioctl() under darwin.
> 
> While I could dike out the code that calls it, that seems like a suboptimal
> solution.  So after a bit of poking around the interweb i ran across
> Ralf Engelschall's ossp uuid library.  This compiled with minimal
> effort on mac os.  Some reading, and an evening later, i've made
> a server plugin with supporting SQL that implements an 'ossp_uuid'
> type.
> 
> Now i have four questions:
> 
> 1: Is it feasible for this to be included in the contrib section of
> the regular download?  The uuid library is a "notice of copyright" style
> license, and I am willing to put my own code into the public domain.
> 
> 2: Would just calling the type 'uuid' be better than 'ossp_uuid'?  It's
> certainly a nicer name.
> 
> 3: Would it be possible to include such a type as a postgres extension
> to the usual SQL types.  It seems to me that having an officially
> supported type would be better than a user contributed type on the grounds
> that you could then rely on it being avaiable if postgres was.
> In particular, installing it as an extension would require the cooperation
> of the DBA, which may be infeasible in some environments.
> 
> -- 
> Nathan Wagner
> 

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
Tom Lane
Date:
mark@mark.mielke.cc writes:
> My personal preference is that the type be called 'uuid' and accepted
> into the core. Tom? Is their history on this issue?
> Should it remain an extension, or can be get it built-in?

There is pretty much zero chance of being accepted into contrib, much
less core, if the code isn't pure BSD license.  (We used to be a bit
lax about that, but are trying to clean things up.)  Nathan's comment
about starting with code that was sorta-BSD-with-advertising alarmed me.

However, you can certainly set up a pgfoundry project with code of any
license you like.  That would be a good starting point anyway --- at
some point you'd have to convince people that there's enough demand
for the feature to justify putting it in core, and being able to point
to X number of downloads from pgfoundry would go a long way towards
making that case.
        regards, tom lane


Re: uuid type for postgres

From
nathan wagner
Date:
On Tue, Sep 06, 2005 at 11:38:57AM -0400, tgl@sss.pgh.pa.us wrote:
> 
> There is pretty much zero chance of being accepted into contrib, much
> less core, if the code isn't pure BSD license.

Hmm.  Here is the copyright and license portion of the readme...
 COPYRIGHT AND LICENSE
 Copyright (c) 2004-2005 Ralf S. Engelschall <rse@engelschall.com> Copyright (c) 2004-2005 The OSSP Project
<http://www.ossp.org/>
 This file is part of OSSP uuid, a library for the generation of UUIDs which can found at
http://www.ossp.org/pkg/lib/uuid/
 Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted,
providedthat the above copyright notice and this permission notice appear in all copies.
 
 THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIEDWARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS
ANDCOPYRIGHT HOLDERS AND THEIR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIALDAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS;OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
ORTORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITYOF SUCH DAMAGE.
 

Reading the postgres license from the faq...

PostgreSQL Data Base Management System

Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group Portions
Copyright (c) 1994-1996 Regents of the University of California

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement is
hereby granted, provided that the above copyright notice and this paragraph and
the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST
PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF
THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND
THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

These two licenses appear to be functionally identical.  Both require
notice of the copyright to be included, and both allow use for any purpose.
The wording of the requirement is even nearly identical.

By way of disclaimer, I am not (yet) an attorney.  

> (We used to be a bit
> lax about that, but are trying to clean things up.)  Nathan's comment
> about starting with code that was sorta-BSD-with-advertising alarmed me.

Perhaps i was a bit lax in my wording.  I don't read the license
as requiring any advertising at run time, just as a requirement that
the copyright notice be kept with the source code.  That is, an
identical requirement to the one that postgres itself uses.

-- 
Nathan Wagner


Re: uuid type for postgres

From
Josh Berkus
Date:
Mark,

> I suggest that UUID be recommended in place of SERIAL for certain
> classes of applications, and that it therefore belongs in the core.
> UUID and SERIAL can be used together (although, once you have a UUID,
> it may not be useful to also have a SERIAL).

I think that, if you want to push a refactored UUID type for PostgreSQL 
8.2, that you'd better separate your database design arguments from your 
inclusion arguments.   

For example, you might get my agreement that it would be useful to have a 
UUID as a core type; you would *never* get my agreement to recommend using 
UUID to newbies.   I have seen *far* too many abuses of UUIDs in really 
bad database design.   People who use them should be experienced enough to 
know what they're doing.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: uuid type for postgres

From
"Jonah H. Harris"
Date:
I agree with Josh on the UUID type, it gets abused far too often and (IMHO) isn't widely enough used to belong in the core.  Couldn't you just fix the problem in pguuid rather than write a whole new type?

On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote:
Mark,

> I suggest that UUID be recommended in place of SERIAL for certain
> classes of applications, and that it therefore belongs in the core.
> UUID and SERIAL can be used together (although, once you have a UUID,
> it may not be useful to also have a SERIAL).

I think that, if you want to push a refactored UUID type for PostgreSQL
8.2, that you'd better separate your database design arguments from your
inclusion arguments.

For example, you might get my agreement that it would be useful to have a
UUID as a core type; you would *never* get my agreement to recommend using
UUID to newbies.   I have seen *far* too many abuses of UUIDs in really
bad database design.   People who use them should be experienced enough to
know what they're doing.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: uuid type for postgres

From
nathan wagner
Date:
On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:
> I agree with Josh on the UUID type, it gets abused far too often

Out of curiosity, how does it get abused?  It doesn't seem to me
that it would be any more prone to abuse than any other type.

> and (IMHO) isn't widely enough used to belong in the core.

Seems like a self-fulfilling cycle.  There isn't one in core
or contrib (pguuid is not in contrib, IIRC), and so one doesn't
get used.  Since no one uses one, it's not in core or contrib.

> Couldn't you just fix the 
> problem in pguuid rather than write a whole new type?

Not sure which you you're addressing here, but I can't fix the license
problem.  Nor can I readily fix the lack of a mac ioctl() to get
the hardware mac address.

In any case, from an economy of effort view, I'd much rather maintain
a postgres extension/interface to an existing uuid library, than fret
the details of writing a uuid library myself.  UUIDs themselves don't
really interest me, I just happen to need one for my application.

I'm basically done.  I'll tar it up and post a link to an alpha version in a
couple of hours.  As far as i can tell, it works fine.  I'm calling it alpha
because i'm not doing proper error checking, and i'm not sure how to log or
throw an error even if i detect one.  Also, I have only compiled it on
Mac OS 10.3, I'll want to test it on my linux box.

-- 
Nathan Wagner


Re: uuid type for postgres

From
Bob Ippolito
Date:
On Sep 6, 2005, at 12:57 PM, Jonah H. Harris wrote:

On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote:
Mark,

> I suggest that UUID be recommended in place of SERIAL for certain
> classes of applications, and that it therefore belongs in the core.
> UUID and SERIAL can be used together (although, once you have a UUID,
> it may not be useful to also have a SERIAL).

I think that, if you want to push a refactored UUID type for PostgreSQL
8.2, that you'd better separate your database design arguments from your
inclusion arguments.

For example, you might get my agreement that it would be useful to have a
UUID as a core type; you would *never* get my agreement to recommend using
UUID to newbies.   I have seen *far* too many abuses of UUIDs in really
bad database design.   People who use them should be experienced enough to
know what they're doing.

I agree with Josh on the UUID type, it gets abused far too often and (IMHO) isn't widely enough used to belong in the core.  Couldn't you just fix the problem in pguuid rather than write a whole new type?


This sounds like a strawman argument.  People abuse a lot of types to do a lot of things they shouldn't be doing, adding types to the core isn't really going to change that much one way or the other.

If the documentation gives the user a good idea of when to use UUID and when not, I think it would be a good addition.  Worst case, it could become part of contrib, so at least it ships with PostgreSQL with the same license.  The GPL license of pguuid is a show-stopper for some (and seems like a silly choice for what it is and who it's for).

-bob

Re: uuid type for postgres

From
"Jonah H. Harris"
Date:
The only time I've seen someone use UUIDs in PostgreSQL is when they were converting from SQL Server.  I've seen many bad data models using UUID that could've/should've used normal sequences for portability.

I look forward to seeing you're code.

Thanks!


On 9/6/05, nathan wagner <nw@hydaspes.if.org> wrote:
On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:
> I agree with Josh on the UUID type, it gets abused far too often

Out of curiosity, how does it get abused?  It doesn't seem to me
that it would be any more prone to abuse than any other type.

> and (IMHO) isn't widely enough used to belong in the core.

Seems like a self-fulfilling cycle.  There isn't one in core
or contrib (pguuid is not in contrib, IIRC), and so one doesn't
get used.  Since no one uses one, it's not in core or contrib.

> Couldn't you just fix the
> problem in pguuid rather than write a whole new type?

Not sure which you you're addressing here, but I can't fix the license
problem.  Nor can I readily fix the lack of a mac ioctl() to get
the hardware mac address.

In any case, from an economy of effort view, I'd much rather maintain
a postgres extension/interface to an existing uuid library, than fret
the details of writing a uuid library myself.  UUIDs themselves don't
really interest me, I just happen to need one for my application.

I'm basically done.  I'll tar it up and post a link to an alpha version in a
couple of hours.  As far as i can tell, it works fine.  I'm calling it alpha
because i'm not doing proper error checking, and i'm not sure how to log or
throw an error even if i detect one.  Also, I have only compiled it on
Mac OS 10.3, I'll want to test it on my linux box.

--
Nathan Wagner

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
On Tue, Sep 06, 2005 at 03:57:55PM -0400, Jonah H. Harris wrote:
> I agree with Josh on the UUID type, it gets abused far too often and (IMHO) 
> isn't widely enough used to belong in the core.

There is much in PostgreSQL from my perspective that falls under the
category of 'most advanced open source SQL server in the world', and
not at all in the category of 'widely used'. Unless the history of
PostgreSQL isn't to be accepted in terms of direction for PostgreSQL,
I don't see why such a useful building block shouldn't be supported
by the core. It may not be widely used, because it doesn't exist.
Not yet, anyways. pguuid is broken, remember?

Which isn't to say that it should be in the core, or it must be in the
core. It is to say, however, that I don't buy your arguments. Abused how?
How can you judge widely used for something that doesn't exist?

I'm not very newbie oriented. I think I tend to assume that anybody
using PostgreSQL must know what they are doing. This may be very
wrong of me to assume. I assume you all know what you are doing,
for example.

I'm curious as to what sort of abuses you and Josh are speaking about.
I'd particularly like to know if you think I am abusing it, as I don't
believe that I am. I would also be interested in knowing what you felt
would be an abuse with UUID, that wouldn't be an abuse with SERIAL.
Should we tell newbies not to use SERIAL? Perhaps we should. Do we?

> Couldn't you just fix the 
> problem in pguuid rather than write a whole new type?

I'll submit my patches to pguuid once I've had some more production
use of it. For the little bit of code that it was, there were some
pretty bad bugs. It was sometimes crashing, due to what appears to be
a palloc() that was one byte short (it reliably crashed on me, the
line is wrong as written, and after adding + 1, it no longer crashes
at all). The operator definitions weren't correct, and the indices on
columns of the type were not being preferred by the query planner (a
good thing too - if they were used for any operator other than '=',
the errors in the operator definitions would have caused very
unexpected results).

So yes, pguuid can be fixed. I'm not sure that pguuid makes a good
model for a PostgreSQL extension, but it could be resurrected and
used. (I think the project showed no real updates since 2003?)

But - it's GPL, limiting its use with regard to PostgreSQL
distribution, and it requires special compilation, which as Nathan
found out, doesn't work on MacOS.

I think the GPL reason alone is a compelling reason to create a new
extension. Using a more portable UUID base library (either written
from scratch, or re-used from some other place providing a
compatible license) is a compelling reason to create a new
extension.

As to whether it should be in core - I think that a fully functional
module could be widely used, especially in larger systems that are
having difficulty spreading transactions across multiple
machines. This is the domain that UUID shines in. I can write a
transaction at my site without being very worried that it will collide
with a transaction at your site. We don't have to be connected in real
time.

In my choice of use, I'm using them instead of SERIAL columns, as I
wish to have more freedom merging production data with test data.  I
wish to continually import production data into my test environment,
in a single direction. UUID will prevent conflicts from occurring.
SERIAL cannot (although in theory, I could set my copy of the serial
value to 1 billion or something hacky - but that doesn't scale in a
simple fashion to having several test environments). My other
preferred use, is to expose a handle on the data to the world in
select circumstances. I don't want to give them a SERIAL column, as
it lets them be able to predict what else they might have access
to. I'm not using it to secure the data (other routines will do this),
but I am using it to hide the data. Why should people accessing my
system know how many records exist in my tables? Why should they be
able to predict the next value? Why should they be able to relate
the data, or mine my data in the case that I allow read to all?

Yes, I can avoid exposing the UUID/SERIAL in most circumstances. They
can get at most of the data through a name based path. For a few of my
pieces of data, though, I want to provide a secondary means of
accessing the data that does not require a path. Names cause problems,
especially if the names contain UNICODE characters, or if the names
are very long. Giving them a reliably unique handle of fixed length
is highly desirable to me in these circumstances.

Anyways, I'm using a fixed up pguuid right now, and getting along fine.
Nathan, under MacOS, isn't - with or without my patch. And I believe
he stated he had issues with the GPL license.

Josh may be correct, that in terms of a position statement, my
arguments are all over the map, and ignorant of newbies. I'm not a
speaker, or a writer. I'm a technical person who was frustrated with
SERIAL, happy with UUID, frustrated to find pguuid broken, happy that
it was fixable, frustrated with having to compile and install it
separately, happy that there is interest from others (Nathan) in
attempting to have this building block worked on, and perhaps
eventually added to the core.

Personally, I'm not sure what the big opposition to UUID is all about.
Even abused, it can do a better job than SERIAL at quadruple the
storage requirements (not a big deal when considering PostgreSQL's
record overhead). If it'll make it more acceptable, I'll write the
UUID implementation myself, and release it under whatever license
you find most beneficial to PostgreSQL. :-)

Cheers,
mark



> On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote:
> > 
> > Mark,
> > 
> > > I suggest that UUID be recommended in place of SERIAL for certain
> > > classes of applications, and that it therefore belongs in the core.
> > > UUID and SERIAL can be used together (although, once you have a UUID,
> > > it may not be useful to also have a SERIAL).
> > 
> > I think that, if you want to push a refactored UUID type for PostgreSQL
> > 8.2, that you'd better separate your database design arguments from your
> > inclusion arguments.
> > 
> > For example, you might get my agreement that it would be useful to have a
> > UUID as a core type; you would *never* get my agreement to recommend using
> > UUID to newbies. I have seen *far* too many abuses of UUIDs in really
> > bad database design. People who use them should be experienced enough to
> > know what they're doing.
> > 
> > --
> > --Josh
> > 
> > Josh Berkus
> > Aglio Database Solutions
> > San Francisco
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> > choose an index scan if your joining column's datatypes do not
> > match
> >

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
David Fetter
Date:
On Tue, Sep 06, 2005 at 09:16:13PM +0000, nathan wagner wrote:
> On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:
> > I agree with Josh on the UUID type, it gets abused far too often
> 
> Out of curiosity, how does it get abused?  It doesn't seem to me
> that it would be any more prone to abuse than any other type.

A typical abuse, as I have seen it, is for OO coders on the front-end
to turn the entire databse into what amounts to an associative array.
It then becomes somewhere between difficult and impossible to get good
performance, as UUID is the only thing useful as a PK/FK.

> > and (IMHO) isn't widely enough used to belong in the core.
> 
> Seems like a self-fulfilling cycle.  There isn't one in core or
> contrib (pguuid is not in contrib, IIRC), and so one doesn't get
> used.  Since no one uses one, it's not in core or contrib.

In this case, not having it easy to get to is, IMHO, an *excellent*
idea.  The whole thing is a giant foot-gun.

Just my $.02.

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


Re: uuid type for postgres

From
Bob Ippolito
Date:
On Sep 6, 2005, at 2:16 PM, nathan wagner wrote:

> On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com  
> wrote:
>
>> I agree with Josh on the UUID type, it gets abused far too often
>>
>
> Out of curiosity, how does it get abused?  It doesn't seem to me
> that it would be any more prone to abuse than any other type.
>
>
>> and (IMHO) isn't widely enough used to belong in the core.
>>
>
> Seems like a self-fulfilling cycle.  There isn't one in core
> or contrib (pguuid is not in contrib, IIRC), and so one doesn't
> get used.  Since no one uses one, it's not in core or contrib.
>
>
>> Couldn't you just fix the
>> problem in pguuid rather than write a whole new type?
>>
>
> Not sure which you you're addressing here, but I can't fix the license
> problem.  Nor can I readily fix the lack of a mac ioctl() to get
> the hardware mac address.

It's not that hard to get a MAC address or serial number out of a  
Mac, the problem is I'm not going to bother writing that code for a  
GPL extension to PostgreSQL.  There's a better way, anyhow.  Darwin  
ships with UUID parse/generate/etc functions in its Libc (<uuid/ 
uuid.h>).

FWIW, that code (the Theodore Ts'o UUID implementation) looks to be  
suitably licensed for PostgreSQL, and it actually it appears that  
someone has already written a PostgreSQL UUID type using this it...  
which is unfortunately LGPL'ed (why?!).

-bob



Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
On Tue, Sep 06, 2005 at 01:49:28PM -0700, David Fetter wrote:
> On Tue, Sep 06, 2005 at 09:16:13PM +0000, nathan wagner wrote:
> > On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:
> > > I agree with Josh on the UUID type, it gets abused far too often
> > Out of curiosity, how does it get abused?  It doesn't seem to me
> > that it would be any more prone to abuse than any other type.
> A typical abuse, as I have seen it, is for OO coders on the front-end
> to turn the entire databse into what amounts to an associative array.
> It then becomes somewhere between difficult and impossible to get good
> performance, as UUID is the only thing useful as a PK/FK.

Replace "UUID" with "SERIAL" in your above paragraph, and I think your
argument is still true. Perhaps UUID makes it easier, as it crosses
the table boundary - but somebody so inclined, can use *val() to
define their own "database wide SERIAL identifier".

I don't see anything intrinisic about SERIAL, UUID, or OO abstracting
techniques that requires people to write inefficient code. People
write inefficient code because they don't know any better. They don't
properly look at how their data will be used, and what path the data
will take to get from the tables on disk, to the user they are
presenting the information to. If they had, in the case you describe,
perhaps they'd be horrified. Or perhaps they would STILL not know
any better. :-)

Technically, the UUID is 4x the size of a SERIAL, or 2x the size of a
SERIAL8. For this cost in terms of efficiency (table size, index size),
you are purchasing the scaleability of being able to, at any point in
the future, more easily (than SERIAL) merge your tables with other
tables, and have a relatively opaque handle to give out to allow
people to uniquely identify a collection of rows. For me, both are
appealing. For others, one or the other may be more appealing.

In terms of portability, perhaps (somebody else made this
arguments). If other systems don't have a UUID type, you end screwed
if you ever wished to ditch PostgreSQL. The reason I don't buy this
argument, though, is that in my current application, I dumped being
able to ditch PostgreSQL a long time ago. There are just too many neat
things I can do with PostgreSQL, that I can't do with MySQL, and that
I either can't, or don't want to do with Oracle. You're stuck with me
now... :-)

If people truly felt this way, there are many PostgreSQL extensions
that should be phased out, rather than introduced.

I'd like to point out the obvious - a UUID is just an identifier. As
it isn't linked to the table, it doesn't require generation to be
server-side. Any application can generate a UUID as two BIGINTs, and
use these two BIGINTs as a primary key, and accomplish the same.
Before I stumbled upon pguuid, and became interested in the rather
impressive extension interfaces provided by PostgreSQL, I was
considering doing this myself. The costs would have been more
complicated queries, and limited practical use from the psql command
line.

I've become comfortable with the concept of a UUID over the last two
or three years, as they have been a requirement for us to use for a
product at my primary place of employment. We use a source management
system with replicas at each site, that is synchronized periodically.
We write software on top of this system to perform automated tasks at
different sites (sometimes executing at the other sites), or that
requires configuration information to be associated with the data
stored in this source management system. In a previous product, we
used names to access the data. This failed horribly when the names
changed. We now frequently, and liberally use the UUID for the
objects. This allows us to associate data with, access or manipulate
the objects at any site, without ANY problem. To go back to a name
based access method seems a step BACKWARDS.

UUID definately has a place. It's only a question of the imagination
of the user as to how abused, or how well used, it is.

I think it deserves a spot in PostgreSQL, and that people would
possibly use it more than they would SERIAL. Eventually, SERIAL wraps
around. So you switch to SERIAL8. At the point that you have SERIAL8,
you aren't worried terribly about disk space, and you realize there is
usually no benefit at all to the numbers being ordered so closely. Why
not pick a scheme that is based on time? Perhaps create a revision
code field to deal with objects created simultaneously from the same
source. And why not identify the source to prevent collisions from
multiple sources? If we go from 8 bytes, to 16 bytes, we can encode
all of this information neatly. Welcome, UUID. :-)

mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
Greg Stark
Date:
mark@mark.mielke.cc writes:

> In my choice of use, I'm using them instead of SERIAL columns, as I
> wish to have more freedom merging production data with test data.  I
> wish to continually import production data into my test environment,
> in a single direction. UUID will prevent conflicts from occurring.
> SERIAL cannot (although in theory, I could set my copy of the serial
> value to 1 billion or something hacky - but that doesn't scale in a
> simple fashion to having several test environments). 

Just do something like this for every sequence:
 ALTER SEQUENCE foo INCREMENT BY 100

And then choose a particular initial value for each server.

I agree with the others that uuid seems to be overused in lots of cases where
a simple serial would serve just as well.

However I don't see why a good uuid type is any less of a valid addition to
the set of data types than any of the others. inet, macaddr, all the geometric
types, for example. 

Given the 3-phase commit work going into 8.2 (8.1?) it seems like a pretty
fundamental component of that whole ball of wax. A note in the documentation
that it's designed for use as part of a multiple-database system like that
might be helpful.

-- 
greg



Re: uuid type for postgres

From
Tom Lane
Date:
mark@mark.mielke.cc writes:
> Personally, I'm not sure what the big opposition to UUID is all about.

I don't see any "big opposition".  People are simply questioning the
idea whether it belongs in core PG.  The reason we don't want to accept
everything-and-the-kitchen-sink in core is that we have only limited
manpower to maintain it.  So you've got to justify that we should spend
our effort here and not elsewhere.  There's a fair amount of nearly
unmaintained cruft in the core distro already (eg, the never-finished
"line" datatype ... or the entire rtree index module ...) and a datatype
that might be used by only a few people is a likely candidate to become
an unmaintained backwater.  And yet it's hard to get rid of stuff that's
been there awhile.  So one of the questions that's going to be asked is
how useful/popular it's really going to be.

One thing that is raising my own level of concern quite a bit is the
apparent portability issues.  Code that isn't completely portable is a
huge maintainability problem; in particular, stuff that requires
system-dependent behavior used nowhere else in Postgres is a real pain.
It sounds like the UUID code expects to be able to get at the machine's
MAC address, which suggests serious issues in (a) relying on
not-too-standard APIs, (b) possible protection issues (will an
unprivileged process be able to get at the MAC address?), and (c)
ill-defined behavior on machines with more or less than one MAC address.
Not to mention that MAC addresses aren't so unique as all that.

The bottom line is that we're willing to listen, but it's not by any
means a slam dunk to get this into the distribution.
        regards, tom lane


Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
On Tue, Sep 06, 2005 at 05:31:43PM -0400, Greg Stark wrote:
> Just do something like this for every sequence:
>   ALTER SEQUENCE foo INCREMENT BY 100
> And then choose a particular initial value for each server.

*shudder*

But you are right. That would work. :-)  (I shudder from the maintenance
head-ache - if I re-cloned the test database, and reset the sequence
to the wrong value, I would have overlap)

> I agree with the others that uuid seems to be overused in lots of cases where
> a simple serial would serve just as well.

Yes.

In March I only used serial columns (actually, I originally used the
hacky MySQL auto_increment type before I migrated the database to
PostgreSQL). I found problems under actual usage with regard to
testing, and with exposing the numbers to the users. I started by
introducing the uuid along side the serial column, but shortly after
realized that it was ridiculous. I effectively had two primary keys,
with two unique indexes. Drop the serial column, and I'm left with
one that does everything I want.

I now used mixed serial and uuid. I retained serial columns for space
reasons. Serial makes it more common for some of my table primary key
indexes or even tables themselves (enumerated type tables) to fit in
fewer pages, improving search time. Some tables may cluster rows more
naturally on a serial column (would depend if the uuid type was sorted
by embedded time stamp, or by literal byte array value - memcmp()).

For the objects represented by a collection of rows, that I expose
to external interfaces, however, I only use the uuid.

My point in this long and winded set of paragraphs, is that although I
agree that uuid might be overused in some situations, I think people
may be currently underusing it in others. They're both ways of
generating unique identifiers. Depending on the context, one will be
better than the other. They may both work for most circumstances.

> However I don't see why a good uuid type is any less of a valid
> addition to the set of data types than any of the others. inet,
> macaddr, all the geometric types, for example.

I've never used geometric types, and don't see myself doing so any
time soon. :-)

> Given the 3-phase commit work going into 8.2 (8.1?) it seems like a pretty
> fundamental component of that whole ball of wax. A note in the documentation
> that it's designed for use as part of a multiple-database system like that
> might be helpful.

Yep.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
On Tue, Sep 06, 2005 at 05:54:34PM -0400, Tom Lane wrote:
> I don't see any "big opposition".  People are simply questioning the
> idea whether it belongs in core PG.  The reason we don't want to accept
> everything-and-the-kitchen-sink in core is that we have only limited
> manpower to maintain it.  So you've got to justify that we should spend
> our effort here and not elsewhere.  There's a fair amount of nearly
> ...
> been there awhile.  So one of the questions that's going to be asked is
> how useful/popular it's really going to be.

Sounds reasonable, and certainly no more than I expected. If Nathan
hadn't raised the issue, it probably would have been a few months
before I raised it myself.

> One thing that is raising my own level of concern quite a bit is the
> apparent portability issues.  Code that isn't completely portable is a
> huge maintainability problem; in particular, stuff that requires
> system-dependent behavior used nowhere else in Postgres is a real pain.
> It sounds like the UUID code expects to be able to get at the machine's
> MAC address, which suggests serious issues in (a) relying on
> not-too-standard APIs, (b) possible protection issues (will an
> unprivileged process be able to get at the MAC address?), and (c)
> ill-defined behavior on machines with more or less than one MAC address.
> Not to mention that MAC addresses aren't so unique as all that.

I'll try to prepare an answer for this. (I started to write a lot of
information - but is it unverified from memory, and perhaps should be
more authoritative before presented as truth)

> The bottom line is that we're willing to listen, but it's not by any
> means a slam dunk to get this into the distribution.

Sounds good.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
nathan wagner
Date:
As promised a link to the code.

http://granicus.if.org/~nw/ossp_pg_uuid-0.1.tar.gz

You'll also need Ralf Engelschall's uuid library, which mine is a postgres
interface to.  It's available at

ftp://ftp.ossp.org/pkg/lib/uuid/uuid-1.3.0.tar.gz

It probably has a few warts.  I'm mainly posting it for suggestions,
comments, and so we have something that isn't vaporware to argue about.

:)

-- 
Nathan Wagner


Re: uuid type for postgres

From
nathan wagner
Date:
On Tue, Sep 06, 2005 at 05:54:34PM -0400, tgl@sss.pgh.pa.us wrote:
> One thing that is raising my own level of concern quite a bit is the
> apparent portability issues.

I can't speak to the portability in general, but there is a PORTING file
in the ossp uuid library that states

OSSP uuid was already written with maximum portability in mind, so there should be no great effort required to get it
runningon any Unix platform with a reasonable POSIX API. Additionally, the portability was tested by successfully
buildingand running it on the following particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"):
 
     alpha-tru644.0 (cc) alpha-tru645.1 (gcc, cc) hppa-hpux11.11 (cc)
ia64-hpux11.23 (cc) ix86-debian2.2 (gcc, icc) ix86-debian3.0 (gcc)
ix86-debian3.1 (gcc) ix86-freebsd4.9 (gcc) ix86-freebsd5.2 (gcc, icc)
ix86-netbsd1.6 (gcc) ix86-qnx6.2 (gcc) ix86-solaris10 (gcc) ix86-unixware7.1.3
(cc) mips64-irix6.5 (gcc) sparc64-solaris8 (gcc, forte) sparc64-solaris9 (gcc)

On my end I managed to compile it with nothing more than a "configure",
"make", followed by a "make install".

> Code that isn't completely portable is a huge maintainability problem; in
> particular, stuff that requires system-dependent behavior used nowhere
> else in Postgres is a real pain. It sounds like the UUID code expects to
> be able to get at the machine's MAC address,

If the mac address isn't available, I believe it falls back on using
a random 47 bit number with the 48th bit set to make the mac address
fall within the multicast mac numberspace.  You could also use a version
4 uuid, or derive a version 3 or 5 uuid from some other source.

The better answer though, is these sort of questions are exactly why
I would prefer to rely on someone else's library.  Just as I basically
trust that the folks maintaining postgres aren't going to munge my tables
and destroy my data if i mess up a transaction and roll it back,
because they've spent time thinking about just that sort of problem, I
also (having worked with the code a bit now) trust the UUID folks
to have thought about "just how do we make a unique number without
centralized coordination?"  "base on the mac address?"  "what if we
don't have one?  or don't know it for some reason?"  I assume here that
the answer they came up with wasn't "oh, hell, just return a 42 then".

The point being, that other people have already written a better uuid
library than i am likely to, so, license permitting, let's use it.

> The bottom line is that we're willing to listen, but it's not by any
> means a slam dunk to get this into the distribution.

Fair enough.

Personally, I think it should be a core type, but would be quite happy
if it were in contrib.  At least that way it would save the next guy
from having to hunt around the net.

I guess i'm volunteering to maintain it in contrib.  I'm not certain
if i have the requisite knowledge to maintain it in the core.  While
I could acquire the familiarity if need be, for the next year and nine
months law school is going to take up the bulk of my free time.  And
of course I'll still need time to play around with my ticketing and gis
databases i'm developing.

-- 
Nathan Wagner


Re: uuid type for postgres

From
Bob Ippolito
Date:
On Sep 6, 2005, at 3:06 PM, mark@mark.mielke.cc wrote:

> On Tue, Sep 06, 2005 at 05:54:34PM -0400, Tom Lane wrote:
>
>> I don't see any "big opposition".  People are simply questioning the
>> idea whether it belongs in core PG.  The reason we don't want to  
>> accept
>> everything-and-the-kitchen-sink in core is that we have only limited
>> manpower to maintain it.  So you've got to justify that we should  
>> spend
>> our effort here and not elsewhere.  There's a fair amount of nearly
>> ...
>> been there awhile.  So one of the questions that's going to be  
>> asked is
>> how useful/popular it's really going to be.
>>
>
> Sounds reasonable, and certainly no more than I expected. If Nathan
> hadn't raised the issue, it probably would have been a few months
> before I raised it myself.
>
>
>> One thing that is raising my own level of concern quite a bit is the
>> apparent portability issues.  Code that isn't completely portable  
>> is a
>> huge maintainability problem; in particular, stuff that requires
>> system-dependent behavior used nowhere else in Postgres is a real  
>> pain.
>> It sounds like the UUID code expects to be able to get at the  
>> machine's
>> MAC address, which suggests serious issues in (a) relying on
>> not-too-standard APIs, (b) possible protection issues (will an
>> unprivileged process be able to get at the MAC address?), and (c)
>> ill-defined behavior on machines with more or less than one MAC  
>> address.
>> Not to mention that MAC addresses aren't so unique as all that.
>>
>
> I'll try to prepare an answer for this. (I started to write a lot of
> information - but is it unverified from memory, and perhaps should be
> more authoritative before presented as truth)

Some modern UUID implementations prefer /dev/urandom or similar to  
the time or MAC address unless you really beg them to give you a  
weaker UUID.

You can take a look at the man page for the Theodore Y. Ts'o  
implementation that is in Darwin's Libc here:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/ 
man3/uuid_generate.3.html

Specifically:
       The uuid_generate function creates a new universally unique   
identifier       (UUID).   The  uuid  will be generated based on high-quality  
randomness       from  /dev/urandom,  if  available.   If  it  is  not   
available,  then       uuid_generate  will use an alternative algorithm which uses  
the current       time, the local ethernet MAC address (if available),  and   
random  data       generated using a pseudo-random generator.

The Apache Portable Runtime has a apr_os_uuid_get() that supports two  
flavors of UUID for unix (Linux/Mac OS X uuid_generate and FreeBSD's  
uuid_create, may be available elsewhere), and the UuidCreate API on  
Win32.  apr-util's apr_uuid_get() will use apr_os_uuid_get() if  
available, and otherwise will default to a relatively weak mostly- 
timestamp-based UUID.

It would probably be reasonable and easy to do what Apache does  
here.  A platform UUID implementation, if present, is generally going  
to be better than anything included into PostgreSQL itself.

-bob



Re: uuid type for postgres

From
"Jonah H. Harris"
Date:
Tom, you worded my thoughts much better than I did.

Bob, I too had heard that host-based UUIDs/GUIDs had issues with uniqueness.  I think Microsoft's implementation was hosed and they ended up eliminating using the MAC completely.

I'll check out the code & get back.


On 9/6/05, Bob Ippolito <bob@redivi.com> wrote:

On Sep 6, 2005, at 3:06 PM, mark@mark.mielke.cc wrote:

> On Tue, Sep 06, 2005 at 05:54:34PM -0400, Tom Lane wrote:
>
>> I don't see any "big opposition".  People are simply questioning the
>> idea whether it belongs in core PG.  The reason we don't want to
>> accept
>> everything-and-the-kitchen-sink in core is that we have only limited
>> manpower to maintain it.  So you've got to justify that we should
>> spend
>> our effort here and not elsewhere.  There's a fair amount of nearly
>> ...
>> been there awhile.  So one of the questions that's going to be
>> asked is
>> how useful/popular it's really going to be.
>>
>
> Sounds reasonable, and certainly no more than I expected. If Nathan
> hadn't raised the issue, it probably would have been a few months
> before I raised it myself.
>
>
>> One thing that is raising my own level of concern quite a bit is the
>> apparent portability issues.  Code that isn't completely portable
>> is a
>> huge maintainability problem; in particular, stuff that requires
>> system-dependent behavior used nowhere else in Postgres is a real
>> pain.
>> It sounds like the UUID code expects to be able to get at the
>> machine's
>> MAC address, which suggests serious issues in (a) relying on
>> not-too-standard APIs, (b) possible protection issues (will an
>> unprivileged process be able to get at the MAC address?), and (c)
>> ill-defined behavior on machines with more or less than one MAC
>> address.
>> Not to mention that MAC addresses aren't so unique as all that.
>>
>
> I'll try to prepare an answer for this. (I started to write a lot of
> information - but is it unverified from memory, and perhaps should be
> more authoritative before presented as truth)

Some modern UUID implementations prefer /dev/urandom or similar to
the time or MAC address unless you really beg them to give you a
weaker UUID.

You can take a look at the man page for the Theodore Y. Ts'o
implementation that is in Darwin's Libc here:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/
man3/uuid_generate.3.html

Specifically:

        The uuid_generate function creates a new universally unique
identifier
        (UUID).   The  uuid  will be generated based on high-quality
randomness
        from  /dev/urandom,  if  available.   If  it  is  not
available,  then
        uuid_generate  will use an alternative algorithm which uses
the current
        time, the local ethernet MAC address (if available),  and
random  data
        generated using a pseudo-random generator.

The Apache Portable Runtime has a apr_os_uuid_get() that supports two
flavors of UUID for unix (Linux/Mac OS X uuid_generate and FreeBSD's
uuid_create, may be available elsewhere), and the UuidCreate API on
Win32.  apr-util's apr_uuid_get() will use apr_os_uuid_get() if
available, and otherwise will default to a relatively weak mostly-
timestamp-based UUID.

It would probably be reasonable and easy to do what Apache does
here.  A platform UUID implementation, if present, is generally going
to be better than anything included into PostgreSQL itself.

-bob


Re: uuid type for postgres

From
Greg Stark
Date:
mark@mark.mielke.cc writes:

> Eventually, SERIAL wraps around. So you switch to SERIAL8. At the point that
> you have SERIAL8, you aren't worried terribly about disk space, and you
> realize there is usually no benefit at all to the numbers being ordered so
> closely.

a) Except for trivially small applications you are _always_ worried terribly  about disk space. The more money you
spendon high end raid arrays the  *more* you're worried about an incremental drain on performance.
 

b) You only have to go to SERIAL8 for the few tables that actually have that  many records. You may well still have
SERIALdata types in 99% of your  tables.
 

c) You underestimate the cost of the added space. Don't forget it's not just  an extra few bytes in the primary key.
It'salso quadrupling the size of  your primary key index (doubling over bigint).
 
  Most importantly it's also adding a few extra bytes to every foreign key  column in every table that references that
primarykey. For very relational  databases with tables doing things like many-to-many joins or having 4+  foreign key
referencingcolumns increasing all those integers to be 16  bytes increases the size of your database *immensely*.
 

> Why not pick a scheme that is based on time? Perhaps create a revision code
> field to deal with objects created simultaneously from the same source. And
> why not identify the source to prevent collisions from multiple sources? If
> we go from 8 bytes, to 16 bytes, we can encode all of this information
> neatly. Welcome, UUID. :-)

Welcome to exactly the abuse that people are fearing if it were included as a
built-in type.

Personally I don't think the argument that some people might abuse it is a
good reason not to provide it. There are uses for which it's very effective --
even necessary. And I think it's important enough for the people that need it
that it should be considered a fundamental database feature these days. 

The people who will abuse it (like yourself, imho) will always exist and the
more powerful the tool the bigger the holes in their feet. Shouldn't stop us
from having powerful tools when we need them.

-- 
greg



Re: uuid type for postgres

From
Josh Berkus
Date:
Bob, People,

Let me clarify my stance here, because it seems to be getting
misrepresented.

Mark (and Nathan) pushed at repaired UUID type for possible inclusion in
the core PostgreSQL distribution.  I'm not opposed to that, provided that
the portability, licensing, and bugs are worked out.  Why not?  We have
ipv6 data types, after all.

However, Mark went on to suggest that we should recommend UUID over SERIAL
in the docs, and that we could consider dropping SERIAL entirely in favor
of UUID:

---quoth Mark------------------
I suggest that UUID be recommended in place of SERIAL for certain
classes of applications, and that it therefore belongs in the core.
UUID and SERIAL can be used together (although, once you have a  
UUID, it may not be useful to also have a SERIAL).
---------------------------------

This was what I objected to; I believe that the use-case for UUIDs is
actually quite narrow and assert that it's a very bad idea to promote them
to most users.

I have a "problem" with SERIAL abuse, too.   In general, new DB designers
have come to increasingly believe that surrogate keys (SERIALs, UUIDs,
hash ids etc.) are an intrinsic part of the relational model and a
requirement for all tables.   Terrible database designs have resulted,
chock full of tables which lack real keys and cannot be normalized.

UUIDs tend to encourage this sort of behavior even more than SERIALs, not
because of any intrinsic quality in the data type, but because much of the
literature on the subject treats them like some kind of "universal object
identifier" and not distinguishing servers, relations, or real keys.

To repeat, though, this isn't a reason to keep them out of core, but it
*is* a reason not to throw them at newbies as the holy grail of row
identifiers.

For my part, I generally push implementing the UUID concept in a better way
that keeps server, table, and surrogate keys atomic (and thus more useful
and easier to debug).

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: uuid type for postgres

From
Bob Ippolito
Date:
On Sep 6, 2005, at 6:02 PM, Josh Berkus wrote:

> Bob, People,
>
> Let me clarify my stance here, because it seems to be getting
> misrepresented.
>
> Mark (and Nathan) pushed at repaired UUID type for possible  
> inclusion in
> the core PostgreSQL distribution.  I'm not opposed to that,  
> provided that
> the portability, licensing, and bugs are worked out.  Why not?  We  
> have
> ipv6 data types, after all.
>
> However, Mark went on to suggest that we should recommend UUID over  
> SERIAL
> in the docs, and that we could consider dropping SERIAL entirely in  
> favor
> of UUID:
>
> ---quoth Mark------------------
> I suggest that UUID be recommended in place of SERIAL for certain
> classes of applications, and that it therefore belongs in the core.
> UUID and SERIAL can be used together (although, once you have a
> UUID, it may not be useful to also have a SERIAL).
> ---------------------------------
>
> This was what I objected to; I believe that the use-case for UUIDs is
> actually quite narrow and assert that it's a very bad idea to  
> promote them
> to most users.

I agree with you (Josh) completely, which is why I said:
"If the documentation gives the user a good idea of when to use UUID  
and when not, I think it would be a good addition."

.. the fact that the use-cases are narrow was implicit :)

Everything else I talked about was just implementation details.   
Summary: there are (several) UUID implementations out there that are  
appropriately licensed and easy enough to use, and a lot of OSes ship  
with pretty good implementations already.  Creating a decent UUID  
type should be relatively trivial, as far as those things go.

-bob



Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
On Tue, Sep 06, 2005 at 06:02:50PM -0700, Josh Berkus wrote:
> However, Mark went on to suggest that we should recommend UUID over SERIAL 
> in the docs, and that we could consider dropping SERIAL entirely in favor 
> of UUID:
> ---quoth Mark------------------
> I suggest that UUID be recommended in place of SERIAL for certain
> classes of applications, and that it therefore belongs in the core.
> UUID and SERIAL can be used together (although, once you have a  
> UUID, it may not be useful to also have a SERIAL).
> ---------------------------------
> This was what I objected to; I believe that the use-case for UUIDs is 
> actually quite narrow and assert that it's a very bad idea to promote them 
> to most users.   

Ahhh... :-)

I intended the word 'certain' to be emphasized in some way, rather
than dropped. There are genuine uses for UUIDs. I didn't intend for
everybody to pull out their database definitions and change them all to
use UUID instead of SERIAL.

Although - I don't think really bad things would happen if people did.
They would simply be making a non-optimal choice (abusing the type?).
Certainly nothing they weren't capable of before this particular
capability were to arrive. :-)

> I have a "problem" with SERIAL abuse, too.   In general, new DB designers 
> have come to increasingly believe that surrogate keys (SERIALs, UUIDs, 
> hash ids etc.) are an intrinsic part of the relational model and a 
> requirement for all tables.   Terrible database designs have resulted, 
> chock full of tables which lack real keys and cannot be normalized.

> UUIDs tend to encourage this sort of behavior even more than SERIALs, not 
> because of any intrinsic quality in the data type, but because much of the 
> literature on the subject treats them like some kind of "universal object 
> identifier" and not distinguishing servers, relations, or real keys.  

> To repeat, though, this isn't a reason to keep them out of core, but it 
> *is* a reason not to throw them at newbies as the holy grail of row 
> identifiers.

I agree. Although I lost it on the "cannot be normalized". I'm assuming
there are designs you have seen much worse than the ones I have seen. :-)

> For my part, I generally push implementing the UUID concept in a better way 
> that keeps server, table, and surrogate keys atomic (and thus more useful 
> and easier to debug).

My eyes are glazing over a bit on this last one. Atomic?

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
Josh Berkus
Date:
Mark,

> I agree. Although I lost it on the "cannot be normalized". I'm assuming
> there are designs you have seen much worse than the ones I have seen.
> :-)

Mostly it's the problem of tables that don't have a real key, only a 
surrogate key.  How do you know what's a duplicate?

> > For my part, I generally push implementing the UUID concept in a
> > better way that keeps server, table, and surrogate keys atomic (and
> > thus more useful and easier to debug).
>
> My eyes are glazing over a bit on this last one. Atomic?

Sure.  What's a UUID, after all?   It's three pieces of information:
1) A server or database instance identifier
2) A table identifier
3) A row identifier

e.g.:
chayote.sf.agliodbs.com | public.customers | 4271

Generally, it's pretty easy to just grab these 3 pieces of information 
separately and pass them with the data when you want to transmit between 
servers.  If there are space/overhead considerations, you can put them 
into a reversable hash.  I've never understood the complex measures which 
application developers take to create "universal" IDs.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: uuid type for postgres

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Josh Berkus
> Sent: Tuesday, September 06, 2005 6:40 PM
> To: mark@mark.mielke.cc
> Cc: Bob Ippolito; jonah.harris@gmail.com;
pgsql-hackers@postgresql.org;
> nathan wagner
> Subject: Re: [HACKERS] uuid type for postgres
>
> Mark,
>
> > I agree. Although I lost it on the "cannot be normalized". I'm
assuming
> > there are designs you have seen much worse than the ones I have
seen.
> > :-)
>
> Mostly it's the problem of tables that don't have a real key, only a
> surrogate key.  How do you know what's a duplicate?
>
> > > For my part, I generally push implementing the UUID concept in a
> > > better way that keeps server, table, and surrogate keys atomic
(and
> > > thus more useful and easier to debug).
> >
> > My eyes are glazing over a bit on this last one. Atomic?
>
> Sure.  What's a UUID, after all?   It's three pieces of information:
> 1) A server or database instance identifier
> 2) A table identifier
> 3) A row identifier
>
> e.g.:
> chayote.sf.agliodbs.com | public.customers | 4271
>
> Generally, it's pretty easy to just grab these 3 pieces of information
> separately and pass them with the data when you want to transmit
between
> servers.  If there are space/overhead considerations, you can put them
> into a reversable hash.  I've never understood the complex measures
which
> application developers take to create "universal" IDs.

What if you don't want to move anything from one system to another, but
you would like to do joins across many servers on a network?

It happens all the time. If you have a built in primary key for related
tables and they are guaranteed unique, then you have a leg up.

It is not unusual for the same schema to be used in many different
regional offices, but to have site specific data in each area.


Re: uuid type for postgres

From
nathan wagner
Date:
On Tue, Sep 06, 2005 at 06:40:27PM -0700, josh@agliodbs.com wrote:
> Sure.  What's a UUID, after all?   It's three pieces of information:
> 1) A server or database instance identifier
> 2) A table identifier
> 3) A row identifier
> 
> e.g.:
> chayote.sf.agliodbs.com | public.customers | 4271
> 
> Generally, it's pretty easy to just grab these 3 pieces of information 
> separately and pass them with the data when you want to transmit between 
> servers.  If there are space/overhead considerations, you can put them 
> into a reversable hash.

Take a look at the version 3 or version 5 UUIDs.  They essentially do this.
The hash isn't reversable, but rather recreatable.  Seems that if it
were reversable, it would be compression, not a hash.  Anyway.

You're assuming though that you want to leak this information.  If you do,
or you don't care, go ahead with the v3 or v5 uuid creator.

Or use v1 if you want mac/time based uuids, or v4 for random uuids.

The convenient thing about using the uuid library is that all of these
mechanisms result in the same type, and are thus compatible.

> I've never understood the complex measures which 
> application developers take to create "universal" IDs.

Different applications have different goals.  Unguessability might
be important in some contexts.

-- 
Nathan Wagner


Re: uuid type for postgres

From
Paul Ramsey
Date:
Just an FYI:

We also ended up rolling our own uuid type, against libuuid.  It  
seems that uuid is a widespread enough concept that implementors  
*will* be asked to support it, moderately often.  We *could* roll our  
own (were capable), others are not so lucky, and would have to point  
out the lack of a uuid type as a limitation of pgsql.  Which is too  
bad, given how relatively simple they are.

That said:

- linking against libuuid is fine for a contrib/ extension, but no  
good for a built-in type.  A real uuid would have to do a proper  
independent implementation of uuid creation within pgsql.
- we cannot snarf libuuid code, it is LGPL (though perhaps the author  
would re-license. if that is the *only* objection, it is well worth  
asking)

I think having a built-in uuid type is something that a large number  
of people will use.  Whether they use it will or badly is not our  
problem.  It is possible to build crappy databases with all the types  
that already exist, adding uuid is hardly going to bring the walls  
down.  Having uuid removes another excuse for people not doing a  
pgsql implementation.

I am not sure if I heard clearly from the core team that a self- 
contained, BSD-licensed uuid would be accepted(able)?  If so, I'll  
contact the libuuid author about a re-license (shortest path from A  
to B).

P.

On 6-Sep-05, at 6:50 AM, nathan wagner wrote:

>
>
> I have been in need of a uuid type and ran across the pguuid download
> by Xiongjian (Mike) Wang.  This wasn't really useful to me for two
> reasons: first, it is GPLed and I would prefer a more liberal license,
> secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a
> SIOCGETIFHWADDR (? i think, i can get the exact name if you want it)
> ioctl() under darwin.
>
> While I could dike out the code that calls it, that seems like a  
> suboptimal
> solution.  So after a bit of poking around the interweb i ran across
> Ralf Engelschall's ossp uuid library.  This compiled with minimal
> effort on mac os.  Some reading, and an evening later, i've made
> a server plugin with supporting SQL that implements an 'ossp_uuid'
> type.
>
> Now i have four questions:
>
> 1: Is it feasible for this to be included in the contrib section of
> the regular download?  The uuid library is a "notice of copyright"  
> style
> license, and I am willing to put my own code into the public domain.
>
> 2: Would just calling the type 'uuid' be better than 'ossp_uuid'?   
> It's
> certainly a nicer name.
>
> 3: Would it be possible to include such a type as a postgres extension
> to the usual SQL types.  It seems to me that having an officially
> supported type would be better than a user contributed type on the  
> grounds
> that you could then rely on it being avaiable if postgres was.
> In particular, installing it as an extension would require the  
> cooperation
> of the DBA, which may be infeasible in some environments.
>
> -- 
> Nathan Wagner
>
>
>



Re: uuid type for postgres

From
Tom Lane
Date:
Paul Ramsey <pramsey@refractions.net> writes:
> I am not sure if I heard clearly from the core team that a self- 
> contained, BSD-licensed uuid would be accepted(able)?

I don't think any of the other core members weighed in on this thread,
so speaking strictly for myself: BSD license is an issue, and
portability is an issue.  I've heard enough to convince me that the
demand exists, but the legal and technical issues remain.
        regards, tom lane


Re: uuid type for postgres

From
nathan wagner
Date:
On Tue, Sep 06, 2005 at 08:40:08PM -0700, pramsey@refractions.net wrote:
> - linking against libuuid is fine for a contrib/ extension, but no  
> good for a built-in type.  A real uuid would have to do a proper  
> independent implementation of uuid creation within pgsql.

Why?

I'm not sure what the default build instructions are, but i've got
the following on my linux box...

granicus:~% ldd /usr/bin/postgres       linux-gate.so.1 =>  (0x00368000)       libpam.so.0 => /lib/libpam.so.0
(0x00311000)      libssl.so.5 => /lib/libssl.so.5 (0x009b3000)       libcrypto.so.5 => /lib/libcrypto.so.5 (0x00507000)
     libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x003ab000)       libz.so.1 => /usr/lib/libz.so.1 (0x00c55000)
libreadline.so.5=> /usr/lib/libreadline.so.5 (0x00c6a000)       libtermcap.so.2 => /lib/libtermcap.so.2 (0x00dfa000)
  libcrypt.so.1 => /lib/libcrypt.so.1 (0x009f7000)       libresolv.so.2 => /lib/libresolv.so.2 (0x0014e000)
libnsl.so.1=> /lib/libnsl.so.1 (0x00a62000)       libdl.so.2 => /lib/libdl.so.2 (0x00c4f000)       libm.so.6 =>
/lib/libm.so.6(0x00c29000)       libc.so.6 => /lib/libc.so.6 (0x00afd000)       libcom_err.so.2 => /lib/libcom_err.so.2
(0x0038e000)      libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x0099a000)       libk5crypto.so.3 =>
/usr/lib/libk5crypto.so.3(0x00483000)       /lib/ld-linux.so.2 (0x00adf000)       libkrb5support.so.0 =>
/usr/lib/libkrb5support.so.0(0x0041f000)
 

Quite a list.  I wonder what readline is doing there.

It would appear that linking against libraries is just fine, I don't
see that uuid is any different.

> - we cannot snarf libuuid code, it is LGPL (though perhaps the author  
> would re-license. if that is the *only* objection, it is well worth  
> asking)

Not sure what uuid library you were using, but the one i used
is not LGPL.  I posted earlier the copyright and license portion of
the readme.  No relicensing would be necessary.

-- 
Nathan Wagner


Re: uuid type for postgres

From
Roman Neuhauser
Date:
# tgl@sss.pgh.pa.us / 2005-09-06 17:54:34 -0400:
> There's a fair amount of nearly
> unmaintained cruft in the core distro already (eg, the never-finished
> "line" datatype ... or the entire rtree index module ...) and a datatype
> that might be used by only a few people is a likely candidate to become
> an unmaintained backwater.  And yet it's hard to get rid of stuff that's
> been there awhile.  So one of the questions that's going to be asked is
> how useful/popular it's really going to be.   We'd have use for uuid in tables of N*10^6 rows (N<10 in most cases).
   I'm far from claiming to be an experienced C programmer, but count   me in for whatever I'll be able to do.
   I think that coming up with code that meets the general criteria for   inclusion in PostgreSQL first, before it's
consideredfor inclusion,   is a reasonable thing to do.
 

> One thing that is raising my own level of concern quite a bit is the
> apparent portability issues.
   That's understood.

-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE.             http://bash.org/?255991


Re: uuid type for postgres

From
Josh Berkus
Date:
Nathan,

> Take a look at the version 3 or version 5 UUIDs.  They essentially do this.
> The hash isn't reversable, but rather recreatable.  Seems that if it
> were reversable, it would be compression, not a hash.  Anyway.

Hmmm, yes, true.

> You're assuming though that you want to leak this information.  If you do,
> or you don't care, go ahead with the v3 or v5 uuid creator.
>
> Or use v1 if you want mac/time based uuids, or v4 for random uuids.
>
> The convenient thing about using the uuid library is that all of these
> mechanisms result in the same type, and are thus compatible.

That's good, it gives users options.  And I can see why you don't want to 
re-create the functionality in PG code, it's probably pretty large.

> > I've never understood the complex measures which
> > application developers take to create "universal" IDs.
>
> Different applications have different goals.  Unguessability might
> be important in some contexts.

Well, IMHO, if the unguessability of UUIDs is an important part of your spec, 
you have some design problems.   But it's not my job to lock up the 
foot-guns, just to make sure they come with warnings ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: uuid type for postgres

From
Josh Berkus
Date:
Nathan,

> > - linking against libuuid is fine for a contrib/ extension, but no
> > good for a built-in type.  A real uuid would have to do a proper
> > independent implementation of uuid creation within pgsql.
>
> Why?

I think the issue is portability.  Remember that this type needs to work on 
Windows as well as all POSIX platforms and AIX.

> I'm not sure what the default build instructions are, but i've got
> the following on my linux box...

> Quite a list.  I wonder what readline is doing there.

Readline is for PSQL command completion and history.  As for the rest, they 
are *optional* modules that apparently your RPM builder chose to include; I 
build from source and my only dependencies are bison, flex, gcc and perl.

> It would appear that linking against libraries is just fine, I don't
> see that uuid is any different.

Is it portable to all platforms we support?  How does it generate its machine 
identifier?

One of the differences between an add-in and core code is support for all 
PostgreSQL platforms.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: uuid type for postgres

From
nathan wagner
Date:
On Wed, Sep 07, 2005 at 09:45:17AM -0700, josh@agliodbs.com wrote:

> I think the issue is portability.  Remember that this type needs to work on 
> Windows as well as all POSIX platforms and AIX.

I had forgotten about windows.  I'll see to what extent the library i'm
using is portable to windows.

> > It would appear that linking against libraries is just fine, I don't
> > see that uuid is any different.
> 
> Is it portable to all platforms we support?

I don't know yet.  It could be made to be so though.  I don't have any way
to compile on windows, so someone would have to be willing to try that.
Same with AIX.  If someone with an AIX box could download and test
it out I would appreciate a compatibility report.

> How does it generate its machine identifier?

No idea.  Does it matter?  Not having to fret this kind of detail is
the advantage of using someone else's library.

> One of the differences between an add-in and core code is support for all 
> PostgreSQL platforms.

So would including my type in contrib be possible then?  To me, that
is perhaps more important than making it a core type, though i think
there should be a core UUID type as well.

I'm testing out elog().  I'll post a new version later today.  There really
won't be any new functionality, it's just code clean up.

-- 
Nathan Wagner


Re: uuid type for postgres

From
Patrick Welche
Date:
On Wed, Sep 07, 2005 at 09:45:17AM -0700, Josh Berkus wrote:
> > Nathan wrote:
> 
> > Quite a list.  I wonder what readline is doing there.
> 
> Readline is for PSQL command completion and history.  As for the rest, they 
> are *optional* modules that apparently your RPM builder chose to include; I 
> build from source and my only dependencies are bison, flex, gcc and perl.

Still seems odd to me: I would expect psql to have readline, not postgres.

Cheers,

Patrick


Re: uuid type for postgres

From
nathan wagner
Date:
On Wed, Sep 07, 2005 at 09:48:08AM -0700, josh@agliodbs.com wrote:

> That's good, it gives users options.  And I can see why you don't want to 
> re-create the functionality in PG code, it's probably pretty large.

It would also be something else that would have to be maintained and
debugged.  Which i would be starting from scratch.  The OSSP
UUID library is at version 1.3.  From this I infer that he has spent
some time maintaining and debugging it.  The less duplication of
code and effort the better.

There may be a problem with using the library as a core type though,
due to windows portability concerns.  That said, it would probably
be easier to port the underlying library than to start from scratch.

> Well, IMHO, if the unguessability of UUIDs is an important part of your spec, 
> you have some design problems.

I was thinking of cryptographic applications.  Then, to use an
example from another domain, initial TCP sequence numbers should
be random (i.e. unguessable).  A problem with TCP perhaps.  It's
been a while since i've read over my copy of _Applied Cryptography_,
but I seem to recall that unguessable numbers were sometimes useful.

-- 
Nathan Wagner


Re: uuid type for postgres

From
Josh Berkus
Date:
Nathan,

> I was thinking of cryptographic applications.  Then, to use an
> example from another domain, initial TCP sequence numbers should
> be random (i.e. unguessable).  A problem with TCP perhaps.  It's
> been a while since i've read over my copy of _Applied Cryptography_,
> but I seem to recall that unguessable numbers were sometimes useful.

Yeah, the problem is what it does to your data.   If there's no analyzable
correspondence between the key and the server/table/row it attaches to,
then you have no way to detect if the key and the data have become
disassociated.

This is, btw, a problem with conventional SERIALs as well.   "unguessable"
UUIDs just compound the problem by adding additional variables (the server
and the table), as well as generally leading to applications which depend
*entirely* on UUID-based integrity and thus can't cope with any failures
in UUID generation or transmission.

I'm also a little baffled to come up with any real application where making
an id number for most tables "unguessable" would provide any kind of real
protection not far better provided by other means.   For your "users"
table, sure, but that's a very special case.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: uuid type for postgres

From
nathan wagner
Date:
I have made a new version, and made a web page for it.

http://granicus.if.org/~nw/uuid/

Given the statement that it won't be accepted for contrib or core
unless it compiles on windows, I guess I won't really have anything
further to say on the topic.  I don't have any way to compile on windows,
so it's not an obstacle I can readily address.

Comments for improvements are welcome.  Other sorts of comments on the code are
welcome as well.  I hope someone else will find it useful.

I won't post any further update/release notices to the list, they're
not really on topic, I posted these two so that folks could look at the code
i was proposing if they wished.

-- 
Nathan Wagner


Re: uuid type for postgres

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> One of the differences between an add-in and core code is support for all 
> PostgreSQL platforms.

These days, things won't get into contrib either if they don't work on
all the buildfarm machines.
        regards, tom lane


Re: uuid type for postgres

From
Tom Lane
Date:
nathan wagner <nw@hydaspes.if.org> writes:
> On Wed, Sep 07, 2005 at 09:45:17AM -0700, josh@agliodbs.com wrote:
>> How does it generate its machine identifier?

> No idea.  Does it matter?  Not having to fret this kind of detail is
> the advantage of using someone else's library.

It absolutely matters, because without knowing that we can have no idea
what sort of portability issues we might face with it.  I don't buy the
"not having to fret" line in the least.
        regards, tom lane


Re: uuid type for postgres

From
"Jim C. Nasby"
Date:
On Wed, Sep 07, 2005 at 06:35:51PM +0000, nathan wagner wrote:
> I have made a new version, and made a web page for it.
> 
> http://granicus.if.org/~nw/uuid/
> 
> Given the statement that it won't be accepted for contrib or core
> unless it compiles on windows, I guess I won't really have anything
> further to say on the topic.  I don't have any way to compile on windows,
> so it's not an obstacle I can readily address.
> 
> Comments for improvements are welcome.  Other sorts of comments on the code are
> welcome as well.  I hope someone else will find it useful.
> 
> I won't post any further update/release notices to the list, they're
> not really on topic, I posted these two so that folks could look at the code
> i was proposing if they wished.

Rather than just give up on it, why not create a project in pgFoundry? I
don't really see why this *has* to be in core (or contrib) anyway.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: uuid type for postgres

From
Tom Lane
Date:
Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> Still seems odd to me: I would expect psql to have readline, not postgres.

That's because we generate just one LIBS list and use it for all the
executables we build.  Autoconf makes it a bit difficult to do
otherwise.

There is an option in the linux linker to suppress ldd references to
shared libraries that aren't actually used by the program, but when
we tried to use it we found out it was pretty buggy (see archives).
        regards, tom lane


Re: uuid type for postgres

From
Josh Berkus
Date:
Nathan,

> http://granicus.if.org/~nw/uuid/
>
> Given the statement that it won't be accepted for contrib or core
> unless it compiles on windows, I guess I won't really have anything
> further to say on the topic.  I don't have any way to compile on
> windows, so it's not an obstacle I can readily address.

Wanna start a pgFoundry project so that someone *else* can do the Windows 
version, if they want?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: uuid type for postgres

From
Alvaro Herrera
Date:
On Wed, Sep 07, 2005 at 06:48:41PM +0100, Patrick Welche wrote:
> On Wed, Sep 07, 2005 at 09:45:17AM -0700, Josh Berkus wrote:
> > > Nathan wrote:
> > 
> > > Quite a list.  I wonder what readline is doing there.
> > 
> > Readline is for PSQL command completion and history.  As for the rest, they 
> > are *optional* modules that apparently your RPM builder chose to include; I 
> > build from source and my only dependencies are bison, flex, gcc and perl.
> 
> Still seems odd to me: I would expect psql to have readline, not postgres.

It's an issue with the linker.  I seem to remember that 8.1 uses the
--as-needed flag to ld, which means it will only link the libraries that
each executable actually uses.

Hmm, poking my build, this doesn't seem to be the case:

$ ldd `which postgres`       linux-gate.so.1 =>  (0xffffe000)       libz.so.1 => /usr/lib/libz.so.1 (0x41c54000)
libreadline.so.5=> /lib/libreadline.so.5 (0xb7f51000)       libcrypt.so.1 => /lib/tls/libcrypt.so.1 (0xb7f23000)
libresolv.so.2=> /lib/tls/libresolv.so.2 (0x41c6a000)       libnsl.so.1 => /lib/tls/libnsl.so.1 (0x41c25000)
libdl.so.2=> /lib/tls/libdl.so.2 (0x4118f000)       libm.so.6 => /lib/tls/libm.so.6 (0x41153000)       libc.so.6 =>
/lib/tls/libc.so.6(0x41019000)       libncurses.so.5 => /lib/libncurses.so.5 (0xb7ee0000)       /lib/ld-linux.so.2
(0x41000000)

This is a fairly recent 8.1, maybe post-beta1.  Not sure what happened.
I certainly don't see the --as-needed in LDFLAGS:

$ pg_config  | grep LDFLA
LDFLAGS = -Wl,-rpath,/pg/install/00orig/lib
LDFLAGS_SL = 

-- 
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"¿Que diferencia tiene para los muertos, los huérfanos, y aquellos que han
perdido su hogar, si la loca destrucción ha sido realizada bajo el nombre
del totalitarismo o del santo nombre de la libertad y la democracia?" (Gandhi)


Re: uuid type for postgres

From
Bob Ippolito
Date:
On Sep 7, 2005, at 10:04 AM, nathan wagner wrote:

> On Wed, Sep 07, 2005 at 09:45:17AM -0700, josh@agliodbs.com wrote:
>
>
>> I think the issue is portability.  Remember that this type needs  
>> to work on
>> Windows as well as all POSIX platforms and AIX.
>>
>
> I had forgotten about windows.  I'll see to what extent the library  
> i'm
> using is portable to windows.

You don't need to use a non-standard library for a lot of platforms,  
you're pretty much guaranteed UUID support for (at least):
Linux/Darwin/Mac OS X: uuid_generate <uuid/uuid.h> (part of libc, at  
least on Darwin)
FreeBSD: uuid_create <uuid.h> (again, libc)
Windows: UuidCreate <Rpc.h> (link to Rpcrt4.dll, IIRC)

Where a native UUID generate is not available, you can include some  
implementation, but surely other platforms also include UUID  
implementations.

I think Windows portability is a non-issue here.

-bob



Re: uuid type for postgres

From
Andrew Dunstan
Date:

Alvaro Herrera wrote:

>
>This is a fairly recent 8.1, maybe post-beta1.  Not sure what happened.
>I certainly don't see the --as-needed in LDFLAGS:
>
>$ pg_config  | grep LDFLA
>LDFLAGS = -Wl,-rpath,/pg/install/00orig/lib
>LDFLAGS_SL = 
>  
>


It was removed because it was very badly broken.

cheers

andrew


Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
On Wed, Sep 07, 2005 at 11:31:16AM -0700, Josh Berkus wrote:
> I'm also a little baffled to come up with any real application where making 
> an id number for most tables "unguessable" would provide any kind of real 
> protection not far better provided by other means.   For your "users" 
> table, sure, but that's a very special case.

It should never be the sole means of defense, however, it can be quite
effective at prevention.

For a rather simple example, consider a site that associates a picture
with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
it makes it ridiculously easy to write a script to pull all of the
pictures off the site. This can be bothersome, as the only type of
person who would do this, is the type of person with an illegitimate
motivation. I want the data to be easily and freely accessible as
specific objects, but I do not wish to provide an easy way of
dumping all of the data as a unit.

By making the picture identifier unguessable, it discourages the most
common sort of abuse of the system. If the number is unguessable, and
they can't access the directory as a listing, it will be sufficiently
difficult as to discourage the common abuser of the system. On the
other hand, an obviously guessable identifier may *encourage* the
common person to consider abuse.

In my case, it isn't only pictures. I don't want people pulling all
the data off the site as a dump, and using it how they wish, but I do
wish to make the data freely available, and easily accessible from a
web browser.

I'm not under the impression that it is impossible for a competent
person to dump my database. I am under the impression that the people
who would do such a thing, tend not to be intelligent, and will be
stopped by this simple tactic.

I could use any identifier at all. It could be a random sequence of
characters. The UUID appeals to me, as I don't have to re-invent
the concept. This use of UUID falls outside the scope of using it
to join tables. It's a handle that is associated with the data,
for external identification of the object.

I happen to also use it as an internal primary key for the objects
that fit this category, as I wish to benefit from the built-in merge
capabilities of UUID over SERIAL, and I don't currently see the
point of keeping a SERIAL and a UUID for each object. On the last
point, I did start to do this, but every single one of my queries
become more complicated as a result. Using the SERIAL for joining,
and the UUID for identifying a set of rows was becoming a little
ridiculous for my purposes. Using only the UUID to provide for
all my purposes is suiting my requirements for the cost of 1.5X
the size of a primary key index, 2X the size of a index for
a n to n relation mapping UUID to UUID, and an far less significant
increase in table space (much less than 1.5X, although I haven't
finished calculating it yet).

Not that everybody should rip out SERIAL and replace it with UUID,
but it really isn't that bad, and in some cases, such as mine,
I don't see the point of using both, and choose to instead allow
UUID to solve many of my concerns at the same time, with an
acceptable for me cost in database pages.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> This is a fairly recent 8.1, maybe post-beta1.  Not sure what happened.

This:

2005-05-07 01:48  neilc
* configure, configure.in: Revert the ld --as-needed patch. Thisbreaks Fedora Core 3, due to a strange interaction
betweenld,readline, termcap, and psql. The symptom is psql failing with thiserror on startup:    symbol lookup error:
/usr/lib64/libreadline.so.4:undefinedsymbol: BCI'm still trying to find the best way to solve this, but in themean time
I'mreverting the patch in order to unbreak FC3.
 

        regards, tom lane


Re: uuid type for postgres

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>> I'm also a little baffled to come up with any real application where making
>> an id number for most tables "unguessable" would provide any kind of real
>> protection not far better provided by other means.   For your "users"
>> table, sure, but that's a very special case.

> It should never be the sole means of defense, however, it can be quite
> effective at prevention.

> For a rather simple example, consider a site that associates a picture
> with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> it makes it ridiculously easy to write a script to pull all of the
> pictures off the site. This can be bothersome, as the only type of
> person who would do this, is the type of person with an illegitimate
> motivation. I want the data to be easily and freely accessible as
> specific objects, but I do not wish to provide an easy way of
> dumping all of the data as a unit.

I don't think security through obscurity is a good reason to add something
to PostgreSQL. Either make the pictures accessible to everyone, or put
some other security mechanism in place. As far as pictures, you could
certainly used the SHA1 checksum as your primary key, which would satisfy
all your requirements.

> In my case, it isn't only pictures. I don't want people pulling all
> the data off the site as a dump, and using it how they wish, but I do
> wish to make the data freely available, and easily accessible from a
> web browser.

.htaccess? Cookies? Encrypted data? iptables? All of these are better
solutions than random filenames.

> Not that everybody should rip out SERIAL and replace it with UUID,
> but it really isn't that bad, and in some cases, such as mine,
> I don't see the point of using both, and choose to instead allow
> UUID to solve many of my concerns at the same time, with an
> acceptable for me cost in database pages.

We're doing something similar to UUIDs here - LSID (Life Science
Identifiers). But being as the whole point of LSIDs is to share
data, the "random" factor is not needed. If ever PG did do something
like UUID (and despite the paragraphs above, I could see a use for it),
I'd rather do something like picking from a predetermined range of values
at random and deplete the pool as you go rather than create large
values with a low (but finite) probability of collision.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200509080939
https://www.biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkMgQAwACgkQvJuQZxSWSsgaOQCgl2Mb0cY5NuaTVuIAGdYKXqNE
oQYAoJ5pm3xHcu3Wr382Jja59Ll1/YOP
=IeqX
-----END PGP SIGNATURE-----




Re: uuid type for postgres

From
"Jonah H. Harris"
Date:
Greg, thanks for saying it... I was thinking the same thing.

Not that it really relates to the UUID data type inclusion discussion itself, but I think this application design and use case for UUID is an example of using a data type for the wrong purpose.  Application design-wise, security should be handled in a true *security* way, not by obfuscating a key (unless you're Microsoft :)).  And, as Greg said, why not use a SHA1 hash instead... it would be easier than writing a whole new type implementation.

I haven't yet gotten a chance to check out the code but plan to soon.


On 9/8/05, Greg Sabino Mullane <greg@turnstep.com> wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>> I'm also a little baffled to come up with any real application where making
>> an id number for most tables "unguessable" would provide any kind of real
>> protection not far better provided by other means.   For your "users"
>> table, sure, but that's a very special case.

> It should never be the sole means of defense, however, it can be quite
> effective at prevention.

> For a rather simple example, consider a site that associates a picture
> with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> it makes it ridiculously easy to write a script to pull all of the
> pictures off the site. This can be bothersome, as the only type of
> person who would do this, is the type of person with an illegitimate
> motivation. I want the data to be easily and freely accessible as
> specific objects, but I do not wish to provide an easy way of
> dumping all of the data as a unit.

I don't think security through obscurity is a good reason to add something
to PostgreSQL. Either make the pictures accessible to everyone, or put
some other security mechanism in place. As far as pictures, you could
certainly used the SHA1 checksum as your primary key, which would satisfy
all your requirements.

> In my case, it isn't only pictures. I don't want people pulling all
> the data off the site as a dump, and using it how they wish, but I do
> wish to make the data freely available, and easily accessible from a
> web browser.

.htaccess? Cookies? Encrypted data? iptables? All of these are better
solutions than random filenames.

> Not that everybody should rip out SERIAL and replace it with UUID,
> but it really isn't that bad, and in some cases, such as mine,
> I don't see the point of using both, and choose to instead allow
> UUID to solve many of my concerns at the same time, with an
> acceptable for me cost in database pages.

We're doing something similar to UUIDs here - LSID (Life Science
Identifiers). But being as the whole point of LSIDs is to share
data, the "random" factor is not needed. If ever PG did do something
like UUID (and despite the paragraphs above, I could see a use for it),
I'd rather do something like picking from a predetermined range of values
at random and deplete the pool as you go rather than create large
values with a low (but finite) probability of collision.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200509080939
https://www.biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkMgQAwACgkQvJuQZxSWSsgaOQCgl2Mb0cY5NuaTVuIAGdYKXqNE
oQYAoJ5pm3xHcu3Wr382Jja59Ll1/YOP
=IeqX
-----END PGP SIGNATURE-----



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster



--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
On Thu, Sep 08, 2005 at 01:45:10PM -0000, Greg Sabino Mullane wrote:
> > For a rather simple example, consider a site that associates a picture
> > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> > it makes it ridiculously easy to write a script to pull all of the
> > pictures off the site. This can be bothersome, as the only type of
> > person who would do this, is the type of person with an illegitimate
> > motivation. I want the data to be easily and freely accessible as
> > specific objects, but I do not wish to provide an easy way of
> > dumping all of the data as a unit.
> I don't think security through obscurity is a good reason to add something
> to PostgreSQL. Either make the pictures accessible to everyone, or put
> some other security mechanism in place. As far as pictures, you could
> certainly used the SHA1 checksum as your primary key, which would satisfy
> all your requirements.

I think you missed two parts. The first is, that it isn't really an
attempt at security. The pictures are still freely available. There
is no need to block accesses.

The goal is to prevent a dump of the database. Nobody should be able to
trivially download all the pictures at one time.

I'm not sure what you are suggesting the SHA1 checksum be applied to.
Not the picture, for sure, as this would require that my tables store
a primary key, and a picture key, leading me right back to something
that seems a little ridiculous.

If you mean the SHA1 checksum of some constant secret password, and
the primary key, possibly SERIAL, then sure - but that's ignoring the
other benefits of UUID. If I'm going to use UUID, I might as well use
only UUID, and use the SHA1 checksum of the constant secret password,
and the UUID. So, the idea has merit - and was suggested by another
person - but it doesn't mean using the UUID alone, without all the
extra layers on top I would have to write, is the wrong choice. Why
would I go out of my way to jump through so many hoops, when I can
very easily, use the UUID?

You're asking me to do a lot, only so that you can point out that
'UUID' is not necessary. Heck, I could use the timestamp, and a
random key as two separate fields. There are always work arounds.

Is it good to use work arounds? Or a type that already works fine
for the purposes? :-)

> > In my case, it isn't only pictures. I don't want people pulling all
> > the data off the site as a dump, and using it how they wish, but I do
> > wish to make the data freely available, and easily accessible from a
> > web browser.
> .htaccess? Cookies? Encrypted data? iptables? All of these are better
> solutions than random filenames.

Nope. The files are publically accessible. No reason the secure their
access, and securing their access makes the whole system less efficient.

> > Not that everybody should rip out SERIAL and replace it with UUID,
> > but it really isn't that bad, and in some cases, such as mine,
> > I don't see the point of using both, and choose to instead allow
> > UUID to solve many of my concerns at the same time, with an
> > acceptable for me cost in database pages.
> We're doing something similar to UUIDs here - LSID (Life Science
> Identifiers). But being as the whole point of LSIDs is to share
> data, the "random" factor is not needed. If ever PG did do something
> like UUID (and despite the paragraphs above, I could see a use for it),
> I'd rather do something like picking from a predetermined range of values
> at random and deplete the pool as you go rather than create large
> values with a low (but finite) probability of collision.

I don't need the random factor. I only need it to be difficult to guess.

The timestamps have sufficient granularity, that they are effectively
difficult to guess, and a foreach loop would be expensive. If none of
the bits were random, I would still be happy.

If I had LSID support, and no UUID support, perhaps I'd use LSID
instead. I have no special attachment to UUID, other than it satisfies
my concerns and purposes in a convenient manner. I can merge my tables
without fiddling with the sequence definitions. I can hand out
external identifiers for specific objects which are effectively
unguessable. I can avoid having two primary keys, and use only the
one. Any type that satisfied these concerns would make me happy.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
"Jonah H. Harris"
Date:
Mark,

I think what Greg suggested was sha1(number) as the key instead of requiring uuid as the key... it would perform the same function as far as you r use case is concerned.

As a similar example (using MD5):

CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
CREATE TABLE your_tbl (
    your_key          VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')),
    your_picture      BYTEA NOT NULL,
    PRIMARY KEY (your_key));

INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');

select * from your_tbl ;
             your_key             |  your_picture
----------------------------------+-----------------
 c4ca4238a0b923820dcc509a6f75849b | some bytea data

same functionality from your standpoint.



On 9/8/05, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
On Thu, Sep 08, 2005 at 01:45:10PM -0000, Greg Sabino Mullane wrote:
> > For a rather simple example, consider a site that associates a picture
> > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> > it makes it ridiculously easy to write a script to pull all of the
> > pictures off the site. This can be bothersome, as the only type of
> > person who would do this, is the type of person with an illegitimate
> > motivation. I want the data to be easily and freely accessible as
> > specific objects, but I do not wish to provide an easy way of
> > dumping all of the data as a unit.
> I don't think security through obscurity is a good reason to add something
> to PostgreSQL. Either make the pictures accessible to everyone, or put
> some other security mechanism in place. As far as pictures, you could
> certainly used the SHA1 checksum as your primary key, which would satisfy
> all your requirements.

I think you missed two parts. The first is, that it isn't really an
attempt at security. The pictures are still freely available. There
is no need to block accesses.

The goal is to prevent a dump of the database. Nobody should be able to
trivially download all the pictures at one time.

I'm not sure what you are suggesting the SHA1 checksum be applied to.
Not the picture, for sure, as this would require that my tables store
a primary key, and a picture key, leading me right back to something
that seems a little ridiculous.

If you mean the SHA1 checksum of some constant secret password, and
the primary key, possibly SERIAL, then sure - but that's ignoring the
other benefits of UUID. If I'm going to use UUID, I might as well use
only UUID, and use the SHA1 checksum of the constant secret password,
and the UUID. So, the idea has merit - and was suggested by another
person - but it doesn't mean using the UUID alone, without all the
extra layers on top I would have to write, is the wrong choice. Why
would I go out of my way to jump through so many hoops, when I can
very easily, use the UUID?

You're asking me to do a lot, only so that you can point out that
'UUID' is not necessary. Heck, I could use the timestamp, and a
random key as two separate fields. There are always work arounds.

Is it good to use work arounds? Or a type that already works fine
for the purposes? :-)

> > In my case, it isn't only pictures. I don't want people pulling all
> > the data off the site as a dump, and using it how they wish, but I do
> > wish to make the data freely available, and easily accessible from a
> > web browser.
> .htaccess? Cookies? Encrypted data? iptables? All of these are better
> solutions than random filenames.

Nope. The files are publically accessible. No reason the secure their
access, and securing their access makes the whole system less efficient.

> > Not that everybody should rip out SERIAL and replace it with UUID,
> > but it really isn't that bad, and in some cases, such as mine,
> > I don't see the point of using both, and choose to instead allow
> > UUID to solve many of my concerns at the same time, with an
> > acceptable for me cost in database pages.
> We're doing something similar to UUIDs here - LSID (Life Science
> Identifiers). But being as the whole point of LSIDs is to share
> data, the "random" factor is not needed. If ever PG did do something
> like UUID (and despite the paragraphs above, I could see a use for it),
> I'd rather do something like picking from a predetermined range of values
> at random and deplete the pool as you go rather than create large
> values with a low (but finite) probability of collision.

I don't need the random factor. I only need it to be difficult to guess.

The timestamps have sufficient granularity, that they are effectively
difficult to guess, and a foreach loop would be expensive. If none of
the bits were random, I would still be happy.

If I had LSID support, and no UUID support, perhaps I'd use LSID
instead. I have no special attachment to UUID, other than it satisfies
my concerns and purposes in a convenient manner. I can merge my tables
without fiddling with the sequence definitions. I can hand out
external identifiers for specific objects which are effectively
unguessable. I can avoid having two primary keys, and use only the
one. Any type that satisfied these concerns would make me happy.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend



--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

Re: uuid type for postgres

From
mark@mark.mielke.cc
Date:
On Thu, Sep 08, 2005 at 12:02:54PM -0400, Jonah H. Harris wrote:
> I think what Greg suggested was sha1(number) as the key instead of requiring 
> uuid as the key... it would perform the same function as far as you r use 
> case is concerned.

I'm sure he meant something like this. But I am still failing to see
how the below is any better than UUID. UUID is 16 bytes instead of
4+32=36. UUID doesn't require synchronization of the SEQUENCE
configuration (INCREMENT 100 START <SITE-NUMBER> as suggested by
another person) when working with tables at multiple sites that will
be merged together.

So this would leave having the primary key for the row being SERIAL,
and naming the picture as the MD5/SHA1 using data only from constant
strings, and the row data, the only part of the row data remaining
constant being the primary key. I agreed this would work, and enhanced
this by copying a trick from the SASL people where the key would be
concatenated with a constant secret string to further prevent people
from guessing how to crack the numbering scheme, but it means I have
to jump through hoops, and it doesn't solve the multisite / joining
tables after the fact, issue. I can't take my picture databases from
two separate sites, and merge them together nightly without jumping
through hoops. I might have a site in New York, and a site in Toronto.
I don't want to obtain sequence numbers remotely between sites,
forcing a real-time dependency between the sites.

I'm sure we could debate about alternatives for a long, long time. :-)
In the case I describe above, I could have a "ORIGINAL SITE" field,
and include that in a multi-column primary key.

I stubbornly wish to use a type which doesn't require alternatives, or
messy complications to the many SQL queries I will be performing.  I
want it to work out of the box, without any magic. UUID does exactly
what I want, and perhaps more than I need. Most importantly, it does
exactly what I want. I don't care at all that it does more than I need,
as something less isn't available or standard at the moment. UUID is
standard.

You guys are amusing me - not in an insulting way, but by making me
think outside my own conclusions. In the end, though, although there
have been arguments that have made me doubt my decision (1.5X the
number of index pages required over a SERIAL primary key, 2X the
number of index pages required for a multi-column primary key used in
a join table), for the most part, I've become more confident with my
decision. There's nothing wrong with what I'm doing. Yes, there are
other ways to do it - but these would only increase the complexity of
my application, with an insufficient return for this sacrifice. My
application is quite simple as is, using UUID as my preferred type
for objects that will be accessed on their own through external
interfaces.

Cheers,
mark


> As a similar example (using MD5):
> 
> CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
> CREATE TABLE your_tbl (
> your_key VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')),
> your_picture BYTEA NOT NULL,
> PRIMARY KEY (your_key));
> 
> INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');
> 
> select * from your_tbl ;
> your_key | your_picture
> ----------------------------------+-----------------
> c4ca4238a0b923820dcc509a6f75849b | some bytea data
> 
> same functionality from your standpoint.

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: uuid type for postgres

From
"Jonah H. Harris"
Date:
>From what you said:

> I agreed this would work, and enhanced
> this by copying a trick from the SASL people where the key would be
> concatenated with a constant secret string to further prevent people
> from guessing how to crack the numbering scheme

under definition of security, "Something that gives or assures safety"

Firstly, the use case is a security-related thing, i.e. it RESTRICTS someone from doing something (mass dump of data) by introducing a SECURITY MEASURE (obfuscating a key).  Enough about that discussion.

Second, while you did mention a distributed environment as an advantage of using UUID, there was no mention of distributed sites (New York, Toronto) in your actual use case; so don't expect us to be psychic.

Third, if you feel that you need UUID, by all means, USE IT!  This topic, IMHO, is now *WAY* out of scope.  You have the information from Tom et al and if you want to go ahead with a new PostgreSQL uuid type then develop it, test it, and submit it for 8.2.

Lastly, people are here to help you, not to argue with you (most of the time).  No one told you not to use it, they just offered alternatives and discussion on whether it should be added to core.

You've made the case that it would be useful to have, so go forth and develop it.  I don't see how there can be much more discussion here.



On 9/8/05, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
On Thu, Sep 08, 2005 at 12:02:54PM -0400, Jonah H. Harris wrote:
> I think what Greg suggested was sha1(number) as the key instead of requiring
> uuid as the key... it would perform the same function as far as you r use
> case is concerned.

I'm sure he meant something like this. But I am still failing to see
how the below is any better than UUID. UUID is 16 bytes instead of
4+32=36. UUID doesn't require synchronization of the SEQUENCE
configuration (INCREMENT 100 START <SITE-NUMBER> as suggested by
another person) when working with tables at multiple sites that will
be merged together.

So this would leave having the primary key for the row being SERIAL,
and naming the picture as the MD5/SHA1 using data only from constant
strings, and the row data, the only part of the row data remaining
constant being the primary key. I agreed this would work, and enhanced
this by copying a trick from the SASL people where the key would be
concatenated with a constant secret string to further prevent people
from guessing how to crack the numbering scheme, but it means I have
to jump through hoops, and it doesn't solve the multisite / joining
tables after the fact, issue. I can't take my picture databases from
two separate sites, and merge them together nightly without jumping
through hoops. I might have a site in New York, and a site in Toronto.
I don't want to obtain sequence numbers remotely between sites,
forcing a real-time dependency between the sites.

I'm sure we could debate about alternatives for a long, long time. :-)
In the case I describe above, I could have a "ORIGINAL SITE" field,
and include that in a multi-column primary key.

I stubbornly wish to use a type which doesn't require alternatives, or
messy complications to the many SQL queries I will be performing.  I
want it to work out of the box, without any magic. UUID does exactly
what I want, and perhaps more than I need. Most importantly, it does
exactly what I want. I don't care at all that it does more than I need,
as something less isn't available or standard at the moment. UUID is
standard.

You guys are amusing me - not in an insulting way, but by making me
think outside my own conclusions. In the end, though, although there
have been arguments that have made me doubt my decision (1.5X the
number of index pages required over a SERIAL primary key, 2X the
number of index pages required for a multi-column primary key used in
a join table), for the most part, I've become more confident with my
decision. There's nothing wrong with what I'm doing. Yes, there are
other ways to do it - but these would only increase the complexity of
my application, with an insufficient return for this sacrifice. My
application is quite simple as is, using UUID as my preferred type
for objects that will be accessed on their own through external
interfaces.

Cheers,
mark


> As a similar example (using MD5):
>
> CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
> CREATE TABLE your_tbl (
> your_key VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')),
> your_picture BYTEA NOT NULL,
> PRIMARY KEY (your_key));
>
> INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');
>
> select * from your_tbl ;
> your_key | your_picture
> ----------------------------------+-----------------
> c4ca4238a0b923820dcc509a6f75849b | some bytea data
>
> same functionality from your standpoint.

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/




--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

Re: uuid type for postgres

From
Bob Ippolito
Date:
One reason to use a UUID type over a naively stored hash for this purpose is that it takes up half the space as naively stored MD5 and 40% of the space as naively stored SHA1.  Granted, it's easy enough to pack them, but packed MD5 does have the same storage requirements as UUID and it won't be quite as convenient to use as a native UUID type out of the box.

-bob

On Sep 8, 2005, at 9:02 AM, Jonah H. Harris wrote:

Mark,

I think what Greg suggested was sha1(number) as the key instead of requiring uuid as the key... it would perform the same function as far as you r use case is concerned.

As a similar example (using MD5):

CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
CREATE TABLE your_tbl (
    your_key          VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')),
    your_picture      BYTEA NOT NULL,
    PRIMARY KEY (your_key));

INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');

select * from your_tbl ;
             your_key             |  your_picture
----------------------------------+-----------------
 c4ca4238a0b923820dcc509a6f75849b | some bytea data

same functionality from your standpoint.



On 9/8/05, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
On Thu, Sep 08, 2005 at 01:45:10PM -0000, Greg Sabino Mullane wrote:
> > For a rather simple example, consider a site that associates a picture
> > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> > it makes it ridiculously easy to write a script to pull all of the
> > pictures off the site. This can be bothersome, as the only type of
> > person who would do this, is the type of person with an illegitimate
> > motivation. I want the data to be easily and freely accessible as
> > specific objects, but I do not wish to provide an easy way of
> > dumping all of the data as a unit.
> I don't think security through obscurity is a good reason to add something
> to PostgreSQL. Either make the pictures accessible to everyone, or put
> some other security mechanism in place. As far as pictures, you could
> certainly used the SHA1 checksum as your primary key, which would satisfy
> all your requirements.

I think you missed two parts. The first is, that it isn't really an
attempt at security. The pictures are still freely available. There
is no need to block accesses.

The goal is to prevent a dump of the database. Nobody should be able to
trivially download all the pictures at one time.

I'm not sure what you are suggesting the SHA1 checksum be applied to.
Not the picture, for sure, as this would require that my tables store
a primary key, and a picture key, leading me right back to something
that seems a little ridiculous.

If you mean the SHA1 checksum of some constant secret password, and
the primary key, possibly SERIAL, then sure - but that's ignoring the
other benefits of UUID. If I'm going to use UUID, I might as well use
only UUID, and use the SHA1 checksum of the constant secret password,
and the UUID. So, the idea has merit - and was suggested by another
person - but it doesn't mean using the UUID alone, without all the
extra layers on top I would have to write, is the wrong choice. Why
would I go out of my way to jump through so many hoops, when I can
very easily, use the UUID?

You're asking me to do a lot, only so that you can point out that
'UUID' is not necessary. Heck, I could use the timestamp, and a
random key as two separate fields. There are always work arounds.

Is it good to use work arounds? Or a type that already works fine
for the purposes? :-)

> > In my case, it isn't only pictures. I don't want people pulling all
> > the data off the site as a dump, and using it how they wish, but I do
> > wish to make the data freely available, and easily accessible from a
> > web browser.
> .htaccess? Cookies? Encrypted data? iptables? All of these are better
> solutions than random filenames.

Nope. The files are publically accessible. No reason the secure their
access, and securing their access makes the whole system less efficient.

> > Not that everybody should rip out SERIAL and replace it with UUID,
> > but it really isn't that bad, and in some cases, such as mine,
> > I don't see the point of using both, and choose to instead allow
> > UUID to solve many of my concerns at the same time, with an
> > acceptable for me cost in database pages.
> We're doing something similar to UUIDs here - LSID (Life Science
> Identifiers). But being as the whole point of LSIDs is to share
> data, the "random" factor is not needed. If ever PG did do something
> like UUID (and despite the paragraphs above, I could see a use for it),
> I'd rather do something like picking from a predetermined range of values
> at random and deplete the pool as you go rather than create large
> values with a low (but finite) probability of collision.

I don't need the random factor. I only need it to be difficult to guess.

The timestamps have sufficient granularity, that they are effectively
difficult to guess, and a foreach loop would be expensive. If none of
the bits were random, I would still be happy.

If I had LSID support, and no UUID support, perhaps I'd use LSID
instead. I have no special attachment to UUID, other than it satisfies
my concerns and purposes in a convenient manner. I can merge my tables
without fiddling with the sequence definitions. I can hand out
external identifiers for specific objects which are effectively
unguessable. I can avoid having two primary keys, and use only the
one. Any type that satisfied these concerns would make me happy.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend



--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/