Thread: uuid type for postgres
I have been in need of a uuid type and ran across the pguuid download by Xiongjian (Mike) Wang. This wasn't really useful to me for two reasons: first, it is GPLed and I would prefer a more liberal license, secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a SIOCGETIFHWADDR (? i think, i can get the exact name if you want it) ioctl() under darwin. While I could dike out the code that calls it, that seems like a suboptimal solution. So after a bit of poking around the interweb i ran across Ralf Engelschall's ossp uuid library. This compiled with minimal effort on mac os. Some reading, and an evening later, i've made a server plugin with supporting SQL that implements an 'ossp_uuid' type. Now i have four questions: 1: Is it feasible for this to be included in the contrib section of the regular download? The uuid library is a "notice of copyright" style license, and I am willing to put my own code into the public domain. 2: Would just calling the type 'uuid' be better than 'ossp_uuid'? It's certainly a nicer name. 3: Would it be possible to include such a type as a postgres extension to the usual SQL types. It seems to me that having an officially supported type would be better than a user contributed type on the grounds that you could then rely on it being avaiable if postgres was. In particular, installing it as an extension would require the cooperation of the DBA, which may be infeasible in some environments. -- Nathan Wagner
Hey Nathan. I've started to make heavy use of pguuid. It had several bugs in it that required fixing before I could use it. I have no preference on pguuid. It was the only such PostgreSQL project I found that provided a UUID type. I'd be willing to work with you on ensuring that such a patch is in an acceptable form for the core developers to accept, and help maintain it. My personal preference is that the type be called 'uuid' and accepted into the core. Tom? Is their history on this issue? Should it remain an extension, or can be get it built-in? I find the UUID concept more useful than the SERIAL concept for data that must now, or may in the future, be stored on multiple servers. For example, submitting transactions to two different sites, where the sites periodically synchronize up with each other, requiring a relatively safe 'merge'. SERIAL can't do this at all. UUID can do this as long as the rows are split into domains appropriately, such as ensuring that unique column constraints do not need to be enforced across sites. UUID is also desirable over SERIAL for a unique identifier that will be accessed from an outside source directly. Exposing the SERIAL number to the outside isn't appealing for reasons including the predictability of the sequence, the size of the database is exposed, the internel implementation is exposed, and so on. I suggest that UUID be recommended in place of SERIAL for certain classes of applications, and that it therefore belongs in the core. UUID and SERIAL can be used together (although, once you have a UUID, it may not be useful to also have a SERIAL). Ok, you can all tell me I'm wrong now... :-) mark On Tue, Sep 06, 2005 at 01:50:57PM +0000, nathan wagner wrote: > > I have been in need of a uuid type and ran across the pguuid download > by Xiongjian (Mike) Wang. This wasn't really useful to me for two > reasons: first, it is GPLed and I would prefer a more liberal license, > secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a > SIOCGETIFHWADDR (? i think, i can get the exact name if you want it) > ioctl() under darwin. > > While I could dike out the code that calls it, that seems like a suboptimal > solution. So after a bit of poking around the interweb i ran across > Ralf Engelschall's ossp uuid library. This compiled with minimal > effort on mac os. Some reading, and an evening later, i've made > a server plugin with supporting SQL that implements an 'ossp_uuid' > type. > > Now i have four questions: > > 1: Is it feasible for this to be included in the contrib section of > the regular download? The uuid library is a "notice of copyright" style > license, and I am willing to put my own code into the public domain. > > 2: Would just calling the type 'uuid' be better than 'ossp_uuid'? It's > certainly a nicer name. > > 3: Would it be possible to include such a type as a postgres extension > to the usual SQL types. It seems to me that having an officially > supported type would be better than a user contributed type on the grounds > that you could then rely on it being avaiable if postgres was. > In particular, installing it as an extension would require the cooperation > of the DBA, which may be infeasible in some environments. > > -- > Nathan Wagner > -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
mark@mark.mielke.cc writes: > My personal preference is that the type be called 'uuid' and accepted > into the core. Tom? Is their history on this issue? > Should it remain an extension, or can be get it built-in? There is pretty much zero chance of being accepted into contrib, much less core, if the code isn't pure BSD license. (We used to be a bit lax about that, but are trying to clean things up.) Nathan's comment about starting with code that was sorta-BSD-with-advertising alarmed me. However, you can certainly set up a pgfoundry project with code of any license you like. That would be a good starting point anyway --- at some point you'd have to convince people that there's enough demand for the feature to justify putting it in core, and being able to point to X number of downloads from pgfoundry would go a long way towards making that case. regards, tom lane
On Tue, Sep 06, 2005 at 11:38:57AM -0400, tgl@sss.pgh.pa.us wrote: > > There is pretty much zero chance of being accepted into contrib, much > less core, if the code isn't pure BSD license. Hmm. Here is the copyright and license portion of the readme... COPYRIGHT AND LICENSE Copyright (c) 2004-2005 Ralf S. Engelschall <rse@engelschall.com> Copyright (c) 2004-2005 The OSSP Project <http://www.ossp.org/> This file is part of OSSP uuid, a library for the generation of UUIDs which can found at http://www.ossp.org/pkg/lib/uuid/ Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, providedthat the above copyright notice and this permission notice appear in all copies. THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIEDWARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS ANDCOPYRIGHT HOLDERS AND THEIR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIALDAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, ORTORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITYOF SUCH DAMAGE. Reading the postgres license from the faq... PostgreSQL Data Base Management System Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group Portions Copyright (c) 1994-1996 Regents of the University of California Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies. IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. These two licenses appear to be functionally identical. Both require notice of the copyright to be included, and both allow use for any purpose. The wording of the requirement is even nearly identical. By way of disclaimer, I am not (yet) an attorney. > (We used to be a bit > lax about that, but are trying to clean things up.) Nathan's comment > about starting with code that was sorta-BSD-with-advertising alarmed me. Perhaps i was a bit lax in my wording. I don't read the license as requiring any advertising at run time, just as a requirement that the copyright notice be kept with the source code. That is, an identical requirement to the one that postgres itself uses. -- Nathan Wagner
Mark, > I suggest that UUID be recommended in place of SERIAL for certain > classes of applications, and that it therefore belongs in the core. > UUID and SERIAL can be used together (although, once you have a UUID, > it may not be useful to also have a SERIAL). I think that, if you want to push a refactored UUID type for PostgreSQL 8.2, that you'd better separate your database design arguments from your inclusion arguments. For example, you might get my agreement that it would be useful to have a UUID as a core type; you would *never* get my agreement to recommend using UUID to newbies. I have seen *far* too many abuses of UUIDs in really bad database design. People who use them should be experienced enough to know what they're doing. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
I agree with Josh on the UUID type, it gets abused far too often and (IMHO) isn't widely enough used to belong in the core. Couldn't you just fix the problem in pguuid rather than write a whole new type?
On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote:
Mark,
> I suggest that UUID be recommended in place of SERIAL for certain
> classes of applications, and that it therefore belongs in the core.
> UUID and SERIAL can be used together (although, once you have a UUID,
> it may not be useful to also have a SERIAL).
I think that, if you want to push a refactored UUID type for PostgreSQL
8.2, that you'd better separate your database design arguments from your
inclusion arguments.
For example, you might get my agreement that it would be useful to have a
UUID as a core type; you would *never* get my agreement to recommend using
UUID to newbies. I have seen *far* too many abuses of UUIDs in really
bad database design. People who use them should be experienced enough to
know what they're doing.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote: > I agree with Josh on the UUID type, it gets abused far too often Out of curiosity, how does it get abused? It doesn't seem to me that it would be any more prone to abuse than any other type. > and (IMHO) isn't widely enough used to belong in the core. Seems like a self-fulfilling cycle. There isn't one in core or contrib (pguuid is not in contrib, IIRC), and so one doesn't get used. Since no one uses one, it's not in core or contrib. > Couldn't you just fix the > problem in pguuid rather than write a whole new type? Not sure which you you're addressing here, but I can't fix the license problem. Nor can I readily fix the lack of a mac ioctl() to get the hardware mac address. In any case, from an economy of effort view, I'd much rather maintain a postgres extension/interface to an existing uuid library, than fret the details of writing a uuid library myself. UUIDs themselves don't really interest me, I just happen to need one for my application. I'm basically done. I'll tar it up and post a link to an alpha version in a couple of hours. As far as i can tell, it works fine. I'm calling it alpha because i'm not doing proper error checking, and i'm not sure how to log or throw an error even if i detect one. Also, I have only compiled it on Mac OS 10.3, I'll want to test it on my linux box. -- Nathan Wagner
On Sep 6, 2005, at 12:57 PM, Jonah H. Harris wrote:
On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote:Mark,
> I suggest that UUID be recommended in place of SERIAL for certain
> classes of applications, and that it therefore belongs in the core.
> UUID and SERIAL can be used together (although, once you have a UUID,
> it may not be useful to also have a SERIAL).
I think that, if you want to push a refactored UUID type for PostgreSQL
8.2, that you'd better separate your database design arguments from your
inclusion arguments.
For example, you might get my agreement that it would be useful to have a
UUID as a core type; you would *never* get my agreement to recommend using
UUID to newbies. I have seen *far* too many abuses of UUIDs in really
bad database design. People who use them should be experienced enough to
know what they're doing.
I agree with Josh on the UUID type, it gets abused far too often and (IMHO) isn't widely enough used to belong in the core. Couldn't you just fix the problem in pguuid rather than write a whole new type?
This sounds like a strawman argument. People abuse a lot of types to do a lot of things they shouldn't be doing, adding types to the core isn't really going to change that much one way or the other.
If the documentation gives the user a good idea of when to use UUID and when not, I think it would be a good addition. Worst case, it could become part of contrib, so at least it ships with PostgreSQL with the same license. The GPL license of pguuid is a show-stopper for some (and seems like a silly choice for what it is and who it's for).
-bob
The only time I've seen someone use UUIDs in PostgreSQL is when they were converting from SQL Server. I've seen many bad data models using UUID that could've/should've used normal sequences for portability.
I look forward to seeing you're code.
Thanks!
I look forward to seeing you're code.
Thanks!
On 9/6/05, nathan wagner <nw@hydaspes.if.org> wrote:
On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:
> I agree with Josh on the UUID type, it gets abused far too often
Out of curiosity, how does it get abused? It doesn't seem to me
that it would be any more prone to abuse than any other type.
> and (IMHO) isn't widely enough used to belong in the core.
Seems like a self-fulfilling cycle. There isn't one in core
or contrib (pguuid is not in contrib, IIRC), and so one doesn't
get used. Since no one uses one, it's not in core or contrib.
> Couldn't you just fix the
> problem in pguuid rather than write a whole new type?
Not sure which you you're addressing here, but I can't fix the license
problem. Nor can I readily fix the lack of a mac ioctl() to get
the hardware mac address.
In any case, from an economy of effort view, I'd much rather maintain
a postgres extension/interface to an existing uuid library, than fret
the details of writing a uuid library myself. UUIDs themselves don't
really interest me, I just happen to need one for my application.
I'm basically done. I'll tar it up and post a link to an alpha version in a
couple of hours. As far as i can tell, it works fine. I'm calling it alpha
because i'm not doing proper error checking, and i'm not sure how to log or
throw an error even if i detect one. Also, I have only compiled it on
Mac OS 10.3, I'll want to test it on my linux box.
--
Nathan Wagner
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
On Tue, Sep 06, 2005 at 03:57:55PM -0400, Jonah H. Harris wrote: > I agree with Josh on the UUID type, it gets abused far too often and (IMHO) > isn't widely enough used to belong in the core. There is much in PostgreSQL from my perspective that falls under the category of 'most advanced open source SQL server in the world', and not at all in the category of 'widely used'. Unless the history of PostgreSQL isn't to be accepted in terms of direction for PostgreSQL, I don't see why such a useful building block shouldn't be supported by the core. It may not be widely used, because it doesn't exist. Not yet, anyways. pguuid is broken, remember? Which isn't to say that it should be in the core, or it must be in the core. It is to say, however, that I don't buy your arguments. Abused how? How can you judge widely used for something that doesn't exist? I'm not very newbie oriented. I think I tend to assume that anybody using PostgreSQL must know what they are doing. This may be very wrong of me to assume. I assume you all know what you are doing, for example. I'm curious as to what sort of abuses you and Josh are speaking about. I'd particularly like to know if you think I am abusing it, as I don't believe that I am. I would also be interested in knowing what you felt would be an abuse with UUID, that wouldn't be an abuse with SERIAL. Should we tell newbies not to use SERIAL? Perhaps we should. Do we? > Couldn't you just fix the > problem in pguuid rather than write a whole new type? I'll submit my patches to pguuid once I've had some more production use of it. For the little bit of code that it was, there were some pretty bad bugs. It was sometimes crashing, due to what appears to be a palloc() that was one byte short (it reliably crashed on me, the line is wrong as written, and after adding + 1, it no longer crashes at all). The operator definitions weren't correct, and the indices on columns of the type were not being preferred by the query planner (a good thing too - if they were used for any operator other than '=', the errors in the operator definitions would have caused very unexpected results). So yes, pguuid can be fixed. I'm not sure that pguuid makes a good model for a PostgreSQL extension, but it could be resurrected and used. (I think the project showed no real updates since 2003?) But - it's GPL, limiting its use with regard to PostgreSQL distribution, and it requires special compilation, which as Nathan found out, doesn't work on MacOS. I think the GPL reason alone is a compelling reason to create a new extension. Using a more portable UUID base library (either written from scratch, or re-used from some other place providing a compatible license) is a compelling reason to create a new extension. As to whether it should be in core - I think that a fully functional module could be widely used, especially in larger systems that are having difficulty spreading transactions across multiple machines. This is the domain that UUID shines in. I can write a transaction at my site without being very worried that it will collide with a transaction at your site. We don't have to be connected in real time. In my choice of use, I'm using them instead of SERIAL columns, as I wish to have more freedom merging production data with test data. I wish to continually import production data into my test environment, in a single direction. UUID will prevent conflicts from occurring. SERIAL cannot (although in theory, I could set my copy of the serial value to 1 billion or something hacky - but that doesn't scale in a simple fashion to having several test environments). My other preferred use, is to expose a handle on the data to the world in select circumstances. I don't want to give them a SERIAL column, as it lets them be able to predict what else they might have access to. I'm not using it to secure the data (other routines will do this), but I am using it to hide the data. Why should people accessing my system know how many records exist in my tables? Why should they be able to predict the next value? Why should they be able to relate the data, or mine my data in the case that I allow read to all? Yes, I can avoid exposing the UUID/SERIAL in most circumstances. They can get at most of the data through a name based path. For a few of my pieces of data, though, I want to provide a secondary means of accessing the data that does not require a path. Names cause problems, especially if the names contain UNICODE characters, or if the names are very long. Giving them a reliably unique handle of fixed length is highly desirable to me in these circumstances. Anyways, I'm using a fixed up pguuid right now, and getting along fine. Nathan, under MacOS, isn't - with or without my patch. And I believe he stated he had issues with the GPL license. Josh may be correct, that in terms of a position statement, my arguments are all over the map, and ignorant of newbies. I'm not a speaker, or a writer. I'm a technical person who was frustrated with SERIAL, happy with UUID, frustrated to find pguuid broken, happy that it was fixable, frustrated with having to compile and install it separately, happy that there is interest from others (Nathan) in attempting to have this building block worked on, and perhaps eventually added to the core. Personally, I'm not sure what the big opposition to UUID is all about. Even abused, it can do a better job than SERIAL at quadruple the storage requirements (not a big deal when considering PostgreSQL's record overhead). If it'll make it more acceptable, I'll write the UUID implementation myself, and release it under whatever license you find most beneficial to PostgreSQL. :-) Cheers, mark > On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote: > > > > Mark, > > > > > I suggest that UUID be recommended in place of SERIAL for certain > > > classes of applications, and that it therefore belongs in the core. > > > UUID and SERIAL can be used together (although, once you have a UUID, > > > it may not be useful to also have a SERIAL). > > > > I think that, if you want to push a refactored UUID type for PostgreSQL > > 8.2, that you'd better separate your database design arguments from your > > inclusion arguments. > > > > For example, you might get my agreement that it would be useful to have a > > UUID as a core type; you would *never* get my agreement to recommend using > > UUID to newbies. I have seen *far* too many abuses of UUIDs in really > > bad database design. People who use them should be experienced enough to > > know what they're doing. > > > > -- > > --Josh > > > > Josh Berkus > > Aglio Database Solutions > > San Francisco > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 9: In versions below 8.0, the planner will ignore your desire to > > choose an index scan if your joining column's datatypes do not > > match > > -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Tue, Sep 06, 2005 at 09:16:13PM +0000, nathan wagner wrote: > On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote: > > I agree with Josh on the UUID type, it gets abused far too often > > Out of curiosity, how does it get abused? It doesn't seem to me > that it would be any more prone to abuse than any other type. A typical abuse, as I have seen it, is for OO coders on the front-end to turn the entire databse into what amounts to an associative array. It then becomes somewhere between difficult and impossible to get good performance, as UUID is the only thing useful as a PK/FK. > > and (IMHO) isn't widely enough used to belong in the core. > > Seems like a self-fulfilling cycle. There isn't one in core or > contrib (pguuid is not in contrib, IIRC), and so one doesn't get > used. Since no one uses one, it's not in core or contrib. In this case, not having it easy to get to is, IMHO, an *excellent* idea. The whole thing is a giant foot-gun. Just my $.02. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
On Sep 6, 2005, at 2:16 PM, nathan wagner wrote: > On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com > wrote: > >> I agree with Josh on the UUID type, it gets abused far too often >> > > Out of curiosity, how does it get abused? It doesn't seem to me > that it would be any more prone to abuse than any other type. > > >> and (IMHO) isn't widely enough used to belong in the core. >> > > Seems like a self-fulfilling cycle. There isn't one in core > or contrib (pguuid is not in contrib, IIRC), and so one doesn't > get used. Since no one uses one, it's not in core or contrib. > > >> Couldn't you just fix the >> problem in pguuid rather than write a whole new type? >> > > Not sure which you you're addressing here, but I can't fix the license > problem. Nor can I readily fix the lack of a mac ioctl() to get > the hardware mac address. It's not that hard to get a MAC address or serial number out of a Mac, the problem is I'm not going to bother writing that code for a GPL extension to PostgreSQL. There's a better way, anyhow. Darwin ships with UUID parse/generate/etc functions in its Libc (<uuid/ uuid.h>). FWIW, that code (the Theodore Ts'o UUID implementation) looks to be suitably licensed for PostgreSQL, and it actually it appears that someone has already written a PostgreSQL UUID type using this it... which is unfortunately LGPL'ed (why?!). -bob
On Tue, Sep 06, 2005 at 01:49:28PM -0700, David Fetter wrote: > On Tue, Sep 06, 2005 at 09:16:13PM +0000, nathan wagner wrote: > > On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote: > > > I agree with Josh on the UUID type, it gets abused far too often > > Out of curiosity, how does it get abused? It doesn't seem to me > > that it would be any more prone to abuse than any other type. > A typical abuse, as I have seen it, is for OO coders on the front-end > to turn the entire databse into what amounts to an associative array. > It then becomes somewhere between difficult and impossible to get good > performance, as UUID is the only thing useful as a PK/FK. Replace "UUID" with "SERIAL" in your above paragraph, and I think your argument is still true. Perhaps UUID makes it easier, as it crosses the table boundary - but somebody so inclined, can use *val() to define their own "database wide SERIAL identifier". I don't see anything intrinisic about SERIAL, UUID, or OO abstracting techniques that requires people to write inefficient code. People write inefficient code because they don't know any better. They don't properly look at how their data will be used, and what path the data will take to get from the tables on disk, to the user they are presenting the information to. If they had, in the case you describe, perhaps they'd be horrified. Or perhaps they would STILL not know any better. :-) Technically, the UUID is 4x the size of a SERIAL, or 2x the size of a SERIAL8. For this cost in terms of efficiency (table size, index size), you are purchasing the scaleability of being able to, at any point in the future, more easily (than SERIAL) merge your tables with other tables, and have a relatively opaque handle to give out to allow people to uniquely identify a collection of rows. For me, both are appealing. For others, one or the other may be more appealing. In terms of portability, perhaps (somebody else made this arguments). If other systems don't have a UUID type, you end screwed if you ever wished to ditch PostgreSQL. The reason I don't buy this argument, though, is that in my current application, I dumped being able to ditch PostgreSQL a long time ago. There are just too many neat things I can do with PostgreSQL, that I can't do with MySQL, and that I either can't, or don't want to do with Oracle. You're stuck with me now... :-) If people truly felt this way, there are many PostgreSQL extensions that should be phased out, rather than introduced. I'd like to point out the obvious - a UUID is just an identifier. As it isn't linked to the table, it doesn't require generation to be server-side. Any application can generate a UUID as two BIGINTs, and use these two BIGINTs as a primary key, and accomplish the same. Before I stumbled upon pguuid, and became interested in the rather impressive extension interfaces provided by PostgreSQL, I was considering doing this myself. The costs would have been more complicated queries, and limited practical use from the psql command line. I've become comfortable with the concept of a UUID over the last two or three years, as they have been a requirement for us to use for a product at my primary place of employment. We use a source management system with replicas at each site, that is synchronized periodically. We write software on top of this system to perform automated tasks at different sites (sometimes executing at the other sites), or that requires configuration information to be associated with the data stored in this source management system. In a previous product, we used names to access the data. This failed horribly when the names changed. We now frequently, and liberally use the UUID for the objects. This allows us to associate data with, access or manipulate the objects at any site, without ANY problem. To go back to a name based access method seems a step BACKWARDS. UUID definately has a place. It's only a question of the imagination of the user as to how abused, or how well used, it is. I think it deserves a spot in PostgreSQL, and that people would possibly use it more than they would SERIAL. Eventually, SERIAL wraps around. So you switch to SERIAL8. At the point that you have SERIAL8, you aren't worried terribly about disk space, and you realize there is usually no benefit at all to the numbers being ordered so closely. Why not pick a scheme that is based on time? Perhaps create a revision code field to deal with objects created simultaneously from the same source. And why not identify the source to prevent collisions from multiple sources? If we go from 8 bytes, to 16 bytes, we can encode all of this information neatly. Welcome, UUID. :-) mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
mark@mark.mielke.cc writes: > In my choice of use, I'm using them instead of SERIAL columns, as I > wish to have more freedom merging production data with test data. I > wish to continually import production data into my test environment, > in a single direction. UUID will prevent conflicts from occurring. > SERIAL cannot (although in theory, I could set my copy of the serial > value to 1 billion or something hacky - but that doesn't scale in a > simple fashion to having several test environments). Just do something like this for every sequence: ALTER SEQUENCE foo INCREMENT BY 100 And then choose a particular initial value for each server. I agree with the others that uuid seems to be overused in lots of cases where a simple serial would serve just as well. However I don't see why a good uuid type is any less of a valid addition to the set of data types than any of the others. inet, macaddr, all the geometric types, for example. Given the 3-phase commit work going into 8.2 (8.1?) it seems like a pretty fundamental component of that whole ball of wax. A note in the documentation that it's designed for use as part of a multiple-database system like that might be helpful. -- greg
mark@mark.mielke.cc writes: > Personally, I'm not sure what the big opposition to UUID is all about. I don't see any "big opposition". People are simply questioning the idea whether it belongs in core PG. The reason we don't want to accept everything-and-the-kitchen-sink in core is that we have only limited manpower to maintain it. So you've got to justify that we should spend our effort here and not elsewhere. There's a fair amount of nearly unmaintained cruft in the core distro already (eg, the never-finished "line" datatype ... or the entire rtree index module ...) and a datatype that might be used by only a few people is a likely candidate to become an unmaintained backwater. And yet it's hard to get rid of stuff that's been there awhile. So one of the questions that's going to be asked is how useful/popular it's really going to be. One thing that is raising my own level of concern quite a bit is the apparent portability issues. Code that isn't completely portable is a huge maintainability problem; in particular, stuff that requires system-dependent behavior used nowhere else in Postgres is a real pain. It sounds like the UUID code expects to be able to get at the machine's MAC address, which suggests serious issues in (a) relying on not-too-standard APIs, (b) possible protection issues (will an unprivileged process be able to get at the MAC address?), and (c) ill-defined behavior on machines with more or less than one MAC address. Not to mention that MAC addresses aren't so unique as all that. The bottom line is that we're willing to listen, but it's not by any means a slam dunk to get this into the distribution. regards, tom lane
On Tue, Sep 06, 2005 at 05:31:43PM -0400, Greg Stark wrote: > Just do something like this for every sequence: > ALTER SEQUENCE foo INCREMENT BY 100 > And then choose a particular initial value for each server. *shudder* But you are right. That would work. :-) (I shudder from the maintenance head-ache - if I re-cloned the test database, and reset the sequence to the wrong value, I would have overlap) > I agree with the others that uuid seems to be overused in lots of cases where > a simple serial would serve just as well. Yes. In March I only used serial columns (actually, I originally used the hacky MySQL auto_increment type before I migrated the database to PostgreSQL). I found problems under actual usage with regard to testing, and with exposing the numbers to the users. I started by introducing the uuid along side the serial column, but shortly after realized that it was ridiculous. I effectively had two primary keys, with two unique indexes. Drop the serial column, and I'm left with one that does everything I want. I now used mixed serial and uuid. I retained serial columns for space reasons. Serial makes it more common for some of my table primary key indexes or even tables themselves (enumerated type tables) to fit in fewer pages, improving search time. Some tables may cluster rows more naturally on a serial column (would depend if the uuid type was sorted by embedded time stamp, or by literal byte array value - memcmp()). For the objects represented by a collection of rows, that I expose to external interfaces, however, I only use the uuid. My point in this long and winded set of paragraphs, is that although I agree that uuid might be overused in some situations, I think people may be currently underusing it in others. They're both ways of generating unique identifiers. Depending on the context, one will be better than the other. They may both work for most circumstances. > However I don't see why a good uuid type is any less of a valid > addition to the set of data types than any of the others. inet, > macaddr, all the geometric types, for example. I've never used geometric types, and don't see myself doing so any time soon. :-) > Given the 3-phase commit work going into 8.2 (8.1?) it seems like a pretty > fundamental component of that whole ball of wax. A note in the documentation > that it's designed for use as part of a multiple-database system like that > might be helpful. Yep. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Tue, Sep 06, 2005 at 05:54:34PM -0400, Tom Lane wrote: > I don't see any "big opposition". People are simply questioning the > idea whether it belongs in core PG. The reason we don't want to accept > everything-and-the-kitchen-sink in core is that we have only limited > manpower to maintain it. So you've got to justify that we should spend > our effort here and not elsewhere. There's a fair amount of nearly > ... > been there awhile. So one of the questions that's going to be asked is > how useful/popular it's really going to be. Sounds reasonable, and certainly no more than I expected. If Nathan hadn't raised the issue, it probably would have been a few months before I raised it myself. > One thing that is raising my own level of concern quite a bit is the > apparent portability issues. Code that isn't completely portable is a > huge maintainability problem; in particular, stuff that requires > system-dependent behavior used nowhere else in Postgres is a real pain. > It sounds like the UUID code expects to be able to get at the machine's > MAC address, which suggests serious issues in (a) relying on > not-too-standard APIs, (b) possible protection issues (will an > unprivileged process be able to get at the MAC address?), and (c) > ill-defined behavior on machines with more or less than one MAC address. > Not to mention that MAC addresses aren't so unique as all that. I'll try to prepare an answer for this. (I started to write a lot of information - but is it unverified from memory, and perhaps should be more authoritative before presented as truth) > The bottom line is that we're willing to listen, but it's not by any > means a slam dunk to get this into the distribution. Sounds good. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
As promised a link to the code. http://granicus.if.org/~nw/ossp_pg_uuid-0.1.tar.gz You'll also need Ralf Engelschall's uuid library, which mine is a postgres interface to. It's available at ftp://ftp.ossp.org/pkg/lib/uuid/uuid-1.3.0.tar.gz It probably has a few warts. I'm mainly posting it for suggestions, comments, and so we have something that isn't vaporware to argue about. :) -- Nathan Wagner
On Tue, Sep 06, 2005 at 05:54:34PM -0400, tgl@sss.pgh.pa.us wrote: > One thing that is raising my own level of concern quite a bit is the > apparent portability issues. I can't speak to the portability in general, but there is a PORTING file in the ossp uuid library that states OSSP uuid was already written with maximum portability in mind, so there should be no great effort required to get it runningon any Unix platform with a reasonable POSIX API. Additionally, the portability was tested by successfully buildingand running it on the following particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"): alpha-tru644.0 (cc) alpha-tru645.1 (gcc, cc) hppa-hpux11.11 (cc) ia64-hpux11.23 (cc) ix86-debian2.2 (gcc, icc) ix86-debian3.0 (gcc) ix86-debian3.1 (gcc) ix86-freebsd4.9 (gcc) ix86-freebsd5.2 (gcc, icc) ix86-netbsd1.6 (gcc) ix86-qnx6.2 (gcc) ix86-solaris10 (gcc) ix86-unixware7.1.3 (cc) mips64-irix6.5 (gcc) sparc64-solaris8 (gcc, forte) sparc64-solaris9 (gcc) On my end I managed to compile it with nothing more than a "configure", "make", followed by a "make install". > Code that isn't completely portable is a huge maintainability problem; in > particular, stuff that requires system-dependent behavior used nowhere > else in Postgres is a real pain. It sounds like the UUID code expects to > be able to get at the machine's MAC address, If the mac address isn't available, I believe it falls back on using a random 47 bit number with the 48th bit set to make the mac address fall within the multicast mac numberspace. You could also use a version 4 uuid, or derive a version 3 or 5 uuid from some other source. The better answer though, is these sort of questions are exactly why I would prefer to rely on someone else's library. Just as I basically trust that the folks maintaining postgres aren't going to munge my tables and destroy my data if i mess up a transaction and roll it back, because they've spent time thinking about just that sort of problem, I also (having worked with the code a bit now) trust the UUID folks to have thought about "just how do we make a unique number without centralized coordination?" "base on the mac address?" "what if we don't have one? or don't know it for some reason?" I assume here that the answer they came up with wasn't "oh, hell, just return a 42 then". The point being, that other people have already written a better uuid library than i am likely to, so, license permitting, let's use it. > The bottom line is that we're willing to listen, but it's not by any > means a slam dunk to get this into the distribution. Fair enough. Personally, I think it should be a core type, but would be quite happy if it were in contrib. At least that way it would save the next guy from having to hunt around the net. I guess i'm volunteering to maintain it in contrib. I'm not certain if i have the requisite knowledge to maintain it in the core. While I could acquire the familiarity if need be, for the next year and nine months law school is going to take up the bulk of my free time. And of course I'll still need time to play around with my ticketing and gis databases i'm developing. -- Nathan Wagner
On Sep 6, 2005, at 3:06 PM, mark@mark.mielke.cc wrote: > On Tue, Sep 06, 2005 at 05:54:34PM -0400, Tom Lane wrote: > >> I don't see any "big opposition". People are simply questioning the >> idea whether it belongs in core PG. The reason we don't want to >> accept >> everything-and-the-kitchen-sink in core is that we have only limited >> manpower to maintain it. So you've got to justify that we should >> spend >> our effort here and not elsewhere. There's a fair amount of nearly >> ... >> been there awhile. So one of the questions that's going to be >> asked is >> how useful/popular it's really going to be. >> > > Sounds reasonable, and certainly no more than I expected. If Nathan > hadn't raised the issue, it probably would have been a few months > before I raised it myself. > > >> One thing that is raising my own level of concern quite a bit is the >> apparent portability issues. Code that isn't completely portable >> is a >> huge maintainability problem; in particular, stuff that requires >> system-dependent behavior used nowhere else in Postgres is a real >> pain. >> It sounds like the UUID code expects to be able to get at the >> machine's >> MAC address, which suggests serious issues in (a) relying on >> not-too-standard APIs, (b) possible protection issues (will an >> unprivileged process be able to get at the MAC address?), and (c) >> ill-defined behavior on machines with more or less than one MAC >> address. >> Not to mention that MAC addresses aren't so unique as all that. >> > > I'll try to prepare an answer for this. (I started to write a lot of > information - but is it unverified from memory, and perhaps should be > more authoritative before presented as truth) Some modern UUID implementations prefer /dev/urandom or similar to the time or MAC address unless you really beg them to give you a weaker UUID. You can take a look at the man page for the Theodore Y. Ts'o implementation that is in Darwin's Libc here: http://developer.apple.com/documentation/Darwin/Reference/ManPages/ man3/uuid_generate.3.html Specifically: The uuid_generate function creates a new universally unique identifier (UUID). The uuid will be generated based on high-quality randomness from /dev/urandom, if available. If it is not available, then uuid_generate will use an alternative algorithm which uses the current time, the local ethernet MAC address (if available), and random data generated using a pseudo-random generator. The Apache Portable Runtime has a apr_os_uuid_get() that supports two flavors of UUID for unix (Linux/Mac OS X uuid_generate and FreeBSD's uuid_create, may be available elsewhere), and the UuidCreate API on Win32. apr-util's apr_uuid_get() will use apr_os_uuid_get() if available, and otherwise will default to a relatively weak mostly- timestamp-based UUID. It would probably be reasonable and easy to do what Apache does here. A platform UUID implementation, if present, is generally going to be better than anything included into PostgreSQL itself. -bob
Tom, you worded my thoughts much better than I did.
Bob, I too had heard that host-based UUIDs/GUIDs had issues with uniqueness. I think Microsoft's implementation was hosed and they ended up eliminating using the MAC completely.
I'll check out the code & get back.
Bob, I too had heard that host-based UUIDs/GUIDs had issues with uniqueness. I think Microsoft's implementation was hosed and they ended up eliminating using the MAC completely.
I'll check out the code & get back.
On 9/6/05, Bob Ippolito <bob@redivi.com> wrote:
On Sep 6, 2005, at 3:06 PM, mark@mark.mielke.cc wrote:
> On Tue, Sep 06, 2005 at 05:54:34PM -0400, Tom Lane wrote:
>
>> I don't see any "big opposition". People are simply questioning the
>> idea whether it belongs in core PG. The reason we don't want to
>> accept
>> everything-and-the-kitchen-sink in core is that we have only limited
>> manpower to maintain it. So you've got to justify that we should
>> spend
>> our effort here and not elsewhere. There's a fair amount of nearly
>> ...
>> been there awhile. So one of the questions that's going to be
>> asked is
>> how useful/popular it's really going to be.
>>
>
> Sounds reasonable, and certainly no more than I expected. If Nathan
> hadn't raised the issue, it probably would have been a few months
> before I raised it myself.
>
>
>> One thing that is raising my own level of concern quite a bit is the
>> apparent portability issues. Code that isn't completely portable
>> is a
>> huge maintainability problem; in particular, stuff that requires
>> system-dependent behavior used nowhere else in Postgres is a real
>> pain.
>> It sounds like the UUID code expects to be able to get at the
>> machine's
>> MAC address, which suggests serious issues in (a) relying on
>> not-too-standard APIs, (b) possible protection issues (will an
>> unprivileged process be able to get at the MAC address?), and (c)
>> ill-defined behavior on machines with more or less than one MAC
>> address.
>> Not to mention that MAC addresses aren't so unique as all that.
>>
>
> I'll try to prepare an answer for this. (I started to write a lot of
> information - but is it unverified from memory, and perhaps should be
> more authoritative before presented as truth)
Some modern UUID implementations prefer /dev/urandom or similar to
the time or MAC address unless you really beg them to give you a
weaker UUID.
You can take a look at the man page for the Theodore Y. Ts'o
implementation that is in Darwin's Libc here:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/
man3/uuid_generate.3.html
Specifically:
The uuid_generate function creates a new universally unique
identifier
(UUID). The uuid will be generated based on high-quality
randomness
from /dev/urandom, if available. If it is not
available, then
uuid_generate will use an alternative algorithm which uses
the current
time, the local ethernet MAC address (if available), and
random data
generated using a pseudo-random generator.
The Apache Portable Runtime has a apr_os_uuid_get() that supports two
flavors of UUID for unix (Linux/Mac OS X uuid_generate and FreeBSD's
uuid_create, may be available elsewhere), and the UuidCreate API on
Win32. apr-util's apr_uuid_get() will use apr_os_uuid_get() if
available, and otherwise will default to a relatively weak mostly-
timestamp-based UUID.
It would probably be reasonable and easy to do what Apache does
here. A platform UUID implementation, if present, is generally going
to be better than anything included into PostgreSQL itself.
-bob
mark@mark.mielke.cc writes: > Eventually, SERIAL wraps around. So you switch to SERIAL8. At the point that > you have SERIAL8, you aren't worried terribly about disk space, and you > realize there is usually no benefit at all to the numbers being ordered so > closely. a) Except for trivially small applications you are _always_ worried terribly about disk space. The more money you spendon high end raid arrays the *more* you're worried about an incremental drain on performance. b) You only have to go to SERIAL8 for the few tables that actually have that many records. You may well still have SERIALdata types in 99% of your tables. c) You underestimate the cost of the added space. Don't forget it's not just an extra few bytes in the primary key. It'salso quadrupling the size of your primary key index (doubling over bigint). Most importantly it's also adding a few extra bytes to every foreign key column in every table that references that primarykey. For very relational databases with tables doing things like many-to-many joins or having 4+ foreign key referencingcolumns increasing all those integers to be 16 bytes increases the size of your database *immensely*. > Why not pick a scheme that is based on time? Perhaps create a revision code > field to deal with objects created simultaneously from the same source. And > why not identify the source to prevent collisions from multiple sources? If > we go from 8 bytes, to 16 bytes, we can encode all of this information > neatly. Welcome, UUID. :-) Welcome to exactly the abuse that people are fearing if it were included as a built-in type. Personally I don't think the argument that some people might abuse it is a good reason not to provide it. There are uses for which it's very effective -- even necessary. And I think it's important enough for the people that need it that it should be considered a fundamental database feature these days. The people who will abuse it (like yourself, imho) will always exist and the more powerful the tool the bigger the holes in their feet. Shouldn't stop us from having powerful tools when we need them. -- greg
Bob, People, Let me clarify my stance here, because it seems to be getting misrepresented. Mark (and Nathan) pushed at repaired UUID type for possible inclusion in the core PostgreSQL distribution. I'm not opposed to that, provided that the portability, licensing, and bugs are worked out. Why not? We have ipv6 data types, after all. However, Mark went on to suggest that we should recommend UUID over SERIAL in the docs, and that we could consider dropping SERIAL entirely in favor of UUID: ---quoth Mark------------------ I suggest that UUID be recommended in place of SERIAL for certain classes of applications, and that it therefore belongs in the core. UUID and SERIAL can be used together (although, once you have a UUID, it may not be useful to also have a SERIAL). --------------------------------- This was what I objected to; I believe that the use-case for UUIDs is actually quite narrow and assert that it's a very bad idea to promote them to most users. I have a "problem" with SERIAL abuse, too. In general, new DB designers have come to increasingly believe that surrogate keys (SERIALs, UUIDs, hash ids etc.) are an intrinsic part of the relational model and a requirement for all tables. Terrible database designs have resulted, chock full of tables which lack real keys and cannot be normalized. UUIDs tend to encourage this sort of behavior even more than SERIALs, not because of any intrinsic quality in the data type, but because much of the literature on the subject treats them like some kind of "universal object identifier" and not distinguishing servers, relations, or real keys. To repeat, though, this isn't a reason to keep them out of core, but it *is* a reason not to throw them at newbies as the holy grail of row identifiers. For my part, I generally push implementing the UUID concept in a better way that keeps server, table, and surrogate keys atomic (and thus more useful and easier to debug). -- --Josh Josh Berkus Aglio Database Solutions San Francisco
On Sep 6, 2005, at 6:02 PM, Josh Berkus wrote: > Bob, People, > > Let me clarify my stance here, because it seems to be getting > misrepresented. > > Mark (and Nathan) pushed at repaired UUID type for possible > inclusion in > the core PostgreSQL distribution. I'm not opposed to that, > provided that > the portability, licensing, and bugs are worked out. Why not? We > have > ipv6 data types, after all. > > However, Mark went on to suggest that we should recommend UUID over > SERIAL > in the docs, and that we could consider dropping SERIAL entirely in > favor > of UUID: > > ---quoth Mark------------------ > I suggest that UUID be recommended in place of SERIAL for certain > classes of applications, and that it therefore belongs in the core. > UUID and SERIAL can be used together (although, once you have a > UUID, it may not be useful to also have a SERIAL). > --------------------------------- > > This was what I objected to; I believe that the use-case for UUIDs is > actually quite narrow and assert that it's a very bad idea to > promote them > to most users. I agree with you (Josh) completely, which is why I said: "If the documentation gives the user a good idea of when to use UUID and when not, I think it would be a good addition." .. the fact that the use-cases are narrow was implicit :) Everything else I talked about was just implementation details. Summary: there are (several) UUID implementations out there that are appropriately licensed and easy enough to use, and a lot of OSes ship with pretty good implementations already. Creating a decent UUID type should be relatively trivial, as far as those things go. -bob
On Tue, Sep 06, 2005 at 06:02:50PM -0700, Josh Berkus wrote: > However, Mark went on to suggest that we should recommend UUID over SERIAL > in the docs, and that we could consider dropping SERIAL entirely in favor > of UUID: > ---quoth Mark------------------ > I suggest that UUID be recommended in place of SERIAL for certain > classes of applications, and that it therefore belongs in the core. > UUID and SERIAL can be used together (although, once you have a > UUID, it may not be useful to also have a SERIAL). > --------------------------------- > This was what I objected to; I believe that the use-case for UUIDs is > actually quite narrow and assert that it's a very bad idea to promote them > to most users. Ahhh... :-) I intended the word 'certain' to be emphasized in some way, rather than dropped. There are genuine uses for UUIDs. I didn't intend for everybody to pull out their database definitions and change them all to use UUID instead of SERIAL. Although - I don't think really bad things would happen if people did. They would simply be making a non-optimal choice (abusing the type?). Certainly nothing they weren't capable of before this particular capability were to arrive. :-) > I have a "problem" with SERIAL abuse, too. In general, new DB designers > have come to increasingly believe that surrogate keys (SERIALs, UUIDs, > hash ids etc.) are an intrinsic part of the relational model and a > requirement for all tables. Terrible database designs have resulted, > chock full of tables which lack real keys and cannot be normalized. > UUIDs tend to encourage this sort of behavior even more than SERIALs, not > because of any intrinsic quality in the data type, but because much of the > literature on the subject treats them like some kind of "universal object > identifier" and not distinguishing servers, relations, or real keys. > To repeat, though, this isn't a reason to keep them out of core, but it > *is* a reason not to throw them at newbies as the holy grail of row > identifiers. I agree. Although I lost it on the "cannot be normalized". I'm assuming there are designs you have seen much worse than the ones I have seen. :-) > For my part, I generally push implementing the UUID concept in a better way > that keeps server, table, and surrogate keys atomic (and thus more useful > and easier to debug). My eyes are glazing over a bit on this last one. Atomic? Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
Mark, > I agree. Although I lost it on the "cannot be normalized". I'm assuming > there are designs you have seen much worse than the ones I have seen. > :-) Mostly it's the problem of tables that don't have a real key, only a surrogate key. How do you know what's a duplicate? > > For my part, I generally push implementing the UUID concept in a > > better way that keeps server, table, and surrogate keys atomic (and > > thus more useful and easier to debug). > > My eyes are glazing over a bit on this last one. Atomic? Sure. What's a UUID, after all? It's three pieces of information: 1) A server or database instance identifier 2) A table identifier 3) A row identifier e.g.: chayote.sf.agliodbs.com | public.customers | 4271 Generally, it's pretty easy to just grab these 3 pieces of information separately and pass them with the data when you want to transmit between servers. If there are space/overhead considerations, you can put them into a reversable hash. I've never understood the complex measures which application developers take to create "universal" IDs. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Josh Berkus > Sent: Tuesday, September 06, 2005 6:40 PM > To: mark@mark.mielke.cc > Cc: Bob Ippolito; jonah.harris@gmail.com; pgsql-hackers@postgresql.org; > nathan wagner > Subject: Re: [HACKERS] uuid type for postgres > > Mark, > > > I agree. Although I lost it on the "cannot be normalized". I'm assuming > > there are designs you have seen much worse than the ones I have seen. > > :-) > > Mostly it's the problem of tables that don't have a real key, only a > surrogate key. How do you know what's a duplicate? > > > > For my part, I generally push implementing the UUID concept in a > > > better way that keeps server, table, and surrogate keys atomic (and > > > thus more useful and easier to debug). > > > > My eyes are glazing over a bit on this last one. Atomic? > > Sure. What's a UUID, after all? It's three pieces of information: > 1) A server or database instance identifier > 2) A table identifier > 3) A row identifier > > e.g.: > chayote.sf.agliodbs.com | public.customers | 4271 > > Generally, it's pretty easy to just grab these 3 pieces of information > separately and pass them with the data when you want to transmit between > servers. If there are space/overhead considerations, you can put them > into a reversable hash. I've never understood the complex measures which > application developers take to create "universal" IDs. What if you don't want to move anything from one system to another, but you would like to do joins across many servers on a network? It happens all the time. If you have a built in primary key for related tables and they are guaranteed unique, then you have a leg up. It is not unusual for the same schema to be used in many different regional offices, but to have site specific data in each area.
On Tue, Sep 06, 2005 at 06:40:27PM -0700, josh@agliodbs.com wrote: > Sure. What's a UUID, after all? It's three pieces of information: > 1) A server or database instance identifier > 2) A table identifier > 3) A row identifier > > e.g.: > chayote.sf.agliodbs.com | public.customers | 4271 > > Generally, it's pretty easy to just grab these 3 pieces of information > separately and pass them with the data when you want to transmit between > servers. If there are space/overhead considerations, you can put them > into a reversable hash. Take a look at the version 3 or version 5 UUIDs. They essentially do this. The hash isn't reversable, but rather recreatable. Seems that if it were reversable, it would be compression, not a hash. Anyway. You're assuming though that you want to leak this information. If you do, or you don't care, go ahead with the v3 or v5 uuid creator. Or use v1 if you want mac/time based uuids, or v4 for random uuids. The convenient thing about using the uuid library is that all of these mechanisms result in the same type, and are thus compatible. > I've never understood the complex measures which > application developers take to create "universal" IDs. Different applications have different goals. Unguessability might be important in some contexts. -- Nathan Wagner
Just an FYI: We also ended up rolling our own uuid type, against libuuid. It seems that uuid is a widespread enough concept that implementors *will* be asked to support it, moderately often. We *could* roll our own (were capable), others are not so lucky, and would have to point out the lack of a uuid type as a limitation of pgsql. Which is too bad, given how relatively simple they are. That said: - linking against libuuid is fine for a contrib/ extension, but no good for a built-in type. A real uuid would have to do a proper independent implementation of uuid creation within pgsql. - we cannot snarf libuuid code, it is LGPL (though perhaps the author would re-license. if that is the *only* objection, it is well worth asking) I think having a built-in uuid type is something that a large number of people will use. Whether they use it will or badly is not our problem. It is possible to build crappy databases with all the types that already exist, adding uuid is hardly going to bring the walls down. Having uuid removes another excuse for people not doing a pgsql implementation. I am not sure if I heard clearly from the core team that a self- contained, BSD-licensed uuid would be accepted(able)? If so, I'll contact the libuuid author about a re-license (shortest path from A to B). P. On 6-Sep-05, at 6:50 AM, nathan wagner wrote: > > > I have been in need of a uuid type and ran across the pguuid download > by Xiongjian (Mike) Wang. This wasn't really useful to me for two > reasons: first, it is GPLed and I would prefer a more liberal license, > secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a > SIOCGETIFHWADDR (? i think, i can get the exact name if you want it) > ioctl() under darwin. > > While I could dike out the code that calls it, that seems like a > suboptimal > solution. So after a bit of poking around the interweb i ran across > Ralf Engelschall's ossp uuid library. This compiled with minimal > effort on mac os. Some reading, and an evening later, i've made > a server plugin with supporting SQL that implements an 'ossp_uuid' > type. > > Now i have four questions: > > 1: Is it feasible for this to be included in the contrib section of > the regular download? The uuid library is a "notice of copyright" > style > license, and I am willing to put my own code into the public domain. > > 2: Would just calling the type 'uuid' be better than 'ossp_uuid'? > It's > certainly a nicer name. > > 3: Would it be possible to include such a type as a postgres extension > to the usual SQL types. It seems to me that having an officially > supported type would be better than a user contributed type on the > grounds > that you could then rely on it being avaiable if postgres was. > In particular, installing it as an extension would require the > cooperation > of the DBA, which may be infeasible in some environments. > > -- > Nathan Wagner > > >
Paul Ramsey <pramsey@refractions.net> writes: > I am not sure if I heard clearly from the core team that a self- > contained, BSD-licensed uuid would be accepted(able)? I don't think any of the other core members weighed in on this thread, so speaking strictly for myself: BSD license is an issue, and portability is an issue. I've heard enough to convince me that the demand exists, but the legal and technical issues remain. regards, tom lane
On Tue, Sep 06, 2005 at 08:40:08PM -0700, pramsey@refractions.net wrote: > - linking against libuuid is fine for a contrib/ extension, but no > good for a built-in type. A real uuid would have to do a proper > independent implementation of uuid creation within pgsql. Why? I'm not sure what the default build instructions are, but i've got the following on my linux box... granicus:~% ldd /usr/bin/postgres linux-gate.so.1 => (0x00368000) libpam.so.0 => /lib/libpam.so.0 (0x00311000) libssl.so.5 => /lib/libssl.so.5 (0x009b3000) libcrypto.so.5 => /lib/libcrypto.so.5 (0x00507000) libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x003ab000) libz.so.1 => /usr/lib/libz.so.1 (0x00c55000) libreadline.so.5=> /usr/lib/libreadline.so.5 (0x00c6a000) libtermcap.so.2 => /lib/libtermcap.so.2 (0x00dfa000) libcrypt.so.1 => /lib/libcrypt.so.1 (0x009f7000) libresolv.so.2 => /lib/libresolv.so.2 (0x0014e000) libnsl.so.1=> /lib/libnsl.so.1 (0x00a62000) libdl.so.2 => /lib/libdl.so.2 (0x00c4f000) libm.so.6 => /lib/libm.so.6(0x00c29000) libc.so.6 => /lib/libc.so.6 (0x00afd000) libcom_err.so.2 => /lib/libcom_err.so.2 (0x0038e000) libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x0099a000) libk5crypto.so.3 => /usr/lib/libk5crypto.so.3(0x00483000) /lib/ld-linux.so.2 (0x00adf000) libkrb5support.so.0 => /usr/lib/libkrb5support.so.0(0x0041f000) Quite a list. I wonder what readline is doing there. It would appear that linking against libraries is just fine, I don't see that uuid is any different. > - we cannot snarf libuuid code, it is LGPL (though perhaps the author > would re-license. if that is the *only* objection, it is well worth > asking) Not sure what uuid library you were using, but the one i used is not LGPL. I posted earlier the copyright and license portion of the readme. No relicensing would be necessary. -- Nathan Wagner
# tgl@sss.pgh.pa.us / 2005-09-06 17:54:34 -0400: > There's a fair amount of nearly > unmaintained cruft in the core distro already (eg, the never-finished > "line" datatype ... or the entire rtree index module ...) and a datatype > that might be used by only a few people is a likely candidate to become > an unmaintained backwater. And yet it's hard to get rid of stuff that's > been there awhile. So one of the questions that's going to be asked is > how useful/popular it's really going to be. We'd have use for uuid in tables of N*10^6 rows (N<10 in most cases). I'm far from claiming to be an experienced C programmer, but count me in for whatever I'll be able to do. I think that coming up with code that meets the general criteria for inclusion in PostgreSQL first, before it's consideredfor inclusion, is a reasonable thing to do. > One thing that is raising my own level of concern quite a bit is the > apparent portability issues. That's understood. -- How many Vietnam vets does it take to screw in a light bulb? You don't know, man. You don't KNOW. Cause you weren't THERE. http://bash.org/?255991
Nathan, > Take a look at the version 3 or version 5 UUIDs. They essentially do this. > The hash isn't reversable, but rather recreatable. Seems that if it > were reversable, it would be compression, not a hash. Anyway. Hmmm, yes, true. > You're assuming though that you want to leak this information. If you do, > or you don't care, go ahead with the v3 or v5 uuid creator. > > Or use v1 if you want mac/time based uuids, or v4 for random uuids. > > The convenient thing about using the uuid library is that all of these > mechanisms result in the same type, and are thus compatible. That's good, it gives users options. And I can see why you don't want to re-create the functionality in PG code, it's probably pretty large. > > I've never understood the complex measures which > > application developers take to create "universal" IDs. > > Different applications have different goals. Unguessability might > be important in some contexts. Well, IMHO, if the unguessability of UUIDs is an important part of your spec, you have some design problems. But it's not my job to lock up the foot-guns, just to make sure they come with warnings ... -- Josh Berkus Aglio Database Solutions San Francisco
Nathan, > > - linking against libuuid is fine for a contrib/ extension, but no > > good for a built-in type. A real uuid would have to do a proper > > independent implementation of uuid creation within pgsql. > > Why? I think the issue is portability. Remember that this type needs to work on Windows as well as all POSIX platforms and AIX. > I'm not sure what the default build instructions are, but i've got > the following on my linux box... > Quite a list. I wonder what readline is doing there. Readline is for PSQL command completion and history. As for the rest, they are *optional* modules that apparently your RPM builder chose to include; I build from source and my only dependencies are bison, flex, gcc and perl. > It would appear that linking against libraries is just fine, I don't > see that uuid is any different. Is it portable to all platforms we support? How does it generate its machine identifier? One of the differences between an add-in and core code is support for all PostgreSQL platforms. -- Josh Berkus Aglio Database Solutions San Francisco
On Wed, Sep 07, 2005 at 09:45:17AM -0700, josh@agliodbs.com wrote: > I think the issue is portability. Remember that this type needs to work on > Windows as well as all POSIX platforms and AIX. I had forgotten about windows. I'll see to what extent the library i'm using is portable to windows. > > It would appear that linking against libraries is just fine, I don't > > see that uuid is any different. > > Is it portable to all platforms we support? I don't know yet. It could be made to be so though. I don't have any way to compile on windows, so someone would have to be willing to try that. Same with AIX. If someone with an AIX box could download and test it out I would appreciate a compatibility report. > How does it generate its machine identifier? No idea. Does it matter? Not having to fret this kind of detail is the advantage of using someone else's library. > One of the differences between an add-in and core code is support for all > PostgreSQL platforms. So would including my type in contrib be possible then? To me, that is perhaps more important than making it a core type, though i think there should be a core UUID type as well. I'm testing out elog(). I'll post a new version later today. There really won't be any new functionality, it's just code clean up. -- Nathan Wagner
On Wed, Sep 07, 2005 at 09:45:17AM -0700, Josh Berkus wrote: > > Nathan wrote: > > > Quite a list. I wonder what readline is doing there. > > Readline is for PSQL command completion and history. As for the rest, they > are *optional* modules that apparently your RPM builder chose to include; I > build from source and my only dependencies are bison, flex, gcc and perl. Still seems odd to me: I would expect psql to have readline, not postgres. Cheers, Patrick
On Wed, Sep 07, 2005 at 09:48:08AM -0700, josh@agliodbs.com wrote: > That's good, it gives users options. And I can see why you don't want to > re-create the functionality in PG code, it's probably pretty large. It would also be something else that would have to be maintained and debugged. Which i would be starting from scratch. The OSSP UUID library is at version 1.3. From this I infer that he has spent some time maintaining and debugging it. The less duplication of code and effort the better. There may be a problem with using the library as a core type though, due to windows portability concerns. That said, it would probably be easier to port the underlying library than to start from scratch. > Well, IMHO, if the unguessability of UUIDs is an important part of your spec, > you have some design problems. I was thinking of cryptographic applications. Then, to use an example from another domain, initial TCP sequence numbers should be random (i.e. unguessable). A problem with TCP perhaps. It's been a while since i've read over my copy of _Applied Cryptography_, but I seem to recall that unguessable numbers were sometimes useful. -- Nathan Wagner
Nathan, > I was thinking of cryptographic applications. Then, to use an > example from another domain, initial TCP sequence numbers should > be random (i.e. unguessable). A problem with TCP perhaps. It's > been a while since i've read over my copy of _Applied Cryptography_, > but I seem to recall that unguessable numbers were sometimes useful. Yeah, the problem is what it does to your data. If there's no analyzable correspondence between the key and the server/table/row it attaches to, then you have no way to detect if the key and the data have become disassociated. This is, btw, a problem with conventional SERIALs as well. "unguessable" UUIDs just compound the problem by adding additional variables (the server and the table), as well as generally leading to applications which depend *entirely* on UUID-based integrity and thus can't cope with any failures in UUID generation or transmission. I'm also a little baffled to come up with any real application where making an id number for most tables "unguessable" would provide any kind of real protection not far better provided by other means. For your "users" table, sure, but that's a very special case. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
I have made a new version, and made a web page for it. http://granicus.if.org/~nw/uuid/ Given the statement that it won't be accepted for contrib or core unless it compiles on windows, I guess I won't really have anything further to say on the topic. I don't have any way to compile on windows, so it's not an obstacle I can readily address. Comments for improvements are welcome. Other sorts of comments on the code are welcome as well. I hope someone else will find it useful. I won't post any further update/release notices to the list, they're not really on topic, I posted these two so that folks could look at the code i was proposing if they wished. -- Nathan Wagner
Josh Berkus <josh@agliodbs.com> writes: > One of the differences between an add-in and core code is support for all > PostgreSQL platforms. These days, things won't get into contrib either if they don't work on all the buildfarm machines. regards, tom lane
nathan wagner <nw@hydaspes.if.org> writes: > On Wed, Sep 07, 2005 at 09:45:17AM -0700, josh@agliodbs.com wrote: >> How does it generate its machine identifier? > No idea. Does it matter? Not having to fret this kind of detail is > the advantage of using someone else's library. It absolutely matters, because without knowing that we can have no idea what sort of portability issues we might face with it. I don't buy the "not having to fret" line in the least. regards, tom lane
On Wed, Sep 07, 2005 at 06:35:51PM +0000, nathan wagner wrote: > I have made a new version, and made a web page for it. > > http://granicus.if.org/~nw/uuid/ > > Given the statement that it won't be accepted for contrib or core > unless it compiles on windows, I guess I won't really have anything > further to say on the topic. I don't have any way to compile on windows, > so it's not an obstacle I can readily address. > > Comments for improvements are welcome. Other sorts of comments on the code are > welcome as well. I hope someone else will find it useful. > > I won't post any further update/release notices to the list, they're > not really on topic, I posted these two so that folks could look at the code > i was proposing if they wished. Rather than just give up on it, why not create a project in pgFoundry? I don't really see why this *has* to be in core (or contrib) anyway. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Patrick Welche <prlw1@newn.cam.ac.uk> writes: > Still seems odd to me: I would expect psql to have readline, not postgres. That's because we generate just one LIBS list and use it for all the executables we build. Autoconf makes it a bit difficult to do otherwise. There is an option in the linux linker to suppress ldd references to shared libraries that aren't actually used by the program, but when we tried to use it we found out it was pretty buggy (see archives). regards, tom lane
Nathan, > http://granicus.if.org/~nw/uuid/ > > Given the statement that it won't be accepted for contrib or core > unless it compiles on windows, I guess I won't really have anything > further to say on the topic. I don't have any way to compile on > windows, so it's not an obstacle I can readily address. Wanna start a pgFoundry project so that someone *else* can do the Windows version, if they want? -- --Josh Josh Berkus Aglio Database Solutions San Francisco
On Wed, Sep 07, 2005 at 06:48:41PM +0100, Patrick Welche wrote: > On Wed, Sep 07, 2005 at 09:45:17AM -0700, Josh Berkus wrote: > > > Nathan wrote: > > > > > Quite a list. I wonder what readline is doing there. > > > > Readline is for PSQL command completion and history. As for the rest, they > > are *optional* modules that apparently your RPM builder chose to include; I > > build from source and my only dependencies are bison, flex, gcc and perl. > > Still seems odd to me: I would expect psql to have readline, not postgres. It's an issue with the linker. I seem to remember that 8.1 uses the --as-needed flag to ld, which means it will only link the libraries that each executable actually uses. Hmm, poking my build, this doesn't seem to be the case: $ ldd `which postgres` linux-gate.so.1 => (0xffffe000) libz.so.1 => /usr/lib/libz.so.1 (0x41c54000) libreadline.so.5=> /lib/libreadline.so.5 (0xb7f51000) libcrypt.so.1 => /lib/tls/libcrypt.so.1 (0xb7f23000) libresolv.so.2=> /lib/tls/libresolv.so.2 (0x41c6a000) libnsl.so.1 => /lib/tls/libnsl.so.1 (0x41c25000) libdl.so.2=> /lib/tls/libdl.so.2 (0x4118f000) libm.so.6 => /lib/tls/libm.so.6 (0x41153000) libc.so.6 => /lib/tls/libc.so.6(0x41019000) libncurses.so.5 => /lib/libncurses.so.5 (0xb7ee0000) /lib/ld-linux.so.2 (0x41000000) This is a fairly recent 8.1, maybe post-beta1. Not sure what happened. I certainly don't see the --as-needed in LDFLAGS: $ pg_config | grep LDFLA LDFLAGS = -Wl,-rpath,/pg/install/00orig/lib LDFLAGS_SL = -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "¿Que diferencia tiene para los muertos, los huérfanos, y aquellos que han perdido su hogar, si la loca destrucción ha sido realizada bajo el nombre del totalitarismo o del santo nombre de la libertad y la democracia?" (Gandhi)
On Sep 7, 2005, at 10:04 AM, nathan wagner wrote: > On Wed, Sep 07, 2005 at 09:45:17AM -0700, josh@agliodbs.com wrote: > > >> I think the issue is portability. Remember that this type needs >> to work on >> Windows as well as all POSIX platforms and AIX. >> > > I had forgotten about windows. I'll see to what extent the library > i'm > using is portable to windows. You don't need to use a non-standard library for a lot of platforms, you're pretty much guaranteed UUID support for (at least): Linux/Darwin/Mac OS X: uuid_generate <uuid/uuid.h> (part of libc, at least on Darwin) FreeBSD: uuid_create <uuid.h> (again, libc) Windows: UuidCreate <Rpc.h> (link to Rpcrt4.dll, IIRC) Where a native UUID generate is not available, you can include some implementation, but surely other platforms also include UUID implementations. I think Windows portability is a non-issue here. -bob
Alvaro Herrera wrote: > >This is a fairly recent 8.1, maybe post-beta1. Not sure what happened. >I certainly don't see the --as-needed in LDFLAGS: > >$ pg_config | grep LDFLA >LDFLAGS = -Wl,-rpath,/pg/install/00orig/lib >LDFLAGS_SL = > > It was removed because it was very badly broken. cheers andrew
On Wed, Sep 07, 2005 at 11:31:16AM -0700, Josh Berkus wrote: > I'm also a little baffled to come up with any real application where making > an id number for most tables "unguessable" would provide any kind of real > protection not far better provided by other means. For your "users" > table, sure, but that's a very special case. It should never be the sole means of defense, however, it can be quite effective at prevention. For a rather simple example, consider a site that associates a picture with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc. it makes it ridiculously easy to write a script to pull all of the pictures off the site. This can be bothersome, as the only type of person who would do this, is the type of person with an illegitimate motivation. I want the data to be easily and freely accessible as specific objects, but I do not wish to provide an easy way of dumping all of the data as a unit. By making the picture identifier unguessable, it discourages the most common sort of abuse of the system. If the number is unguessable, and they can't access the directory as a listing, it will be sufficiently difficult as to discourage the common abuser of the system. On the other hand, an obviously guessable identifier may *encourage* the common person to consider abuse. In my case, it isn't only pictures. I don't want people pulling all the data off the site as a dump, and using it how they wish, but I do wish to make the data freely available, and easily accessible from a web browser. I'm not under the impression that it is impossible for a competent person to dump my database. I am under the impression that the people who would do such a thing, tend not to be intelligent, and will be stopped by this simple tactic. I could use any identifier at all. It could be a random sequence of characters. The UUID appeals to me, as I don't have to re-invent the concept. This use of UUID falls outside the scope of using it to join tables. It's a handle that is associated with the data, for external identification of the object. I happen to also use it as an internal primary key for the objects that fit this category, as I wish to benefit from the built-in merge capabilities of UUID over SERIAL, and I don't currently see the point of keeping a SERIAL and a UUID for each object. On the last point, I did start to do this, but every single one of my queries become more complicated as a result. Using the SERIAL for joining, and the UUID for identifying a set of rows was becoming a little ridiculous for my purposes. Using only the UUID to provide for all my purposes is suiting my requirements for the cost of 1.5X the size of a primary key index, 2X the size of a index for a n to n relation mapping UUID to UUID, and an far less significant increase in table space (much less than 1.5X, although I haven't finished calculating it yet). Not that everybody should rip out SERIAL and replace it with UUID, but it really isn't that bad, and in some cases, such as mine, I don't see the point of using both, and choose to instead allow UUID to solve many of my concerns at the same time, with an acceptable for me cost in database pages. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > This is a fairly recent 8.1, maybe post-beta1. Not sure what happened. This: 2005-05-07 01:48 neilc * configure, configure.in: Revert the ld --as-needed patch. Thisbreaks Fedora Core 3, due to a strange interaction betweenld,readline, termcap, and psql. The symptom is psql failing with thiserror on startup: symbol lookup error: /usr/lib64/libreadline.so.4:undefinedsymbol: BCI'm still trying to find the best way to solve this, but in themean time I'mreverting the patch in order to unbreak FC3. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> I'm also a little baffled to come up with any real application where making >> an id number for most tables "unguessable" would provide any kind of real >> protection not far better provided by other means. For your "users" >> table, sure, but that's a very special case. > It should never be the sole means of defense, however, it can be quite > effective at prevention. > For a rather simple example, consider a site that associates a picture > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc. > it makes it ridiculously easy to write a script to pull all of the > pictures off the site. This can be bothersome, as the only type of > person who would do this, is the type of person with an illegitimate > motivation. I want the data to be easily and freely accessible as > specific objects, but I do not wish to provide an easy way of > dumping all of the data as a unit. I don't think security through obscurity is a good reason to add something to PostgreSQL. Either make the pictures accessible to everyone, or put some other security mechanism in place. As far as pictures, you could certainly used the SHA1 checksum as your primary key, which would satisfy all your requirements. > In my case, it isn't only pictures. I don't want people pulling all > the data off the site as a dump, and using it how they wish, but I do > wish to make the data freely available, and easily accessible from a > web browser. .htaccess? Cookies? Encrypted data? iptables? All of these are better solutions than random filenames. > Not that everybody should rip out SERIAL and replace it with UUID, > but it really isn't that bad, and in some cases, such as mine, > I don't see the point of using both, and choose to instead allow > UUID to solve many of my concerns at the same time, with an > acceptable for me cost in database pages. We're doing something similar to UUIDs here - LSID (Life Science Identifiers). But being as the whole point of LSIDs is to share data, the "random" factor is not needed. If ever PG did do something like UUID (and despite the paragraphs above, I could see a use for it), I'd rather do something like picking from a predetermined range of values at random and deplete the pool as you go rather than create large values with a low (but finite) probability of collision. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200509080939 https://www.biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAkMgQAwACgkQvJuQZxSWSsgaOQCgl2Mb0cY5NuaTVuIAGdYKXqNE oQYAoJ5pm3xHcu3Wr382Jja59Ll1/YOP =IeqX -----END PGP SIGNATURE-----
Greg, thanks for saying it... I was thinking the same thing.
Not that it really relates to the UUID data type inclusion discussion itself, but I think this application design and use case for UUID is an example of using a data type for the wrong purpose. Application design-wise, security should be handled in a true *security* way, not by obfuscating a key (unless you're Microsoft :)). And, as Greg said, why not use a SHA1 hash instead... it would be easier than writing a whole new type implementation.
I haven't yet gotten a chance to check out the code but plan to soon.
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Not that it really relates to the UUID data type inclusion discussion itself, but I think this application design and use case for UUID is an example of using a data type for the wrong purpose. Application design-wise, security should be handled in a true *security* way, not by obfuscating a key (unless you're Microsoft :)). And, as Greg said, why not use a SHA1 hash instead... it would be easier than writing a whole new type implementation.
I haven't yet gotten a chance to check out the code but plan to soon.
On 9/8/05, Greg Sabino Mullane <greg@turnstep.com> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
>> I'm also a little baffled to come up with any real application where making
>> an id number for most tables "unguessable" would provide any kind of real
>> protection not far better provided by other means. For your "users"
>> table, sure, but that's a very special case.
> It should never be the sole means of defense, however, it can be quite
> effective at prevention.
> For a rather simple example, consider a site that associates a picture
> with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> it makes it ridiculously easy to write a script to pull all of the
> pictures off the site. This can be bothersome, as the only type of
> person who would do this, is the type of person with an illegitimate
> motivation. I want the data to be easily and freely accessible as
> specific objects, but I do not wish to provide an easy way of
> dumping all of the data as a unit.
I don't think security through obscurity is a good reason to add something
to PostgreSQL. Either make the pictures accessible to everyone, or put
some other security mechanism in place. As far as pictures, you could
certainly used the SHA1 checksum as your primary key, which would satisfy
all your requirements.
> In my case, it isn't only pictures. I don't want people pulling all
> the data off the site as a dump, and using it how they wish, but I do
> wish to make the data freely available, and easily accessible from a
> web browser.
.htaccess? Cookies? Encrypted data? iptables? All of these are better
solutions than random filenames.
> Not that everybody should rip out SERIAL and replace it with UUID,
> but it really isn't that bad, and in some cases, such as mine,
> I don't see the point of using both, and choose to instead allow
> UUID to solve many of my concerns at the same time, with an
> acceptable for me cost in database pages.
We're doing something similar to UUIDs here - LSID (Life Science
Identifiers). But being as the whole point of LSIDs is to share
data, the "random" factor is not needed. If ever PG did do something
like UUID (and despite the paragraphs above, I could see a use for it),
I'd rather do something like picking from a predetermined range of values
at random and deplete the pool as you go rather than create large
values with a low (but finite) probability of collision.
- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200509080939
https://www.biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----
iEYEARECAAYFAkMgQAwACgkQvJuQZxSWSsgaOQCgl2Mb0cY5NuaTVuIAGdYKXqNE
oQYAoJ5pm3xHcu3Wr382Jja59Ll1/YOP
=IeqX
-----END PGP SIGNATURE-----
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
On Thu, Sep 08, 2005 at 01:45:10PM -0000, Greg Sabino Mullane wrote: > > For a rather simple example, consider a site that associates a picture > > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc. > > it makes it ridiculously easy to write a script to pull all of the > > pictures off the site. This can be bothersome, as the only type of > > person who would do this, is the type of person with an illegitimate > > motivation. I want the data to be easily and freely accessible as > > specific objects, but I do not wish to provide an easy way of > > dumping all of the data as a unit. > I don't think security through obscurity is a good reason to add something > to PostgreSQL. Either make the pictures accessible to everyone, or put > some other security mechanism in place. As far as pictures, you could > certainly used the SHA1 checksum as your primary key, which would satisfy > all your requirements. I think you missed two parts. The first is, that it isn't really an attempt at security. The pictures are still freely available. There is no need to block accesses. The goal is to prevent a dump of the database. Nobody should be able to trivially download all the pictures at one time. I'm not sure what you are suggesting the SHA1 checksum be applied to. Not the picture, for sure, as this would require that my tables store a primary key, and a picture key, leading me right back to something that seems a little ridiculous. If you mean the SHA1 checksum of some constant secret password, and the primary key, possibly SERIAL, then sure - but that's ignoring the other benefits of UUID. If I'm going to use UUID, I might as well use only UUID, and use the SHA1 checksum of the constant secret password, and the UUID. So, the idea has merit - and was suggested by another person - but it doesn't mean using the UUID alone, without all the extra layers on top I would have to write, is the wrong choice. Why would I go out of my way to jump through so many hoops, when I can very easily, use the UUID? You're asking me to do a lot, only so that you can point out that 'UUID' is not necessary. Heck, I could use the timestamp, and a random key as two separate fields. There are always work arounds. Is it good to use work arounds? Or a type that already works fine for the purposes? :-) > > In my case, it isn't only pictures. I don't want people pulling all > > the data off the site as a dump, and using it how they wish, but I do > > wish to make the data freely available, and easily accessible from a > > web browser. > .htaccess? Cookies? Encrypted data? iptables? All of these are better > solutions than random filenames. Nope. The files are publically accessible. No reason the secure their access, and securing their access makes the whole system less efficient. > > Not that everybody should rip out SERIAL and replace it with UUID, > > but it really isn't that bad, and in some cases, such as mine, > > I don't see the point of using both, and choose to instead allow > > UUID to solve many of my concerns at the same time, with an > > acceptable for me cost in database pages. > We're doing something similar to UUIDs here - LSID (Life Science > Identifiers). But being as the whole point of LSIDs is to share > data, the "random" factor is not needed. If ever PG did do something > like UUID (and despite the paragraphs above, I could see a use for it), > I'd rather do something like picking from a predetermined range of values > at random and deplete the pool as you go rather than create large > values with a low (but finite) probability of collision. I don't need the random factor. I only need it to be difficult to guess. The timestamps have sufficient granularity, that they are effectively difficult to guess, and a foreach loop would be expensive. If none of the bits were random, I would still be happy. If I had LSID support, and no UUID support, perhaps I'd use LSID instead. I have no special attachment to UUID, other than it satisfies my concerns and purposes in a convenient manner. I can merge my tables without fiddling with the sequence definitions. I can hand out external identifiers for specific objects which are effectively unguessable. I can avoid having two primary keys, and use only the one. Any type that satisfied these concerns would make me happy. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
Mark,
I think what Greg suggested was sha1(number) as the key instead of requiring uuid as the key... it would perform the same function as far as you r use case is concerned.
As a similar example (using MD5):
CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
CREATE TABLE your_tbl (
your_key VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')),
your_picture BYTEA NOT NULL,
PRIMARY KEY (your_key));
INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');
select * from your_tbl ;
your_key | your_picture
----------------------------------+-----------------
c4ca4238a0b923820dcc509a6f75849b | some bytea data
same functionality from your standpoint.
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
I think what Greg suggested was sha1(number) as the key instead of requiring uuid as the key... it would perform the same function as far as you r use case is concerned.
As a similar example (using MD5):
CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
CREATE TABLE your_tbl (
your_key VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')),
your_picture BYTEA NOT NULL,
PRIMARY KEY (your_key));
INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');
select * from your_tbl ;
your_key | your_picture
----------------------------------+-----------------
c4ca4238a0b923820dcc509a6f75849b | some bytea data
same functionality from your standpoint.
On 9/8/05, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
On Thu, Sep 08, 2005 at 01:45:10PM -0000, Greg Sabino Mullane wrote:
> > For a rather simple example, consider a site that associates a picture
> > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> > it makes it ridiculously easy to write a script to pull all of the
> > pictures off the site. This can be bothersome, as the only type of
> > person who would do this, is the type of person with an illegitimate
> > motivation. I want the data to be easily and freely accessible as
> > specific objects, but I do not wish to provide an easy way of
> > dumping all of the data as a unit.
> I don't think security through obscurity is a good reason to add something
> to PostgreSQL. Either make the pictures accessible to everyone, or put
> some other security mechanism in place. As far as pictures, you could
> certainly used the SHA1 checksum as your primary key, which would satisfy
> all your requirements.
I think you missed two parts. The first is, that it isn't really an
attempt at security. The pictures are still freely available. There
is no need to block accesses.
The goal is to prevent a dump of the database. Nobody should be able to
trivially download all the pictures at one time.
I'm not sure what you are suggesting the SHA1 checksum be applied to.
Not the picture, for sure, as this would require that my tables store
a primary key, and a picture key, leading me right back to something
that seems a little ridiculous.
If you mean the SHA1 checksum of some constant secret password, and
the primary key, possibly SERIAL, then sure - but that's ignoring the
other benefits of UUID. If I'm going to use UUID, I might as well use
only UUID, and use the SHA1 checksum of the constant secret password,
and the UUID. So, the idea has merit - and was suggested by another
person - but it doesn't mean using the UUID alone, without all the
extra layers on top I would have to write, is the wrong choice. Why
would I go out of my way to jump through so many hoops, when I can
very easily, use the UUID?
You're asking me to do a lot, only so that you can point out that
'UUID' is not necessary. Heck, I could use the timestamp, and a
random key as two separate fields. There are always work arounds.
Is it good to use work arounds? Or a type that already works fine
for the purposes? :-)
> > In my case, it isn't only pictures. I don't want people pulling all
> > the data off the site as a dump, and using it how they wish, but I do
> > wish to make the data freely available, and easily accessible from a
> > web browser.
> .htaccess? Cookies? Encrypted data? iptables? All of these are better
> solutions than random filenames.
Nope. The files are publically accessible. No reason the secure their
access, and securing their access makes the whole system less efficient.
> > Not that everybody should rip out SERIAL and replace it with UUID,
> > but it really isn't that bad, and in some cases, such as mine,
> > I don't see the point of using both, and choose to instead allow
> > UUID to solve many of my concerns at the same time, with an
> > acceptable for me cost in database pages.
> We're doing something similar to UUIDs here - LSID (Life Science
> Identifiers). But being as the whole point of LSIDs is to share
> data, the "random" factor is not needed. If ever PG did do something
> like UUID (and despite the paragraphs above, I could see a use for it),
> I'd rather do something like picking from a predetermined range of values
> at random and deplete the pool as you go rather than create large
> values with a low (but finite) probability of collision.
I don't need the random factor. I only need it to be difficult to guess.
The timestamps have sufficient granularity, that they are effectively
difficult to guess, and a foreach loop would be expensive. If none of
the bits were random, I would still be happy.
If I had LSID support, and no UUID support, perhaps I'd use LSID
instead. I have no special attachment to UUID, other than it satisfies
my concerns and purposes in a convenient manner. I can merge my tables
without fiddling with the sequence definitions. I can hand out
external identifiers for specific objects which are effectively
unguessable. I can avoid having two primary keys, and use only the
one. Any type that satisfied these concerns would make me happy.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
On Thu, Sep 08, 2005 at 12:02:54PM -0400, Jonah H. Harris wrote: > I think what Greg suggested was sha1(number) as the key instead of requiring > uuid as the key... it would perform the same function as far as you r use > case is concerned. I'm sure he meant something like this. But I am still failing to see how the below is any better than UUID. UUID is 16 bytes instead of 4+32=36. UUID doesn't require synchronization of the SEQUENCE configuration (INCREMENT 100 START <SITE-NUMBER> as suggested by another person) when working with tables at multiple sites that will be merged together. So this would leave having the primary key for the row being SERIAL, and naming the picture as the MD5/SHA1 using data only from constant strings, and the row data, the only part of the row data remaining constant being the primary key. I agreed this would work, and enhanced this by copying a trick from the SASL people where the key would be concatenated with a constant secret string to further prevent people from guessing how to crack the numbering scheme, but it means I have to jump through hoops, and it doesn't solve the multisite / joining tables after the fact, issue. I can't take my picture databases from two separate sites, and merge them together nightly without jumping through hoops. I might have a site in New York, and a site in Toronto. I don't want to obtain sequence numbers remotely between sites, forcing a real-time dependency between the sites. I'm sure we could debate about alternatives for a long, long time. :-) In the case I describe above, I could have a "ORIGINAL SITE" field, and include that in a multi-column primary key. I stubbornly wish to use a type which doesn't require alternatives, or messy complications to the many SQL queries I will be performing. I want it to work out of the box, without any magic. UUID does exactly what I want, and perhaps more than I need. Most importantly, it does exactly what I want. I don't care at all that it does more than I need, as something less isn't available or standard at the moment. UUID is standard. You guys are amusing me - not in an insulting way, but by making me think outside my own conclusions. In the end, though, although there have been arguments that have made me doubt my decision (1.5X the number of index pages required over a SERIAL primary key, 2X the number of index pages required for a multi-column primary key used in a join table), for the most part, I've become more confident with my decision. There's nothing wrong with what I'm doing. Yes, there are other ways to do it - but these would only increase the complexity of my application, with an insufficient return for this sacrifice. My application is quite simple as is, using UUID as my preferred type for objects that will be accessed on their own through external interfaces. Cheers, mark > As a similar example (using MD5): > > CREATE SEQUENCE marks_seq START 1 INCREMENT 1; > CREATE TABLE your_tbl ( > your_key VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')), > your_picture BYTEA NOT NULL, > PRIMARY KEY (your_key)); > > INSERT INTO your_tbl (your_picture) VALUES ('some bytea data'); > > select * from your_tbl ; > your_key | your_picture > ----------------------------------+----------------- > c4ca4238a0b923820dcc509a6f75849b | some bytea data > > same functionality from your standpoint. -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
>From what you said:
> I agreed this would work, and enhanced
> this by copying a trick from the SASL people where the key would be
> concatenated with a constant secret string to further prevent people
> from guessing how to crack the numbering scheme
under definition of security, "Something that gives or assures safety"
Firstly, the use case is a security-related thing, i.e. it RESTRICTS someone from doing something (mass dump of data) by introducing a SECURITY MEASURE (obfuscating a key). Enough about that discussion.
Second, while you did mention a distributed environment as an advantage of using UUID, there was no mention of distributed sites (New York, Toronto) in your actual use case; so don't expect us to be psychic.
Third, if you feel that you need UUID, by all means, USE IT! This topic, IMHO, is now *WAY* out of scope. You have the information from Tom et al and if you want to go ahead with a new PostgreSQL uuid type then develop it, test it, and submit it for 8.2.
Lastly, people are here to help you, not to argue with you (most of the time). No one told you not to use it, they just offered alternatives and discussion on whether it should be added to core.
You've made the case that it would be useful to have, so go forth and develop it. I don't see how there can be much more discussion here.
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
> I agreed this would work, and enhanced
> this by copying a trick from the SASL people where the key would be
> concatenated with a constant secret string to further prevent people
> from guessing how to crack the numbering scheme
under definition of security, "Something that gives or assures safety"
Firstly, the use case is a security-related thing, i.e. it RESTRICTS someone from doing something (mass dump of data) by introducing a SECURITY MEASURE (obfuscating a key). Enough about that discussion.
Second, while you did mention a distributed environment as an advantage of using UUID, there was no mention of distributed sites (New York, Toronto) in your actual use case; so don't expect us to be psychic.
Third, if you feel that you need UUID, by all means, USE IT! This topic, IMHO, is now *WAY* out of scope. You have the information from Tom et al and if you want to go ahead with a new PostgreSQL uuid type then develop it, test it, and submit it for 8.2.
Lastly, people are here to help you, not to argue with you (most of the time). No one told you not to use it, they just offered alternatives and discussion on whether it should be added to core.
You've made the case that it would be useful to have, so go forth and develop it. I don't see how there can be much more discussion here.
On 9/8/05, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
On Thu, Sep 08, 2005 at 12:02:54PM -0400, Jonah H. Harris wrote:
> I think what Greg suggested was sha1(number) as the key instead of requiring
> uuid as the key... it would perform the same function as far as you r use
> case is concerned.
I'm sure he meant something like this. But I am still failing to see
how the below is any better than UUID. UUID is 16 bytes instead of
4+32=36. UUID doesn't require synchronization of the SEQUENCE
configuration (INCREMENT 100 START <SITE-NUMBER> as suggested by
another person) when working with tables at multiple sites that will
be merged together.
So this would leave having the primary key for the row being SERIAL,
and naming the picture as the MD5/SHA1 using data only from constant
strings, and the row data, the only part of the row data remaining
constant being the primary key. I agreed this would work, and enhanced
this by copying a trick from the SASL people where the key would be
concatenated with a constant secret string to further prevent people
from guessing how to crack the numbering scheme, but it means I have
to jump through hoops, and it doesn't solve the multisite / joining
tables after the fact, issue. I can't take my picture databases from
two separate sites, and merge them together nightly without jumping
through hoops. I might have a site in New York, and a site in Toronto.
I don't want to obtain sequence numbers remotely between sites,
forcing a real-time dependency between the sites.
I'm sure we could debate about alternatives for a long, long time. :-)
In the case I describe above, I could have a "ORIGINAL SITE" field,
and include that in a multi-column primary key.
I stubbornly wish to use a type which doesn't require alternatives, or
messy complications to the many SQL queries I will be performing. I
want it to work out of the box, without any magic. UUID does exactly
what I want, and perhaps more than I need. Most importantly, it does
exactly what I want. I don't care at all that it does more than I need,
as something less isn't available or standard at the moment. UUID is
standard.
You guys are amusing me - not in an insulting way, but by making me
think outside my own conclusions. In the end, though, although there
have been arguments that have made me doubt my decision (1.5X the
number of index pages required over a SERIAL primary key, 2X the
number of index pages required for a multi-column primary key used in
a join table), for the most part, I've become more confident with my
decision. There's nothing wrong with what I'm doing. Yes, there are
other ways to do it - but these would only increase the complexity of
my application, with an insufficient return for this sacrifice. My
application is quite simple as is, using UUID as my preferred type
for objects that will be accessed on their own through external
interfaces.
Cheers,
mark
> As a similar example (using MD5):
>
> CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
> CREATE TABLE your_tbl (
> your_key VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')),
> your_picture BYTEA NOT NULL,
> PRIMARY KEY (your_key));
>
> INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');
>
> select * from your_tbl ;
> your_key | your_picture
> ----------------------------------+-----------------
> c4ca4238a0b923820dcc509a6f75849b | some bytea data
>
> same functionality from your standpoint.
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
One reason to use a UUID type over a naively stored hash for this purpose is that it takes up half the space as naively stored MD5 and 40% of the space as naively stored SHA1. Granted, it's easy enough to pack them, but packed MD5 does have the same storage requirements as UUID and it won't be quite as convenient to use as a native UUID type out of the box.
-bob
On Sep 8, 2005, at 9:02 AM, Jonah H. Harris wrote:
Mark,
I think what Greg suggested was sha1(number) as the key instead of requiring uuid as the key... it would perform the same function as far as you r use case is concerned.
As a similar example (using MD5):
CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
CREATE TABLE your_tbl (
your_key VARCHAR(32) NOT NULL DEFAULT md5(nextval('marks_seq')),
your_picture BYTEA NOT NULL,
PRIMARY KEY (your_key));
INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');
select * from your_tbl ;
your_key | your_picture
----------------------------------+-----------------
c4ca4238a0b923820dcc509a6f75849b | some bytea data
same functionality from your standpoint.On 9/8/05, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:On Thu, Sep 08, 2005 at 01:45:10PM -0000, Greg Sabino Mullane wrote:
> > For a rather simple example, consider a site that associates a picture
> > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> > it makes it ridiculously easy to write a script to pull all of the
> > pictures off the site. This can be bothersome, as the only type of
> > person who would do this, is the type of person with an illegitimate
> > motivation. I want the data to be easily and freely accessible as
> > specific objects, but I do not wish to provide an easy way of
> > dumping all of the data as a unit.
> I don't think security through obscurity is a good reason to add something
> to PostgreSQL. Either make the pictures accessible to everyone, or put
> some other security mechanism in place. As far as pictures, you could
> certainly used the SHA1 checksum as your primary key, which would satisfy
> all your requirements.
I think you missed two parts. The first is, that it isn't really an
attempt at security. The pictures are still freely available. There
is no need to block accesses.
The goal is to prevent a dump of the database. Nobody should be able to
trivially download all the pictures at one time.
I'm not sure what you are suggesting the SHA1 checksum be applied to.
Not the picture, for sure, as this would require that my tables store
a primary key, and a picture key, leading me right back to something
that seems a little ridiculous.
If you mean the SHA1 checksum of some constant secret password, and
the primary key, possibly SERIAL, then sure - but that's ignoring the
other benefits of UUID. If I'm going to use UUID, I might as well use
only UUID, and use the SHA1 checksum of the constant secret password,
and the UUID. So, the idea has merit - and was suggested by another
person - but it doesn't mean using the UUID alone, without all the
extra layers on top I would have to write, is the wrong choice. Why
would I go out of my way to jump through so many hoops, when I can
very easily, use the UUID?
You're asking me to do a lot, only so that you can point out that
'UUID' is not necessary. Heck, I could use the timestamp, and a
random key as two separate fields. There are always work arounds.
Is it good to use work arounds? Or a type that already works fine
for the purposes? :-)
> > In my case, it isn't only pictures. I don't want people pulling all
> > the data off the site as a dump, and using it how they wish, but I do
> > wish to make the data freely available, and easily accessible from a
> > web browser.
> .htaccess? Cookies? Encrypted data? iptables? All of these are better
> solutions than random filenames.
Nope. The files are publically accessible. No reason the secure their
access, and securing their access makes the whole system less efficient.
> > Not that everybody should rip out SERIAL and replace it with UUID,
> > but it really isn't that bad, and in some cases, such as mine,
> > I don't see the point of using both, and choose to instead allow
> > UUID to solve many of my concerns at the same time, with an
> > acceptable for me cost in database pages.
> We're doing something similar to UUIDs here - LSID (Life Science
> Identifiers). But being as the whole point of LSIDs is to share
> data, the "random" factor is not needed. If ever PG did do something
> like UUID (and despite the paragraphs above, I could see a use for it),
> I'd rather do something like picking from a predetermined range of values
> at random and deplete the pool as you go rather than create large
> values with a low (but finite) probability of collision.
I don't need the random factor. I only need it to be difficult to guess.
The timestamps have sufficient granularity, that they are effectively
difficult to guess, and a foreach loop would be expensive. If none of
the bits were random, I would still be happy.
If I had LSID support, and no UUID support, perhaps I'd use LSID
instead. I have no special attachment to UUID, other than it satisfies
my concerns and purposes in a convenient manner. I can merge my tables
without fiddling with the sequence definitions. I can hand out
external identifiers for specific objects which are effectively
unguessable. I can avoid having two primary keys, and use only the
one. Any type that satisfied these concerns would make me happy.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/