Thread: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Hello. I am currently playing with the UUID data type and trying to use it to store data provided by a third-party (Hewlett-Packard) application. The problem is that they format UUIDs as 0000-0000-0000-0000-0000-0000-0000-0000, so I have to use replace(text,'-','')::uuid for this kind of data. Now, the case is quite simple, and it might be that there are other applications formatting UUIDs too liberally. I am working on a patch to support this format (yes, it is a simple modification). In the meantime, I would like to ask what you think about it. Cons: Such a format is not standard. Pros: This will help UUID data type adoption. [1] While good applications format their data well, there are others which don't follow standards. Also, I think it is easier for a human being to enter a UUID as 8 times 4 digits. Your thoughts? Should I submit a patch? Regards, Dawid [1]: My first thought when I received the error message was "hey! this is not a UUID, it is too long/too short!"; only later did I check and see that they just don't format it too well.
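For readers who want to try the conversion outside the database, here is a minimal sketch in Python of the same idea as the replace(text,'-','')::uuid workaround. The helper name hp_to_uuid is hypothetical and the check is deliberately narrow: it accepts only the 8-groups-of-4 layout described above and is not taken from any patch in this thread.

```python
import string
import uuid


def hp_to_uuid(text: str) -> uuid.UUID:
    """Convert a 4x-4x-4x-4x-4x-4x-4x-4x string into a UUID (hypothetical helper)."""
    groups = text.split("-")
    # Accept only the exact shape from the thread: 8 groups of 4 characters.
    if len(groups) != 8 or any(len(g) != 4 for g in groups):
        raise ValueError(f"not in 4x-4x-4x-4x-4x-4x-4x-4x form: {text!r}")
    digits = "".join(groups)
    # Check the characters explicitly instead of relying on int() parsing rules.
    if any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"non-hex character in: {text!r}")
    return uuid.UUID(hex=digits)


print(hp_to_uuid("0123-4567-89ab-cdef-0123-4567-89ab-cdef"))
# prints 01234567-89ab-cdef-0123-456789abcdef
```

The value is stored and printed in the standard 8-4-4-4-12 layout, so only the input side is lenient.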
Dawid, > I am working on a patch to support this format (yes, it is a simple > modification). I'd suggest writing a formatting function for UUIDs instead. Not sure what it should be called, though. "to_char" is pretty overloaded right now. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
> > I am working on a patch to support this format (yes, it is a simple > > modification). There was a proposal and a discussion regarding what this datatype would look like before I started developing it. We decided to go with the format proposed in the RFC. Unless there is a strong case, I doubt any non-standard formatting will be accepted into core. IIRC we were also opposed to supporting Java-like formatted UUIDs back then. This is no different. Regards, Gevik.
Josh Berkus <josh@agliodbs.com> writes: >> I am working on a patch to support this format (yes, it is a simple >> modification). > I'd suggest writing a formatting function for UUIDs instead. That seems like overkill, if not outright encouragement of people to come up with yet other nonstandard formats for UUIDs. I think the question we have to answer is whether we want to be complicit in the spreading of a nonstandard UUID format. Even if we answer "yes" for this HP case, it doesn't follow that we should create a mechanism for anybody to do anything with 'em. That way lies the madness people already have to cope with for datetime data :-( regards, tom lane
On Thu, Feb 28, 2008 at 1:19 AM, Tom Lane wrote: > I think the question we have to answer is whether we want to be > complicit in the spreading of a nonstandard UUID format. I don't. I have patched the UUID input and output functions to be compatible with Adobe ColdFusion (http://adobe.com/products/coldfusion/ uses 8x-4x-4x-16x), and while I have released them I have deliberately made the changes incompatible with other formats and will not submit them to PostgreSQL because I want Adobe to fix ColdFusion to use the standard format. Jochem
Tom, > I think the question we have to answer is whether we want to be > complicit in the spreading of a nonstandard UUID format. Even if > we answer "yes" for this HP case, it doesn't follow that we should > create a mechanism for anybody to do anything with 'em. That way > lies the madness people already have to cope with for datetime > data :-( Well, I guess the question is: if we don't offer some builtin way to render non-standard formats built into company products, will those companies fix their format or just not use PostgreSQL? -- Josh Berkus PostgreSQL @ Sun San Francisco
On Thu, Feb 28, 2008 at 08:58:01AM -0800, Josh Berkus wrote: > Well, I guess the question is: if we don't offer some builtin way to render > non-standard formats built into company products, will those companies fix > their format or just not use PostgreSQL? Well, there is an advantage that Postgres has that some others don't: you can extend Postgres pretty easily. That suggests to me a reason to be conservative in what we "build in". This is consistent with the principle, "Be conservative in what you send, and liberal in what you accept." A
> > Well, I guess the question is: if we don't offer some builtin way to render > > non-standard formats built into company products, will those companies fix > > their format or just not use PostgreSQL? > > Well, there is an advantage that Postgres has that some others don't: you > can extend Postgres pretty easily. That suggests to me a reason to be > conservative in what we "build in". This is consistent with the principle, > "Be conservative in what you send, and liberal in what you accept." Well, then the uuid input function should most likely disregard all -, and accept the 4x-, 8x- formats and the like on input. Andreas
On Thu, Feb 28, 2008 at 08:06:46PM +0100, Zeugswetter Andreas ADI SD wrote: > > > Well, I guess the question is: if we don't offer some builtin way to render > > > non-standard formats built into company products, will those companies fix > > > their format or just not use PostgreSQL? > > > > Well, there is an advantage that Postgres has that some others don't: you > > can extend Postgres pretty easily. That suggests to me a reason to be > > conservative in what we "build in". This is consistent with the principle, > > "Be conservative in what you send, and liberal in what you accept." > > Well, then the uuid input function should most likely disregard all -, > and accept the 4x-, 8x- formats and the like on input. > > Andreas We need to support the standard definition. People not using the standard need to know that and explicitly acknowledge that by implementing the conversion process themselves. Accepting random input puts a performance hit on everybody following the standard. It is the non-standard users who should pay that cost. Cheers, Ken
Kenneth Marshall wrote: > conversion process themselves. Accepting random input puts a performance > hit on everybody following the standard. Why is that necessarily the case? Why not have a liberal parser and a configurable switch that determines whether non-standard forms are liberally accepted, accepted with a logged warning, or rejected? James
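The switch James describes could look roughly like the Python sketch below. Everything here is an assumption made for illustration: NonStandardPolicy, parse_uuid and the logger name are hypothetical and do not correspond to an existing PostgreSQL setting, and anything other than the hyphenated 8-4-4-4-12 layout is treated as non-standard.

```python
import logging
import string
import uuid
from enum import Enum

log = logging.getLogger("uuid_input")  # hypothetical logger name


class NonStandardPolicy(Enum):
    """Hypothetical setting, not an existing PostgreSQL option."""
    ACCEPT = "accept"
    WARN = "warn"
    REJECT = "reject"


HYPHEN_POSITIONS = (8, 13, 18, 23)  # hyphen offsets in the 8-4-4-4-12 layout


def is_rfc_layout(text: str) -> bool:
    # True only when hyphens sit at exactly the standard offsets.
    return len(text) == 36 and all(
        (ch == "-") == (i in HYPHEN_POSITIONS) for i, ch in enumerate(text)
    )


def parse_uuid(text: str, policy: NonStandardPolicy = NonStandardPolicy.REJECT) -> uuid.UUID:
    # Liberal step: drop every hyphen, then require exactly 32 hex digits.
    digits = text.replace("-", "")
    if len(digits) != 32 or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"invalid UUID: {text!r}")
    # Policy step: decide what to do with input that is not in the standard layout.
    if not is_rfc_layout(text):
        if policy is NonStandardPolicy.REJECT:
            raise ValueError(f"non-standard UUID format: {text!r}")
        if policy is NonStandardPolicy.WARN:
            log.warning("non-standard UUID format accepted: %r", text)
    return uuid.UUID(hex=digits)
```

With this shape the parser itself is always liberal; the policy only decides whether a non-standard spelling is silently accepted, logged, or turned into an error.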
James Mansion wrote: > Kenneth Marshall wrote: >> conversion process themselves. Accepting random input puts a performance >> hit on everybody following the standard. > Why is that necessarily the case? > > Why not have a liberal parser and a configurable switch that > determines whether non-standard > forms are liberally accepted, accepted with a logged warning, or > rejected? I recall there being a measurable performance difference between the most liberal parser, and the most optimized parser, back when I wrote one for PostgreSQL. I don't know how good the one in use for PostgreSQL 8.3 is. As to whether the cost is noticeable to people or not - that depends on what they are doing. The problem is that a UUID is pretty big, and parsing it liberally means a loop. My personal opinion is that this is entirely a philosophical issue, and that both sides have merits. There is no reason for PostgreSQL to support all formats, no matter how non-standard, for every single type. So, why would UUID be special? Because it's easy to do is not necessarily a good reason. But then, it's not a bad reason either. Cheers, mark -- Mark Mielke <mark@mielke.cc>
Mark Mielke wrote: > I recall there being a measurable performance difference between the > most liberal parser, and the most optimized parser, back when I wrote > one for PostgreSQL. I don't know how good the one in use for > PostgreSQL 8.3 is. As to whether the cost is noticeable to people or > not - that depends on what they are doing. The problem is that a UUID > is pretty big, and parsing it liberally means a loop. > It just seems odd - I would have thought one would use re2c or ragel to generate something and the performance would essentially be O(n) on the input length in characters - using either a collection of allowed forms or an engine that normalises case and discards the '-' characters between any hex pairs. So yes these would have a control loop. Is that so bad? Either way it's hard to imagine how parsing a string of this length could create a measurable performance issue compared to what will happen with the value post parse. James
On Thu, Feb 28, 2008 at 06:45:18PM -0500, Mark Mielke wrote: > My personal opinion is that this is entirely a philosophical issue, and > that both sides have merits. I think it depends on what you're optimising for: initial development time, maintenance time or run time. > There is no reason for PostgreSQL to > support all formats, not matter how non-standard, for every single type. > So, why would UUID be special? Because it's easy to do is not > necessarily a good reason. But then, it's not a bad reason either. I never really buy the "performance" argument. I much prefer the correctness argument: if the code is doing something strange, I'd prefer to know about it as soon as possible. This generally means that I'm optimising for maintenance. It's a similar argument to why lots of automatic casts were removed from 8.3: it generally doesn't hurt, but the few times it does it's going to be bad, and if you're doing something strange to start with it's better to be explicit about it. Sam
Andrew Sullivan <ajs@crankycanuck.ca> writes: > "Be conservative in what you send, and liberal in what you accept." Yeah, I was about to quote that same maxim myself. I don't have a big problem with allowing uuid_in to accept known format variants. (I'm not sure about allowing a hyphen *anywhere*, because that could lead to accepting things that weren't meant to be a UUID at all, but this HP format seems regular enough that that's not a serious objection to it.) What I was really complaining about was Josh's suggestion that we invent a function to let users *output* UUIDs in random-format-of-the-week. I can't imagine much good coming of that. I think we should keep uuid_out emitting only the RFC-standardized format. regards, tom lane
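The distinction Tom draws, accepting a fixed set of known variants on input rather than a hyphen anywhere while always emitting the RFC format, can be sketched in Python as follows. The KNOWN_LAYOUTS set and the function name are assumptions for illustration; the layouts are the ones mentioned in this thread, not a list taken from uuid_in.

```python
import string
import uuid

# Group-length layouts treated as "known variants" in this sketch (an assumption).
KNOWN_LAYOUTS = {
    (8, 4, 4, 4, 12),  # RFC format
    (4,) * 8,          # HP format that started this thread
    (8, 4, 4, 16),     # ColdFusion format mentioned by Jochem
    (32,),             # no hyphens at all
}


def parse_known_variant(text: str) -> uuid.UUID:
    """Accept only whitelisted hyphen layouts (hypothetical helper)."""
    groups = text.split("-")
    if tuple(len(g) for g in groups) not in KNOWN_LAYOUTS:
        raise ValueError(f"unrecognised UUID layout: {text!r}")
    digits = "".join(groups)
    if any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"non-hex character in: {text!r}")
    return uuid.UUID(hex=digits)


# Output is always the standard layout, whatever the input looked like.
print(parse_known_variant("01234567-89ab-cdef-0123456789abcdef"))
# prints 01234567-89ab-cdef-0123-456789abcdef
```

A string such as "0-123456789abcdef0123456789abcdef" contains 32 hex digits but is rejected here, because its hyphen placement matches none of the known layouts.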
James Mansion wrote: > Mark Mielke wrote: >> I recall there being a measurable performance difference between the >> most liberal parser, and the most optimized parser, back when I wrote >> one for PostgreSQL. I don't know how good the one in use for >> PostgreSQL 8.3 is. As to whether the cost is noticeable to people or >> not - that depends on what they are doing. The problem is that a UUID >> is pretty big, and parsing it liberally means a loop. >> > It just seems odd - I would have thought one would use re2c or ragel > to generate something and the performance would essentially be O[n] on > the input length in characters - using either a collection of allowed > forms or an engine that normalises case and discards the '-' > characters between any hex pairs. Instruction level parallelism allows for multiple hex values to be processed in parallel, whereas a loop relies on branch prediction and speculative load and store? :-) The liberal version is difficult to unroll. The strict version is easy to unroll. > So yes these would have a control loop. Is that so bad? > > Either way its hard to imagine how parsing a string of this length > could create a measurable performance issue compared to what will > happen with the value post parse. I think so too. Cheers, mark -- Mark Mielke <mark@mielke.cc>
On Fri, Feb 29, 2008 at 9:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Sullivan <ajs@crankycanuck.ca> writes: > > "Be conservative in what you send, and liberal in what you accept." > > Yeah, I was about to quote that same maxim myself. I don't have a big > problem with allowing uuid_in to accept known format variants. (I'm > not sure about allowing a hyphen *anywhere*, because that could lead to > accepting things that weren't meant to be a UUID at all, but this HP > format seems regular enough that that's not a serious objection to it.) This seems like a good enough opportunity to mention an idea that I had while/after doing the enum patch. The patch was fairly intrusive for something that was just adding a type because postgresql isn't really set up for parameterized types other than core types. The idea would be to extend the enum mechanism to allow UDTs etc to be parameterized, and enums would just become one use of the mechanism. Other obvious examples that I had in mind were allowing variable lengths for that binary data type with hex IO for e.g. differently sized checksums that people want, and allowing different formats for uuids. So the idea as applied to this case would be to do the enum-style typesafe thing, ie: create type coldfusion_uuid as generic_uuid('xxxx-xxxx-xxxx-xxxx'); ...then just use that. I had some thoughts about whether it would be worth allowing inline declarations of such types inside table creation statements as well, and there are various related issues and thoughts on implementation which I won't go into in this email. Do people think the idea has legs, though? > What I was really complaining about was Josh's suggestion that we invent > a function to let users *output* UUIDs in random-format-of-the-week. > I can't imagine much good coming of that. I think we should keep > uuid_out emitting only the RFC-standardized format. 
Well, if the application is handing them to us in that format, it might be a bit surprised if it gets back a "fixed" one. The custom type approach wouldn't have that side effect. Cheers Tom
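To make the custom-type idea concrete, here is a rough Python sketch of a format-parameterized UUID, where one template drives both input and output so the application gets back the spelling it sent. The generic_uuid name is borrowed from the message above, but the implementation is entirely hypothetical, and it assumes a template containing exactly 32 x placeholders (the example uses the ColdFusion 8x-4x-4x-16x layout).

```python
import string
import uuid


def generic_uuid(template: str):
    """Return (parse, render) functions bound to one textual layout (hypothetical)."""
    if template.count("x") != 32:
        raise ValueError("template must contain exactly 32 'x' placeholders")

    def parse(text: str) -> uuid.UUID:
        if len(text) != len(template):
            raise ValueError(f"does not match template: {text!r}")
        digits = []
        for slot, ch in zip(template, text):
            if slot == "x":
                # Placeholder position: must hold a hex digit.
                if ch not in string.hexdigits:
                    raise ValueError(f"non-hex character in: {text!r}")
                digits.append(ch)
            elif ch != slot:
                # Separator position: must match the template literally.
                raise ValueError(f"does not match template: {text!r}")
        return uuid.UUID(hex="".join(digits))

    def render(value: uuid.UUID) -> str:
        # Fill the placeholders with the 32 hex digits, keeping the separators.
        digits = iter(value.hex)
        return "".join(next(digits) if slot == "x" else slot for slot in template)

    return parse, render


cf_parse, cf_render = generic_uuid("xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx")
value = cf_parse("01234567-89AB-CDEF-0123456789ABCDEF")
print(value)             # prints 01234567-89ab-cdef-0123-456789abcdef
print(cf_render(value))  # prints 01234567-89ab-cdef-0123456789abcdef
```

Because the layout lives in the type rather than in the shared input function, the standard type keeps its strict behaviour and only users of the custom type see the alternative spelling.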
"Tom Dunstan" <pgsql@tomd.cc> writes: > This seems like a good enough opportunity to mention an idea that I > had while/after doing the enum patch. The patch was fairly intrusive > for something that was just adding a type because postgresql isn't > really set up for parameterized types other than core types. The idea > would be to extend the enum mechanism to allow UDTs etc to be > parameterized, and enums would just become one use of the mechanism. Isn't this reasonably well covered by Teodor's work to support typmods for user-defined types? We've discussed how the typmod could be effectively a key into a system catalog someplace, thus allowing it to represent more than just an int32 worth of stuff. I'm not seeing where your proposal accomplishes more than that can. regards, tom lane
Added to TODO: * Allow the UUID type to accept non-standard formats http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com