Thread: Re: [GENERAL] UUID's as primary keys
Tom Lane <tgl@sss.pgh.pa.us> writes: > Martijn van Oosterhout <kleptog@svana.org> writes: > > The input functions get it, the output functions (bpcharout, > > bpcharsend, etc) don't. Which makes it kind of hard to print a raw > > value if you don't know how long it's going to be. They used to, but > > that was removed some time back. > Even back then you couldn't rely on the typmod value to be supplied; > it was quite likely to be passed as -1. The issue is not actually > with on-disk storage, it is with function/operator arguments and > results. Those have never been identified any more closely than by > giving a type OID. So for any value that came from a function, > you won't have a typmod, and you'd better be able to find out all > you need to know just by inspecting the value itself. Hence, length > words. Hm, so it could be stored on disk without the length header as long as the length header is added to the in-memory representation? I don't think the type system has hooks for reading and storing data to disk though. > This is all pretty off-topic for pgsql-general, isn't it? [moved to -hackers] -- greg
Greg Stark <gsstark@mit.edu> writes: > Hm, so it could be stored on disk without the length header as long as > the length header is added to the in-memory representation? I don't > think the type system has hooks for reading and storing data to disk > though. No, it doesn't, and we'd pay a nonzero price for allowing that. Currently the executor doesn't have to care (much) about whether a tuple is on-disk or in-memory --- the individual datums look the same either way. Allowing them to be different would force a lot of format conversion steps that currently need not happen. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > Hm, so it could be stored on disk without the length header as long as > > the length header is added to the in-memory representation? I don't > > think the type system has hooks for reading and storing data to disk > > though. > > No, it doesn't, and we'd pay a nonzero price for allowing that. > Currently the executor doesn't have to care (much) about whether a > tuple is on-disk or in-memory --- the individual datums look the same > either way. Allowing them to be different would force a lot of > format conversion steps that currently need not happen. Is there ever a case where an entire tuple is passed around without knowing the typmod of an attribute in the tuple? The conversion would only really have to happen when the attribute is fetched or stored, not when the tuple is being passed around wholesale. But I have a feeling that would be more intrusive than just making the entire system typmod aware. -- greg
On Thu, Jun 29, 2006 at 02:40:15AM -0400, Greg Stark wrote: > > Greg Stark <gsstark@mit.edu> writes: > > No, it doesn't, and we'd pay a nonzero price for allowing that. > > Currently the executor doesn't have to care (much) about whether a > > tuple is on-disk or in-memory --- the individual datums look the same > > either way. Allowing them to be different would force a lot of > > format conversion steps that currently need not happen. > > Is there ever a case where an entire tuple is passed around without knowing > the typmod of an attribute in the tuple? A tuple is just an array of datums, with some header information. The problems come when you don't have a tuple anymore, but only the datum, like in arguments for functions. I think it's more a case that most places that deal with datums simply don't know about typmods. For example, the return type of a function can only be char, not char(16). If you consider the case of a function returning a RAW, the caller will have no way of knowing the typmod, they do know the type though. To be honest, it seems like a lot of work to save the four bytes of overhead for the varlena structure on disk if you're going to need it in memory anyway. And anything like RAW(16) which people want for UUIDs, if it's going to have a lot of functions associated with it, may as well just be a new type. I think time would be much better spent finding a way of allowing user-defined types to be created without using C functions. > The conversion would only really have to happen when the attribute is fetched > or stored, not when the tuple is being passed around wholesale. But I have a > feeling that would be more intrusive than just making the entire system typmod > aware. I'm not sure if tuples are ever passed wholesale very far. The first node to actually do anything with it (any join, expression or condition test) is going to need to deconstruct it. Consider where we currently we have a "Filter Cond" on a "Seq Scan". Currently the filter can access the datums directly on the disk page, with what you're proposing, it can't. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > A tuple is just an array of datums, with some header information. The > problems come when you don't have a tuple anymore, but only the datum, > like in arguments for functions. > > I think it's more a case that most places that deal with datums simply > don't know about typmods. For example, the return type of a function > can only be char, not char(16). If you consider the case of a function > returning a RAW, the caller will have no way of knowing the typmod, > they do know the type though. > > To be honest, it seems like a lot of work to save the four bytes of > overhead for the varlena structure on disk if you're going to need it > in memory anyway. And anything like RAW(16) which people want for > UUIDs, if it's going to have a lot of functions associated with it, may > as well just be a new type. For large databases storage density leads directly to speed. Saving four bytes of overhead on a 16-byte data structure would mean a 20% speed increase. Even if that's only helpful on a tenth of the columns you're still talking about a 2% speed increase for all queries on the table. A lot of databases use CHAR(1) for flags. The overhead is even worse there. > Consider where we currently we have a "Filter Cond" on a "Seq Scan". > Currently the filter can access the datums directly on the disk page, with > what you're proposing, it can't. Well it only can't if the data type has conversion functions. I'm not sure how complex it would be having pointers that *getattr sometimes return pointers to the disk page and sometimes return pointers to a palloced copy though. -- greg
Greg Stark wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > >> To be honest, it seems like a lot of work to save the four bytes of >> overhead for the varlena structure on disk if you're going to need it >> in memory anyway. And anything like RAW(16) which people want for >> UUIDs, if it's going to have a lot of functions associated with it, may >> as well just be a new type. >> > > For large databases storage density leads directly to speed. Saving four bytes > of overhead on a 16-byte data structure would mean a 20% speed increase. Even > if that's only helpful on a tenth of the columns you're still talking about a > 2% speed increase for all queries on the table. A lot of databases use CHAR(1) > for flags. The overhead is even worse there. > > I have to concur with this. Assume you use a bytea for a UUID that in turn is used as a primary key. The extra overhead will be reflected in all indexes, all foreign keys, etc. In a normalized database some tables may consist of UUID columns only. Regards, Thomas Hallgren
On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote: > I have to concur with this. Assume you use a bytea for a UUID that in > turn is used as a primary key. The extra overhead will be reflected in > all indexes, all foreign keys, etc. In a normalized database some tables > may consist of UUID columns only. So you create a UUID type. It's cheap enough to create new types after all, that's one of postgresql's strengths. What I'm saying is that it's easier to create new fixed length types for the cases that need it, than it is to redo the entire type handling of the backend. And for people that want char(1), they should be using "char", which really is one byte (ex padding ofcourse). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote: > > I have to concur with this. Assume you use a bytea for a UUID that in > > turn is used as a primary key. The extra overhead will be reflected in > > all indexes, all foreign keys, etc. In a normalized database some tables > > may consist of UUID columns only. > > So you create a UUID type. It's cheap enough to create new types after > all, that's one of postgresql's strengths. What I'm saying is that it's > easier to create new fixed length types for the cases that need it, > than it is to redo the entire type handling of the backend. I guess my motivation here is that I feel currently char(n) is basically broken in Postgres. Sure it satisfies the letter of the specification, but it's failing to actually achieve anything for the users. There's no point at all in using char(n) in Postgres since it takes exactly the same amount of space as varchar() if you're always stuffing it full and more space if you're not. In the current setup the only reason for Postgres to have this data type at all is purely for legacy compatibility. It doesn't actually "work" in that it doesn't provide the space savings it's intended to and that would give users an actual reason to use it in new databases. -- greg
Greg Stark <gsstark@mit.edu> writes: > In the current setup the only reason for Postgres to have this data type at > all is purely for legacy compatibility. Yes. So? regards, tom lane
Martijn van Oosterhout wrote: > On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote: > >> I have to concur with this. Assume you use a bytea for a UUID that in >> turn is used as a primary key. The extra overhead will be reflected in >> all indexes, all foreign keys, etc. In a normalized database some tables >> may consist of UUID columns only. >> > > So you create a UUID type. It's cheap enough to create new types after > all, that's one of postgresql's strengths. It would be a whole lot easier if I could use a domain. > What I'm saying is that it's > easier to create new fixed length types for the cases that need it, > than it is to redo the entire type handling of the backend. > > Of course. But it's a matter of who does what. Your reasoning push the burden to the users. Regards, Thomas Hallgren
On Thu, Jun 29, 2006 at 06:40:13PM +0200, Thomas Hallgren wrote: > Martijn van Oosterhout wrote: > >On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote: > > > >>I have to concur with this. Assume you use a bytea for a UUID that in > >>turn is used as a primary key. The extra overhead will be reflected in > >>all indexes, all foreign keys, etc. In a normalized database some tables > >>may consist of UUID columns only. > >> > > > >So you create a UUID type. It's cheap enough to create new types after > >all, that's one of postgresql's strengths. > It would be a whole lot easier if I could use a domain. It seems to me that maybe the backend should include a 16-byte fixed length object (after all, we've got 1, 2, 4 and 8 bytes already) and then people can use that to build whatever they like, using domains, for example... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Thu, Jun 29, 2006 at 06:47:17PM +0200, Martijn van Oosterhout wrote: > It seems to me that maybe the backend should include a 16-byte fixed > length object (after all, we've got 1, 2, 4 and 8 bytes already) and > then people can use that to build whatever they like, using domains, > for example... Sooooo... Back to this. It won't happen unless somebody does it - and I realize that people are busy with their own projects, so unless somebody more willing and better suited will step up, I'm going to take a stab at getting advanced consensus. Please answer the below questions, and state whether your opinion is just an opinion, or whether you are stating it as a PostgreSQL maintainer and it is law. If you wish, you can rank preferences. 1) The added 128-bit type should take the form of: a) UUID, with all functions b) UUID, with only basic generation functions + encode/decode/indexable c) UUID, withonly encode/decode/indexable - generic except for the name of the type, and the encoding format. d) Generic 128-bittype - same as c) except may not encode or decode as UUID (dashes). Either a large number (hex string?), or binarydata. e) Generic n-byte binary data type generator. Not sure of feasibility of this at this point. See thread. 2) According to your answer in 1), the added 128-bit type should be: a) In core first. b) In contrib first. c) In pgfoundry first. Thanks, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Thu, Jul 06, 2006 at 12:12:18PM -0400, mark@mark.mielke.cc wrote: > Please answer the below questions, and state whether your opinion is > just an opinion, or whether you are stating it as a PostgreSQL > maintainer and it is law. If you wish, you can rank preferences. Do I have to pick only one? I'd choose firstly for: 1c) UUID, with only encode/decode/indexable - generic except for the name of the type, and the encoding format. 2a) In core first And in addation to that: 1b) UUID, with only basic generation functions + encode/decode/indexable 2b) In contrib first. And maybe finally: 1a) UUID, with all functions 2c) In pgfoundry first. IOW, I'm not so convinced that full UUID support should appear in core, but I think a 16-byte type should be available in core, with basic UUID functions in contrib and the full suite on pgfoundry. But that's just my opinion ofcourse. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On 7/6/06, mark@mark.mielke.cc wrote: > > Please answer the below questions, and state whether your opinion is > just an opinion, or whether you are stating it as a PostgreSQL > maintainer and it is law. If you wish, you can rank preferences. > > 1) The added 128-bit type should take the form of: > > a) UUID, with all functions > 2) According to your answer in 1), the added 128-bit type should be: > > a) In core first. Opinion, 1 a, 2 a Jochem
mark@mark.mielke.cc writes: > e) Generic n-byte binary data type generator. Not sure of feasibility > of this at this point. See thread. I don't like the idea of a generator that would have to be manually invoked, though such a thing would be a fine tool for contrib or pgfoundry, I think it would never be a clean enough interface for core. On the other hand core could conceivably translate things like char(n) into such a type generated on the fly. That is, instead of having a single char oid it could check a cache of fixed length char(n) data types and if there isn't one already generate one on the fly. That would be somewhat grotty of an implementation but the user interface at least would be entirely transparent. If one day we change things to pass around typmod database designs wouldn't have to change at all. (Actually Postgres can never do this for char(n), at least not as long as we insist on making char/varchar/text locale-aware. Personally I think the default char/varchar/text locale should be C unless you specify otherwise on a per-column basis. But that seems to be a minority opinion. Postgres could however do this for separate raw binary datatypes like bit(n) or bytea(n).) In answer to your question, though my opinion carries no special weight at all, I would suggest adding a bare bones 16-byte data type to core and a second binary-compatible data type based on it that parsed/output as uuids. The extended uuid libraries should only go in pgfoundry/contrib. -- greg
Hi, Just MHO: > 1) The added 128-bit type should take the form of: > > c) UUID, with only encode/decode/indexable - generic except for the > name of the type, and the encoding format. > > 2) According to your answer in 1), the added 128-bit type should be: > > a) In core first. 1c is what I would need. 1b or 1a would be nice to have. - Sander
Greg Stark wrote: > In answer to your question, though my opinion carries no special weight at > all, I would suggest adding a bare bones 16-byte data type to core and a > second binary-compatible data type based on it that parsed/output as uuids. > The extended uuid libraries should only go in pgfoundry/contrib. > I second that. Regards, Thomas Hallgren
On Jul 6, 2006, at 4:02 PM, Thomas Hallgren wrote: >> In answer to your question, though my opinion carries no special >> weight at >> all, I would suggest adding a bare bones 16-byte data type to core >> and a >> second binary-compatible data type based on it that parsed/output >> as uuids. >> The extended uuid libraries should only go in pgfoundry/contrib. > I second that. +1. If there's enough user demand we can look at adding the type to core (I don't see any real advantage to contrib over pgFoundry for this). I'm not sure if it makes sense to add a generic 16 byte RAW to core, either. I'd *much* rather see effort expended on a generic RAW type which had it's size defined as part of the type and didn't use varlena. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Saturday 08 July 2006 14:54, Jim Nasby wrote: > On Jul 6, 2006, at 4:02 PM, Thomas Hallgren wrote: > >> In answer to your question, though my opinion carries no special > >> weight at > >> all, I would suggest adding a bare bones 16-byte data type to core > >> and a > >> second binary-compatible data type based on it that parsed/output > >> as uuids. > >> The extended uuid libraries should only go in pgfoundry/contrib. > > > > I second that. > > +1. If there's enough user demand we can look at adding the type to > core (I don't see any real advantage to contrib over pgFoundry for > this). The advantage of contrib over pgFoundry is that it will be packaged by the major distributions. Every distribution includes a package of the contrib modules. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutionssince 1997 http://www.commandprompt.com/
On Sat, Jul 08, 2006 at 05:54:26PM -0400, Jim Nasby wrote: > +1. If there's enough user demand we can look at adding the type to > core (I don't see any real advantage to contrib over pgFoundry for > this). I'm not sure if it makes sense to add a generic 16 byte RAW to > core, either. I'd *much* rather see effort expended on a generic RAW > type which had it's size defined as part of the type and didn't use > varlena. You could place a nice wrapper around type generators, which would let you say: DECLARE TYPE RAW(16); After which point you could use that type in function declarations and such. It would create an OID for that type would could be used as normal. I think that trying to get the backend to pay more attention to typmods is not going to be successful. Simply because functions and operators have an affect on the typmod and once you start relying on typmods to decode a tuple, you've got a real problem. As an example, what do you get when you concatenate two CHAR(4)'s? Do you get another CHAR(4) or is it a CHAR(8)? How does the backend know? You'd have to accompany each function with another function just to tell you how the typmods would be related. The only way out I can think of is that RAW(n) is merely a sort of template and RAW(x) cannot be used in a place where RAW(y) is expected (if x<>y). Hence it makes sense to have a template that people can instantiate instances of and let the rest of the system treat them as new types, unrelated to anything else. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.