Thread: Does this matter?
Does it make a performance difference if I use a char(20) or a char(36) as the primary key? My thought is no, but I would like to hear more opinions. And a little further off topic(since we have many database experts here), does it matter on MS SQL server 7? Thanks! -- Wei Weng Network Software Engineer KenCast Inc.
Wei, > Does it make a performance difference if I use a char(20) or a char(36) > as the primary key? My thought is no, but I would like to hear more > opinions. Yes, it does, though probably minor unless you have millions of records. CHAR is padded out to the specified length. Therefore the index on a char(36) column will be a little larger, and thus a little slower, than the char(20). Now, there would be no difference between VARCHAR(20) and VARCHAR(36) unless you used some of the extra 16 characters on most rows. Either way, for tables of a few thousand records, I doubt that you'll notice the difference. BTW, why not use a SERIAL value as a surrogate primary key? > And a little further off topic(since we have many database experts > here), does it matter on MS SQL server 7? Yes, same reason. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh: Since I need to use a GUID as the primary key, I have to use the char datatype. On Fri, 2002-11-01 at 15:23, Josh Berkus wrote: > Wei, > > > Does it make a performance difference if I use a char(20) or a char(36) > > as the primary key? My thought is no, but I would like to hear more > > opinions. > > Yes, it does, though probably minor unless you have millions of records. CHAR > is padded out to the specified length. Therefore the index on a char(36) > column will be a little larger, and thus a little slower, than the char(20). Does it affect the INSERT/UPDATE/DELETE operations on tables or simply the SELECT operation or both? > > Now, there would be no difference between VARCHAR(20) and VARCHAR(36) unless > you used some of the extra 16 characters on most rows. > > Either way, for tables of a few thousand records, I doubt that you'll notice > the difference. BTW, why not use a SERIAL value as a surrogate primary key? > > > And a little further off topic(since we have many database experts > > here), does it matter on MS SQL server 7? > > Yes, same reason. -- Wei Weng Network Software Engineer KenCast Inc.
> Wei, > > > Does it make a performance difference if I use a char(20) or a char(36) > > as the primary key? My thought is no, but I would like to hear more > > opinions. > > Yes, it does, though probably minor unless you have millions of records. CHAR > is padded out to the specified length. Therefore the index on a char(36) > column will be a little larger, and thus a little slower, than the char(20). > Really? According to this url (search for "Tip") there is no performance difference just a space difference. I don't know for sure either way, but if there is a difference the manual needs updating. http://www.postgresql.org/idocs/index.php?datatype-character.html -philip
On Fri, Nov 01, 2002 at 12:53:29PM -0800, Philip Hallstrom wrote: > > is padded out to the specified length. Therefore the index on a char(36) > > column will be a little larger, and thus a little slower, than the char(20). > > > > Really? According to this url (search for "Tip") there is no performance > difference just a space difference. I don't know for sure either way, but > if there is a difference the manual needs updating. Hmm. Maybe a clarification, but I don't think this is quite what the tip is talking about. The tip points out that part of the cost is "the increased storage" from the blank-padded type (char) as contrasted with non-padded types (like text). The tip isn't talking about whether a length of 20 is faster than a length of 36. Anyway, I can't really believe the length would be a big deal except on really huge tables. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Phillip, > Really? According to this url (search for "Tip") there is no performance > difference just a space difference. I don't know for sure either way, but > if there is a difference the manual needs updating. > > http://www.postgresql.org/idocs/index.php?datatype-character.html Actually, that note is intended to tell people that CHAR is not any faster than VARCHAR for the same-length string ... since CHAR *is* faster than VARCHAR in some systems, like MS SQL Server. -- -Josh Berkus
Andrew Sullivan wrote: > Hmm. Maybe a clarification, but I don't think this is quite what the > tip is talking about. The tip points out that part of the cost is > "the increased storage" from the blank-padded type (char) as > contrasted with non-padded types (like text). The tip isn't talking > about whether a length of 20 is faster than a length of 36. Anyway, > I can't really believe the length would be a big deal except on > really huge tables. It really depends on the access. I spend quite a bit of time optimizing database internals and the size of an index matters much more than is apparent in certain cases. This is especially true for medium sized tables. The real issue is the number of reads required to find a particular entry in the index. Assume a btree that tries to be 70% full. Assume 40 bytes for a header, 8 bytes overhead per index entry and an 8K btree page. The following represents the number of index entries that can be contained in both a two level and a three level btree. Type Bytes Items per page 2 3 ---- ------ ----- ------ ---------- char(36) 40 129 16,641 2,146,689 char(20) 24 203 41,209 8,365,427 Depending on the size of the table, the number of pages in the btree affect performance in two separate ways: 1) Cache hit ratio - This greatly depends on the way the tables are accessed but more densely packed btree indices are used more often and more likely to be present in a cache than less densely packed indices. 2) I/O time - If the number of items reaches a particular size then the btree will add an additional level which could result in a very expensive I/O operation per access. How this affects performance depends very specifically on the way the index is used. The problem is not necessarily the size of the table but the transitions in numbers of levels in the btree. For a table size of 200 to 15,000 tuples, there won't be a major difference. For a table size of 25,000 to 40,000 tuples, and assuming the root page is cached, an index lookup can be twice as fast with a char(20) as it is for a char(36) because in the one case a two-level btree handles the table while a three-level btree is needed for the other. This won't typically affect multi-user throughput as much since other backends will be working while the I/O's are waiting but it might affect the performance as seen from a single client. - Curtis
Wei, > Does it affect the INSERT/UPDATE/DELETE operations on tables or simply > the SELECT operation or both? All of the above. How many rows are we talking about, anyway? The difference may be academic. -- -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Wei Weng <wweng@kencast.com> writes: > Since I need to use a GUID as the primary key, I have to use the char > datatype. Try uniqueidentifier: http://archives.postgresql.org/pgsql-announce/2002-07/msg00001.php Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC