Thread: Does this matter?

Does this matter?

From
Wei Weng
Date:
Does it make a performance difference if I use a char(20) or a char(36)
as the primary key? My thought is no, but I would like to hear more
opinions.

And a little further off topic(since we have many database experts
here), does it matter on MS SQL server 7?

Thanks!

--
Wei Weng
Network Software Engineer
KenCast Inc.



Re: Does this matter?

From
Josh Berkus
Date:
Wei,

> Does it make a performance difference if I use a char(20) or a char(36)
> as the primary key? My thought is no, but I would like to hear more
> opinions.

Yes, it does, though probably minor unless you have millions of records.  CHAR
is padded out to the specified length.  Therefore the index on a char(36)
column will be a little larger, and thus a little slower, than the char(20).

Now, there would be no difference between VARCHAR(20) and VARCHAR(36) unless
you used some of the extra 16 characters on most rows.

Either way, for tables of a few thousand records, I doubt that you'll notice
the difference.   BTW, why not use a SERIAL value as a surrogate primary key?

> And a little further off topic(since we have many database experts
> here), does it matter on MS SQL server 7?

Yes, same reason.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Does this matter?

From
Wei Weng
Date:
Josh:

Since I need to use a GUID as the primary key, I have to use the char
datatype.

On Fri, 2002-11-01 at 15:23, Josh Berkus wrote:
> Wei,
>
> > Does it make a performance difference if I use a char(20) or a char(36)
> > as the primary key? My thought is no, but I would like to hear more
> > opinions.
>
> Yes, it does, though probably minor unless you have millions of records.  CHAR
> is padded out to the specified length.  Therefore the index on a char(36)
> column will be a little larger, and thus a little slower, than the char(20).
Does it affect the INSERT/UPDATE/DELETE operations on tables or simply
the SELECT operation or both?

>
> Now, there would be no difference between VARCHAR(20) and VARCHAR(36) unless
> you used some of the extra 16 characters on most rows.
>
> Either way, for tables of a few thousand records, I doubt that you'll notice
> the difference.   BTW, why not use a SERIAL value as a surrogate primary key?
>
> > And a little further off topic(since we have many database experts
> > here), does it matter on MS SQL server 7?
>
> Yes, same reason.
--
Wei Weng
Network Software Engineer
KenCast Inc.



Re: Does this matter?

From
Philip Hallstrom
Date:
> Wei,
>
> > Does it make a performance difference if I use a char(20) or a char(36)
> > as the primary key? My thought is no, but I would like to hear more
> > opinions.
>
> Yes, it does, though probably minor unless you have millions of records.  CHAR
> is padded out to the specified length.  Therefore the index on a char(36)
> column will be a little larger, and thus a little slower, than the char(20).
>

Really?  According to this url (search for "Tip") there is no performance
difference just a space difference.  I don't know for sure either way, but
if there is a difference the manual needs updating.

http://www.postgresql.org/idocs/index.php?datatype-character.html

-philip


Re: Does this matter?

From
Andrew Sullivan
Date:
On Fri, Nov 01, 2002 at 12:53:29PM -0800, Philip Hallstrom wrote:

> > is padded out to the specified length.  Therefore the index on a char(36)
> > column will be a little larger, and thus a little slower, than the char(20).
> >
>
> Really?  According to this url (search for "Tip") there is no performance
> difference just a space difference.  I don't know for sure either way, but
> if there is a difference the manual needs updating.

Hmm.  Maybe a clarification, but I don't think this is quite what the
tip is talking about.  The tip points out that part of the cost is
"the increased storage" from the blank-padded type (char) as
contrasted with non-padded types (like text).  The tip isn't talking
about whether a length of 20 is faster than a length of 36.  Anyway,
I can't really believe the length would be a big deal except on
really huge tables.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Does this matter?

From
Josh Berkus
Date:
Phillip,

> Really?  According to this url (search for "Tip") there is no performance
> difference just a space difference.  I don't know for sure either way, but
> if there is a difference the manual needs updating.
>
> http://www.postgresql.org/idocs/index.php?datatype-character.html

Actually, that note is intended to tell people that CHAR is not any faster
than VARCHAR for the same-length string ... since CHAR *is* faster than
VARCHAR in some systems, like MS SQL Server.

--
-Josh Berkus



Re: Does this matter?

From
"Curtis Faith"
Date:
Andrew Sullivan wrote:
> Hmm.  Maybe a clarification, but I don't think this is quite what the
> tip is talking about.  The tip points out that part of the cost is
> "the increased storage" from the blank-padded type (char) as
> contrasted with non-padded types (like text).  The tip isn't talking
> about whether a length of 20 is faster than a length of 36.  Anyway,
> I can't really believe the length would be a big deal except on
> really huge tables.

It really depends on the access. I spend quite a bit of time optimizing
database internals and the size of an index matters much more than is
apparent in certain cases. This is especially true for medium sized tables.

The real issue is the number of reads required to find a particular entry in
the index.

Assume a btree that tries to be 70% full. Assume 40 bytes for a header, 8
bytes overhead per index entry and an 8K btree page.

The following represents the number of index entries that can be contained in
both a two level and a three level btree.

    Type    Bytes     Items per page         2              3
    ----   ------              -----    ------     ----------
 char(36)      40                129    16,641      2,146,689
 char(20)      24                203    41,209      8,365,427

Depending on the size of the table, the number of pages in the btree affect
performance in two separate ways:

1) Cache hit ratio - This greatly depends on the way the tables are accessed
but more densely packed btree indices are used more often and more likely to
be present in a cache than less densely packed indices.

2) I/O time - If the number of items reaches a particular size then the btree
will add an additional level which could result in a very expensive I/O
operation per access. How this affects performance depends very specifically
on the way the index is used.

The problem is not necessarily the size of the table but the transitions in
numbers of levels in the btree. For a table size of 200 to 15,000 tuples,
there won't be a major difference.

For a table size of 25,000 to 40,000 tuples, and assuming the root page is
cached, an index lookup can be twice as fast with a char(20) as it is for a
char(36) because in the one case a two-level btree handles the table while a
three-level btree is needed for the other.

This won't typically affect multi-user throughput as much since other
backends will be working while the I/O's are waiting but it might affect the
performance as seen from a single client.

- Curtis



Re: Does this matter?

From
Josh Berkus
Date:
Wei,

> Does it affect the INSERT/UPDATE/DELETE operations on tables or simply
> the SELECT operation or both?

All of the above.  How many rows are we talking about, anyway?  The difference
may be academic.

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                        Josh Berkus
   Complete information technology     josh@agliodbs.com
    and data management solutions     (415) 565-7293
   for law firms, small businesses      fax 621-2533
    and non-profit organizations.     San Francisco


Re: Does this matter?

From
Neil Conway
Date:
Wei Weng <wweng@kencast.com> writes:
> Since I need to use a GUID as the primary key, I have to use the char
> datatype.

Try uniqueidentifier:

    http://archives.postgresql.org/pgsql-announce/2002-07/msg00001.php

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC