Re: Does this matter? - Mailing list pgsql-performance

From Curtis Faith
Subject Re: Does this matter?
Date
Msg-id DMEEJMCDOJAKPPFACMPMGEONCEAA.curtis@galtair.com
Whole thread Raw
In response to Re: Does this matter?  (Andrew Sullivan <andrew@libertyrms.info>)
List pgsql-performance
Andrew Sullivan wrote:
> Hmm.  Maybe a clarification, but I don't think this is quite what the
> tip is talking about.  The tip points out that part of the cost is
> "the increased storage" from the blank-padded type (char) as
> contrasted with non-padded types (like text).  The tip isn't talking
> about whether a length of 20 is faster than a length of 36.  Anyway,
> I can't really believe the length would be a big deal except on
> really huge tables.

It really depends on the access. I spend quite a bit of time optimizing
database internals and the size of an index matters much more than is
apparent in certain cases. This is especially true for medium sized tables.

The real issue is the number of reads required to find a particular entry in
the index.

Assume a btree that tries to be 70% full. Assume 40 bytes for a header, 8
bytes overhead per index entry and an 8K btree page.

The following represents the number of index entries that can be contained in
both a two level and a three level btree.

    Type    Bytes     Items per page         2              3
    ----   ------              -----    ------     ----------
 char(36)      40                129    16,641      2,146,689
 char(20)      24                203    41,209      8,365,427

Depending on the size of the table, the number of pages in the btree affect
performance in two separate ways:

1) Cache hit ratio - This greatly depends on the way the tables are accessed
but more densely packed btree indices are used more often and more likely to
be present in a cache than less densely packed indices.

2) I/O time - If the number of items reaches a particular size then the btree
will add an additional level which could result in a very expensive I/O
operation per access. How this affects performance depends very specifically
on the way the index is used.

The problem is not necessarily the size of the table but the transitions in
numbers of levels in the btree. For a table size of 200 to 15,000 tuples,
there won't be a major difference.

For a table size of 25,000 to 40,000 tuples, and assuming the root page is
cached, an index lookup can be twice as fast with a char(20) as it is for a
char(36) because in the one case a two-level btree handles the table while a
three-level btree is needed for the other.

This won't typically affect multi-user throughput as much since other
backends will be working while the I/O's are waiting but it might affect the
performance as seen from a single client.

- Curtis



pgsql-performance by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Does this matter?
Next
From: Josh Berkus
Date:
Subject: Re: Does this matter?