Thread: Does max size of varchar influence index size

Does max size of varchar influence index size

From
Franck Routier
Date:
Hi,

I have problems with my database becoming huge in size (around 150 GB
right now, and 2/3 for only three tables, each having around 30 millions
tuples. Space is spent mainly on indices.).

I have a lot of multi-column varchar primary keys (natural keys), and
lot of foreign keys on these tables and thus a lot of indices.

When using VARCHAR, we defaulted to VARCHAR(32) (because on _some_ of
the identifiers, we have to apply md5).

We assumed that using VARCHAR(32) but having values at most 4 characters
long (for example) wouldn't influence indices size, ie it would be the
same as using VARCHAR(4) to keep the example.

Now I really doubt if we were right :)

So, what should we expect ? And are there other factors influencing
indices size ?

Thanks,
Franck



Re: Does max size of varchar influence index size

From
Mark Roberts
Date:
On Mon, 2008-06-30 at 18:57 +0200, Franck Routier wrote:
> Hi,
>
> I have problems with my database becoming huge in size (around 150 GB
> right now, and 2/3 for only three tables, each having around 30 millions
> tuples. Space is spent mainly on indices.).
>
> I have a lot of multi-column varchar primary keys (natural keys), and
> lot of foreign keys on these tables and thus a lot of indices.
>
> When using VARCHAR, we defaulted to VARCHAR(32) (because on _some_ of
> the identifiers, we have to apply md5).
>
> We assumed that using VARCHAR(32) but having values at most 4 characters
> long (for example) wouldn't influence indices size, ie it would be the
> same as using VARCHAR(4) to keep the example.
>
> Now I really doubt if we were right :)
>
> So, what should we expect ? And are there other factors influencing
> indices size ?
>
> Thanks,
> Franck

Is there any particular reason that you're not using a surrogate key?  I
found that switching from natural to surrogate keys in a similar
situation made the indexes not only smaller, but faster.

It really only became an issue after our individual tables got larger
than 20-25G, but I think we got lucky and headed the issue off at the
pass.

I think it should be fairly trivial* to set up a test case using
pg_total_relation_size() to determine whether your suspicions are
correct.

-Mark

* It may not be as trivial as I say, or I'd have done it in the 5
minutes it took to write this email.


Re: Does max size of varchar influence index size

From
Franck Routier
Date:
Le lundi 30 juin 2008 à 13:24 -0700, Mark Roberts a écrit :

Hi Mark,

> Is there any particular reason that you're not using a surrogate key?

Well, human readability is the main reason, no standard way to handle
sequences between databases vendors being the second... (and also
problems when copying data between different instances of the database).

So surrogate keys could be a way, and I am considering this, but I'd
rather avoid it :)

Franck





Re: Does max size of varchar influence index size

From
Richard Huxton
Date:
Franck Routier wrote:
> Le lundi 30 juin 2008 à 13:24 -0700, Mark Roberts a écrit :
>
> Hi Mark,
>
>> Is there any particular reason that you're not using a surrogate key?
>
> Well, human readability is the main reason, no standard way to handle
> sequences between databases vendors being the second... (and also
> problems when copying data between different instances of the database).
>
> So surrogate keys could be a way, and I am considering this, but I'd
> rather avoid it :)

Might be worth looking at 8.3 - that can save you significant space with
short varchar's - the field-length is no longer fixed at 32 bits but can
  adjust itself automatically. Apart from the overheads, you need the
space to store the text in each string, not the maximum possible.

--
   Richard Huxton
   Archonet Ltd