Thread: Does max size of varchar influence index size
Hi, I have problems with my database becoming huge in size (around 150 GB right now, and 2/3 for only three tables, each having around 30 millions tuples. Space is spent mainly on indices.). I have a lot of multi-column varchar primary keys (natural keys), and lot of foreign keys on these tables and thus a lot of indices. When using VARCHAR, we defaulted to VARCHAR(32) (because on _some_ of the identifiers, we have to apply md5). We assumed that using VARCHAR(32) but having values at most 4 characters long (for example) wouldn't influence indices size, ie it would be the same as using VARCHAR(4) to keep the example. Now I really doubt if we were right :) So, what should we expect ? And are there other factors influencing indices size ? Thanks, Franck
On Mon, 2008-06-30 at 18:57 +0200, Franck Routier wrote: > Hi, > > I have problems with my database becoming huge in size (around 150 GB > right now, and 2/3 for only three tables, each having around 30 millions > tuples. Space is spent mainly on indices.). > > I have a lot of multi-column varchar primary keys (natural keys), and > lot of foreign keys on these tables and thus a lot of indices. > > When using VARCHAR, we defaulted to VARCHAR(32) (because on _some_ of > the identifiers, we have to apply md5). > > We assumed that using VARCHAR(32) but having values at most 4 characters > long (for example) wouldn't influence indices size, ie it would be the > same as using VARCHAR(4) to keep the example. > > Now I really doubt if we were right :) > > So, what should we expect ? And are there other factors influencing > indices size ? > > Thanks, > Franck Is there any particular reason that you're not using a surrogate key? I found that switching from natural to surrogate keys in a similar situation made the indexes not only smaller, but faster. It really only became an issue after our individual tables got larger than 20-25G, but I think we got lucky and headed the issue off at the pass. I think it should be fairly trivial* to set up a test case using pg_total_relation_size() to determine whether your suspicions are correct. -Mark * It may not be as trivial as I say, or I'd have done it in the 5 minutes it took to write this email.
Le lundi 30 juin 2008 à 13:24 -0700, Mark Roberts a écrit : Hi Mark, > Is there any particular reason that you're not using a surrogate key? Well, human readability is the main reason, no standard way to handle sequences between databases vendors being the second... (and also problems when copying data between different instances of the database). So surrogate keys could be a way, and I am considering this, but I'd rather avoid it :) Franck
Franck Routier wrote: > Le lundi 30 juin 2008 à 13:24 -0700, Mark Roberts a écrit : > > Hi Mark, > >> Is there any particular reason that you're not using a surrogate key? > > Well, human readability is the main reason, no standard way to handle > sequences between databases vendors being the second... (and also > problems when copying data between different instances of the database). > > So surrogate keys could be a way, and I am considering this, but I'd > rather avoid it :) Might be worth looking at 8.3 - that can save you significant space with short varchar's - the field-length is no longer fixed at 32 bits but can adjust itself automatically. Apart from the overheads, you need the space to store the text in each string, not the maximum possible. -- Richard Huxton Archonet Ltd