Thread: Do non-sequential primary keys slow performance significantly?
Hello,

The most difficult part of this question is justifying WHY we would
want to use random primary keys! We have a fairly strong reason for
doing so, although it is not quite compelling.

We are Java developers developing desktop applications that persist
data in Postgres. This is a pretty "low spec" database, as it will
only be servicing a few PCs. We do this via Hibernate, so our SQL and
Postgres skills and insights are relatively lacking. I certainly don't
understand the gory internal details of Postgres.

We have an internal proposal to use what are virtually random 128-bit
numbers for our primary keys. These are not truly random in any
mathematical sense, and they will be unique, but they are certainly
NOT sequential.

In my ignorant bliss I would suspect that Postgres will run more
slowly using random primary keys. Can anyone provide any "rules of
thumb" for how this may affect performance? Is it a plain dumb idea?
Or maybe it would have only modest impact?

Any comments, insights, and pointers are very much appreciated.

Thanks,
-Damian
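[For illustration: a minimal Java sketch of what such a key generator
might look like. The thread never says how the 128-bit values are
produced; java.util.UUID.randomUUID(), which carries 122 random bits
out of 128, is just one plausible stand-in, and the class name below
is invented.]

import java.util.UUID;

// Hypothetical sketch: java.util.UUID.randomUUID() yields a version-4
// UUID -- effectively a random 128-bit value (122 bits are random, 6
// are fixed by the version/variant fields). The original post does not
// say how its keys are generated; this is only an illustration.
public class RandomKeyExample {
    public static void main(String[] args) {
        UUID key = UUID.randomUUID();
        System.out.println("128-bit key: " + key);

        // The same value split into two longs, in case the schema
        // stores it as a pair of 64-bit columns instead of a string.
        long hi = key.getMostSignificantBits();
        long lo = key.getLeastSignificantBits();
        System.out.printf("hi=%016x lo=%016x%n", hi, lo);
    }
}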
"Damian C" <jamianb@gmail.com> writes: > In my ignorant bliss I would suspect that postgres will run more > slowly using random primary keys. More slowly compared to what? If your performance bottleneck is concurrent insertions, random keys should win over sequential keys because the insert load is distributed over the whole index, leading to less page-level lock contention. There might be other scenarios where sequential keys are better. For a database servicing "only a few PCs" I'm not sure you should even spend any time thinking about it --- do what's easiest for your application code. regards, tom lane
On 29/9/2006 14:59, "Damian C" <jamianb@gmail.com> wrote:

> Hello,
> The most difficult part of this question is justifying WHY we would
> want to use random primary keys! We have a fairly strong reason for
> doing so, although it is not quite compelling.

Either you want it that way simply because it was your first thought
of how to do it, or you have real reasoning behind why you want to do
it that way.

> We are Java developers developing desktop applications that persist
> data in Postgres. This is a pretty "low spec" database, as it will
> only be servicing a few PCs. We do this via Hibernate, so our SQL and
> Postgres skills and insights are relatively lacking. I certainly
> don't understand the gory internal details of Postgres.
>
> We have an internal proposal to use what are virtually random 128-bit
> numbers for our primary keys. These are not truly random in any
> mathematical sense, and they will be unique, but they are certainly
> NOT sequential.

Specifying the column as the primary key, or giving it a unique index,
will ensure the uniqueness of entries.

> In my ignorant bliss I would suspect that Postgres will run more
> slowly using random primary keys. Can anyone provide any "rules of
> thumb" for how this may affect performance? Is it a plain dumb idea?
> Or maybe it would have only modest impact?
>
> Any comments, insights, and pointers are very much appreciated.

As for being non-sequential, this is how a live database ends up
anyway (to some degree): as rows get deleted, the auto-assigned
sequential numbers are deleted along with them and don't get re-used,
leaving gaps.

I believe the only performance difference will stem from using a
128-bit column for the key rather than, say, a 32- or 64-bit serial:
there is more data to store, search through, and compare.

When you say you want it as your primary key, how much will you use it
to link to other tables? If it links to three other tables it won't be
a problem, but if you use it to link fifty tables with complex joins
in your searches, then you may want to reconsider or develop a
workaround. Or is it simply your unique row identifier?

The fact that you say it will only be servicing a few PCs leads me to
think you aren't expecting millions of rows to be entered. That
suggests the performance difference would not be intolerable even if
it is noticeable to your application, although you could get better
performance with a different design.

--
Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz
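[Since the original poster mentioned Hibernate, here is a hypothetical
sketch of how such an entity might be mapped with JPA annotations, the
key being assigned in application code rather than by a database
sequence. The class, field, and column details are invented, not taken
from the thread.]

import java.util.UUID;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

// Hypothetical sketch of a Hibernate-mapped entity whose primary key
// is an application-assigned 128-bit value stored as a 36-char string.
// The class and column names are invented for illustration.
@Entity
public class Customer {
    @Id
    @Column(length = 36)
    private String id = UUID.randomUUID().toString();

    private String name;

    protected Customer() {}          // no-arg constructor for Hibernate

    public Customer(String name) {
        this.name = name;
    }

    public String getId() { return id; }
    public String getName() { return name; }
}

[One design consequence worth noting: because the key exists before the
first INSERT, equals/hashCode and object identity are stable even for
unsaved objects, which is one of the usual arguments for assigned
random keys under an ORM.]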
> The most difficult part of this question is justifying WHY we would
> want to use random primary keys! We have a fairly strong reason for
> doing so, although it is not quite compelling.

One problem with randomly generated primary keys that I've recently
read about deals with inserts failing due to primary key duplication.

If the size of your dataset grows to become a significant percentage
of the size of the integer type used for your random primary key, the
probability of inserting a duplicate number increases dramatically. I
imagine that this problem could contribute to poor performance for
large bulk inserts, which would have to add logic for retrying an
insert whenever a duplicate number is generated.

Regards,
Richard Broersma Jr.
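[The retry logic Richard mentions might look something like this under
plain JDBC; it is a sketch, not anything from the thread. SQLState
23505 is PostgreSQL's unique_violation code. Note the sketch assumes
auto-commit: inside a larger transaction PostgreSQL aborts on the
failed statement, so a savepoint would be needed before each attempt.
Table and column names are hypothetical.]

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

// Sketch of retry-on-duplicate insert logic: on a unique-key violation
// (PostgreSQL SQLState 23505), generate a fresh random key and try
// again. Assumes auto-commit mode; table/column names are invented.
public class RetryingInsert {
    private static final String UNIQUE_VIOLATION = "23505";
    private static final int MAX_ATTEMPTS = 5;

    static String insertWithRetry(Connection conn, String payload)
            throws SQLException {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            String key = UUID.randomUUID().toString();
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO items (id, payload) VALUES (?, ?)")) {
                ps.setString(1, key);
                ps.setString(2, payload);
                ps.executeUpdate();
                return key;                  // success
            } catch (SQLException e) {
                if (!UNIQUE_VIOLATION.equals(e.getSQLState())) {
                    throw e;                 // some other failure
                }
                // duplicate key: loop and retry with a new random value
            }
        }
        throw new SQLException("gave up after " + MAX_ATTEMPTS + " attempts");
    }
}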
On Fri, Sep 29, 2006 at 08:10:23 -0700,
  Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> > The most difficult part of this question is justifying WHY we would
> > want to use random primary keys! We have a fairly strong reason for
> > doing so, although it is not quite compelling.
>
> One problem with randomly generated primary keys that I've recently
> read about deals with inserts failing due to primary key duplication.
>
> If the size of your dataset grows to become a significant percentage
> of the size of the integer type used for your random primary key, the
> probability of inserting a duplicate number increases dramatically. I
> imagine that this problem could contribute to poor performance for
> large bulk inserts, which would have to add logic for retrying an
> insert whenever a duplicate number is generated.

They are using 128-bit keys! If their random number generator actually
works, they shouldn't have a problem until they have generated on the
order of 2^64 keys. That isn't likely to happen any time soon.
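[The 2^64 figure is the usual birthday bound. A back-of-envelope check
in Java, using the approximation p ~ n^2 / 2^(b+1) for n random b-bit
keys; the numbers printed are illustrative, not from the thread:]

// Birthday approximation: for n random b-bit keys, the probability of
// any collision is roughly n^2 / 2^(b+1). For 128-bit keys even a
// trillion rows keep the odds astronomically small.
public class BirthdayBound {
    public static void main(String[] args) {
        double bits = 128.0;
        for (double n : new double[] {1e6, 1e9, 1e12}) {
            double p = n * n / Math.pow(2.0, bits + 1.0);
            System.out.printf("n = %.0e  ->  p(collision) ~ %.3e%n", n, p);
        }
    }
}

[For a billion keys this works out to odds of roughly 1.5e-21, which
backs up the point above: duplicate-key retries are a non-issue at 128
bits, whatever their performance cost would be at 32 bits.]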