Re: default index created for primary key - Mailing list pgsql-general
From | Frank D. Engel, Jr. |
---|---|
Subject | Re: default index created for primary key |
Date | |
Msg-id | B510C398-544B-11D9-AF71-0050E410655F@fjrhome.net Whole thread Raw |
In response to | Re: default index created for primary key ("vinita bansal" <sagivini@hotmail.com>) |
List | pgsql-general |
On Dec 22, 2004, at 12:09 PM, vinita bansal wrote: <excerpt> I am still not clear on why postgres has this restriction? By uniqueness, you mean to say that if later anyone wants to add a primary key constraint on a table which already has a primary key defined, postgres will use this index to determine that there is already a primary key defined and would not allow to add this constraint since a table cannot have two primary keys?? </excerpt> No, an index is required for efficiency. Consider a table with a column which must be unique. Assume there are 350,000 rows in the table. Now *every time* you insert a new row or perform an update which changes that unique column, assuming no index on the column, the database would need to check all 350,000 rows individually to determine that the value is in fact unique. With an index on the column, it is relatively quick for the database to determine that the value is unique, as it does not need to check nearly as many values.. To see this (rough example), start with an empty table with a single column, which is a unique integer column. Now add values and watch what happens to an index (use a fixed-width font): 4 53 72 15 23 19 3 12 8 Index: <fixed><bigger>4 4 \ 53 53 / \ 4 72 53 / \ 4 72 \ 15 53 / \ 15 72 / \ 4 23 19 / \ 15 53 / \ \ 4 23 72 15 / \ / \ / \ 4 53 / \ / \ 3 23 19 72 15 / \ / \ / \ 4 53 / \ / \ 3 23 19 72 \ 12 15 / \ / \ / \ 4 53 / \ / \ 8 23 19 72 / \ 3 12</bigger></fixed> Now the user tries to insert 12, which is already in the table. In order to determine that 12 is in the table, the database could scan every value in the table until it finds it. In this case, it would need to check 8 rows. Using the index, it would only need to check 4 values, cutting the time in half. In a few cases, it may take marginally longer (2 as opposed to 1 for the value 4), but on average, 5 rows for unindexed vs. 2.8 rows for indexed, shows that the index has a definite advantage. Now extend this to 350,000 rows. Without an index, you'd need to check an average of about 175,000 rows just to determine that a value was there. And if the value is not there, as will more commonly be the case, you'd need to check them all. With an index like the ones I used above, you would need to check *at most* 19 values. You begin to see why PostgreSQL requires an index here. ----------------------------------------------------------- Frank D. Engel, Jr. <<fde101@fjrhome.net> $ ln -s /usr/share/kjvbible /usr/manual $ true | cat /usr/manual | grep "John 3:16" John 3:16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. $ On Dec 22, 2004, at 12:09 PM, vinita bansal wrote: > > I am still not clear on why postgres has this restriction? > By uniqueness, you mean to say that if later anyone wants to add a > primary key constraint on a table which already has a primary key > defined, postgres will use this index to determine that there is > already a primary key defined and would not allow to add this > constraint since a table cannot have two primary keys?? No, an index is required for efficiency. Consider a table with a column which must be unique. Assume there are 350,000 rows in the table. Now *every time* you insert a new row or perform an update which changes that unique column, assuming no index on the column, the database would need to check all 350,000 rows individually to determine that the value is in fact unique. With an index on the column, it is relatively quick for the database to determine that the value is unique, as it does not need to check nearly as many values.. To see this (rough example), start with an empty table with a single column, which is a unique integer column. Now add values and watch what happens to an index (use a fixed-width font): 4 53 72 15 23 19 3 12 8 Index: 4 4 \ 53 53 / \ 4 72 53 / \ 4 72 \ 15 53 / \ 15 72 / \ 4 23 19 / \ 15 53 / \ \ 4 23 72 15 / \ / \ / \ 4 53 / \ / \ 3 23 19 72 15 / \ / \ / \ 4 53 / \ / \ 3 23 19 72 \ 12 15 / \ / \ / \ 4 53 / \ / \ 8 23 19 72 / \ 3 12 Now the user tries to insert 12, which is already in the table. In order to determine that 12 is in the table, the database could scan every value in the table until it finds it. In this case, it would need to check 8 rows. Using the index, it would only need to check 4 values, cutting the time in half. In a few cases, it may take marginally longer (2 as opposed to 1 for the value 4), but on average, 5 rows for unindexed vs. 2.8 rows for indexed, shows that the index has a definite advantage. Now extend this to 350,000 rows. Without an index, you'd need to check an average of about 175,000 rows just to determine that a value was there. And if the value is not there, as will more commonly be the case, you'd need to check them all. With an index like the ones I used above, you would need to check *at most* 19 values. You begin to see why PostgreSQL requires an index here. ----------------------------------------------------------- Frank D. Engel, Jr. <fde101@fjrhome.net> $ ln -s /usr/share/kjvbible /usr/manual $ true | cat /usr/manual | grep "John 3:16" John 3:16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. $
Attachment
pgsql-general by date: