Re: default index created for primary key - Mailing list pgsql-general

From Frank D. Engel, Jr.
Subject Re: default index created for primary key
Date
Msg-id B510C398-544B-11D9-AF71-0050E410655F@fjrhome.net
Whole thread Raw
In response to Re: default index created for primary key  ("vinita bansal" <sagivini@hotmail.com>)
List pgsql-general

On Dec 22, 2004, at 12:09 PM, vinita bansal wrote:

<excerpt>

I am still not clear on why postgres has this restriction?

By uniqueness, you mean to say that if later anyone wants to add a
primary key constraint on a table which already has a primary key
defined, postgres will use this index to determine that there is
already a primary key defined and would not allow to add this
constraint since a table cannot have two primary keys??

</excerpt>

No, an index is required for efficiency.


Consider a table with a column which must be unique.  Assume there are
350,000 rows in the table.  Now *every time* you insert a new row or
perform an update which changes that unique column, assuming no index
on the column, the database would need to check all 350,000 rows
individually to determine that the value is in fact unique.


With an index on the column, it is relatively quick for the database
to determine that the value is unique, as it does not need to check
nearly as many values..


To see this (rough example), start with an empty table with a single
column, which is a unique integer column.  Now add values and watch
what happens to an index (use a fixed-width font):


4 53 72 15 23 19 3 12 8


Index:


<fixed><bigger>4


4

 \

  53


  53

 /  \

4    72


  53

 /  \

4    72

 \

  15


     53

    /  \

  15    72

 /  \

4    23


     19

    /  \

  15    53

 /  \     \

4    23    72


      15

     /  \

    /    \

   /      \

  4        53

 / \      /  \

3   23  19    72


      15

     /  \

    /    \

   /      \

  4        53

 / \      /  \

3   23  19    72

 \

  12


        15

       /  \

      /    \

     /      \

    4        53

   / \      /  \

  8   23  19    72

 / \

3   12</bigger></fixed>



Now the user tries to insert 12, which is already in the table.  In
order to determine that 12 is in the table, the database could scan
every value in the table until it finds it.  In this case, it would
need to check 8 rows.  Using the index, it would only need to check 4
values, cutting the time in half.  In a few cases, it may take
marginally longer (2 as opposed to 1 for the value 4), but on average,
5 rows for unindexed vs. 2.8 rows for indexed, shows that the index
has a definite advantage.


Now extend this to 350,000 rows.  Without an index, you'd need to
check an average of about 175,000 rows just to determine that a value
was there.  And if the value is not there, as will more commonly be
the case, you'd need to check them all.  With an index like the ones I
used above, you would need to check *at most* 19 values.  You begin to
see why PostgreSQL requires an index here.


-----------------------------------------------------------

Frank D. Engel, Jr.  <<fde101@fjrhome.net>


$ ln -s /usr/share/kjvbible /usr/manual

$ true | cat /usr/manual | grep "John 3:16"

John 3:16 For God so loved the world, that he gave his only begotten
Son, that whosoever believeth in him should not perish, but have
everlasting life.

$


On Dec 22, 2004, at 12:09 PM, vinita bansal wrote:
>
> I am still not clear on why postgres has this restriction?
> By uniqueness, you mean to say that if later anyone wants to add a
> primary key constraint on a table which already has a primary key
> defined, postgres will use this index to determine that there is
> already a primary key defined and would not allow to add this
> constraint since a table cannot have two primary keys??

No, an index is required for efficiency.

Consider a table with a column which must be unique.  Assume there are
350,000 rows in the table.  Now *every time* you insert a new row or
perform an update which changes that unique column, assuming no index
on the column, the database would need to check all 350,000 rows
individually to determine that the value is in fact unique.

With an index on the column, it is relatively quick for the database to
determine that the value is unique, as it does not need to check nearly
as many values..

To see this (rough example), start with an empty table with a single
column, which is a unique integer column.  Now add values and watch
what happens to an index (use a fixed-width font):

4 53 72 15 23 19 3 12 8

Index:

4

4
  \
   53

   53
  /  \
4    72

   53
  /  \
4    72
  \
   15

      53
     /  \
   15    72
  /  \
4    23

      19
     /  \
   15    53
  /  \     \
4    23    72

       15
      /  \
     /    \
    /      \
   4        53
  / \      /  \
3   23  19    72

       15
      /  \
     /    \
    /      \
   4        53
  / \      /  \
3   23  19    72
  \
   12

         15
        /  \
       /    \
      /      \
     4        53
    / \      /  \
   8   23  19    72
  / \
3   12


Now the user tries to insert 12, which is already in the table.  In
order to determine that 12 is in the table, the database could scan
every value in the table until it finds it.  In this case, it would
need to check 8 rows.  Using the index, it would only need to check 4
values, cutting the time in half.  In a few cases, it may take
marginally longer (2 as opposed to 1 for the value 4), but on average,
5 rows for unindexed vs. 2.8 rows for indexed, shows that the index has
a definite advantage.

Now extend this to 350,000 rows.  Without an index, you'd need to check
an average of about 175,000 rows just to determine that a value was
there.  And if the value is not there, as will more commonly be the
case, you'd need to check them all.  With an index like the ones I used
above, you would need to check *at most* 19 values.  You begin to see
why PostgreSQL requires an index here.

-----------------------------------------------------------
Frank D. Engel, Jr.  <fde101@fjrhome.net>

$ ln -s /usr/share/kjvbible /usr/manual
$ true | cat /usr/manual | grep "John 3:16"
John 3:16 For God so loved the world, that he gave his only begotten
Son, that whosoever believeth in him should not perish, but have
everlasting life.
$

Attachment

pgsql-general by date:

Previous
From: Együd Csaba
Date:
Subject: Re: Strange Index behavior
Next
From: "Sander Steffann"
Date:
Subject: Re: default index created for primary key