Home > mailing lists

Thread: Primary key vs unique index

Primary key vs unique index

From

"Voils, Steven M"

Date:

17 March 2011, 15:18:30

Is there a fundamental difference between a primary key and a unique index? Currently we have primary keys on tables that have significant amounts of updates performed on them, as a result the primary key indexes are becoming significantly bloated. There are other indexes on the tables that also become bloated as a result of this, but these are automatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to be very low.

We don’t want to remove the unique constraint of that the primary key is providing, but the space on disk will continue to grow unbounded so we must do something. Can we replace the primary key with a unique index that could be rebuilt concurrently, or would this be considered bad design? The reasoning behind this would be that the unique index could be rebuilt concurrently without taking the application down or exclusively locking the table for an extending period of time. Are there other advantages to a primary key outside of a uniqueness constraint and an index?

Re: Primary key vs unique index

From

Scott Ribe

Date:

17 March 2011, 16:13:33

On Mar 17, 2011, at 6:19 AM, Voils, Steven M wrote:

> Is there a fundamental difference between a primary key and a unique index?  Currently we have primary keys on tables
thathave significant amounts of updates performed on them, as a result the primary key indexes are becoming
significantlybloated.  There are other indexes on the tables that also become bloated as a result of this, but these
areautomatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to
bevery low.  
>
> We don’t want to remove the unique constraint of that the primary key is providing, but the space on disk will
continueto grow unbounded so we must do something.  Can we replace the primary key with a unique index that could be
rebuiltconcurrently, or would this be considered bad design?  The reasoning behind this would be that the unique index
couldbe rebuilt concurrently without taking the application down or exclusively locking the table for an extending
periodof time.  Are there other advantages to a primary key outside of a uniqueness constraint and an index?  

So, you can't use REINDEX because it lacks the CONCURRENTLY option, and that would lock too much for too long?

The only thing the primary key designation provides beyond not null & unique is the metadata about what is the primary
key.Which for example in the db allows foreign key constraints to be created without specifying that column. And some
ORM/apps/frameworkscan automatically make use of the information as well. I like having them for clarity, but you
reallycan do away with them if your deployment needs to do so. 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice

Re: Primary key vs unique index

From

Scott Marlowe

Date:

17 March 2011, 18:51:55

On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
> Is there a fundamental difference between a primary key and a unique index?
> Currently we have primary keys on tables that have significant amounts of
> updates performed on them, as a result the primary key indexes are becoming
> significantly bloated.  There are other indexes on the tables that also
> become bloated as a result of this, but these are automatically rebuild
> periodically by the application (using the concurrently flag) when read
> usage is expected to be very low.

If you're experiencing bloat, but not deleting huge chunks of your
table at a time, then you're not vacuuming aggressively enough

Re: Primary key vs unique index

From

Scott Marlowe

Date:

17 March 2011, 18:53:03

On Thu, Mar 17, 2011 at 12:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
>> Is there a fundamental difference between a primary key and a unique index?
>> Currently we have primary keys on tables that have significant amounts of
>> updates performed on them, as a result the primary key indexes are becoming
>> significantly bloated.  There are other indexes on the tables that also
>> become bloated as a result of this, but these are automatically rebuild
>> periodically by the application (using the concurrently flag) when read
>> usage is expected to be very low.
>
> If you're experiencing bloat, but not deleting huge chunks of your
> table at a time, then you're not vacuuming aggressively enough

Or you're on 8.3 or before and blowing out your free space map.

Re: Primary key vs unique index

From

"Voils, Steven M"

Date:

17 March 2011, 19:08:11

Thanks for the reply.  I should have mentioned in the first post that we do delete significant amounts of the table
whichI thought was the cause of the bloat.  We are already performing automatic vacuums nightly. 

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Thursday, March 17, 2011 2:52 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index

On Thu, Mar 17, 2011 at 12:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
>> Is there a fundamental difference between a primary key and a unique index?
>> Currently we have primary keys on tables that have significant amounts of
>> updates performed on them, as a result the primary key indexes are becoming
>> significantly bloated.  There are other indexes on the tables that also
>> become bloated as a result of this, but these are automatically rebuild
>> periodically by the application (using the concurrently flag) when read
>> usage is expected to be very low.
>
> If you're experiencing bloat, but not deleting huge chunks of your
> table at a time, then you're not vacuuming aggressively enough

Or you're on 8.3 or before and blowing out your free space map.

Re: Primary key vs unique index

From

"Voils, Steven M"

Date:

17 March 2011, 19:08:25

Thanks for the reply, that's what I was looking for.  I just wasn't sure if there was another compelling advantage to
useprimary keys instead of a unique index. 

-----Original Message-----
From: Scott Ribe [mailto:scott_ribe@elevated-dev.com]
Sent: Thursday, March 17, 2011 12:13 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index

On Mar 17, 2011, at 6:19 AM, Voils, Steven M wrote:

> Is there a fundamental difference between a primary key and a unique index?  Currently we have primary keys on tables
thathave significant amounts of updates performed on them, as a result the primary key indexes are becoming
significantlybloated.  There are other indexes on the tables that also become bloated as a result of this, but these
areautomatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to
bevery low.  
>
> We don't want to remove the unique constraint of that the primary key is providing, but the space on disk will
continueto grow unbounded so we must do something.  Can we replace the primary key with a unique index that could be
rebuiltconcurrently, or would this be considered bad design?  The reasoning behind this would be that the unique index
couldbe rebuilt concurrently without taking the application down or exclusively locking the table for an extending
periodof time.  Are there other advantages to a primary key outside of a uniqueness constraint and an index?  

So, you can't use REINDEX because it lacks the CONCURRENTLY option, and that would lock too much for too long?

The only thing the primary key designation provides beyond not null & unique is the metadata about what is the primary
key.Which for example in the db allows foreign key constraints to be created without specifying that column. And some
ORM/apps/frameworkscan automatically make use of the information as well. I like having them for clarity, but you
reallycan do away with them if your deployment needs to do so. 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice

Re: Primary key vs unique index

From

Scott Marlowe

Date:

17 March 2011, 22:31:05

On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M
<steve@sensorswitch.com> wrote:
> Thanks for the reply.  I should have mentioned in the first post that we do delete significant amounts of the table
whichI thought was the cause of the bloat.  We are already performing automatic vacuums nightly. 

Automatic regular vacuums?  So you do or don't have autovac turned on?

What version of pg are you running (8.3 or before, 8.4 or later?)

Are your nightly vacuums FULL or regular vacuums?

Re: Primary key vs unique index

From

Scott Marlowe

Date:

18 March 2011, 12:35:54

Autovacuum is generally more effective as it can run when it needs to
not having to wait til the end of the day.  If you delete big chunks
several times a day autovac can keep up.  Also, it's enabled by
default in 8.4 and up so the end user would have to actively turn it
off in this instance.

On Fri, Mar 18, 2011 at 6:21 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
> Our application supports pg 8.4 onwards.  The vacuums are run automatically by the application rather than requiring
endusers to enable autovacuum; the vacuums being performed are regular. 
>
> -----Original Message-----
> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> Sent: Thursday, March 17, 2011 6:31 PM
> To: Voils, Steven M
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Primary key vs unique index
>
> On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M
> <steve@sensorswitch.com> wrote:
>> Thanks for the reply.  I should have mentioned in the first post that we do delete significant amounts of the table
whichI thought was the cause of the bloat.  We are already performing automatic vacuums nightly. 
>
> Automatic regular vacuums?  So you do or don't have autovac turned on?
>
> What version of pg are you running (8.3 or before, 8.4 or later?)
>
> Are your nightly vacuums FULL or regular vacuums?
>



--
To understand recursion, one must first understand recursion.

Re: Primary key vs unique index

From

"Voils, Steven M"

Date:

18 March 2011, 13:19:10

Our application supports pg 8.4 onwards.  The vacuums are run automatically by the application rather than requiring
endusers to enable autovacuum; the vacuums being performed are regular. 

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Thursday, March 17, 2011 6:31 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index

On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M
<steve@sensorswitch.com> wrote:
> Thanks for the reply.  I should have mentioned in the first post that we do delete significant amounts of the table
whichI thought was the cause of the bloat.  We are already performing automatic vacuums nightly. 

Automatic regular vacuums?  So you do or don't have autovac turned on?

What version of pg are you running (8.3 or before, 8.4 or later?)

Are your nightly vacuums FULL or regular vacuums?

Re: Primary key vs unique index

From

"Voils, Steven M"

Date:

18 March 2011, 13:19:11

What are the general guidelines under which autovacuum will trigger?  I was unaware it was turned on by default for the
newerversions.  Would it be worthwhile to leave the manual vacuuming on?  Currently it runs immediately after large
sectionsof the tables are deleted.  Or would it be expected that autovac would pick these changes up and run anyway?   

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Friday, March 18, 2011 8:36 AM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index

Autovacuum is generally more effective as it can run when it needs to
not having to wait til the end of the day.  If you delete big chunks
several times a day autovac can keep up.  Also, it's enabled by
default in 8.4 and up so the end user would have to actively turn it
off in this instance.

On Fri, Mar 18, 2011 at 6:21 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
> Our application supports pg 8.4 onwards.  The vacuums are run automatically by the application rather than requiring
endusers to enable autovacuum; the vacuums being performed are regular. 
>
> -----Original Message-----
> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> Sent: Thursday, March 17, 2011 6:31 PM
> To: Voils, Steven M
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Primary key vs unique index
>
> On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M
> <steve@sensorswitch.com> wrote:
>> Thanks for the reply.  I should have mentioned in the first post that we do delete significant amounts of the table
whichI thought was the cause of the bloat.  We are already performing automatic vacuums nightly. 
>
> Automatic regular vacuums?  So you do or don't have autovac turned on?
>
> What version of pg are you running (8.3 or before, 8.4 or later?)
>
> Are your nightly vacuums FULL or regular vacuums?
>

--
To understand recursion, one must first understand recursion.

Re: Primary key vs unique index

From

Derrick Rice

Date:

22 March 2011, 01:05:19

On Fri, Mar 18, 2011 at 8:38 AM, Voils, Steven M <steve@sensorswitch.com> wrote:

What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for the newer versions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large sections of the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway?

Autovacuum will tend to run after those types of changes. As described here:

http://www.postgresql.org/docs/8.4/interactive/routine-vacuuming.html#AUTOVACUUM

What PostgreSQL looks for is a portion of the tuples to be obsoleted. The configuration is essentially "if some portion (percentage) of the table is obsolete, vacuum it" but also has an added scalar (base threshold) which is required on top of that portion (scale factor).

My understanding is that the base threshold is there to prevent small tables from being vacuumed for little or no reason, but for large tables it should be insignificant in comparison to the scale factor. So if your scale factor is .5, when you delete half of your table, you can expect an autovacuum to run on the next iteration of the daemon.

Note that the default scale factor is .2 (20%) and the default base threshold is 50. Both can be modified for the cluster as well as for individual tables.

As someone else already alluded, VACUUM FULL is generally bad for indexes, where VACUUM will help indexes (by reclaiming space the same way as it does for the table). More details on the same page linked above.

Derrick

Re: Primary key vs unique index

From

Scott Marlowe

Date:

22 March 2011, 01:13:28

On Fri, Mar 18, 2011 at 6:38 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
> What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for
thenewer versions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large
sectionsof the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway?

Until 8.3 autovacuum was more of a proof of concept rather than
production ready code. By 8.3 two things had happened, vacuum
costing, which is important so you can tune vacuuming / autovacuuming
to your hardware and usage patterns, and multi-threaded autovacuuming
daemon, which meant that autovac could now handle the scenario where
one or more table would take a long (sometimes very long) time to
vacuum, especially with costing factors slowing it down, and another
table would get bloated while waiting its turn. With a server with
LOTS of random IO capability you can run quite a few threads at once,
since each one is only a small impact against the maximum IO of the
drive array. If you've got 1,000 tables and a couple dozen big ones
that can take 30 minutes or more to vacuum, it's a good thing to be
able to run autovac on more than one at a time.

The next HUGE improvement came with 8.4, which took the free space map
and put it on the drives, removing the need to constantly monitor and
adjust free space map to prevent blowout. If you've got a well tuned
<= pg 8.3 you're ok. If you need to tune an older version, it's often
easier AND safer to migrate to 8.4 or above.

Re: Primary key vs unique index

From

"Voils, Steven M"

Date:

22 March 2011, 12:45:12

Thanks for the help.  I think we're only going to support pg 8.4 onwards.  I'll turn off the application's manual
vacuumingand just let autovac do its thing. 

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Monday, March 21, 2011 9:13 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index

On Fri, Mar 18, 2011 at 6:38 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
> What are the general guidelines under which autovacuum will trigger?  I was unaware it was turned on by default for
thenewer versions.  Would it be worthwhile to leave the manual vacuuming on?  Currently it runs immediately after large
sectionsof the tables are deleted.  Or would it be expected that autovac would pick these changes up and run anyway? 

Until 8.3 autovacuum was more of a proof of concept rather than
production ready code.  By 8.3 two things had happened, vacuum
costing, which is important so you can tune vacuuming / autovacuuming
to your hardware and usage patterns, and multi-threaded autovacuuming
daemon, which meant that autovac could now handle the scenario where
one or more table would take a long (sometimes very long) time to
vacuum, especially with costing factors slowing it down, and another
table would get bloated while waiting its turn.    With a server with
LOTS of random IO capability you can run quite a few threads at once,
since each one is only a small impact against the maximum IO of the
drive array.  If you've got 1,000 tables and a couple dozen big ones
that can take 30 minutes or more to vacuum, it's a good thing to be
able to run autovac on more than one at a time.

The next HUGE improvement came with 8.4, which took the free space map
and put it on the drives, removing the need to constantly monitor and
adjust free space map to prevent blowout.  If you've got a well tuned
<= pg 8.3 you're ok.  If you need to tune an older version, it's often
easier AND safer to migrate to 8.4 or above.

Re: Primary key vs unique index

From

rverghese

Date:

06 July 2012, 16:08:11

We are experiencing a similar problem, even though we are on 8.4 and have
been for a while, and have autovacuum turned on. I have regular concurrent
reindexes on the indexes but the primary key is seriously bloated. I was
considering doing the same thing, that is, create another primary key that
is built on a sequence ( primarily for slony) and then change my current
multi-column primary key to a unique index. Have you been able to work
around the problem in any other way?

Thanks
RV

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715712.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Primary key vs unique index

From

hubert depesz lubaczewski

Date:

06 July 2012, 17:01:58

On Fri, Jul 06, 2012 at 09:07:53AM -0700, rverghese wrote:
> We are experiencing a similar problem, even though we are on 8.4 and have
> been for a while, and have autovacuum turned on. I have regular concurrent
> reindexes on the indexes but the primary key is seriously bloated. I was
> considering doing the same thing, that is, create another primary key that
> is built on a sequence ( primarily for slony) and then change my current
> multi-column primary key to a unique index. Have you been able to work
> around the problem in any other way?

http://www.depesz.com/2011/07/06/bloat-happens/

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: Primary key vs unique index

From

Gurjeet Singh

Date:

06 July 2012, 17:54:48

Bloat in primary key indexes has been a long standing issue (although not faced by many), and especially since online rebuild of primary keys was never possible in production environments.

Since version 9.1 we have a nice little feature of being able to change a primary key's underlying index. Look at the 'table_constraint_using_index' clause in ALTER TABLE docs [1]. And example in the same doc specifically shows how to solve the problem in just two commands:

<quote>
To recreate a primary key constraint, without blocking updates while the index is rebuilt:

CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON distributors (dist_id);

ALTER TABLE distributors DROP CONSTRAINT distributors_pkey,
ADD CONSTRAINT distributors_pkey PRIMARY KEY USING INDEX dist_id_temp_idx;
</quote>

[1] http://www.postgresql.org/docs/9.1/static/sql-altertable.html

Best regards,

PS: Shameless plug: I am credited for this feature :)

http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN107778

On Fri, Jul 6, 2012 at 12:07 PM, rverghese <riyav@hotmail.com> wrote:

We are experiencing a similar problem, even though we are on 8.4 and have
been for a while, and have autovacuum turned on. I have regular concurrent
reindexes on the indexes but the primary key is seriously bloated. I was
considering doing the same thing, that is, create another primary key that
is built on a sequence ( primarily for slony) and then change my current
multi-column primary key to a unique index. Have you been able to work
around the problem in any other way?

Thanks
RV

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715712.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--

Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: Primary key vs unique index

From

rverghese

Date:

06 July 2012, 18:07:32

Yes I am using that option for one of my POstgres 9.1 database and it works
well. But its still an issue with Foreign keys, which you need to drop and
recreate . Also I use Slony for replication and it uses the primary key to
check repl. So I don't want that to be interrupted by dropping PK and
recreating PK.
Thanks
RV

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715729.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Primary key vs unique index

From

Gurjeet Singh

Date:

06 July 2012, 19:26:13

On Fri, Jul 6, 2012 at 2:07 PM, rverghese <riyav@hotmail.com> wrote:

Yes I am using that option for one of my POstgres 9.1 database and it works
well. But its still an issue with Foreign keys, which you need to drop and
recreate .

Having to drop and create foriegn keys is a legitimate concern. I am looking into improving that.

Also I use Slony for replication and it uses the primary key to
check repl. So I don't want that to be interrupted by dropping PK and
recreating PK.

If you look closely at that example, DROP and CREATE of the primary key is being done in one command (and hence one transaction), so anything that depends this constraint should not be affected except from the fact that this table will be locked in exclusive mode for the duration of this operation, which should be very short.

Best regards,

--

Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company