Thread: Primary key vs unique index
Is there a fundamental difference between a primary key and a unique index? Currently we have primary keys on tables that have significant amounts of updates performed on them, as a result the primary key indexes are becoming significantly bloated. There are other indexes on the tables that also become bloated as a result of this, but these are automatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to be very low.
We don’t want to remove the unique constraint of that the primary key is providing, but the space on disk will continue to grow unbounded so we must do something. Can we replace the primary key with a unique index that could be rebuilt concurrently, or would this be considered bad design? The reasoning behind this would be that the unique index could be rebuilt concurrently without taking the application down or exclusively locking the table for an extending period of time. Are there other advantages to a primary key outside of a uniqueness constraint and an index?
On Mar 17, 2011, at 6:19 AM, Voils, Steven M wrote: > Is there a fundamental difference between a primary key and a unique index? Currently we have primary keys on tables thathave significant amounts of updates performed on them, as a result the primary key indexes are becoming significantlybloated. There are other indexes on the tables that also become bloated as a result of this, but these areautomatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to bevery low. > > We don’t want to remove the unique constraint of that the primary key is providing, but the space on disk will continueto grow unbounded so we must do something. Can we replace the primary key with a unique index that could be rebuiltconcurrently, or would this be considered bad design? The reasoning behind this would be that the unique index couldbe rebuilt concurrently without taking the application down or exclusively locking the table for an extending periodof time. Are there other advantages to a primary key outside of a uniqueness constraint and an index? So, you can't use REINDEX because it lacks the CONCURRENTLY option, and that would lock too much for too long? The only thing the primary key designation provides beyond not null & unique is the metadata about what is the primary key.Which for example in the db allows foreign key constraints to be created without specifying that column. And some ORM/apps/frameworkscan automatically make use of the information as well. I like having them for clarity, but you reallycan do away with them if your deployment needs to do so. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote: > Is there a fundamental difference between a primary key and a unique index? > Currently we have primary keys on tables that have significant amounts of > updates performed on them, as a result the primary key indexes are becoming > significantly bloated. There are other indexes on the tables that also > become bloated as a result of this, but these are automatically rebuild > periodically by the application (using the concurrently flag) when read > usage is expected to be very low. If you're experiencing bloat, but not deleting huge chunks of your table at a time, then you're not vacuuming aggressively enough
On Thu, Mar 17, 2011 at 12:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote: >> Is there a fundamental difference between a primary key and a unique index? >> Currently we have primary keys on tables that have significant amounts of >> updates performed on them, as a result the primary key indexes are becoming >> significantly bloated. There are other indexes on the tables that also >> become bloated as a result of this, but these are automatically rebuild >> periodically by the application (using the concurrently flag) when read >> usage is expected to be very low. > > If you're experiencing bloat, but not deleting huge chunks of your > table at a time, then you're not vacuuming aggressively enough Or you're on 8.3 or before and blowing out your free space map.
Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table whichI thought was the cause of the bloat. We are already performing automatic vacuums nightly. -----Original Message----- From: Scott Marlowe [mailto:scott.marlowe@gmail.com] Sent: Thursday, March 17, 2011 2:52 PM To: Voils, Steven M Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Primary key vs unique index On Thu, Mar 17, 2011 at 12:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote: >> Is there a fundamental difference between a primary key and a unique index? >> Currently we have primary keys on tables that have significant amounts of >> updates performed on them, as a result the primary key indexes are becoming >> significantly bloated. There are other indexes on the tables that also >> become bloated as a result of this, but these are automatically rebuild >> periodically by the application (using the concurrently flag) when read >> usage is expected to be very low. > > If you're experiencing bloat, but not deleting huge chunks of your > table at a time, then you're not vacuuming aggressively enough Or you're on 8.3 or before and blowing out your free space map.
Thanks for the reply, that's what I was looking for. I just wasn't sure if there was another compelling advantage to useprimary keys instead of a unique index. -----Original Message----- From: Scott Ribe [mailto:scott_ribe@elevated-dev.com] Sent: Thursday, March 17, 2011 12:13 PM To: Voils, Steven M Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Primary key vs unique index On Mar 17, 2011, at 6:19 AM, Voils, Steven M wrote: > Is there a fundamental difference between a primary key and a unique index? Currently we have primary keys on tables thathave significant amounts of updates performed on them, as a result the primary key indexes are becoming significantlybloated. There are other indexes on the tables that also become bloated as a result of this, but these areautomatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to bevery low. > > We don't want to remove the unique constraint of that the primary key is providing, but the space on disk will continueto grow unbounded so we must do something. Can we replace the primary key with a unique index that could be rebuiltconcurrently, or would this be considered bad design? The reasoning behind this would be that the unique index couldbe rebuilt concurrently without taking the application down or exclusively locking the table for an extending periodof time. Are there other advantages to a primary key outside of a uniqueness constraint and an index? So, you can't use REINDEX because it lacks the CONCURRENTLY option, and that would lock too much for too long? The only thing the primary key designation provides beyond not null & unique is the metadata about what is the primary key.Which for example in the db allows foreign key constraints to be created without specifying that column. And some ORM/apps/frameworkscan automatically make use of the information as well. I like having them for clarity, but you reallycan do away with them if your deployment needs to do so. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M <steve@sensorswitch.com> wrote: > Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table whichI thought was the cause of the bloat. We are already performing automatic vacuums nightly. Automatic regular vacuums? So you do or don't have autovac turned on? What version of pg are you running (8.3 or before, 8.4 or later?) Are your nightly vacuums FULL or regular vacuums?
Autovacuum is generally more effective as it can run when it needs to not having to wait til the end of the day. If you delete big chunks several times a day autovac can keep up. Also, it's enabled by default in 8.4 and up so the end user would have to actively turn it off in this instance. On Fri, Mar 18, 2011 at 6:21 AM, Voils, Steven M <steve@sensorswitch.com> wrote: > Our application supports pg 8.4 onwards. The vacuums are run automatically by the application rather than requiring endusers to enable autovacuum; the vacuums being performed are regular. > > -----Original Message----- > From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > Sent: Thursday, March 17, 2011 6:31 PM > To: Voils, Steven M > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Primary key vs unique index > > On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M > <steve@sensorswitch.com> wrote: >> Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table whichI thought was the cause of the bloat. We are already performing automatic vacuums nightly. > > Automatic regular vacuums? So you do or don't have autovac turned on? > > What version of pg are you running (8.3 or before, 8.4 or later?) > > Are your nightly vacuums FULL or regular vacuums? > -- To understand recursion, one must first understand recursion.
Our application supports pg 8.4 onwards. The vacuums are run automatically by the application rather than requiring endusers to enable autovacuum; the vacuums being performed are regular. -----Original Message----- From: Scott Marlowe [mailto:scott.marlowe@gmail.com] Sent: Thursday, March 17, 2011 6:31 PM To: Voils, Steven M Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Primary key vs unique index On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M <steve@sensorswitch.com> wrote: > Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table whichI thought was the cause of the bloat. We are already performing automatic vacuums nightly. Automatic regular vacuums? So you do or don't have autovac turned on? What version of pg are you running (8.3 or before, 8.4 or later?) Are your nightly vacuums FULL or regular vacuums?
What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for the newerversions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large sectionsof the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway? -----Original Message----- From: Scott Marlowe [mailto:scott.marlowe@gmail.com] Sent: Friday, March 18, 2011 8:36 AM To: Voils, Steven M Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Primary key vs unique index Autovacuum is generally more effective as it can run when it needs to not having to wait til the end of the day. If you delete big chunks several times a day autovac can keep up. Also, it's enabled by default in 8.4 and up so the end user would have to actively turn it off in this instance. On Fri, Mar 18, 2011 at 6:21 AM, Voils, Steven M <steve@sensorswitch.com> wrote: > Our application supports pg 8.4 onwards. The vacuums are run automatically by the application rather than requiring endusers to enable autovacuum; the vacuums being performed are regular. > > -----Original Message----- > From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > Sent: Thursday, March 17, 2011 6:31 PM > To: Voils, Steven M > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Primary key vs unique index > > On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M > <steve@sensorswitch.com> wrote: >> Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table whichI thought was the cause of the bloat. We are already performing automatic vacuums nightly. > > Automatic regular vacuums? So you do or don't have autovac turned on? > > What version of pg are you running (8.3 or before, 8.4 or later?) > > Are your nightly vacuums FULL or regular vacuums? > -- To understand recursion, one must first understand recursion.
What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for the newer versions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large sections of the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway?
Autovacuum will tend to run after those types of changes. As described here:
http://www.postgresql.org/docs/8.4/interactive/routine-vacuuming.html#AUTOVACUUM
What PostgreSQL looks for is a portion of the tuples to be obsoleted. The configuration is essentially "if some portion (percentage) of the table is obsolete, vacuum it" but also has an added scalar (base threshold) which is required on top of that portion (scale factor).
My understanding is that the base threshold is there to prevent small tables from being vacuumed for little or no reason, but for large tables it should be insignificant in comparison to the scale factor. So if your scale factor is .5, when you delete half of your table, you can expect an autovacuum to run on the next iteration of the daemon.
Note that the default scale factor is .2 (20%) and the default base threshold is 50. Both can be modified for the cluster as well as for individual tables.
As someone else already alluded, VACUUM FULL is generally bad for indexes, where VACUUM will help indexes (by reclaiming space the same way as it does for the table). More details on the same page linked above.
Derrick
On Fri, Mar 18, 2011 at 6:38 AM, Voils, Steven M <steve@sensorswitch.com> wrote: > What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for thenewer versions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large sectionsof the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway? Until 8.3 autovacuum was more of a proof of concept rather than production ready code. By 8.3 two things had happened, vacuum costing, which is important so you can tune vacuuming / autovacuuming to your hardware and usage patterns, and multi-threaded autovacuuming daemon, which meant that autovac could now handle the scenario where one or more table would take a long (sometimes very long) time to vacuum, especially with costing factors slowing it down, and another table would get bloated while waiting its turn. With a server with LOTS of random IO capability you can run quite a few threads at once, since each one is only a small impact against the maximum IO of the drive array. If you've got 1,000 tables and a couple dozen big ones that can take 30 minutes or more to vacuum, it's a good thing to be able to run autovac on more than one at a time. The next HUGE improvement came with 8.4, which took the free space map and put it on the drives, removing the need to constantly monitor and adjust free space map to prevent blowout. If you've got a well tuned <= pg 8.3 you're ok. If you need to tune an older version, it's often easier AND safer to migrate to 8.4 or above.
Thanks for the help. I think we're only going to support pg 8.4 onwards. I'll turn off the application's manual vacuumingand just let autovac do its thing. -----Original Message----- From: Scott Marlowe [mailto:scott.marlowe@gmail.com] Sent: Monday, March 21, 2011 9:13 PM To: Voils, Steven M Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Primary key vs unique index On Fri, Mar 18, 2011 at 6:38 AM, Voils, Steven M <steve@sensorswitch.com> wrote: > What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for thenewer versions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large sectionsof the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway? Until 8.3 autovacuum was more of a proof of concept rather than production ready code. By 8.3 two things had happened, vacuum costing, which is important so you can tune vacuuming / autovacuuming to your hardware and usage patterns, and multi-threaded autovacuuming daemon, which meant that autovac could now handle the scenario where one or more table would take a long (sometimes very long) time to vacuum, especially with costing factors slowing it down, and another table would get bloated while waiting its turn. With a server with LOTS of random IO capability you can run quite a few threads at once, since each one is only a small impact against the maximum IO of the drive array. If you've got 1,000 tables and a couple dozen big ones that can take 30 minutes or more to vacuum, it's a good thing to be able to run autovac on more than one at a time. The next HUGE improvement came with 8.4, which took the free space map and put it on the drives, removing the need to constantly monitor and adjust free space map to prevent blowout. If you've got a well tuned <= pg 8.3 you're ok. If you need to tune an older version, it's often easier AND safer to migrate to 8.4 or above.
We are experiencing a similar problem, even though we are on 8.4 and have been for a while, and have autovacuum turned on. I have regular concurrent reindexes on the indexes but the primary key is seriously bloated. I was considering doing the same thing, that is, create another primary key that is built on a sequence ( primarily for slony) and then change my current multi-column primary key to a unique index. Have you been able to work around the problem in any other way? Thanks RV -- View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715712.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Fri, Jul 06, 2012 at 09:07:53AM -0700, rverghese wrote: > We are experiencing a similar problem, even though we are on 8.4 and have > been for a while, and have autovacuum turned on. I have regular concurrent > reindexes on the indexes but the primary key is seriously bloated. I was > considering doing the same thing, that is, create another primary key that > is built on a sequence ( primarily for slony) and then change my current > multi-column primary key to a unique index. Have you been able to work > around the problem in any other way? http://www.depesz.com/2011/07/06/bloat-happens/ depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
Since version 9.1 we have a nice little feature of being able to change a primary key's underlying index. Look at the 'table_constraint_using_index' clause in ALTER TABLE docs [1]. And example in the same doc specifically shows how to solve the problem in just two commands:
<quote>
To recreate a primary key constraint, without blocking updates while the index is rebuilt:
CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON distributors (dist_id);
ALTER TABLE distributors DROP CONSTRAINT distributors_pkey,
ADD CONSTRAINT distributors_pkey PRIMARY KEY USING INDEX dist_id_temp_idx;
</quote>
[1] http://www.postgresql.org/docs/9.1/static/sql-altertable.html
Best regards,
PS: Shameless plug: I am credited for this feature :)
http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN107778
We are experiencing a similar problem, even though we are on 8.4 and have
been for a while, and have autovacuum turned on. I have regular concurrent
reindexes on the indexes but the primary key is seriously bloated. I was
considering doing the same thing, that is, create another primary key that
is built on a sequence ( primarily for slony) and then change my current
multi-column primary key to a unique index. Have you been able to work
around the problem in any other way?
Thanks
RV
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715712.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Yes I am using that option for one of my POstgres 9.1 database and it works well. But its still an issue with Foreign keys, which you need to drop and recreate . Also I use Slony for replication and it uses the primary key to check repl. So I don't want that to be interrupted by dropping PK and recreating PK. Thanks RV -- View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715729.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Yes I am using that option for one of my POstgres 9.1 database and it works
well. But its still an issue with Foreign keys, which you need to drop and
recreate .
Having to drop and create foriegn keys is a legitimate concern. I am looking into improving that.
Also I use Slony for replication and it uses the primary key to
check repl. So I don't want that to be interrupted by dropping PK and
recreating PK.
If you look closely at that example, DROP and CREATE of the primary key is being done in one command (and hence one transaction), so anything that depends this constraint should not be affected except from the fact that this table will be locked in exclusive mode for the duration of this operation, which should be very short.
Best regards,
EnterpriseDB Corporation
The Enterprise PostgreSQL Company