Thread: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Hi all. Maybe mine is a stupid question, but I'd like to know the answer if possible. In an inner join involving a 16M+ rows table and a 100+ rows table performances got drastically improved by 100+ times by replacing a UNIQUE-NOT NULL index with a PRIMARY KEY on the very same columns in the very same order. The query has not been modified. In the older case, thanks to the EXPLAIN command, I saw that the join was causing a sort on the index elements, while the primary key was not. So ther's some difference for sure, but I'm missing it. Any hint? -- Vincenzo Romano -- Maybe Computer will never become as intelligent as Humans. For sure they won't ever become so stupid. [VR-1988]
On Jul 20, 2007, at 17:54 , Vincenzo Romano wrote: > In an inner join involving a 16M+ rows table and a 100+ rows table > performances got drastically improved by 100+ times by replacing a > UNIQUE-NOT NULL index with a PRIMARY KEY on the very same columns in > the very same order. The query has not been modified. There should be no difference in query performance, AIUI. > In the older case, thanks to the EXPLAIN command, I saw that the join > was causing a sort on the index elements, while the primary key was > not. Can you provide the actual EXPLAIN ANALYZE (not just EXPLAIN) outputs you can provide for us to look at? I suspect there's a difference wrt the size of the tables, the distribution of the values of the involved columns, index bloat, or how recent the tables have been analyzed. (Most likely the last.) Dropping the UNIQUE NOT NULL constraint and adding the PRIMARY KEY constraint will cause the index to be recreated, which could affect which plan is chosen and its efficacy. Without the EXPLAIN ANALYZE output, I don't think there's a lot of hope in understanding what's different. Michael Glaesemann grzm seespotcode net
On 7/20/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > > On Jul 20, 2007, at 17:54 , Vincenzo Romano wrote: > > > In an inner join involving a 16M+ rows table and a 100+ rows table > > performances got drastically improved by 100+ times by replacing a > > UNIQUE-NOT NULL index with a PRIMARY KEY on the very same columns in > > the very same order. The query has not been modified. > > There should be no difference in query performance, AIUI. If I read the documentation correctly, PRIMARY KEY is simply syntactic sugar equivalent to UNIQUE + NOT NULL, the only difference being that a PRIMARY KEY is reported as such to someone looking at the table structure, which becomes more intuitive than seeing UNIQUE + NOT NULL. > > > In the older case, thanks to the EXPLAIN command, I saw that the join > > was causing a sort on the index elements, while the primary key was > > not. > Might it just be that the original UNIQUE + NOT NULL index was bloated or otherwise degraded, and reindexing it would have resulted in the same performance gain? That's just a guess. -Josh
"Josh Tolley" <eggyknap@gmail.com> writes: > Might it just be that the original UNIQUE + NOT NULL index was bloated > or otherwise degraded, and reindexing it would have resulted in the > same performance gain? That's just a guess. Yeah. There is precious little difference between UNIQUE+NOT NULL and PRIMARY KEY --- to be exact, the latter will allow another table to reference this one in FOREIGN KEY without specifying column names. The planner knows nothing of that little convenience. The interesting thing about this report is that the plan changed after creating the new index. That has to mean that some statistic visible to the planner changed. Creating an index does update the pg_class columns about the table's size and number of rows, but probably those weren't that far off to start with. My bet is that the new index is a lot smaller than the old because of bloat in the old index. If so, REINDEX would have had the same result. regards, tom lane
On Saturday 21 July 2007 08:00:11 Tom Lane wrote: > "Josh Tolley" <eggyknap@gmail.com> writes: > > Might it just be that the original UNIQUE + NOT NULL index was > > bloated or otherwise degraded, and reindexing it would have > > resulted in the same performance gain? That's just a guess. > > Yeah. There is precious little difference between UNIQUE+NOT NULL > and PRIMARY KEY --- to be exact, the latter will allow another > table to reference this one in FOREIGN KEY without specifying > column names. The planner knows nothing of that little convenience. > > The interesting thing about this report is that the plan changed > after creating the new index. That has to mean that some statistic > visible to the planner changed. Creating an index does update the > pg_class columns about the table's size and number of rows, but > probably those weren't that far off to start with. My bet is that > the new index is a lot smaller than the old because of bloat in the > old index. If so, REINDEX would have had the same result. > > regards, tom lane I've done a bit deeper analisys. In the original setup, the "UNIQUE" constraint had been dropped *before* doing the tests. So the "slow" case is without the UNIQUE constraint but with an index. The NOT NULL was instead there. What I don't understand is why the planner in order to accomplish a JOIN does the sort if it has no UNIQUEness constraint and doesn't need to sort if it has. -- Vincenzo Romano -- Maybe Computer will never become as intelligent as Humans. For sure they won't ever become so stupid. [VR-1988]
Vincenzo Romano <vincenzo.romano@gmail.com> writes: > In the original setup, the "UNIQUE" constraint had been dropped > *before* doing the tests. So the "slow" case is without the UNIQUE > constraint but with an index. The NOT NULL was instead there. With what index, pray tell? regards, tom lane
On Sunday 22 July 2007 19:20:08 Tom Lane wrote: > Vincenzo Romano <vincenzo.romano@gmail.com> writes: > > In the original setup, the "UNIQUE" constraint had been dropped > > *before* doing the tests. So the "slow" case is without the > > UNIQUE constraint but with an index. The NOT NULL was instead > > there. > > With what index, pray tell? > > regards, tom lane Sorry for the incomplete sentence. Read it as: In the original setup, the "UNIQUE" constraint had been dropped *before* doing the tests. So the "slow" case is without the UNIQUE constraint but with an index on NOT NULL fields. The "fast" case was with the primary key on the very same fields in the very same order. -- Vincenzo Romano -- Maybe Computer will never become as intelligent as Humans. For sure they won't ever become so stupid. [VR-1988]
Vincenzo Romano <vincenzo.romano@gmail.com> writes: > On Sunday 22 July 2007 19:20:08 Tom Lane wrote: >> With what index, pray tell? > In the original setup, the "UNIQUE" constraint had been dropped > *before* doing the tests. So the "slow" case is without the > UNIQUE constraint but with an index on NOT NULL fields. You haven't said where you think this index is coming from. regards, tom lane
On 7/22/07, Vincenzo Romano <vincenzo.romano@gmail.com> wrote: > On Sunday 22 July 2007 19:20:08 Tom Lane wrote: > > Vincenzo Romano <vincenzo.romano@gmail.com> writes: > > > In the original setup, the "UNIQUE" constraint had been dropped > > > *before* doing the tests. So the "slow" case is without the > > > UNIQUE constraint but with an index. The NOT NULL was instead > > > there. > > > > With what index, pray tell? > > > > regards, tom lane > > Sorry for the incomplete sentence. > Read it as: > > In the original setup, the "UNIQUE" constraint had been dropped > *before* doing the tests. So the "slow" case is without the > UNIQUE constraint but with an index on NOT NULL fields. Control question: did you recreate non-unique index after dropping the UNIQUE constraint? Regards, Dawid