Thread: Disallow redundant indexes
Hi, hackers Currently, PostgreSQL permits creating multiple indexes on the same columns in the same order for a table, potentially leading to redundant indexes. For example: CREATE INDEX ON t(id); CREATE INDEX ON t(id); While permitted, this leads to: - Increased storage consumption - Performance degradation (for data modification) - Maintenance overhead - Potential query optimizer confusion Oracle prevents this with an error like ORA-01408: such column list already indexed [1]. I propose that PostgreSQL prevent redundant index creation by: - Checking for identical existing indexes during CREATE INDEX. - Failing with an error (like Oracle's ORA-01408) if a duplicate is found. - Providing a GUC parameter (allow_redundant_indexes) to control this. This change would: - Prevent accidental redundancy - Optimize storage - Improve performance - Simplify maintenance - Enhance efficiency and user flexibility I’d love to hear your feedback or suggestions for improvement. [1] https://docs.oracle.com/en/error-help/db/ora-01408/?r=19c -- Regrads, Japin Li
On Thu, 24 Apr 2025 at 21:27, Japin Li <japinli@hotmail.com> wrote: > I propose that PostgreSQL prevent redundant index creation by: > > - Checking for identical existing indexes during CREATE INDEX. > - Failing with an error (like Oracle's ORA-01408) if a duplicate is found. > - Providing a GUC parameter (allow_redundant_indexes) to control this. > I’d love to hear your feedback or suggestions for improvement. Sounds like pointless nannying to me. Also, we don't want GUCs that change the way things work. It's too annoying for application developers. I don't have a complete picture of the history of it, but I believe we still have quite a mess as a result of some GUC-dependent behaviour. Check the mess around backslash_quote, standard_conforming_strings and escape_string_warning. In any case, who are we to define what a duplicate index is? Would creating a hash index on a column that's already part of a btree index be disallowed? How about creating an index on (col1) when there's already an index on (col1,col2)? One person might think that's a waste of space, and someone else might think it's useful to have the (col1) index to support faster index-only scans when only that column is needed. I get that we have REINDEX CONCURRENTLY now and there's less of a need to recreate another duplicate index CONCURRENTLY before dropping the old one, but aren't there still reasons to create a duplicate index concurrently, e.g to move an index to another tablespace without blocking queries. If you want to do this, then maybe just write some query that looks at pg_index to find duplicates and stick it on the wiki. Anyone who cares enough about this can run that to check. Oh wait, someone did that already, see [1]. David [1] https://wiki.postgresql.org/wiki/Index_Maintenance#Duplicate_indexes
On Thu, Apr 24, 2025 at 7:31 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 24 Apr 2025 at 21:27, Japin Li <japinli@hotmail.com> wrote:
> I propose that PostgreSQL prevent redundant index creation by:
In any case, who are we to define what a duplicate index is?
I think this part is easier than you make it sound: everything (except the name) is exactly the same as an existing index. That's the 99% case we are trying to catch here.
I've had this idea before, and even wrote a quick POC at one point, but I had it simply throw a warning rather than an error. That avoids the need for any GUC, which I agree is not a good idea. And it still allows people to create a duplicate index if they really want to.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
Hi, Greg and David Thank you for your feedback. On Thu, 24 Apr 2025 at 08:26, Greg Sabino Mullane <htamfids@gmail.com> wrote: > On Thu, Apr 24, 2025 at 7:31 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 24 Apr 2025 at 21:27, Japin Li <japinli@hotmail.com> wrote: > > I propose that PostgreSQL prevent redundant index creation by: > > > In any case, who are we to define what a duplicate index is? > You're absolutely right. Defining a duplicate index is indeed simpler than I initially described. > I think this part is easier than you make it sound: everything (except the name) is exactly the same as an existing > index. That's the 99% case we are trying to catch here. > As Greg pointed out, if everything except the name is identical to an existing index, it should be considered a duplicate in most cases (the 99% case). This is precisely the scenario I'm aiming to prevent. > I've had this idea before, and even wrote a quick POC at one point, but I had it simply throw a warning rather than an > error. That avoids the need for any GUC, which I agree is not a good idea. And it still allows people to create a > duplicate index if they really want to. > I also appreciate your suggestion regarding the GUC parameter. You've convinced me that a warning might be a more appropriate approach. A warning would still alert users to the potential issue of creating a redundant index, while allowing them to proceed if they have a specific reason to do so. -- Regrads, Japin Li
> I've had this idea before, and even wrote a quick POC at one point, but I had it simply throw a warning rather than an
> error. That avoids the need for any GUC, which I agree is not a good idea. And it still allows people to create a
> duplicate index if they really want to.
>
I also appreciate your suggestion regarding the GUC parameter. You've
convinced me that a warning might be a more appropriate approach. A warning
would still alert users to the potential issue of creating a redundant index,
while allowing them to proceed if they have a specific reason to do so.
+1 to a warning.
One reason they might want to create a duplicate index is to reduce index bloat, especially on partial indexes. Granted, we've had REINDEX CONCURRENTLY since v12, but some reindexing workflows are older than that, and I've seen a few that put a "freshness date" into the index name as a hint to future DBAs about the urgency of the next reindexing. A canceled reindex concurrently could in the past (and maybe still can) leave an invalid index of indeterminate name, and therefore harder to clean up, so that's a possible reason to prefer duplicate-then-swap index creation over reindex.
> > > I've had this idea before, and even wrote a quick POC at one point, but I had it simply throw a warning rather thanan > > > error. That avoids the need for any GUC, which I agree is not a good idea. And it still allows people to create a > > > duplicate index if they really want to. > > > > > > > I also appreciate your suggestion regarding the GUC parameter. You've > > convinced me that a warning might be a more appropriate approach. A warning > > would still alert users to the potential issue of creating a redundant index, > > while allowing them to proceed if they have a specific reason to do so. > +1 to a warning. If I had a dollar for every time I asked a user to check for duplicate indexes :) So I think it's a good idea to provide some way to detect these ( besides a query in the Wiki ), but I don't think we should prevent it. While a WARNING will be a good to have, it could easily go unnoticed, but it's still good to have. I also think we should either provide a psql shortcut to detect these indexes or to add annotation to an index in the \d <table> command, or perhaps both of these things. -- Sami Imseih Amazon Web Services (AWS)
On Mon, Apr 28, 2025 at 01:12:18PM -0500, Sami Imseih wrote: > If I had a dollar for every time I asked a user to check for duplicate > indexes :) So > I think it's a good idea to provide some way to detect these ( besides > a query in the > Wiki ), but I don't think we should prevent it. > > While a WARNING will be a good to have, it could easily go unnoticed, > but it's still good to have. I don't doubt that this is a problem in the field, but I'm not sure I'm convinced it's worthwhile to emit a warning that only appears for exact duplicates and is very easily missed, not to mention that the user could be intentionally creating the index. -- nathan
On Mon, Apr 28, 2025, at 7:17 PM, Nathan Bossart wrote:
On Mon, Apr 28, 2025 at 01:12:18PM -0500, Sami Imseih wrote:> If I had a dollar for every time I asked a user to check for duplicate> indexes :) So> I think it's a good idea to provide some way to detect these ( besides> a query in the> Wiki ), but I don't think we should prevent it.>> While a WARNING will be a good to have, it could easily go unnoticed,> but it's still good to have.I don't doubt that this is a problem in the field, but I'm not sure I'mconvinced it's worthwhile to emit a warning that only appears for exactduplicates and is very easily missed, not to mention that the user could beintentionally creating the index.
I don't like to idea of a new GUC to control an SQL command. The WARNING seems
fragile because it doesn't solve the problem. I also don't want to disallow
duplicate indexes; it can break old routines. If we want to do something about
it, I would say let's add a psql meta command or even a new view to show
duplicate indexes. I do think that official maintenance queries are a good to
step forward to make life easier for DBAs.
> On Mon, Apr 28, 2025 at 01:12:18PM -0500, Sami Imseih wrote: > > If I had a dollar for every time I asked a user to check for duplicate > > indexes :) So > > I think it's a good idea to provide some way to detect these ( besides > > a query in the > > Wiki ), but I don't think we should prevent it. > > > > While a WARNING will be a good to have, it could easily go unnoticed, > > but it's still good to have. > > I don't doubt that this is a problem in the field, but I'm not sure I'm > convinced it's worthwhile to emit a warning that only appears for exact > duplicates and is very easily missed, not to mention that the user could be > intentionally creating the index. > > > I don't like to idea of a new GUC to control an SQL command. The WARNING seems > fragile because it doesn't solve the problem. I also don't want to disallow > duplicate indexes; it can break old routines. If we want to do something about > it, I would say let's add a psql meta command or even a new view to show > duplicate indexes. I do think that official maintenance queries are a good to > step forward to make life easier for DBAs. I do think adding an annotation to the index in the \d[+] psql meta-command will be very useful when there are indexes that are the same except for the name (and do not enforce constraints?). This will at least help DBAs quickly spot such indexes in their tables. -- Sami