Thread: Proposal to Enable/Disable Index using ALTER INDEX
Hello hackers,
This is my first time posting here, and I’d like to propose a new feature related to PostgreSQL indexes. If this idea resonates, I’d be happy to follow up with a patch as well.
Problem:
Adding and removing indexes is a common operation in PostgreSQL. On larger databases, however, these operations can be resource-intensive. When evaluating the performance impact of one or more indexes, dropping them might not be ideal since as a user you may want a quicker way to test their effects without fully committing to removing & adding them back again. Which can be a time taking operation on larger tables.
Proposal:
I propose adding an ALTER INDEX command that allows for enabling or disabling an index globally. This could look something like:
ALTER INDEX index_name ENABLE;
ALTER INDEX index_name DISABLE;
A disabled index would still receive updates and enforce constraints as usual but would not be used for queries. This allows users to assess whether an index impacts query performance before deciding to drop it entirely.
Implementation:
To keep this simple, I suggest toggling the indisvalid flag in pg_index during the enable/disable operation.
Additional Considerations:
- Keeping the index up-to-date while it’s disabled seems preferable, as it avoids the need to rebuild the index if it’s re-enabled later. The alternative would be dropping and rebuilding the index upon re-enabling, which I believe would introduce additional overhead in terms of application logic & complexity.
- I am also proposing to reuse the existing indisvalid flag to avoid adding new state and the maintenance that comes with it, but I’m open to feedback if this approach has potential downsides.
- To keep the scope minimal for now, I propose that we only allow enabling and disabling indexes globally, and not locally, by supporting it exclusively in ALTER INDEX. I would love to know if this would break any SQL grammar promises that I might be unaware of.
I would love to learn if this sounds like a good idea and how it can be improved further. Accordingly, as a next step I would be very happy to propose a patch as well.
Best regards,
Shayon Mukherjee
This is my first time posting here, and I’d like to propose a new feature related to PostgreSQL indexes. If this idea resonates, I’d be happy to follow up with a patch as well.
Problem:
Adding and removing indexes is a common operation in PostgreSQL. On larger databases, however, these operations can be resource-intensive. When evaluating the performance impact of one or more indexes, dropping them might not be ideal since as a user you may want a quicker way to test their effects without fully committing to removing & adding them back again. Which can be a time taking operation on larger tables.
Proposal:
I propose adding an ALTER INDEX command that allows for enabling or disabling an index globally. This could look something like:
ALTER INDEX index_name ENABLE;
ALTER INDEX index_name DISABLE;
A disabled index would still receive updates and enforce constraints as usual but would not be used for queries. This allows users to assess whether an index impacts query performance before deciding to drop it entirely.
Implementation:
To keep this simple, I suggest toggling the indisvalid flag in pg_index during the enable/disable operation.
Additional Considerations:
- Keeping the index up-to-date while it’s disabled seems preferable, as it avoids the need to rebuild the index if it’s re-enabled later. The alternative would be dropping and rebuilding the index upon re-enabling, which I believe would introduce additional overhead in terms of application logic & complexity.
- I am also proposing to reuse the existing indisvalid flag to avoid adding new state and the maintenance that comes with it, but I’m open to feedback if this approach has potential downsides.
- To keep the scope minimal for now, I propose that we only allow enabling and disabling indexes globally, and not locally, by supporting it exclusively in ALTER INDEX. I would love to know if this would break any SQL grammar promises that I might be unaware of.
I would love to learn if this sounds like a good idea and how it can be improved further. Accordingly, as a next step I would be very happy to propose a patch as well.
Best regards,
Shayon Mukherjee
On Tue, 10 Sept 2024 at 09:39, Shayon Mukherjee <shayonj@gmail.com> wrote: > Adding and removing indexes is a common operation in PostgreSQL. On larger databases, however, these operations can beresource-intensive. When evaluating the performance impact of one or more indexes, dropping them might not be ideal sinceas a user you may want a quicker way to test their effects without fully committing to removing & adding them back again.Which can be a time taking operation on larger tables. > > Proposal: > I propose adding an ALTER INDEX command that allows for enabling or disabling an index globally. This could look somethinglike: > > ALTER INDEX index_name ENABLE; > ALTER INDEX index_name DISABLE; > > A disabled index would still receive updates and enforce constraints as usual but would not be used for queries. This allowsusers to assess whether an index impacts query performance before deciding to drop it entirely. I personally think having some way to alter an index to stop it from being used in query plans would be very useful for the reasons you mentioned. I don't have any arguments against the syntax you've proposed. We'd certainly have to clearly document that constraints are still enforced. Perhaps there is some other syntax which would self-document slightly better. I just can't think of it right now. > Implementation: > To keep this simple, I suggest toggling the indisvalid flag in pg_index during the enable/disable operation. That's not a good idea as it would allow ALTER INDEX ... ENABLE; to be used to make valid a failed concurrently created index. I think this would need a new flag and everywhere in the planner would need to be adjusted to ignore indexes when that flag is false. > Additional Considerations: > - Keeping the index up-to-date while it’s disabled seems preferable, as it avoids the need to rebuild the index if it’sre-enabled later. The alternative would be dropping and rebuilding the index upon re-enabling, which I believe wouldintroduce additional overhead in terms of application logic & complexity. I think the primary use case here is to assist in dropping useless indexes in a way that can very quickly be undone if the index is more useful than thought. If you didn't keep the index up-to-date then that would make the feature useless for that purpose. If we get the skip scan feature for PG18, then there's likely going to be lots of people with indexes that they might want to consider removing after upgrading. Maybe this is a good time to consider this feature as it possibly won't ever be more useful than it will be after we get skip scans. David
Hi Shayon
Thank you for your work on this , I think it's great to have this feature implemented ,I checked the doucment on other databases,It seems both MySQL 8.0 and oracle supports it, sql server need to rebuild indexes after disabled,It seems disable the index, it's equivalent to deleting the index, except that the index's metadata is still retained:
->A disabled index would still receive updates and enforce constraints as usual but would not be used for queries. This allows users to assess -> ->whether an index impacts query performance before deciding to drop it entirely.
MySQL 8.0 and oracle settings are not visible, index information is always updated, I would then suggest that the statement be changed to set the index invisible and visible.
Thanks
David Rowley <dgrowleyml@gmail.com> 于2024年9月10日周二 06:17写道:
On Tue, 10 Sept 2024 at 09:39, Shayon Mukherjee <shayonj@gmail.com> wrote:
> Adding and removing indexes is a common operation in PostgreSQL. On larger databases, however, these operations can be resource-intensive. When evaluating the performance impact of one or more indexes, dropping them might not be ideal since as a user you may want a quicker way to test their effects without fully committing to removing & adding them back again. Which can be a time taking operation on larger tables.
>
> Proposal:
> I propose adding an ALTER INDEX command that allows for enabling or disabling an index globally. This could look something like:
>
> ALTER INDEX index_name ENABLE;
> ALTER INDEX index_name DISABLE;
>
> A disabled index would still receive updates and enforce constraints as usual but would not be used for queries. This allows users to assess whether an index impacts query performance before deciding to drop it entirely.
I personally think having some way to alter an index to stop it from
being used in query plans would be very useful for the reasons you
mentioned. I don't have any arguments against the syntax you've
proposed. We'd certainly have to clearly document that constraints
are still enforced. Perhaps there is some other syntax which would
self-document slightly better. I just can't think of it right now.
> Implementation:
> To keep this simple, I suggest toggling the indisvalid flag in pg_index during the enable/disable operation.
That's not a good idea as it would allow ALTER INDEX ... ENABLE; to be
used to make valid a failed concurrently created index. I think this
would need a new flag and everywhere in the planner would need to be
adjusted to ignore indexes when that flag is false.
> Additional Considerations:
> - Keeping the index up-to-date while it’s disabled seems preferable, as it avoids the need to rebuild the index if it’s re-enabled later. The alternative would be dropping and rebuilding the index upon re-enabling, which I believe would introduce additional overhead in terms of application logic & complexity.
I think the primary use case here is to assist in dropping useless
indexes in a way that can very quickly be undone if the index is more
useful than thought. If you didn't keep the index up-to-date then that
would make the feature useless for that purpose.
If we get the skip scan feature for PG18, then there's likely going to
be lots of people with indexes that they might want to consider
removing after upgrading. Maybe this is a good time to consider this
feature as it possibly won't ever be more useful than it will be after
we get skip scans.
David
Hi, On Tue, Sep 10, 2024 at 10:16:34AM +1200, David Rowley wrote: > On Tue, 10 Sept 2024 at 09:39, Shayon Mukherjee <shayonj@gmail.com> wrote: > > Adding and removing indexes is a common operation in PostgreSQL. On > > larger databases, however, these operations can be > > resource-intensive. When evaluating the performance impact of one or > > more indexes, dropping them might not be ideal since as a user you > > may want a quicker way to test their effects without fully > > committing to removing & adding them back again. Which can be a time > > taking operation on larger tables. > > > > Proposal: > > I propose adding an ALTER INDEX command that allows for enabling or disabling an index globally. This could look somethinglike: > > > > ALTER INDEX index_name ENABLE; > > ALTER INDEX index_name DISABLE; > > > > A disabled index would still receive updates and enforce constraints > > as usual but would not be used for queries. This allows users to > > assess whether an index impacts query performance before deciding to > > drop it entirely. > > I personally think having some way to alter an index to stop it from > being used in query plans would be very useful for the reasons you > mentioned. I don't have any arguments against the syntax you've > proposed. We'd certainly have to clearly document that constraints > are still enforced. Perhaps there is some other syntax which would > self-document slightly better. I just can't think of it right now. > > > Implementation: > > To keep this simple, I suggest toggling the indisvalid flag in > > pg_index during the enable/disable operation. > > That's not a good idea as it would allow ALTER INDEX ... ENABLE; to be > used to make valid a failed concurrently created index. I think this > would need a new flag and everywhere in the planner would need to be > adjusted to ignore indexes when that flag is false. How about the indislive flag instead? I haven't looked at the code, but from the documentation ("If false, the index is in process of being dropped, and should be ignored for all purposes") it sounds like we made be able to piggy-back on that instead? Michael
On Tue, 10 Sept 2024 at 22:46, Michael Banck <mbanck@gmx.net> wrote: > How about the indislive flag instead? I haven't looked at the code, but > from the documentation ("If false, the index is in process of being > dropped, and > should be ignored for all purposes") it sounds like we made be able to > piggy-back on that instead? Doing that could cause an UPDATE which would ordinarily not be eligible for a HOT-update to become a HOT-update. That would cause issues if the index is enabled again as the index wouldn't have been updated during the UPDATE. I don't see the big deal with adding a new flag. There's even a free padding byte to put this flag in after indisreplident, so we don't have to worry about using more memory. David
+1 for the new flag as well, since it'd be nice to be able to enable/disable indexes without having to worry about the missed updates / having to rebuild it.
Shayon
On Tue, Sep 10, 2024 at 8:02 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 10 Sept 2024 at 22:46, Michael Banck <mbanck@gmx.net> wrote:
> How about the indislive flag instead? I haven't looked at the code, but
> from the documentation ("If false, the index is in process of being
> dropped, and
> should be ignored for all purposes") it sounds like we made be able to
> piggy-back on that instead?
Doing that could cause an UPDATE which would ordinarily not be
eligible for a HOT-update to become a HOT-update. That would cause
issues if the index is enabled again as the index wouldn't have been
updated during the UPDATE.
I don't see the big deal with adding a new flag. There's even a free
padding byte to put this flag in after indisreplident, so we don't
have to worry about using more memory.
David
Kind Regards,
Shayon Mukherjee
Hello,
Thank you for the detailed information and feedback David. Comments inline.
P.S Re-sending it to the mailing list, because I accidentally didn't select reply-all on the last reply.
On Mon, Sep 9, 2024 at 6:16 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 10 Sept 2024 at 09:39, Shayon Mukherjee <shayonj@gmail.com> wrote:
> Adding and removing indexes is a common operation in PostgreSQL. On larger databases, however, these operations can be resource-intensive. When evaluating the performance impact of one or more indexes, dropping them might not be ideal since as a user you may want a quicker way to test their effects without fully committing to removing & adding them back again. Which can be a time taking operation on larger tables.
>
> Proposal:
> I propose adding an ALTER INDEX command that allows for enabling or disabling an index globally. This could look something like:
>
> ALTER INDEX index_name ENABLE;
> ALTER INDEX index_name DISABLE;
>
> A disabled index would still receive updates and enforce constraints as usual but would not be used for queries. This allows users to assess whether an index impacts query performance before deciding to drop it entirely.
I personally think having some way to alter an index to stop it from
being used in query plans would be very useful for the reasons you
mentioned. I don't have any arguments against the syntax you've
proposed. We'd certainly have to clearly document that constraints
are still enforced. Perhaps there is some other syntax which would
self-document slightly better. I just can't think of it right now.
Thank you and likewise. I was thinking of piggy backing off of VALID / NOT VALID, but that might have similar issues (if not more confusion) to the current proposed syntax. Will be sure to update the documentation.
> Implementation:
> To keep this simple, I suggest toggling the indisvalid flag in pg_index during the enable/disable operation.
That's not a good idea as it would allow ALTER INDEX ... ENABLE; to be
used to make valid a failed concurrently created index. I think this
would need a new flag and everywhere in the planner would need to be
adjusted to ignore indexes when that flag is false.
That is a great call and I wasn't thinking of the semantics with the existing usage of concurrently created indexes.
> Additional Considerations:
> - Keeping the index up-to-date while it’s disabled seems preferable, as it avoids the need to rebuild the index if it’s re-enabled later. The alternative would be dropping and rebuilding the index upon re-enabling, which I believe would introduce additional overhead in terms of application logic & complexity.
I think the primary use case here is to assist in dropping useless
indexes in a way that can very quickly be undone if the index is more
useful than thought. If you didn't keep the index up-to-date then that
would make the feature useless for that purpose.
+1
If we get the skip scan feature for PG18, then there's likely going to
be lots of people with indexes that they might want to consider
removing after upgrading. Maybe this is a good time to consider this
feature as it possibly won't ever be more useful than it will be after
we get skip scans.
David
Thank you for the feedback again, I will look into the changes required and accordingly propose a PATCH.
Kind Regards,
Shayon Mukherjee
On Tue, Sep 10, 2024 at 10:16:34AM +1200, David Rowley wrote: > I think the primary use case here is to assist in dropping useless > indexes in a way that can very quickly be undone if the index is more > useful than thought. If you didn't keep the index up-to-date then that > would make the feature useless for that purpose. > > If we get the skip scan feature for PG18, then there's likely going to > be lots of people with indexes that they might want to consider > removing after upgrading. Maybe this is a good time to consider this > feature as it possibly won't ever be more useful than it will be after > we get skip scans. +1, this is something I've wanted for some time. There was some past discussion, too [0]. [0] https://postgr.es/m/flat/ed8c9ed7-bb5d-aaec-065b-ad4893645deb%402ndQuadrant.com -- nathan
On Wed, 11 Sept 2024 at 03:12, Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Sep 10, 2024 at 10:16:34AM +1200, David Rowley wrote: > > If we get the skip scan feature for PG18, then there's likely going to > > be lots of people with indexes that they might want to consider > > removing after upgrading. Maybe this is a good time to consider this > > feature as it possibly won't ever be more useful than it will be after > > we get skip scans. > > +1, this is something I've wanted for some time. There was some past > discussion, too [0]. > > [0] https://postgr.es/m/flat/ed8c9ed7-bb5d-aaec-065b-ad4893645deb%402ndQuadrant.com Thanks for digging that up. I'd forgotten about that. I see there was pushback from having this last time, which is now over 6 years ago. In the meantime, we still have nothing to make this easy for people. I think the most important point I read in that thread is [1]. Maybe what I mentioned in [2] is a good workaround. Additionally, I think there will need to be syntax in CREATE INDEX for this. Without that pg_get_indexdef() might return SQL that does not reflect the current state of the index. MySQL seems to use "CREATE INDEX name ON table (col) [VISIBLE|INVISIBLE]". David [1] https://www.postgresql.org/message-id/20180618215635.m5vrnxdxhxytvmcm%40alap3.anarazel.de [2] https://www.postgresql.org/message-id/CAKJS1f_L7y_BTGESp5Qd6BSRHXP0mj3x9O9C_U27GU478UwpBw%40mail.gmail.com
On Mon, 23 Sept 2024 at 05:43, Shayon Mukherjee <shayonj@gmail.com> wrote: > - Modified get_index_paths() and build_index_paths() to exclude disabled > indexes from consideration during query planning. There are quite a large number of other places you also need to modify. Here are 2 places where the index should be ignored but isn't: 1. expression indexes seem to still be used for statistical estimations: create table b as select generate_series(1,1000)b; create index on b((b%10)); analyze b; explain select distinct b%10 from b; -- HashAggregate (cost=23.00..23.12 rows=10 width=4) alter index b_expr_idx disable; explain select distinct b%10 from b; -- HashAggregate (cost=23.00..23.12 rows=10 width=4) <-- should be 1000 rows drop index b_expr_idx; explain select distinct b%10 from b; -- HashAggregate (cost=23.00..35.50 rows=1000 width=4) 2. Indexes seem to still be used for join removals. create table c (c int primary key); explain select c1.* from c c1 left join c c2 on c1.c=c2.c; -- correctly removes join. alter index c_pkey disable; explain select c1.* from c c1 left join c c2 on c1.c=c2.c; -- should not remove join. Please carefully look over all places that RelOptInfo.indexlist is looked at and consider skipping disabled indexes. Please also take time to find SQL that exercises each of those places so you can verify that the behaviour is correct after your change. This is also a good way to learn exactly all cases where indexes are used. Using this method would have led you to find places like rel_supports_distinctness(), where you should be skipping disabled indexes. The planner should not be making use of disabled indexes for any optimisations at all. > - catversion.h is updated with a new CATALOG_VERSION_NO to reflect change in pg_index > schema. Please leave that up to the committer. Patch authors doing this just results in the patch no longer applying as soon as someone commits a version bump. Also, please get rid of these notices. The command tag serves that purpose. It's not interesting that the index is already disabled. # alter index a_pkey disable; NOTICE: index "a_pkey" is now disabled ALTER INDEX # alter index a_pkey disable; NOTICE: index "a_pkey" is already disabled ALTER INDEX I've only given the code a very quick glance. I don't quite understand why you're checking the index is enabled in create_index_paths() and get_index_paths(). I think the check should be done only in create_index_paths(). Primarily, you'll find code such as "if (index->indpred != NIL && !index->predOK)" in the locations you need to consider skipping the disabled index. I think your new code should be located very close to those places or perhaps within the same if condition unless it makes it overly complex for the human reader. I think the documents should also mention that disabling an index is a useful way to verify an index is not being used before dropping it as the index can be enabled again at the first sign that performance has been effected. (It might also be good to mention that checking pg_stat_user_indexes.idx_scan should be the first port of call when checking for unused indexes) David
Hi David, Thank you so much for the review and pointers. I totally missed expression indexes. I am going to do another proper passalong with your feedback and follow up with an updated patch and any questions. Excited to be learning so much about the internals. Shayon > On Sep 22, 2024, at 6:44 PM, David Rowley <dgrowleyml@gmail.com> wrote: > > On Mon, 23 Sept 2024 at 05:43, Shayon Mukherjee <shayonj@gmail.com> wrote: >> - Modified get_index_paths() and build_index_paths() to exclude disabled >> indexes from consideration during query planning. > > There are quite a large number of other places you also need to modify. > > Here are 2 places where the index should be ignored but isn't: > > 1. expression indexes seem to still be used for statistical estimations: > > create table b as select generate_series(1,1000)b; > create index on b((b%10)); > analyze b; > explain select distinct b%10 from b; > -- HashAggregate (cost=23.00..23.12 rows=10 width=4) > > alter index b_expr_idx disable; > explain select distinct b%10 from b; > -- HashAggregate (cost=23.00..23.12 rows=10 width=4) <-- should be 1000 rows > > drop index b_expr_idx; > explain select distinct b%10 from b; > -- HashAggregate (cost=23.00..35.50 rows=1000 width=4) > > 2. Indexes seem to still be used for join removals. > > create table c (c int primary key); > explain select c1.* from c c1 left join c c2 on c1.c=c2.c; -- > correctly removes join. > alter index c_pkey disable; > explain select c1.* from c c1 left join c c2 on c1.c=c2.c; -- should > not remove join. > > Please carefully look over all places that RelOptInfo.indexlist is > looked at and consider skipping disabled indexes. Please also take > time to find SQL that exercises each of those places so you can verify > that the behaviour is correct after your change. This is also a good > way to learn exactly all cases where indexes are used. Using this > method would have led you to find places like > rel_supports_distinctness(), where you should be skipping disabled > indexes. > > The planner should not be making use of disabled indexes for any > optimisations at all. > >> - catversion.h is updated with a new CATALOG_VERSION_NO to reflect change in pg_index >> schema. > > Please leave that up to the committer. Patch authors doing this just > results in the patch no longer applying as soon as someone commits a > version bump. > > Also, please get rid of these notices. The command tag serves that > purpose. It's not interesting that the index is already disabled. > > # alter index a_pkey disable; > NOTICE: index "a_pkey" is now disabled > ALTER INDEX > # alter index a_pkey disable; > NOTICE: index "a_pkey" is already disabled > ALTER INDEX > > I've only given the code a very quick glance. I don't quite understand > why you're checking the index is enabled in create_index_paths() and > get_index_paths(). I think the check should be done only in > create_index_paths(). Primarily, you'll find code such as "if > (index->indpred != NIL && !index->predOK)" in the locations you need > to consider skipping the disabled index. I think your new code should > be located very close to those places or perhaps within the same if > condition unless it makes it overly complex for the human reader. > > I think the documents should also mention that disabling an index is a > useful way to verify an index is not being used before dropping it as > the index can be enabled again at the first sign that performance has > been effected. (It might also be good to mention that checking > pg_stat_user_indexes.idx_scan should be the first port of call when > checking for unused indexes) > > David
On 09.09.24 23:38, Shayon Mukherjee wrote: > *Problem*: > Adding and removing indexes is a common operation in PostgreSQL. On > larger databases, however, these operations can be resource-intensive. > When evaluating the performance impact of one or more indexes, dropping > them might not be ideal since as a user you may want a quicker way to > test their effects without fully committing to removing & adding them > back again. Which can be a time taking operation on larger tables. > > *Proposal*: > I propose adding an ALTER INDEX command that allows for enabling or > disabling an index globally. This could look something like: > > ALTER INDEX index_name ENABLE; > ALTER INDEX index_name DISABLE; > > A disabled index would still receive updates and enforce constraints as > usual but would not be used for queries. This allows users to assess > whether an index impacts query performance before deciding to drop it > entirely. I think a better approach would be to make the list of disabled indexes a GUC setting, which would then internally have an effect similar to enable_indexscan, meaning it would make the listed indexes unattractive to the planner. This seems better than the proposed DDL command, because you'd be able to use this per-session, instead of forcing a global state, and even unprivileged users could use it. (I think we have had proposals like this before, but I can't find the discussion I'm thinking of right now.)
That's a good point. +1 for the idea of the GUC setting, especially since, as you mentioned, it allows unprivileged users to access it and beingper-session.. I am happy to draft a patch for this as well. I think I have a working idea so far of where the necessary checks might go.However if you don’t mind, can you elaborate further on how the effect would be similar to enable_indexscan? I was thinking we could introduce a new GUC option called `disabled_indexes` and perform a check against in all places foreach index being considered with its OID via get_relname_relid through a helper function in the various places we needto prompt the planner to not use the index (like in indxpath.c as an example). Curious to learn if you have a different approach in mind perhaps? Thank you, Shayon > On Sep 23, 2024, at 11:14 AM, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 09.09.24 23:38, Shayon Mukherjee wrote: >> *Problem*: >> Adding and removing indexes is a common operation in PostgreSQL. On larger databases, however, these operations can beresource-intensive. When evaluating the performance impact of one or more indexes, dropping them might not be ideal sinceas a user you may want a quicker way to test their effects without fully committing to removing & adding them back again.Which can be a time taking operation on larger tables. >> *Proposal*: >> I propose adding an ALTER INDEX command that allows for enabling or disabling an index globally. This could look somethinglike: >> ALTER INDEX index_name ENABLE; >> ALTER INDEX index_name DISABLE; >> A disabled index would still receive updates and enforce constraints as usual but would not be used for queries. Thisallows users to assess whether an index impacts query performance before deciding to drop it entirely. > > I think a better approach would be to make the list of disabled indexes a GUC setting, which would then internally havean effect similar to enable_indexscan, meaning it would make the listed indexes unattractive to the planner. > > This seems better than the proposed DDL command, because you'd be able to use this per-session, instead of forcing a globalstate, and even unprivileged users could use it. > > (I think we have had proposals like this before, but I can't find the discussion I'm thinking of right now.) >
I found an old thread here [0].
Also, a question: If we go with the GUC approach, how do we expect `pg_get_indexdef` to behave?
I suppose it would behave no differently than it otherwise would, because there's no new SQL grammar to support and, given its GUC status, it seems reasonable that `pg_get_indexdef` doesn’t reflect whether an index is enabled or not.
If so, then I wonder if using a dedicated `ALTER` command and keeping the state in `pg_index` would be better for consistency's sake?
Thank you
Shayon
On Sep 23, 2024, at 4:51 PM, Shayon Mukherjee <shayonj@gmail.com> wrote:That's a good point.
+1 for the idea of the GUC setting, especially since, as you mentioned, it allows unprivileged users to access it and being per-session..
I am happy to draft a patch for this as well. I think I have a working idea so far of where the necessary checks might go. However if you don’t mind, can you elaborate further on how the effect would be similar to enable_indexscan?
I was thinking we could introduce a new GUC option called `disabled_indexes` and perform a check against in all places for each index being considered with its OID via get_relname_relid through a helper function in the various places we need to prompt the planner to not use the index (like in indxpath.c as an example).
Curious to learn if you have a different approach in mind perhaps?
Thank you,
ShayonOn Sep 23, 2024, at 11:14 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
On 09.09.24 23:38, Shayon Mukherjee wrote:*Problem*:
Adding and removing indexes is a common operation in PostgreSQL. On larger databases, however, these operations can be resource-intensive. When evaluating the performance impact of one or more indexes, dropping them might not be ideal since as a user you may want a quicker way to test their effects without fully committing to removing & adding them back again. Which can be a time taking operation on larger tables.
*Proposal*:
I propose adding an ALTER INDEX command that allows for enabling or disabling an index globally. This could look something like:
ALTER INDEX index_name ENABLE;
ALTER INDEX index_name DISABLE;
A disabled index would still receive updates and enforce constraints as usual but would not be used for queries. This allows users to assess whether an index impacts query performance before deciding to drop it entirely.
I think a better approach would be to make the list of disabled indexes a GUC setting, which would then internally have an effect similar to enable_indexscan, meaning it would make the listed indexes unattractive to the planner.
This seems better than the proposed DDL command, because you'd be able to use this per-session, instead of forcing a global state, and even unprivileged users could use it.
(I think we have had proposals like this before, but I can't find the discussion I'm thinking of right now.)
On Tue, 24 Sept 2024 at 03:14, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 09.09.24 23:38, Shayon Mukherjee wrote: > > ALTER INDEX index_name ENABLE; > > ALTER INDEX index_name DISABLE; > > I think a better approach would be to make the list of disabled indexes > a GUC setting, which would then internally have an effect similar to > enable_indexscan, meaning it would make the listed indexes unattractive > to the planner. I understand the last discussion went down that route too. For me, it seems strange that adding some global variable is seen as cleaner than storing the property in the same location as all the other index properties. How would you ensure no cached plans are still using the index after changing the GUC? > This seems better than the proposed DDL command, because you'd be able > to use this per-session, instead of forcing a global state, and even > unprivileged users could use it. That's true. > (I think we have had proposals like this before, but I can't find the > discussion I'm thinking of right now.) I think it's the one that was already linked by Nathan. [1]? The GUC seems to have been first suggested on the same thread in [2]. David [1] https://www.postgresql.org/message-id/ed8c9ed7-bb5d-aaec-065b-ad4893645deb%402ndQuadrant.com [2] https://www.postgresql.org/message-id/29800.1529359024%40sss.pgh.pa.us
On Mon, Sep 23, 2024 at 8:31 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 24 Sept 2024 at 03:14, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 09.09.24 23:38, Shayon Mukherjee wrote:
> > ALTER INDEX index_name ENABLE;
> > ALTER INDEX index_name DISABLE;
>
> I think a better approach would be to make the list of disabled indexes
> a GUC setting, which would then internally have an effect similar to
> enable_indexscan, meaning it would make the listed indexes unattractive
> to the planner.
I understand the last discussion went down that route too. For me, it
seems strange that adding some global variable is seen as cleaner than
storing the property in the same location as all the other index
properties.
That was my first instinct as well. Although, being able to control this setting on a per session level and as an unprivileged user is somewhat attractive.
How would you ensure no cached plans are still using the index after
changing the GUC?
Could we call ResetPlanCache() to invalidate all plan caches from the assign_ hook on GUC when it's set (and doesn't match the old value). Something like this (assuming the GUC is called `disabled_indexes`)
void
assign_disabled_indexes(const char *newval, void *extra)
{
if (disabled_indexes != newval)
ResetPlanCache();
}
assign_disabled_indexes(const char *newval, void *extra)
{
if (disabled_indexes != newval)
ResetPlanCache();
}
A bit heavy-handed, but perhaps it's OK, since it's not meant to be used frequently also ?
> This seems better than the proposed DDL command, because you'd be able
> to use this per-session, instead of forcing a global state, and even
> unprivileged users could use it.
That's true.
> (I think we have had proposals like this before, but I can't find the
> discussion I'm thinking of right now.)
I think it's the one that was already linked by Nathan. [1]? The GUC
seems to have been first suggested on the same thread in [2].
David
[1] https://www.postgresql.org/message-id/ed8c9ed7-bb5d-aaec-065b-ad4893645deb%402ndQuadrant.com
[2] https://www.postgresql.org/message-id/29800.1529359024%40sss.pgh.pa.us
If one of the use cases is soft-dropping indexes, would a GUC approach still support that? ALTER TABLE?
On 23.09.24 22:51, Shayon Mukherjee wrote: > I am happy to draft a patch for this as well. I think I have a working > idea so far of where the necessary checks might go. However if you don’t > mind, can you elaborate further on how the effect would be similar to > enable_indexscan? Planner settings like enable_indexscan used to just add a large number (disable_cost) to the estimated plan node costs. It's a bit more sophisticated in PG17. But in any case, I imagine "disabling an index" could use the same mechanism. Or maybe not, maybe the setting would just completely ignore the index.
On 24.09.24 02:30, David Rowley wrote: > I understand the last discussion went down that route too. For me, it > seems strange that adding some global variable is seen as cleaner than > storing the property in the same location as all the other index > properties. It's arguably not actually a property of the index, it's a property of the user's session. Like, kind of, the search path is a session property, not a property of a schema. > How would you ensure no cached plans are still using the index after > changing the GUC? Something for the patch author to figure out. ;-)
Peter Eisentraut <peter@eisentraut.org> writes: > On 23.09.24 22:51, Shayon Mukherjee wrote: >> I am happy to draft a patch for this as well. I think I have a working >> idea so far of where the necessary checks might go. However if you don’t >> mind, can you elaborate further on how the effect would be similar to >> enable_indexscan? > Planner settings like enable_indexscan used to just add a large number > (disable_cost) to the estimated plan node costs. It's a bit more > sophisticated in PG17. But in any case, I imagine "disabling an index" > could use the same mechanism. Or maybe not, maybe the setting would > just completely ignore the index. Yeah, I'd be inclined to implement this by having create_index_paths just not make any paths for rejected indexes. Or you could back up another step and keep plancat.c from building IndexOptInfos for them. The latter might have additional effects, such as preventing the plan from relying on a uniqueness condition enforced by the index. Not clear to me if that's desirable or not. [ thinks... ] One good reason for implementing it in plancat.c is that you'd have the index relation open and be able to see its name for purposes of matching to the filter. Anywhere else, getting the name would involve additional overhead. regards, tom lane
Thank you for the historical context and working, I understand what you were referring to before now. Shayon > On Sep 24, 2024, at 2:08 PM, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 23.09.24 22:51, Shayon Mukherjee wrote: >> I am happy to draft a patch for this as well. I think I have a working >> idea so far of where the necessary checks might go. However if you don’t >> mind, can you elaborate further on how the effect would be similar to >> enable_indexscan? > > Planner settings like enable_indexscan used to just add a large number (disable_cost) to the estimated plan node costs. It's a bit more sophisticated in PG17. But in any case, I imagine "disabling an index" could use the same mechanism. Or maybe not, maybe the setting would just completely ignore the index.
Hello,
Also added this as a post in Commit Fest [0]
Thank you
Shayon
On Sep 26, 2024, at 1:39 PM, Shayon Mukherjee <shayonj@gmail.com> wrote:Hello,
I am back with a PATCH :). Thanks to everyone in the threads for all the helpful discussions.
This proposal is for a PATCH to introduce a GUC variable to disable specific indexes during query planning.
This is an alternative approach to the previous PATCH I had proposed and is improved upon after some of the recent discussions in the thread. The PATCH contains the relevant changes, regression tests, and documentation.
I went with the GUC approach to introduce a way for a user to disable indexes during query planning over dedicated SQL Grammar and introducing the `isenabled` attribute in `pg_index` for the following reasons:
- Inspired by the discussions brought in earlier about this setting being something that unprivileged users can benefit from versus an ALTER statement.
- A GUC variable felt more closely aligned with the query tuning purpose, which this feature would serve, over index maintenance, the state of which is more closely reflected in `pg_index`.
Implementation details:
The patch introduces a new GUC parameter `disabled_indexes` that allows users to specify a comma-separated list of indexes to be ignored during query planning. Key aspects:
- Adds a new `isdisabled` attribute to the `IndexOptInfo` structure.
- Modifies `get_relation_info` in `plancat.c` to skip disabled indexes entirely, thus reducing the number of places we need to check if an index is disabled or not.
- Implements GUC hooks for parameter validation and assignment.
- Resets the plan cache when the `disabled_indexes` list is modified through `ResetPlanCache()`
I chose to modify the logic within `get_relation_info` as compared to, say, reducing the cost to make the planner not consider an index during planning, mostly to keep the number of changes being introduced to a minimum and also the logic itself being self-contained and easier to under perhaps (?).
As mentioned before, this does not impact the building of the index. That still happens.
I have added regression tests for:
- Basic single-column and multi-column indexes
- Partial indexes
- Expression indexes
- Join indexes
- GIN and GiST indexes
- Covering indexes
- Range indexes
- Unique indexes and constraints
I'd love to hear any feedback on the proposed PATCH and also the overall approach.
<v1-0001-Ability-to-enable-disable-indexes-through-GUC.patch>On Sep 24, 2024, at 9:19 AM, Shayon Mukherjee <shayonj@gmail.com> wrote:
--
Kind Regards,
Shayon Mukherjee
<v1-0001-Proof-of-Concept-Ability-to-enable-disable-indexe.patch>
On Mon, Sep 23, 2024 at 11:14 AM Peter Eisentraut <peter@eisentraut.org> wrote: > I think a better approach would be to make the list of disabled indexes > a GUC setting, which would then internally have an effect similar to > enable_indexscan, meaning it would make the listed indexes unattractive > to the planner. > > This seems better than the proposed DDL command, because you'd be able > to use this per-session, instead of forcing a global state, and even > unprivileged users could use it. > > (I think we have had proposals like this before, but I can't find the > discussion I'm thinking of right now.) I feel like a given user could want either one of these things. If you've discovered that a certain index is causing your production application to pick the wrong index, disabling it and thereby affecting all backends is what you want. If you're trying to experiment with different query plans without changing anything for other backends, being able to set some session-local state is better. I don't understand the argument that one of these is categorically better than the other. -- Robert Haas EDB: http://www.enterprisedb.com
> On Oct 7, 2024, at 4:52 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Sep 23, 2024 at 11:14 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> I think a better approach would be to make the list of disabled indexes >> a GUC setting, which would then internally have an effect similar to >> enable_indexscan, meaning it would make the listed indexes unattractive >> to the planner. >> >> This seems better than the proposed DDL command, because you'd be able >> to use this per-session, instead of forcing a global state, and even >> unprivileged users could use it. >> >> (I think we have had proposals like this before, but I can't find the >> discussion I'm thinking of right now.) > > I feel like a given user could want either one of these things. If > you've discovered that a certain index is causing your production > application to pick the wrong index, disabling it and thereby > affecting all backends is what you want. If you're trying to > experiment with different query plans without changing anything for > other backends, being able to set some session-local state is better. > I don't understand the argument that one of these is categorically > better than the other. Makes sense to me and it’s something I am somewhat split on as well. I suppose with a GUC you can still do some thing like ALTER USER foobar SET disabled_indexes to ‘idx_test_table_id’ [thinking…] This way all new sessions will start to not consider the index when query planning. Of course it does not helpexisting sessions, so one may need to kill those backends, which could be heavy handed. Both these options clearly serve slightly different purposes with good pros and I am currently thinking if GUC is that goodmiddle ground solution. Curious if someone has a stronger opinion on which one of these might make more sense perhaps :-D. [thinking…] Unless - we try to do support both a GUC and the ALTER INDEX ENABLE/DISABLE grammar + isdisabled attribute onpg_index? I can see that both implementations (GUC and the new attribute on pg_index via ALTER) have the primary logic managed by `get_relation_info`in `plancat.c`. Here, we set `isdisabled` (new attribute) on `IndexOptInfo` and compare it against `disabled_indexes`in the GUC (from the previous GUC patch). Similarly, for `pg_index`, which is already open in `get_relation_info`,we can read from `pg_index.isdisabled` and accordingly update `IndexOptInfo.isdisabled`. [0] https://www.postgresql.org/message-id/6CE345C1-6FFD-4E4C-8775-45DA659C57CF@gmail.com Thanks Shayon
On Wed, 9 Oct 2024 at 20:07, Shayon Mukherjee <shayonj@gmail.com> wrote: > [thinking…] Unless - we try to do support both a GUC and the ALTER INDEX ENABLE/DISABLE grammar + isdisabled attributeon pg_index? I just wanted to explain my point of view on this. This is my opinion and is by no means authoritative. I was interested in this patch when you proposed it as an ALTER INDEX option. I know other committers seem interested, but I personally don't have any interest in the GUC option. I think the reason I dislike it is that it's yet another not even half-baked take on planner hints (the other one being enable* GUCs). I often thought that if we ever did planner hints that it would be great to have multiple ways to specify the hints. Ordinarily, I'd expect some special comment type as the primary method to specify hints, but equally, it would be nice to be able to specify them in other ways. e.g. a GUC to have them apply to more than just 1 query. Useful for things such as "don't use index X". Now, I'm not suggesting you go off and code up planner hints. That's a huge project. I'm just concerned that we've already got a fair bit of cruft that will be left remaining if we ever get core planner hints and a disabled_indexes GUC will just add to that. I don't feel like the ALTER INDEX method would be leftover cruft from us gaining core planner hints. Others might feel differently on that one. I feel the ALTER INDEX option is less controversial. I'll also stand by what I said earlier on this thread. If PeterG gets index skip scans done for PG18, then it's likely there's going to be lots of users considering if they still need a certain index or not after upgrading to PG18. David
On Wed, Oct 9, 2024 at 4:19 AM David Rowley <dgrowleyml@gmail.com> wrote: > On Wed, 9 Oct 2024 at 20:07, Shayon Mukherjee <shayonj@gmail.com> wrote: > > [thinking…] Unless - we try to do support both a GUC and the ALTER INDEX ENABLE/DISABLE grammar + isdisabled attributeon pg_index? > > I just wanted to explain my point of view on this. This is my opinion > and is by no means authoritative. > > I was interested in this patch when you proposed it as an ALTER INDEX > option. I know other committers seem interested, but I personally > don't have any interest in the GUC option. I think the reason I > dislike it is that it's yet another not even half-baked take on > planner hints (the other one being enable* GUCs). I often thought that > if we ever did planner hints that it would be great to have multiple > ways to specify the hints. Ordinarily, I'd expect some special comment > type as the primary method to specify hints, but equally, it would be > nice to be able to specify them in other ways. e.g. a GUC to have them > apply to more than just 1 query. Useful for things such as "don't use > index X". +1. A GUC can be done as a contrib module using existing hooks, and I think that's already been done outside of core, perhaps multiple times. That certainly doesn't mean we CAN'T add it as an in-core feature, but I do think "yet another not even half-baked take on planner hints" is a fair description. What I would personally like to see is for us to ship one or possibly more than one contrib module that let people do hint-like things in useful ways, and this could be a part of that. But I think we need better infrastructure for controlling the planner behavior first, hence the "allowing extensions to control planner behavior" thread. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Oct 9, 2024 at 1:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Oct 9, 2024 at 4:19 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 9 Oct 2024 at 20:07, Shayon Mukherjee <shayonj@gmail.com> wrote:
> > [thinking…] Unless - we try to do support both a GUC and the ALTER INDEX ENABLE/DISABLE grammar + isdisabled attribute on pg_index?
>
> I just wanted to explain my point of view on this. This is my opinion
> and is by no means authoritative.
>
> I was interested in this patch when you proposed it as an ALTER INDEX
> option. I know other committers seem interested, but I personally
> don't have any interest in the GUC option. I think the reason I
> dislike it is that it's yet another not even half-baked take on
> planner hints (the other one being enable* GUCs). I often thought that
> if we ever did planner hints that it would be great to have multiple
> ways to specify the hints. Ordinarily, I'd expect some special comment
> type as the primary method to specify hints, but equally, it would be
> nice to be able to specify them in other ways. e.g. a GUC to have them
> apply to more than just 1 query. Useful for things such as "don't use
> index X".
+1. A GUC can be done as a contrib module using existing hooks, and I
think that's already been done outside of core, perhaps multiple
times. That certainly doesn't mean we CAN'T add it as an in-core
feature, but I do think "yet another not even half-baked take on
planner hints" is a fair description. What I would personally like to
see is for us to ship one or possibly more than one contrib module
that let people do hint-like things in useful ways, and this could be
a part of that. But I think we need better infrastructure for
controlling the planner behavior first, hence the "allowing extensions
to control planner behavior" thread.
What's the strategy here in this discussion?
This topic is older than PostgreSQL itself - everytime WE talk about so-called "hints" we see procrastination in the name of trademarks.
Lack of definition of what can and can't be done with hooks and what is the infra-estructural code that is necessary to allow it from core.
Take for example the need of disabling an index. What does it mean practically and for which component of the code?
You are going to disable the index but not the update of it? Why? Does it imply that when you are going to re-enable it you are going to recreate it?
So that's more observed from the point of syntax and facilities.
Also distributed into partitions and why not the opposite: the creation of a global index for all partitions. Also in discussion elsewhere.
Otherwise it will appear that people will need hints and contribs to "outsource" the main role of what is strategic to the team to companies elsewhere.
Regards,
Vinícius
Hi David,
Answered below
Thank you so much this context, as someone new to psql-hackers, having this insight is super useful. Also getting a sense of how folks feel about controlling different behaviors like planner hints through GUC and SQL grammar.
On Oct 9, 2024, at 9:19 AM, David Rowley <dgrowleyml@gmail.com> wrote:On Wed, 9 Oct 2024 at 20:07, Shayon Mukherjee <shayonj@gmail.com> wrote:[thinking…] Unless - we try to do support both a GUC and the ALTER INDEX ENABLE/DISABLE grammar + isdisabled attribute on pg_index?
I just wanted to explain my point of view on this. This is my opinion
and is by no means authoritative.
I was interested in this patch when you proposed it as an ALTER INDEX
option. I know other committers seem interested, but I personally
don't have any interest in the GUC option. I think the reason I
dislike it is that it's yet another not even half-baked take on
planner hints (the other one being enable* GUCs). I often thought that
if we ever did planner hints that it would be great to have multiple
ways to specify the hints. Ordinarily, I'd expect some special comment
type as the primary method to specify hints, but equally, it would be
nice to be able to specify them in other ways. e.g. a GUC to have them
apply to more than just 1 query. Useful for things such as "don't use
index X".
For instance: I wasn’t quite able to figure out the how to properly distinguish + reason between the enable* GUCs and ALTER index for this case, and patches are per my limited understand of the historical context as well.
Now, I'm not suggesting you go off and code up planner hints. That's a
huge project. I'm just concerned that we've already got a fair bit of
cruft that will be left remaining if we ever get core planner hints
and a disabled_indexes GUC will just add to that. I don't feel like
the ALTER INDEX method would be leftover cruft from us gaining core
planner hints. Others might feel differently on that one. I feel the
ALTER INDEX option is less controversial.
I'll also stand by what I said earlier on this thread. If PeterG gets
index skip scans done for PG18, then it's likely there's going to be
lots of users considering if they still need a certain index or not
after upgrading to PG18.
Likewise, I personally feel that the ability to disable indexes quickly and reverse the disabling (also quickly) is super useful, especially from an operational POV (point of view). So, I am very keen on getting this landed and happy to iterate on as many patches as it takes. :D
At this point, I am indifferent to each of the approaches (GUC or SQL grammar) based on the pros/cons I shared earlier in the thread & discussions in the thread. However, I would like us to make progress on getting _something_ out since the topic of disabling indexes has come up many times on pgsql-hackers in the past years and there is no easy way to toggle this behavior yet.
“yet another not even half-baked take on planner hints" is a good way to put things about enable* GUCs, so I am very much on board with proposing an updated PATCH to support disabling of indexes through ALTER. The original PATCH was here for context [1].
I am also curious about supporting this ([1]) through the ALTER grammar and not having the planner consider indexes by updating `get_relation_info` in `plancat.c`. Basically, through `pg_index.isdisabled`, which is already open in `get_relation_info`, we can read from `pg_index.isdisabled` and accordingly update `IndexOptInfo.isdisabled`. So, I'm happy to explore that as well and share my findings.
Thanks
Shayon
On Oct 9, 2024, at 1:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:On Wed, Oct 9, 2024 at 4:19 AM David Rowley <dgrowleyml@gmail.com> wrote:On Wed, 9 Oct 2024 at 20:07, Shayon Mukherjee <shayonj@gmail.com> wrote:[thinking…] Unless - we try to do support both a GUC and the ALTER INDEX ENABLE/DISABLE grammar + isdisabled attribute on pg_index?
I just wanted to explain my point of view on this. This is my opinion
and is by no means authoritative.
I was interested in this patch when you proposed it as an ALTER INDEX
option. I know other committers seem interested, but I personally
don't have any interest in the GUC option. I think the reason I
dislike it is that it's yet another not even half-baked take on
planner hints (the other one being enable* GUCs). I often thought that
if we ever did planner hints that it would be great to have multiple
ways to specify the hints. Ordinarily, I'd expect some special comment
type as the primary method to specify hints, but equally, it would be
nice to be able to specify them in other ways. e.g. a GUC to have them
apply to more than just 1 query. Useful for things such as "don't use
index X".
+1. A GUC can be done as a contrib module using existing hooks, and I
think that's already been done outside of core, perhaps multiple
times. That certainly doesn't mean we CAN'T add it as an in-core
feature, but I do think "yet another not even half-baked take on
planner hints" is a fair description. What I would personally like to
see is for us to ship one or possibly more than one contrib module
that let people do hint-like things in useful ways, and this could be
a part of that. But I think we need better infrastructure for
controlling the planner behavior first, hence the "allowing extensions
to control planner behavior" thread.
- Support disabling of indexes [1] through ALTER command
- While also building on "allowing extensions to control planner behavior” for the reasons above?
Thanks
Shayon
On Sat, Oct 12, 2024 at 5:56 AM Shayon Mukherjee <shayonj@gmail.com> wrote: > Thank you for sharing this Robert. I like the idea behind "allowing extensions to control planner behavior” overall andI think it does help towards a powerful extension ecosystem too. I wonder if there is a reality where we can achieve boththe outcomes here > > - Support disabling of indexes [1] through ALTER command > - While also building on "allowing extensions to control planner behavior” for the reasons above? > > [1] https://www.postgresql.org/message-id/ABD42A12-4DCF-4EE4-B903-4C657903CECE%40gmail.com Yes, I think we can do both things. -- Robert Haas EDB: http://www.enterprisedb.com
On Sat, 12 Oct 2024 at 22:41, Vinícius Abrahão <vinnix.bsd@gmail.com> wrote: > You are going to disable the index but not the update of it? Why? Does it imply that when you are going to re-enable ityou are going to recreate it? It might be worth you reading the discussion and proposed patches. I think either of those would answer your questions. I don't recall anyone ever proposing that re-enabling the index would result in it having to be rebuilt. If that was a requirement, then I'd say there does not seem much point in the feature. You might as well just drop the index and recreate it if you change your mind. David
On Wed, 16 Oct 2024 at 03:40, Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Oct 12, 2024 at 5:56 AM Shayon Mukherjee <shayonj@gmail.com> wrote: > > Thank you for sharing this Robert. I like the idea behind "allowing extensions to control planner behavior” overall andI think it does help towards a powerful extension ecosystem too. I wonder if there is a reality where we can achieve boththe outcomes here > > > > - Support disabling of indexes [1] through ALTER command > > - While also building on "allowing extensions to control planner behavior” for the reasons above? > > > > [1] https://www.postgresql.org/message-id/ABD42A12-4DCF-4EE4-B903-4C657903CECE%40gmail.com > > Yes, I think we can do both things. I think so too. I imagine there'd be cases where even hints global to all queries running on the server wouldn't result in the index being completely disabled. For example, a physical replica might not be privy to the hints defined on the primary and it might just be the queries running on the physical replica that are getting the most use out of the given index. Having the change made in pg_index would mean physical replicas have the index disabled too. For the primary use case I have in mind (test disabling indexes you're considering dropping), having the disabledness replicate would be very useful. David
On 24.09.24 20:21, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> On 23.09.24 22:51, Shayon Mukherjee wrote: >>> I am happy to draft a patch for this as well. I think I have a working >>> idea so far of where the necessary checks might go. However if you don’t >>> mind, can you elaborate further on how the effect would be similar to >>> enable_indexscan? > >> Planner settings like enable_indexscan used to just add a large number >> (disable_cost) to the estimated plan node costs. It's a bit more >> sophisticated in PG17. But in any case, I imagine "disabling an index" >> could use the same mechanism. Or maybe not, maybe the setting would >> just completely ignore the index. > > Yeah, I'd be inclined to implement this by having create_index_paths > just not make any paths for rejected indexes. Or you could back up > another step and keep plancat.c from building IndexOptInfos for them. > The latter might have additional effects, such as preventing the plan > from relying on a uniqueness condition enforced by the index. Not > clear to me if that's desirable or not. Yes, I think you'd want that, because one of the purposes of this feature would be to test whether it's okay to drop an index. So you don't want the planner to take any account of the index for its decisions.
On Wed, Oct 16, 2024 at 8:33 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > Yeah, I'd be inclined to implement this by having create_index_paths > > just not make any paths for rejected indexes. Or you could back up > > another step and keep plancat.c from building IndexOptInfos for them. > > The latter might have additional effects, such as preventing the plan > > from relying on a uniqueness condition enforced by the index. Not > > clear to me if that's desirable or not. > > Yes, I think you'd want that, because one of the purposes of this > feature would be to test whether it's okay to drop an index. So you > don't want the planner to take any account of the index for its decisions. I think this is right. I think we want to avoid invalidating the index, so we still need to consider it in determining where HOT updates must be performed, but we don't want to "improve" the plan using the index if it's disabled. -- Robert Haas EDB: http://www.enterprisedb.com
On Oct 16, 2024, at 2:15 PM, Shayon Mukherjee <shayonj@gmail.com> wrote:On Oct 16, 2024, at 12:19 PM, Shayon Mukherjee <shayonj@gmail.com> wrote:- ALTER INDEX ... ENABLE/DISABLE performs an in-place update of the pg_indexcatalog to protect against indcheckxmin [2] (older unrelated thread).Performing the in place update of the pg_index row from ATExecEnableDisableIndex using systable_inplace_update_begin was failing in CI weirdly but not on my local MacBook when running spec suite. I am also wondering what is causing the requirement [1] to update the row in-place vs say using CatalogTupleUpdate since using the later is working well locally + CI?
I believe I somewhat understand why the issue might be occurring, where CI is failing the create_index regression test and crashing (signal 6 in postmaster logs). I suspect it might be due to a segmentation fault or a similar issue.
IsInplaceUpdateOid is used in systable_inplace_update_begin (recently introduced), but it doesn’t yet support pg_index. Based on check_lock_if_inplace_updateable_rel in heapam.c and IsInplaceUpdateOid in catalog.c, introducing support for in-place updates via this new mechanism might not be straightforward for pg_index. It would require updating the callers to handle in-place updates and locks, I presume?
I did confirm that, based on the v2 PATCH, when not doing in-place updates on pg_index - it means that the xmin for pg_index would increment. I suppose there’s a risk of indcheckxmin being marked as TRUE when xmin exceeds, potentially causing an index to be accidentally skipped ? (I am not 100% sure)
I'll take some time to think this through and familiarize myself with the new systable_inplace_update_begin. In the meantime, aside from the in-place updates on pg_index, I would love to learn any first impressions or feedback on the patch folks may have.
Thank you,
Shayon
Interesting idea.
This patch needs a rebase.
On Thu, 17 Oct 2024 at 15:59, Shayon Mukherjee <shayonj@gmail.com> wrote:
On Oct 16, 2024, at 6:01 PM, Shayon Mukherjee <shayonj@gmail.com> wrote:I'll take some time to think this through and familiarize myself with the new systable_inplace_update_begin. In the meantime, aside from the in-place updates on pg_index, I would love to learn any first impressions or feedback on the patch folks may have.My take away from whether or not an in-place update is needed on pg_index [1]- It’s unclear to me why it’s needed.- I understand the xmin would get incremented when using CatalogTupleUpdate to update indisenabled.- However, I haven’t been able to replicate any odd behavior locally or CI.- FWIW - REINDEX CONCURRENTLY (via index_swap), index_constraint_create and few other places perform CatalogTupleUpdate to update the relevant attributes as well.Based on the above summary and after my testing I would like to propose a v3 of the patch. The only difference between this and v1 [2] is that the update of pg_index row happens via CatalogTupleUpdate.Thank you for bearing with me on this :DShayon
Regards,
Rafia Sabih
Rafia Sabih
CYBERTEC PostgreSQL International GmbH
On Thu, Oct 17, 2024 at 9:59 AM Shayon Mukherjee <shayonj@gmail.com> wrote: > My take away from whether or not an in-place update is needed on pg_index [1] > > - It’s unclear to me why it’s needed. > - I understand the xmin would get incremented when using CatalogTupleUpdate to update indisenabled. > - However, I haven’t been able to replicate any odd behavior locally or CI. > - FWIW - REINDEX CONCURRENTLY (via index_swap), index_constraint_create and few other places perform CatalogTupleUpdateto update the relevant attributes as well. > > Based on the above summary and after my testing I would like to propose a v3 of the patch. The only difference betweenthis and v1 [2] is that the update of pg_index row happens via CatalogTupleUpdate. > > [1] https://www.postgresql.org/message-id/20180618215635.m5vrnxdxhxytvmcm@alap3.anarazel.de > [2] https://www.postgresql.org/message-id/EF2313B8-A017-4869-9B7F-A24EDD8795DE%40gmail.com In-place updates are generally bad news, so I think this patch shouldn't use them. However, someone will need to investigate whether that breaks the indcheckxmin thing that Andres mentions in your [1], and if it does, figure out what to do about it. Creating a test case to show the breakage would probably be a good first step, to frame the discussion. -- Robert Haas EDB: http://www.enterprisedb.com
> On Nov 5, 2024, at 10:55 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Oct 17, 2024 at 9:59 AM Shayon Mukherjee <shayonj@gmail.com> wrote: >> My take away from whether or not an in-place update is needed on pg_index [1] >> >> - It’s unclear to me why it’s needed. >> - I understand the xmin would get incremented when using CatalogTupleUpdate to update indisenabled. >> - However, I haven’t been able to replicate any odd behavior locally or CI. >> - FWIW - REINDEX CONCURRENTLY (via index_swap), index_constraint_create and few other places perform CatalogTupleUpdateto update the relevant attributes as well. >> >> Based on the above summary and after my testing I would like to propose a v3 of the patch. The only difference betweenthis and v1 [2] is that the update of pg_index row happens via CatalogTupleUpdate. >> >> [1] https://www.postgresql.org/message-id/20180618215635.m5vrnxdxhxytvmcm@alap3.anarazel.de >> [2] https://www.postgresql.org/message-id/EF2313B8-A017-4869-9B7F-A24EDD8795DE%40gmail.com > > In-place updates are generally bad news, so I think this patch > shouldn't use them. However, someone will need to investigate whether > that breaks the indcheckxmin thing that Andres mentions in your [1], > and if it does, figure out what to do about it. Creating a test case > to show the breakage would probably be a good first step, to frame the > discussion. Hello, Thank you for the guidance and tips. I was wondering if updating in-place is preferable or not, since my first instinct wasto avoid it. I did not notice any breakage last time, unless I was looking in the wrong place (possibly?). I did noticethe min update when a not-in-place update was performed, but I didn't notice any issues (as mentioned in [1]) fromit, via logs, index usage, or other common operations. Let me write up some more test cases to check if there is a riskof indexcheckxmin running out or other issues, and I'll get back here. Thank you Shayon [1] https://www.postgresql.org/message-id/20180618215635.m5vrnxdxhxytvmcm@alap3.anarazel.de > > -- > Robert Haas > EDB: http://www.enterprisedb.com
On Tue, 26 Nov 2024 at 11:34, Shayon Mukherjee <shayonj@gmail.com> wrote: > Thank you for the guidance and tips. I was wondering if updating in-place is preferable or not, since my first instinctwas to avoid it. I did not notice any breakage last time, unless I was looking in the wrong place (possibly?). Idid notice the min update when a not-in-place update was performed, but I didn't notice any issues (as mentioned in [1])from it, via logs, index usage, or other common operations. Let me write up some more test cases to check if there isa risk of indexcheckxmin running out or other issues, and I'll get back here. Another safer option could be to disallow the enable/disable ALTER TABLE if indcheckxmin is true. We do have and use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE for these sorts of issues. There are other existing failure modes relating to indexes that can depend on what another session has done, e.g. MarkInheritDetached() can fail if another session detached an index concurrently. I could respect an argument that this one might be worse than that as its timing dependent rather than dependent on what another session has done. David
Hello.
A few comments on patch:
> + temporarily reducing the overhead of index maintenance
> + during bulk data loading operationsBut tuples are still inserted, where the difference come from?
> or verifying an index is not being used
> + before dropping itHm, it does not provide the guarantee - index may also be used as an arbiter for INSERT ON CONFLICT, for example. For that case, "update pg_index set indisvalid = false" should be used before the DROP, probably.
Also index may also be used for constraint, part of partitioned table, etc.
Also, I think it is better to move check to indisvalid as if (!index->indisvalid || !index->indisenabled).
Best regards,
Mikhail.
> Rebased with the latest master as well. Hi, This is a great, long needed feature. Thanks for doing this. I am late to this thread, but I took a look at the current patch and have some comments as I continue to look. 1/ instead of + If true, the index is currently enabled and should be used for queries. + If false, the index is disabled and should not be used for queries, how about? "If true, the index is currently enabled and may be used for queries. If false, the index is disabled and may not be used for queries," "may" is more accurate than "should" in this context. 2/ instead of + but is still maintained when the table is modified. Default is true. how about? "but is still updated when the table is modified. Default is true." "update" of an index is the current verb used. See bottom of https://www.postgresql.org/docs/current/indexes-intro.html 3/ instead of saying "used by the query planner for query optimization", can it just be "The index will be used for queries." + <para> + Enable the specified index. The index will be used by the query planner + for query optimization. This is the default state for newly created indexes. + </para> Same for + <listitem> + <para> + Disable the specified index. A disabled index is not used by the query planner + for query optimization, but it is still maintained when the underlying table + data changes and will still be used to enforce constraints (such as UNIQUE, + or PRIMARY KEY constraints). 4/ Should documentation recommend a direct catalog update? + to identify potentially unused indexes. Note that if you want to completely + prevent an index from being used, including for constraint enforcement, you + would need to mark it as invalid using a direct update to the system catalogs + (e.g., <literal>UPDATE pg_index SET indisvalid = false WHERE indexrelid = 'index_name'::regclass</literal>). "indisvalid" does not control constraint enforcement in this case. It will be "indisready" being set to false that will. But even then, this goes against the general principle ( also documnted ) of not updating the catalog directly. See [1] I think this part should be removed. 5/ In a case of a prepared statement, disabling the index has no effect. postgres=# create table foo ( id int primary key ); CREATE TABLE postgres=# prepare prp as select * from foo where id = 1; PREPARE postgres=# explain analyze execute prp; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Index Only Scan using foo_pkey on foo (cost=0.15..8.17 rows=1 width=4) (actual time=0.018..0.019 rows=0 loops=1) Index Cond: (id = 1) Heap Fetches: 0 Buffers: shared hit=2 Planning: Buffers: shared hit=15 read=7 Planning Time: 2.048 ms Execution Time: 0.071 ms (8 rows) postgres=# alter index foo_pkey disable ; ALTER INDEX postgres=# explain analyze execute prp; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Index Only Scan using foo_pkey on foo (cost=0.15..8.17 rows=1 width=4) (actual time=0.035..0.036 rows=0 loops=1) Index Cond: (id = 1) Heap Fetches: 0 Buffers: shared hit=2 Planning Time: 0.012 ms Execution Time: 0.320 ms (6 rows) Should this not behave like if you drop (or create) an index during a prepared statement? I have not yet looked closely at this code to see what could be done. Regards, Sami Imseih Amazon Web Services (AWS) [1] https://www.postgresql.org/docs/current/catalogs.html
On Sun, Sep 22, 2024 at 3:45 PM David Rowley <dgrowleyml@gmail.com> wrote: > I think the documents should also mention that disabling an index is a > useful way to verify an index is not being used before dropping it as > the index can be enabled again at the first sign that performance has > been effected. (It might also be good to mention that checking > pg_stat_user_indexes.idx_scan should be the first port of call when > checking for unused indexes) While reviewing Shayon's v14 patch, I had removed text (quoted below) from the ALTER INDEX docs that did not feel right in a command reference. I thought of reading up on the history/discussion of the patch, and now I see why Shayon chose to include an advice in ALTER INDEX docs. > + indexes. If performance degrades after making an index invisible, it can be easily > + be made visible using <literal>VISIBLE</literal>. Before making an index invisible, it's recommended > + to check <structname>pg_stat_user_indexes</structname>.<structfield>idx_scan</structfield> > + to identify potentially unused indexes. I feel ALTER INDEX command reference doc is not the right place for this kind of operational advice. Is there a better place in documentation for this kind of advice? Or maybe it needs to be reworded to fit the command reference style? Best regards, Gurjeet http://Gurje.et
> > + indexes. If performance degrades after making an index invisible, it can be easily > > + be made visible using <literal>VISIBLE</literal>. Before making an index invisible, it's recommended > > + to check <structname>pg_stat_user_indexes</structname>.<structfield>idx_scan</structfield> > > + to identify potentially unused indexes. > > I feel ALTER INDEX command reference doc is not the right place for this kind of > operational advice. Is there a better place in documentation for this kind of > advice? Or maybe it needs to be reworded to fit the command reference style? I agree with you. What about we add this wording in the following section [0]? This section discusses techniques for experimenting with indexes. It says, ".... A good deal of experimentation is often necessary. The rest of this section gives some tips for that:...." A discussion about invisible indexes as one of the tools for experimentation can be added here. What do you think? [0] https://www.postgresql.org/docs/current/indexes-examine.html -- Sami Imseih Amazon Web Services (AWS)
On Wed Apr 2, 2025 at 6:58 PM PDT, Sami Imseih wrote: >> > + indexes. If performance degrades after making an index invisible, it can be easily >> > + be made visible using <literal>VISIBLE</literal>. Before making an index invisible, it's recommended >> > + to check <structname>pg_stat_user_indexes</structname>.<structfield>idx_scan</structfield> >> > + to identify potentially unused indexes. >> >> I feel ALTER INDEX command reference doc is not the right place for this kind of >> operational advice. Is there a better place in documentation for this kind of >> advice? Or maybe it needs to be reworded to fit the command reference style? > > I agree with you. > > What about we add this wording in the following section [0]? This > section discusses techniques > for experimenting with indexes. It says, > ".... A good deal of experimentation is often necessary. The rest of > this section gives some tips for that:...." > > A discussion about invisible indexes as one of the tools for > experimentation can be added here. > What do you think? > > [0] https://www.postgresql.org/docs/current/indexes-examine.html That seems like a very good location for this advice. But the current set of bullet points are all directed towards "... a general procedure for determining which indexes to create". I propose that a new paragrph, not a bullet point, be added towards the end of that section which addresses the options of steps to take before dropping an index. Something like the following: Sometimes you may notice that an index is not being used anymore by the application queries. In such cases, it is a good idea to investigate if such an index can be dropped, because an index that is not being used for query optimization still consumes resources and slows down INSERT, UPDATE, and DELETE commands. To aid in such an investigation, look at the pg_stat_user_indexes.idx_scan count for the index. To determine the performance effects of dropping the index, without actually dropping the said index, you may mark the index invisible to the planner by using the ALTER INDEX ... INVISIIBLE command. If it turns out that doing so causes a performance degradation, the index can be quickly made visible to the planner for query optimization by using the ALTER INDEX ... VISIBLE command. Thoughts? Best regards, Gurjeet http://Gurje.et
> That seems like a very good location for this advice. But the current > set of bullet points are all directed towards "... a general procedure > for determining which indexes to create". I propose that a new paragrph, > not a bullet point, be added towards the end of that section which > addresses the options of steps to take before dropping an index. > Something like the following: > Thoughts? This new feature provides the ability to experiment with indexes to create ( or drop ), so I don't think it deviates from the purpose of this paragraphs. This patch will provide the ability for the user to create an index as initially invisible and a GUC, use_invisible_index if set to TRUE to experiment with the new index to see if it improves performance. So, I think providing this pattern to experiment with a new index will fit nicely as a new bulletpoint. -- Sami Imseih Amazon Web Services (AWS)
On Wed, Apr 2, 2025 at 10:53 PM Sami Imseih <samimseih@gmail.com> wrote:
> That seems like a very good location for this advice. But the current
> set of bullet points are all directed towards "... a general procedure
> for determining which indexes to create". I propose that a new paragrph,
> not a bullet point, be added towards the end of that section which
> addresses the options of steps to take before dropping an index.
> Something like the following:
> Thoughts?
This new feature provides the ability to experiment with indexes to
create ( or drop ),
so I don't think it deviates from the purpose of this paragraphs.
This patch will provide the ability for the user to create an index as initially
invisible and a GUC, use_invisible_index if set to TRUE to experiment with
the new index to see if it improves performance. So, I think providing this
pattern to experiment with a new index will fit nicely as a new bulletpoint.
Thank you for the feedback and pointers Sami and Gurjeet. Good call on [0] being a good place for operational advice. I have gone ahead and removed the advice about "pg_stat_user_indexes.idx_scan" from doc/src/sgml/ref/alter_index.sgml and updated doc/src/sgml/indices.sgml to include a new bullet point with also a reference to use_invisible_index. Let me know how it sounds and if there is any feedback.
Also, rebased.
Thank you
Shayon
Attachment
Thanks for the update! The changes in v15 look good to me. The patch does need to be rebased, and I also think you should add a tab-complete for CREATE INDEX simseih@bcd07415af92 postgresql % git diff diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 8e2eb50205e..f1853a68ccc 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3434,6 +3434,8 @@ match_previous_words(int pattern_id, !TailMatches("POLICY", MatchAny, MatchAny, MatchAny, MatchAny, MatchAny) && !TailMatches("FOR", MatchAny, MatchAny, MatchAny)) COMPLETE_WITH("("); + else if (TailMatches("*)")) + COMPLETE_WITH("VISIBLE", "INVISIBLE"); /* CREATE OR REPLACE */ else if (Matches("CREATE", "OR")) IMO, with the above in place, this patch is RFC. -- Sami Imseih Amazon Web Services (AWS)
On Mon, Apr 7, 2025 at 4:01 PM Sami Imseih <samimseih@gmail.com> wrote:
Thanks for the update!
The changes in v15 look good to me. The patch does need to be rebased,
and I also think you should add a tab-complete for CREATE INDEX
simseih@bcd07415af92 postgresql % git diff
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8e2eb50205e..f1853a68ccc 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3434,6 +3434,8 @@ match_previous_words(int pattern_id,
!TailMatches("POLICY", MatchAny, MatchAny,
MatchAny, MatchAny, MatchAny) &&
!TailMatches("FOR", MatchAny, MatchAny, MatchAny))
COMPLETE_WITH("(");
+ else if (TailMatches("*)"))
+ COMPLETE_WITH("VISIBLE", "INVISIBLE");
/* CREATE OR REPLACE */
else if (Matches("CREATE", "OR"))
IMO, with the above in place, this patch is RFC.
Thank you Sami, really appreciate it!
Attached v16 with feedback and rebased.
Thanks
Shayon
Attachment
> Attached v16 with feedback and rebased. Thanks, and I realized the original tab-complete I proposed was not entirely correct. I fixed it and also shortened the commit message. -- Sami Imseih Amazon Web Services (AWS)
Attachment
On Mon, Apr 7, 2025 at 5:39 PM Sami Imseih <samimseih@gmail.com> wrote:
> Attached v16 with feedback and rebased.
Thanks, and I realized the original tab-complete I proposed
was not entirely correct. I fixed it and also shortened the
commit message.
I was wondering about if the check needed to be more encompassing. Your proposal definitely makes sense, thank you!
Shayon
hi. The following is a review of version 17. ATExecSetIndexVisibility if (indexForm->indisvisible != visible) { indexForm->indisvisible = visible; CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple); CacheInvalidateRelcache(heapRel); InvokeObjectPostAlterHook(IndexRelationId, indexOid, 0); CommandCounterIncrement(); } I am slightly confused. if we already used CommandCounterIncrement(); then we don't need CacheInvalidateRelcache? doc/src/sgml/ref/alter_index.sgml <para> <command>ALTER INDEX</command> changes the definition of an existing index. There are several subforms described below. Note that the lock level required may differ for each subform. An <literal>ACCESS EXCLUSIVE</literal> lock is held unless explicitly noted. When multiple subcommands are listed, the lock held will be the strictest one required from any subcommand. per the above para, we need mention that ALTER INDEX SET INVISIBLE|INVISIBLE only use ShareUpdateExclusiveLock? index_create is called in several places, most of the time, we use INDEX_CREATE_VISIBLE. if we define it as INDEX_CREATE_INVISIBLE rather than INDEX_CREATE_VISIBLE then argument flags required code changes would be less, (i didn't try it myself) Similar to get_index_isclustered, We can place GetIndexVisibility in src/backend/utils/cache/lsyscache.c, make it an extern function, so others can use it; to align with other function names, maybe rename it as get_index_visibility. create index v2_idx on v1(data) visible; is allowed, doc/src/sgml/ref/create_index.sgml <synopsis> section need to change to [ VISIBLE | INVISIBLE ] ?
hi, two more minor issues. src/bin/pg_dump/pg_dump.c if (fout->remoteVersion >= 180000) need change to if (fout->remoteVersion >= 190000) +-- Test index visibility with partitioned tables +CREATE TABLE part_tbl(id int, data text) PARTITION BY RANGE(id); +CREATE TABLE part1 PARTITION OF part_tbl FOR VALUES FROM (1) TO (100); +CREATE TABLE part2 PARTITION OF part_tbl FOR VALUES FROM (100) TO (200); +INSERT INTO part_tbl +SELECT g, 'data ' || g +FROM generate_series(1, 199) g; +CREATE INDEX idx_part_tbl ON part_tbl(data); +SELECT c.relname, i.indisvisible +FROM pg_index i +JOIN pg_class c ON i.indexrelid = c.oid +WHERE c.relname LIKE 'idx_part_tbl%' +ORDER BY c.relname; + relname | indisvisible +--------------+-------------- + idx_part_tbl | t +(1 row) + This test seems not that good? "idx_part_tbl" is the partitioned index, we also need to show each partition index pg_index.indisvisible value? we can change it to -------- CREATE TABLE part_tbl(id int, data text) PARTITION BY RANGE(id); CREATE TABLE part_tbl_1 PARTITION OF part_tbl FOR VALUES FROM (1) TO (100); CREATE TABLE part_tbl_2 PARTITION OF part_tbl FOR VALUES FROM (100) TO (200); CREATE INDEX ON part_tbl(data); SELECT c.relname, i.indisvisible FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid WHERE c.relname LIKE 'part_tbl%' ORDER BY c.relname; -----
On Thu, Apr 24, 2025 at 12:45 AM jian he <jian.universality@gmail.com> wrote:
hi.
The following is a review of version 17.
ATExecSetIndexVisibility
if (indexForm->indisvisible != visible)
{
indexForm->indisvisible = visible;
CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
CacheInvalidateRelcache(heapRel);
InvokeObjectPostAlterHook(IndexRelationId, indexOid, 0);
CommandCounterIncrement();
}
I am slightly confused. if we already used
CommandCounterIncrement();
then we don't need CacheInvalidateRelcache?
Thank you for this catch. I misunderstood the behavior of the two and was performing both to avoid inconsistency between state within a transaction and cross session, but as you pointed out CommandCounterIncrement() helps achieve both. Updated.
doc/src/sgml/ref/alter_index.sgml
<para>
<command>ALTER INDEX</command> changes the definition of an existing index.
There are several subforms described below. Note that the lock level required
may differ for each subform. An <literal>ACCESS EXCLUSIVE</literal>
lock is held
unless explicitly noted. When multiple subcommands are listed, the lock
held will be the strictest one required from any subcommand.
per the above para, we need mention that ALTER INDEX SET INVISIBLE|INVISIBLE
only use ShareUpdateExclusiveLock?
I wasn't sure at first where to add the note about ShareUpdateExclusiveLock. But it looks like we have a precedent from RENAME in doc/src/sgml/ref/alter_index.sgml, so I have done the same for VISIBLE & INVISIBLE in doc/src/sgml/ref/alter_index.sgml as well.
index_create is called in several places,
most of the time, we use INDEX_CREATE_VISIBLE.
if we define it as INDEX_CREATE_INVISIBLE rather than INDEX_CREATE_VISIBLE
then argument flags required code changes would be less, (i didn't try
it myself)
Looks like the only change we would save is the one in src/backend/catalog/toasting.c. Rest of the code change/diffs would still be needed IIUC (if I understand correctly). This approach felt a bit ergonomical, hence opted for it, but happy to update. Let me know.
Similar to get_index_isclustered,
We can place GetIndexVisibility in
src/backend/utils/cache/lsyscache.c,
make it an extern function, so others can use it;
to align with other function names,
maybe rename it as get_index_visibility.
I was a bit torn on this one and figured I wouldn't introduce it as it could be a bit of premature optimization, until there were more use cases (or maybe one more). Plus, I figured the next time we need this info, we could expose a more public function like get_index_visibility (given N=2, N being the number of callers). However, given you mentioned and spotted this as well, I have introduced get_index_visibility in the new patch now.
create index v2_idx on v1(data) visible;
is allowed,
doc/src/sgml/ref/create_index.sgml
<synopsis> section need to change to
[ VISIBLE | INVISIBLE ]
?
Updated to match the same pattern as the one in doc/src/sgml/ref/alter_index.sgml.
Thank you for the feedback. I have also updated the feedback from [1] as well. Few extra notes:
- Attached v18
- Rebased against master
- Updated the commit message
- Updated the target remote version to now be fout->remoteVersion >= 190000
- Using a UNION ALL query to show all indexes from part_tbl partitioned tables in the specs as noted in [1]. The query suggested in [1] wasn't encompassing all the indexes, hence the UNION ALL for WHERE i.indrelid = 'part_tbl'::regclass::oid.
Thank you
Shayon
Attachment
On Mon, Apr 28, 2025 at 7:23 AM Shayon Mukherjee <shayonj@gmail.com> wrote: > On Thu, Apr 24, 2025 at 12:45 AM jian he <jian.universality@gmail.com> wrote: >> > Thank you for the feedback. I have also updated the feedback from [1] as well. Few extra notes: > > - Attached v18 > - Rebased against master > - Updated the commit message > - Updated the target remote version to now be fout->remoteVersion >= 190000 > - Using a UNION ALL query to show all indexes from part_tbl partitioned tables in the specs as noted in [1]. The querysuggested in [1] wasn't encompassing all the indexes, hence the UNION ALL for WHERE i.indrelid = 'part_tbl'::regclass::oid. > Having looked at this patch, I'm a bit surprised that it would be considered for commit; not that the work hasn't been done with rigor, but the implementation seems extremely obtuse for the common use cases that have been envisioned. As a primary example, imagine you have 2 indexes and you want to test if one index can handle the load via skip scans. With this feature, in order to do that SAFELY, you would need to first figure out how to ensure that the `force_invisible_index` GUC has been set to true across all possible backends, even though there seems like a general agreement that there isn't an easy way to do this (see comments around cached plans), and to make it more complicated, this needs to occur across all downstream replicas. Only then would it be safe to run the alter index to set your index invisible, at which point you could then test at the query/session level to determine which queries will be supportable without the risk of having your server(s) tank due to overload when you start getting hundreds of queries who plan has gone sideways. Ideally you would be able to do this in the opposite fashion; start with a session level guc that allows you to test in a controlled manner, and then if that works you start to roll that out across multiple sessions, and then to multiple servers, before eventually dropping the index. But that isn't the only gap; imagine if you want to test across 3 or more indexes; with this implementation, the "use invisible" flag is all or nothing, which again makes it difficult to work with; especially if you have multiple cases within the system that might make use of this feature (and people will surely run invisible indexes for weeks in production to ensure some random monthly report doesn't come along and cause trouble). I'm also skeptical of the idea that users need a way to add invisible indexes they can then test to see if they are useful because 1) this is basically how indexes already work, meaning if you add an index and it isn't useful, it doesn't get used, and 2) we have an extension (hypopg) which arguably provides this functionality without causing a bunch of i/o, and there isn't nearly the clamor to add this functionality in to core as there is for having a way to "soft drop" indexes. TBH, with this implementation, I can see people running with all indexes set invisible and force_invisible_index set to true, just to enable simple granular control when they need it. I know this thread is rather old and there doesn't seem to be full agreement on the ALTER vs GUC implementation idea, and even though I agree with the sentiment that the GUC system is little more than the "half-baked take on planner hints", the upside of GUC first implementations is that they tend to provide better usability than most grammer related implementations. Consider that any implementation which requires the use of ALTER statements (which this one does) undercuts its own usefulness because it adds significant operational risk in any attempt to use it just by the nature of ALTER leading to system-wide (including multi-server) changes, and while it feels like we often dismiss operational risk, those are exactly the folks who need this feature the most. P.S. I really do want to thank Shayon for sticking with this; I thought about saying that up front but it felt cliche, but I do think it is important to say it. Robert Treat https://xzilla.net
On Fri, 6 Jun 2025 at 08:14, Robert Treat <rob@xzilla.net> wrote: > I know this thread is rather old and there doesn't seem to be full > agreement on the ALTER vs GUC implementation idea, and even though I > agree with the sentiment that the GUC system is little more than the > "half-baked take on planner hints", the upside of GUC first > implementations is that they tend to provide better usability than > most grammer related implementations. Consider that any implementation > which requires the use of ALTER statements (which this one does) > undercuts its own usefulness because it adds significant operational > risk in any attempt to use it just by the nature of ALTER leading to > system-wide (including multi-server) changes, and while it feels like > we often dismiss operational risk, those are exactly the folks who > need this feature the most. Thanks for weighing in. In my mind, this feature is for "I'm almost 100% certain this index isn't needed, I want to make sure I'm 100% right in a way that I can quickly fix the ensuing chaos if I'm wrong". It sounds like in your mind it's "I want to run some experiments to see if this index is needed or not". I think both have merit, but I think the former gets you closer to 100% certainty, as it'll be replicated to physical replica servers. I'd personally be looking at something like pg_stat_all_indexes instead of playing around with session-level GUC setting to figure out if an index was being used or not and I'd be looking to the ALTER TABLE once I'd seen nothing changing in pg_stat_all_indexes for some time period. I mean, what am I really going to do in session-level GUC? -- Run all possible queries that the application runs and check they're still fast? If I could do that, then I could equally just not use the GUC and look at EXPLAIN on all those queries to see if the index is picked anywhere. Maybe we need to hear from a few more people who have recently faced the dilemma of removing a seemingly unused index on a critical application. For me, I have been in this situation before. The database wasn't massive. I could likely have put the index back in 10 mins or so. However, it'd still have been nice to have something else to try before trying DROP INDEX. It's quite easy to imagine your finger hovering over the [Enter] key for a while before typing that statement when the index is large. > P.S. I really do want to thank Shayon for sticking with this; +1 David
> Thanks for weighing in. +1 > In my mind, this feature is for "I'm almost 100% certain this index > isn't needed, I want to make sure I'm 100% right in a way that I can > quickly fix the ensuing chaos if I'm wrong". This is the primary use-case. A user performs an ALTER INDEX... INVISIBLE, and they monitor the workload and pg_stat_all_indexes ( on primary and hot standbys ) until they feel confident enough to fully commit to dropping the index. This is the case that many users out there want. The bonus is the locking acquired to flip the VISIBLE/INVISIBLE flag is a ShareUpdateExclusiveLock on the index, so this operation can only be blocked by VACUUM or other ALTERs, etc, > I'm also skeptical of the idea that > users need a way to add invisible indexes they can then test to see if > they are useful because 1) this is basically how indexes already work, > meaning if you add an index and it isn't useful, it doesn't get used, The GUC will be useful for experimentation or for the safer rollout of new indexes. For example, an index can be created as INVISIBLE initially, and with use_invisible_index, one can observe how the index may impact various queries before fully committing to enabling it. Also, if we allow an index to be INVISIBLE initially, we need to provide the user with this GUC; otherwise, I can’t see why a user would want to make an index INVISIBLE initially. > and 2) we have an extension (hypopg) which arguably provides this > functionality without causing a bunch of i/o, and there isn't nearly > the clamor to add this functionality in to core as there is for having > a way to "soft drop" indexes. I have not worked much with HypoPG, but from what I understand, it works only at the EXPLAIN level. It is purely an experimentation tool. However, the proposed GUC can also be used in more places, including, pg_hint_plan ( at least with the SET hint without any changes to pg_hint_plan). > > P.S. I really do want to thank Shayon for sticking with this; > +1 +1 -- Sami Imseih Amazon Web Services (AWS)
On Thu, Jun 5, 2025 at 8:16 PM Sami Imseih <samimseih@gmail.com> wrote:
> Thanks for weighing in.
+1
> In my mind, this feature is for "I'm almost 100% certain this index
> isn't needed, I want to make sure I'm 100% right in a way that I can
> quickly fix the ensuing chaos if I'm wrong".
This is the primary use-case. A user performs an ALTER INDEX...
INVISIBLE, and they monitor the workload and pg_stat_all_indexes
( on primary and hot standbys ) until they feel confident enough
to fully commit to dropping the index. This is the case that many
users out there want.
To be blunt, the users who think they want this either aren't trying to solve the actual hard problem, or they haven't thought about how this operation needs to happen that deeply. Don't get me wrong, it would be an improvement to have some type of mechanism that can move you from almost 100% to 100%, but the real problem is how do you SAFELY get to almost 100% in the first place? You need to be able to build that confidence through smaller incremental changes to your production workload, and ALTER INDEX won't help you with that. In production, you aren't watching to see what happen with pg_stat_all_indexes, because you will first be watching pg_stat_activity to see if the plans have flipped in some way that leads to an overloaded server (extra latency, poor caching effects, extra buffers usage, etc). And the replicated bit? Sadly someone launched some big DML operation so you're waiting for that to finish so the "quick rollback" can actually get to those other servers.
> I'm also skeptical of the idea that
> users need a way to add invisible indexes they can then test to see if
> they are useful because 1) this is basically how indexes already work,
> meaning if you add an index and it isn't useful, it doesn't get used,
The GUC will be useful for experimentation or for the safer rollout of
new indexes. For example, an index can be created as INVISIBLE initially,
and with use_invisible_index, one can observe how the index may impact
various queries before fully committing to enabling it. Also, if we allow an
index to be INVISIBLE initially, we need to provide the user with this
GUC; otherwise, I can’t see why a user would want to make an
index INVISIBLE initially.
Again, I can squint enough to see the use case, but the risk with indexes is FAR greater in their removal rather than in adding new ones; and to whatever degree you think slow rolling out the generally not dangerous addition of new indexes is, it's an argument that should really speak to how much more important the ability to slow roll index removal is.
> and 2) we have an extension (hypopg) which arguably provides this
> functionality without causing a bunch of i/o, and there isn't nearly
> the clamor to add this functionality in to core as there is for having
> a way to "soft drop" indexes.
I have not worked much with HypoPG, but from what I understand,
it works only at the EXPLAIN level. It is purely an experimentation tool.
However, the proposed GUC can also be used in more places,
including, pg_hint_plan ( at least with the SET hint without any changes
to pg_hint_plan).
To be clear, the reason I bring up hypopg is that if slow rolling the addition of indexes was a significant customer problem, we'd have people clamoring for better tools to do it, and by and large we don't, and I posit that by and large that's because adding new indexes is not really that dangerous.
I'm not saying there isn't any possible use case that could be solved with the above (although mind my example of people running with all indexes and the guc always enabled; I don't think thats a sceanrio that anyone thinks should be recommended, but it will be a far more common use case given this design; and btw it wont work well with pg_hint_plan because the GUC/ALTER combo doesn't play well with multiple indexes), but more importantly, if we only solve the simple cases at the expense of the hard problem, we're doing our users a disservice.
Robert Treat
On Thu, Jun 5, 2025 at 7:32 PM Robert Treat <rob@xzilla.net> wrote:
I'm not saying there isn't any possible use case that could be solved with the above (although mind my example of people running with all indexes and the guc always enabled; I don't think thats a sceanrio that anyone thinks should be recommended, but it will be a far more common use case given this design; and btw it wont work well with pg_hint_plan because the GUC/ALTER combo doesn't play well with multiple indexes), but more importantly, if we only solve the simple cases at the expense of the hard problem, we're doing our users a disservice.
So, as proposed:
Replicate-able DDL: Enables a holistic picture but is a foot-gun for the DBA in the "revert" case.
Boolean GUC: Enables some experimentation; can be used to quickly re-enable invisible indexes that are waiting for the DDL to make them visible again. Insufficiently granular for quickly exploring various options.
The granularity issue seems overcome-able:
Multi-Valued GUC: Specify explicitly which invisible indexes to make visible, eliminating the granularity problem of the boolean option. Can provide a "pg_catalog.pg_all_indexes" magic value impossible to exist in reality that would enable the "true" boolean option - false would just be an empty setting.
The foot-gun seems safe enough to offer given the benefit the feature provides.
Even without the GUC the proposed feature seems an improvement over the status quo. The boolean I'd probably leave on the table; while a bit ugly in usage for the expected experimentation the multi-valued text GUC seems reasonable (and will effectively prohibit relying on invisible indexes generally).
Are there other alternative designs this last bit of discussion is meant to promote or are people actively voting for the status quo over the addition of the index visibility attribute? Or, maybe more properly, is index replication the dividing line here and any new feature has to make that aspect optional?
If we are going to bite on the multi-valued text GUC it could just define which indexes to ignore when planning and we'd have the local-only feature done. Which leads then to just implementing this feature (with multi-valued GUC) as the option by which the DBA can choose to apply their local GUC changes across their secondaries without having to (but can if they wish) apply the GUC change to all those machines as well.
David J.
> Don't get me wrong, it would be an improvement to have some type of > mechanism that can move you from almost 100% to 100%, but the real > problem is how do you SAFELY get to almost 100% in the first place? This big use case is precisely the "almost 100% to 100%" confidence problem. Usually, users have done their homework, they've analyzed workloads, tuned queries and maybe created a better index. Now, they see some indexes that are unused or underused. In the current state, the only option is to drop the index. But if that turns out to be a mistake, they have to rebuild it, which can be slow and disruptive. With this feature, If making the index invisible causes problems, they can quickly make it visible again without needing to rebuild anything. Also, users coming from other databases, both commercial and open source, are already used to this kind of setup: an ALTER command for visibility, plus a parameter to control whether invisible indexes are used on a per session level. So we're not inventing something new here; we're following a well-known and useful pattern that makes life easier, especially for users migrating to Postgres. I am still trying to understand. Do you think the ALTER command is not useful? or, do you think the GUC is all we need and it should be more granular? or maybe something different? -- Sami
On Fri, 6 Jun 2025 at 14:32, Robert Treat <rob@xzilla.net> wrote: > In production, you aren't watching to see what happen with pg_stat_all_indexes, because you will first be watching pg_stat_activityto see if the plans have flipped in some way that leads to an overloaded server (extra latency, poor cachingeffects, extra buffers usage, etc). And the replicated bit? Sadly someone launched some big DML operation so you'rewaiting for that to finish so the "quick rollback" can actually get to those other servers. I think you've misunderstood when you'd be looking at pg_stat_all_indexes. The time when you'd want to look at pg_stat_all_indexes is *before* you DROP INDEX and before you ALTER TABLE INVISIBLE the index. What you'd likely want to look for there are indexes that have the last_idx_scan set to something far in the past or set to NULL. I'm curious to know if you've ever had to drop an index out of production before? What did you think about when you'd just typed the DROP INDEX command and were contemplating your future? How long did you pause before pressing [Enter]? Can you list your proposed series of steps you'd recommend to a DBA wishing to remove an index, assuming this feature exists in core as you'd like it to? David
On Fri, Jun 6, 2025 at 8:04 PM David Rowley <dgrowleyml@gmail.com> wrote: > On Fri, 6 Jun 2025 at 14:32, Robert Treat <rob@xzilla.net> wrote: > > In production, you aren't watching to see what happen with pg_stat_all_indexes, because you will first be watching pg_stat_activityto see if the plans have flipped in some way that leads to an overloaded server (extra latency, poor cachingeffects, extra buffers usage, etc). And the replicated bit? Sadly someone launched some big DML operation so you'rewaiting for that to finish so the "quick rollback" can actually get to those other servers. > > I think you've misunderstood when you'd be looking at > pg_stat_all_indexes. The time when you'd want to look at > pg_stat_all_indexes is *before* you DROP INDEX and before you ALTER > TABLE INVISIBLE the index. What you'd likely want to look for there > are indexes that have the last_idx_scan set to something far in the > past or set to NULL. > I guess you have never heard of the TREAT method of index management? :-D - Test for duplicate indexes - Reindex bloated indexes - Eliminate unused indexes - Add missing indexes - Tune indexes for generic queries The easy part of figuring out what to change, the hard part (sometimes) is getting those changes into production safely; that's the part I am focused on. > I'm curious to know if you've ever had to drop an index out of > production before? What did you think about when you'd just typed the > DROP INDEX command and were contemplating your future? How long did > you pause before pressing [Enter]? > ROFL... Uh... yes, I have had to do it at least a few times. So, years ago I used to say things like "I wish we had a way to make indexes invisible like they do in Oracle" on the regular; but as I worked through several different implementations and their potential effects, and had more and more exposure to more demanding Postgres installations, my thinking evolved. I spoke with Sami a bit about this off-list and he walked me through some of the Oracle documentation on this (I had, at best, forgot the specifics), which I think was helpful to better understand some of the allure of the alter index/guc method for many people who are used to it (and this current version of the implementation is very Oracle like), but it also crystalized my feeling that an Oracle-style implementation would be a red herring that can keep us from a better solution. > Can you list your proposed series of steps you'd recommend to a DBA > wishing to remove an index, assuming this feature exists in core as > you'd like it to? > Well, the series of steps differs depending on the nature of the system being managed. If you are running on a single node with normal traffic and resources, you just set the GUC to include the index you want to be invisible, wait for a few days (maybe no one runs monthly reports on this system?), take a quick look at your monitoring/stats to make sure things seem copacetic, and then you drop the index and reset the GUC. But of course the people who I am most worried about are the ones who are operating on high scale, high transaction, high connection, "mission critical" systems... ie. people operating in high risk environments, where things can go very bad very fast. Where safety considerations are a critical part of every deployment. In that type of environment, the GUC-only method enables you to control changes at very precise levels, so you can do things like: - run it ad-hoc at the session level to confirm that the explain plans you get in production match your expectations. - you can stay ad-hoc at the session level and run explain analyze and confirm acceptable performance within your workload, and see what kind of buffer impact you are going to have (typically overlooked, but a potential landmine for outages, but I'll come back to this) - because we are operating at the session level, we can then add this on a per query basis at the application level, and in really high traffic scenarios, you can use canary releases and/or feature flags to ramp up those new queries into the live system. - depending on how much risk you are concerned about, you can use this session level method across queries individually, or at some point roll it up to a user/application level. And again, we can roll it out to different users at different times if you want. - at some point when you feel confident that you have covered enough angles, you set the GUC globally and let that marinate for a few more weeks as needed. And the funny thing is, at this point, once you have the guc put in globally, and it's run for some number of weeks or months and everyone is confident, you don't actually need the ALTER INDEX part any more; you can just drop the index and be done with it. Now of course if you aren't running at this kind of scale or don't have this level of risk, you can speed run this a bit and go directly to the user level or skip right to adding it globally, so the ease of use is on par with using ALTER. But in any case where you do have elevated levels of risk, this is actually less steps (and less risk) that having to use the ALTER/guc method. Earlier I mentioned the idea of monitoring buffer impact; let's talk about that. I often hear people say that you should be doing things like confirming your explain plans in development or have some type of staging system where you do these kind of "experiments", as if a test on a secondary system could really give you absolute confidence when deploying to a system that automatically updates its settings (ie pg_stats) at semi-random times with randomly sampled values; but in any case, most people will at least agree that there is no way to match up buffer usage across machines. That means if we are making production changes that might have a significant impact on buffers, we are doing something inherently dangerous. Well, dropping an index is one of those things. Imagine a scenario where you have a large index on a column and a similar partial index on the same column, which are both used in production for different queries, and therefore taking up some amount of space within the buffer pool. When you make the partial index invisible, the index is still maintained, and therefore it likely still needs to maintain pages within the buffer pool to stay updated. However, with queries now shifting to the full index, the full index may very well need to pull in additional pages into the buffer pool that it didn't need before, and this action can cause other pages from some unknown object to get evicted. If you are lucky, this all works itself and nothing bad happens, if you aren't, you may end up with a server overloaded by latency in queries that aren't even related to the indexes you're working on. (If you have a hard time seeing it with partial indexes, the same can happen with consolidating indexes with different INCLUDE statements, and certainly will be a scenario when people look to drop indexes by way of skip-scan based plans). Now, is it possible to handle this with the ALTER/guc method? Well, you can mitigate it somewhat, but ironically to do so requires pushing out the guc part of the ALTER/guc to all the places you would have pushed out the GUC-only method, and that has to have been done BEFORE running ALTER INDEX, so what does it really buy you? I suppose while we're here, let me also make some observations about how these methods differ when dealing with replica clusters. You mentioned that one of the things you liked about the ALTER/guc method is that it replicates the changes across all systems which makes it easy to revert, however I believe that thinking is flawed. For starters, any change that has to occur across the WAL stream is not something that can be relied on to happen quickly; there are too many other items that traverse that space that could end up blocking a rollback from being applied in a timely fashion. The more complex the replica cluster, the worse this is. One very common use case is to run different workloads on different nodes, with the ALTER/guc method, you are forcing users to make changes on a primary when they want to target a workload that only runs on a replica. This means I have to account for all potential workloads on all clusters before I can safely start making changes, and to the degree that the ALTER/guc gives me a safety net, that safety net is... to deploy a guc globally, one at a time, on each individual server. I feel like this email is already long, and tbh I could go on even more, but hopefully I've covered enough to help explain some of the issues that are involved here. I'm not trying to say that GUC-only is a perfect solution, but I do think it handles every use case on par with ALTER/guc, and enables some use cases ALTER/guc can't, especially for people who have to operate in risk-first environments. And I get it that some people are going to want a thing that looks very simple or is familiar to how Oracle did it, but I can't help but think this is one of those cases like how people used to always ask us to implement UPSERT because that's what MySQL had, but instead we gave them INSERT ON CONFLICT because it was the better solution to the problem they (actually) had. Robert Treat https://xzilla.net
> In that type of environment, the GUC-only method enables you to > control changes at very precise levels, so you can do things like: > - run it ad-hoc at the session level to confirm that the explain plans > you get in production match your expectations. > - you can stay ad-hoc at the session level and run explain analyze and > confirm acceptable performance within your workload, and see what kind > of buffer impact you are going to have (typically overlooked, but a > potential landmine for outages, but I'll come back to this) > - because we are operating at the session level, we can then add this > on a per query basis at the application level, and in really high > traffic scenarios, you can use canary releases and/or feature flags to > ramp up those new queries into the live system. > - depending on how much risk you are concerned about, you can use this > session level method across queries individually, or at some point > roll it up to a user/application level. And again, we can roll it out > to different users at different times if you want. > - at some point when you feel confident that you have covered enough > angles, you set the GUC globally and let that marinate for a few more > weeks as needed. Do we need this level of granular control in core, or should this be delegated to other tools in the ecosystem, like pg_hint_plan? The de facto tool for influencing planning. There is probably some work that must happen in that extension to make the use-cases above work, but it is something to consider. With that said, I am not really opposed to a multi-value GUC that takes in a list of index names, but I do have several concerns with that approach being available in core: 1. The list of indexes getting too long, and the potential performance impact of having to translate the index name to a relid to find which index to make "invisible". I don't think a list of index relids will be good from a usability perspective either. 2. A foot-gun such as adding an index name to my list, dropping the index, recreating it with the same name, and now my new index is not being used. 3. not sync'd up with the replica, so manual work is required there. That could be seen as a positive aspect of this approach as well. 4. The above points speak on the level of maintenance required for this. > You mentioned that one of the things you liked about the ALTER/guc method > is that it replicates the changes across all systems which makes it > easy to revert, however I believe that thinking is flawed. For > starters, any change that has to occur across the WAL stream is not > something that can be relied on to happen quickly; there are too many > other items that traverse that space that could end up blocking a > rollback from being applied in a timely fashion. This is not going to be unique to this feature though. Other critical DDLs will be blocked, so this is a different problem, IMO. > but it also crystalized my > feeling that an Oracle-style implementation would be a red herring > that can keep us from a better solution. Going back to this point, I still think that the ALTER option is useful after the user's confidence is near 100% and they are ready to drop the index for good, and which also gets replicated. The GUC is useful for experimentation or for users that want to do a slow rollout of dropping an index. We can discuss whether this should be a multi-value setting or a boolean in core, or if it should be delegated to an extension. Essentially, I don't think we need to choose one or the other, but perhaps we can improve upon the GUC. -- Sami
On Sun, 8 Jun 2025 at 01:35, Robert Treat <rob@xzilla.net> wrote: > > On Fri, Jun 6, 2025 at 8:04 PM David Rowley <dgrowleyml@gmail.com> wrote: > > Can you list your proposed series of steps you'd recommend to a DBA > > wishing to remove an index, assuming this feature exists in core as > > you'd like it to? > > > > Well, the series of steps differs depending on the nature of the > system being managed. If you are running on a single node with normal > traffic and resources, you just set the GUC to include the index you > want to be invisible, wait for a few days (maybe no one runs monthly > reports on this system?), take a quick look at your monitoring/stats > to make sure things seem copacetic, and then you drop the index and > reset the GUC. Thanks for explaining. What are your thoughts on cached plans? In this scenario, do you assume that waiting a few days means that connections get reset and prepared statements will have been replanned? Or do you think cached plans don't matter in this scenario? David
On Sat, Jun 7, 2025 at 9:17 PM David Rowley <dgrowleyml@gmail.com> wrote: > On Sun, 8 Jun 2025 at 01:35, Robert Treat <rob@xzilla.net> wrote: > > On Fri, Jun 6, 2025 at 8:04 PM David Rowley <dgrowleyml@gmail.com> wrote: > > > Can you list your proposed series of steps you'd recommend to a DBA > > > wishing to remove an index, assuming this feature exists in core as > > > you'd like it to? > > > > > > > Well, the series of steps differs depending on the nature of the > > system being managed. If you are running on a single node with normal > > traffic and resources, you just set the GUC to include the index you > > want to be invisible, wait for a few days (maybe no one runs monthly > > reports on this system?), take a quick look at your monitoring/stats > > to make sure things seem copacetic, and then you drop the index and > > reset the GUC. > > Thanks for explaining. > > What are your thoughts on cached plans? In this scenario, do you > assume that waiting a few days means that connections get reset and > prepared statements will have been replanned? Or do you think cached > plans don't matter in this scenario? > Heh; I did say that the GUC model wasn't perfect, so good on you for getting right to one of the more wonky parts. In practice, I actually don't think it matters as much as one might think; IME there is a sort of inverse relationship were the more sensitive you are to production changes and/or running at high scale, the more likely you are going to want to slow deploy / ramp up these changes, and doing things like adding the GUC at the session level will likely require a connection recycle anyway. Also keeping invisible indexes in place for days or weeks is likely to be a common scenario, and again we don't normally expect connections, or cached plans, to stay alive for weeks at a time. Of course you can't dismiss this; you'd definitely have to document that if they are worried about queries with cached plans the best solution would be to recycle any connections that might have existed before setting the guc in place. That may not sound ideal, but I think in practice it is no worse than the practical effects of thinking that ANALYZE will help keep your queries fast; sure it keeps your statistics up to date, but if you are running cached plans for indefinite periods of time, you wouldn't actually pick those up those statistics changes*, which means cached plans are already susceptible to degrading over time, and we are expecting people to recycle connections regularly even if we don't say it very loud. * As an aside, I once looked into implementing some kind of pg_invalidate_cached_plans() function that would send a signal to all backend to dump their plans; kind of like a global DISCARD ALL, but it always seemed scarier than just recycling connections, so I gave up on it pretty quick; maybe some would find that useful though? Robert Treat https://xzilla.net
On Mon, 9 Jun 2025 at 06:53, Robert Treat <rob@xzilla.net> wrote: > > On Sat, Jun 7, 2025 at 9:17 PM David Rowley <dgrowleyml@gmail.com> wrote: > > What are your thoughts on cached plans? In this scenario, do you > > assume that waiting a few days means that connections get reset and > > prepared statements will have been replanned? Or do you think cached > > plans don't matter in this scenario? > > > > Heh; I did say that the GUC model wasn't perfect, so good on you for > getting right to one of the more wonky parts. In practice, I actually > don't think it matters as much as one might think; IME there is a sort > of inverse relationship were the more sensitive you are to production > changes and/or running at high scale, the more likely you are going to > want to slow deploy / ramp up these changes, and doing things like > adding the GUC at the session level will likely require a connection > recycle anyway. Also keeping invisible indexes in place for days or > weeks is likely to be a common scenario, and again we don't normally > expect connections, or cached plans, to stay alive for weeks at a > time. Of course you can't dismiss this; you'd definitely have to > document that if they are worried about queries with cached plans the > best solution would be to recycle any connections that might have > existed before setting the guc in place. That may not sound ideal, but > I think in practice it is no worse than the practical effects of > thinking that ANALYZE will help keep your queries fast; sure it keeps > your statistics up to date, but if you are running cached plans for > indefinite periods of time, you wouldn't actually pick those up those > statistics changes*, which means cached plans are already susceptible > to degrading over time, and we are expecting people to recycle > connections regularly even if we don't say it very loud. I agree that it doesn't seem ideal. I feel like if we're adding a feature that we have to list a bunch of caveats in the documentation, then we're doing something wrong. BTW, the ALTER INDEX will correctly invalidate cached plans and does not suffer from the same issue. My thoughts on this are that extensions are a better place to keep solutions that work most of the time. Once you start committing quirky things to Postgres, you sentence yourself to answering the same question for possibly a few decades in the -bugs or -general mailing list. I do my best to avoid that and feel we have enough of that already, so I'm -1 on the GUC solution for this. I know there are a few other people that are for it, so feel free to listen to them instead. Personally, I'd rather see us getting query hints in core and having some method to specify a global hint to hint "not using index X". I'm not holding my breath for that one. David
On Sun, Jun 8, 2025 at 9:37 PM David Rowley <dgrowleyml@gmail.com> wrote: > On Mon, 9 Jun 2025 at 06:53, Robert Treat <rob@xzilla.net> wrote: > > On Sat, Jun 7, 2025 at 9:17 PM David Rowley <dgrowleyml@gmail.com> wrote: > > > What are your thoughts on cached plans? In this scenario, do you > > > assume that waiting a few days means that connections get reset and > > > prepared statements will have been replanned? Or do you think cached > > > plans don't matter in this scenario? > > > > > > > Heh; I did say that the GUC model wasn't perfect, so good on you for > > getting right to one of the more wonky parts. In practice, I actually > > don't think it matters as much as one might think; IME there is a sort > > of inverse relationship were the more sensitive you are to production > > changes and/or running at high scale, the more likely you are going to > > want to slow deploy / ramp up these changes, and doing things like > > adding the GUC at the session level will likely require a connection > > recycle anyway. Also keeping invisible indexes in place for days or > > weeks is likely to be a common scenario, and again we don't normally > > expect connections, or cached plans, to stay alive for weeks at a > > time. Of course you can't dismiss this; you'd definitely have to > > document that if they are worried about queries with cached plans the > > best solution would be to recycle any connections that might have > > existed before setting the guc in place. That may not sound ideal, but > > I think in practice it is no worse than the practical effects of > > thinking that ANALYZE will help keep your queries fast; sure it keeps > > your statistics up to date, but if you are running cached plans for > > indefinite periods of time, you wouldn't actually pick those up those > > statistics changes*, which means cached plans are already susceptible > > to degrading over time, and we are expecting people to recycle > > connections regularly even if we don't say it very loud. > > I agree that it doesn't seem ideal. I feel like if we're adding a > feature that we have to list a bunch of caveats in the documentation, > then we're doing something wrong. BTW, the ALTER INDEX will correctly > invalidate cached plans and does not suffer from the same issue. > While the ALTER INDEX provides a simple way to do cache invalidation, for practical application you still have most of the same issues and need to jump through many of the same guc hoops with force_invisible_index, which is a large part of why this is such a red herring. > My thoughts on this are that extensions are a better place to keep > solutions that work most of the time. Once you start committing quirky > things to Postgres, you sentence yourself to answering the same > question for possibly a few decades in the -bugs or -general mailing > list. I do my best to avoid that and feel we have enough of that > already, so I'm -1 on the GUC solution for this. I know there are a > few other people that are for it, so feel free to listen to them > instead. > I hear you wrt explaining quirky things to users; you wouldn't believe the level of confusion I got when I started explaining "plan_cache_mode" to users when v12 rolled out. I'd guess the vast majority of users have still never heard of this guc and have no idea that Postgres behaves like this, which is another reason why I'd rather not optimize for a very small segment of the user base at the expense of a much larger set of users. And to be clear, this isn't a case of a GUC solution vs an ALTER solution. There is a reason that the proposed ALTER solution contains a GUC as well, and why Oracle had to make use of a session flag in their implementation. You are going to have a guc either way, which means you are going to have to explain a bunch of these different caveats in BOTH solutions. It's just that in one of the solutions, you are further entangling the usage with DDL changes (and the additional caveats that come with that). Robert Treat https://xzilla.net
On Tue, 10 Jun 2025 at 04:40, Robert Treat <rob@xzilla.net> wrote: > You are going to have a guc either way, which > means you are going to have to explain a bunch of these different > caveats in BOTH solutions. It's just that in one of the solutions, you > are further entangling the usage with DDL changes (and the additional > caveats that come with that). IMO, having this GUC to force the use of invisible indexes is quite strange. In my view, it detracts from the guarantees that you're meant to get from disabling indexes. What if some connection has use_invisible_index set to true? The DBA might assume all is well after having seen nobody complain and then drop the index. The user might then complain. David