Re: [PATCH] Proposal to Enable/Disable Index using ALTER INDEX - Mailing list pgsql-hackers
From | Robert Treat |
---|---|
Subject | Re: [PATCH] Proposal to Enable/Disable Index using ALTER INDEX |
Date | |
Msg-id | CAJSLCQ0JbqEk+iofrTpKaOgUhWr1t1zRt0kV=yNyVGo_xMGtFg@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Proposal to Enable/Disable Index using ALTER INDEX (Shayon Mukherjee <shayonj@gmail.com>) |
List | pgsql-hackers |
On Tue, Jul 15, 2025 at 8:19 AM Shayon Mukherjee <shayonj@gmail.com> wrote: > On Jun 23, 2025, at 10:14 AM, Robert Treat <rob@xzilla.net> wrote: > Glad to hear you are still interested, slightly disheartened by the > general lack of concern around operational safety in this thread. I > actually think what you have done covers a lot of the ground for > multiple implementations, so I'm optimistic we can get something for > 19. > > Just for my own learning and mental model - what would be a good way to understand the change that wasn’t operationallysafe? > Generally speaking, the two biggest factors for operational safety are the ability to slowly ramp up changes in a controlled fashion, and conversely the ability to quickly reverse them. On its surface, the ALTER feature looks like it passes both of these tests because (in simple cases) it appears better than drop/create index process alone; indeed, the ability to "turn off" an index before dropping it feels like a slower roll out than dropping it, and the ability to "turn it back on" seems like a much quicker reversal than having to recreate the index. Our problem is that this only gives the appearance of safety without having provided any significant improvement in system safety, especially in more complex and/or demanding setups. With regards to roll out specifically, the ALTER method is no safer than drop index because both use DDL which means they are both open to blocking or being blocked by conflicting queries, which increase operational risk within the system. Similarly, the nature of the DDL change also requires that all sessions be impacted everywhere at once; there is no way to slowly roll the change to some segment of the database or some specific workload within the system. So it fails the first test. With regards to the ability to quickly reverse the change, it does beat the need to rebuild an index, but that only helps in a very small subset of the typical use cases for this feature; ie where you are concerned that your server might get "swamped" by poorly performing queries while the index rebuilds. But that's a pretty low level version of the problem; on very busy systems and/or system with delicately balanced buffer caching, even a small pause measured in seconds could be enough to bring a system down, and again our use of DDL opens us up to delays from conflicting queries, untimely wraparound vacuums, concurrent WAL traffic in the case of wanting to do this across replica trees (which you can't not do). So we generally fail the second test for a large portion of the use cases involved. And maybe that would be ok if we didn't have a way to solve this problem that doesn't fail these tests, but we do, which is through using a GUC. > I was thinking about this some more over the weekend and it does seem > like you can't get away from doing something with DDL; even though it > is the wrong mental model... like when your AC is running but you > don't think it is cool enough, so you turn it down farther, as if it > would blow colder air... but that isn't how AC actually work... it > seems you can't eliminate the desire for this mental model entirely. > Which to be clear, I am not against, it's just a bad tool for the hard > cases, but not in every case. Anyway, if I were picking this up, I > would separate out the two ideas; as I laid out in my email to David, > the GUC solution can stand on it's own without the DDL implementation, > and I would do that first, and then add a simplified DDL > implementation after the fact. Of course it could be done the other > way around, but I think you're more likely to land on the correct GUC > implementation if it isn't mixed up with DDL, and the best way to > assure that is by not having the DDL for the initial patch. Just my > .02, but happy to help spec it out further. > > > I am happy to split this into two, however I think starting with GUC first may not achieve a lot of cases that David andI were talking about earlier in the thread, perhaps? Where, if you want quick feedback without needing to make application/ session / connection level changes (i.e GUC) then you can quickly do it via the ALTER statement. Happy to redothe patch and just keep ALTER for v1 accordingly, if it still makes sense. > I think it is much more the other way around; the GUC handles far more of the potential use cases that you might want to use the ALTER for, and the ALTER clearly falls short of what the GUC can do. (Side note, remember you can modify the GUC at the database level. And if you really want to get ambitious, GUCs can be extended to work through ALTER TABLE). > Would folks have any preference between the two approaches? > Contrary to how it sounds, I'm not actually opposed to having both :-) But I am very concerned that an implementation which does ALTER first sets a sort of anchoring bias that would affect how the GUC feature gets implemented, which is how I suspect Oracle ended up with their crappy implementation. I don't think this happens in reverse; ie. the GUC first implementation handles most of the heavy lifting so the ALTER only needs to cover the suite spot of the use cases that it can actually help with. Robert Treat https://xzilla.net
pgsql-hackers by date: