Thread: Allow deleting enumerated values from an existing enumerated data type
Greetings, everyone!
I would like to offer my patch on the problem of removing values from enums
It adds support for expression ALTER TYPE <enum_name> DROP VALUE <value_name>
Added:
1. expression in grammar
2. function to drop enum values
3. regression tests
4. documentation
It adds support for expression ALTER TYPE <enum_name> DROP VALUE <value_name>
Added:
1. expression in grammar
2. function to drop enum values
3. regression tests
4. documentation
Attachment
On 9/28/23 14:13, Данил Столповских wrote: > Greetings, everyone! > I would like to offer my patch on the problem of removing values from enums > > It adds support for expression ALTER TYPE <enum_name> DROP VALUE > <value_name> > > Added: > 1. expression in grammar > 2. function to drop enum values > 3. regression tests > 4. documentation Thanks for this patch that a lot of people want. However, it does not seem to address the issue of how to handle the dropped value being in the high key of an index. Until we solve that problem (and maybe others), this kind of patch is insufficient to add the feature. -- Vik Fearing
=?UTF-8?B?0JTQsNC90LjQuyDQodGC0L7Qu9C/0L7QstGB0LrQuNGF?= <danil.stolpovskikh@gmail.com> writes: > I would like to offer my patch on the problem of removing values from enums > It adds support for expression ALTER TYPE <enum_name> DROP VALUE > <value_name> This does not fix any of the hard problems that caused us not to have such a feature to begin with. Notably, what happens to stored data of the enum type if it is a now-deleted value? regards, tom lane
Re: Allow deleting enumerated values from an existing enumerated data type
From
Andrew Dunstan
Date:
On 2023-09-28 Th 10:28, Tom Lane wrote: > =?UTF-8?B?0JTQsNC90LjQuyDQodGC0L7Qu9C/0L7QstGB0LrQuNGF?= <danil.stolpovskikh@gmail.com> writes: >> I would like to offer my patch on the problem of removing values from enums >> It adds support for expression ALTER TYPE <enum_name> DROP VALUE >> <value_name> > This does not fix any of the hard problems that caused us not to > have such a feature to begin with. Notably, what happens to > stored data of the enum type if it is a now-deleted value? > > I wonder if we could have a boolean flag in pg_enum, indicating that setting an enum to that value was forbidden. That wouldn't delete the value but it wouldn't show up in enum_range and friends. We'd have to teach pg_dump and pg_upgrade to deal with it, but that shouldn't be too hard. Perhaps the command could be something like ALTER TYPE enum_name DISABLE value; cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > I wonder if we could have a boolean flag in pg_enum, indicating that > setting an enum to that value was forbidden. Yeah, but that still offers no coherent solution to the problem of what happens if there's a table that already contains such a value. It doesn't seem terribly useful to forbid new entries if you can't get rid of old ones. Admittedly, a DISABLE flag would at least offer a chance at a race-condition-free scan to verify that no such values remain in tables. But as somebody already mentioned upthread, that wouldn't guarantee that the value doesn't appear in non-leaf index pages. So basically you could never get rid of the pg_enum row, short of a full dump and restore. We went through all these points years ago when the enum feature was first developed, as I recall. Nobody thought that the ability to remove an enum value was worth the amount of complexity it'd entail. regards, tom lane
On 9/28/23 20:46, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> I wonder if we could have a boolean flag in pg_enum, indicating that >> setting an enum to that value was forbidden. > > Yeah, but that still offers no coherent solution to the problem of > what happens if there's a table that already contains such a value. > It doesn't seem terribly useful to forbid new entries if you can't > get rid of old ones. > > Admittedly, a DISABLE flag would at least offer a chance at a > race-condition-free scan to verify that no such values remain > in tables. But as somebody already mentioned upthread, that > wouldn't guarantee that the value doesn't appear in non-leaf > index pages. So basically you could never get rid of the pg_enum > row, short of a full dump and restore. > > We went through all these points years ago when the enum feature > was first developed, as I recall. Nobody thought that the ability > to remove an enum value was worth the amount of complexity it'd > entail. This issue comes up regularly (although far from often). Do we want to put some comments right where would-be implementors would be sure to see it? Attached is an example of what I mean. Documentation is intentionally omitted. -- Vik Fearing
Attachment
Vik Fearing <vik@postgresfriends.org> writes: > On 9/28/23 20:46, Tom Lane wrote: >> We went through all these points years ago when the enum feature >> was first developed, as I recall. Nobody thought that the ability >> to remove an enum value was worth the amount of complexity it'd >> entail. > This issue comes up regularly (although far from often). Do we want to > put some comments right where would-be implementors would be sure to see it? Perhaps. I'd be kind of inclined to leave the "yet" out of "not yet implemented" in the error message, as that wording sounds like we just haven't got round to it. regards, tom lane
Re: Allow deleting enumerated values from an existing enumerated data type
From
Andrew Dunstan
Date:
On 2023-09-28 Th 14:46, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> I wonder if we could have a boolean flag in pg_enum, indicating that >> setting an enum to that value was forbidden. > Yeah, but that still offers no coherent solution to the problem of > what happens if there's a table that already contains such a value. > It doesn't seem terribly useful to forbid new entries if you can't > get rid of old ones. > > Admittedly, a DISABLE flag would at least offer a chance at a > race-condition-free scan to verify that no such values remain > in tables. But as somebody already mentioned upthread, that > wouldn't guarantee that the value doesn't appear in non-leaf > index pages. So basically you could never get rid of the pg_enum > row, short of a full dump and restore. or a reindex, I think, although getting the timing right would be messy. I agree the non-leaf index pages are rather pesky in dealing with this. I guess the alternative would be to create a new enum with the to-be-deleted value missing, and then alter the column type to the new enum type. For massive tables that would be painful. > > We went through all these points years ago when the enum feature > was first developed, as I recall. Nobody thought that the ability > to remove an enum value was worth the amount of complexity it'd > entail. > > That's quite true, and I accept my part in this history. But I'm not sure we were correct back then. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 9/29/23 03:17, Tom Lane wrote: > Vik Fearing <vik@postgresfriends.org> writes: >> On 9/28/23 20:46, Tom Lane wrote: >>> We went through all these points years ago when the enum feature >>> was first developed, as I recall. Nobody thought that the ability >>> to remove an enum value was worth the amount of complexity it'd >>> entail. > >> This issue comes up regularly (although far from often). Do we want to >> put some comments right where would-be implementors would be sure to see it? > > Perhaps. I'd be kind of inclined to leave the "yet" out of "not yet > implemented" in the error message, as that wording sounds like we just > haven't got round to it. I see your point, but should we be dissuading people who might want to work on solving those problems? I intentionally did not document that this syntax exists so the only people seeing the message are those who just try it, and those wanting to write a patch like Danil did. No one except you has said anything about this patch. I think it would be good to commit it, wordsmithing aside. -- Vik Fearing
Re: Allow deleting enumerated values from an existing enumerated data type
From
Dagfinn Ilmari Mannsåker
Date:
Vik Fearing <vik@postgresfriends.org> writes: > On 9/29/23 03:17, Tom Lane wrote: >> Vik Fearing <vik@postgresfriends.org> writes: >>> On 9/28/23 20:46, Tom Lane wrote: >>>> We went through all these points years ago when the enum feature >>>> was first developed, as I recall. Nobody thought that the ability >>>> to remove an enum value was worth the amount of complexity it'd >>>> entail. >> >>> This issue comes up regularly (although far from often). Do we want to >>> put some comments right where would-be implementors would be sure to see it? >> Perhaps. I'd be kind of inclined to leave the "yet" out of "not yet >> implemented" in the error message, as that wording sounds like we just >> haven't got round to it. > > I see your point, but should we be dissuading people who might want to > work on solving those problems? I intentionally did not document that > this syntax exists so the only people seeing the message are those who > just try it, and those wanting to write a patch like Danil did. > > No one except you has said anything about this patch. I think it would > be good to commit it, wordsmithing aside. FWIW I'm +1 on this patch, and with Tom on dropping the "yet". To me it makes it sound like we intend to implement it soon (fsvo). - ilmari
Andrew Dunstan <andrew@dunslane.net> writes: > On 2023-09-28 Th 14:46, Tom Lane wrote: >> We went through all these points years ago when the enum feature >> was first developed, as I recall. Nobody thought that the ability >> to remove an enum value was worth the amount of complexity it'd >> entail. > That's quite true, and I accept my part in this history. But I'm not > sure we were correct back then. I think it was the right decision at the time, given that the alternative was to not add the enum feature at all. The question is whether we're now prepared to do additional work to support DROP VALUE. But the tradeoff still looks pretty grim, because the problems haven't gotten any easier. I've been trying to convince myself that there'd be some value in your idea about a DISABLE flag, but I feel like there's something missing there. The easiest implementation would be to have enum_in() reject disabled values, while still allowing enum_out() to print them. But that doesn't seem to lead to nice results: * You couldn't do, say, SELECT * FROM my_table WHERE enum_col = 'disabled_value' to look for rows that you need to clean up. I guess this'd work: SELECT * FROM my_table WHERE enum_col::text = 'disabled_value' but it's un-obvious and could not use an index on enum_col. * If any of the disabled values remain, dump/restore would fail. Maybe you'd want that to be sure you got rid of them, but it sounds like a foot-gun. ("What do you mean, our only backup doesn't restore?") Probably people would wish for two different behaviors: either don't list disabled values at all in the dumped CREATE TYPE, or do list them but disable them only after loading data. The latter approach will still have problems in data-only restores, but there are comparable hazards with things like foreign keys. (pg_upgrade would need still a third behavior, perhaps.) On the whole this is still a long way from a clean easy-to-use DROP facility, and it adds a lot of complexity of its own for pg_dump. So I'm not sure we want to build it. regards, tom lane
On 10/2/23 20:07, Dagfinn Ilmari Mannsåker wrote: > Vik Fearing <vik@postgresfriends.org> writes: > >> No one except you has said anything about this patch. I think it would >> be good to commit it, wordsmithing aside. > > FWIW I'm +1 on this patch, Thanks. > and with Tom on dropping the "yet". To me it > makes it sound like we intend to implement it soon (fsvo). I am not fundamentally opposed to it, nor to any other wordsmithing the committer (probably Tom) wants to do. The main point of the patch is to list at least some of the problems that need to be solved in a correct implementation. -- Vik Fearing
Vik Fearing <vik@postgresfriends.org> writes: > On 10/2/23 20:07, Dagfinn Ilmari Mannsåker wrote: >> FWIW I'm +1 on this patch, > Thanks. >> and with Tom on dropping the "yet". To me it >> makes it sound like we intend to implement it soon (fsvo). > I am not fundamentally opposed to it, nor to any other wordsmithing the > committer (probably Tom) wants to do. The main point of the patch is to > list at least some of the problems that need to be solved in a correct > implementation. Pushed with a bit more work on the text. I left out the regression test, as it seems like it'd add test cycles to little purpose. It won't do anything to improve the odds that someone finds this text. regards, tom lane
On 10/3/23 17:44, Tom Lane wrote: > Vik Fearing <vik@postgresfriends.org> writes: >> On 10/2/23 20:07, Dagfinn Ilmari Mannsåker wrote: >>> FWIW I'm +1 on this patch, > >> Thanks. > >>> and with Tom on dropping the "yet". To me it >>> makes it sound like we intend to implement it soon (fsvo). > >> I am not fundamentally opposed to it, nor to any other wordsmithing the >> committer (probably Tom) wants to do. The main point of the patch is to >> list at least some of the problems that need to be solved in a correct >> implementation. > > Pushed with a bit more work on the text. > > I left out the regression test, as it seems like it'd add test cycles > to little purpose. It won't do anything to improve the odds that > someone finds this text. Thanks! -- Vik Fearing
Re: Allow deleting enumerated values from an existing enumerated data type
From
Matthias van de Meent
Date:
On Tue, 3 Oct 2023 at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andrew Dunstan <andrew@dunslane.net> writes: > > On 2023-09-28 Th 14:46, Tom Lane wrote: > >> We went through all these points years ago when the enum feature > >> was first developed, as I recall. Nobody thought that the ability > >> to remove an enum value was worth the amount of complexity it'd > >> entail. > > > That's quite true, and I accept my part in this history. But I'm not > > sure we were correct back then. > > I think it was the right decision at the time, given that the > alternative was to not add the enum feature at all. The question > is whether we're now prepared to do additional work to support DROP > VALUE. But the tradeoff still looks pretty grim, because the > problems haven't gotten any easier. > > I've been trying to convince myself that there'd be some value in > your idea about a DISABLE flag, but I feel like there's something > missing there. The easiest implementation would be to have > enum_in() reject disabled values, while still allowing enum_out() > to print them. But that doesn't seem to lead to nice results: > > [...] > > On the whole this is still a long way from a clean easy-to-use DROP > facility, and it adds a lot of complexity of its own for pg_dump. > So I'm not sure we want to build it. I don't quite get what the hard problem is that we haven't already solved for other systems: We already can add additional constraints to domains (e.g. VALUE::int <> 4), which (according to docs) scan existing data columns for violations. We already drop columns without rewriting the table to remove the column's data, and reject new data insertions for those still-in-the-catalogs-but-inaccessible columns. So, if a user wants to drop an enum value, why couldn't we "just" use the DOMAIN facilities and 1.) add a constraint WHERE value NOT IN (deleted_values), and after validation of that constraint 2.) mark the enum value as deleted like we do with table column's pg_attribute entries? The only real issue that I can think of is making sure that concurrent backends don't modify this data, but that shouldn't be very different from the other locks we already have to take in e.g. ALTER TYPE ... DROP ATTRIBUTE. Kind regards, Matthias van de Meent
Matthias van de Meent <boekewurm+postgres@gmail.com> writes: > I don't quite get what the hard problem is that we haven't already > solved for other systems: > We already can add additional constraints to domains (e.g. VALUE::int > <> 4), which (according to docs) scan existing data columns for > violations. That's "solved" only for rather small values of "solved". While the code does try to look for violations of the new constraint, there is no interlock against race conditions (ie, concurrent insertions of a conflicting value). It doesn't check temp tables belonging to other backends, because it can't. And IIRC it doesn't look for instances in metadata, such as stored views. The reason we've considered that Good Enough(TM) for domain constraints is that if something does sneak through those loopholes, nothing terribly surprising happens. You've got a value there that shouldn't be there, but that's mostly your own fault, and the system continues to behave sanely. Also you don't have any problem introspecting what you've got, because such values will still print normally. Plus, we can't really guarantee that users won't get into such a state in other ways, for example if their constraint isn't really immutable. The problem with dropping an enum value is that surprising things might very well happen, because removal of the pg_enum row risks comparisons failing if they involve the dropped value. Thus for example you might find yourself with a broken index that fails all insertion and search attempts, even if you'd carefully removed every user-visible instance of the doomed value: an instance of it high up in the index tree will break most index accesses. Even for values in user-visible places like views, the fact that enum_out will fail doesn't make it any easier to figure out what is wrong. We might be able to get to a place where the surprise factor is low enough to tolerate, but the domain-constraint precedent isn't good enough for that IMO. Andrew's idea of DISABLE rather than full DROP is one way of ameliorating these problems: comparisons would still work, and we can still print a value that perhaps shouldn't have been there. But it's not without other problems. > We already drop columns without rewriting the table to > remove the column's data, and reject new data insertions for those > still-in-the-catalogs-but-inaccessible columns. Those cases don't seem to have a lot of connection to the enum problem. > The only real issue that I can think of is making sure that concurrent > backends don't modify this data, but that shouldn't be very different > from the other locks we already have to take in e.g. ALTER TYPE ... > DROP ATTRIBUTE. I'd bet a good deal of money that those cases aren't too bulletproof. We do not lock types simply because somebody has a value of the type in flight somewhere in a query, and the cost of doing so would be quite discouraging I fear. On the whole, I'd rather accept the idea that the DROP might not be completely watertight; but then we have to work out the details of coping with orphaned values in an acceptable way. regards, tom lane