Thread: Allow deleting enumerated values from an existing enumerated data type

Allow deleting enumerated values from an existing enumerated data type

From
Данил Столповских
Date:
Greetings, everyone!
I would like to offer my patch on the problem of removing values from enums

It adds support for expression ALTER TYPE <enum_name> DROP VALUE <value_name>

Added:
1. expression in grammar
2. function to drop enum values
3. regression tests
4. documentation
Attachment

Re: Allow deleting enumerated values from an existing enumerated data type

From
Vik Fearing
Date:
On 9/28/23 14:13, Данил Столповских wrote:
> Greetings, everyone!
> I would like to offer my patch on the problem of removing values from enums
> 
> It adds support for expression ALTER TYPE <enum_name> DROP VALUE
> <value_name>
> 
> Added:
> 1. expression in grammar
> 2. function to drop enum values
> 3. regression tests
> 4. documentation

Thanks for this patch that a lot of people want.

However, it does not seem to address the issue of how to handle the 
dropped value being in the high key of an index.  Until we solve that 
problem (and maybe others), this kind of patch is insufficient to add 
the feature.
-- 
Vik Fearing




Re: Allow deleting enumerated values from an existing enumerated data type

From
Tom Lane
Date:
=?UTF-8?B?0JTQsNC90LjQuyDQodGC0L7Qu9C/0L7QstGB0LrQuNGF?= <danil.stolpovskikh@gmail.com> writes:
> I would like to offer my patch on the problem of removing values from enums
> It adds support for expression ALTER TYPE <enum_name> DROP VALUE
> <value_name>

This does not fix any of the hard problems that caused us not to
have such a feature to begin with.  Notably, what happens to
stored data of the enum type if it is a now-deleted value?

            regards, tom lane



Re: Allow deleting enumerated values from an existing enumerated data type

From
Andrew Dunstan
Date:
On 2023-09-28 Th 10:28, Tom Lane wrote:
> =?UTF-8?B?0JTQsNC90LjQuyDQodGC0L7Qu9C/0L7QstGB0LrQuNGF?= <danil.stolpovskikh@gmail.com> writes:
>> I would like to offer my patch on the problem of removing values from enums
>> It adds support for expression ALTER TYPE <enum_name> DROP VALUE
>> <value_name>
> This does not fix any of the hard problems that caused us not to
> have such a feature to begin with.  Notably, what happens to
> stored data of the enum type if it is a now-deleted value?
>
>             


I wonder if we could have a boolean flag in pg_enum, indicating that 
setting an enum to that value was forbidden. That wouldn't delete the 
value but it wouldn't show up in enum_range and friends. We'd have to 
teach pg_dump and pg_upgrade to deal with it, but that shouldn't be too 
hard.


Perhaps the command could be something like


ALTER TYPE enum_name DISABLE value;


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Allow deleting enumerated values from an existing enumerated data type

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I wonder if we could have a boolean flag in pg_enum, indicating that 
> setting an enum to that value was forbidden.

Yeah, but that still offers no coherent solution to the problem of
what happens if there's a table that already contains such a value.
It doesn't seem terribly useful to forbid new entries if you can't
get rid of old ones.

Admittedly, a DISABLE flag would at least offer a chance at a
race-condition-free scan to verify that no such values remain
in tables.  But as somebody already mentioned upthread, that
wouldn't guarantee that the value doesn't appear in non-leaf
index pages.  So basically you could never get rid of the pg_enum
row, short of a full dump and restore.

We went through all these points years ago when the enum feature
was first developed, as I recall.  Nobody thought that the ability
to remove an enum value was worth the amount of complexity it'd
entail.

            regards, tom lane



Re: Allow deleting enumerated values from an existing enumerated data type

From
Vik Fearing
Date:
On 9/28/23 20:46, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I wonder if we could have a boolean flag in pg_enum, indicating that
>> setting an enum to that value was forbidden.
> 
> Yeah, but that still offers no coherent solution to the problem of
> what happens if there's a table that already contains such a value.
> It doesn't seem terribly useful to forbid new entries if you can't
> get rid of old ones.
> 
> Admittedly, a DISABLE flag would at least offer a chance at a
> race-condition-free scan to verify that no such values remain
> in tables.  But as somebody already mentioned upthread, that
> wouldn't guarantee that the value doesn't appear in non-leaf
> index pages.  So basically you could never get rid of the pg_enum
> row, short of a full dump and restore.
> 
> We went through all these points years ago when the enum feature
> was first developed, as I recall.  Nobody thought that the ability
> to remove an enum value was worth the amount of complexity it'd
> entail.

This issue comes up regularly (although far from often).  Do we want to 
put some comments right where would-be implementors would be sure to see it?

Attached is an example of what I mean.  Documentation is intentionally 
omitted.
-- 
Vik Fearing

Attachment

Re: Allow deleting enumerated values from an existing enumerated data type

From
Tom Lane
Date:
Vik Fearing <vik@postgresfriends.org> writes:
> On 9/28/23 20:46, Tom Lane wrote:
>> We went through all these points years ago when the enum feature
>> was first developed, as I recall.  Nobody thought that the ability
>> to remove an enum value was worth the amount of complexity it'd
>> entail.

> This issue comes up regularly (although far from often).  Do we want to 
> put some comments right where would-be implementors would be sure to see it?

Perhaps.  I'd be kind of inclined to leave the "yet" out of "not yet
implemented" in the error message, as that wording sounds like we just
haven't got round to it.

            regards, tom lane



Re: Allow deleting enumerated values from an existing enumerated data type

From
Andrew Dunstan
Date:
On 2023-09-28 Th 14:46, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I wonder if we could have a boolean flag in pg_enum, indicating that
>> setting an enum to that value was forbidden.
> Yeah, but that still offers no coherent solution to the problem of
> what happens if there's a table that already contains such a value.
> It doesn't seem terribly useful to forbid new entries if you can't
> get rid of old ones.
>
> Admittedly, a DISABLE flag would at least offer a chance at a
> race-condition-free scan to verify that no such values remain
> in tables.  But as somebody already mentioned upthread, that
> wouldn't guarantee that the value doesn't appear in non-leaf
> index pages.  So basically you could never get rid of the pg_enum
> row, short of a full dump and restore.


or a reindex, I think, although getting the timing right would be messy. 
I agree the non-leaf index pages are rather pesky in dealing with this.

I guess the alternative would be to create a new enum with the 
to-be-deleted value missing, and then alter the column type to the new 
enum type. For massive tables that would be painful.


>
> We went through all these points years ago when the enum feature
> was first developed, as I recall.  Nobody thought that the ability
> to remove an enum value was worth the amount of complexity it'd
> entail.
>
>     


That's quite true, and I accept my part in this history. But I'm not 
sure we were correct back then.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Allow deleting enumerated values from an existing enumerated data type

From
Vik Fearing
Date:
On 9/29/23 03:17, Tom Lane wrote:
> Vik Fearing <vik@postgresfriends.org> writes:
>> On 9/28/23 20:46, Tom Lane wrote:
>>> We went through all these points years ago when the enum feature
>>> was first developed, as I recall.  Nobody thought that the ability
>>> to remove an enum value was worth the amount of complexity it'd
>>> entail.
> 
>> This issue comes up regularly (although far from often).  Do we want to
>> put some comments right where would-be implementors would be sure to see it?
> 
> Perhaps.  I'd be kind of inclined to leave the "yet" out of "not yet
> implemented" in the error message, as that wording sounds like we just
> haven't got round to it.

I see your point, but should we be dissuading people who might want to 
work on solving those problems?  I intentionally did not document that 
this syntax exists so the only people seeing the message are those who 
just try it, and those wanting to write a patch like Danil did.

No one except you has said anything about this patch.  I think it would 
be good to commit it, wordsmithing aside.
-- 
Vik Fearing




Re: Allow deleting enumerated values from an existing enumerated data type

From
Dagfinn Ilmari Mannsåker
Date:
Vik Fearing <vik@postgresfriends.org> writes:

> On 9/29/23 03:17, Tom Lane wrote:
>> Vik Fearing <vik@postgresfriends.org> writes:
>>> On 9/28/23 20:46, Tom Lane wrote:
>>>> We went through all these points years ago when the enum feature
>>>> was first developed, as I recall.  Nobody thought that the ability
>>>> to remove an enum value was worth the amount of complexity it'd
>>>> entail.
>> 
>>> This issue comes up regularly (although far from often).  Do we want to
>>> put some comments right where would-be implementors would be sure to see it?
>> Perhaps.  I'd be kind of inclined to leave the "yet" out of "not yet
>> implemented" in the error message, as that wording sounds like we just
>> haven't got round to it.
>
> I see your point, but should we be dissuading people who might want to
> work on solving those problems?  I intentionally did not document that 
> this syntax exists so the only people seeing the message are those who
> just try it, and those wanting to write a patch like Danil did.
>
> No one except you has said anything about this patch.  I think it would
> be good to commit it, wordsmithing aside.

FWIW I'm +1 on this patch, and with Tom on dropping the "yet".  To me it
makes it sound like we intend to implement it soon (fsvo).

- ilmari



Andrew Dunstan <andrew@dunslane.net> writes:
> On 2023-09-28 Th 14:46, Tom Lane wrote:
>> We went through all these points years ago when the enum feature
>> was first developed, as I recall.  Nobody thought that the ability
>> to remove an enum value was worth the amount of complexity it'd
>> entail.

> That's quite true, and I accept my part in this history. But I'm not 
> sure we were correct back then.

I think it was the right decision at the time, given that the
alternative was to not add the enum feature at all.  The question
is whether we're now prepared to do additional work to support DROP
VALUE.  But the tradeoff still looks pretty grim, because the
problems haven't gotten any easier.

I've been trying to convince myself that there'd be some value in
your idea about a DISABLE flag, but I feel like there's something
missing there.  The easiest implementation would be to have
enum_in() reject disabled values, while still allowing enum_out()
to print them.  But that doesn't seem to lead to nice results:

* You couldn't do, say,
    SELECT * FROM my_table WHERE enum_col = 'disabled_value'
to look for rows that you need to clean up.  I guess this'd work:
    SELECT * FROM my_table WHERE enum_col::text = 'disabled_value'
but it's un-obvious and could not use an index on enum_col.

* If any of the disabled values remain, dump/restore would fail.
Maybe you'd want that to be sure you got rid of them, but it sounds
like a foot-gun.  ("What do you mean, our only backup doesn't
restore?")  Probably people would wish for two different behaviors:
either don't list disabled values at all in the dumped CREATE TYPE,
or do list them but disable them only after loading data.  The latter
approach will still have problems in data-only restores, but there are
comparable hazards with things like foreign keys.  (pg_upgrade would
need still a third behavior, perhaps.)

On the whole this is still a long way from a clean easy-to-use DROP
facility, and it adds a lot of complexity of its own for pg_dump.
So I'm not sure we want to build it.

            regards, tom lane



Re: Allow deleting enumerated values from an existing enumerated data type

From
Vik Fearing
Date:
On 10/2/23 20:07, Dagfinn Ilmari Mannsåker wrote:
> Vik Fearing <vik@postgresfriends.org> writes:
> 
>> No one except you has said anything about this patch.  I think it would
>> be good to commit it, wordsmithing aside.
> 
> FWIW I'm +1 on this patch,

Thanks.

> and with Tom on dropping the "yet".  To me it
> makes it sound like we intend to implement it soon (fsvo).
I am not fundamentally opposed to it, nor to any other wordsmithing the 
committer (probably Tom) wants to do.  The main point of the patch is to 
list at least some of the problems that need to be solved in a correct 
implementation.
-- 
Vik Fearing




Vik Fearing <vik@postgresfriends.org> writes:
> On 10/2/23 20:07, Dagfinn Ilmari Mannsåker wrote:
>> FWIW I'm +1 on this patch,

> Thanks.

>> and with Tom on dropping the "yet".  To me it
>> makes it sound like we intend to implement it soon (fsvo).

> I am not fundamentally opposed to it, nor to any other wordsmithing the 
> committer (probably Tom) wants to do.  The main point of the patch is to 
> list at least some of the problems that need to be solved in a correct 
> implementation.

Pushed with a bit more work on the text.

I left out the regression test, as it seems like it'd add test cycles
to little purpose.  It won't do anything to improve the odds that
someone finds this text.

            regards, tom lane



Re: Allow deleting enumerated values from an existing enumerated data type

From
Vik Fearing
Date:
On 10/3/23 17:44, Tom Lane wrote:
> Vik Fearing <vik@postgresfriends.org> writes:
>> On 10/2/23 20:07, Dagfinn Ilmari Mannsåker wrote:
>>> FWIW I'm +1 on this patch,
> 
>> Thanks.
> 
>>> and with Tom on dropping the "yet".  To me it
>>> makes it sound like we intend to implement it soon (fsvo).
> 
>> I am not fundamentally opposed to it, nor to any other wordsmithing the
>> committer (probably Tom) wants to do.  The main point of the patch is to
>> list at least some of the problems that need to be solved in a correct
>> implementation.
> 
> Pushed with a bit more work on the text.
> 
> I left out the regression test, as it seems like it'd add test cycles
> to little purpose.  It won't do anything to improve the odds that
> someone finds this text.

Thanks!
-- 
Vik Fearing




Re: Allow deleting enumerated values from an existing enumerated data type

From
Matthias van de Meent
Date:
On Tue, 3 Oct 2023 at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andrew Dunstan <andrew@dunslane.net> writes:
> > On 2023-09-28 Th 14:46, Tom Lane wrote:
> >> We went through all these points years ago when the enum feature
> >> was first developed, as I recall.  Nobody thought that the ability
> >> to remove an enum value was worth the amount of complexity it'd
> >> entail.
>
> > That's quite true, and I accept my part in this history. But I'm not
> > sure we were correct back then.
>
> I think it was the right decision at the time, given that the
> alternative was to not add the enum feature at all.  The question
> is whether we're now prepared to do additional work to support DROP
> VALUE.  But the tradeoff still looks pretty grim, because the
> problems haven't gotten any easier.
>
> I've been trying to convince myself that there'd be some value in
> your idea about a DISABLE flag, but I feel like there's something
> missing there.  The easiest implementation would be to have
> enum_in() reject disabled values, while still allowing enum_out()
> to print them.  But that doesn't seem to lead to nice results:
>
> [...]
>
> On the whole this is still a long way from a clean easy-to-use DROP
> facility, and it adds a lot of complexity of its own for pg_dump.
> So I'm not sure we want to build it.

I don't quite get what the hard problem is that we haven't already
solved for other systems:
We already can add additional constraints to domains (e.g. VALUE::int
<> 4), which (according to docs) scan existing data columns for
violations. We already drop columns without rewriting the table to
remove the column's data, and reject new data insertions for those
still-in-the-catalogs-but-inaccessible columns.

So, if a user wants to drop an enum value, why couldn't we "just" use
the DOMAIN facilities and 1.) add a constraint WHERE value NOT IN
(deleted_values), and after validation of that constraint 2.) mark the
enum value as deleted like we do with table column's pg_attribute
entries?

The only real issue that I can think of is making sure that concurrent
backends don't modify this data, but that shouldn't be very different
from the other locks we already have to take in e.g. ALTER TYPE ...
DROP ATTRIBUTE.

Kind regards,

Matthias van de Meent



Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> I don't quite get what the hard problem is that we haven't already
> solved for other systems:
> We already can add additional constraints to domains (e.g. VALUE::int
> <> 4), which (according to docs) scan existing data columns for
> violations.

That's "solved" only for rather small values of "solved".  While the
code does try to look for violations of the new constraint, there is
no interlock against race conditions (ie, concurrent insertions of
a conflicting value).  It doesn't check temp tables belonging to
other backends, because it can't.  And IIRC it doesn't look for
instances in metadata, such as stored views.

The reason we've considered that Good Enough(TM) for domain
constraints is that if something does sneak through those loopholes,
nothing terribly surprising happens.  You've got a value there that
shouldn't be there, but that's mostly your own fault, and the
system continues to behave sanely.  Also you don't have any problem
introspecting what you've got, because such values will still print
normally.  Plus, we can't really guarantee that users won't get
into such a state in other ways, for example if their constraint
isn't really immutable.

The problem with dropping an enum value is that surprising things
might very well happen, because removal of the pg_enum row risks
comparisons failing if they involve the dropped value.  Thus for
example you might find yourself with a broken index that fails
all insertion and search attempts, even if you'd carefully removed
every user-visible instance of the doomed value: an instance of
it high up in the index tree will break most index accesses.
Even for values in user-visible places like views, the fact that
enum_out will fail doesn't make it any easier to figure out what
is wrong.

We might be able to get to a place where the surprise factor is
low enough to tolerate, but the domain-constraint precedent isn't
good enough for that IMO.

Andrew's idea of DISABLE rather than full DROP is one way of
ameliorating these problems: comparisons would still work, and
we can still print a value that perhaps shouldn't have been there.
But it's not without other problems.

> We already drop columns without rewriting the table to
> remove the column's data, and reject new data insertions for those
> still-in-the-catalogs-but-inaccessible columns.

Those cases don't seem to have a lot of connection to the enum problem.

> The only real issue that I can think of is making sure that concurrent
> backends don't modify this data, but that shouldn't be very different
> from the other locks we already have to take in e.g. ALTER TYPE ...
> DROP ATTRIBUTE.

I'd bet a good deal of money that those cases aren't too bulletproof.
We do not lock types simply because somebody has a value of the type
in flight somewhere in a query, and the cost of doing so would be
quite discouraging I fear.  On the whole, I'd rather accept the
idea that the DROP might not be completely watertight; but then
we have to work out the details of coping with orphaned values
in an acceptable way.

            regards, tom lane