Should we allow ALTER OPERATOR CLASS to ADD/DROP operators and procedures? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Should we allow ALTER OPERATOR CLASS to ADD/DROP operators and procedures?
Date
Msg-id a4f8c172-4850-49ff-a98f-566d16664c54@vondra.me
Whole thread Raw
List pgsql-hackers
Hi,

At the moment we allow ALTER OPERATOR FAMILY to add/drop almost any part
of the family, while ALTER OPERATOR CLASS allows changing only the name,
owner and schema.

The ALTER OPERATOR FAMILY docs [1] explain that this is because the
elements included in operator family are loose/optional, and not
required for any specific index, and can be dropped at any time, while
opclass defines stuff "required" by individual indexes ... and dropping
any of it would require dropping the opclass, which requires dropping
the indexes referencing it.

I wonder if maybe we could relax this a bit, in some way ...

The reason is a recent bug PostGIS bug [2], and how it got fixed. The
gist of it is that PostgGIS defined a couple BRIN opclasses, those
opclasses were always broken, but the parallel builds in PG17 made it
trivial to hit it and cause a crash. Which could happen even before
PG17, but it was much less likely (requires concurrent updates, but
those tables are usually static).

The thing is, the index was fine - even if it crashed, the index should
not have been corrupted or anything. But fixing the bug requires
changing the opclasses - adding a missing (required) MERGE procedure,
and replacing a custom add_value proc with the built-in one. The latter
could have been done by keeping the custom function and just calling the
built-in variant, but there was no way to add the missing MERGE proc.

At that point the only "supported" fix is to drop the opclass with all
the indexes, and then recreate everything. But that's pretty painful,
especially when you know the indexes are OK and the tables are massive
(which is often the case for tables with GIS data, and that's also why
people use BRIN, after all).

Which is why PostGIS devs chose a different path - updating the catalogs
to modify the opclass definition [3], using this script [4].

That's a bit ... terrifying. I assume the reason why we don't allow
ALTER OPERATOR CLASS to change these fields is we're concerned people
might break existing indexes / crash the server. But there's probably
many "supported" ways leading to the same thing (I could change the C
function backing the opclass), and if people instead opt to update
catalogs directly, did it even protect against anything?

In any case, I'd expect only superuser to be allowed to do this, just
like for ALTER OPERATOR FAMILY. So why is this different?


Or am I missing some fundamental reason why we don't want to allow ALTER
OPERATOR CLASS to change some of the parts?


regards


[1] https://www.postgresql.org/docs/current/sql-alteropfamily.html

[2] https://lists.osgeo.org/pipermail/postgis-devel/2025-January/030457.html

[3] https://lists.osgeo.org/pipermail/postgis-devel/2025-January/030469.html

[4]
https://git.osgeo.org/gitea/postgis/postgis/src/branch/stable-3.5/postgis/postgis_after_upgrade.sql#L281

-- 
Tomas Vondra



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: connection establishment versus parallel workers
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.