Thread: Should we allow ALTER OPERATOR CLASS to ADD/DROP operators and procedures?

Hi,

At the moment we allow ALTER OPERATOR FAMILY to add/drop almost any part
of the family, while ALTER OPERATOR CLASS allows changing only the name,
owner and schema.

The ALTER OPERATOR FAMILY docs [1] explain that this is because the
elements included in operator family are loose/optional, and not
required for any specific index, and can be dropped at any time, while
opclass defines stuff "required" by individual indexes ... and dropping
any of it would require dropping the opclass, which requires dropping
the indexes referencing it.

I wonder if maybe we could relax this a bit, in some way ...

The reason is a recent bug PostGIS bug [2], and how it got fixed. The
gist of it is that PostgGIS defined a couple BRIN opclasses, those
opclasses were always broken, but the parallel builds in PG17 made it
trivial to hit it and cause a crash. Which could happen even before
PG17, but it was much less likely (requires concurrent updates, but
those tables are usually static).

The thing is, the index was fine - even if it crashed, the index should
not have been corrupted or anything. But fixing the bug requires
changing the opclasses - adding a missing (required) MERGE procedure,
and replacing a custom add_value proc with the built-in one. The latter
could have been done by keeping the custom function and just calling the
built-in variant, but there was no way to add the missing MERGE proc.

At that point the only "supported" fix is to drop the opclass with all
the indexes, and then recreate everything. But that's pretty painful,
especially when you know the indexes are OK and the tables are massive
(which is often the case for tables with GIS data, and that's also why
people use BRIN, after all).

Which is why PostGIS devs chose a different path - updating the catalogs
to modify the opclass definition [3], using this script [4].

That's a bit ... terrifying. I assume the reason why we don't allow
ALTER OPERATOR CLASS to change these fields is we're concerned people
might break existing indexes / crash the server. But there's probably
many "supported" ways leading to the same thing (I could change the C
function backing the opclass), and if people instead opt to update
catalogs directly, did it even protect against anything?

In any case, I'd expect only superuser to be allowed to do this, just
like for ALTER OPERATOR FAMILY. So why is this different?


Or am I missing some fundamental reason why we don't want to allow ALTER
OPERATOR CLASS to change some of the parts?


regards


[1] https://www.postgresql.org/docs/current/sql-alteropfamily.html

[2] https://lists.osgeo.org/pipermail/postgis-devel/2025-January/030457.html

[3] https://lists.osgeo.org/pipermail/postgis-devel/2025-January/030469.html

[4]
https://git.osgeo.org/gitea/postgis/postgis/src/branch/stable-3.5/postgis/postgis_after_upgrade.sql#L281

-- 
Tomas Vondra



On 2/7/25 01:15, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> Which is why PostGIS devs chose a different path - updating the catalogs
>> to modify the opclass definition [3], using this script [4].
> 
>> That's a bit ... terrifying.
> 
> Indeed.  Did it not occur to them to use ALTER OPERATOR FAMILY?
> 

No, it did not. But that may be my fault - I've reported the bug, and
I've been discussing possible fixes with them, and it didn't occur to me
either :-(

It seems quite non-intuitive to fix a bug in OPERATOR CLASS by doing
ALTER OPERATOR FAMILY.

> The end result of doing so is that the added opfamily members
> wouldn't have been bound tightly to the opclass, but that does
> not seem like a huge deal in context.  They'd still have worked
> fine AFAIK.
> 

I haven't tried but I assume you're suggesting adding the MERGE proc to
the opfamily would be enough for the opclass to pick it up?

However, I'm not sure this would be a complete fix, because what would
they do with the existing ADD_VALUE proc? Sure, they could just tweak
the C code to "redirect" the call to the built-in one. But that requires
the MERGE proc being defined, so it'd crash if someone installs the new
version .so but doesn't do ALTER EXTENSION ... UPDATE.


regards

-- 
Tomas Vondra




Tomas Vondra <tomas@vondra.me> writes:
> On 2/7/25 01:15, Tom Lane wrote:
>> The end result of doing so is that the added opfamily members
>> wouldn't have been bound tightly to the opclass, but that does
>> not seem like a huge deal in context.  They'd still have worked
>> fine AFAIK.

> I haven't tried but I assume you're suggesting adding the MERGE proc to
> the opfamily would be enough for the opclass to pick it up?

Yes.  AFAIR (didn't check the code) the only difference between
pg_amop/pg_amproc entries made in CREATE OPERATOR CLASS and those
made in ALTER OPERATOR FAMILY is that the former have a dependency
on the opclass while the latter have a dependency on the opfamily.
That won't matter unless you do DROP OPERATOR CLASS without
dropping the rest of the family.

> However, I'm not sure this would be a complete fix, because what would
> they do with the existing ADD_VALUE proc? Sure, they could just tweak
> the C code to "redirect" the call to the built-in one.

Seems better to use CREATE OR REPLACE FUNCTION to do that redirection,
so that it happens during ALTER EXTENSION ... UPDATE rather than
whenever the updated shlib is installed.

            regards, tom lane