Thread: Re: BRIN index creation on geometry column causes crash

Re: BRIN index creation on geometry column causes crash

From
Tomas Vondra
Date:
On 2/12/25 11:55, Álvaro Herrera wrote:
> On 2025-Jan-03, Tomas Vondra wrote:
> 
>> Reproduced, but I belive this is actually a long-standing PostGIS bug.
>> The exact place where it crashes is here:
>>
>> /* Finally, merge B to A. */
>> finfo = inclusion_get_procinfo(bdesc, attno, PROCNUM_MERGE);
>> Assert(finfo != NULL);
>> result = FunctionCall2Coll(finfo, colloid, ...);
>>
>> With asserts, it fails on the assert, i.e. finfo is NULL. This means the
>> opfamily is missing the "MERGE" support procedure, which is however
>> required (from the very beginning of BRIN in 9.5).
> 
> Hmm, I completely agree that this is a PostGIS bug, but at the same time
> I think it's wrong for Postgres not to verify that the support function
> exists, so that we can throw an error instead of crashing.  I think the
> most expedient way to implement this is to add a "missing_ok" argument
> to the various "foo_get_procinfo" functions, and make to pass it true
> only in cases where the usage of the return value already admit that it
> could be a null pointer (meaning it's really an optional support proc).
> 
> This should avoid further crashes with invalidly defined opclasses; see
> attached POC patch.  The user is still in a bad place, because for
> instance if it's autovacuum that tries to do the range merging, then
> vacuuming will fail but people may not notice for months unless they're
> paying careful attention to the server logs.  Crashing is good (tm)
> because now a bunch of people are aware that the opclass is broken.
> 

Yeah, I'm not opposed to doing this. I don't know what's the right level
of paranoia when dealing with user-defined stuff.

But if I get this right, this is merely a mitigation turning a crash
into ERROR, it would not help with detecting the opclass brokenness any
earlier. Until the parallel builds it was very rare to need the MERGE.

> My next question is why didn't brinvalidate detect this problem.  Is it
> just that nobody ran 'amvalidate' on the opclass, or is it that
> brinvalidate does not know that it needs to require the merge proc?  But
> I'll leave that for later.
> 

I don't know. Either no one tried running it, or maybe it wasn't quite
clear which of the reported issues are serious and which can be ignored.
I certainly wasn't quite clear to me when I tried running it, because
all the messages are INFO, it doesn't say which functions are missing,
and at least some of it seemed to be due to cross-type operators.

See this for details:

https://lists.osgeo.org/pipermail/postgis-devel/2025-January/030457.html

> 
> Now, in writing this patch I noticed that both brin_bloom.c and
> brin_minmax_multi.c seem to have cargo-culted the idea that support
> procs can be optional; both of them have a single support proc
> (PROCNUM_HASH and PROCNUM_DISTANCE, respectively) which is not optional;
> so the "extra_proc_missing" stuff seems unnecessary for them.  And
> therefore the new missing_ok support I introduce in this patch would
> also be unnecessary.  As far as I can tell, there are no ABI concerns
> about removing opaque->extra_proc_missing[] from both these opclass
> scaffolds (and reintroducing it later, if we determine that we do need
> optional support procs in those scaffolds after all.)
> 

Yeah, copy-paste is my basic coding tool, so it's quite possible the
bloom and minmax_multi could do stuff in a simpler way.

I can't think of a reason why would removing extra_proc_missing[] break
ABI, but I don't quite see the need to change this in backbranches. And
if we want to stop doing unnecessary stuff, maybe it'd be enough to just
remove the code, but leave extra_proc_missing in the struct.


regards

-- 
Tomas Vondra




Re: BRIN index creation on geometry column causes crash

From
Álvaro Herrera
Date:
On 2025-Feb-12, Tomas Vondra wrote:

> On 2/12/25 11:55, Álvaro Herrera wrote:

> > Hmm, I completely agree that this is a PostGIS bug, but at the same time
> > I think it's wrong for Postgres not to verify that the support function
> > exists, so that we can throw an error instead of crashing.

> Yeah, I'm not opposed to doing this. I don't know what's the right level
> of paranoia when dealing with user-defined stuff.

Yeah, I think "make sure we don't crash" is better.  I've pushed this to
all live branches now.

> But if I get this right, this is merely a mitigation turning a crash
> into ERROR, it would not help with detecting the opclass brokenness any
> earlier. Until the parallel builds it was very rare to need the MERGE.

True.  The final fix belongs in the PostGIS opclass definition.  With
this patch, you can still have an index defined with a bogus opclass,
but we won't crash anymore.

> > My next question is why didn't brinvalidate detect this problem.
> 
> I don't know. Either no one tried running it, or maybe it wasn't quite
> clear which of the reported issues are serious and which can be ignored.
> I certainly wasn't quite clear to me when I tried running it, because
> all the messages are INFO, it doesn't say which functions are missing,
> and at least some of it seemed to be due to cross-type operators.
> 
> See this for details:
> 
> https://lists.osgeo.org/pipermail/postgis-devel/2025-January/030457.html

Ugh.  Yeah, this report does not appear to be good enough -- it's far
too noisy and at the same time not detailed enough.  And maybe we're
missing some detailed guidance on what to do about each of those errors.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/