Re: Bug in brin_minmax_multi_distance_numeric() - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Bug in brin_minmax_multi_distance_numeric()
Date
Msg-id d48b2a78-7ce0-46d8-b096-2ff2331aa819@eisentraut.org
Whole thread Raw
In response to Re: Bug in brin_minmax_multi_distance_numeric()  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Bug in brin_minmax_multi_distance_numeric()
List pgsql-hackers
On 01.08.25 19:17, Tomas Vondra wrote:
> 
> 
> On 7/31/25 20:35, Tom Lane wrote:
>> Tomas Vondra <tomas@vondra.me> writes:
>>> On 7/31/25 19:33, Tom Lane wrote:
>>>> ... It is certainly broken on
>>>> 32-bit machines where the Datum result of numeric_float8 will
>>>> be a pointer, so that we will convert the numeric pointer value
>>>> to a float and return that, yielding a totally-garbage distance
>>>> value.  But I think it's broken on 64-bit machines too, because
>>>> we'll be interpreting the output of numeric_float8 as a uintptr_t
>>>> and applying some unwanted conversion to make that a float.
>>
>>> Agreed it's a bug on 32-bit machines. Not sure about 64-bits.
>>
>> Yeah, I'm not 100% sure about that.  It's certainly doing something
>> unexpected, but we might accidentally end up with relatively-sane
>> relative distance comparisons anyway.  (I assume the outputs will
>> only be compared to other outputs of the same function, right?)
> 
> 
> Yes. The index accumulates values, sorts them, calculates distance
> between the points, and them "merges" the closest ones. So it only
> compares results of the same function.
> 
>> I have a vague recollection that the IEEE float format was chosen with
>> an eye to making comparisons cheap, ie not too much different from
>> integer comparisons.  So the sort order might be about the same
>> even after incorrectly reinterpreting the bit-pattern as an int.
>> NaNs probably mess that up, but they would anyway.
>>
>>> Actually, no - it should not cause "broken" indexes, as in "giving
>>> incorrect results". The distance functions determine in what order we
>>> merge points into ranges, and if the distances are bogus then we can
>>> build a summary that is less efficient.
>>
>> Got it.  So it might be worth reindexing such indexes after the
>> fix, but it's not strictly necessary.

Do we want to make a separate commit for this issue that can be 
backpatched and have some user-facing information attached to it?



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: pgaio_io_get_id() type (was Re: Datum as struct)
Next
From: Nikhil Kumar Veldanda
Date:
Subject: Re: Dead code with short varlenas in toast_save_datum()