On 01.08.25 19:17, Tomas Vondra wrote:
>
>
> On 7/31/25 20:35, Tom Lane wrote:
>> Tomas Vondra <tomas@vondra.me> writes:
>>> On 7/31/25 19:33, Tom Lane wrote:
>>>> ... It is certainly broken on
>>>> 32-bit machines where the Datum result of numeric_float8 will
>>>> be a pointer, so that we will convert the numeric pointer value
>>>> to a float and return that, yielding a totally-garbage distance
>>>> value. But I think it's broken on 64-bit machines too, because
>>>> we'll be interpreting the output of numeric_float8 as a uintptr_t
>>>> and applying some unwanted conversion to make that a float.
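
To make the 64-bit case concrete, here is a minimal standalone sketch of what I understand the suspected mistake to be. It assumes a build where float8 is passed by value, so the Datum carries the raw bits of the double. The names (SketchDatum, sketch_*) are hypothetical stand-ins, not the real PostgreSQL macros, and the code is not taken from the BRIN source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit pass-by-value Datum (assumption). */
typedef uint64_t SketchDatum;

/* Store a double in the Datum by copying its bit pattern. */
static SketchDatum
sketch_float8_get_datum(double d)
{
	SketchDatum x;

	memcpy(&x, &d, sizeof(x));
	return x;
}

/* Correct decoding: reinterpret the stored bits as a double. */
static double
sketch_datum_get_float8(SketchDatum x)
{
	double		d;

	memcpy(&d, &x, sizeof(d));
	return d;
}

/*
 * Suspected mistake: treat the Datum as an integer and let C convert
 * that integer *value* to double, instead of reinterpreting the bits.
 */
static double
sketch_buggy_distance(SketchDatum x)
{
	return (double) x;
}
```

With this sketch, a true distance of 1.5 round-trips correctly through sketch_datum_get_float8(), but sketch_buggy_distance() turns it into a value of roughly 4.6e18, because the bit pattern 0x3FF8000000000000 is converted as an integer.
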
>>
>>> Agreed it's a bug on 32-bit machines. Not sure about 64-bit.
>>
>> Yeah, I'm not 100% sure about that. It's certainly doing something
>> unexpected, but we might accidentally end up with relatively-sane
>> relative distance comparisons anyway. (I assume the outputs will
>> only be compared to other outputs of the same function, right?)
>
>
> Yes. The index accumulates values, sorts them, calculates distance
> between the points, and then "merges" the closest ones. So it only
> compares results of the same function.
>
>> I have a vague recollection that the IEEE float format was chosen with
>> an eye to making comparisons cheap, ie not too much different from
>> integer comparisons. So the sort order might be about the same
>> even after incorrectly reinterpreting the bit-pattern as an int.
>> NaNs probably mess that up, but they would anyway.
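
A small standalone sketch of that recollection, for what it's worth. It assumes IEEE 754 binary64 doubles; the sketch_* helpers are hypothetical and only there for illustration. It checks whether comparing the raw bit patterns as unsigned integers agrees with comparing the doubles, and whether two different bit patterns can end up as the same double after an integer-to-double conversion.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a double as an unsigned 64-bit integer. */
static uint64_t
sketch_float8_bits(double d)
{
	uint64_t	u;

	memcpy(&u, &d, sizeof(u));
	return u;
}

/*
 * True if "a < b" on the doubles gives the same answer as "<" on their
 * bit patterns compared as unsigned integers.
 */
static bool
sketch_bits_order_matches(double a, double b)
{
	return (a < b) == (sketch_float8_bits(a) < sketch_float8_bits(b));
}

/*
 * True if two different integers become the same double once converted.
 * The volatile stores force rounding to double precision.
 */
static bool
sketch_collapses_after_conversion(uint64_t x, uint64_t y)
{
	volatile double dx = (double) x;
	volatile double dy = (double) y;

	return x != y && dx == dy;
}
```

For non-negative inputs such as 0.5 and 2.0 the orders agree, which is the case that matters if distances are never negative. For negative inputs such as -2.0 and -1.0 they do not, since the sign bit makes the unsigned pattern large. Adjacent doubles near 1.0 also collapse to the same value after the conversion, so some close distances would compare as ties.
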
>>
>>> Actually, no - it should not cause "broken" indexes, as in "giving
>>> incorrect results". The distance functions determine in what order we
>>> merge points into ranges, and if the distances are bogus then we can
>>> build a summary that is less efficient.
>>
>> Got it. So it might be worth reindexing such indexes after the
>> fix, but it's not strictly necessary.

Do we want to make this fix a separate commit, so that it can be
backpatched and have some user-facing information attached to it?