On 7/31/25 20:35, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> On 7/31/25 19:33, Tom Lane wrote:
>>> ... It is certainly broken on
>>> 32-bit machines where the Datum result of numeric_float8 will
>>> be a pointer, so that we will convert the numeric pointer value
>>> to a float and return that, yielding a totally-garbage distance
>>> value. But I think it's broken on 64-bit machines too, because
>>> we'll be interpreting the output of numeric_float8 as a uintptr_t
>>> and applying some unwanted conversion to make that a float.
>
>> Agreed it's a bug on 32-bit machines. Not sure about 64-bits.
>
> Yeah, I'm not 100% sure about that. It's certainly doing something
> unexpected, but we might accidentally end up with relatively-sane
> relative distance comparisons anyway. (I assume the outputs will
> only be compared to other outputs of the same function, right?)
Yes. The index accumulates values, sorts them, calculates distance
between the points, and them "merges" the closest ones. So it only
compares results of the same function.
> I have a vague recollection that the IEEE float format was chosen with
> an eye to making comparisons cheap, ie not too much different from
> integer comparisons. So the sort order might be about the same
> even after incorrectly reinterpreting the bit-pattern as an int.
> NaNs probably mess that up, but they would anyway.
>
>> Actually, no - it should not cause "broken" indexes, as in "giving
>> incorrect results". The distance functions determine in what order we
>> merge points into ranges, and if the distances are bogus then we can
>> build a summary that is less efficient.
>
> Got it. So it might be worth reindexing such indexes after the
> fix, but it's not strictly necessary.
>
Exactly.
regards
--
Tomas Vondra