On Thu, 30 Sept 2021 at 11:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > This allows us to memoize any join expression, not just equality
> > expressions.
>
> I am clearly failing to get through to you. Do I need to build
> an example?
Taking your -0.0 / +0.0 float example, If I understand correctly due
to -0.0 and +0.0 hashing to the same value and being classed as equal,
we're really only guaranteed to get the same rows if the join
condition uses the float value (in this example) directly.
If for example there was something like a function we could pass that
float value through that was able to distinguish -0.0 from +0.0, then
that could cause issues as the return value of that function could be
anything and have completely different join partners on the other side
of the join.
A function something like:
create or replace function text_sign(f float) returns text as
$$
begin
if f::text like '-%' then
return 'neg';
else
return 'pos';
end if;
end;
$$ language plpgsql;
would be enough to do it. If the join condition was text_sign(t1.f) =
t2.col and the cache key was t1.f rather than text_sign(t1.f)
On Thu, 30 Sept 2021 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I'm now thinking you weren't that far wrong to be looking at
> hashability of the top-level qual operator. What is missing is
> that you have to restrict candidate cache keys to be the *direct*
> arguments of such an operator. Looking any further down in the
> expression introduces untenable assumptions.
I think I also follow you now with the above. The problem is that if
the join operator is able to distinguish something that the equality
operator and hash function are not then we have the same problem.
Restricting the join operator to hash equality ensures that the join
condition cannot distinguish anything that we cannot distinguish in
the cache hash table.
David