On Thu, 11 Nov 2021 at 18:08, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 30 Sept 2021 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Actually, the more I look at this the more unhappy I get, because
> > it's becoming clear that you have made unfounded semantic
> > assumptions. The hash functions generally only promise that they
> > will distinguish values that are distinguishable by the associated
> > equality operator. We have plenty of data types in which that does
> > not map to bitwise equality ... you need not look further than
> > float8 for an example.
>
> I think this part might be best solved by allowing Memoize to work in
> a binary mode. We already have datum_image_eq() for performing a
> binary comparison on a Datum. We'll also need to supplement that with
> a function that generates a hash value based on the binary value too.
Since I really don't want to stop Memoize from working with LATERAL
joined function calls, I see no other way other than to make use of a
binary key'd cache for cases where we can't be certain that it's safe
to make work in non-binary mode.
I've had thoughts about if we should just make it work in binary mode
all the time, but my thoughts are that that's not exactly a great idea
since it could have a negative effect on cache hits due to there being
the possibility that some types such as case insensitive text where
the number of variations in the binary representation might be vast.
The reason this could be bad is that the estimated cache hit ratio is
calculated by looking at n_distinct, which is obviously not looking
for distinctions in the binary representation. So in binary mode, we
may get a lower cache hit ratio than we might think we'll get, even
with accurate statistics. I'd like to minimise those times by only
using binary mode when we can't be certain that logical mode is safe.
The patch does add new fields to the Memoize plan node type, the
MemoizeState executor node and also MemoizePath. The new fields do
fit inside the padding of the existing structs. I've also obviously
had to modify the read/write/copy functions for Memoize to add the new
field there too. My understanding is that this should be ok since
those are only used for parallel query to send plans to the working
and to deserialise it on the worker side. There should never be any
version mismatches there.
If anyone wants to chime in about my proposed patch for this, then
please do so soon. I'm planning to look at this in my Tuesday morning
(UTC+13).
David