On Wed, 22 Mar 2023 at 19:18, David Rowley <dgrowleyml@gmail.com> wrote:
> The reason this only seems to happen when Memoize is in binary mode is
> because we use datum_image_hash() which will hash the full 64 bytes of
> the name Datum. When in logical mode and in hash joins, we'll use
> hashname(), which calls strlen() on the input name Datum and only
> hashes the bytes as if the name were a string:
A relevant comment is in StoreIndexTuple():
/*
* Note: we must use the tupdesc supplied by the AM in index_deform_tuple,
* not the slot's tupdesc, in case the latter has different datatypes
* (this happens for btree name_ops in particular). They'd better have
* the same number of columns though, as well as being datatype-compatible
* which is something we can't so easily check.
*/
I'm just not really certain if we can say name is
"datatype-compatible" with cstring or not. It seems that namehash,
namecmp, nameout etc are all coded so that they can accept cstrings as
inputs. It's just not going to be safe for anything that wants to
access all of the NAMEDATALEN bytes.
Another bug I've found relating to this is:
CREATE TABLE t (n name);
INSERT INTO t VALUES('name'), ('name');
CREATE INDEX t_idx ON t(n);
ANALYZE t;
set enable_seqscan=0;
set enable_indexscan=0;
postgres=# select record_image_eq(row('name'::name), row(t.n)) from t
order by n;
record_image_eq
-----------------
f <- incorrect result
f <- incorrect result
(2 rows)
set enable_indexscan=1;
set enable_indexonlyscan=0;
postgres=# select record_image_eq(row('name'::name), row(t.n)) from t
order by n;
record_image_eq
-----------------
t <- correct result
t <- correct result
(2 rows)
It just seems to me that this whole reading one type from an index and
assuming it's some other different time is going to be a source of
bugs, even if we were to fix the Memoize and record_image_eq() bugs.
I tried and failed to find problems with the usage of datum_image_eq()
used in ri_KeysEqual(). This is because to get an UPDATE to perform an
Index Only Scan would require that the index had the ctid column. It
does not seem impossible that we might want to special case that,
since the index does contain the heap ctid, after all. If we did
that, we'd subtly break foreign key checks when performing an
UPDATE/DELETE on the referenced table when the foreign key column is
of type "name".
Right now, I don't have any bright ideas on how we might fix this.
The only thing I can think of right now is to adjust StoreIndexTuple()
to check for mismatching data types and if it finds one, convert with
the index type's output function and import with the heap type's input
function. Sounds like painful overhead not to mention how to manage
the memory allocations so it does not leak any copied Datum values.
David