Thread: clarify equalTupleDescs()

clarify equalTupleDescs()

From
Peter Eisentraut
Date:
In a recent patch thread it was discussed[0] which fields should be 
compared by equalTupleDescs() and whether it is ok to remove a field 
from tuple descriptors and how that should affect their equality 
(attstattarget in that case).

After analyzing all the callers, I have noticed that there are two 
classes of callers of equalTupleDescs():

The first want to compare what I call row-type equality, which means 
they want to check specifically for equal number of attributes, and the 
same attribute names, types, and typmods for each attribute.  Most 
callers actually want that behavior.  The remaining callers just want to 
compare the tuple descriptors as they are, they don't care why the 
fields are in there, they just want to compare all of them.

In the attached patch, I add a new function equalRowTypes() that is 
effectively a subset of equalTupleDescs() and switch most callers to that.

The purpose of this patch is to make the semantics less uncertain. 
Questions like the one in [0] about attstattarget now have a clear 
answer for both functions.  I think this would be useful to have, as we 
are thinking about more changes in pg_attribute and tuple descriptors.


[0]: 
https://www.postgresql.org/message-id/202401101316.k4s3fomwjx52@alvherre.pgsql
Attachment

Re: clarify equalTupleDescs()

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> The first want to compare what I call row-type equality, which means 
> they want to check specifically for equal number of attributes, and the 
> same attribute names, types, and typmods for each attribute.  Most 
> callers actually want that behavior.

Should compare attcollation too, no?

+1 for the general idea, but it seems like "row type equality"
might still be a slightly fuzzy concept.

            regards, tom lane



Re: clarify equalTupleDescs()

From
jian he
Date:
On Tue, Feb 6, 2024 at 8:59 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> In a recent patch thread it was discussed[0] which fields should be
> compared by equalTupleDescs() and whether it is ok to remove a field
> from tuple descriptors and how that should affect their equality
> (attstattarget in that case).
>
> After analyzing all the callers, I have noticed that there are two
> classes of callers of equalTupleDescs():
>
> The first want to compare what I call row-type equality, which means
> they want to check specifically for equal number of attributes, and the
> same attribute names, types, and typmods for each attribute.  Most
> callers actually want that behavior.  The remaining callers just want to
> compare the tuple descriptors as they are, they don't care why the
> fields are in there, they just want to compare all of them.
>
> In the attached patch, I add a new function equalRowTypes() that is
> effectively a subset of equalTupleDescs() and switch most callers to that.
>
> The purpose of this patch is to make the semantics less uncertain.
> Questions like the one in [0] about attstattarget now have a clear
> answer for both functions.  I think this would be useful to have, as we
> are thinking about more changes in pg_attribute and tuple descriptors.
>
>
> [0]:
> https://www.postgresql.org/message-id/202401101316.k4s3fomwjx52@alvherre.pgsql

function name record_type_typmod_hash imply that
hashRowType should also hash atttypmod field?

also:

bool
equalRowTypes(TupleDesc tupdesc1, TupleDesc tupdesc2)
{
if (tupdesc1->natts != tupdesc2->natts)
return false;
if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
return false;

for (int i = 0; i < tupdesc1->natts; i++)
{
Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
Form_pg_attribute attr2 = TupleDescAttr(tupdesc2, i);

if (strcmp(NameStr(attr1->attname), NameStr(attr2->attname)) != 0)
return false;
if (attr1->atttypid != attr2->atttypid)
return false;
if (attr1->atttypmod != attr2->atttypmod)
return false;
}

return true;
}

/*
 * hashRowType
 *
 * If two tuple descriptors would be considered equal by equalRowTypes()
 * then their hash value will be equal according to this function.
 */
uint32
hashRowType(TupleDesc desc)
{
uint32 s;
int i;

s = hash_combine(0, hash_uint32(desc->natts));
s = hash_combine(s, hash_uint32(desc->tdtypeid));
for (i = 0; i < desc->natts; ++i)
s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));

return s;
}

from the hashRowType comment, should we also hash attname and atttypmod?



Re: clarify equalTupleDescs()

From
Peter Eisentraut
Date:
On 07.02.24 04:06, jian he wrote:
> /*
>   * hashRowType
>   *
>   * If two tuple descriptors would be considered equal by equalRowTypes()
>   * then their hash value will be equal according to this function.
>   */
> uint32
> hashRowType(TupleDesc desc)
> {
> uint32 s;
> int i;
> 
> s = hash_combine(0, hash_uint32(desc->natts));
> s = hash_combine(s, hash_uint32(desc->tdtypeid));
> for (i = 0; i < desc->natts; ++i)
> s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));
> 
> return s;
> }
> 
> from the hashRowType comment, should we also hash attname and atttypmod?

In principle, hashRowType() could process all the fields that 
equalRowTypes() does.  But since it's only a hash function, it doesn't 
have to be perfect.  (This is also the case for the current 
hashTupleDesc().)  I'm not sure where the best tradeoff is.



Re: clarify equalTupleDescs()

From
Peter Eisentraut
Date:
On 06.02.24 16:14, Tom Lane wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
>> The first want to compare what I call row-type equality, which means
>> they want to check specifically for equal number of attributes, and the
>> same attribute names, types, and typmods for each attribute.  Most
>> callers actually want that behavior.
> 
> Should compare attcollation too, no?
> 
> +1 for the general idea, but it seems like "row type equality"
> might still be a slightly fuzzy concept.

I did another pass across the callers to check what pg_attribute fields 
might be relevant.

Collation definitely needs to be added, certainly for plancache.c, maybe 
for typcache.c, the other callers don't care.

Record types can have attisdropped fields, so it's probably good to 
check those.

I'm suspicious about attndims.  Maybe one could create a test case where 
record types differ only in that.  Support for attndims throughout the 
system is weak, but maybe there is something to check there.

On a conceptual level, I figured pg_attribute rows can be divided up 
into three categories:

1. "row type" stuff: attname, atttypid, atttypmod, attndims, 
attisdropped, attcollation

2. physical layout stuff: attlen, attcacheoff, attbyval, attalign

3. table metadata stuff (everything else)

It's not perfect, and sometimes it's not clear whether these categories 
inform the implementation or the other way around, but I think it helps 
conceptualize it.

Attachment

Re: clarify equalTupleDescs()

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 06.02.24 16:14, Tom Lane wrote:
>> +1 for the general idea, but it seems like "row type equality"
>> might still be a slightly fuzzy concept.

> I did another pass across the callers to check what pg_attribute fields
> might be relevant.

> Collation definitely needs to be added, certainly for plancache.c, maybe
> for typcache.c, the other callers don't care.

+1

> Record types can have attisdropped fields, so it's probably good to
> check those.

Yeah, good idea.  (In most cases the attname comparison would catch
that, but we shouldn't rely on it.)  In a perfect world maybe a
dropped column should be invisible to this comparison, but we're
a very long way from being able to treat it that way.

> I'm suspicious about attndims.  Maybe one could create a test case where
> record types differ only in that.  Support for attndims throughout the
> system is weak, but maybe there is something to check there.

There was a discussion last year[1] about removing attndims
altogether, which still seems to me like possibly a good idea.
So I doubt we want to consider it as a core semantic field.

> On a conceptual level, I figured pg_attribute rows can be divided up
> into three categories:

> 1. "row type" stuff: attname, atttypid, atttypmod, attndims,
> attisdropped, attcollation

> 2. physical layout stuff: attlen, attcacheoff, attbyval, attalign

I recall some discussion about taking attcacheoff out of this data
structure too ...

> 3. table metadata stuff (everything else)

> It's not perfect, and sometimes it's not clear whether these categories
> inform the implementation or the other way around, but I think it helps
> conceptualize it.

Sure.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/ZD%2B14YZ4IUue8Rhi%40gendo.asyd.net



Re: clarify equalTupleDescs()

From
jian he
Date:
On Mon, Feb 12, 2024 at 7:47 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
>
> In principle, hashRowType() could process all the fields that
> equalRowTypes() does.  But since it's only a hash function, it doesn't
> have to be perfect.  (This is also the case for the current
> hashTupleDesc().)  I'm not sure where the best tradeoff is.

That's where my confusion comes from.
hashRowType is used in record_type_typmod_hash.
record_type_typmod_hash is within assign_record_type_typmod.

in assign_record_type_typmod:
------------------------------------------------
if (RecordCacheHash == NULL)
{
/* First time through: initialize the hash table */
HASHCTL ctl;
ctl.keysize = sizeof(TupleDesc); /* just the pointer */
ctl.entrysize = sizeof(RecordCacheEntry);
ctl.hash = record_type_typmod_hash;
ctl.match = record_type_typmod_compare;
RecordCacheHash = hash_create("Record information cache", 64,
  &ctl,
  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);
/* Also make sure CacheMemoryContext exists */
if (!CacheMemoryContext)
CreateCacheMemoryContext();
}
/*
* Find a hashtable entry for this tuple descriptor. We don't use
* HASH_ENTER yet, because if it's missing, we need to make sure that all
* the allocations succeed before we create the new entry.
*/
recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
&tupDesc,
HASH_FIND, &found);
------------------------------------------------
based on the comments in hash_create. The above hash_search function
would first use
record_type_typmod_hash to find out candidate entries in a hash table
then use record_type_typmod_compare to compare the given tupDesc with
candidate entries.

Is this how the hash_search in assign_record_type_typmod works?

equalRowTypes processed more fields than hashRowType,
hashRowType comments mentioned equalRowTypes,
maybe we should have some comments in hashRowType explaining why only
hashing natts, tdtypeid, atttypid will be fine.



Re: clarify equalTupleDescs()

From
Tomas Vondra
Date:
Hi,

I looked at this patch today. I went through all the calls switched to
equalRowTypes, and AFAIK all of them are correct - all the places
switched to equalRowTypes() only need the weaker checks.

There's only two places still calling equalTupleDescs() - relcache
certainly needs that, and so does the assert in execReplication().


As for attndims, I agree equalRowTypes() should not check that. We're
not really checking that anywhere, it'd be quite weird to start with it
here. Especially if the plan is to remove it entirely.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: clarify equalTupleDescs()

From
Tomas Vondra
Date:

On 2/27/24 12:13, jian he wrote:
> On Mon, Feb 12, 2024 at 7:47 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>>
>> In principle, hashRowType() could process all the fields that
>> equalRowTypes() does.  But since it's only a hash function, it doesn't
>> have to be perfect.  (This is also the case for the current
>> hashTupleDesc().)  I'm not sure where the best tradeoff is.
> 
> That's where my confusion comes from.
> hashRowType is used in record_type_typmod_hash.
> record_type_typmod_hash is within assign_record_type_typmod.
>> in assign_record_type_typmod:
> ------------------------------------------------
> if (RecordCacheHash == NULL)
> {
> /* First time through: initialize the hash table */
> HASHCTL ctl;
> ctl.keysize = sizeof(TupleDesc); /* just the pointer */
> ctl.entrysize = sizeof(RecordCacheEntry);
> ctl.hash = record_type_typmod_hash;
> ctl.match = record_type_typmod_compare;
> RecordCacheHash = hash_create("Record information cache", 64,
>   &ctl,
>   HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);
> /* Also make sure CacheMemoryContext exists */
> if (!CacheMemoryContext)
> CreateCacheMemoryContext();
> }
> /*
> * Find a hashtable entry for this tuple descriptor. We don't use
> * HASH_ENTER yet, because if it's missing, we need to make sure that all
> * the allocations succeed before we create the new entry.
> */
> recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> &tupDesc,
> HASH_FIND, &found);
> ------------------------------------------------
> based on the comments in hash_create. The above hash_search function
> would first use
> record_type_typmod_hash to find out candidate entries in a hash table
> then use record_type_typmod_compare to compare the given tupDesc with
> candidate entries.
> 
> Is this how the hash_search in assign_record_type_typmod works?
> 

Yes.

> equalRowTypes processed more fields than hashRowType,
> hashRowType comments mentioned equalRowTypes,
> maybe we should have some comments in hashRowType explaining why only
> hashing natts, tdtypeid, atttypid will be fine.
> 

Not sure I understand what the confusion is - omitting fields with
little entropy is not uncommon, and collisions are inherent to hash
tables, and need to be handled anyway.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: clarify equalTupleDescs()

From
Peter Eisentraut
Date:
On 13.03.24 19:43, Tomas Vondra wrote:
> I looked at this patch today. I went through all the calls switched to
> equalRowTypes, and AFAIK all of them are correct - all the places
> switched to equalRowTypes() only need the weaker checks.
> 
> There's only two places still calling equalTupleDescs() - relcache
> certainly needs that, and so does the assert in execReplication().
> 
> As for attndims, I agree equalRowTypes() should not check that. We're
> not really checking that anywhere, it'd be quite weird to start with it
> here. Especially if the plan is to remove it entirely.

Thanks for checking this again.  I have committed the patch as it was 
presented then.