Thread: clarify equalTupleDescs()
In a recent patch thread it was discussed[0] which fields should be compared by equalTupleDescs() and whether it is ok to remove a field from tuple descriptors and how that should affect their equality (attstattarget in that case). After analyzing all the callers, I have noticed that there are two classes of callers of equalTupleDescs(): The first want to compare what I call row-type equality, which means they want to check specifically for equal number of attributes, and the same attribute names, types, and typmods for each attribute. Most callers actually want that behavior. The remaining callers just want to compare the tuple descriptors as they are, they don't care why the fields are in there, they just want to compare all of them. In the attached patch, I add a new function equalRowTypes() that is effectively a subset of equalTupleDescs() and switch most callers to that. The purpose of this patch is to make the semantics less uncertain. Questions like the one in [0] about attstattarget now have a clear answer for both functions. I think this would be useful to have, as we are thinking about more changes in pg_attribute and tuple descriptors. [0]: https://www.postgresql.org/message-id/202401101316.k4s3fomwjx52@alvherre.pgsql
Attachment
Peter Eisentraut <peter@eisentraut.org> writes: > The first want to compare what I call row-type equality, which means > they want to check specifically for equal number of attributes, and the > same attribute names, types, and typmods for each attribute. Most > callers actually want that behavior. Should compare attcollation too, no? +1 for the general idea, but it seems like "row type equality" might still be a slightly fuzzy concept. regards, tom lane
On Tue, Feb 6, 2024 at 8:59 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > In a recent patch thread it was discussed[0] which fields should be > compared by equalTupleDescs() and whether it is ok to remove a field > from tuple descriptors and how that should affect their equality > (attstattarget in that case). > > After analyzing all the callers, I have noticed that there are two > classes of callers of equalTupleDescs(): > > The first want to compare what I call row-type equality, which means > they want to check specifically for equal number of attributes, and the > same attribute names, types, and typmods for each attribute. Most > callers actually want that behavior. The remaining callers just want to > compare the tuple descriptors as they are, they don't care why the > fields are in there, they just want to compare all of them. > > In the attached patch, I add a new function equalRowTypes() that is > effectively a subset of equalTupleDescs() and switch most callers to that. > > The purpose of this patch is to make the semantics less uncertain. > Questions like the one in [0] about attstattarget now have a clear > answer for both functions. I think this would be useful to have, as we > are thinking about more changes in pg_attribute and tuple descriptors. > > > [0]: > https://www.postgresql.org/message-id/202401101316.k4s3fomwjx52@alvherre.pgsql function name record_type_typmod_hash imply that hashRowType should also hash atttypmod field? also: bool equalRowTypes(TupleDesc tupdesc1, TupleDesc tupdesc2) { if (tupdesc1->natts != tupdesc2->natts) return false; if (tupdesc1->tdtypeid != tupdesc2->tdtypeid) return false; for (int i = 0; i < tupdesc1->natts; i++) { Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i); Form_pg_attribute attr2 = TupleDescAttr(tupdesc2, i); if (strcmp(NameStr(attr1->attname), NameStr(attr2->attname)) != 0) return false; if (attr1->atttypid != attr2->atttypid) return false; if (attr1->atttypmod != attr2->atttypmod) return false; } return true; } /* * hashRowType * * If two tuple descriptors would be considered equal by equalRowTypes() * then their hash value will be equal according to this function. */ uint32 hashRowType(TupleDesc desc) { uint32 s; int i; s = hash_combine(0, hash_uint32(desc->natts)); s = hash_combine(s, hash_uint32(desc->tdtypeid)); for (i = 0; i < desc->natts; ++i) s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid)); return s; } from the hashRowType comment, should we also hash attname and atttypmod?
On 07.02.24 04:06, jian he wrote: > /* > * hashRowType > * > * If two tuple descriptors would be considered equal by equalRowTypes() > * then their hash value will be equal according to this function. > */ > uint32 > hashRowType(TupleDesc desc) > { > uint32 s; > int i; > > s = hash_combine(0, hash_uint32(desc->natts)); > s = hash_combine(s, hash_uint32(desc->tdtypeid)); > for (i = 0; i < desc->natts; ++i) > s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid)); > > return s; > } > > from the hashRowType comment, should we also hash attname and atttypmod? In principle, hashRowType() could process all the fields that equalRowTypes() does. But since it's only a hash function, it doesn't have to be perfect. (This is also the case for the current hashTupleDesc().) I'm not sure where the best tradeoff is.
On 06.02.24 16:14, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> The first want to compare what I call row-type equality, which means >> they want to check specifically for equal number of attributes, and the >> same attribute names, types, and typmods for each attribute. Most >> callers actually want that behavior. > > Should compare attcollation too, no? > > +1 for the general idea, but it seems like "row type equality" > might still be a slightly fuzzy concept. I did another pass across the callers to check what pg_attribute fields might be relevant. Collation definitely needs to be added, certainly for plancache.c, maybe for typcache.c, the other callers don't care. Record types can have attisdropped fields, so it's probably good to check those. I'm suspicious about attndims. Maybe one could create a test case where record types differ only in that. Support for attndims throughout the system is weak, but maybe there is something to check there. On a conceptual level, I figured pg_attribute rows can be divided up into three categories: 1. "row type" stuff: attname, atttypid, atttypmod, attndims, attisdropped, attcollation 2. physical layout stuff: attlen, attcacheoff, attbyval, attalign 3. table metadata stuff (everything else) It's not perfect, and sometimes it's not clear whether these categories inform the implementation or the other way around, but I think it helps conceptualize it.
Attachment
Peter Eisentraut <peter@eisentraut.org> writes: > On 06.02.24 16:14, Tom Lane wrote: >> +1 for the general idea, but it seems like "row type equality" >> might still be a slightly fuzzy concept. > I did another pass across the callers to check what pg_attribute fields > might be relevant. > Collation definitely needs to be added, certainly for plancache.c, maybe > for typcache.c, the other callers don't care. +1 > Record types can have attisdropped fields, so it's probably good to > check those. Yeah, good idea. (In most cases the attname comparison would catch that, but we shouldn't rely on it.) In a perfect world maybe a dropped column should be invisible to this comparison, but we're a very long way from being able to treat it that way. > I'm suspicious about attndims. Maybe one could create a test case where > record types differ only in that. Support for attndims throughout the > system is weak, but maybe there is something to check there. There was a discussion last year[1] about removing attndims altogether, which still seems to me like possibly a good idea. So I doubt we want to consider it as a core semantic field. > On a conceptual level, I figured pg_attribute rows can be divided up > into three categories: > 1. "row type" stuff: attname, atttypid, atttypmod, attndims, > attisdropped, attcollation > 2. physical layout stuff: attlen, attcacheoff, attbyval, attalign I recall some discussion about taking attcacheoff out of this data structure too ... > 3. table metadata stuff (everything else) > It's not perfect, and sometimes it's not clear whether these categories > inform the implementation or the other way around, but I think it helps > conceptualize it. Sure. regards, tom lane [1] https://www.postgresql.org/message-id/flat/ZD%2B14YZ4IUue8Rhi%40gendo.asyd.net
On Mon, Feb 12, 2024 at 7:47 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > > In principle, hashRowType() could process all the fields that > equalRowTypes() does. But since it's only a hash function, it doesn't > have to be perfect. (This is also the case for the current > hashTupleDesc().) I'm not sure where the best tradeoff is. That's where my confusion comes from. hashRowType is used in record_type_typmod_hash. record_type_typmod_hash is within assign_record_type_typmod. in assign_record_type_typmod: ------------------------------------------------ if (RecordCacheHash == NULL) { /* First time through: initialize the hash table */ HASHCTL ctl; ctl.keysize = sizeof(TupleDesc); /* just the pointer */ ctl.entrysize = sizeof(RecordCacheEntry); ctl.hash = record_type_typmod_hash; ctl.match = record_type_typmod_compare; RecordCacheHash = hash_create("Record information cache", 64, &ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE); /* Also make sure CacheMemoryContext exists */ if (!CacheMemoryContext) CreateCacheMemoryContext(); } /* * Find a hashtable entry for this tuple descriptor. We don't use * HASH_ENTER yet, because if it's missing, we need to make sure that all * the allocations succeed before we create the new entry. */ recentry = (RecordCacheEntry *) hash_search(RecordCacheHash, &tupDesc, HASH_FIND, &found); ------------------------------------------------ based on the comments in hash_create. The above hash_search function would first use record_type_typmod_hash to find out candidate entries in a hash table then use record_type_typmod_compare to compare the given tupDesc with candidate entries. Is this how the hash_search in assign_record_type_typmod works? equalRowTypes processed more fields than hashRowType, hashRowType comments mentioned equalRowTypes, maybe we should have some comments in hashRowType explaining why only hashing natts, tdtypeid, atttypid will be fine.
Hi, I looked at this patch today. I went through all the calls switched to equalRowTypes, and AFAIK all of them are correct - all the places switched to equalRowTypes() only need the weaker checks. There's only two places still calling equalTupleDescs() - relcache certainly needs that, and so does the assert in execReplication(). As for attndims, I agree equalRowTypes() should not check that. We're not really checking that anywhere, it'd be quite weird to start with it here. Especially if the plan is to remove it entirely. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2/27/24 12:13, jian he wrote: > On Mon, Feb 12, 2024 at 7:47 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> >> >> In principle, hashRowType() could process all the fields that >> equalRowTypes() does. But since it's only a hash function, it doesn't >> have to be perfect. (This is also the case for the current >> hashTupleDesc().) I'm not sure where the best tradeoff is. > > That's where my confusion comes from. > hashRowType is used in record_type_typmod_hash. > record_type_typmod_hash is within assign_record_type_typmod. >> in assign_record_type_typmod: > ------------------------------------------------ > if (RecordCacheHash == NULL) > { > /* First time through: initialize the hash table */ > HASHCTL ctl; > ctl.keysize = sizeof(TupleDesc); /* just the pointer */ > ctl.entrysize = sizeof(RecordCacheEntry); > ctl.hash = record_type_typmod_hash; > ctl.match = record_type_typmod_compare; > RecordCacheHash = hash_create("Record information cache", 64, > &ctl, > HASH_ELEM | HASH_FUNCTION | HASH_COMPARE); > /* Also make sure CacheMemoryContext exists */ > if (!CacheMemoryContext) > CreateCacheMemoryContext(); > } > /* > * Find a hashtable entry for this tuple descriptor. We don't use > * HASH_ENTER yet, because if it's missing, we need to make sure that all > * the allocations succeed before we create the new entry. > */ > recentry = (RecordCacheEntry *) hash_search(RecordCacheHash, > &tupDesc, > HASH_FIND, &found); > ------------------------------------------------ > based on the comments in hash_create. The above hash_search function > would first use > record_type_typmod_hash to find out candidate entries in a hash table > then use record_type_typmod_compare to compare the given tupDesc with > candidate entries. > > Is this how the hash_search in assign_record_type_typmod works? > Yes. > equalRowTypes processed more fields than hashRowType, > hashRowType comments mentioned equalRowTypes, > maybe we should have some comments in hashRowType explaining why only > hashing natts, tdtypeid, atttypid will be fine. > Not sure I understand what the confusion is - omitting fields with little entropy is not uncommon, and collisions are inherent to hash tables, and need to be handled anyway. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 13.03.24 19:43, Tomas Vondra wrote: > I looked at this patch today. I went through all the calls switched to > equalRowTypes, and AFAIK all of them are correct - all the places > switched to equalRowTypes() only need the weaker checks. > > There's only two places still calling equalTupleDescs() - relcache > certainly needs that, and so does the assert in execReplication(). > > As for attndims, I agree equalRowTypes() should not check that. We're > not really checking that anywhere, it'd be quite weird to start with it > here. Especially if the plan is to remove it entirely. Thanks for checking this again. I have committed the patch as it was presented then.