Thread: hiding variable-length fields from Form_pg_* structs
It would be helpful if variable length catalog fields (except the first one) would not be visible on the C level in the Form_pg_* structs. We keep them listed in the include/catalog/pg_*.h files so that the BKI generating code can see them and for general documentation, but the fields are meaningless in C, and some catalog files contain free-form comments advising the reader of that. If we could hide them somehow, we would avoid random programming bugs, deconfuse compilers trying to generate useful warnings, and save occasional stack space. There are several known cases of the first and second issue, at least. I haven't found the ideal way to implement that yet, but the general idea would be something like: CATALOG(pg_attribute,1249) BKI_BOOTSTRAP ... { ... int4 attinhcount; Oid attcollation; aclitem attacl[1]; CATVARLEN( text attoptions[1]; text attfdwoptions[1]; ) } FormData_pg_attribute; where CATVARLEN is defined empty in C, and ignored in the BKI generation code. The real trick is to find something that handles well with pgindent and indenting text editors. Ideas?
Peter Eisentraut <peter_e@gmx.net> writes: > CATALOG(pg_attribute,1249) BKI_BOOTSTRAP ... > { > ... > int4 attinhcount; > Oid attcollation; > aclitem attacl[1]; > CATVARLEN( > text attoptions[1]; > text attfdwoptions[1]; > ) > } FormData_pg_attribute; > where CATVARLEN is defined empty in C, and ignored in the BKI generation > code. > The real trick is to find something that handles well with pgindent and > indenting text editors. The low-tech way would be CATALOG(pg_attribute,1249) BKI_BOOTSTRAP ... { ... int4 attinhcount; Oid attcollation; aclitem attacl[1]; #ifdef CATALOG_VARLEN_FIELDS text attoptions[1]; text attfdwoptions[1]; #endif } FormData_pg_attribute; regards, tom lane
On sön, 2011-11-27 at 18:20 -0500, Tom Lane wrote: > The low-tech way would be > > CATALOG(pg_attribute,1249) BKI_BOOTSTRAP ... > { > ... > int4 attinhcount; > Oid attcollation; > aclitem attacl[1]; > #ifdef CATALOG_VARLEN_FIELDS > text attoptions[1]; > text attfdwoptions[1]; > #endif > } FormData_pg_attribute; Good enough. To clarify, I believe the rule is that the first variable-length field can be accessed as a struct field. Are there any exceptions to this? This kind of comment is pretty confusing: CATALOG(pg_rewrite,2618) { NameData rulename; Oid ev_class; int2 ev_attr; char ev_type; char ev_enabled; bool is_instead; /* NB: remaining fields must be accessed via heap_getattr */ pg_node_tree ev_qual; pg_node_tree ev_action; } FormData_pg_rewrite; Also, this code in relcache.c accesses indclass, which is after an int2vector and an oidvector field: /* Check to see if it is a unique, non-partial btree index on OID */ if (index->indnatts == 1 && index->indisunique&& index->indimmediate && index->indkey.values[0] == ObjectIdAttributeNumber && index->indclass.values[0]== OID_BTREE_OPS_OID && heap_attisnull(htup, Anum_pg_index_indpred)) oidIndex = index->indexrelid;
Peter Eisentraut <peter_e@gmx.net> writes: > To clarify, I believe the rule is that the first variable-length field > can be accessed as a struct field. Are there any exceptions to this? If it is known not null, yes, but I wonder just how many places actually depend on that. It might be better to remove all varlena fields from C visibility and require use of the accessor functions. We should at least look into what that would cost us. > Also, this code in relcache.c accesses indclass, which is after an > int2vector and an oidvector field: > /* Check to see if it is a unique, non-partial btree index on OID */ > if (index->indnatts == 1 && > index->indisunique && index->indimmediate && > index->indkey.values[0] == ObjectIdAttributeNumber && > index->indclass.values[0] == OID_BTREE_OPS_OID && > heap_attisnull(htup, Anum_pg_index_indpred)) > oidIndex = index->indexrelid; Hmm, that does look mighty broken, doesn't it ... but somehow it works, else GetNewOid would be bleating all the time. (Thinks about that for a bit) Oh, it accidentally fails to fail because the C declarations for int2vector and oidvector are actually correct if there is a single element in the arrays, which we already verified with the indnatts test. But yeah, this seems horribly fragile, and it should not be performance critical because we only go through here when loading up a cache entry. So let's change it. regards, tom lane
On Mon, Dec 5, 2011 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> To clarify, I believe the rule is that the first variable-length field >> can be accessed as a struct field. Are there any exceptions to this? > > If it is known not null, yes, but I wonder just how many places actually > depend on that. It might be better to remove all varlena fields from C > visibility and require use of the accessor functions. We should at > least look into what that would cost us. My impression is that all the varlena fields also allow nulls. So I think there's no point in trying to keep the first one C-accessible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Dec 5, 2011 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Peter Eisentraut <peter_e@gmx.net> writes: >>> To clarify, I believe the rule is that the first variable-length field >>> can be accessed as a struct field. Are there any exceptions to this? >> If it is known not null, yes, but I wonder just how many places actually >> depend on that. > My impression is that all the varlena fields also allow nulls. See MARKNOTNULL in bootstrap.c. I think the exceptions were designed to protect direct accesses to pg_index. regards, tom lane
On Mon, Dec 5, 2011 at 3:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Dec 5, 2011 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Peter Eisentraut <peter_e@gmx.net> writes: >>>> To clarify, I believe the rule is that the first variable-length field >>>> can be accessed as a struct field. Are there any exceptions to this? > >>> If it is known not null, yes, but I wonder just how many places actually >>> depend on that. > >> My impression is that all the varlena fields also allow nulls. > > See MARKNOTNULL in bootstrap.c. I think the exceptions were designed to > protect direct accesses to pg_index. Hmm, OK. rhaas=# select r.relname, a.attname, a.atttypid::regtype from pg_class r, pg_attribute a where relnamespace=11 and relkind='r' and attrelid = r.oid and a.attnotnull and a.attlen<0; relname | attname | atttypid ------------+--------------+------------pg_proc | proargtypes | oidvectorpg_index | indkey | int2vectorpg_index | indcollation | oidvectorpg_index | indclass | oidvectorpg_index | indoption | int2vectorpg_trigger| tgattr | int2vector (6 rows) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
So here is a patch for that. There are a few cases that break when hiding all variable length fields: Access to indclass in relcache.c, as discussed upthread, which should be fixed. Access to pg_largeobject.data. This is apparently OK, per comment in inv_api.c. Access to pg_proc.proargtypes in various places. This is clearly useful, so we'll keep it visible. So I think the relcache.c thing should be fixed and then this might be good to go.
Attachment
Peter Eisentraut <peter_e@gmx.net> writes: > So I think the relcache.c thing should be fixed and then this might be > good to go. Cosmetic gripes: I think we could get rid of the various comments that say things like "variable length fields start here", since the #ifdef CATALOG_VARLEN lines now represent that in a standardized fashion. Possibly those lines should be #ifdef CATALOG_VARLEN /* variable-length fields start here */ to be even clearer. What would be appropriate to add instead of those inconsistently-used comments is explicit comments about the exception cases such as proargtypes, to make it clear that the placement of the #ifdef CATALOG_VARLEN is intentional and not a bug in those cases. regards, tom lane
On mån, 2012-01-09 at 14:04 -0500, Tom Lane wrote: > #ifdef CATALOG_VARLEN /* variable-length fields start here > */ > > to be even clearer. > > What would be appropriate to add instead of those inconsistently-used > comments is explicit comments about the exception cases such as > proargtypes, to make it clear that the placement of the #ifdef > CATALOG_VARLEN is intentional and not a bug in those cases. > I implemented your suggestions in the attached patch.
Attachment
Peter Eisentraut <peter_e@gmx.net> writes: > On mån, 2012-01-09 at 14:04 -0500, Tom Lane wrote: >> #ifdef CATALOG_VARLEN /* variable-length fields start here >> */ >> >> to be even clearer. >> >> What would be appropriate to add instead of those inconsistently-used >> comments is explicit comments about the exception cases such as >> proargtypes, to make it clear that the placement of the #ifdef >> CATALOG_VARLEN is intentional and not a bug in those cases. > I implemented your suggestions in the attached patch. This looks ready to go to me, except for one trivial nit: in pg_extension.h, please keep the comment pointing out that extversion should never be null. regards, tom lane