Thread: Towards easier AMs: Cleaning up inappropriate use of name "relkind"
Hackers, The name "relkind" normally refers to a field of type 'char' with values like 'r' for "table" and 'i' for "index". In AlterTableStmtand CreateTableAsStmt, this naming convention was abused for a field of type enum ObjectType. Often, suchfields are named "objtype", though also "kind", "removeType", "renameType", etc. I don't care to debate those other names, though in passing I'll say that "kind" seems not great. The "relkind" name isparticularly bad, though. It is confusing in functions that also operate on a RangeTblEntry object, which also has a fieldnamed "relkind", and is confusing in light of the function get_relkind_objtype() which maps from "relkind" to "objtype",implying quite correctly that those two things are distinct. The attached patch cleans this up. How many toes am I stepping on here? Changing the names was straightforward and doesn'tseem to cause any issues with 'make check-world'. Any objection? For those interested in the larger context of this patch, I am trying to clean up any part of the code that makes it harderto write and test new access methods. When implementing a new AM, one currently needs to `grep -i relkind` to finda long list of files that need special treatment. One then needs to consider whether special logic for the new AM needsto be inserted into all these spots. As such, it is nice if these spots have as little confusing naming as possible. This patch makes that process a little easier. I have another patch (to be posted shortly) that cleans up the#define RELKIND_XXX stuff using a new RelKind enum and special macros while keeping the relkind fields as type 'char'. Along with converting code to use switch(relkind) rather than if (relkind...) statements, the compiler now warnson unhandled cases when you add a new RelKind to the list, making it easier to find all the places you need to update. I decided to keep that work independent of this patch, as the code is logically distinct. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 2020-Jun-03, Mark Dilger wrote: > The name "relkind" normally refers to a field of type 'char' with > values like 'r' for "table" and 'i' for "index". In AlterTableStmt > and CreateTableAsStmt, this naming convention was abused for a field > of type enum ObjectType. I agree that "relkind" here is a misnomer, and I bet that what happened here is that the original patch Gavin developed was using the relkind enum from pg_class and was later changed to the OBJECT_ defines after patch review, but the struct member name remained. I don't object to the proposed renaming. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"
From
Daniel Gustafsson
Date:
> On 3 Jun 2020, at 19:05, Mark Dilger <mark.dilger@enterprisedb.com> wrote: > The attached patch cleans this up. The gram.y hunks in this patch no longer applies, please submit a rebased version. I'm marking the entry Waiting on Author in the meantime. cheers ./daniel
> On Jul 1, 2020, at 2:45 AM, Daniel Gustafsson <daniel@yesql.se> wrote: > >> On 3 Jun 2020, at 19:05, Mark Dilger <mark.dilger@enterprisedb.com> wrote: > >> The attached patch cleans this up. > > The gram.y hunks in this patch no longer applies, please submit a rebased > version. I'm marking the entry Waiting on Author in the meantime. Rebased patch attached. Thanks for mentioning it! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
> On Jun 3, 2020, at 10:05 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > I have another patch (to be posted shortly) that cleans up the #define RELKIND_XXX stuff using a new RelKind enum andspecial macros while keeping the relkind fields as type 'char'. Along with converting code to use switch(relkind) ratherthan if (relkind...) statements, the compiler now warns on unhandled cases when you add a new RelKind to the list,making it easier to find all the places you need to update. I decided to keep that work independent of this patch,as the code is logically distinct. Most of the work in this patch is mechanical replacement of if/else if/else statements which hinge on relkind to switch statementson relkind. The patch is not philosophically very interesting, but it is fairly long. Reviewers might start byscrolling down the patch to the changes in src/include/catalog/pg_class.h There are no intentional behavioral changes in this patch. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Wed, Jul 01, 2020 at 09:46:34AM -0700, Mark Dilger wrote: > Rebased patch attached. Thanks for mentioning it! There are two patches on this thread v2-0001 being much smaller than v2-0002. I have looked at 0001 for now, and, like Alvaro, this renaming makes sense to me. Those commands work on objects that are relkinds, except for one OBJECT_TYPE. So, let's get 0001 patch merged. Any objections from others? -- Michael
Attachment
On Wed, Jul 08, 2020 at 10:00:47PM +0900, Michael Paquier wrote: > There are two patches on this thread v2-0001 being much smaller than > v2-0002. I have looked at 0001 for now, and, like Alvaro, this > renaming makes sense to me. Those commands work on objects that are > relkinds, except for one OBJECT_TYPE. So, let's get 0001 patch > merged. Any objections from others? I have been through this one again and applied it as cc35d89. -- Michael
Attachment
On Wed, Jul 01, 2020 at 05:04:19PM -0700, Mark Dilger wrote: > Most of the work in this patch is mechanical replacement of if/else > if/else statements which hinge on relkind to switch statements on > relkind. The patch is not philosophically very interesting, but it > is fairly long. Reviewers might start by scrolling down the patch > to the changes in src/include/catalog/pg_class.h > > There are no intentional behavioral changes in this patch. Please note that 0002 does not apply anymore, there are conflicts in heap.c and tablecmds.c, and that there are noise diffs, likely because you ran pgindent and included places unrelated to this patch. Anyway, I am not really a fan of this patch. I could see a benefit in switching to an enum so as for places where we use a switch/case without a default we would be warned if a new relkind gets added or if a value is not covered, but then we should not really need RELKIND_NULL, no? -- Michael
Attachment
> On Jul 10, 2020, at 9:44 PM, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jul 08, 2020 at 10:00:47PM +0900, Michael Paquier wrote: >> There are two patches on this thread v2-0001 being much smaller than >> v2-0002. I have looked at 0001 for now, and, like Alvaro, this >> renaming makes sense to me. Those commands work on objects that are >> relkinds, except for one OBJECT_TYPE. So, let's get 0001 patch >> merged. Any objections from others? > > I have been through this one again and applied it as cc35d89. > -- > Michael Thanks for committing! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Jul 10, 2020, at 11:00 PM, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jul 01, 2020 at 05:04:19PM -0700, Mark Dilger wrote: >> Most of the work in this patch is mechanical replacement of if/else >> if/else statements which hinge on relkind to switch statements on >> relkind. The patch is not philosophically very interesting, but it >> is fairly long. Reviewers might start by scrolling down the patch >> to the changes in src/include/catalog/pg_class.h >> >> There are no intentional behavioral changes in this patch. > > Please note that 0002 does not apply anymore, there are conflicts in > heap.c and tablecmds.c, and that there are noise diffs, likely because > you ran pgindent and included places unrelated to this patch. I can resubmit, but should like to address your second point before bothering... > Anyway, > I am not really a fan of this patch. I could see a benefit in > switching to an enum so as for places where we use a switch/case > without a default we would be warned if a new relkind gets added or if > a value is not covered, but then we should not really need > RELKIND_NULL, no? There are code paths where relkind is sometimes '\0' under normal, non-exceptional conditions. This happens in allpaths.c: set_append_rel_size rewriteHandler.c: view_query_is_auto_updatable lockcmds.c: LockViewRecurse_walker pg_depend.c: getOwnedSequences_internal Doesn't this justify having RELKIND_NULL in the enum? It is not the purpose of this patch to change the behavior of the code. This is just a structural patch, using an enum andswitches rather than char and if/else if/else blocks. Subsequent patches could build on this work, such as changing the behavior when code encounters a relkind value outside thecode's expected set of relkind values. Whether those patches would add Assert()s, elog()s, or ereport()s is not somethingI'd like to have to debate as part of this patch submission. Assert()s have the advantage of costing nothing inproduction builds, but elog()s have the advantage of protecting against corrupt relkind values at runtime in production. Getting the compiler to warn when a new relkind is added to the enumeration but not handled in a switch is difficult. Onestrategy is to add -Wswitch-enum, but that would require refactoring switches over all enums, not just over the RelKindenum, and for some enums, that would require a large number of extra lines to be added to the code. Another strategyis to remove the default label from switches over RelKind, but that removes protections against invalid relkindsbeing encountered. Do you have a preference about which directions I should pursue? Or do you think the patch idea itself is dead? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes: > On Jul 10, 2020, at 11:00 PM, Michael Paquier <michael@paquier.xyz> wrote: >> I am not really a fan of this patch. I could see a benefit in >> switching to an enum so as for places where we use a switch/case >> without a default we would be warned if a new relkind gets added or if >> a value is not covered, but then we should not really need >> RELKIND_NULL, no? > There are code paths where relkind is sometimes '\0' under normal, non-exceptional conditions. This happens in > allpaths.c: set_append_rel_size > rewriteHandler.c: view_query_is_auto_updatable > lockcmds.c: LockViewRecurse_walker > pg_depend.c: getOwnedSequences_internal > Doesn't this justify having RELKIND_NULL in the enum? I'd say no. I think including an intentionally invalid value in such an enum is horrid, mainly because it will force a lot of places to cover that value when they shouldn't (or else draw "enum value not handled in switch" warnings). The confusion factor about whether it maybe *is* a valid value is not to be discounted, either. If we can't readily get rid of the use of '\0' in these code paths, maybe trying to convert to an enum isn't going to be a win after all. > Getting the compiler to warn when a new relkind is added to the > enumeration but not handled in a switch is difficult. We already have a project policy about how to do that. regards, tom lane
On Sat, Jul 11, 2020 at 03:32:55PM -0400, Tom Lane wrote: > Mark Dilger <mark.dilger@enterprisedb.com> writes: >> There are code paths where relkind is sometimes '\0' under normal, >> non-exceptional conditions. This happens in >> >> allpaths.c: set_append_rel_size >> rewriteHandler.c: view_query_is_auto_updatable >> lockcmds.c: LockViewRecurse_walker >> pg_depend.c: getOwnedSequences_internal There are more code paths than what's mentioned upthread when it comes to relkinds and \0. For example, I can quickly grep for acl.c that relies on get_rel_relkind() returning \0 when the relkind cannot be found. And we do that for get_typtype() as well in the syscache. >> Doesn't this justify having RELKIND_NULL in the enum? > > I'd say no. I think including an intentionally invalid value in such > an enum is horrid, mainly because it will force a lot of places to cover > that value when they shouldn't (or else draw "enum value not handled in > switch" warnings). The confusion factor about whether it maybe *is* > a valid value is not to be discounted, either. I agree here that the situation could be improved because we never store this value in the catalogs. Perhaps there would be a benefit in switching to an enum in the long run, I am not sure. But if we do so, RELKIND_NULL should not be around. -- Michael
Attachment
> On Jul 12, 2020, at 4:59 AM, Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Jul 11, 2020 at 03:32:55PM -0400, Tom Lane wrote: >> Mark Dilger <mark.dilger@enterprisedb.com> writes: >>> There are code paths where relkind is sometimes '\0' under normal, >>> non-exceptional conditions. This happens in >>> >>> allpaths.c: set_append_rel_size >>> rewriteHandler.c: view_query_is_auto_updatable >>> lockcmds.c: LockViewRecurse_walker >>> pg_depend.c: getOwnedSequences_internal > > There are more code paths than what's mentioned upthread when it comes > to relkinds and \0. For example, I can quickly grep for acl.c that > relies on get_rel_relkind() returning \0 when the relkind cannot be > found. And we do that for get_typtype() as well in the syscache. I was thinking about places in the code that test a relkind variable against a list of values, rather than places that returna relkind to callers, though certainly those two things are related. It's possible that I've missed some places inthe code where \0 might be encountered, but I've added Asserts against unexpected values in v3. I left get_rel_relkind() as is. There does not seem to be anything wrong with it returning \0 as long as all callers areprepared to deal with that result. > >>> Doesn't this justify having RELKIND_NULL in the enum? >> >> I'd say no. I think including an intentionally invalid value in such >> an enum is horrid, mainly because it will force a lot of places to cover >> that value when they shouldn't (or else draw "enum value not handled in >> switch" warnings). The confusion factor about whether it maybe *is* >> a valid value is not to be discounted, either. > > I agree here that the situation could be improved because we never > store this value in the catalogs. Perhaps there would be a benefit in > switching to an enum in the long run, I am not sure. But if we do so, > RELKIND_NULL should not be around. In the v3 patch, I have removed RELKIND_NULL from the enum, and also removed default: labels from switches over RelKind. The patch is also rebased. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
This patch is way too large. Probably in part explained by the fact that you seem to have run pgindent and absorbed a lot of unrelated changes. This makes the patch essentially unreviewable. I think you should define a RelationGetRelkind() static function that returns the appropriate datatype without requiring a cast and assert in every single place that processes a relation's relkind. Similarly you've chosen to leave get_rel_relkind untouched, but that seems unwise. I think the chr_ macros are pointless. Reading back the thread, it seems that the whole point of your patch was to change the tests that currently use 'if' tests to switch blocks. I cannot understand what's the motivation for that, but it appears to me that the approach is backwards: I'd counsel to *first* change the APIs (get_rel_relkind and defining an enum, plus adding RelationGetRelkind) so that everything is more sensible and safe, including appropriate answers for the places where an "invalid" relkind is returned; and once that's in place, replace if tests with switch blocks where it makes sense to do so. Also, I suggest that this thread is not a good one for this patch. Subject is entirely not appropriate. Open a new thread perhaps? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Jul 14, 2020, at 4:12 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > This patch is way too large. Probably in part explained by the fact > that you seem to have run pgindent and absorbed a lot of unrelated > changes. This makes the patch essentially unreviewable. I did not run pgindent, but when changing if (relkind == RELKIND_INDEX) { /* foo */ } to switch (relkind) { case RELKIND_INDEX: /* foo */ } the indentation of /* foo */ changes. For large foo, that results in a lot of lines. There are also cases in the code wherecomparisons of multiple variables are mixed together. To split those out into switch/case statements I had to rearrangesome of the code blocks. > I think you should define a RelationGetRelkind() static function that > returns the appropriate datatype without requiring a cast and assert in > every single place that processes a relation's relkind. Similarly > you've chosen to leave get_rel_relkind untouched, but that seems unwise. I was well aware of how large the patch had gotten, and didn't want to add more.... > I think the chr_ macros are pointless. Look more closely at the #define RelKindAsString(x) CppAsString2(chr_##x) > Reading back the thread, it seems that the whole point of your patch was > to change the tests that currently use 'if' tests to switch blocks. I > cannot understand what's the motivation for that, There might not be sufficient motivation to make the patch worth doing. The motivation was to leverage the project's recentaddition of -Wswitch to make it easier to know which code needs updating when you add a new relkind. That doesn'thappen very often, but I was working towards that kind of thing, and thought this might be a good prerequisite patchfor that work. Stylistically, I also prefer + switch ((RelKind) rel->rd_rel->relkind) + { + case RELKIND_RELATION: + case RELKIND_MATVIEW: + case RELKIND_TOASTVALUE: over - if (rel->rd_rel->relkind == RELKIND_RELATION || - rel->rd_rel->relkind == RELKIND_MATVIEW || - rel->rd_rel->relkind == RELKIND_TOASTVALUE) which is a somewhat common pattern in the code. It takes less mental effort to see that only one variable is being comparedagainst those three enum values. In some cases, though not necessarily this exact example, it also *might* saveduplicated work computing the variable, depending on the situation and what the compiler can optimize away. > but it appears to me > that the approach is backwards: I'd counsel to *first* change the APIs > (get_rel_relkind and defining an enum, plus adding RelationGetRelkind) > so that everything is more sensible and safe, including appropriate > answers for the places where an "invalid" relkind is returned; Ok. > and once > that's in place, replace if tests with switch blocks where it makes > sense to do so. Ok. > > Also, I suggest that this thread is not a good one for this patch. > Subject is entirely not appropriate. Open a new thread perhaps? I've changed the subject line. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company