Thread: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Mark Dilger

Date:

03 June 2020, 17:05:26

Hackers,

The name "relkind" normally refers to a field of type 'char' with values like 'r' for "table" and 'i' for "index". In
AlterTableStmtand CreateTableAsStmt, this naming convention was abused for a field of type enum ObjectType. Often,
suchfields are named "objtype", though also "kind", "removeType", "renameType", etc.

I don't care to debate those other names, though in passing I'll say that "kind" seems not great. The "relkind" name
isparticularly bad, though. It is confusing in functions that also operate on a RangeTblEntry object, which also has a
fieldnamed "relkind", and is confusing in light of the function get_relkind_objtype() which maps from "relkind" to
"objtype",implying quite correctly that those two things are distinct.

The attached patch cleans this up. How many toes am I stepping on here? Changing the names was straightforward and
doesn'tseem to cause any issues with 'make check-world'. Any objection?

For those interested in the larger context of this patch, I am trying to clean up any part of the code that makes it
harderto write and test new access methods. When implementing a new AM, one currently needs to `grep -i relkind` to
finda long list of files that need special treatment. One then needs to consider whether special logic for the new AM
needsto be inserted into all these spots. As such, it is nice if these spots have as little confusing naming as
possible. This patch makes that process a little easier. I have another patch (to be posted shortly) that cleans up
the#define RELKIND_XXX stuff using a new RelKind enum and special macros while keeping the relkind fields as type
'char'. Along with converting code to use switch(relkind) rather than if (relkind...) statements, the compiler now
warnson unhandled cases when you add a new RelKind to the list, making it easier to find all the places you need to
update. I decided to keep that work independent of this patch, as the code is logically distinct.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

v1-0001-Renaming-relkind-as-objtype-where-appropriate.patch

Re: Towards easier AMs: Cleaning up inappropriate use of name"relkind"

From

Alvaro Herrera

Date:

03 June 2020, 17:26:28

On 2020-Jun-03, Mark Dilger wrote:

> The name "relkind" normally refers to a field of type 'char' with
> values like 'r' for "table" and 'i' for "index".  In AlterTableStmt
> and CreateTableAsStmt, this naming convention was abused for a field
> of type enum ObjectType.

I agree that "relkind" here is a misnomer, and I bet that what happened
here is that the original patch Gavin developed was using the relkind
enum from pg_class and was later changed to the OBJECT_ defines after
patch review, but the struct member name remained.  I don't object to
the proposed renaming.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Daniel Gustafsson

Date:

01 July 2020, 09:45:06

> On 3 Jun 2020, at 19:05, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

> The attached patch cleans this up.

The gram.y hunks in this patch no longer applies, please submit a rebased
version.  I'm marking the entry Waiting on Author in the meantime.

cheers ./daniel

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Mark Dilger

Date:

01 July 2020, 16:46:34


> On Jul 1, 2020, at 2:45 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
> 
>> On 3 Jun 2020, at 19:05, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> 
>> The attached patch cleans this up.
> 
> The gram.y hunks in this patch no longer applies, please submit a rebased
> version.  I'm marking the entry Waiting on Author in the meantime.

Rebased patch attached.  Thanks for mentioning it!




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

v2-0001-Renaming-relkind-as-objtype-where-appropriate.patch

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Mark Dilger

Date:

02 July 2020, 00:04:19

> On Jun 3, 2020, at 10:05 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
>  I have another patch (to be posted shortly) that cleans up the #define RELKIND_XXX stuff using a new RelKind enum
andspecial macros while keeping the relkind fields as type 'char'.  Along with converting code to use switch(relkind)
ratherthan if (relkind...) statements, the compiler now warns on unhandled cases when you add a new RelKind to the
list,making it easier to find all the places you need to update.  I decided to keep that work independent of this
patch,as the code is logically distinct. 

Most of the work in this patch is mechanical replacement of if/else if/else statements which hinge on relkind to switch
statementson relkind.  The patch is not philosophically very interesting, but it is fairly long.  Reviewers might start
byscrolling down the patch to the changes in src/include/catalog/pg_class.h 

There are no intentional behavioral changes in this patch.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

v2-0002-Refactoring-relkind-handling.patch

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Michael Paquier

Date:

08 July 2020, 13:00:47

On Wed, Jul 01, 2020 at 09:46:34AM -0700, Mark Dilger wrote:
> Rebased patch attached.  Thanks for mentioning it!

There are two patches on this thread v2-0001 being much smaller than
v2-0002.  I have looked at 0001 for now, and, like Alvaro, this
renaming makes sense to me.  Those commands work on objects that are
relkinds, except for one OBJECT_TYPE.  So, let's get 0001 patch
merged.  Any objections from others?
--
Michael

Attachment

signature.asc

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Michael Paquier

Date:

11 July 2020, 04:44:01

On Wed, Jul 08, 2020 at 10:00:47PM +0900, Michael Paquier wrote:
> There are two patches on this thread v2-0001 being much smaller than
> v2-0002.  I have looked at 0001 for now, and, like Alvaro, this
> renaming makes sense to me.  Those commands work on objects that are
> relkinds, except for one OBJECT_TYPE.  So, let's get 0001 patch
> merged.  Any objections from others?

I have been through this one again and applied it as cc35d89.
--
Michael

Attachment

signature.asc

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Michael Paquier

Date:

11 July 2020, 06:00:32

On Wed, Jul 01, 2020 at 05:04:19PM -0700, Mark Dilger wrote:
> Most of the work in this patch is mechanical replacement of if/else
> if/else statements which hinge on relkind to switch statements on
> relkind.  The patch is not philosophically very interesting, but it
> is fairly long.  Reviewers might start by scrolling down the patch
> to the changes in src/include/catalog/pg_class.h
>
> There are no intentional behavioral changes in this patch.

Please note that 0002 does not apply anymore, there are conflicts in
heap.c and tablecmds.c, and that there are noise diffs, likely because
you ran pgindent and included places unrelated to this patch.  Anyway,
I am not really a fan of this patch.  I could see a benefit in
switching to an enum so as for places where we use a switch/case
without a default we would be warned if a new relkind gets added or if
a value is not covered, but then we should not really need
RELKIND_NULL, no?
--
Michael

Attachment

signature.asc

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Mark Dilger

Date:

11 July 2020, 16:23:34


> On Jul 10, 2020, at 9:44 PM, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 08, 2020 at 10:00:47PM +0900, Michael Paquier wrote:
>> There are two patches on this thread v2-0001 being much smaller than
>> v2-0002.  I have looked at 0001 for now, and, like Alvaro, this
>> renaming makes sense to me.  Those commands work on objects that are
>> relkinds, except for one OBJECT_TYPE.  So, let's get 0001 patch
>> merged.  Any objections from others?
>
> I have been through this one again and applied it as cc35d89.
> --
> Michael

Thanks for committing!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Mark Dilger

Date:

11 July 2020, 19:14:11

> On Jul 10, 2020, at 11:00 PM, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 01, 2020 at 05:04:19PM -0700, Mark Dilger wrote:
>> Most of the work in this patch is mechanical replacement of if/else
>> if/else statements which hinge on relkind to switch statements on
>> relkind.  The patch is not philosophically very interesting, but it
>> is fairly long.  Reviewers might start by scrolling down the patch
>> to the changes in src/include/catalog/pg_class.h
>>
>> There are no intentional behavioral changes in this patch.
>
> Please note that 0002 does not apply anymore, there are conflicts in
> heap.c and tablecmds.c, and that there are noise diffs, likely because
> you ran pgindent and included places unrelated to this patch.

I can resubmit, but should like to address your second point before bothering...

> Anyway,
> I am not really a fan of this patch.  I could see a benefit in
> switching to an enum so as for places where we use a switch/case
> without a default we would be warned if a new relkind gets added or if
> a value is not covered, but then we should not really need
> RELKIND_NULL, no?

There are code paths where relkind is sometimes '\0' under normal, non-exceptional conditions.  This happens in

    allpaths.c: set_append_rel_size
    rewriteHandler.c: view_query_is_auto_updatable
    lockcmds.c: LockViewRecurse_walker
    pg_depend.c: getOwnedSequences_internal

Doesn't this justify having RELKIND_NULL in the enum?

It is not the purpose of this patch to change the behavior of the code.  This is just a structural patch, using an enum
andswitches rather than char and if/else if/else blocks. 

Subsequent patches could build on this work, such as changing the behavior when code encounters a relkind value outside
thecode's expected set of relkind values.  Whether those patches would add Assert()s, elog()s, or ereport()s is not
somethingI'd like to have to debate as part of this patch submission.  Assert()s have the advantage of costing nothing
inproduction builds, but elog()s have the advantage of protecting against corrupt relkind values at runtime in
production.

Getting the compiler to warn when a new relkind is added to the enumeration but not handled in a switch is difficult.
Onestrategy is to add -Wswitch-enum, but that would require refactoring switches over all enums, not just over the
RelKindenum, and for some enums, that would require a large number of extra lines to be added to the code.  Another
strategyis to remove the default label from switches over RelKind, but that removes protections against invalid
relkindsbeing encountered. 

Do you have a preference about which directions I should pursue?  Or do you think the patch idea itself is dead?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Tom Lane

Date:

11 July 2020, 19:32:55

Mark Dilger <mark.dilger@enterprisedb.com> writes:
> On Jul 10, 2020, at 11:00 PM, Michael Paquier <michael@paquier.xyz> wrote:
>> I am not really a fan of this patch.  I could see a benefit in
>> switching to an enum so as for places where we use a switch/case
>> without a default we would be warned if a new relkind gets added or if
>> a value is not covered, but then we should not really need
>> RELKIND_NULL, no?

> There are code paths where relkind is sometimes '\0' under normal, non-exceptional conditions.  This happens in

>     allpaths.c: set_append_rel_size
>     rewriteHandler.c: view_query_is_auto_updatable
>     lockcmds.c: LockViewRecurse_walker
>     pg_depend.c: getOwnedSequences_internal

> Doesn't this justify having RELKIND_NULL in the enum?

I'd say no.  I think including an intentionally invalid value in such
an enum is horrid, mainly because it will force a lot of places to cover
that value when they shouldn't (or else draw "enum value not handled in
switch" warnings).  The confusion factor about whether it maybe *is*
a valid value is not to be discounted, either.

If we can't readily get rid of the use of '\0' in these code paths,
maybe trying to convert to an enum isn't going to be a win after all.

> Getting the compiler to warn when a new relkind is added to the
> enumeration but not handled in a switch is difficult.

We already have a project policy about how to do that.

            regards, tom lane

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Michael Paquier

Date:

12 July 2020, 11:59:39

On Sat, Jul 11, 2020 at 03:32:55PM -0400, Tom Lane wrote:
> Mark Dilger <mark.dilger@enterprisedb.com> writes:
>> There are code paths where relkind is sometimes '\0' under normal,
>> non-exceptional conditions.  This happens in
>>
>>     allpaths.c: set_append_rel_size
>>     rewriteHandler.c: view_query_is_auto_updatable
>>     lockcmds.c: LockViewRecurse_walker
>>     pg_depend.c: getOwnedSequences_internal

There are more code paths than what's mentioned upthread when it comes
to relkinds and \0.  For example, I can quickly grep for acl.c that
relies on get_rel_relkind() returning \0 when the relkind cannot be
found.  And we do that for get_typtype() as well in the syscache.

>> Doesn't this justify having RELKIND_NULL in the enum?
>
> I'd say no.  I think including an intentionally invalid value in such
> an enum is horrid, mainly because it will force a lot of places to cover
> that value when they shouldn't (or else draw "enum value not handled in
> switch" warnings).  The confusion factor about whether it maybe *is*
> a valid value is not to be discounted, either.

I agree here that the situation could be improved because we never
store this value in the catalogs.  Perhaps there would be a benefit in
switching to an enum in the long run, I am not sure.  But if we do so,
RELKIND_NULL should not be around.
--
Michael

Attachment

signature.asc

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Mark Dilger

Date:

14 July 2020, 22:28:27


> On Jul 12, 2020, at 4:59 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Jul 11, 2020 at 03:32:55PM -0400, Tom Lane wrote:
>> Mark Dilger <mark.dilger@enterprisedb.com> writes:
>>> There are code paths where relkind is sometimes '\0' under normal,
>>> non-exceptional conditions.  This happens in
>>>
>>>     allpaths.c: set_append_rel_size
>>>     rewriteHandler.c: view_query_is_auto_updatable
>>>     lockcmds.c: LockViewRecurse_walker
>>>     pg_depend.c: getOwnedSequences_internal
>
> There are more code paths than what's mentioned upthread when it comes
> to relkinds and \0.  For example, I can quickly grep for acl.c that
> relies on get_rel_relkind() returning \0 when the relkind cannot be
> found.  And we do that for get_typtype() as well in the syscache.

I was thinking about places in the code that test a relkind variable against a list of values, rather than places that
returna relkind to callers, though certainly those two things are related.  It's possible that I've missed some places
inthe code where \0 might be encountered, but I've added Asserts against unexpected values in v3. 

I left get_rel_relkind() as is.  There does not seem to be anything wrong with it returning \0 as long as all callers
areprepared to deal with that result. 

>
>>> Doesn't this justify having RELKIND_NULL in the enum?
>>
>> I'd say no.  I think including an intentionally invalid value in such
>> an enum is horrid, mainly because it will force a lot of places to cover
>> that value when they shouldn't (or else draw "enum value not handled in
>> switch" warnings).  The confusion factor about whether it maybe *is*
>> a valid value is not to be discounted, either.
>
> I agree here that the situation could be improved because we never
> store this value in the catalogs.  Perhaps there would be a benefit in
> switching to an enum in the long run, I am not sure.  But if we do so,
> RELKIND_NULL should not be around.

In the v3 patch, I have removed RELKIND_NULL from the enum, and also removed default: labels from switches over
RelKind. The patch is also rebased. 



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

v3-0001-Refactoring-relkind-handling.patch

Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

From

Alvaro Herrera

Date:

14 July 2020, 23:12:25

This patch is way too large.  Probably in part explained by the fact
that you seem to have run pgindent and absorbed a lot of unrelated
changes.  This makes the patch essentially unreviewable.

I think you should define a RelationGetRelkind() static function that
returns the appropriate datatype without requiring a cast and assert in
every single place that processes a relation's relkind.  Similarly
you've chosen to leave get_rel_relkind untouched, but that seems unwise.

I think the chr_ macros are pointless.

Reading back the thread, it seems that the whole point of your patch was
to change the tests that currently use 'if' tests to switch blocks.  I
cannot understand what's the motivation for that, but it appears to me
that the approach is backwards: I'd counsel to *first* change the APIs
(get_rel_relkind and defining an enum, plus adding RelationGetRelkind)
so that everything is more sensible and safe, including appropriate
answers for the places where an "invalid" relkind is returned; and once
that's in place, replace if tests with switch blocks where it makes
sense to do so.

Also, I suggest that this thread is not a good one for this patch.
Subject is entirely not appropriate.  Open a new thread perhaps?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Refactoring relkind as an enum

From

Mark Dilger

Date:

15 July 2020, 00:15:50

> On Jul 14, 2020, at 4:12 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> This patch is way too large.  Probably in part explained by the fact
> that you seem to have run pgindent and absorbed a lot of unrelated
> changes.  This makes the patch essentially unreviewable.

I did not run pgindent, but when changing

    if (relkind == RELKIND_INDEX)
    {
        /* foo */
    }

to

    switch (relkind)
    {
        case RELKIND_INDEX:
            /* foo */
    }

the indentation of /* foo */ changes.  For large foo, that results in a lot of lines.  There are also cases in the code
wherecomparisons of multiple variables are mixed together.  To split those out into switch/case statements I had to
rearrangesome of the code blocks. 

> I think you should define a RelationGetRelkind() static function that
> returns the appropriate datatype without requiring a cast and assert in
> every single place that processes a relation's relkind.  Similarly
> you've chosen to leave get_rel_relkind untouched, but that seems unwise.

I was well aware of how large the patch had gotten, and didn't want to add more....

> I think the chr_ macros are pointless.

Look more closely at the #define RelKindAsString(x) CppAsString2(chr_##x)

> Reading back the thread, it seems that the whole point of your patch was
> to change the tests that currently use 'if' tests to switch blocks.  I
> cannot understand what's the motivation for that,

There might not be sufficient motivation to make the patch worth doing.  The motivation was to leverage the project's
recentaddition of -Wswitch to make it easier to know which code needs updating when you add a new relkind.  That
doesn'thappen very often, but I was working towards that kind of thing, and thought this might be a good prerequisite
patchfor that work.  Stylistically, I also prefer 

+       switch ((RelKind) rel->rd_rel->relkind)
+       {
+               case RELKIND_RELATION:
+               case RELKIND_MATVIEW:
+               case RELKIND_TOASTVALUE:

over

-       if (rel->rd_rel->relkind == RELKIND_RELATION ||
-                 rel->rd_rel->relkind == RELKIND_MATVIEW ||
-                 rel->rd_rel->relkind == RELKIND_TOASTVALUE)

which is a somewhat common pattern in the code.  It takes less mental effort to see that only one variable is being
comparedagainst those three enum values.  In some cases, though not necessarily this exact example, it also *might*
saveduplicated work computing the variable, depending on the situation and what the compiler can optimize away. 

> but it appears to me
> that the approach is backwards: I'd counsel to *first* change the APIs
> (get_rel_relkind and defining an enum, plus adding RelationGetRelkind)
> so that everything is more sensible and safe, including appropriate
> answers for the places where an "invalid" relkind is returned;

Ok.

> and once
> that's in place, replace if tests with switch blocks where it makes
> sense to do so.

Ok.

>
> Also, I suggest that this thread is not a good one for this patch.
> Subject is entirely not appropriate.  Open a new thread perhaps?

I've changed the subject line.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company