Thread: small pg_dump code cleanup

small pg_dump code cleanup

From

Nathan Bossart

Date:

05 June 2024, 15:13:57

While reviewing Daniel's pg_dump patch [0], I was initially confused
because the return value of getTypes() isn't saved anywhere.  Once I found
commit 92316a4, I realized that data was actually stored in a separate hash
table.  In fact, many of the functions in this area don't actually need to
return anything, so we can trim some code and hopefully reduce confusion a
bit.  Patch attached.

[0] https://postgr.es/m/8F1F1E1D-D17B-4B33-B014-EDBCD15F3F0B%40yesql.se

-- 
nathan

Attachment

v1-0001-Trim-some-unnecessary-pg_dump-code.patch

Re: small pg_dump code cleanup

From

Neil Conway

Date:

05 June 2024, 16:22:03

On Wed, Jun 5, 2024 at 11:14 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

In fact, many of the functions in this area don't actually need to

return anything, so we can trim some code and hopefully reduce confusion a
bit. Patch attached.

Nice cleanup! Two minor comments:

(1) Names like `getXXX` for these functions suggest to me that they return a value, rather than side-effecting. I realize some variants continue to return a value, but the majority no longer do. Perhaps a name like lookupXXX() or readXXX() would be clearer?

(2) These functions malloc() a single ntups * sizeof(struct) allocation and then index into it to fill-in each struct before entering it into the hash table. It might be more straightforward to just malloc each individual struct.

Neil

Re: small pg_dump code cleanup

From

Nathan Bossart

Date:

05 June 2024, 16:37:54

On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote:
> Nice cleanup! Two minor comments:

Thanks for taking a look.

> (1) Names like `getXXX` for these functions suggest to me that they return
> a value, rather than side-effecting. I realize some variants continue to
> return a value, but the majority no longer do. Perhaps a name like
> lookupXXX() or readXXX() would be clearer?

What about collectXXX() to match similar functions in pg_dump.c (e.g.,
collectRoleNames(), collectComments(), collectSecLabels())?

> (2) These functions malloc() a single ntups * sizeof(struct) allocation and
> then index into it to fill-in each struct before entering it into the hash
> table. It might be more straightforward to just malloc each individual
> struct.

That'd increase the number of allocations quite significantly, but I'd be
surprised if that was noticeable outside of extreme scenarios.  At the
moment, I'm inclined to leave these as-is for this reason and because I
doubt it'd result in much cleanup, but I'll yield to the majority opinion
here.

-- 
nathan

Re: small pg_dump code cleanup

From

Neil Conway

Date:

05 June 2024, 17:51:08

On Wed, Jun 5, 2024 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

What about collectXXX() to match similar functions in pg_dump.c (e.g.,
collectRoleNames(), collectComments(), collectSecLabels())?

sgtm.

> (2) These functions malloc() a single ntups * sizeof(struct) allocation and
> then index into it to fill-in each struct before entering it into the hash
> table. It might be more straightforward to just malloc each individual
> struct.

That'd increase the number of allocations quite significantly, but I'd be
surprised if that was noticeable outside of extreme scenarios. At the
moment, I'm inclined to leave these as-is for this reason and because I
doubt it'd result in much cleanup, but I'll yield to the majority opinion
here.

As you say, I'd be surprised if the performance difference is noticeable. Personally I don't think the marginal performance win justifies the hit to readability, but I don't feel strongly about it.

Neil

Re: small pg_dump code cleanup

From

Tom Lane

Date:

05 June 2024, 17:58:54

Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote:
>> (1) Names like `getXXX` for these functions suggest to me that they return
>> a value, rather than side-effecting. I realize some variants continue to
>> return a value, but the majority no longer do. Perhaps a name like
>> lookupXXX() or readXXX() would be clearer?

> What about collectXXX() to match similar functions in pg_dump.c (e.g.,
> collectRoleNames(), collectComments(), collectSecLabels())?

Personally I see nothing much wrong with leaving them as getXXX.

>> (2) These functions malloc() a single ntups * sizeof(struct) allocation and
>> then index into it to fill-in each struct before entering it into the hash
>> table. It might be more straightforward to just malloc each individual
>> struct.

> That'd increase the number of allocations quite significantly, but I'd be
> surprised if that was noticeable outside of extreme scenarios.  At the
> moment, I'm inclined to leave these as-is for this reason and because I
> doubt it'd result in much cleanup, but I'll yield to the majority opinion
> here.

I think that would be quite an invasive change; it would require
many hundreds of edits like

-        finfo[i].dobj.objType = DO_FUNC;
+        finfo->dobj.objType = DO_FUNC;

which aside from being tedious would create a back-patching hazard.
So I'm kind of -0.1 or so.

Another angle to this is that Coverity and possibly other tools tend
to report that these functions leak these allocations, apparently
because they don't notice that pointers into the allocations get
stored in hash tables by a subroutine.  I'm not sure if making this
change would make that worse or better.  If we really want to change
it, that might be worth checking somehow before we jump.

            regards, tom lane

Re: small pg_dump code cleanup

From

Nathan Bossart

Date:

11 June 2024, 20:30:14

On Wed, Jun 05, 2024 at 01:58:54PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote:
>>> (2) These functions malloc() a single ntups * sizeof(struct) allocation and
>>> then index into it to fill-in each struct before entering it into the hash
>>> table. It might be more straightforward to just malloc each individual
>>> struct.
> 
>> That'd increase the number of allocations quite significantly, but I'd be
>> surprised if that was noticeable outside of extreme scenarios.  At the
>> moment, I'm inclined to leave these as-is for this reason and because I
>> doubt it'd result in much cleanup, but I'll yield to the majority opinion
>> here.
> 
> I think that would be quite an invasive change; it would require
> many hundreds of edits like
> 
> -        finfo[i].dobj.objType = DO_FUNC;
> +        finfo->dobj.objType = DO_FUNC;
> 
> which aside from being tedious would create a back-patching hazard.
> So I'm kind of -0.1 or so.
> 
> Another angle to this is that Coverity and possibly other tools tend
> to report that these functions leak these allocations, apparently
> because they don't notice that pointers into the allocations get
> stored in hash tables by a subroutine.  I'm not sure if making this
> change would make that worse or better.  If we really want to change
> it, that might be worth checking somehow before we jump.

At the moment, I'm inclined to commit v1 once v18 development opens up.  We
can consider any additional adjustments separately.

-- 
nathan

Re: small pg_dump code cleanup

From

Daniel Gustafsson

Date:

01 July 2024, 15:08:12

> On 11 Jun 2024, at 22:30, Nathan Bossart <nathandbossart@gmail.com> wrote:

> At the moment, I'm inclined to commit v1 once v18 development opens up.  We
> can consider any additional adjustments separately.

Patch LGTM and the tests pass, +1 on pushing this version.

--
Daniel Gustafsson

Re: small pg_dump code cleanup

From

Nathan Bossart

Date:

02 July 2024, 16:27:00

Committed.

-- 
nathan