Thread: small pg_dump code cleanup
While reviewing Daniel's pg_dump patch [0], I was initially confused because the return value of getTypes() isn't saved anywhere. Once I found commit 92316a4, I realized that data was actually stored in a separate hash table. In fact, many of the functions in this area don't actually need to return anything, so we can trim some code and hopefully reduce confusion a bit. Patch attached. [0] https://postgr.es/m/8F1F1E1D-D17B-4B33-B014-EDBCD15F3F0B%40yesql.se -- nathan
Attachment
On Wed, Jun 5, 2024 at 11:14 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
In fact, many of the functions in this area don't actually need to
return anything, so we can trim some code and hopefully reduce confusion a
bit. Patch attached.
Nice cleanup! Two minor comments:
(1) Names like `getXXX` for these functions suggest to me that they return a value, rather than side-effecting. I realize some variants continue to return a value, but the majority no longer do. Perhaps a name like lookupXXX() or readXXX() would be clearer?
(2) These functions malloc() a single ntups * sizeof(struct) allocation and then index into it to fill-in each struct before entering it into the hash table. It might be more straightforward to just malloc each individual struct.
Neil
On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote: > Nice cleanup! Two minor comments: Thanks for taking a look. > (1) Names like `getXXX` for these functions suggest to me that they return > a value, rather than side-effecting. I realize some variants continue to > return a value, but the majority no longer do. Perhaps a name like > lookupXXX() or readXXX() would be clearer? What about collectXXX() to match similar functions in pg_dump.c (e.g., collectRoleNames(), collectComments(), collectSecLabels())? > (2) These functions malloc() a single ntups * sizeof(struct) allocation and > then index into it to fill-in each struct before entering it into the hash > table. It might be more straightforward to just malloc each individual > struct. That'd increase the number of allocations quite significantly, but I'd be surprised if that was noticeable outside of extreme scenarios. At the moment, I'm inclined to leave these as-is for this reason and because I doubt it'd result in much cleanup, but I'll yield to the majority opinion here. -- nathan
On Wed, Jun 5, 2024 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
What about collectXXX() to match similar functions in pg_dump.c (e.g.,
collectRoleNames(), collectComments(), collectSecLabels())?
sgtm.
> (2) These functions malloc() a single ntups * sizeof(struct) allocation and
> then index into it to fill-in each struct before entering it into the hash
> table. It might be more straightforward to just malloc each individual
> struct.
That'd increase the number of allocations quite significantly, but I'd be
surprised if that was noticeable outside of extreme scenarios. At the
moment, I'm inclined to leave these as-is for this reason and because I
doubt it'd result in much cleanup, but I'll yield to the majority opinion
here.
As you say, I'd be surprised if the performance difference is noticeable. Personally I don't think the marginal performance win justifies the hit to readability, but I don't feel strongly about it.
Neil
Nathan Bossart <nathandbossart@gmail.com> writes: > On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote: >> (1) Names like `getXXX` for these functions suggest to me that they return >> a value, rather than side-effecting. I realize some variants continue to >> return a value, but the majority no longer do. Perhaps a name like >> lookupXXX() or readXXX() would be clearer? > What about collectXXX() to match similar functions in pg_dump.c (e.g., > collectRoleNames(), collectComments(), collectSecLabels())? Personally I see nothing much wrong with leaving them as getXXX. >> (2) These functions malloc() a single ntups * sizeof(struct) allocation and >> then index into it to fill-in each struct before entering it into the hash >> table. It might be more straightforward to just malloc each individual >> struct. > That'd increase the number of allocations quite significantly, but I'd be > surprised if that was noticeable outside of extreme scenarios. At the > moment, I'm inclined to leave these as-is for this reason and because I > doubt it'd result in much cleanup, but I'll yield to the majority opinion > here. I think that would be quite an invasive change; it would require many hundreds of edits like - finfo[i].dobj.objType = DO_FUNC; + finfo->dobj.objType = DO_FUNC; which aside from being tedious would create a back-patching hazard. So I'm kind of -0.1 or so. Another angle to this is that Coverity and possibly other tools tend to report that these functions leak these allocations, apparently because they don't notice that pointers into the allocations get stored in hash tables by a subroutine. I'm not sure if making this change would make that worse or better. If we really want to change it, that might be worth checking somehow before we jump. regards, tom lane
On Wed, Jun 05, 2024 at 01:58:54PM -0400, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote: >>> (2) These functions malloc() a single ntups * sizeof(struct) allocation and >>> then index into it to fill-in each struct before entering it into the hash >>> table. It might be more straightforward to just malloc each individual >>> struct. > >> That'd increase the number of allocations quite significantly, but I'd be >> surprised if that was noticeable outside of extreme scenarios. At the >> moment, I'm inclined to leave these as-is for this reason and because I >> doubt it'd result in much cleanup, but I'll yield to the majority opinion >> here. > > I think that would be quite an invasive change; it would require > many hundreds of edits like > > - finfo[i].dobj.objType = DO_FUNC; > + finfo->dobj.objType = DO_FUNC; > > which aside from being tedious would create a back-patching hazard. > So I'm kind of -0.1 or so. > > Another angle to this is that Coverity and possibly other tools tend > to report that these functions leak these allocations, apparently > because they don't notice that pointers into the allocations get > stored in hash tables by a subroutine. I'm not sure if making this > change would make that worse or better. If we really want to change > it, that might be worth checking somehow before we jump. At the moment, I'm inclined to commit v1 once v18 development opens up. We can consider any additional adjustments separately. -- nathan
> On 11 Jun 2024, at 22:30, Nathan Bossart <nathandbossart@gmail.com> wrote: > At the moment, I'm inclined to commit v1 once v18 development opens up. We > can consider any additional adjustments separately. Patch LGTM and the tests pass, +1 on pushing this version. -- Daniel Gustafsson
Committed. -- nathan