Thread: Splitting up guc.c
Here's a WIP stab at the project Andres mentioned [1] of splitting up guc.c into smaller files. As things stand here, we have: 1. guc.c: the core GUC machinery. 2. guc_tables.c: the data arrays, and some previously-exposed constant tables. guc_tables.h can now be considered the associated header. 3. guc_hooks.c: (most of) the per-variable check/assign/show hooks that had been in guc.c. guc_hooks.h declares these. File sizes are like so: $ wc guc*c 2629 9372 69467 guc-file.c 7422 25136 202284 guc.c 939 2693 22915 guc_hooks.c 4877 13163 126769 guc_tables.c 15867 50364 421435 total $ size guc*o text data bss dec hex filename 13653 4 112 13769 35c9 guc-file.o 54953 0 564 55517 d8dd guc.o 6951 0 112 7063 1b97 guc_hooks.o 43570 62998 216 106784 1a120 guc_tables.o I'm fairly happy with the way things turned out in guc.c and guc_tables.c, but I don't much like guc_hooks.c. I think instead of creating such a file, what we should do is to shove most of those functions into whatever module the GUC variable is associated with. (Perhaps commands/variable.c could absorb any stragglers that lack a better home.) I made a start at that for wal_consistency_checking and the syslog parameters, but haven't gone further yet. Before proceeding further, I wanted to ask for comments on a design choice that might be controversial. Even though I don't want to invent guc_hooks.c, I think we *should* invent guc_hooks.h, and consolidate all the GUC hook function declarations there. The point would be to not have to #include guc.h in headers of unrelated modules. This is similar to what we've done with utils/fmgrprotos.h, though the motivation is different. I already moved a few declarations from guc.h to there (and in consequence had to adjust #includes in the modules defining those hooks), but there's a lot more to be done if we apply that policy across the board. Does anybody think that's a bad approach, or have a better one? BTW, this is more or less orthogonal to my other GUC patch at [2], although both lead to the conclusion that we need to export guc_malloc and friends. regards, tom lane [1] https://www.postgresql.org/message-id/20220905233233.jhcu5jqsrtosmgh5%40awork3.anarazel.de [2] https://www.postgresql.org/message-id/flat/2982579.1662416866%40sss.pgh.pa.us
Attachment
Hi, On 2022-09-10 15:04:59 -0400, Tom Lane wrote: > Here's a WIP stab at the project Andres mentioned [1] of splitting up > guc.c into smaller files. Cool! > As things stand here, we have: > > 1. guc.c: the core GUC machinery. > 2. guc_tables.c: the data arrays, and some previously-exposed constant > tables. guc_tables.h can now be considered the associated header. > 3. guc_hooks.c: (most of) the per-variable check/assign/show hooks > that had been in guc.c. guc_hooks.h declares these. > > File sizes are like so: > > $ wc guc*c > 2629 9372 69467 guc-file.c > 7422 25136 202284 guc.c > 939 2693 22915 guc_hooks.c > 4877 13163 126769 guc_tables.c > 15867 50364 421435 total > $ size guc*o > text data bss dec hex filename > 13653 4 112 13769 35c9 guc-file.o > 54953 0 564 55517 d8dd guc.o > 6951 0 112 7063 1b97 guc_hooks.o > 43570 62998 216 106784 1a120 guc_tables.o A tad surprised by the text size of guc_tables.o - not that it is a problem, just seems a bit odd. > I'm fairly happy with the way things turned out in guc.c and > guc_tables.c, but I don't much like guc_hooks.c. I think instead of > creating such a file, what we should do is to shove most of those > functions into whatever module the GUC variable is associated with. +1. I think our associated habit of declaring externs in multiple .c files isn't great either. > Before proceeding further, I wanted to ask for comments on a design > choice that might be controversial. Even though I don't want to > invent guc_hooks.c, I think we *should* invent guc_hooks.h, and > consolidate all the GUC hook function declarations there. The > point would be to not have to #include guc.h in headers of unrelated > modules. This is similar to what we've done with utils/fmgrprotos.h, > though the motivation is different. I already moved a few declarations > from guc.h to there (and in consequence had to adjust #includes in > the modules defining those hooks), but there's a lot more to be done > if we apply that policy across the board. Does anybody think that's > a bad approach, or have a better one? Hm, I'm not opposed, the reasoning makes sense to me. How would this interact with the declaration of the variables underlying GUCs? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2022-09-10 15:04:59 -0400, Tom Lane wrote: >> $ size guc*o >> text data bss dec hex filename >> 13653 4 112 13769 35c9 guc-file.o >> 54953 0 564 55517 d8dd guc.o >> 6951 0 112 7063 1b97 guc_hooks.o >> 43570 62998 216 106784 1a120 guc_tables.o > A tad surprised by the text size of guc_tables.o - not that it is a problem, > just seems a bit odd. There's a pretty fair number of constant tables that got moved to there. Not to mention all the constant strings. >> Before proceeding further, I wanted to ask for comments on a design >> choice that might be controversial. Even though I don't want to >> invent guc_hooks.c, I think we *should* invent guc_hooks.h, and >> consolidate all the GUC hook function declarations there. The >> point would be to not have to #include guc.h in headers of unrelated >> modules. This is similar to what we've done with utils/fmgrprotos.h, >> though the motivation is different. I already moved a few declarations >> from guc.h to there (and in consequence had to adjust #includes in >> the modules defining those hooks), but there's a lot more to be done >> if we apply that policy across the board. Does anybody think that's >> a bad approach, or have a better one? > Hm, I'm not opposed, the reasoning makes sense to me. How would this interact > with the declaration of the variables underlying GUCs? I'd still declare the variables as we do now, ie just straightforwardly export them from the associated modules. Since they're all of native C types, they don't cause any inclusion-footprint issues. We could move their declarations to a common file I guess, but I don't see any benefit. regards, tom lane
Hi, On 2022-09-10 12:15:33 -0700, Andres Freund wrote: > On 2022-09-10 15:04:59 -0400, Tom Lane wrote: > > As things stand here, we have: > > > > 1. guc.c: the core GUC machinery. > > 2. guc_tables.c: the data arrays, and some previously-exposed constant > > tables. guc_tables.h can now be considered the associated header. > > 3. guc_hooks.c: (most of) the per-variable check/assign/show hooks > > that had been in guc.c. guc_hooks.h declares these. > > > > File sizes are like so: > > > > $ wc guc*c > > 2629 9372 69467 guc-file.c > > 7422 25136 202284 guc.c > > 939 2693 22915 guc_hooks.c > > 4877 13163 126769 guc_tables.c > > 15867 50364 421435 total > > $ size guc*o > > text data bss dec hex filename > > 13653 4 112 13769 35c9 guc-file.o > > 54953 0 564 55517 d8dd guc.o > > 6951 0 112 7063 1b97 guc_hooks.o > > 43570 62998 216 106784 1a120 guc_tables.o > > A tad surprised by the text size of guc_tables.o - not that it is a problem, > just seems a bit odd. Looks like that's just size misgrouping some section. Built a guc_tables.o without debug information (that makes the output too complicated): $ size guc_tables_nodebug.o text data bss dec hex filename 40044 66868 344 107256 1a2f8 guc_tables_nodebug.o $ size --format=sysv guc_tables_nodebug.o guc_tables_nodebug.o : section size addr .text 0 0 .data 52 0 .bss 344 0 .rodata 40044 0 .data.rel.ro.local 3720 0 .data.rel.local 8 0 .data.rel 63088 0 .comment 31 0 .note.GNU-stack 0 0 Total 107287 For some reason size adds .roata to the size for text in the default berkeley style. Which is even documented: The Berkeley style output counts read only data in the "text" column, not in the "data" column, the "dec" and"hex" columns both display the sum of the "text", "data", and "bss" columns in decimal and hexadecimal respectively. Greetings, Andres Freund
I wrote: > Andres Freund <andres@anarazel.de> writes: >> On 2022-09-10 15:04:59 -0400, Tom Lane wrote: >>> $ size guc*o >>> text data bss dec hex filename >>> 13653 4 112 13769 35c9 guc-file.o >>> 54953 0 564 55517 d8dd guc.o >>> 6951 0 112 7063 1b97 guc_hooks.o >>> 43570 62998 216 106784 1a120 guc_tables.o >> A tad surprised by the text size of guc_tables.o - not that it is a problem, >> just seems a bit odd. > There's a pretty fair number of constant tables that got moved to there. > Not to mention all the constant strings. I forgot to include comparison numbers for HEAD: $ wc guc*c 2629 9372 69467 guc-file.c 13335 41584 356896 guc.c 15964 50956 426363 total $ size guc*o text data bss dec hex filename 13653 4 112 13769 35c9 guc-file.o 105848 63156 908 169912 297b8 guc.o This isn't completely apples-to-apples because of the few hook functions I'd moved to other places in v1, but you can see that the total text and data sizes didn't change much. It'd likely indicate a mistake if they had. (However, v1 does include const-ifying a few options tables that had somehow escaped being labeled that way, so the total data size did shrink a small amount.) regards, tom lane
On Sat, Sep 10, 2022 at 03:04:59PM -0400, Tom Lane wrote: > Before proceeding further, I wanted to ask for comments on a design > choice that might be controversial. Even though I don't want to > invent guc_hooks.c, I think we *should* invent guc_hooks.h, and > consolidate all the GUC hook function declarations there. The > point would be to not have to #include guc.h in headers of unrelated > modules. This is similar to what we've done with utils/fmgrprotos.h, > though the motivation is different. I already moved a few declarations > from guc.h to there (and in consequence had to adjust #includes in > the modules defining those hooks), but there's a lot more to be done > if we apply that policy across the board. Does anybody think that's > a bad approach, or have a better one? One part that I have found a bit strange lately about guc.c is that we have mix the core machinery with the SQL-callable parts. What do you think about the addition of a gucfuncs.c in src/backend/utils/adt/ to split things a bit more? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > One part that I have found a bit strange lately about guc.c is that we > have mix the core machinery with the SQL-callable parts. What do you > think about the addition of a gucfuncs.c in src/backend/utils/adt/ to > split things a bit more? I might be wrong, but I think the SQL-callable stuff makes use of some APIs that are currently private in guc.c. So we'd have to expose more API to make that possible. Maybe that wouldn't be a bad thing, but it seems to be getting beyond the original idea here. (Note I already had to expose find_option() in order to get the wal_consistency_checking stuff moved out.) It's not clear to me that "move the SQL-callable stuff" will end with a nice API boundary. regards, tom lane
I wrote: > Michael Paquier <michael@paquier.xyz> writes: >> One part that I have found a bit strange lately about guc.c is that we >> have mix the core machinery with the SQL-callable parts. What do you >> think about the addition of a gucfuncs.c in src/backend/utils/adt/ to >> split things a bit more? > I might be wrong, but I think the SQL-callable stuff makes use > of some APIs that are currently private in guc.c. So we'd have > to expose more API to make that possible. Maybe that wouldn't > be a bad thing, but it seems to be getting beyond the original > idea here. I tried this just to see, and it worked out better than I thought. The key extra idea is to also pull out the functions implementing the SET and SHOW commands, because (unsurprisingly) those are just about in the same place dependency-wise as the SQL functions, and they have some common subroutines. I had to export get_config_unit_name(), config_enum_get_options(), and _ShowOption() (here renamed to ShowGUCOption()) to make this work. That doesn't seem too awful. v2 attached does this, without any further relocation of hook functions as yet. I now see these file sizes: $ wc guc*c 2629 9372 69467 guc-file.c 6425 22282 176816 guc.c 1048 3005 26962 guc_funcs.c 939 2693 22915 guc_hooks.c 4877 13163 126769 guc_tables.c 15918 50515 422929 total $ size guc*o text data bss dec hex filename 13653 4 112 13769 35c9 guc-file.o 46589 0 564 47153 b831 guc.o 8509 0 0 8509 213d guc_funcs.o 6951 0 112 7063 1b97 guc_hooks.o 43570 62998 216 106784 1a120 guc_tables.o So this removes just about a thousand more lines from guc.c, which seems worth doing. regards, tom lane
Attachment
Here's a v3 that gets rid of guc_hooks.c in favor of moving the hook functions to related modules (though some did end up in variables.c for lack of a better idea). I also pushed all the hook function declarations to guc_hooks.h. Unsurprisingly, removal of guc.h #includes from header files led to discovery of some surprising indirect dependencies, notably a lot of places were evidently depending on indirect inclusions of array.h. I think this is code-complete at this point. I'd like to not sit on it too long, because it'll inevitably get side-swiped by additions of new GUCs. On the other hand, pushing it in the middle of a CF would presumably break other people's patches. Maybe push it at the end of this CF, to give people a month to rebase anything that's affected? regards, tom lane
Attachment
Hi, On 2022-09-11 18:31:41 -0400, Tom Lane wrote: > Here's a v3 that gets rid of guc_hooks.c in favor of moving the > hook functions to related modules (though some did end up in > variables.c for lack of a better idea). - a bit worried that in_hot_standby will be confusing due vs InHotStandby. I wonder if we could perhaps get rid of an underlying variable in cases where we really just need the GUC entry to trigger the show hook? - perhaps too annoying, but it'd be easier to review this if the function renaming were done in a preparatory patch - Are all those includes in guc_tables.c still necessary? I'd have assumed that more should be obsoleted by the introduction of guc_hooks.h? Although I guess many are just there for the variable's declaration? - It's a bit depressing that the GUC arrays aren't const, . But I guess that's better fixed separately. > I think this is code-complete at this point. I'd like to not > sit on it too long, because it'll inevitably get side-swiped > by additions of new GUCs. On the other hand, pushing it in > the middle of a CF would presumably break other people's patches. > Maybe push it at the end of this CF, to give people a month to > rebase anything that's affected? I think this is localized enough that asking people to manually resolve a conflict around adding a GUC entry wouldn't be asking for that much. And I think plenty changes might be automatically resolvable, despite the rename. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > - a bit worried that in_hot_standby will be confusing due vs InHotStandby. I > wonder if we could perhaps get rid of an underlying variable in cases where > we really just need the GUC entry to trigger the show hook? Yeah, that worried me too. We do need the variable because guc.c checks it directly, but let's use a less confusing name. in_hot_standby_guc, maybe? > - perhaps too annoying, but it'd be easier to review this if the function > renaming were done in a preparatory patch There were only a couple that I renamed, and I don't think any of them should be directly referenced by anything else. > - Are all those includes in guc_tables.c still necessary? The ones that are still there are necessary. I believe they're mostly pulling in variables that are GUC targets. > - It's a bit depressing that the GUC arrays aren't const, . But I guess that's > better fixed separately. Dunno that it'd be helpful, unless we separate the variable and constant parts of the structs. > I think this is localized enough that asking people to manually resolve a > conflict around adding a GUC entry wouldn't be asking for that much. And I > think plenty changes might be automatically resolvable, despite the rename. I wonder whether git will be able to figure out that this is mostly a code move. I would expect so for a straight file rename, but will that work when we're splitting the file 3 ways? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: >> I think this is localized enough that asking people to manually resolve a >> conflict around adding a GUC entry wouldn't be asking for that much. And I >> think plenty changes might be automatically resolvable, despite the rename. > > I wonder whether git will be able to figure out that this is mostly a > code move. I would expect so for a straight file rename, but will that > work when we're splitting the file 3 ways? Git can detect more complicated code movement (see the `--color-moved` option to `git diff`), but I'm not sure it's clever enough to realise that a change modifying a block of code that was moved in the meanwhile should be applied at the new destination. > regards, tom lane - ilmari
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes: > Git can detect more complicated code movement (see the `--color-moved` > option to `git diff`), but I'm not sure it's clever enough to realise > that a change modifying a block of code that was moved in the meanwhile > should be applied at the new destination. Yeah, I suspect people will have to manually reapply any changes in the GUC tables to guc_tables.c. That'll be the same amount of work for them whenever we commit this patch (unless theirs lands first, in which case I have to deal with it). The issue I think is whether it's politer to make that happen during a CF or between CFs. regards, tom lane
Hi, On 2022-09-12 21:12:03 +0100, Dagfinn Ilmari Mannsåker wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > >> I think this is localized enough that asking people to manually resolve a > >> conflict around adding a GUC entry wouldn't be asking for that much. And I > >> think plenty changes might be automatically resolvable, despite the rename. > > > > I wonder whether git will be able to figure out that this is mostly a > > code move. I would expect so for a straight file rename, but will that > > work when we're splitting the file 3 ways? > > Git can detect more complicated code movement (see the `--color-moved` > option to `git diff`), but I'm not sure it's clever enough to realise > that a change modifying a block of code that was moved in the meanwhile > should be applied at the new destination. It sometimes can for large code movements, but not in this case. I think because guc.c is more self-similar than guc_tables.c. Greetings, Andres Freund
On 2022-Sep-12, Tom Lane wrote: > Yeah, I suspect people will have to manually reapply any changes in > the GUC tables to guc_tables.c. That'll be the same amount of work > for them whenever we commit this patch (unless theirs lands first, > in which case I have to deal with it). The issue I think is > whether it's politer to make that happen during a CF or between > CFs. Personally I would prefer that this kind of thing is done quickly rather than delay it to some uncertain future. That way I can deal with it straight ahead rather than living with the anxiety that it will land later and I will have to deal with it then. I see no benefit in waiting. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2022-Sep-12, Tom Lane wrote: >> Yeah, I suspect people will have to manually reapply any changes in >> the GUC tables to guc_tables.c. That'll be the same amount of work >> for them whenever we commit this patch (unless theirs lands first, >> in which case I have to deal with it). The issue I think is >> whether it's politer to make that happen during a CF or between >> CFs. > Personally I would prefer that this kind of thing is done quickly rather > than delay it to some uncertain future. That way I can deal with it > straight ahead rather than living with the anxiety that it will land > later and I will have to deal with it then. I see no benefit in > waiting. Fair enough. I'm also not looking forward to having to rebase my patch over anybody else's GUC changes -- even just a new GUC would invalidate a thousand-line diff hunk, and I doubt that "git apply" would deal with that very helpfully. I'll go ahead and get this pushed. regards, tom lane