Thread: Splitting up guc.c

Splitting up guc.c

From

Tom Lane

Date:

10 September 2022, 19:04:59

Here's a WIP stab at the project Andres mentioned [1] of splitting up
guc.c into smaller files.  As things stand here, we have:

1. guc.c: the core GUC machinery.
2. guc_tables.c: the data arrays, and some previously-exposed constant
tables.  guc_tables.h can now be considered the associated header.
3. guc_hooks.c: (most of) the per-variable check/assign/show hooks
that had been in guc.c.  guc_hooks.h declares these.

File sizes are like so:

$ wc guc*c
  2629   9372  69467 guc-file.c
  7422  25136 202284 guc.c
   939   2693  22915 guc_hooks.c
  4877  13163 126769 guc_tables.c
 15867  50364 421435 total
$ size guc*o
   text    data     bss     dec     hex filename
  13653       4     112   13769    35c9 guc-file.o
  54953       0     564   55517    d8dd guc.o
   6951       0     112    7063    1b97 guc_hooks.o
  43570   62998     216  106784   1a120 guc_tables.o

I'm fairly happy with the way things turned out in guc.c and
guc_tables.c, but I don't much like guc_hooks.c.  I think instead of
creating such a file, what we should do is to shove most of those
functions into whatever module the GUC variable is associated with.
(Perhaps commands/variable.c could absorb any stragglers that lack
a better home.)  I made a start at that for wal_consistency_checking
and the syslog parameters, but haven't gone further yet.

Before proceeding further, I wanted to ask for comments on a design
choice that might be controversial.  Even though I don't want to
invent guc_hooks.c, I think we *should* invent guc_hooks.h, and
consolidate all the GUC hook function declarations there.  The
point would be to not have to #include guc.h in headers of unrelated
modules.  This is similar to what we've done with utils/fmgrprotos.h,
though the motivation is different.  I already moved a few declarations
from guc.h to there (and in consequence had to adjust #includes in
the modules defining those hooks), but there's a lot more to be done
if we apply that policy across the board.  Does anybody think that's
a bad approach, or have a better one?

BTW, this is more or less orthogonal to my other GUC patch at [2],
although both lead to the conclusion that we need to export
guc_malloc and friends.

            regards, tom lane

[1] https://www.postgresql.org/message-id/20220905233233.jhcu5jqsrtosmgh5%40awork3.anarazel.de
[2] https://www.postgresql.org/message-id/flat/2982579.1662416866%40sss.pgh.pa.us

Attachment

split-up-guc-code-1.patch.gz

Re: Splitting up guc.c

From

Andres Freund

Date:

10 September 2022, 19:15:33

Hi,

On 2022-09-10 15:04:59 -0400, Tom Lane wrote:
> Here's a WIP stab at the project Andres mentioned [1] of splitting up
> guc.c into smaller files.

Cool!


> As things stand here, we have:
> 
> 1. guc.c: the core GUC machinery.
> 2. guc_tables.c: the data arrays, and some previously-exposed constant
> tables.  guc_tables.h can now be considered the associated header.
> 3. guc_hooks.c: (most of) the per-variable check/assign/show hooks
> that had been in guc.c.  guc_hooks.h declares these.
> 
> File sizes are like so:
> 
> $ wc guc*c
>   2629   9372  69467 guc-file.c
>   7422  25136 202284 guc.c
>    939   2693  22915 guc_hooks.c
>   4877  13163 126769 guc_tables.c
>  15867  50364 421435 total
> $ size guc*o
>    text    data     bss     dec     hex filename
>   13653       4     112   13769    35c9 guc-file.o
>   54953       0     564   55517    d8dd guc.o
>    6951       0     112    7063    1b97 guc_hooks.o
>   43570   62998     216  106784   1a120 guc_tables.o

A tad surprised by the text size of guc_tables.o - not that it is a problem,
just seems a bit odd.


> I'm fairly happy with the way things turned out in guc.c and
> guc_tables.c, but I don't much like guc_hooks.c.  I think instead of
> creating such a file, what we should do is to shove most of those
> functions into whatever module the GUC variable is associated with.

+1. I think our associated habit of declaring externs in multiple .c files
isn't great either.



> Before proceeding further, I wanted to ask for comments on a design
> choice that might be controversial.  Even though I don't want to
> invent guc_hooks.c, I think we *should* invent guc_hooks.h, and
> consolidate all the GUC hook function declarations there.  The
> point would be to not have to #include guc.h in headers of unrelated
> modules.  This is similar to what we've done with utils/fmgrprotos.h,
> though the motivation is different.  I already moved a few declarations
> from guc.h to there (and in consequence had to adjust #includes in
> the modules defining those hooks), but there's a lot more to be done
> if we apply that policy across the board.  Does anybody think that's
> a bad approach, or have a better one?

Hm, I'm not opposed, the reasoning makes sense to me. How would this interact
with the declaration of the variables underlying GUCs?


Greetings,

Andres Freund

Re: Splitting up guc.c

From

Tom Lane

Date:

10 September 2022, 19:23:40

Andres Freund <andres@anarazel.de> writes:
> On 2022-09-10 15:04:59 -0400, Tom Lane wrote:
>> $ size guc*o
>> text     data     bss     dec     hex filename
>> 13653       4     112   13769    35c9 guc-file.o
>> 54953       0     564   55517    d8dd guc.o
>> 6951        0     112    7063    1b97 guc_hooks.o
>> 43570   62998     216  106784   1a120 guc_tables.o

> A tad surprised by the text size of guc_tables.o - not that it is a problem,
> just seems a bit odd.

There's a pretty fair number of constant tables that got moved to there.
Not to mention all the constant strings.

>> Before proceeding further, I wanted to ask for comments on a design
>> choice that might be controversial.  Even though I don't want to
>> invent guc_hooks.c, I think we *should* invent guc_hooks.h, and
>> consolidate all the GUC hook function declarations there.  The
>> point would be to not have to #include guc.h in headers of unrelated
>> modules.  This is similar to what we've done with utils/fmgrprotos.h,
>> though the motivation is different.  I already moved a few declarations
>> from guc.h to there (and in consequence had to adjust #includes in
>> the modules defining those hooks), but there's a lot more to be done
>> if we apply that policy across the board.  Does anybody think that's
>> a bad approach, or have a better one?

> Hm, I'm not opposed, the reasoning makes sense to me. How would this interact
> with the declaration of the variables underlying GUCs?

I'd still declare the variables as we do now, ie just straightforwardly
export them from the associated modules.  Since they're all of native
C types, they don't cause any inclusion-footprint issues.  We could move
their declarations to a common file I guess, but I don't see any benefit.

            regards, tom lane

Re: Splitting up guc.c

From

Andres Freund

Date:

10 September 2022, 19:24:40

Hi,

On 2022-09-10 12:15:33 -0700, Andres Freund wrote:
> On 2022-09-10 15:04:59 -0400, Tom Lane wrote:
> > As things stand here, we have:
> >
> > 1. guc.c: the core GUC machinery.
> > 2. guc_tables.c: the data arrays, and some previously-exposed constant
> > tables.  guc_tables.h can now be considered the associated header.
> > 3. guc_hooks.c: (most of) the per-variable check/assign/show hooks
> > that had been in guc.c.  guc_hooks.h declares these.
> >
> > File sizes are like so:
> >
> > $ wc guc*c
> >   2629   9372  69467 guc-file.c
> >   7422  25136 202284 guc.c
> >    939   2693  22915 guc_hooks.c
> >   4877  13163 126769 guc_tables.c
> >  15867  50364 421435 total
> > $ size guc*o
> >    text    data     bss     dec     hex filename
> >   13653       4     112   13769    35c9 guc-file.o
> >   54953       0     564   55517    d8dd guc.o
> >    6951       0     112    7063    1b97 guc_hooks.o
> >   43570   62998     216  106784   1a120 guc_tables.o
>
> A tad surprised by the text size of guc_tables.o - not that it is a problem,
> just seems a bit odd.

Looks like that's just size misgrouping some section. Built a guc_tables.o
without debug information (that makes the output too complicated):

$ size guc_tables_nodebug.o
   text    data     bss     dec     hex filename
  40044   66868     344  107256   1a2f8 guc_tables_nodebug.o

$ size --format=sysv guc_tables_nodebug.o
guc_tables_nodebug.o  :
section                size   addr
.text                     0      0
.data                    52      0
.bss                    344      0
.rodata               40044      0
.data.rel.ro.local     3720      0
.data.rel.local           8      0
.data.rel             63088      0
.comment                 31      0
.note.GNU-stack           0      0
Total                107287

For some reason size adds .roata to the size for text in the default berkeley
style. Which is even documented:

           The Berkeley style output counts read only data in the "text" column, not in the "data" column, the "dec"
and"hex" columns both display the sum
 
           of the "text", "data", and "bss" columns in decimal and hexadecimal respectively.

Greetings,

Andres Freund

Re: Splitting up guc.c

From

Tom Lane

Date:

10 September 2022, 19:35:05

I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2022-09-10 15:04:59 -0400, Tom Lane wrote:
>>> $ size guc*o
>>> text     data     bss     dec     hex filename
>>> 13653       4     112   13769    35c9 guc-file.o
>>> 54953       0     564   55517    d8dd guc.o
>>>  6951       0     112    7063    1b97 guc_hooks.o
>>> 43570   62998     216  106784   1a120 guc_tables.o

>> A tad surprised by the text size of guc_tables.o - not that it is a problem,
>> just seems a bit odd.

> There's a pretty fair number of constant tables that got moved to there.
> Not to mention all the constant strings.

I forgot to include comparison numbers for HEAD:

$ wc guc*c
  2629   9372  69467 guc-file.c
 13335  41584 356896 guc.c
 15964  50956 426363 total
$ size guc*o
   text    data     bss     dec     hex filename
  13653       4     112   13769    35c9 guc-file.o
 105848   63156     908  169912   297b8 guc.o

This isn't completely apples-to-apples because of the few
hook functions I'd moved to other places in v1, but you can
see that the total text and data sizes didn't change much.
It'd likely indicate a mistake if they had.  (However, v1
does include const-ifying a few options tables that had
somehow escaped being labeled that way, so the total data
size did shrink a small amount.)

            regards, tom lane

Re: Splitting up guc.c

From

Michael Paquier

Date:

11 September 2022, 00:43:08

On Sat, Sep 10, 2022 at 03:04:59PM -0400, Tom Lane wrote:
> Before proceeding further, I wanted to ask for comments on a design
> choice that might be controversial.  Even though I don't want to
> invent guc_hooks.c, I think we *should* invent guc_hooks.h, and
> consolidate all the GUC hook function declarations there.  The
> point would be to not have to #include guc.h in headers of unrelated
> modules.  This is similar to what we've done with utils/fmgrprotos.h,
> though the motivation is different.  I already moved a few declarations
> from guc.h to there (and in consequence had to adjust #includes in
> the modules defining those hooks), but there's a lot more to be done
> if we apply that policy across the board.  Does anybody think that's
> a bad approach, or have a better one?

One part that I have found a bit strange lately about guc.c is that we
have mix the core machinery with the SQL-callable parts.  What do you
think about the addition of a gucfuncs.c in src/backend/utils/adt/ to
split things a bit more?
--
Michael

Attachment

signature.asc

Re: Splitting up guc.c

From

Tom Lane

Date:

11 September 2022, 02:08:40

Michael Paquier <michael@paquier.xyz> writes:
> One part that I have found a bit strange lately about guc.c is that we
> have mix the core machinery with the SQL-callable parts.  What do you
> think about the addition of a gucfuncs.c in src/backend/utils/adt/ to
> split things a bit more?

I might be wrong, but I think the SQL-callable stuff makes use
of some APIs that are currently private in guc.c.  So we'd have
to expose more API to make that possible.  Maybe that wouldn't
be a bad thing, but it seems to be getting beyond the original
idea here.  (Note I already had to expose find_option() in
order to get the wal_consistency_checking stuff moved out.)
It's not clear to me that "move the SQL-callable stuff" will
end with a nice API boundary.

            regards, tom lane

Re: Splitting up guc.c

From

Tom Lane

Date:

11 September 2022, 17:52:55

I wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> One part that I have found a bit strange lately about guc.c is that we
>> have mix the core machinery with the SQL-callable parts.  What do you
>> think about the addition of a gucfuncs.c in src/backend/utils/adt/ to
>> split things a bit more?

> I might be wrong, but I think the SQL-callable stuff makes use
> of some APIs that are currently private in guc.c.  So we'd have
> to expose more API to make that possible.  Maybe that wouldn't
> be a bad thing, but it seems to be getting beyond the original
> idea here.

I tried this just to see, and it worked out better than I thought.
The key extra idea is to also pull out the functions implementing
the SET and SHOW commands, because (unsurprisingly) those are just
about in the same place dependency-wise as the SQL functions, and
they have some common subroutines.

I had to export get_config_unit_name(), config_enum_get_options(),
and _ShowOption() (here renamed to ShowGUCOption()) to make this
work.  That doesn't seem too awful.

v2 attached does this, without any further relocation of hook
functions as yet.  I now see these file sizes:

$ wc guc*c
  2629   9372  69467 guc-file.c
  6425  22282 176816 guc.c
  1048   3005  26962 guc_funcs.c
   939   2693  22915 guc_hooks.c
  4877  13163 126769 guc_tables.c
 15918  50515 422929 total
$ size guc*o
   text    data     bss     dec     hex filename
  13653       4     112   13769    35c9 guc-file.o
  46589       0     564   47153    b831 guc.o
   8509       0       0    8509    213d guc_funcs.o
   6951       0     112    7063    1b97 guc_hooks.o
  43570   62998     216  106784   1a120 guc_tables.o

So this removes just about a thousand more lines from guc.c,
which seems worth doing.

            regards, tom lane

Attachment

split-up-guc-code-2.patch.gz

Re: Splitting up guc.c

From

Tom Lane

Date:

11 September 2022, 22:31:41

Here's a v3 that gets rid of guc_hooks.c in favor of moving the
hook functions to related modules (though some did end up in
variables.c for lack of a better idea).  I also pushed all the
hook function declarations to guc_hooks.h.  Unsurprisingly,
removal of guc.h #includes from header files led to discovery
of some surprising indirect dependencies, notably a lot of places
were evidently depending on indirect inclusions of array.h.

I think this is code-complete at this point.  I'd like to not
sit on it too long, because it'll inevitably get side-swiped
by additions of new GUCs.  On the other hand, pushing it in
the middle of a CF would presumably break other people's patches.
Maybe push it at the end of this CF, to give people a month to
rebase anything that's affected?

            regards, tom lane

Attachment

split-up-guc-code-3.patch.gz

Re: Splitting up guc.c

From

Andres Freund

Date:

12 September 2022, 19:39:28

Hi,

On 2022-09-11 18:31:41 -0400, Tom Lane wrote:
> Here's a v3 that gets rid of guc_hooks.c in favor of moving the
> hook functions to related modules (though some did end up in
> variables.c for lack of a better idea).

- a bit worried that in_hot_standby will be confusing due vs InHotStandby. I
  wonder if we could perhaps get rid of an underlying variable in cases where
  we really just need the GUC entry to trigger the show hook?

- perhaps too annoying, but it'd be easier to review this if the function
  renaming were done in a preparatory patch

- Are all those includes in guc_tables.c still necessary? I'd have assumed
  that more should be obsoleted by the introduction of guc_hooks.h? Although I
  guess many are just there for the variable's declaration?

- It's a bit depressing that the GUC arrays aren't const, . But I guess that's
  better fixed separately.



> I think this is code-complete at this point.  I'd like to not
> sit on it too long, because it'll inevitably get side-swiped
> by additions of new GUCs.  On the other hand, pushing it in
> the middle of a CF would presumably break other people's patches.
> Maybe push it at the end of this CF, to give people a month to
> rebase anything that's affected?

I think this is localized enough that asking people to manually resolve a
conflict around adding a GUC entry wouldn't be asking for that much. And I
think plenty changes might be automatically resolvable, despite the rename.

Greetings,

Andres Freund

Re: Splitting up guc.c

From

Tom Lane

Date:

12 September 2022, 19:46:10

Andres Freund <andres@anarazel.de> writes:
> - a bit worried that in_hot_standby will be confusing due vs InHotStandby. I
>   wonder if we could perhaps get rid of an underlying variable in cases where
>   we really just need the GUC entry to trigger the show hook?

Yeah, that worried me too.  We do need the variable because guc.c checks
it directly, but let's use a less confusing name.  in_hot_standby_guc,
maybe?

> - perhaps too annoying, but it'd be easier to review this if the function
>   renaming were done in a preparatory patch

There were only a couple that I renamed, and I don't think any of them
should be directly referenced by anything else.

> - Are all those includes in guc_tables.c still necessary?

The ones that are still there are necessary.  I believe they're mostly
pulling in variables that are GUC targets.

> - It's a bit depressing that the GUC arrays aren't const, . But I guess that's
>   better fixed separately.

Dunno that it'd be helpful, unless we separate the variable and constant
parts of the structs.

> I think this is localized enough that asking people to manually resolve a
> conflict around adding a GUC entry wouldn't be asking for that much. And I
> think plenty changes might be automatically resolvable, despite the rename.

I wonder whether git will be able to figure out that this is mostly a
code move.  I would expect so for a straight file rename, but will that
work when we're splitting the file 3 ways?

            regards, tom lane

Re: Splitting up guc.c

From

Dagfinn Ilmari Mannsåker

Date:

12 September 2022, 20:12:03

Tom Lane <tgl@sss.pgh.pa.us> writes:

>> I think this is localized enough that asking people to manually resolve a
>> conflict around adding a GUC entry wouldn't be asking for that much. And I
>> think plenty changes might be automatically resolvable, despite the rename.
>
> I wonder whether git will be able to figure out that this is mostly a
> code move.  I would expect so for a straight file rename, but will that
> work when we're splitting the file 3 ways?

Git can detect more complicated code movement (see the `--color-moved`
option to `git diff`), but I'm not sure it's clever enough to realise
that a change modifying a block of code that was moved in the meanwhile
should be applied at the new destination.

>             regards, tom lane

- ilmari

Re: Splitting up guc.c

From

Tom Lane

Date:

12 September 2022, 20:20:26

=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:
> Git can detect more complicated code movement (see the `--color-moved`
> option to `git diff`), but I'm not sure it's clever enough to realise
> that a change modifying a block of code that was moved in the meanwhile
> should be applied at the new destination.

Yeah, I suspect people will have to manually reapply any changes in
the GUC tables to guc_tables.c.  That'll be the same amount of work
for them whenever we commit this patch (unless theirs lands first,
in which case I have to deal with it).  The issue I think is
whether it's politer to make that happen during a CF or between
CFs.

            regards, tom lane

Re: Splitting up guc.c

From

Andres Freund

Date:

12 September 2022, 20:49:27

Re: Splitting up guc.c

From

Andres Freund

Date:

12 September 2022, 20:50:24

Hi,

On 2022-09-12 21:12:03 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> >> I think this is localized enough that asking people to manually resolve a
> >> conflict around adding a GUC entry wouldn't be asking for that much. And I
> >> think plenty changes might be automatically resolvable, despite the rename.
> >
> > I wonder whether git will be able to figure out that this is mostly a
> > code move.  I would expect so for a straight file rename, but will that
> > work when we're splitting the file 3 ways?
> 
> Git can detect more complicated code movement (see the `--color-moved`
> option to `git diff`), but I'm not sure it's clever enough to realise
> that a change modifying a block of code that was moved in the meanwhile
> should be applied at the new destination.

It sometimes can for large code movements, but not in this case. I think
because guc.c is more self-similar than guc_tables.c.

Greetings,

Andres Freund

Re: Splitting up guc.c

From

Alvaro Herrera

Date:

13 September 2022, 09:10:00

On 2022-Sep-12, Tom Lane wrote:

> Yeah, I suspect people will have to manually reapply any changes in
> the GUC tables to guc_tables.c.  That'll be the same amount of work
> for them whenever we commit this patch (unless theirs lands first,
> in which case I have to deal with it).  The issue I think is
> whether it's politer to make that happen during a CF or between
> CFs.

Personally I would prefer that this kind of thing is done quickly rather
than delay it to some uncertain future.  That way I can deal with it
straight ahead rather than living with the anxiety that it will land
later and I will have to deal with it then.  I see no benefit in
waiting.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Re: Splitting up guc.c

From

Tom Lane

Date:

13 September 2022, 14:05:55

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2022-Sep-12, Tom Lane wrote:
>> Yeah, I suspect people will have to manually reapply any changes in
>> the GUC tables to guc_tables.c.  That'll be the same amount of work
>> for them whenever we commit this patch (unless theirs lands first,
>> in which case I have to deal with it).  The issue I think is
>> whether it's politer to make that happen during a CF or between
>> CFs.

> Personally I would prefer that this kind of thing is done quickly rather
> than delay it to some uncertain future.  That way I can deal with it
> straight ahead rather than living with the anxiety that it will land
> later and I will have to deal with it then.  I see no benefit in
> waiting.

Fair enough.  I'm also not looking forward to having to rebase my
patch over anybody else's GUC changes -- even just a new GUC would
invalidate a thousand-line diff hunk, and I doubt that "git apply"
would deal with that very helpfully.  I'll go ahead and get this
pushed.

            regards, tom lane