Thread: make MaxBackends available in _PG_init

make MaxBackends available in _PG_init

From
"Wang, Shenhao"
Date:
Hi Hackers:

I find the value in function _PG_init, the value of MaxBackends is 0.
In source, I find that the postmaster will first load library, and then calculate the value of MaxBackends.

In the old version, the MaxBackends was calculated by:
     MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
        GetNumShmemAttachedBgworkers();
Because any extension can register workers which will affect the return value of GetNumShmemAttachedBgworkers.
InitializeMaxBackends must be called after shared_preload_libraries. This is also mentioned in comments.

Now, function GetNumShmemAttachedBgworkers was deleted and replaced by guc max_worker_processes,
so if we changed the calling order like:
    Step1: calling InitializeMaxBackends.
    Step2: calling process_shared_preload_libraries

In this order extension can get the correct value of MaxBackends in _PG_init.

Any thoughts?

Regards



Attachment

Re: make MaxBackends available in _PG_init

From
Bharath Rupireddy
Date:
On Mon, Sep 21, 2020 at 9:14 AM Wang, Shenhao
<wangsh.fnst@cn.fujitsu.com> wrote:
>
> In source, I find that the postmaster will first load library, and then calculate the value of MaxBackends.
>
> In the old version, the MaxBackends was calculated by:
>          MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
>                 GetNumShmemAttachedBgworkers();
> Because any extension can register workers which will affect the return value of GetNumShmemAttachedBgworkers.
> InitializeMaxBackends must be called after shared_preload_libraries. This is also mentioned in comments.
>
> Now, function GetNumShmemAttachedBgworkers was deleted and replaced by guc max_worker_processes,
> so if we changed the calling order like:
>         Step1: calling InitializeMaxBackends.
>         Step2: calling process_shared_preload_libraries
>

Yes, the GetNumShmemAttachedBgworkers() was removed by commit #
dfbba2c86cc8f09cf3ffca3d305b4ce54a7fb49a. ASAICS, changing the order
of InitializeMaxBackends() and process_shared_preload_libraries() has
no problem, as InitializeMaxBackends() doesn't calculate the
MaxBackends based on bgworker infra code, it does calculate based on
GUCs.

Having said that, I'm not quite sure whether any of the bgworker
registration code, for that matter process_shared_preload_libraries()
code path will somewhere use MaxBackends?

>
> In this order extension can get the correct value of MaxBackends in _PG_init.
>

Is there any specific use case that any of the _PG_init will use MaxBackends?

I think the InitializeMaxBackends() function comments still say as shown below.

 * This must be called after modules have had the chance to register background
 * workers in shared_preload_libraries, and before shared memory size is
 * determined.

What happens with your patch in EXEC_BACKEND cases? (On linux
EXEC_BACKEND (include/pg_config_manual.h) can be enabled for testing
purposes).

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 9/21/20, 4:52 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Mon, Sep 21, 2020 at 9:14 AM Wang, Shenhao
> <wangsh.fnst@cn.fujitsu.com> wrote:
>>
>> In source, I find that the postmaster will first load library, and then calculate the value of MaxBackends.
>>
>> In the old version, the MaxBackends was calculated by:
>>          MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
>>                 GetNumShmemAttachedBgworkers();
>> Because any extension can register workers which will affect the return value of GetNumShmemAttachedBgworkers.
>> InitializeMaxBackends must be called after shared_preload_libraries. This is also mentioned in comments.
>>
>> Now, function GetNumShmemAttachedBgworkers was deleted and replaced by guc max_worker_processes,
>> so if we changed the calling order like:
>>         Step1: calling InitializeMaxBackends.
>>         Step2: calling process_shared_preload_libraries
>>
>
> Yes, the GetNumShmemAttachedBgworkers() was removed by commit #
> dfbba2c86cc8f09cf3ffca3d305b4ce54a7fb49a. ASAICS, changing the order
> of InitializeMaxBackends() and process_shared_preload_libraries() has
> no problem, as InitializeMaxBackends() doesn't calculate the
> MaxBackends based on bgworker infra code, it does calculate based on
> GUCs.
>
> Having said that, I'm not quite sure whether any of the bgworker
> registration code, for that matter process_shared_preload_libraries()
> code path will somewhere use MaxBackends?
>
>>
>> In this order extension can get the correct value of MaxBackends in _PG_init.
>>
>
> Is there any specific use case that any of the _PG_init will use MaxBackends?

I just encountered the same thing, so I am bumping this thread.  I was
trying to use MaxBackends in a call to RequestAddinShmemSpace() in a
_PG_init() function for a module, but since MaxBackends is not yet
initialized, you essentially need to open-code InitializeMaxBackends()
instead.

I think the comments about needing to register background workers
before initializing MaxBackends have been incorrect since the addition
of max_worker_processes in v9.4 (6bc8ef0b).  Furthermore, I think the
suggested reordering is a good idea because it is not obvious that
MaxBackends will be uninitialized in _PG_init(), and use-cases like
the RequestAddinShmemSpace() one are not guaranteed to fail when
MaxBackends is used incorrectly (presumably due to the 100 KB buffer
added in CreateSharedMemoryAndSemaphores()).

I've attached a new version of the proposed patch with some slight
adjustments and an attempt at a commit message.

Nathan


Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Mon, Aug 2, 2021 at 2:18 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> I just encountered the same thing, so I am bumping this thread.  I was
> trying to use MaxBackends in a call to RequestAddinShmemSpace() in a
> _PG_init() function for a module, but since MaxBackends is not yet
> initialized, you essentially need to open-code InitializeMaxBackends()
> instead.
>
> I think the comments about needing to register background workers
> before initializing MaxBackends have been incorrect since the addition
> of max_worker_processes in v9.4 (6bc8ef0b).  Furthermore, I think the
> suggested reordering is a good idea because it is not obvious that
> MaxBackends will be uninitialized in _PG_init(), and use-cases like
> the RequestAddinShmemSpace() one are not guaranteed to fail when
> MaxBackends is used incorrectly (presumably due to the 100 KB buffer
> added in CreateSharedMemoryAndSemaphores()).
>
> I've attached a new version of the proposed patch with some slight
> adjustments and an attempt at a commit message.

I think this is a good idea. Anyone object?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Alvaro Herrera
Date:
On 2021-Aug-02, Robert Haas wrote:

> On Mon, Aug 2, 2021 at 2:18 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> >
> > I think the comments about needing to register background workers
> > before initializing MaxBackends have been incorrect since the addition
> > of max_worker_processes in v9.4 (6bc8ef0b).

> I think this is a good idea. Anyone object?

No objection here.  AFAICS Nathan is correct.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Los dioses no protegen a los insensatos.  Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)



Re: make MaxBackends available in _PG_init

From
Andres Freund
Date:
Hi,

On 2021-08-02 16:06:15 -0400, Robert Haas wrote:
> On Mon, Aug 2, 2021 at 2:18 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> > I just encountered the same thing, so I am bumping this thread.  I was
> > trying to use MaxBackends in a call to RequestAddinShmemSpace() in a
> > _PG_init() function for a module, but since MaxBackends is not yet
> > initialized, you essentially need to open-code InitializeMaxBackends()
> > instead.
> >
> > I think the comments about needing to register background workers
> > before initializing MaxBackends have been incorrect since the addition
> > of max_worker_processes in v9.4 (6bc8ef0b).  Furthermore, I think the
> > suggested reordering is a good idea because it is not obvious that
> > MaxBackends will be uninitialized in _PG_init(), and use-cases like
> > the RequestAddinShmemSpace() one are not guaranteed to fail when
> > MaxBackends is used incorrectly (presumably due to the 100 KB buffer
> > added in CreateSharedMemoryAndSemaphores()).
> >
> > I've attached a new version of the proposed patch with some slight
> > adjustments and an attempt at a commit message.
> 
> I think this is a good idea. Anyone object?

I'm not so sure it's a good idea. I've seen several shared_preload_library
using extensions that adjust some GUCs (e.g. max_prepared_transactions)
because they need some more resources internally - that's perhaps not a great
idea, but there's also not an obviously better way.

ISTM this would better be solved with making the hook or config logic for
libraries a bit more elaborate (e.g. having a hook you can register for that's
called once all libraries are loaded).

Greetings,

Andres Freund



Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 8/2/21, 1:37 PM, "Andres Freund" <andres@anarazel.de> wrote:
> On 2021-08-02 16:06:15 -0400, Robert Haas wrote:
>> I think this is a good idea. Anyone object?
>
> I'm not so sure it's a good idea. I've seen several shared_preload_library
> using extensions that adjust some GUCs (e.g. max_prepared_transactions)
> because they need some more resources internally - that's perhaps not a great
> idea, but there's also not an obviously better way.

Interesting.  I hadn't heard of extensions adjusting GUCs in
_PG_init() before.  I think the other way to handle that scenario is
to check the GUCs in _PG_init() and fail startup if necessary.
However, while it's probably good to avoid changing GUCs from what
users specified, failing startup isn't exactly user-friendly, either.
In any case, changing a GUC in _PG_init() seems quite risky.  You're
effectively bypassing all of the usual checks.

> ISTM this would better be solved with making the hook or config logic for
> libraries a bit more elaborate (e.g. having a hook you can register for that's
> called once all libraries are loaded).

My worry is that this requires even more specialized knowledge to get
things right.  If I need to do anything based on the value of a GUC,
I have to register another hook.  In this new hook, we'd likely want
to somehow prevent modules from further adjusting GUCs.  We'd also
need modules to check the GUCs they care about again in case another
module's _PG_init() adjusted it in an incompatible way.  If you detect
a problem in this new hook, you probably need to fail startup.

Perhaps I am making a mountain out of a molehill and the modules that
are adjusting GUCs in _PG_init() are doing so in a generally safe way,
but IMO it ought to ordinarily be discouraged.

Nathan


Re: make MaxBackends available in _PG_init

From
Andres Freund
Date:
Hi,

On 2021-08-02 21:57:18 +0000, Bossart, Nathan wrote:
> On 8/2/21, 1:37 PM, "Andres Freund" <andres@anarazel.de> wrote:
> > On 2021-08-02 16:06:15 -0400, Robert Haas wrote:
> >> I think this is a good idea. Anyone object?
> >
> > I'm not so sure it's a good idea. I've seen several shared_preload_library
> > using extensions that adjust some GUCs (e.g. max_prepared_transactions)
> > because they need some more resources internally - that's perhaps not a great
> > idea, but there's also not an obviously better way.
> 
> Interesting.  I hadn't heard of extensions adjusting GUCs in
> _PG_init() before.  I think the other way to handle that scenario is
> to check the GUCs in _PG_init() and fail startup if necessary.

The problem is that that makes it hard to configure things for the users own
needs. If e.g. the user's workload needs a certain number of prepared
transactions and the extension internally as well (as e.g. citus does), the
extension can't know if the configured number is "aimed" to be for the
extension, or for the user's own needs. If you instead just increase
max_prepared_transactions in _PG_init(), the story is different.


> However, while it's probably good to avoid changing GUCs from what
> users specified, failing startup isn't exactly user-friendly, either.
> In any case, changing a GUC in _PG_init() seems quite risky.  You're
> effectively bypassing all of the usual checks.

It doesn't need to bypass all - you can override them using GUC mechanisms at
that point.


> > ISTM this would better be solved with making the hook or config logic for
> > libraries a bit more elaborate (e.g. having a hook you can register for that's
> > called once all libraries are loaded).
> 
> My worry is that this requires even more specialized knowledge to get
> things right.  If I need to do anything based on the value of a GUC,
> I have to register another hook.  In this new hook, we'd likely want
> to somehow prevent modules from further adjusting GUCs.  We'd also
> need modules to check the GUCs they care about again in case another
> module's _PG_init() adjusted it in an incompatible way.

I think this is overblown. We already size resources *after*
shared_preload_libraries' _PG_init() runs, because that's the whole point of
shared_preload_libraries. What's proposed in this thread is to *disallow*
increasing resource usage in s_p_l _PG_init(), to make one specific case
simpler - but it'll actually also make things more complicated, because other
resources will still only be sized after all of s_p_l has been processed.


> If you detect a problem in this new hook, you probably need to fail startup.

Same is true for _PG_init().


> Perhaps I am making a mountain out of a molehill and the modules that
> are adjusting GUCs in _PG_init() are doing so in a generally safe way,
> but IMO it ought to ordinarily be discouraged.

It's not something I would recommend doing blithely - but I also don't think
we have a better answer for some cases.

Greetings,

Andres Freund



Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 8/2/21, 3:12 PM, "Andres Freund" <andres@anarazel.de> wrote:
> I think this is overblown. We already size resources *after*
> shared_preload_libraries' _PG_init() runs, because that's the whole point of
> shared_preload_libraries. What's proposed in this thread is to *disallow*
> increasing resource usage in s_p_l _PG_init(), to make one specific case
> simpler - but it'll actually also make things more complicated, because other
> resources will still only be sized after all of s_p_l has been processed.

True.  Perhaps the comments should reference the possibility that a
library will adjust resource usage to explain why
InitializeMaxBackends() is where it is.

Nathan


Re: make MaxBackends available in _PG_init

From
Andres Freund
Date:
Hi,

On 2021-08-02 22:35:13 +0000, Bossart, Nathan wrote:
> On 8/2/21, 3:12 PM, "Andres Freund" <andres@anarazel.de> wrote:
> > I think this is overblown. We already size resources *after*
> > shared_preload_libraries' _PG_init() runs, because that's the whole point of
> > shared_preload_libraries. What's proposed in this thread is to *disallow*
> > increasing resource usage in s_p_l _PG_init(), to make one specific case
> > simpler - but it'll actually also make things more complicated, because other
> > resources will still only be sized after all of s_p_l has been processed.
> 
> True.  Perhaps the comments should reference the possibility that a
> library will adjust resource usage to explain why
> InitializeMaxBackends() is where it is.

I've wondered, independent of this thread, about not making MaxBackends
externally visible, and requiring a function call to access it. It should be
easier to find cases of misuse if we errored out when accessed at the wrong
time. And we could use that opportunity to add flags that determine which
types of backends are included (e.g. not including autovac, or additionally
including aux workers or prepared xacts).

Greetings,

Andres Freund



Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 8/2/21, 3:42 PM, "Andres Freund" <andres@anarazel.de> wrote:
> I've wondered, independent of this thread, about not making MaxBackends
> externally visible, and requiring a function call to access it. It should be
> easier to find cases of misuse if we errored out when accessed at the wrong
> time. And we could use that opportunity to add flags that determine which
> types of backends are included (e.g. not including autovac, or additionally
> including aux workers or prepared xacts).

I'm not opposed to this.  I can work on putting a patch together if no
opposition materializes.

Nathan


RE: make MaxBackends available in _PG_init

From
"wangsh.fnst@fujitsu.com"
Date:
Hi,

Bossart, Nathan <bossartn@amazon.com> wrote:

> I just encountered the same thing, so I am bumping this thread.
Thank you for bumping this thread.

> > I've wondered, independent of this thread, about not making MaxBackends
> > externally visible, and requiring a function call to access it. It should be
> > easier to find cases of misuse if we errored out when accessed at the wrong
> > time. 

> I'm not opposed to this.  I can work on putting a patch together if no
> opposition materializes.

I think this is a good idea.

Shenhao Wang
Best regards

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Mon, Aug 2, 2021 at 4:37 PM Andres Freund <andres@anarazel.de> wrote:
> I'm not so sure it's a good idea. I've seen several shared_preload_library
> using extensions that adjust some GUCs (e.g. max_prepared_transactions)
> because they need some more resources internally - that's perhaps not a great
> idea, but there's also not an obviously better way.

Blech.

I'm fine with solving the problem some other way, but I say again, blech.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 8/2/21, 4:02 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 8/2/21, 3:42 PM, "Andres Freund" <andres@anarazel.de> wrote:
>> I've wondered, independent of this thread, about not making MaxBackends
>> externally visible, and requiring a function call to access it. It should be
>> easier to find cases of misuse if we errored out when accessed at the wrong
>> time. And we could use that opportunity to add flags that determine which
>> types of backends are included (e.g. not including autovac, or additionally
>> including aux workers or prepared xacts).
>
> I'm not opposed to this.  I can work on putting a patch together if no
> opposition materializes.

Here is a first attempt.

Nathan


Attachment

Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 8/3/21, 4:14 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 8/2/21, 4:02 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>> On 8/2/21, 3:42 PM, "Andres Freund" <andres@anarazel.de> wrote:
>>> I've wondered, independent of this thread, about not making MaxBackends
>>> externally visible, and requiring a function call to access it. It should be
>>> easier to find cases of misuse if we errored out when accessed at the wrong
>>> time. And we could use that opportunity to add flags that determine which
>>> types of backends are included (e.g. not including autovac, or additionally
>>> including aux workers or prepared xacts).
>>
>> I'm not opposed to this.  I can work on putting a patch together if no
>> opposition materializes.
>
> Here is a first attempt.

Here is a rebased version of the patch.

Nathan


Attachment

Re: make MaxBackends available in _PG_init

From
Greg Sabino Mullane
Date:
On Sat, Aug 7, 2021 at 2:01 PM Bossart, Nathan <bossartn@amazon.com> wrote:
Here is a rebased version of the patch.

Giving this a review.

Patch applies cleanly and `make check` works as of e12694523e7e4482a052236f12d3d8b58be9a22c

Overall looks very nice and tucks MaxBackends safely away.
I have a few suggestions:

> + size = add_size(size, mul_size(GetMaxBackends(0), sizeof(BTOneVacInfo)));

The use of GetMaxBackends(0) looks weird - can we add another constant in there
for the "default" case? Or just have GetMaxBackends() work?


> + GMB_MAX_PREPARED_XACTS = 1 << 0, /* include max_prepared_xacts */
s/include/add in/


>  +typedef enum GMBOption
> +{
> + GMB_MAX_PREPARED_XACTS = 1 << 0, /* include max_prepared_xacts */
> + GMB_NUM_AUXILIARY_PROCS = 1 << 1 /* include NUM_AUXILIARY_PROCS */
> +} GMBOption;

Is a typedef enum really needed? As opposed to something like this style:

#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)


> -  (MaxBackends + max_prepared_xacts + 1));
> +  (GetMaxBackends(GMB_MAX_PREPARED_XACTS) + 1));

This is a little confusing - there is no indication to the reader that this is an additive function. Perhaps a little more intuitive name:

> +  (GetMaxBackends(GMB_PLUS_MAX_PREPARED_XACTS) + 1));

However, the more I went through this patch, the more the GetMaxBackends(0) nagged at me. The vast majority of the calls are with "0". I'd argue for just having no arguments at all, which removes a bit of code and actually makes things like this easier to read:

Original change:
> - uint32  TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
> + uint32  TotalProcs = GetMaxBackends(GMB_NUM_AUXILIARY_PROCS | GMB_MAX_PREPARED_XACTS);

Versus:
> - uint32  TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
> + uint32  TotalProcs = GetMaxBackends() + NUM_AUXILIARY_PROCS + max_prepared_xacts;


> + * This must be called after modules have had the change to alter GUCs in
> + * shared_preload_libraries, and before shared memory size is determined.
s/change/chance/;


> +void
> +SetMaxBackends(int max_backends)
> +{
> + if (MaxBackendsInitialized)
> +  elog(ERROR, "MaxBackends already initialized");

Is this going to get tripped by a call from restore_backend_variables?

Cheers,
Greg
 

Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 8/9/21, 1:14 PM, "Greg Sabino Mullane" <htamfids@gmail.com> wrote:
> Giving this a review.

Thanks for your review.

> However, the more I went through this patch, the more the GetMaxBackends(0) nagged at me. The vast majority of the
callsare with "0". I'd argue for just having no arguments at all, which removes a bit of code and actually makes things
likethis easier to read:
 

I agree.  The argument is nonzero in less than half of the calls to
GetMaxBackends(), and I'm not sure it adds a whole lot of value in the
first place.  I removed the argument in v3.

>> + * This must be called after modules have had the change to alter GUCs in
>> + * shared_preload_libraries, and before shared memory size is determined.
> s/change/chance/;

I fixed this in v3.

>> +void
>> +SetMaxBackends(int max_backends)
>> +{
>> + if (MaxBackendsInitialized)
>> +  elog(ERROR, "MaxBackends already initialized");
>
> Is this going to get tripped by a call from restore_backend_variables?

I ran 'make check-world' with EXEC_BACKEND with no problems, so I
don't think so.

Nathan


Attachment

Re: make MaxBackends available in _PG_init

From
Greg Sabino Mullane
Date:
On Mon, Aug 9, 2021 at 8:22 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Is this going to get tripped by a call from restore_backend_variables?

I ran 'make check-world' with EXEC_BACKEND with no problems, so I
don't think so.

v3 looks good, but I'm still not sure how to test the bit mentioned above. I'm not familiar with this part of the code (SubPostmasterMain etc.), but running make check-world with EXEC_BACKEND does not seem to execute that code, as I added exit(1) to restore_backend_variables() and the tests still ran fine. Further digging shows that even though the #ifdef EXEC_BACKEND path is triggered, no --fork argument was being passed. Is there something else one needs to provide to force that --fork (see line 189 of src/backend/main/main.c) when testing?

Cheers,
Greg


Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Greg Sabino Mullane <htamfids@gmail.com> writes:
> v3 looks good, but I'm still not sure how to test the bit mentioned above.
> I'm not familiar with this part of the code (SubPostmasterMain etc.), but
> running make check-world with EXEC_BACKEND does not seem to execute that
> code, as I added exit(1) to restore_backend_variables() and the tests still
> ran fine.

You must not have enabled EXEC_BACKEND properly.  It's a compile-time
#define that affects multiple modules, so it's easy to get wrong.
The way I usually turn it on is

    make distclean
    ./configure ... options of choice ...
    edit src/include/pg_config.h, add "#define EXEC_BACKEND" line
    make, install, test

In this way the setting is persistent till the next distclean/configure
cycle.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Greg Sabino Mullane
Date:
On Wed, Aug 11, 2021 at 10:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
You must not have enabled EXEC_BACKEND properly.  It's a compile-time
#define that affects multiple modules, so it's easy to get wrong.
The way I usually turn it on is

Thank you. I was able to get it all working, and withdraw any objections to that bit of the patch. :)

Cheers,
Greg
 

RE: make MaxBackends available in _PG_init

From
"wangsh.fnst@fujitsu.com"
Date:
Hi,

I noticed that in v3-0001-Disallow-external-access-to-MaxBackends.patch, there are some modifications like:

> -        for (int i = 0; i <= MaxBackends; i++)
> +        for (int i = 0; i <= GetMaxBackends(); i++)

I don't think calling function GetMaxBackends() in the for loop is a good idea.  
How about use a temp variable to save the return value of function GetMaxBackends() ?

Regards,
Shenhao Wang

Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 8/15/21, 1:05 AM, "wangsh.fnst@fujitsu.com" <wangsh.fnst@fujitsu.com> wrote:
> I don't think calling function GetMaxBackends() in the for loop is a good idea.
> How about use a temp variable to save the return value of function GetMaxBackends() ?

I did this in v4.  There may be a couple of remaining places that call
GetMaxBackends() several times, but the function should be relatively
inexpensive.

Nathan


Attachment

Re: make MaxBackends available in _PG_init

From
Fujii Masao
Date:

On 2021/08/16 13:02, Bossart, Nathan wrote:
> On 8/15/21, 1:05 AM, "wangsh.fnst@fujitsu.com" <wangsh.fnst@fujitsu.com> wrote:
>> I don't think calling function GetMaxBackends() in the for loop is a good idea.
>> How about use a temp variable to save the return value of function GetMaxBackends() ?
> 
> I did this in v4.  There may be a couple of remaining places that call
> GetMaxBackends() several times, but the function should be relatively
> inexpensive.

The patch handles only MaxBackends. But isn't there other variable having the same issue?

It seems overkill to remove "extern" from MaxBackends and replace MaxBackends with GetMaxBackends() in the existing
PostgreSQLcodes. I'm not sure how much it's actually worth doing that.  Instead, isn't it enough to just add the
commentlike "Use GetMaxBackends() if you want to treat the lookup for uninitialized MaxBackends as an error" in the
definitionof MaxBackends?
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 1/7/22, 8:54 AM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:
> The patch handles only MaxBackends. But isn't there other variable having the same issue?

I wouldn't be surprised to learn of other cases, but I've only
encountered this specific issue with MaxBackends.  I think MaxBackends
is notable because it is more likely to be used by preloaded libraries
but is intentionally initialized after loading them.  As noted in
an earlier message on this thread [0], using MaxBackends in a call to
RequestAddinShmemSpace() in _PG_init() may not reliably cause
problems, too.

> It seems overkill to remove "extern" from MaxBackends and replace MaxBackends with GetMaxBackends() in the existing
PostgreSQLcodes. I'm not sure how much it's actually worth doing that.  Instead, isn't it enough to just add the
commentlike "Use GetMaxBackends() if you want to treat the lookup for uninitialized MaxBackends as an error" in the
definitionof MaxBackends?
 

While that approach would provide a way to safely retrieve the value,
I think it would do little to prevent the issue in practice.  If the
size of the patch is a concern, we could also convert MaxBackends into
a macro for calling GetMaxBackends().  This could also be a nice way
to avoid breaking extensions that are using it correctly while
triggering ERRORs for extensions that are using it incorrectly.  I've
attached a new version of the patch that does it this way.

Nathan

[0] https://postgr.es/m/8499D41B-628A-4CE0-9139-00D718F9D06B%40amazon.com


Attachment

Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 1/7/22, 10:12 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> While that approach would provide a way to safely retrieve the value,
> I think it would do little to prevent the issue in practice.  If the
> size of the patch is a concern, we could also convert MaxBackends into
> a macro for calling GetMaxBackends().  This could also be a nice way
> to avoid breaking extensions that are using it correctly while
> triggering ERRORs for extensions that are using it incorrectly.  I've
> attached a new version of the patch that does it this way.

v5 didn't work with EXEC_BACKEND.  Here is a new revision.

Nathan


Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Fri, Jan 7, 2022 at 1:09 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> I wouldn't be surprised to learn of other cases, but I've only
> encountered this specific issue with MaxBackends.  I think MaxBackends
> is notable because it is more likely to be used by preloaded libraries
> but is intentionally initialized after loading them.  As noted in
> an earlier message on this thread [0], using MaxBackends in a call to
> RequestAddinShmemSpace() in _PG_init() may not reliably cause
> problems, too.

Yes, I think MaxBackends is a particularly severe case. I've seen this
problem with that GUC multiple times, and never with any other one.

> > It seems overkill to remove "extern" from MaxBackends and replace MaxBackends with GetMaxBackends() in the existing
PostgreSQLcodes. I'm not sure how much it's actually worth doing that.  Instead, isn't it enough to just add the
commentlike "Use GetMaxBackends() if you want to treat the lookup for uninitialized MaxBackends as an error" in the
definitionof MaxBackends? 
>
> While that approach would provide a way to safely retrieve the value,
> I think it would do little to prevent the issue in practice.  If the
> size of the patch is a concern, we could also convert MaxBackends into
> a macro for calling GetMaxBackends().  This could also be a nice way
> to avoid breaking extensions that are using it correctly while
> triggering ERRORs for extensions that are using it incorrectly.  I've
> attached a new version of the patch that does it this way.

That's too magical for my taste.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 1/7/22, 12:27 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Fri, Jan 7, 2022 at 1:09 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> While that approach would provide a way to safely retrieve the value,
>> I think it would do little to prevent the issue in practice.  If the
>> size of the patch is a concern, we could also convert MaxBackends into
>> a macro for calling GetMaxBackends().  This could also be a nice way
>> to avoid breaking extensions that are using it correctly while
>> triggering ERRORs for extensions that are using it incorrectly.  I've
>> attached a new version of the patch that does it this way.
>
> That's too magical for my taste.

Fair point.  v4 [0] is the less magical version.

Nathan

[0] https://postgr.es/m/attachment/125445/v4-0001-Disallow-external-access-to-MaxBackends.patch


Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Mon, Jan 10, 2022 at 07:01:32PM +0000, Bossart, Nathan wrote:
> On 1/7/22, 12:27 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>> On Fri, Jan 7, 2022 at 1:09 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>>> While that approach would provide a way to safely retrieve the value,
>>> I think it would do little to prevent the issue in practice.  If the
>>> size of the patch is a concern, we could also convert MaxBackends into
>>> a macro for calling GetMaxBackends().  This could also be a nice way
>>> to avoid breaking extensions that are using it correctly while
>>> triggering ERRORs for extensions that are using it incorrectly.  I've
>>> attached a new version of the patch that does it this way.
>>
>> That's too magical for my taste.
>
> Fair point.  v4 [0] is the less magical version.

So, where are we on this patch?  It looks like there is an agreement
that MaxBackends is used widely enough that it justifies the use of a
separate function to set and get a better value computed.  There may
be other parameters that could use a brush up, but most known cases
would be addressed here.  v4 looks rather straight-forward, at quick
glance.

(I'd also prefer the less magical version.)
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
"Bossart, Nathan"
Date:
On 1/25/22, 12:01 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> So, where are we on this patch?  It looks like there is an agreement
> that MaxBackends is used widely enough that it justifies the use of a
> separate function to set and get a better value computed.  There may
> be other parameters that could use a brush up, but most known cases
> would be addressed here.  v4 looks rather straight-forward, at quick
> glance.

I think the patch is in decent shape.  There may be a few remaining
places where GetMaxBackends() is called repeatedly in the same
function, but IIRC v4 already clears up the obvious ones.  I don't
know if this is worth worrying about too much, but I can create a new
version if you think it is important.

Nathan


Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Tue, Jan 25, 2022 at 07:30:33PM +0000, Bossart, Nathan wrote:
> I think the patch is in decent shape.  There may be a few remaining
> places where GetMaxBackends() is called repeatedly in the same
> function, but IIRC v4 already clears up the obvious ones.  I don't
> know if this is worth worrying about too much, but I can create a new
> version if you think it is important.

There are such cases in FindLockCycleRecurse(), GetLockConflicts(),
GetLockStatusData() and InitProcGlobal(), as far as I can see.

Hmm.  I have been looking at this patch, and the lack of centralized
solution that could be used for other GUCs worries me like Fujii-san,
even if this would prevent an incorrect use of MaxBackends in contexts
where it should not be used because it is not initialized yet.  I
don't think it is a good idea in the long-term to apply this as-is.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Thu, Jan 27, 2022 at 09:56:04AM +0900, Michael Paquier wrote:
> Hmm.  I have been looking at this patch, and the lack of centralized
> solution that could be used for other GUCs worries me like Fujii-san,
> even if this would prevent an incorrect use of MaxBackends in contexts
> where it should not be used because it is not initialized yet.  I
> don't think it is a good idea in the long-term to apply this as-is.

Alright.  I think the comment adjustments still apply, so I split those out
to a new patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/

Attachment

Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Thu, Jan 27, 2022 at 10:18:15AM -0800, Nathan Bossart wrote:
> On Thu, Jan 27, 2022 at 09:56:04AM +0900, Michael Paquier wrote:
>> Hmm.  I have been looking at this patch, and the lack of centralized
>> solution that could be used for other GUCs worries me like Fujii-san,
>> even if this would prevent an incorrect use of MaxBackends in contexts
>> where it should not be used because it is not initialized yet.  I
>> don't think it is a good idea in the long-term to apply this as-is.
>
> Alright.  I think the comment adjustments still apply, so I split those out
> to a new patch.

No objections to this part from here, though I have not checked if
there are other areas that may require such an adjustment.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Thu, Jan 27, 2022 at 10:18:15AM -0800, Nathan Bossart wrote:
> Alright.  I think the comment adjustments still apply, so I split those out
> to a new patch.

Looks fine after a second look, so applied.

As of the issues of this thread, we really have two things to think
about:
1) How do we want to control the access of some parameters in a
context or another?  One idea would be more control through GUCs, say
with a set of context-related flags that prevent the read of some
variables until they are set.  We could encourage the use of
GetConfigOption() for that.  For MaxBackends, we could add a read-only
GUC for this purpose.  That's what Andres hinted at upthread, I
guess.
2) How do we deal with unwanted access of shared parameters?  This one
is not really controllable, is it?  And we are talking about much more
than MaxBackends.  This could perhaps be addressed with more
documentation in the headers for the concerned variables, as a first
step.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Sat, Jan 29, 2022 at 11:19:12AM +0900, Michael Paquier wrote:
> On Thu, Jan 27, 2022 at 10:18:15AM -0800, Nathan Bossart wrote:
>> Alright.  I think the comment adjustments still apply, so I split those out
>> to a new patch.
> 
> Looks fine after a second look, so applied.

Thanks!

> As of the issues of this thread, we really have two things to think
> about:
> 1) How do we want to control the access of some parameters in a
> context or another?  One idea would be more control through GUCs, say
> with a set of context-related flags that prevent the read of some
> variables until they are set.  We could encourage the use of
> GetConfigOption() for that.  For MaxBackends, we could add a read-only
> GUC for this purpose.  That's what Andres hinted at upthread, I
> guess.
> 2) How do we deal with unwanted access of shared parameters?  This one
> is not really controllable, is it?  And we are talking about much more
> than MaxBackends.  This could perhaps be addressed with more
> documentation in the headers for the concerned variables, as a first
> step.

Hm.  Perhaps we should understand the full scope of the problem first.
What else besides MaxBackends and the shared memory GUCs won't be properly
initialized when the shared_preload_libraries' _PG_init() functions are
called?  MaxBackends seems to be the only one that folks have experienced
problems with, which is why I initially zeroed in on it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Fri, Jan 28, 2022 at 08:33:42PM -0800, Nathan Bossart wrote:
> Hm.  Perhaps we should understand the full scope of the problem first.
> What else besides MaxBackends and the shared memory GUCs won't be properly
> initialized when the shared_preload_libraries' _PG_init() functions are
> called?  MaxBackends seems to be the only one that folks have experienced
> problems with, which is why I initially zeroed in on it.

Yeah, it would be good to know the scope before defining the limits
of what could be done.  Another thing may be the interactions with
session_preload_libraries and local_preload_libraries.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Sat, Jan 29, 2022 at 02:24:24PM +0900, Michael Paquier wrote:
> Yeah, it would be good to know the scope before defining the limits
> of what could be done.  Another thing may be the interactions with
> session_preload_libraries and local_preload_libraries.

Worth noting that I have marked marked this patch as committed in the
CF app, as there is nothing else to do for now.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Fri, Jan 28, 2022 at 9:19 PM Michael Paquier <michael@paquier.xyz> wrote:
> As of the issues of this thread, we really have two things to think
> about:
> 1) How do we want to control the access of some parameters in a
> context or another?  One idea would be more control through GUCs, say
> with a set of context-related flags that prevent the read of some
> variables until they are set.  We could encourage the use of
> GetConfigOption() for that.  For MaxBackends, we could add a read-only
> GUC for this purpose.  That's what Andres hinted at upthread, I
> guess.
> 2) How do we deal with unwanted access of shared parameters?  This one
> is not really controllable, is it?  And we are talking about much more
> than MaxBackends.  This could perhaps be addressed with more
> documentation in the headers for the concerned variables, as a first
> step.

I think you are mischaracterizing the problem. MaxBackends is not
max_connections. It is a value which is computed based on the value of
max_connections and various other GUCs. And that's the problem. If you
have a C variable whose value should depend on a single GUC, the GUC
system will generally do what you want automatically. If it doesn't,
you can use the GUC check hook/assign hook machinery to update as many
global variables as you like whenever the GUC itself is updated.
Moreover, a GUC always has some legal and thus halfway reasonable
value. We read postgresql.conf super-early in the startup sequence,
because we know GUCs are super-important, and even before that
settings have bootstrap values which are not usually totally insane.
The bottom line is that if you have a global variable whose value
depends on one GUC, you can probably just read that global variable
and call it good.

The main reason why this doesn't work for MaxBackends is that
MaxBackends depends on the values of multiple GUCs. There is a further
wrinkle too, which is that none of those GUCs can change, and
therefore code does things with the resulting value on the assumption
that they won't change, like size shared-memory data structures.
Therefore, if you read the wrong value, you've got a big problem. So
the real issues here IMHO are about the difficulty of making sure that
(1) when a GUC changes, we update all of the things that depend on it
including things which may also depend on other GUCs and (2) making
sure that values which can't ever change are computed before they are
used.

I don't know what the solution to problem #1 is, but the solution to
problem #2 is simple: make people call a function to get the value
rather than just reading a bare variable. GetConfigOption() is not a
good solution for people aiming to write C code that does useful
things, because it delivers the value as a string, and that is not
what you want. But an accessor function like GetMaxBackends() for a
quantity of this type is wonderful. Depending on the situation, you
might choose to have the accessor function [a] fail an assertion if
the value is not available yet or [b] compute the value if they value
has not yet been computed or [c] do the latter if possible, otherwise
the former. But the fact that you are making code call a function
rather than just read a variable gives you a very strong tool to make
sure that someone can't blindly read a 0 or whatever instead of the
real value.

Therefore, it is my opinion that the proposed patch was pretty much
right on the money and that we ought to get it committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Mon, Jan 31, 2022 at 09:32:21AM -0500, Robert Haas wrote:
> The main reason why this doesn't work for MaxBackends is that
> MaxBackends depends on the values of multiple GUCs. There is a further
> wrinkle too, which is that none of those GUCs can change, and
> therefore code does things with the resulting value on the assumption
> that they won't change, like size shared-memory data structures.
> Therefore, if you read the wrong value, you've got a big problem. So
> the real issues here IMHO are about the difficulty of making sure that
> (1) when a GUC changes, we update all of the things that depend on it
> including things which may also depend on other GUCs and (2) making
> sure that values which can't ever change are computed before they are
> used.
> 
> I don't know what the solution to problem #1 is, but the solution to
> problem #2 is simple: make people call a function to get the value
> rather than just reading a bare variable. GetConfigOption() is not a
> good solution for people aiming to write C code that does useful
> things, because it delivers the value as a string, and that is not
> what you want. But an accessor function like GetMaxBackends() for a
> quantity of this type is wonderful. Depending on the situation, you
> might choose to have the accessor function [a] fail an assertion if
> the value is not available yet or [b] compute the value if they value
> has not yet been computed or [c] do the latter if possible, otherwise
> the former. But the fact that you are making code call a function
> rather than just read a variable gives you a very strong tool to make
> sure that someone can't blindly read a 0 or whatever instead of the
> real value.

+1

I can work on a new patch if this is the direction we want to go.  There
were a couple of functions that called GetMaxBackends() repetitively that I
should probably fix before the patch should be seriously considered.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Tue, Feb 1, 2022 at 5:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I can work on a new patch if this is the direction we want to go.  There
> were a couple of functions that called GetMaxBackends() repetitively that I
> should probably fix before the patch should be seriously considered.

Sure, that sort of thing should be tidied up. It's unlikely to make
any real difference, but as a fairly prominent PostgreSQL hacker once
said, a cycle saved is a cycle earned.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Wed, Feb 02, 2022 at 08:15:02AM -0500, Robert Haas wrote:
> On Tue, Feb 1, 2022 at 5:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> I can work on a new patch if this is the direction we want to go.  There
>> were a couple of functions that called GetMaxBackends() repetitively that I
>> should probably fix before the patch should be seriously considered.
> 
> Sure, that sort of thing should be tidied up. It's unlikely to make
> any real difference, but as a fairly prominent PostgreSQL hacker once
> said, a cycle saved is a cycle earned.

Here is a new patch with fewer calls to GetMaxBackends().

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Fri, Feb 4, 2022 at 12:55 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Here is a new patch with fewer calls to GetMaxBackends().

For multixact.c, I think you should invent GetMaxOldestSlot() to avoid
confusion. Maybe it could be a static inline rather than a macro.

Likewise, I think PROCARRAY_MAXPROCS, NumProcSignalSlots, and
NumBackendStatSlots should be replaced with things that look more like
function calls.

Apart from that, I think this looks pretty good.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Fri, Feb 04, 2022 at 08:46:39AM -0500, Robert Haas wrote:
> For multixact.c, I think you should invent GetMaxOldestSlot() to avoid
> confusion. Maybe it could be a static inline rather than a macro.
> 
> Likewise, I think PROCARRAY_MAXPROCS, NumProcSignalSlots, and
> NumBackendStatSlots should be replaced with things that look more like
> function calls.

Sorry, I did notice that it looked odd, and I should've done this in v8 and
saved a round trip.  Here's a new revision with those macros converted to
inline functions.  I've also dialed things back a little in some places
where a new variable felt excessive.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Fri, Feb 4, 2022 at 3:13 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Fri, Feb 04, 2022 at 08:46:39AM -0500, Robert Haas wrote:
> > For multixact.c, I think you should invent GetMaxOldestSlot() to avoid
> > confusion. Maybe it could be a static inline rather than a macro.
> >
> > Likewise, I think PROCARRAY_MAXPROCS, NumProcSignalSlots, and
> > NumBackendStatSlots should be replaced with things that look more like
> > function calls.
>
> Sorry, I did notice that it looked odd, and I should've done this in v8 and
> saved a round trip.  Here's a new revision with those macros converted to
> inline functions.  I've also dialed things back a little in some places
> where a new variable felt excessive.

Great. I'll take a look at this next week, but not right now, mostly
because it's Friday afternoon and if I commit it and anything breaks I
don't want to end up having to fix it on the weekend if I can avoid
it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Fri, Feb 4, 2022 at 3:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Great. I'll take a look at this next week, but not right now, mostly
> because it's Friday afternoon and if I commit it and anything breaks I
> don't want to end up having to fix it on the weekend if I can avoid
> it.

After some investigation I've determined that it's no longer Friday
afternoon. I also spent time investigating whether the patch had
problems that would make me uncomfortable with the idea of committing
it, and I did not find any. So, I committed it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Tue, Feb 08, 2022 at 04:12:26PM -0500, Robert Haas wrote:
> After some investigation I've determined that it's no longer Friday
> afternoon. I also spent time investigating whether the patch had
> problems that would make me uncomfortable with the idea of committing
> it, and I did not find any. So, I committed it.

@@ -1641,8 +1642,8 @@ SignalBackends(void)
      * XXX in principle these pallocs could fail, which would be bad. Maybe
      * preallocate the arrays?  They're not that large, though.
      */
-    pids = (int32 *) palloc(MaxBackends * sizeof(int32));
-    ids = (BackendId *) palloc(MaxBackends * sizeof(BackendId));
+    pids = (int32 *) palloc(GetMaxBackends() * sizeof(int32));
+    ids = (BackendId *) palloc(GetMaxBackends() * sizeof(BackendId));

You could have optimized this one, while on it, as well as the ones in
pgstat_beinit() and pg_safe_snapshot_blocking_pids().  It is not hot,
but you did that for all the other callers of GetMaxBackends().  Just
saying..
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Tue, Feb 8, 2022 at 9:38 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Feb 08, 2022 at 04:12:26PM -0500, Robert Haas wrote:
> > After some investigation I've determined that it's no longer Friday
> > afternoon. I also spent time investigating whether the patch had
> > problems that would make me uncomfortable with the idea of committing
> > it, and I did not find any. So, I committed it.
>
> @@ -1641,8 +1642,8 @@ SignalBackends(void)
>          * XXX in principle these pallocs could fail, which would be bad. Maybe
>          * preallocate the arrays?  They're not that large, though.
>          */
> -       pids = (int32 *) palloc(MaxBackends * sizeof(int32));
> -       ids = (BackendId *) palloc(MaxBackends * sizeof(BackendId));
> +       pids = (int32 *) palloc(GetMaxBackends() * sizeof(int32));
> +       ids = (BackendId *) palloc(GetMaxBackends() * sizeof(BackendId));
>
> You could have optimized this one, while on it, as well as the ones in
> pgstat_beinit() and pg_safe_snapshot_blocking_pids().  It is not hot,
> but you did that for all the other callers of GetMaxBackends().  Just
> saying..

Well I didn't do anything myself except review and commit Nathan's
patch, so I suppose you mean he could have done that, but fair enough.
I don't mind if you want to change it around.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Tue, Feb 08, 2022 at 10:57:37PM -0500, Robert Haas wrote:
> Well I didn't do anything myself except review and commit Nathan's
> patch, so I suppose you mean he could have done that, but fair enough.
> I don't mind if you want to change it around.

Okay, I'd rather apply the same rule everywhere for consistency, then,
like in the attached.  That's minimal, still.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Wed, Feb 9, 2022 at 12:18 AM Michael Paquier <michael@paquier.xyz> wrote:
> Okay, I'd rather apply the same rule everywhere for consistency, then,
> like in the attached.  That's minimal, still.

That's fine with me. In the interest of full disclosure, I did kind of
notice this when reviewing the patch, though perhaps not every
instance, and just decided that it didn't seem important enough to
worry about. I'm totally OK with you thinking otherwise, though,
especially since you also volunteered to do the work thus generated.
:-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Feb 08, 2022 at 04:12:26PM -0500, Robert Haas wrote:
> After some investigation I've determined that it's no longer Friday
> afternoon.

This matches my analysis.

> I also spent time investigating whether the patch had
> problems that would make me uncomfortable with the idea of committing
> it, and I did not find any. So, I committed it.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Wed, Feb 09, 2022 at 12:31:16PM -0500, Robert Haas wrote:
> On Wed, Feb 9, 2022 at 12:18 AM Michael Paquier <michael@paquier.xyz> wrote:
>> Okay, I'd rather apply the same rule everywhere for consistency, then,
>> like in the attached.  That's minimal, still.
> 
> That's fine with me. In the interest of full disclosure, I did kind of
> notice this when reviewing the patch, though perhaps not every
> instance, and just decided that it didn't seem important enough to
> worry about. I'm totally OK with you thinking otherwise, though,
> especially since you also volunteered to do the work thus generated.
> :-)

This is fine with me as well.  I only left these out because the extra
variable felt unnecessary to me for these functions.

While you are at it, would you mind fixing the misspelling of
OldestVisibleMXactId in multixact.c (as per the attached)?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Wed, Feb 09, 2022 at 09:53:38AM -0800, Nathan Bossart wrote:
> On Wed, Feb 09, 2022 at 12:31:16PM -0500, Robert Haas wrote:
>> On Wed, Feb 9, 2022 at 12:18 AM Michael Paquier <michael@paquier.xyz> wrote:
>>> Okay, I'd rather apply the same rule everywhere for consistency, then,
>>> like in the attached.  That's minimal, still.
>>
>> That's fine with me. In the interest of full disclosure, I did kind of
>> notice this when reviewing the patch, though perhaps not every
>> instance, and just decided that it didn't seem important enough to
>> worry about. I'm totally OK with you thinking otherwise, though,
>> especially since you also volunteered to do the work thus generated.
>> :-)

I guessed so :p

> This is fine with me as well.  I only left these out because the extra
> variable felt unnecessary to me for these functions.

Okay, done, then.

> While you are at it, would you mind fixing the misspelling of
> OldestVisibleMXactId in multixact.c (as per the attached)?

Indeed.  Fixed as well.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Thu, Feb 10, 2022 at 10:47:39AM +0900, Michael Paquier wrote:
> On Wed, Feb 09, 2022 at 09:53:38AM -0800, Nathan Bossart wrote:
>> This is fine with me as well.  I only left these out because the extra
>> variable felt unnecessary to me for these functions.
> 
> Okay, done, then.
> 
>> While you are at it, would you mind fixing the misspelling of
>> OldestVisibleMXactId in multixact.c (as per the attached)?
> 
> Indeed.  Fixed as well.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
Hi,

Sorry for showing up this late, but I'm a bit confused with the new situation.

Unless I'm missing something, the new situation is that the system is supposed
to prevent access to MaxBackends during s_p_l_pg_init, for reasons I totally
agree with, but without doing anything for extensions that actually need to
access it at that time.  So what are extensions supposed to do now if they do
need the information during their _PG_init() / RequestAddinShmemSpace()?

Note that I have two of such extensions.  They actually only need it to give
access to the queryid in the background workers since it's otherwise not
available, but there's at least pg_wait_sampling extension that needs the value
to request shmem for other needs.

One funny note is that my extensions aren't using MaxBackends but instead
compute it based on MaxConnections, autovacuum_max_workers and so on.  So those
two extensions aren't currently broken, or more accurately not more than they
previously were.  Is there any reasoning to not protect all the variables that
contribute to MaxBackends?



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Wed, Mar 23, 2022 at 12:53 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Unless I'm missing something, the new situation is that the system is supposed
> to prevent access to MaxBackends during s_p_l_pg_init, for reasons I totally
> agree with, but without doing anything for extensions that actually need to
> access it at that time.  So what are extensions supposed to do now if they do
> need the information during their _PG_init() / RequestAddinShmemSpace()?

Well, the conclusion upthread was that extensions might change the
values of those GUCs from _PG_init(). If that's a real thing, then
what you're asking for here is impossible, because the final value is
indeterminate until all such extensions have finished twiddling those
the GUCs. On the other hand, it's definitely intended that extensions
should RequestAddinShmemSpace() from _PG_init(), and wanting to size
that memory based on MaxBackends is totally reasonable. Do we need to
add another function, alongside _PG_init(), that gets called after
MaxBackends is determined and before it's too late to
RequestAddinShmemSpace()?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
On Wed, Mar 23, 2022 at 08:32:39AM -0400, Robert Haas wrote:
> On Wed, Mar 23, 2022 at 12:53 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Unless I'm missing something, the new situation is that the system is supposed
> > to prevent access to MaxBackends during s_p_l_pg_init, for reasons I totally
> > agree with, but without doing anything for extensions that actually need to
> > access it at that time.  So what are extensions supposed to do now if they do
> > need the information during their _PG_init() / RequestAddinShmemSpace()?
> 
> Well, the conclusion upthread was that extensions might change the
> values of those GUCs from _PG_init(). If that's a real thing, then
> what you're asking for here is impossible, because the final value is
> indeterminate until all such extensions have finished twiddling those
> the GUCs. On the other hand, it's definitely intended that extensions
> should RequestAddinShmemSpace() from _PG_init(), and wanting to size
> that memory based on MaxBackends is totally reasonable. Do we need to
> add another function, alongside _PG_init(), that gets called after
> MaxBackends is determined and before it's too late to
> RequestAddinShmemSpace()?

Yes, I don't see how we can support such extensions without an additional hook,
probably called right after InitializeMaxBackends() since at least
InitializeShmemGUCs() will need to know about those extra memory needs.

I'm not sure how to prevent third party code from messing with the gucs in it,
but a clear indication in the hook comment should probably be enough.  It's not
like it's hard for third-party code author to break something anyway.

And should we do something about MaxConnections, autovacuum_max_workers,
max_worker_processes and max_wal_senders too?  It's unlikely that I'm the only
one who's computing my own MaxBackends to workaround the previous 0 value that
was available during _PG_init, and I'm not sure that everyone will notice such
a new hook.



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Wed, Mar 23, 2022 at 09:03:18PM +0800, Julien Rouhaud wrote:
> On Wed, Mar 23, 2022 at 08:32:39AM -0400, Robert Haas wrote:
>> Well, the conclusion upthread was that extensions might change the
>> values of those GUCs from _PG_init(). If that's a real thing, then
>> what you're asking for here is impossible, because the final value is
>> indeterminate until all such extensions have finished twiddling those
>> the GUCs. On the other hand, it's definitely intended that extensions
>> should RequestAddinShmemSpace() from _PG_init(), and wanting to size
>> that memory based on MaxBackends is totally reasonable. Do we need to
>> add another function, alongside _PG_init(), that gets called after
>> MaxBackends is determined and before it's too late to
>> RequestAddinShmemSpace()?
> 
> Yes, I don't see how we can support such extensions without an additional hook,
> probably called right after InitializeMaxBackends() since at least
> InitializeShmemGUCs() will need to know about those extra memory needs.

Another possibility could be to add a hook that is called _before_
_PG_init() where libraries are permitted to adjust GUCs.  After the library
is loaded, we first call this _PG_change_GUCs() function, then we
initialize MaxBackends, and then we finally call _PG_init().  This way,
extensions would have access to MaxBackends within _PG_init(), and if an
extension really needed to alter GUCs, іt could define this new function.

> I'm not sure how to prevent third party code from messing with the gucs in it,
> but a clear indication in the hook comment should probably be enough.  It's not
> like it's hard for third-party code author to break something anyway.

ERROR-ing in SetConfigOption() might be another way to dissuade folks from
messing with GUCs.  This is obviously not a perfect solution.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Thu, Mar 24, 2022 at 4:20 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Another possibility could be to add a hook that is called _before_
> _PG_init() where libraries are permitted to adjust GUCs.  After the library
> is loaded, we first call this _PG_change_GUCs() function, then we
> initialize MaxBackends, and then we finally call _PG_init().  This way,
> extensions would have access to MaxBackends within _PG_init(), and if an
> extension really needed to alter GUCs, іt could define this new function.

Yeah, I think this might be better.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
On Thu, Mar 24, 2022 at 04:27:36PM -0400, Robert Haas wrote:
> On Thu, Mar 24, 2022 at 4:20 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > Another possibility could be to add a hook that is called _before_
> > _PG_init() where libraries are permitted to adjust GUCs.  After the library
> > is loaded, we first call this _PG_change_GUCs() function, then we
> > initialize MaxBackends, and then we finally call _PG_init().  This way,
> > extensions would have access to MaxBackends within _PG_init(), and if an
> > extension really needed to alter GUCs, іt could define this new function.
> 
> Yeah, I think this might be better.

Well, if it's before _PG_init() then it won't be a new hook but a new symbol
that has to be handled with dlsym.

But it seems to be that the bigger problem is that this approach won't fix
anything unless we can prevent third-party code from messing with GUCs after
that point as we could otherwise have discrepancies between GetMaxBackends()
and the underlying GUCs until every single extension that wants to change GUC
is modified uses this new symbol.  And I also don't see how we could force that
unless we have a better GUC API that doesn't entirely relies on strings,
otherwise a simple thing like "allow 5 extra bgworkers" is going to be really
painful.

A new hook after _PG_init() sure leaves the burden to the few extension that
needs to allocate shmem based on MaxBackends, but there probably isn't that
much (I have a few dozens locally cloned and found only 1 apart from mine, and
I would be happy to take care of those), and it also means that they have a
chance to do something correct even if other extensions messing with GUCs
aren't fixed.

Arguably, you could certainly try to change GUCs in that new hook, but it
wouldn't be any different from doing so in shmem_startup_hook so I don't think
it's really a problem.



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
On Fri, Mar 25, 2022 at 10:39:51AM +0800, Julien Rouhaud wrote:
> On Thu, Mar 24, 2022 at 04:27:36PM -0400, Robert Haas wrote:
> > On Thu, Mar 24, 2022 at 4:20 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > > Another possibility could be to add a hook that is called _before_
> > > _PG_init() where libraries are permitted to adjust GUCs.  After the library
> > > is loaded, we first call this _PG_change_GUCs() function, then we
> > > initialize MaxBackends, and then we finally call _PG_init().  This way,
> > > extensions would have access to MaxBackends within _PG_init(), and if an
> > > extension really needed to alter GUCs, іt could define this new function.
> > 
> > Yeah, I think this might be better.
> 
> Well, if it's before _PG_init() then it won't be a new hook but a new symbol
> that has to be handled with dlsym.
> 
> But it seems to be that the bigger problem is that this approach won't fix
> anything unless we can prevent third-party code from messing with GUCs after
> that point as we could otherwise have discrepancies between GetMaxBackends()
> and the underlying GUCs until every single extension that wants to change GUC
> is modified uses this new symbol.  And I also don't see how we could force that
> unless we have a better GUC API that doesn't entirely relies on strings,
> otherwise a simple thing like "allow 5 extra bgworkers" is going to be really
> painful.
> 
> A new hook after _PG_init() sure leaves the burden to the few extension that
> needs to allocate shmem based on MaxBackends, but there probably isn't that
> much (I have a few dozens locally cloned and found only 1 apart from mine, and
> I would be happy to take care of those), and it also means that they have a
> chance to do something correct even if other extensions messing with GUCs
> aren't fixed.
> 
> Arguably, you could certainly try to change GUCs in that new hook, but it
> wouldn't be any different from doing so in shmem_startup_hook so I don't think
> it's really a problem.

As an example, here's a POC for a new shmem_request_hook hook after _PG_init().
With it I could easily fix pg_wait_sampling shmem allocation (and checked that
it's indeed requesting the correct size).

Attachment

Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Fri, Mar 25, 2022 at 11:11:46AM +0800, Julien Rouhaud wrote:
> As an example, here's a POC for a new shmem_request_hook hook after _PG_init().
> With it I could easily fix pg_wait_sampling shmem allocation (and checked that
> it's indeed requesting the correct size).

Are you sure that the end of a release cycle is the good moment to
begin designing new hooks?  Anything added is something we are going
to need supporting moving forward.  My brain is telling me that we
ought to revisit the business with GetMaxBackends() properly instead,
and perhaps revert that.

This solution makes me uneasy from the start (already stated
upthread), because it is not a solution to the original problem, just
a safeguard that handles one small-ish portion of the whole parameter
calculation cycle.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
On Fri, Mar 25, 2022 at 02:08:09PM +0900, Michael Paquier wrote:
> On Fri, Mar 25, 2022 at 11:11:46AM +0800, Julien Rouhaud wrote:
> > As an example, here's a POC for a new shmem_request_hook hook after _PG_init().
> > With it I could easily fix pg_wait_sampling shmem allocation (and checked that
> > it's indeed requesting the correct size).
> 
> Are you sure that the end of a release cycle is the good moment to
> begin designing new hooks?  Anything added is something we are going
> to need supporting moving forward.  My brain is telling me that we
> ought to revisit the business with GetMaxBackends() properly instead,
> and perhaps revert that.

I agree, and as I mentioned in my original email I don't think that the
committed patch is actually adding something on which we can really build on.
So I'm also in favor of reverting, as it seems like be a better option in the
long run to have a clean and broader solution.



Re: make MaxBackends available in _PG_init

From
Andres Freund
Date:
Hi,

On 2022-03-25 14:35:42 +0800, Julien Rouhaud wrote:
> On Fri, Mar 25, 2022 at 02:08:09PM +0900, Michael Paquier wrote:
> > On Fri, Mar 25, 2022 at 11:11:46AM +0800, Julien Rouhaud wrote:
> > > As an example, here's a POC for a new shmem_request_hook hook after _PG_init().
> > > With it I could easily fix pg_wait_sampling shmem allocation (and checked that
> > > it's indeed requesting the correct size).
> > 
> > Are you sure that the end of a release cycle is the good moment to
> > begin designing new hooks?  Anything added is something we are going
> > to need supporting moving forward.  My brain is telling me that we
> > ought to revisit the business with GetMaxBackends() properly instead,
> > and perhaps revert that.
> 
> I agree, and as I mentioned in my original email I don't think that the
> committed patch is actually adding something on which we can really build on.
> So I'm also in favor of reverting, as it seems like be a better option in the
> long run to have a clean and broader solution.

I don't really understand. The issue that started this thread was bugs in
extensions due to accessing MaxBackends before it is initialized - which the
patch prevents. The stuff that you're complaining about / designing here
doesn't seem related to that. I like the idea of the hooks etc, but I fail to
see why we "ought to revisit the business with GetMaxBackends()"?

Greetings,

Andres Freund



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
Hi,

On Fri, Mar 25, 2022 at 03:23:17PM -0700, Andres Freund wrote:
>
> I don't really understand. The issue that started this thread was bugs in
> extensions due to accessing MaxBackends before it is initialized - which the
> patch prevents.

Well, the patch prevents accessing a 0-valued MaxBackends but doesn't do
anything to solve the original complaint.  It's not like extensions won't need
to access that information during _PG_init anymore.

> The stuff that you're complaining about / designing here
> doesn't seem related to that. I like the idea of the hooks etc, but I fail to
> see why we "ought to revisit the business with GetMaxBackends()"?

Because this GetMaxBackends() doesn't solve the problem nor brings any
infrastructure that could be reused to solve it.

I think that the root issue could be rephrased with:

Can we initialize MaxBackends earlier so that _PG_init() can see it because
maintaining MaxConnections + autovacuum_max_workers + 1 + max_worker_processes
+ max_wal_senders is troublesome.


And indeed, any third party code that previously needed to access what
MaxBackends is supposed to store should already be using that formula, and
the new GetMaxBackends() doesn't do anything about it.  So all extensions will
be as broken as before, except the few that were using MaxBackends without
realizing it's 0.  And if those exist (there's actually one) they're not that
broken, probably because MaxBackend is only used to request additional shmem,
with wanted value small enough so that it's compensated by the extra 100kB
shmem postgres allocates.

Since all those underlying GUCs shouldn't be accessible, we need some more
general infrastructure that would work for those too on top of a way to access
MaxBackends when extensions needs it.

Note that the only use case I'm aware of is for RequestAddinShmemSpace, so a
hook after _PG_init like the example shmem_request_hook would be enough for the
latter, but maybe there are more use cases for which it wouldn't.



Re: make MaxBackends available in _PG_init

From
Andres Freund
Date:
Hi,

On 2022-03-26 15:22:03 +0800, Julien Rouhaud wrote:
> On Fri, Mar 25, 2022 at 03:23:17PM -0700, Andres Freund wrote:
> >
> > I don't really understand. The issue that started this thread was bugs in
> > extensions due to accessing MaxBackends before it is initialized - which the
> > patch prevents.
> 
> Well, the patch prevents accessing a 0-valued MaxBackends but doesn't do
> anything to solve the original complaint.  It's not like extensions won't need
> to access that information during _PG_init anymore.

It resolves the pretty common bug that an extension breaks once it's used via
s_p_l instead of loaded on-demand because MaxBackends isn't initialized in the
s_p_l case.


> And indeed, any third party code that previously needed to access what
> MaxBackends is supposed to store should already be using that formula, and
> the new GetMaxBackends() doesn't do anything about it.

It couldn't rely on MaxBackends before. It can't rely on GetMaxBackends()
now. You can see why I think that what you want is unrelated to the
introduction of GetMaxBackends().

If we introduce a separate hook that allows to influence things like
max_connections or whatnot we'd *even more* need a way to verify whether it's
legal to access MaxBackends in that moment.


> So all extensions will be as broken as before, except the few that were
> using MaxBackends without realizing it's 0. And if those exist (there's
> actually one) they're not that broken, probably because MaxBackend is only
> used to request additional shmem, with wanted value small enough so that
> it's compensated by the extra 100kB shmem postgres allocates.

I don't think it's rare at all.

Greetings,

Andres Freund



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
On Sat, Mar 26, 2022 at 10:23:16AM -0700, Andres Freund wrote:
>
> On 2022-03-26 15:22:03 +0800, Julien Rouhaud wrote:
> > On Fri, Mar 25, 2022 at 03:23:17PM -0700, Andres Freund wrote:
> > >
> > > I don't really understand. The issue that started this thread was bugs in
> > > extensions due to accessing MaxBackends before it is initialized - which the
> > > patch prevents.
> > 
> > Well, the patch prevents accessing a 0-valued MaxBackends but doesn't do
> > anything to solve the original complaint.  It's not like extensions won't need
> > to access that information during _PG_init anymore.
> 
> It resolves the pretty common bug that an extension breaks once it's used via
> s_p_l instead of loaded on-demand because MaxBackends isn't initialized in the
> s_p_l case.

I can hear the argument.  However I don't know any extension that relies on
MaxBackends and doesn't need to be in s_p_l, and unless I'm missing something
no one provided such an example, only people needing the value for
RequestAddinShmemSpace().

> > And indeed, any third party code that previously needed to access what
> > MaxBackends is supposed to store should already be using that formula, and
> > the new GetMaxBackends() doesn't do anything about it.
> 
> It couldn't rely on MaxBackends before. It can't rely on GetMaxBackends()
> now. You can see why I think that what you want is unrelated to the
> introduction of GetMaxBackends().

Sure, but code also couldn't really rely on MaxConnections or any other similar
GUCs and yet nothing is done for that, and the chosen approach doesn't help for
that.

The only difference is that those GUCs are less broken, as in the value is
likely to be valid more often, and closer to the final value when it's not.
But still broken.

I think GetMaxBackends() is more likely to force authors to rely on computing
the value using the underlying GUCs, since there's nothing else that can be
done.  And if that's an acceptable answer, why aren't we computing MaxBackends
before and after processing s_p_l?

> If we introduce a separate hook that allows to influence things like
> max_connections or whatnot we'd *even more* need a way to verify whether it's
> legal to access MaxBackends in that moment.

Again, I'm not opposed to validating that MaxBackends is valid when third-party
code is accessing it, quite the opposite.  I'm opposed to do so *only* for
MaxBackends, and without providing a way for third-party code to access that
value when they need it, which is for all the cases I know (after more digging
that's now 5 extensions) for RequestAddinShmemSpace().



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Sat, Mar 26, 2022 at 1:23 PM Andres Freund <andres@anarazel.de> wrote:
> > And indeed, any third party code that previously needed to access what
> > MaxBackends is supposed to store should already be using that formula, and
> > the new GetMaxBackends() doesn't do anything about it.
>
> It couldn't rely on MaxBackends before. It can't rely on GetMaxBackends()
> now. You can see why I think that what you want is unrelated to the
> introduction of GetMaxBackends().

It's not, though, because the original proposal was to change things
around so that the value of MaxBackends would have been reliable in
_PG_init(). If we'd done that, then extensions that are using it in
_PG_init() would have gone from being buggy to being not-buggy. But
since you advocated against that change, we instead committed
something that caused them to go from being buggy to failing outright.
That's pretty painful for people with such extensions. And IMHO, it's
*much* more legitimate to want to size a data structure based on the
value of MaxBackends than it is for extensions to override GUC values.
If we can make the latter use case work in a sane way, OK, although I
have my doubts about how sane it really is, but it can't be at the
expense of telling extensions that have been (incorrectly) using
MaxBackends in _PG_init() that we're throwing them under the bus.

IMHO, the proper thing to do if certain GUC values are required for an
extension to work is to put that information in the documentation and
error out at an appropriate point if the user does not follow the
directions. Then this issue does not arise. But there's no reasonable
workaround for being unable to size data structures based on
MaxBackends.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Mar 29, 2022 at 12:22:21PM -0400, Robert Haas wrote:
> It's not, though, because the original proposal was to change things
> around so that the value of MaxBackends would have been reliable in
> _PG_init(). If we'd done that, then extensions that are using it in
> _PG_init() would have gone from being buggy to being not-buggy. But
> since you advocated against that change, we instead committed
> something that caused them to go from being buggy to failing outright.
> That's pretty painful for people with such extensions. And IMHO, it's
> *much* more legitimate to want to size a data structure based on the
> value of MaxBackends than it is for extensions to override GUC values.
> If we can make the latter use case work in a sane way, OK, although I
> have my doubts about how sane it really is, but it can't be at the
> expense of telling extensions that have been (incorrectly) using
> MaxBackends in _PG_init() that we're throwing them under the bus.
> 
> IMHO, the proper thing to do if certain GUC values are required for an
> extension to work is to put that information in the documentation and
> error out at an appropriate point if the user does not follow the
> directions. Then this issue does not arise. But there's no reasonable
> workaround for being unable to size data structures based on
> MaxBackends.

FWIW I would be on board with reverting all the GetMaxBackends() stuff if
we made the value available in _PG_init() and stopped supporting GUC
overrides by extensions (e.g., ERROR-ing in SetConfigOption() when
process_shared_preload_libraries_in_progress is true).

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
On Wed, Mar 30, 2022 at 09:30:51AM -0700, Nathan Bossart wrote:
> On Tue, Mar 29, 2022 at 12:22:21PM -0400, Robert Haas wrote:
> > It's not, though, because the original proposal was to change things
> > around so that the value of MaxBackends would have been reliable in
> > _PG_init(). If we'd done that, then extensions that are using it in
> > _PG_init() would have gone from being buggy to being not-buggy. But
> > since you advocated against that change, we instead committed
> > something that caused them to go from being buggy to failing outright.
> > That's pretty painful for people with such extensions. And IMHO, it's
> > *much* more legitimate to want to size a data structure based on the
> > value of MaxBackends than it is for extensions to override GUC values.
> > If we can make the latter use case work in a sane way, OK, although I
> > have my doubts about how sane it really is, but it can't be at the
> > expense of telling extensions that have been (incorrectly) using
> > MaxBackends in _PG_init() that we're throwing them under the bus.
> > 
> > IMHO, the proper thing to do if certain GUC values are required for an
> > extension to work is to put that information in the documentation and
> > error out at an appropriate point if the user does not follow the
> > directions. Then this issue does not arise. But there's no reasonable
> > workaround for being unable to size data structures based on
> > MaxBackends.
> 
> FWIW I would be on board with reverting all the GetMaxBackends() stuff if
> we made the value available in _PG_init() and stopped supporting GUC
> overrides by extensions (e.g., ERROR-ing in SetConfigOption() when
> process_shared_preload_libraries_in_progress is true).

Yeah I would prefer this approach too, although it couldn't prevent extension
from directly modifying the underlying variables so I don't know how effective
it would be.

On the bright side, I see that citus is using SetConfigOption() to increase
max_prepared_transactions [1].  That's the only extension mentioned in that
thread that does modify some GUC, and this GUC was already marked as
PGDLLIMPORT since 2017 so it probably wasn't done that way for Windows
compatibility reason.

In the meantime, should we add an open item?

[1]
https://github.com/citusdata/citus/blob/master/src/backend/distributed/transaction/transaction_management.c#L656-L657



Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Sat, Apr 09, 2022 at 09:24:42PM +0800, Julien Rouhaud wrote:
> Yeah I would prefer this approach too, although it couldn't prevent extension
> from directly modifying the underlying variables so I don't know how effective
> it would be.
>
> On the bright side, I see that citus is using SetConfigOption() to increase
> max_prepared_transactions [1].  That's the only extension mentioned in that
> thread that does modify some GUC, and this GUC was already marked as
> PGDLLIMPORT since 2017 so it probably wasn't done that way for Windows
> compatibility reason.
>
> In the meantime, should we add an open item?

Yes, I think that we need to discuss more the matter here, so added one.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Sat, Apr 9, 2022 at 9:24 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > FWIW I would be on board with reverting all the GetMaxBackends() stuff if
> > we made the value available in _PG_init() and stopped supporting GUC
> > overrides by extensions (e.g., ERROR-ing in SetConfigOption() when
> > process_shared_preload_libraries_in_progress is true).
>
> Yeah I would prefer this approach too, although it couldn't prevent extension
> from directly modifying the underlying variables so I don't know how effective
> it would be.

I think I also prefer this approach. I am willing to be convinced
that's the wrong idea, but right now I favor it.

> On the bright side, I see that citus is using SetConfigOption() to increase
> max_prepared_transactions [1].  That's the only extension mentioned in that
> thread that does modify some GUC, and this GUC was already marked as
> PGDLLIMPORT since 2017 so it probably wasn't done that way for Windows
> compatibility reason.

I don't quite understand this, but I think if we want to support this
kind of thing it needs a separate hook.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
Hi,

On Mon, Apr 11, 2022 at 10:47:17AM -0400, Robert Haas wrote:
> On Sat, Apr 9, 2022 at 9:24 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> > On the bright side, I see that citus is using SetConfigOption() to increase
> > max_prepared_transactions [1].  That's the only extension mentioned in that
> > thread that does modify some GUC, and this GUC was already marked as
> > PGDLLIMPORT since 2017 so it probably wasn't done that way for Windows
> > compatibility reason.
>
> I don't quite understand this, but I think if we want to support this
> kind of thing it needs a separate hook.

My point was that even if we say we don't support this, we have no way to
actually fully enforce it, as the variables are exported.  So it's good that
the only know case is using the GUC API, since we can enforce this one.



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Mon, Apr 11, 2022 at 10:47:17AM -0400, Robert Haas wrote:
> On Sat, Apr 9, 2022 at 9:24 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>> > FWIW I would be on board with reverting all the GetMaxBackends() stuff if
>> > we made the value available in _PG_init() and stopped supporting GUC
>> > overrides by extensions (e.g., ERROR-ing in SetConfigOption() when
>> > process_shared_preload_libraries_in_progress is true).
>>
>> Yeah I would prefer this approach too, although it couldn't prevent extension
>> from directly modifying the underlying variables so I don't know how effective
>> it would be.
> 
> I think I also prefer this approach. I am willing to be convinced
> that's the wrong idea, but right now I favor it.

Here are some patches.  0001 reverts all the recent commits in this area,
0002 is the patch I posted in August, and 0003 is an attempt at blocking
GUC changes in preloaded libraries' _PG_init() functions.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Mon, Apr 11, 2022 at 12:44 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> Here are some patches.  0001 reverts all the recent commits in this area,
> 0002 is the patch I posted in August, and 0003 is an attempt at blocking
> GUC changes in preloaded libraries' _PG_init() functions.

If we throw an error while defining_custom_guc is true, how will it
ever again become false?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
> On Mon, Apr 11, 2022 at 12:44 PM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
>> Here are some patches.  0001 reverts all the recent commits in this area,
>> 0002 is the patch I posted in August, and 0003 is an attempt at blocking
>> GUC changes in preloaded libraries' _PG_init() functions.
> 
> If we throw an error while defining_custom_guc is true, how will it
> ever again become false?

Ah, I knew I was forgetting something this morning.

It won't, but the only place it is presently needed is when loading
shared_preload_libraries, so I believe startup will fail anyway.  However,
I can see defining_custom_guc being used elsewhere, so that is probably not
good enough.  Another approach could be to add a static
set_config_option_internal() function with a boolean argument to indicate
whether it is being used while defining a custom GUC.  I'll adjust 0003
with that approach unless a better idea prevails.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Mon, Apr 11, 2022 at 01:44:42PM -0700, Nathan Bossart wrote:
> On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
>> If we throw an error while defining_custom_guc is true, how will it
>> ever again become false?
> 
> Ah, I knew I was forgetting something this morning.
> 
> It won't, but the only place it is presently needed is when loading
> shared_preload_libraries, so I believe startup will fail anyway.  However,
> I can see defining_custom_guc being used elsewhere, so that is probably not
> good enough.  Another approach could be to add a static
> set_config_option_internal() function with a boolean argument to indicate
> whether it is being used while defining a custom GUC.  I'll adjust 0003
> with that approach unless a better idea prevails.

Here's a new patch set.  I've only changed 0003 as described above.  My
apologies for the unnecessary round trip.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
Hi,

On Mon, Apr 11, 2022 at 02:14:35PM -0700, Nathan Bossart wrote:
> On Mon, Apr 11, 2022 at 01:44:42PM -0700, Nathan Bossart wrote:
> > On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
> >> If we throw an error while defining_custom_guc is true, how will it
> >> ever again become false?
> > 
> > Ah, I knew I was forgetting something this morning.
> > 
> > It won't, but the only place it is presently needed is when loading
> > shared_preload_libraries, so I believe startup will fail anyway.  However,
> > I can see defining_custom_guc being used elsewhere, so that is probably not
> > good enough.  Another approach could be to add a static
> > set_config_option_internal() function with a boolean argument to indicate
> > whether it is being used while defining a custom GUC.  I'll adjust 0003
> > with that approach unless a better idea prevails.
> 
> Here's a new patch set.  I've only changed 0003 as described above.  My
> apologies for the unnecessary round trip.

It looks sensible to me, although I think I would apply 0003 before 0002.

+   if (process_shared_preload_libraries_in_progress &&
+       !allow_when_loading_preload_libs)
+       ereport(ERROR,
+               (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
+                errmsg("cannot change parameters while loading "
+                       "\"shared_preload_libraries\"")));
+

I think the error message should mention at least which GUC is being modified.



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Apr 12, 2022 at 01:08:35PM +0800, Julien Rouhaud wrote:
> It looks sensible to me, although I think I would apply 0003 before 0002.

Great.

> +   if (process_shared_preload_libraries_in_progress &&
> +       !allow_when_loading_preload_libs)
> +       ereport(ERROR,
> +               (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
> +                errmsg("cannot change parameters while loading "
> +                       "\"shared_preload_libraries\"")));
> +
> 
> I think the error message should mention at least which GUC is being modified.

My intent was to make it clear that no parameters can be updated while
loading s_p_l, but I agree that we should mention the specific GUC
somewhere.  Maybe this could go in a hint?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Andres Freund
Date:
Hi,

On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote:
> On Mon, Apr 11, 2022 at 01:44:42PM -0700, Nathan Bossart wrote:
> > On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
> >> If we throw an error while defining_custom_guc is true, how will it
> >> ever again become false?
> > 
> > Ah, I knew I was forgetting something this morning.
> > 
> > It won't, but the only place it is presently needed is when loading
> > shared_preload_libraries, so I believe startup will fail anyway.  However,
> > I can see defining_custom_guc being used elsewhere, so that is probably not
> > good enough.  Another approach could be to add a static
> > set_config_option_internal() function with a boolean argument to indicate
> > whether it is being used while defining a custom GUC.  I'll adjust 0003
> > with that approach unless a better idea prevails.
> 
> Here's a new patch set.  I've only changed 0003 as described above.  My
> apologies for the unnecessary round trip.

ISTM we shouldn't apply 0002, 0003 to master before we've branches 15 off..

Greetings,

Andres Freund



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Apr 12, 2022 at 10:44:27AM -0700, Andres Freund wrote:
> On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote:
>> Here's a new patch set.  I've only changed 0003 as described above.  My
>> apologies for the unnecessary round trip.
> 
> ISTM we shouldn't apply 0002, 0003 to master before we've branches 15 off..

That seems reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Tue, Apr 12, 2022 at 2:03 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Tue, Apr 12, 2022 at 10:44:27AM -0700, Andres Freund wrote:
> > On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote:
> >> Here's a new patch set.  I've only changed 0003 as described above.  My
> >> apologies for the unnecessary round trip.
> >
> > ISTM we shouldn't apply 0002, 0003 to master before we've branches 15 off..
>
> That seems reasonable to me.

Gah, I hate putting this off for another year, but I guess I'm also
not convinced that 0003 has the right idea, so maybe it's for the
best. Here's what's bugging me: 0003 decrees that _PG_init() is the
Wrong Place to Adjust GUC Settings. Now, that begs the question: what
is the right place to adjust GUC settings? I argued upthread that
extensions shouldn't be overriding values from postgresql.conf at all,
but on further reflection I think there's at least one legitimate use
case for this sort of thing: some kind of auto-tuning module that, for
example, looks at how much memory you have and sets shared_buffers or
work_mem or something based on the result. It's arguable how well
something like this can actually work, but we probably shouldn't try
to prevent people from doing it. I'm a little less clear why Citus
thinks this is an appropriate thing for it to do, vs. telling the user
to configure a useful value, and I'd be curious to hear an explanation
around that.

But if there's even one use case where adjusting GUCs at this phase is
reasonable, then 0003 isn't really good enough. We need an 0004 that
provides a new hook in a place where such changes can safely be made.

Meanwhile, committed 0001.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Gah, I hate putting this off for another year, but I guess I'm also
> not convinced that 0003 has the right idea, so maybe it's for the
> best. Here's what's bugging me: 0003 decrees that _PG_init() is the
> Wrong Place to Adjust GUC Settings. Now, that begs the question: what
> is the right place to adjust GUC settings? I argued upthread that
> extensions shouldn't be overriding values from postgresql.conf at all,
> but on further reflection I think there's at least one legitimate use
> case for this sort of thing: some kind of auto-tuning module that, for
> example, looks at how much memory you have and sets shared_buffers or
> work_mem or something based on the result.

Yeah.  It's a very long way from "changing shared memory sizes here
doesn't work" to "no extension is allowed to change any GUC at load
time".  A counterexample is that it's not clear why an extension
shouldn't be allowed to define a GUC using DefineCustomXXXVariable
and then immediately set it to some other value.  I think trying to
forbid that across-the-board is going to mostly result in breaking
a lot of cases that are perfectly safe.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Tue, Apr 12, 2022 at 3:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah.  It's a very long way from "changing shared memory sizes here
> doesn't work" to "no extension is allowed to change any GUC at load
> time".  A counterexample is that it's not clear why an extension
> shouldn't be allowed to define a GUC using DefineCustomXXXVariable
> and then immediately set it to some other value.  I think trying to
> forbid that across-the-board is going to mostly result in breaking
> a lot of cases that are perfectly safe.

Hmm, that's an interesting case which I hadn't considered. It seems
like sort of a lame thing to do, because why wouldn't you just create
the GUC with the right initial value instead of modifying it after the
fact? But maybe there's some use case in which it makes sense to do it
that way. A read-only GUC that advertises some calculated value,
perhaps?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote:
> But if there's even one use case where adjusting GUCs at this phase is
> reasonable, then 0003 isn't really good enough. We need an 0004 that
> provides a new hook in a place where such changes can safely be made.

I think that is doable.  IMO it should be ѕomething like _PG_change_GUCs()
that is called before _PG_init().  The other option is to add a hook called
after _PG_init() where MaxBackends is available (in which case we likely
want GetMaxBackends() again).  Thoughts?

> Meanwhile, committed 0001.

Thanks.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Apr 12, 2022 at 03:30:23PM -0400, Robert Haas wrote:
> On Tue, Apr 12, 2022 at 3:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah.  It's a very long way from "changing shared memory sizes here
>> doesn't work" to "no extension is allowed to change any GUC at load
>> time".  A counterexample is that it's not clear why an extension
>> shouldn't be allowed to define a GUC using DefineCustomXXXVariable
>> and then immediately set it to some other value.  I think trying to
>> forbid that across-the-board is going to mostly result in breaking
>> a lot of cases that are perfectly safe.
> 
> Hmm, that's an interesting case which I hadn't considered. It seems
> like sort of a lame thing to do, because why wouldn't you just create
> the GUC with the right initial value instead of modifying it after the
> fact? But maybe there's some use case in which it makes sense to do it
> that way. A read-only GUC that advertises some calculated value,
> perhaps?

I think it'd be reasonable to allow changing custom GUCs in _PG_init().
Those aren't going to impact things like MaxBackends.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote:
>> But if there's even one use case where adjusting GUCs at this phase is
>> reasonable, then 0003 isn't really good enough. We need an 0004 that
>> provides a new hook in a place where such changes can safely be made.

> I think that is doable.  IMO it should be ѕomething like _PG_change_GUCs()
> that is called before _PG_init().  The other option is to add a hook called
> after _PG_init() where MaxBackends is available (in which case we likely
> want GetMaxBackends() again).  Thoughts?

I like the second option.  Calling into a module before we've called its
_PG_init function is just weird, and will cause no end of confusion.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> I think it'd be reasonable to allow changing custom GUCs in _PG_init().
> Those aren't going to impact things like MaxBackends.

It's reasonable to allow changing *most* GUCs in _PG_init(); let us
please not break cases that work fine today.

It seems after a bit of reflection that what we ought to do is identify
the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing,
mark those with a new GUC flag, and not allow them to be changed after
shared memory is set up.  (An alternative to a new flag bit is to
subdivide the PGC_POSTMASTER context into two values, but that seems
like it'd be far more invasive.  A flag bit needn't be user-visible.)
Then, what we'd need to do is arrange for shared-preload modules to
receive their _PG_init call before shared memory creation (when they
can still twiddle such GUCs) and then another callback later.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Apr 12, 2022 at 04:33:39PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote:
>>> But if there's even one use case where adjusting GUCs at this phase is
>>> reasonable, then 0003 isn't really good enough. We need an 0004 that
>>> provides a new hook in a place where such changes can safely be made.
> 
>> I think that is doable.  IMO it should be ѕomething like _PG_change_GUCs()
>> that is called before _PG_init().  The other option is to add a hook called
>> after _PG_init() where MaxBackends is available (in which case we likely
>> want GetMaxBackends() again).  Thoughts?
> 
> I like the second option.  Calling into a module before we've called its
> _PG_init function is just weird, and will cause no end of confusion.

Alright.  I think that is basically what Julien recommended a few weeks ago
[0].  Besides possibly reintroducing GetMaxBackends(), we might also want
to block SetConfigOption() when called in this new hook.

[0] https://postgr.es/m/20220325031146.cd23gaak5qlzdy6l%40jrouhaud

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Apr 12, 2022 at 04:58:42PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> I think it'd be reasonable to allow changing custom GUCs in _PG_init().
>> Those aren't going to impact things like MaxBackends.
> 
> It's reasonable to allow changing *most* GUCs in _PG_init(); let us
> please not break cases that work fine today.

Right, this is a fair point.

> It seems after a bit of reflection that what we ought to do is identify
> the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing,
> mark those with a new GUC flag, and not allow them to be changed after
> shared memory is set up.  (An alternative to a new flag bit is to
> subdivide the PGC_POSTMASTER context into two values, but that seems
> like it'd be far more invasive.  A flag bit needn't be user-visible.)
> Then, what we'd need to do is arrange for shared-preload modules to
> receive their _PG_init call before shared memory creation (when they
> can still twiddle such GUCs) and then another callback later.

If we allow changing GUCs in _PG_init() and provide another hook where
MaxBackends will be initialized, do we need to introduce another GUC flag,
or can we get away with just blocking all GUC changes when the new hook is
called?  I'm slightly hesitant to add a GUC flag that will need to manually
maintained.  Wouldn't it be easily forgotten?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Tue, Apr 12, 2022 at 4:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's reasonable to allow changing *most* GUCs in _PG_init(); let us
> please not break cases that work fine today.
>
> It seems after a bit of reflection that what we ought to do is identify
> the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing,
> mark those with a new GUC flag, and not allow them to be changed after
> shared memory is set up.  (An alternative to a new flag bit is to
> subdivide the PGC_POSTMASTER context into two values, but that seems
> like it'd be far more invasive.  A flag bit needn't be user-visible.)
> Then, what we'd need to do is arrange for shared-preload modules to
> receive their _PG_init call before shared memory creation (when they
> can still twiddle such GUCs) and then another callback later.

I dunno, I feel like this is over-engineered. An extra hook wouldn't
really affect anyone who doesn't need to use it, and I doubt that
adapting to it would create much pain for anyone who is knowledgeable
enough to be writing these kinds of extensions in the first place.
This design is something everyone who adds a GUC from now until the
end of the project needs to know about for the benefit of a tiny
number of extension authors who weren't really going to have a big
problem anyway.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Apr 12, 2022 at 4:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It seems after a bit of reflection that what we ought to do is identify
>> the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing,
>> mark those with a new GUC flag, and not allow them to be changed after
>> shared memory is set up.

> I dunno, I feel like this is over-engineered.

It probably is.  I'm just offering this as a solution if people want to
insist on a mechanism to prevent unsafe GUC changes.  If we drop the
idea of trying to forcibly prevent extensions from Doing It Wrong,
then we don't need this, only the extra hook.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> If we allow changing GUCs in _PG_init() and provide another hook where
> MaxBackends will be initialized, do we need to introduce another GUC flag,
> or can we get away with just blocking all GUC changes when the new hook is
> called?  I'm slightly hesitant to add a GUC flag that will need to manually
> maintained.  Wouldn't it be easily forgotten?

On the whole I think Robert's got the right idea: we do not really
need an enforcement mechanism at all.  People who are writing
extensions that do this sort of thing will learn how to do it right.
(It's not like there's not a thousand other things they have to get
right.)

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Apr 12, 2022 at 05:46:36PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> If we allow changing GUCs in _PG_init() and provide another hook where
>> MaxBackends will be initialized, do we need to introduce another GUC flag,
>> or can we get away with just blocking all GUC changes when the new hook is
>> called?  I'm slightly hesitant to add a GUC flag that will need to manually
>> maintained.  Wouldn't it be easily forgotten?
> 
> On the whole I think Robert's got the right idea: we do not really
> need an enforcement mechanism at all.  People who are writing
> extensions that do this sort of thing will learn how to do it right.
> (It's not like there's not a thousand other things they have to get
> right.)

Okay.  So maybe we only need the attached patches.  0001 is just 5ecd018
un-reverted, and 0002 is Julien's patch from a few weeks ago with some
minor edits.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Tue, Apr 12, 2022 at 05:43:17PM -0400, Tom Lane wrote:
> It probably is.  I'm just offering this as a solution if people want to
> insist on a mechanism to prevent unsafe GUC changes.  If we drop the
> idea of trying to forcibly prevent extensions from Doing It Wrong,
> then we don't need this, only the extra hook.

FWIW, I'd be fine with a simple solution like what Julien has been
proposing with the extra hook, rather than a GUC to enforce all that.
That may be nice in the long-run, but the potential benefits may not
be completely clear, either, after a closer read of this thread.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Tue, Apr 12, 2022 at 6:26 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Okay.  So maybe we only need the attached patches.  0001 is just 5ecd018
> un-reverted, and 0002 is Julien's patch from a few weeks ago with some
> minor edits.

Hmm. I suppose I was thinking that we'd go the other way around: move
the call to InitializeMaxBackends() earlier, as proposed previously,
and add a hook to allow extensions to get control before that point.
The reason that I like that approach is that I suspect that it's more
common for extensions to want to size shared memory data structures
than it is for them to want to change GUCs, and therefore I thought
that approach would fix things for the most amount of people with the
least amount of code change. But it seems like maybe Tom thinks I'm
incorrect about the relative frequency of those things, so I don't
know.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm. I suppose I was thinking that we'd go the other way around: move
> the call to InitializeMaxBackends() earlier, as proposed previously,
> and add a hook to allow extensions to get control before that point.
> The reason that I like that approach is that I suspect that it's more
> common for extensions to want to size shared memory data structures
> than it is for them to want to change GUCs, and therefore I thought
> that approach would fix things for the most amount of people with the
> least amount of code change. But it seems like maybe Tom thinks I'm
> incorrect about the relative frequency of those things, so I don't
> know.

Maybe I'm missing something, but I figured we'd keep the _PG_init
calls where they are to minimize side-effects, and then add an optional
hook just before/after shared memory size is determined.  Cases that
work well now continue to work well, and cases that don't work so
well can be fixed by making use of the hook.  In particular you
can still do RequestAddinShmemSpace() in _PG_init as long as the
request size doesn't depend on factors that other extensions might
change.  If you're doing something funny then you might need to
postpone RequestAddinShmemSpace till the new hook call.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Wed, Apr 13, 2022 at 9:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Hmm. I suppose I was thinking that we'd go the other way around: move
> > the call to InitializeMaxBackends() earlier, as proposed previously,
> > and add a hook to allow extensions to get control before that point.
> > The reason that I like that approach is that I suspect that it's more
> > common for extensions to want to size shared memory data structures
> > than it is for them to want to change GUCs, and therefore I thought
> > that approach would fix things for the most amount of people with the
> > least amount of code change. But it seems like maybe Tom thinks I'm
> > incorrect about the relative frequency of those things, so I don't
> > know.
>
> Maybe I'm missing something, but I figured we'd keep the _PG_init
> calls where they are to minimize side-effects, and then add an optional
> hook just before/after shared memory size is determined.  Cases that
> work well now continue to work well, and cases that don't work so
> well can be fixed by making use of the hook.  In particular you
> can still do RequestAddinShmemSpace() in _PG_init as long as the
> request size doesn't depend on factors that other extensions might
> change.  If you're doing something funny then you might need to
> postpone RequestAddinShmemSpace till the new hook call.

I wouldn't say that approach is wrong. I think we're just prioritizing
different things. I think that the pattern of wanting to request
shared memory in _PG_init() is common, and there are lots of
extensions that already do it that way, and if we pursue this plan,
then all of those extensions really ought to be updated to use the new
hook, including pg_prewarm and pg_stat_statements. They may continue
to work if they don't, but they'll be doing something that is not
really best practice any more but will happen to work until you make
your request for shared memory dependent on the wrong thing, or until
you load up another module at the same time that tweaks some GUC upon
which your calculation depends (imagine an autotuner that adjusts
pg_stat_statements.max at startup time). So I think we'll be
inflicting subtle bugs on extension authors that will never really get
fully sorted out.

Now, on the other hand, I think that the pattern of changing GUCs in
_PG_init() is comparatively uncommon. I don't believe that we have any
in-core code that does it, and the only out-of-core code that does to
my knowledge is Citus. So if we subtly change the best practices
around setting GUCs in _PG_init() hooks, I think that's going to
affect a vastly smaller number of extensions. Either way, we're
changing best practices without really creating a hard break, but I'd
rather move the wood for the small number of extensions that are
tweaking GUCs in _PG_init() than the larger number of extensions (or
so I suppose) that are requesting shared memory there.

Now, we could also decide to create a hard compatibility break to
force extension authors to update. I think that could be done in one
of two ways. One possibility is to refuse SetConfigOption() in
_PG_init(); anyone who wants to do that has to use the new set-a-GUC
hook we simultaneously add. The other is to refuse
RequestAddinShmemSpace() and RequestNamedLWLockTranche() in
_PG_init(); anyone who wants to do that has to use the new
request-shmem-resources hook that we simultaneously add. I tend to
think that the latter is a slightly more principled option, because I
doubt that refusing SetConfigOption() is a bullet-proof defense
against, well, I don't know, anything really. You can hackily modify
GUCs by other means, and you can do bad math with things that aren't
GUCs. I am however pretty sure that if we refuse to provide shmem
space unless you request it from a new add-shmem hook, that will be
pretty water-tight, and it doesn't seem like it should be a big issue
for extension authors to move that code to some new hook we invent.

So in the end I see basically four possibilities here:

A. No hard compatibility break, but invent a set-a-GUC hook that runs earlier.
B. No hard compatibility break, but invent a request-shmem hook that runs later.
C. Invent a set-a-GUC hook that runs earlier AND refuse to set GUCs in _PG_init.
D. Invent a request-shmem hook that runs later AND refuse to accept
shmem requests in _PG_init.

My preferred options are A or D for the reasons explained above. It
seems like you prefer B.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I wouldn't say that approach is wrong. I think we're just prioritizing
> different things. I think that the pattern of wanting to request
> shared memory in _PG_init() is common, and there are lots of
> extensions that already do it that way, and if we pursue this plan,
> then all of those extensions really ought to be updated to use the new
> hook, including pg_prewarm and pg_stat_statements. They may continue
> to work if they don't, but they'll be doing something that is not
> really best practice any more but will happen to work until you make
> your request for shared memory dependent on the wrong thing, or until
> you load up another module at the same time that tweaks some GUC upon
> which your calculation depends (imagine an autotuner that adjusts
> pg_stat_statements.max at startup time). So I think we'll be
> inflicting subtle bugs on extension authors that will never really get
> fully sorted out.

Yeah, there is something to be said for preventing subtle interactions
between extensions.

> So in the end I see basically four possibilities here:

> A. No hard compatibility break, but invent a set-a-GUC hook that runs earlier.
> B. No hard compatibility break, but invent a request-shmem hook that runs later.
> C. Invent a set-a-GUC hook that runs earlier AND refuse to set GUCs in _PG_init.
> D. Invent a request-shmem hook that runs later AND refuse to accept
> shmem requests in _PG_init.

I dislike A for the reason I already stated: _PG_init should be the
first code we run in an extension.  Not doing that is just too hacky
for words.  While B breaks the least stuff in the short run, I agree
that it leaves extension authors on the hook to avoid unpleasant
interactions, and that the only way they can be sure to do so is
to move their shmem requests to the new hook.  So if we're willing
to accept a hard compatibility break to prevent such bugs, then
I too prefer D to C.  What I'm not quite convinced about is whether
the problem is big enough to warrant a compatibility break.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Wed, Apr 13, 2022 at 10:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, there is something to be said for preventing subtle interactions
> between extensions.
>
> > So in the end I see basically four possibilities here:
>
> > A. No hard compatibility break, but invent a set-a-GUC hook that runs earlier.
> > B. No hard compatibility break, but invent a request-shmem hook that runs later.
> > C. Invent a set-a-GUC hook that runs earlier AND refuse to set GUCs in _PG_init.
> > D. Invent a request-shmem hook that runs later AND refuse to accept
> > shmem requests in _PG_init.
>
> I dislike A for the reason I already stated: _PG_init should be the
> first code we run in an extension.  Not doing that is just too hacky
> for words.

That seems like a fair position, even if I don't really understand why
it would be as bad as "too hacky for words."

> While B breaks the least stuff in the short run, I agree
> that it leaves extension authors on the hook to avoid unpleasant
> interactions, and that the only way they can be sure to do so is
> to move their shmem requests to the new hook.  So if we're willing
> to accept a hard compatibility break to prevent such bugs, then
> I too prefer D to C.  What I'm not quite convinced about is whether
> the problem is big enough to warrant a compatibility break.

It's sort of a philosophical question. How do you measure the size of
such a problem? What units do you even use for such a size
measurement? How big does it have to be to justify a compatibility
break? Presumably it depends on the size of the compatibility break,
which is also not subject to any sort of objective measurement.

I guess my feeling about this is that _PG_init() is sort of a grab
bag, and that's not very good. You're supposed to register new GUCs
there, and request shared memory space, and install any hook functions
you want to use, and maybe some other stuff. But why is it appropriate
for all of those things to happen at the same time? I think it pretty
clearly isn't, and that's why you see _PG_init() functions testing
process_shared_preload_libraries_in_progress, and why
RequestAddinShmemSpace() is a no-op if it's not the right time for
such things. To me, all of that is just a sign that the system is
badly designed. Imagine if someone proposed to make XLogInsert() or
SetConfigOption() or LockBuffer() sometimes just return without doing
anything. We would just call those functions in places where those
actions weren't appropriate, and the function would just do nothing
silently without signalling an error. Surely such a proposal would be
shot down as an awful idea, and the only reason the _PG_init() case is
any different is because it's not new. But it doesn't seem to me that
it's any better of an idea here than it would be there.

And under proposal D we'd actually be fixing that, because we'd have a
hook that is the right place to request shared memory and we'd
complain if those functions were called from anywhere else.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I guess my feeling about this is that _PG_init() is sort of a grab
> bag, and that's not very good. You're supposed to register new GUCs
> there, and request shared memory space, and install any hook functions
> you want to use, and maybe some other stuff. But why is it appropriate
> for all of those things to happen at the same time? I think it pretty
> clearly isn't, and that's why you see _PG_init() functions testing
> process_shared_preload_libraries_in_progress, and why
> RequestAddinShmemSpace() is a no-op if it's not the right time for
> such things. To me, all of that is just a sign that the system is
> badly designed.

Hmm, that's a good point.  If we can replace "RequestAddinShmemSpace
does nothing if called at the wrong time" with "RequestAddinShmemSpace
throws error if called at the wrong time", that seems like a pretty
clear improvement in robustness.

Is there anything else typically done in _PG_init that has to be
conditional on process_shared_preload_libraries_in_progress?  I recall
something about reserving LWLocks, which probably should get the same
treatment.  If we can get to a point where _PG_init shouldn't need to
care about process_shared_preload_libraries_in_progress, because all
the stuff that would care is moved to this new hook, then that would
be very clear cleanup.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Wed, Apr 13, 2022 at 11:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is there anything else typically done in _PG_init that has to be
> conditional on process_shared_preload_libraries_in_progress?  I recall
> something about reserving LWLocks, which probably should get the same
> treatment.  If we can get to a point where _PG_init shouldn't need to
> care about process_shared_preload_libraries_in_progress, because all
> the stuff that would care is moved to this new hook, then that would
> be very clear cleanup.

What's a little wonky right now is that it's fairly common for
extensions to just return straight-off without doing *anything at all*
if !process_shared_preload_libraries_in_progress. See
pg_stat_statements for example. It seems kind of strange to me to make
registering GUCs dependent on
process_shared_preload_libraries_in_progress; why shouldn't that just
happen always? It's understandable that we don't want to install the
hook functions if we're not being loaded from shared_preload_libaries,
though.

It may be too much to hope that we're going to completely get rid of
process_shared_preload_libraries_in_progress tests. But even given
that, I think having a request-shared-mem hook makes sense and would
make things cleaner than they are today. Maybe in the future we'll end
up with other hooks as well, like a "this is the place to register
GUCs" hook and a perhaps a "this is the place to register you custom
rmgr" hook. I'm not really sure that we need those and I don't want to
make things complicated just for the heck of it, but it's not shocking
that different operations need to happen at different times in the
startup sequence, and I don't think we should be afraid to split up
the monolithic _PG_init() into as many separate things as are required
to make it work right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> What's a little wonky right now is that it's fairly common for
> extensions to just return straight-off without doing *anything at all*
> if !process_shared_preload_libraries_in_progress. See
> pg_stat_statements for example. It seems kind of strange to me to make
> registering GUCs dependent on
> process_shared_preload_libraries_in_progress; why shouldn't that just
> happen always? It's understandable that we don't want to install the
> hook functions if we're not being loaded from shared_preload_libaries,
> though.

Yeah, I was just investigating that.  The problem that pg_stat_statements
has is that it wants to create PGC_POSTMASTER GUCs, which is something
that guc.c specifically forbids:

    /*
     * Only allow custom PGC_POSTMASTER variables to be created during shared
     * library preload; any later than that, we can't ensure that the value
     * doesn't change after startup.  This is a fatal elog if it happens; just
     * erroring out isn't safe because we don't know what the calling loadable
     * module might already have hooked into.
     */
    if (context == PGC_POSTMASTER &&
        !process_shared_preload_libraries_in_progress)
        elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");

The key reason guc.c does that is that if it allowed the case, then there
would be no guarantee that a "PGC_POSTMASTER" GUC has the same value in
every backend of the cluster, which'd likely break most use-cases for
such a GUC (it'd certainly break pg_stat_statements, which assumes that
the local setting of that GUC reflects the size of its shmem area).
Perhaps we can improve on that situation with some more thought, but
I'm not very clear on how.

> It may be too much to hope that we're going to completely get rid of
> process_shared_preload_libraries_in_progress tests.

Perhaps, but this is definitely an area that could stand to have some
serious design thought put into it.  You're quite right that it's
largely a mess today, but I think proposals like "forbid setting GUCs
during _PG_init" would add to the mess rather than clean things up.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Wed, Apr 13, 2022 at 12:05:08PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> It may be too much to hope that we're going to completely get rid of
>> process_shared_preload_libraries_in_progress tests.
> 
> Perhaps, but this is definitely an area that could stand to have some
> serious design thought put into it.  You're quite right that it's
> largely a mess today, but I think proposals like "forbid setting GUCs
> during _PG_init" would add to the mess rather than clean things up.

+1.  The simplest design might be to just make a separate preload hook.
_PG_init() would install hooks, and then this preload hook would do
anything that needs to happen when the library is loaded via s_p_l.
However, I think we still want a new shmem request hook where MaxBackends
is initialized.

If we do move forward with the shmem request hook, do we want to disallow
shmem requests anywhere else, or should we just leave it up to extension
authors to do the right thing?  Many shmem requests in _PG_init() are
probably okay unless they depend on MaxBackends or another GUC that someone
might change.  Given that, I think I currently prefer the latter (option B
from upthread).

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
On Wed, Apr 13, 2022 at 11:30:40AM -0700, Nathan Bossart wrote:
> On Wed, Apr 13, 2022 at 12:05:08PM -0400, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> It may be too much to hope that we're going to completely get rid of
> >> process_shared_preload_libraries_in_progress tests.
> > 
> > Perhaps, but this is definitely an area that could stand to have some
> > serious design thought put into it.  You're quite right that it's
> > largely a mess today, but I think proposals like "forbid setting GUCs
> > during _PG_init" would add to the mess rather than clean things up.
> 
> +1.  The simplest design might be to just make a separate preload hook.
> _PG_init() would install hooks, and then this preload hook would do
> anything that needs to happen when the library is loaded via s_p_l.
> However, I think we still want a new shmem request hook where MaxBackends
> is initialized.

Yeah this one seems needed no matter what.

> If we do move forward with the shmem request hook, do we want to disallow
> shmem requests anywhere else, or should we just leave it up to extension
> authors to do the right thing?  Many shmem requests in _PG_init() are
> probably okay unless they depend on MaxBackends or another GUC that someone
> might change.  Given that, I think I currently prefer the latter (option B
> from upthread).

I'd be in favor of a hard break.  There are already multiple extensions that
relies on non final value of GUCs to size their shmem request.  And as an
extension author it can be hard to realize that, as those extensions work just
fine until someone wants to try it with some other extension that changes some
GUC.  Forcing shmem request in a new hook will make sure that it's *always*
correct, and that only requires very minimal work on the extension side.



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Thu, Apr 14, 2022 at 01:50:24PM +0800, Julien Rouhaud wrote:
> On Wed, Apr 13, 2022 at 11:30:40AM -0700, Nathan Bossart wrote:
>> If we do move forward with the shmem request hook, do we want to disallow
>> shmem requests anywhere else, or should we just leave it up to extension
>> authors to do the right thing?  Many shmem requests in _PG_init() are
>> probably okay unless they depend on MaxBackends or another GUC that someone
>> might change.  Given that, I think I currently prefer the latter (option B
>> from upthread).
> 
> I'd be in favor of a hard break.  There are already multiple extensions that
> relies on non final value of GUCs to size their shmem request.  And as an
> extension author it can be hard to realize that, as those extensions work just
> fine until someone wants to try it with some other extension that changes some
> GUC.  Forcing shmem request in a new hook will make sure that it's *always*
> correct, and that only requires very minimal work on the extension side.

Yeah, this is a good point.  If we're okay with breaking existing
extensions like this, I will work on a patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Thu, Apr 14, 2022 at 12:22 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> > I'd be in favor of a hard break.  There are already multiple extensions that
> > relies on non final value of GUCs to size their shmem request.  And as an
> > extension author it can be hard to realize that, as those extensions work just
> > fine until someone wants to try it with some other extension that changes some
> > GUC.  Forcing shmem request in a new hook will make sure that it's *always*
> > correct, and that only requires very minimal work on the extension side.
>
> Yeah, this is a good point.  If we're okay with breaking existing
> extensions like this, I will work on a patch.

I tend to think it's a good idea. It's not great to inconvenience
extension authors, but I think that with the kind of design we are
talking about here it will be pretty straightforward to see how to
update your extension: move the code that request shmem resources into
the new hook. If you want to be compatible with older PostgreSQL
releases by conditional compilation, conditionally call that function
from _PG_init() when PG_VERSION_NUM < whatever.

Compare that with the current situation, where things seem to mostly
work but sometimes don't, and it's not quite clear what to do about
it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 14, 2022 at 12:22 PM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
>>> I'd be in favor of a hard break.

>> Yeah, this is a good point.  If we're okay with breaking existing
>> extensions like this, I will work on a patch.

> I tend to think it's a good idea.

I've come around to that view as well.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Thu, Apr 14, 2022 at 12:39:46PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Apr 14, 2022 at 12:22 PM Nathan Bossart
>> <nathandbossart@gmail.com> wrote:
>>>> I'd be in favor of a hard break.
> 
>>> Yeah, this is a good point.  If we're okay with breaking existing
>>> extensions like this, I will work on a patch.
> 
>> I tend to think it's a good idea.
> 
> I've come around to that view as well.

Here is a new patch set that introduces the aforementioned "hard break."

I noticed that requests for more LWLocks follow a similar pattern as
regular shared memory requests, and I figured that we would want to do
something similar for those, but I wasn't sure exactly how to proceed.  I
saw two options: 1) use shmem_request_hook for both regular requests and
LWLock requests or 2) introduce an lwlock_request_hook.  My instinct was
that option 1 was preferable, but AFAICT this requires introducing a new
external variable for inspecting whether the request is made at a valid
time.  This would be similar to
process_shared_preload_libraries_in_progress, which I believe means a
determined extension author could easily hack around the request
restrictions.  I thought option 2 added too much machinery to work around
this problem.  For now, I haven't made any changes for LWLock requests.
What are your thoughts?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> I noticed that requests for more LWLocks follow a similar pattern as
> regular shared memory requests, and I figured that we would want to do
> something similar for those, but I wasn't sure exactly how to proceed.  I
> saw two options: 1) use shmem_request_hook for both regular requests and
> LWLock requests or 2) introduce an lwlock_request_hook.  My instinct was
> that option 1 was preferable,

Yeah, I agree, which says that maybe the hook name needs to be something
else (not that I have a good proposal).

> but AFAICT this requires introducing a new
> external variable for inspecting whether the request is made at a valid
> time.

Uh, why?  It'd be the core code's responsibility to place the hook
call at a point where you could do both.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Mon, Apr 18, 2022 at 07:33:54PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> I noticed that requests for more LWLocks follow a similar pattern as
>> regular shared memory requests, and I figured that we would want to do
>> something similar for those, but I wasn't sure exactly how to proceed.  I
>> saw two options: 1) use shmem_request_hook for both regular requests and
>> LWLock requests or 2) introduce an lwlock_request_hook.  My instinct was
>> that option 1 was preferable,
> 
> Yeah, I agree, which says that maybe the hook name needs to be something
> else (not that I have a good proposal).
> 
>> but AFAICT this requires introducing a new
>> external variable for inspecting whether the request is made at a valid
>> time.
> 
> Uh, why?  It'd be the core code's responsibility to place the hook
> call at a point where you could do both.

I'm looking for a clean way to ERROR if someone attempts to call
RequestAddinShmemSpace() or RequestNamedLWLockTranche() outside of the
hook.  Currently, we are using static variables in ipci.c and lwlock.c to
silently ignore invalid requests.  I could add a new 'extern bool' called
'process_shmem_requests_in_progress', but extensions could easily hack
around that to allow requests in _PG_init().  Maybe I am overthinking all
this and that is good enough.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> I'm looking for a clean way to ERROR if someone attempts to call
> RequestAddinShmemSpace() or RequestNamedLWLockTranche() outside of the
> hook.  Currently, we are using static variables in ipci.c and lwlock.c to
> silently ignore invalid requests.  I could add a new 'extern bool' called
> 'process_shmem_requests_in_progress', but extensions could easily hack
> around that to allow requests in _PG_init().  Maybe I am overthinking all
> this and that is good enough.

If they do that and it breaks something, that's their fault not ours.
(It's not like there's not $BIGNUM ways for a C-language module to
break the backend, anyway.)

BTW, I'd make such errors FATAL, as it's unlikely that we can recover
cleanly from an error during initialization of a loadable module.
The module's likely to be only partially initialized/hooked in.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
Hi,

On Mon, Apr 18, 2022 at 08:17:04PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
> > I'm looking for a clean way to ERROR if someone attempts to call
> > RequestAddinShmemSpace() or RequestNamedLWLockTranche() outside of the
> > hook.  Currently, we are using static variables in ipci.c and lwlock.c to
> > silently ignore invalid requests.  I could add a new 'extern bool' called
> > 'process_shmem_requests_in_progress', but extensions could easily hack
> > around that to allow requests in _PG_init().  Maybe I am overthinking all
> > this and that is good enough.
> 
> If they do that and it breaks something, that's their fault not ours.
> (It's not like there's not $BIGNUM ways for a C-language module to
> break the backend, anyway.)

Agreed.  Similarly the process_shared_preload_libraries_in_progress flag could
be modified by extension, and that wouldn't be any better.

> BTW, I'd make such errors FATAL, as it's unlikely that we can recover
> cleanly from an error during initialization of a loadable module.
> The module's likely to be only partially initialized/hooked in.

While at it, should we make process_shmem_requests_in_progress true when the
new hook is called?  The hook should only be called when that's the case, and
extension authors may feel like asserting it.



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, Apr 19, 2022 at 05:49:13PM +0800, Julien Rouhaud wrote:
> On Mon, Apr 18, 2022 at 08:17:04PM -0400, Tom Lane wrote:
>> Nathan Bossart <nathandbossart@gmail.com> writes:
>> > I'm looking for a clean way to ERROR if someone attempts to call
>> > RequestAddinShmemSpace() or RequestNamedLWLockTranche() outside of the
>> > hook.  Currently, we are using static variables in ipci.c and lwlock.c to
>> > silently ignore invalid requests.  I could add a new 'extern bool' called
>> > 'process_shmem_requests_in_progress', but extensions could easily hack
>> > around that to allow requests in _PG_init().  Maybe I am overthinking all
>> > this and that is good enough.
>> 
>> If they do that and it breaks something, that's their fault not ours.
>> (It's not like there's not $BIGNUM ways for a C-language module to
>> break the backend, anyway.)
> 
> Agreed.  Similarly the process_shared_preload_libraries_in_progress flag could
> be modified by extension, and that wouldn't be any better.
> 
>> BTW, I'd make such errors FATAL, as it's unlikely that we can recover
>> cleanly from an error during initialization of a loadable module.
>> The module's likely to be only partially initialized/hooked in.
> 
> While at it, should we make process_shmem_requests_in_progress true when the
> new hook is called?  The hook should only be called when that's the case, and
> extension authors may feel like asserting it.

Okay, I did it this way in v5.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
"Anton A. Melnikov"
Date:
Hello!

On 19.04.2022 18:46, Nathan Bossart wrote:
> Okay, I did it this way in v5.
> 

Recently i ran into a problem that it would be preferable in our 
extended pg_stat_statements to use MaxBackEnds in _PG_Init() but it's 
equal to zero here.
And i was very happy to find this patch and this thread.  It was not 
only very interesting and informative for me but also solves the current 
problem. Patch is applied cleanly on current master. Tests under Linux 
and Windows were successful.
I've tried this patch with our extended pg_stat_statements and checked 
that MaxBackends has the correct value during extra shmem allocating. 
Thanks a lot!

+1 for this patch.

I have a little doubt about the comment in postmaster.c: "Now that 
loadable modules have had their chance to alter any GUCs". Perhaps this 
comment is too general. Not sure that it is possible to change any 
arbitrary GUC here.
And maybe not bad to add Assert(size > 0) in RequestAddinShmemSpace() to 
see that the size calculation in contrib was wrong.


With best regards,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Tue, Apr 19, 2022 at 11:47 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> Okay, I did it this way in v5.

I pushed 0001. Regarding 0002, I think there's no point in adding a
_PG_fini(). The code to call _PG_fini() has been surrounded by #ifdef
NOT_USED forever, and that seems unlikely to change any time soon as
long as we stick with these stupid hook designs where there's
effectively a linked list of hooks, but you can't actually access the
list because spread across a bunch of random static variables in each
module. I think we ought to go through all the hooks that are being
used in this kind of way and replace them with a system where there's
an explicit List of functions to call, and instead of this lame stuff
like:

+       prev_shmem_request_hook = shmem_request_hook;
+       shmem_request_hook = autoprewarm_shmem_request;

You instead do:

shmem_request_functions = lappend(shmem_request_functions,
autoprewarm_shmem_request);

Or maybe wrap that up in an API, like:

add_shmem_request_function(autoprewarm_shmem_request);

Then it would be easy to have corresponding a corresponding remove function.

For shmem request, it would probably make sense to ditch the part
where each hook function calls the next hook function and instead just
have the calling code loop over the list and call them one by one. For
things like the executor start/run/end hooks, that wouldn't work, but
you could have something like
invoke_next_executor_start_hook_function(args).

I don't think we should try to make this kind of change now - it seems
to me that what you've done here is consistent with existing practice
and we might as well commit it more or less like this for now. But
perhaps for v16 or some future release we can think about redoing a
bunch of hooks this way. There would be some migration pain for
extension authors for sure, but then we might get to a point where
extensions can be cleanly unloaded in at least some circumstances,
instead of continuing to write silly _PG_fini() functions that
couldn't possibly ever work.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> perhaps for v16 or some future release we can think about redoing a
> bunch of hooks this way. There would be some migration pain for
> extension authors for sure, but then we might get to a point where
> extensions can be cleanly unloaded in at least some circumstances,
> instead of continuing to write silly _PG_fini() functions that
> couldn't possibly ever work.

I agree that _PG_fini functions as they stand are worthless.
What I'm not getting is why we should care enough about that
to break just about everybody's extension.  Even if unloading
extensions were feasible, who would bother?

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Fri, May 6, 2022 at 9:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > perhaps for v16 or some future release we can think about redoing a
> > bunch of hooks this way. There would be some migration pain for
> > extension authors for sure, but then we might get to a point where
> > extensions can be cleanly unloaded in at least some circumstances,
> > instead of continuing to write silly _PG_fini() functions that
> > couldn't possibly ever work.
>
> I agree that _PG_fini functions as they stand are worthless.
> What I'm not getting is why we should care enough about that
> to break just about everybody's extension.  Even if unloading
> extensions were feasible, who would bother?

Well, if we think that, then we ought to remove the NOT_USED code and
all the random _PG_fini() stuff that's still floating around.

I don't actually have a clear answer to whether it's a useful thing to
be able to unload modules. If the module is just providing a bunch of
SQL-callable functions, it probably isn't. If it's modifying the
behavior of your session, it might be. Now, it could also have an
"off" switch in the form of a GUC, and probably should - but you'd
probably save at least a few cycles by detaching from the hooks rather
than just still getting called and doing nothing, and maybe that's
worth something. Whether it's worth enough to justify making changes
that will affect extensions, I'm not sure.

IOW, I don't really know what we ought to do here, but I think that
maintaining a vestigial system that has never worked and can never
work is not it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, May 6, 2022 at 9:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I agree that _PG_fini functions as they stand are worthless.
>> What I'm not getting is why we should care enough about that
>> to break just about everybody's extension.  Even if unloading
>> extensions were feasible, who would bother?

> Well, if we think that, then we ought to remove the NOT_USED code and
> all the random _PG_fini() stuff that's still floating around.

I think that's exactly what we should do, if it bugs you that stuff
is just sitting there.  I see no prospect that we'll ever make it
work, because the question of unhooking from hooks is just the tip
of the iceberg.  As an example, what should happen with any custom
GUCs the module has defined?  Dropping their values might not be
very nice, but if we leave them around then the next LOAD (if any)
will see a conflict.  Another fun question is whether it's ever
safe to unload a module that was preloaded by the postmaster.

In short, this seems like a can of very wriggly worms, with not
a lot of benefit that would ensue from opening it.

            regards, tom lane



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Fri, May 06, 2022 at 10:43:21AM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, May 6, 2022 at 9:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I agree that _PG_fini functions as they stand are worthless.
>>> What I'm not getting is why we should care enough about that
>>> to break just about everybody's extension.  Even if unloading
>>> extensions were feasible, who would bother?
> 
>> Well, if we think that, then we ought to remove the NOT_USED code and
>> all the random _PG_fini() stuff that's still floating around.
> 
> I think that's exactly what we should do, if it bugs you that stuff
> is just sitting there.  I see no prospect that we'll ever make it
> work, because the question of unhooking from hooks is just the tip
> of the iceberg.  As an example, what should happen with any custom
> GUCs the module has defined?  Dropping their values might not be
> very nice, but if we leave them around then the next LOAD (if any)
> will see a conflict.  Another fun question is whether it's ever
> safe to unload a module that was preloaded by the postmaster.
> 
> In short, this seems like a can of very wriggly worms, with not
> a lot of benefit that would ensue from opening it.

+1, I'll put together a new patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Fri, May 06, 2022 at 08:27:11AM -0700, Nathan Bossart wrote:
> +1, I'll put together a new patch set.

As promised...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Fri, May 6, 2022 at 5:15 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Fri, May 06, 2022 at 08:27:11AM -0700, Nathan Bossart wrote:
> > +1, I'll put together a new patch set.
>
> As promised...

Looks reasonable to me.

Anyone else have thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Fri, May 06, 2022 at 07:51:43PM -0400, Robert Haas wrote:
> On Fri, May 6, 2022 at 5:15 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Fri, May 06, 2022 at 08:27:11AM -0700, Nathan Bossart wrote:
>> > +1, I'll put together a new patch set.
>>
>> As promised...
>
> Looks reasonable to me.

0001 looks sensible seen from here with this approach.

> Anyone else have thoughts?

I agree that removing support for the unloading part would be nice to
clean up now on HEAD.  Note that 0002 is missing the removal of one
reference to _PG_fini in xfunc.sgml (<primary> markup).
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Tue, May 10, 2022 at 05:55:12PM +0900, Michael Paquier wrote:
> I agree that removing support for the unloading part would be nice to
> clean up now on HEAD.  Note that 0002 is missing the removal of one
> reference to _PG_fini in xfunc.sgml (<primary> markup).

Oops, sorry about that.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Tue, May 10, 2022 at 08:56:28AM -0700, Nathan Bossart wrote:
> On Tue, May 10, 2022 at 05:55:12PM +0900, Michael Paquier wrote:
>> I agree that removing support for the unloading part would be nice to
>> clean up now on HEAD.  Note that 0002 is missing the removal of one
>> reference to _PG_fini in xfunc.sgml (<primary> markup).
>
> Oops, sorry about that.

0002 looks pretty good from here.  If it were me, I would apply 0002
first to avoid the extra tweak in pg_stat_statements with _PG_fini in
0001.

Regarding 0001, I have spotted an extra issue in the documentation.
xfunc.sgml has a section called "Shared Memory and LWLocks" about the
use of RequestNamedLWLockTranche() and RequestAddinShmemSpace() called
from _PG_init(), which would now be wrong as we'd need the hook.  IMO,
the docs should mention the registration of the hook in _PG_init(),
the APIs involved, and should mention to look at pg_stat_statements.c
for a code sample (we tend to avoid the addition of sample code in the
docs lately as we habe no way to check their compilation
automatically).

Robert, are you planning to look at what's proposed and push these?
FWIW, I am familiar enough with the problems at hand to do this in
time for beta1 (aka by tomorrow to give it room in the buildfarm), but
I don't want to step on your feet, either.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Wed, May 11, 2022 at 11:18:29AM +0900, Michael Paquier wrote:
> 0002 looks pretty good from here.  If it were me, I would apply 0002
> first to avoid the extra tweak in pg_stat_statements with _PG_fini in
> 0001.

I swapped 0001 and 0002 in v8.

> Regarding 0001, I have spotted an extra issue in the documentation.
> xfunc.sgml has a section called "Shared Memory and LWLocks" about the
> use of RequestNamedLWLockTranche() and RequestAddinShmemSpace() called
> from _PG_init(), which would now be wrong as we'd need the hook.  IMO,
> the docs should mention the registration of the hook in _PG_init(),
> the APIs involved, and should mention to look at pg_stat_statements.c
> for a code sample (we tend to avoid the addition of sample code in the
> docs lately as we habe no way to check their compilation
> automatically).

Ah, good catch.  I adjusted the docs as you suggested.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Wed, May 11, 2022 at 12:12 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> I swapped 0001 and 0002 in v8.
>
> Ah, good catch.  I adjusted the docs as you suggested.

OK, I have committed 0001 now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Wed, May 11, 2022 at 04:18:10PM -0400, Robert Haas wrote:
> OK, I have committed 0001 now.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
"Anton A. Melnikov"
Date:
On 12.05.2022 00:01, Nathan Bossart wrote:
> On Wed, May 11, 2022 at 04:18:10PM -0400, Robert Haas wrote:
>> OK, I have committed 0001 now.
> 
> Thanks!
> 

Maybe remove the first part from the patchset?
Because now the Patch Tester is giving apply error for the first part 
and can't test the other.
http://cfbot.cputube.org/patch_38_3614.log



With best regards,
-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: make MaxBackends available in _PG_init

From
Michael Paquier
Date:
On Thu, May 12, 2022 at 06:51:59PM +0300, Anton A. Melnikov wrote:
> Maybe remove the first part from the patchset?
> Because now the Patch Tester is giving apply error for the first part and
> can't test the other.
> http://cfbot.cputube.org/patch_38_3614.log

Yep.  As simple as the attached.
--
Michael

Attachment

Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Thu, May 12, 2022 at 8:15 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, May 12, 2022 at 06:51:59PM +0300, Anton A. Melnikov wrote:
> > Maybe remove the first part from the patchset?
> > Because now the Patch Tester is giving apply error for the first part and
> > can't test the other.
> > http://cfbot.cputube.org/patch_38_3614.log
>
> Yep.  As simple as the attached.

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: make MaxBackends available in _PG_init

From
Nathan Bossart
Date:
On Fri, May 13, 2022 at 09:49:54AM -0400, Robert Haas wrote:
> Committed.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: make MaxBackends available in _PG_init

From
Julien Rouhaud
Date:
On Fri, May 13, 2022 at 09:49:54AM -0400, Robert Haas wrote:
>
> Committed.

Thanks!

For the record I submitted patches or pull requests this weekend for all
repositories I was aware of to fix the compatibility with this patch, hoping
that it will save some time to the packagers when they will take care of the
beta.



Re: make MaxBackends available in _PG_init

From
Robert Haas
Date:
On Sun, May 15, 2022 at 8:58 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> For the record I submitted patches or pull requests this weekend for all
> repositories I was aware of to fix the compatibility with this patch, hoping
> that it will save some time to the packagers when they will take care of the
> beta.

Nice!

-- 
Robert Haas
EDB: http://www.enterprisedb.com