Thread: Re: Estimating HugePages Requirements?

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
moving to pgsql-hackers@

On 6/9/21, 9:41 AM, "Don Seiler" <don@seiler.us> wrote:
> I'm trying to set up a chef recipe to reserve enough HugePages on a
> linux system for our PG servers. A given VM will only host one PG
> cluster and that will be the only thing on that host that uses
> HugePages. Blogs that I've seen suggest that it would be as simple
> as taking the shared_buffers setting and dividing that by 2MB (huge
> page size); however, I found that I needed some more.
>
> In my test case, shared_buffers is set to 4003MB (calculated by
> chef) but PG failed to start until I reserved a few hundred more MB.
> When I checked VmPeak, it was 4321MB, so I ended up having to
> reserve over 2161 huge pages, over a hundred more than I had
> originally thought.
>
> I'm told other factors contribute to this additional memory
> requirement, such as max_connections, wal_buffers, etc. I'm
> wondering if anyone has been able to come up with a reliable method
> for determining the HugePages requirements for a PG cluster based on
> the GUC values (that would be known at deployment time).

In RDS, we've added a pg_ctl option that returns the amount of shared
memory required.  Basically, we start up postmaster just enough to get
an accurate value from CreateSharedMemoryAndSemaphores() and then shut
down.  The patch is quite battle-tested at this point (we first
started using it in 2017, and we've been enabling huge pages by
default since v10).  I'd be happy to clean it up and submit it for
discussion in pgsql-hackers@ if there is interest.

Nathan


Re: Estimating HugePages Requirements?

From
Mark Dilger
Date:

> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>
> I'd be happy to clean it up and submit it for
> discussion in pgsql-hackers@ if there is interest.

Yes, I'd like to see it.  Thanks for offering.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
>> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>
>> I'd be happy to clean it up and submit it for
>> discussion in pgsql-hackers@ if there is interest.
>
> Yes, I'd like to see it.  Thanks for offering.

Here's the general idea.  It still needs a bit of polishing, but I'm
hoping this is enough to spark some discussion on the approach.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 6/9/21, 8:09 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
>>> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>>
>>> I'd be happy to clean it up and submit it for
>>> discussion in pgsql-hackers@ if there is interest.
>>
>> Yes, I'd like to see it.  Thanks for offering.
>
> Here's the general idea.  It still needs a bit of polishing, but I'm
> hoping this is enough to spark some discussion on the approach.

Here's a rebased version of the patch.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Zhihong Yu
Date:


On Mon, Aug 9, 2021 at 3:57 PM Bossart, Nathan <bossartn@amazon.com> wrote:
On 6/9/21, 8:09 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
>>> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>>
>>> I'd be happy to clean it up and submit it for
>>> discussion in pgsql-hackers@ if there is interest.
>>
>> Yes, I'd like to see it.  Thanks for offering.
>
> Here's the general idea.  It still needs a bit of polishing, but I'm
> hoping this is enough to spark some discussion on the approach.

Here's a rebased version of the patch.

Nathan

Hi,

-extern void CreateSharedMemoryAndSemaphores(void);
+extern Size CreateSharedMemoryAndSemaphores(bool size_only);

Should the parameter be an enum / bitmask so that future additions would not change the method signature?

Cheers 

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 8/9/21, 4:05 PM, "Zhihong Yu" <zyu@yugabyte.com> wrote:
> -extern void CreateSharedMemoryAndSemaphores(void);
> +extern Size CreateSharedMemoryAndSemaphores(bool size_only);
>
> Should the parameter be an enum / bitmask so that future additions would not change the method signature?

I don't have a strong opinion about this.  I don't feel that it's
really necessary, but if reviewers want a bitmask instead, I can
change it.

Nathan


Re: Estimating HugePages Requirements?

From
Andres Freund
Date:
Hi,

On 2021-08-09 22:57:18 +0000, Bossart, Nathan wrote:

> @@ -1026,6 +1031,18 @@ PostmasterMain(int argc, char *argv[])
>       */
>      InitializeMaxBackends();
>  
> +    if (output_shmem)
> +    {
> +        char output[64];
> +        Size size;
> +
> +        size = CreateSharedMemoryAndSemaphores(true);
> +        sprintf(output, "%zu", size);
> +
> +        puts(output);
> +        ExitPostmaster(0);
> +    }

I don't like putting this into PostmasterMain(). Either BootstrapMain()
(specifically checker mode) or GucInfoMain() seem like better places.


> -void
> -CreateSharedMemoryAndSemaphores(void)
> +Size
> +CreateSharedMemoryAndSemaphores(bool size_only)
>  {
>      PGShmemHeader *shim = NULL;
>  
> @@ -161,6 +161,9 @@ CreateSharedMemoryAndSemaphores(void)
>          /* might as well round it off to a multiple of a typical page size */
>          size = add_size(size, 8192 - (size % 8192));
>  
> +        if (size_only)
> +            return size;
> +
>          elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
>  
>          /*
> @@ -288,4 +291,6 @@ CreateSharedMemoryAndSemaphores(void)
>       */
>      if (shmem_startup_hook)
>          shmem_startup_hook();
> +
> +    return 0;
>  }

That seems like an ugly API to me. Why don't we split the size
determination and shmem creation into two functions?


Greetings,

Andres Freund



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 8/9/21, 8:43 PM, "Andres Freund" <andres@anarazel.de> wrote:
> I don't like putting this into PostmasterMain(). Either BootstrapMain()
> (specifically checker mode) or GucInfoMain() seem like better places.

I think BootstrapModeMain() makes the most sense.  It fits in nicely
with the --check logic that's already there.  With v3, the following
command can be used to retrieve the amount of shared memory required.

        postgres --output-shmem -D dir

While testing this new option, I noticed that you can achieve similar
results today with the following command, although this one will
actually try to create the shared memory, too.

        postgres --check -D dir -c log_min_messages=debug3 2> >(grep IpcMemoryCreate)

IMO the new option is still handy, but I can see the argument that it
might not be necessary.
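
For example, with the new option, a huge pages setup script could do
something along these lines (a rough sketch, assuming Linux and the
kernel's default huge page size):

        shmem_bytes=$(postgres --output-shmem -D dir)
        hp_kb=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo)
        echo $(( (shmem_bytes + hp_kb * 1024 - 1) / (hp_kb * 1024) ))  # pages to reserve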

> That seems like an ugly API to me. Why don't we split the size
> determination and shmem creation into two functions?

I did it this way in v3.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
> I think BootstrapModeMain() makes the most sense.  It fits in nicely
> with the --check logic that's already there.  With v3, the following
> command can be used to retrieve the amount of shared memory required.
>
>         postgres --output-shmem -D dir
>
> While testing this new option, I noticed that you can achieve similar
> results today with the following command, although this one will
> actually try to create the shared memory, too.

That may not be the best option.

> IMO the new option is still handy, but I can see the argument that it
> might not be necessary.

A separate option looks handy.  Wouldn't it be better to document it
in postgres-ref.sgml then?
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Magnus Hagander
Date:
On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
> > I think BootstrapModeMain() makes the most sense.  It fits in nicely
> > with the --check logic that's already there.  With v3, the following
> > command can be used to retrieve the amount of shared memory required.
> >
> >         postgres --output-shmem -D dir
> >
> > While testing this new option, I noticed that you can achieve similar
> > results today with the following command, although this one will
> > actually try to create the shared memory, too.
>
> That may not be the best option.

I would say that can be a disastrous option.

First of all it would probably not work if you already have something
running -- especially when using huge pages. And if it does work, in
that or other scenarios, it can potentially have a significant impact on
a running cluster to suddenly allocate many more GB of memory...


> > IMO the new option is still handy, but I can see the argument that it
> > might not be necessary.
>
> A separate option looks handy.  Wouldn't it be better to document it
> in postgres-ref.sgml then?

I'd say a lot more than just handy. I don't think the workaround is
really all that useful.

(haven't looked at the actual patch yet, just commenting on the principle)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 8/27/21, 7:41 AM, "Magnus Hagander" <magnus@hagander.net> wrote:
> On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
>> On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
>> > While testing this new option, I noticed that you can achieve similar
>> > results today with the following command, although this one will
>> > actually try to create the shared memory, too.
>>
>> That may not be the best option.
>
> I would say that can be a disastrous option.
>
> First of all it would probably not work if you already have something
> running -- especially when using huge pages. And if it does work, in
> that or other scenarios, it can potentially have a significant impact on
> a running cluster to suddenly allocate many more GB of memory...

The v3 patch actually didn't work if the server was already running.
I removed that restriction in v4.

>> > IMO the new option is still handy, but I can see the argument that it
>> > might not be necessary.
>>
>> A separate option looks handy.  Wouldn't it be better to document it
>> in postgres-ref.sgml then?
>
> I'd say a lot more than just handy. I don't think the workaround is
> really all that useful.

I added some documentation in v4.

Nathan  


Attachment

Re: Estimating HugePages Requirements?

From
Andres Freund
Date:
On 2021-08-27 16:40:27 +0200, Magnus Hagander wrote:
> On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
> I'd say a lot more than just handy. I don't think the workaround is
> really all that useful.

+1



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 8/27/21, 11:16 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> I added some documentation in v4.

I realized that my attempt at documenting this new option was missing
some important context about the meaning of the return value when used
against a running server.  I added that in v5.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Andres Freund
Date:
Hi,

On 2021-08-27 19:27:18 +0000, Bossart, Nathan wrote:
> +     <varlistentry>
> +      <term><option>--output-shmem</option></term>
> +      <listitem>
> +       <para>
> +        Prints the amount of shared memory required for the given
> +        configuration and exits.  This can be used on a running server, but
> +        the return value reflects the amount of shared memory needed based
> +        on the current invocation.  It does not return the amount of shared
> +        memory in use by the running server.  This must be the first
> +        argument on the command line.
> +       </para>
> +
> +       <para>
> +        This option is useful for determining the number of huge pages
> +        needed for the server.  For more information, see
> +        <xref linkend="linux-huge-pages"/>.
> +       </para>
> +      </listitem>
> +     </varlistentry>
> +

One thing I wonder is if this wouldn't better be dealt with in a more generic
way. While this is the most problematic runtime computed GUC, it's not the
only one. What if we introduced a new shared_memory_size GUC, and made
--describe-config output it? Perhaps adding --describe-config=guc-name?

I also wonder if we should output the number of hugepages needed instead of
the "raw" bytes of shared memory. The whole business about figuring out the
huge page size, dividing the shared memory size by that and then rounding up
could be removed in that case. Due to huge_page_size it's not even immediately
obvious which huge page size one should use...


> diff --git a/src/backend/main/main.c b/src/backend/main/main.c
> index 3a2a0d598c..c141ae3d1c 100644
> --- a/src/backend/main/main.c
> +++ b/src/backend/main/main.c
> @@ -182,9 +182,11 @@ main(int argc, char *argv[])
>       */
>  
>      if (argc > 1 && strcmp(argv[1], "--check") == 0)
> -        BootstrapModeMain(argc, argv, true);
> +        BootstrapModeMain(argc, argv, true, false);
> +    else if (argc > 1 && strcmp(argv[1], "--output-shmem") == 0)
> +        BootstrapModeMain(argc, argv, false, true);
>      else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
> -        BootstrapModeMain(argc, argv, false);
> +        BootstrapModeMain(argc, argv, false, false);
>  #ifdef EXEC_BACKEND
>      else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
>          SubPostmasterMain(argc, argv);

help() needs an update too.


> diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
> index 3e4ec53a97..b225b1ee70 100644
> --- a/src/backend/storage/ipc/ipci.c
> +++ b/src/backend/storage/ipc/ipci.c
> @@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
>      total_addin_request = add_size(total_addin_request, size);
>  }
>  
> +/*
> + * CalculateShmemSize
> + *        Calculates the amount of shared memory and number of semaphores needed.
> + *
> + * If num_semaphores is not NULL, it will be set to the number of semaphores
> + * required.
> + *
> + * Note that this function freezes the additional shared memory request size
> + * from loadable modules.
> + */
> +Size
> +CalculateShmemSize(int *num_semaphores)
> +{

Can you split this into a separate commit? It feels fairly uncontroversial to
me, so I think we could just apply it soon?

Greetings,

Andres Freund



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
> One thing I wonder is if this wouldn't better be dealt with in a more generic
> way. While this is the most problematic runtime computed GUC, it's not the
> only one. What if we introduced a new shared_memory_size GUC, and made
> --describe-config output it? Perhaps adding --describe-config=guc-name?
>
> I also wonder if we should output the number of hugepages needed instead of
> the "raw" bytes of shared memory. The whole business about figuring out the
> huge page size, dividing the shared memory size by that and then rounding up
> could be removed in that case. Due to huge_page_size it's not even immediately
> obvious which huge page size one should use...

I like both of these ideas.

> Can you split this into a separate commit? It feels fairly uncontroversial to
> me, so I think we could just apply it soon?

I attached a patch for just the uncontroversial part, which is
unfortunately all I have time for today.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
> On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
>> One thing I wonder is if this wouldn't better be dealt with in a more generic
>> way. While this is the most problematic runtime computed GUC, it's not the
>> only one. What if we introduced a new shared_memory_size GUC, and made
>> --describe-config output it? Perhaps adding --describe-config=guc-name?
>>
>> I also wonder if we should output the number of hugepages needed instead of
>> the "raw" bytes of shared memory. The whole business about figuring out the
>> huge page size, dividing the shared memory size by that and then rounding up
>> could be removed in that case. Due to huge_page_size it's not even immediately
>> obvious which huge page size one should use...
>
> I like both of these ideas.

That pretty much looks like -C in concept, doesn't it?  Except that you
cannot get the actual total shared memory value because we'd do this
operation before loading shared_preload_libraries and miss any amount
asked by extensions.  There is a similar problem when attempting to do
postgres -C data_checksums, for example, which would output an
incorrect value even if the cluster has data checksums enabled.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Justin Pryzby
Date:
On Sat, Aug 28, 2021 at 11:00:11AM +0900, Michael Paquier wrote:
> On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
> > On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
> >> One thing I wonder is if this wouldn't better be dealt with in a more generic
> >> way. While this is the most problematic runtime computed GUC, it's not the
> >> only one. What if we introduced a new shared_memory_size GUC, and made
> >> --describe-config output it? Perhaps adding --describe-config=guc-name?
> >>
> >> I also wonder if we should output the number of hugepages needed instead of
> >> the "raw" bytes of shared memory. The whole business about figuring out the
> >> huge page size, dividing the shared memory size by that and then rounding up
> >> could be removed in that case. Due to huge_page_size it's not even immediately
> >> obvious which huge page size one should use...
> > 
> > I like both of these ideas.
> 
> That pretty much looks like -C in concept, doesn't it?  Except that you
> cannot get the actual total shared memory value because we'd do this
> operation before loading shared_preload_libraries and miss any amount
> asked by extensions.  There is a similar problem when attempting to do
> postgres -C data_checksums, for example, which would output an
> incorrect value even if the cluster has data checksums enabled.

Since we don't want to try to allocate the huge pages, and we also don't want
to compute based on shared_buffers alone, did anyone consider if pg_controldata
is the right place to put this?

It includes a lot of related stuff:

max_connections setting:              100
max_worker_processes setting:         8
 - (added in 2013: 6bc8ef0b7f1f1df3998745a66e1790e27424aa0c)
max_wal_senders setting:              10
max_prepared_xacts setting:           2
max_locks_per_xact setting:           64
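
Those lines can be pulled from an existing cluster with something like:

        pg_controldata $PGDATA | grep 'setting:'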

I'm not sure if there's any reason these aren't also shown (?)
autovacuum_max_workers - added in 2007: e2a186b03
max_predicate_locks_per_xact - added in 2011: dafaa3efb
max_logical_replication_workers
max_replication_slots

-- 
Justin



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 8/27/21, 7:01 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
>> On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
>>> One thing I wonder is if this wouldn't better be dealt with in a more generic
>>> way. While this is the most problematic runtime computed GUC, it's not the
>>> only one. What if we introduced a new shared_memory_size GUC, and made
>>> --describe-config output it? Perhaps adding --describe-config=guc-name?
>>>
>>> I also wonder if we should output the number of hugepages needed instead of
>>> the "raw" bytes of shared memory. The whole business about figuring out the
>>> huge page size, dividing the shared memory size by that and then rounding up
>>> could be removed in that case. Due to huge_page_size it's not even immediately
>>> obvious which huge page size one should use...
>> 
>> I like both of these ideas.
>
> That pretty much looks like -C in concept, doesn't it?  Except that you
> cannot get the actual total shared memory value because we'd do this
> operation before loading shared_preload_libraries and miss any amount
> asked by extensions.  There is a similar problem when attempting to do
> postgres -C data_checksums, for example, which would output an
> incorrect value even if the cluster has data checksums enabled.

Attached is a hacky attempt at adding a shared_memory_size GUC in a
way that could be used with -C.  This should include the amount of
shared memory requested by extensions, too.  As long as huge_page_size
is nonzero, it seems easy enough to provide the number of huge pages
needed as well.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Sat, Aug 28, 2021 at 05:36:37AM +0000, Bossart, Nathan wrote:
> Attached is a hacky attempt at adding a shared_memory_size GUC in a
> way that could be used with -C.  This should include the amount of
> shared memory requested by extensions, too.  As long as huge_page_size
> is nonzero, it seems easy enough to provide the number of huge pages
> needed as well.

Yes, the implementation is not good.  The key thing is that by wanting
to support shared_memory_size with the -C switch of postgres, we need
to call process_shared_preload_libraries before output_config_variable.
This additionally means calling ApplyLauncherRegister() before that so
that all the bgworker slots are not taken first.  Going through
_PG_init() also means that we'd better use ChangeToDataDir()
beforehand.

Attached is a WIP to show how the order of the operations could be
changed, as that's easier to grasp.  Even if we don't do that, having
the GUC and the refactoring of CalculateShmemSize() would still be
useful, as one could just query an existing instance for an estimation
of huge pages for a cloned one.

The GUC shared_memory_size should have GUC_NOT_IN_SAMPLE and
GUC_DISALLOW_IN_FILE, with some documentation, of course.  I added the
flags to the GUC, but not the docs.  The code setting up the GUC is not
good either.  It would make sense to just have that in a small wrapper
in ipci.c, perhaps.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 8/30/21, 12:29 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Attached is a WIP to show how the order of the operations could be
> changed, as that's easier to grasp.  Even if we don't do that, having
> the GUC and the refactoring of CalculateShmemSize() would still be
> useful, as one could just query an existing instance for an estimation
> of huge pages for a cloned one.
>
> The GUC shared_memory_size should have GUC_NOT_IN_SAMPLE and
> GUC_DISALLOW_IN_FILE, with some documentation, of course.  I added the
> flags to the GUC, not the docs.   The code setting up the GUC is not
> good either.  It would make sense to just have that in a small wrapper
> of ipci.c, perhaps.

I moved the GUC calculation to ipci.c, adjusted the docs, and added a
huge_pages_required GUC.  It's still a little rough around the edges,
and I haven't tested it on Windows, but this seems like the direction
the patch is headed.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Tue, Aug 31, 2021 at 05:37:52AM +0000, Bossart, Nathan wrote:
> I moved the GUC calculation to ipci.c, adjusted the docs, and added a
> huge_pages_required GUC.  It's still a little rough around the edges,
> and I haven't tested it on Windows, but this seems like the direction
> the patch is headed.

Hmm.  I am not sure about the addition of huge_pages_required, knowing
that we would have shared_memory_size.  I'd rather leave the calculation
part to the user, with a scan of /proc/meminfo.

+#elif defined(WIN32)
+   hp_size = GetLargePageMinimum();
+#endif
+
+#if defined(MAP_HUGETLB) || defined(WIN32)
+   hp_required = (size_b / hp_size) + 1;
As of [1], there is the following description:
"If the processor does not support large pages, the return value is
zero."
So there is a problem here.
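
A zero check before the division would be needed, perhaps roughly like
this (just a sketch, not the final shape of the patch):

    if (hp_size != 0)
        hp_required = (size_b / hp_size) + 1;
    else
        hp_required = 0;    /* unknown: large pages not supported */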

[1]: https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-getlargepageminimum
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 8/31/21, 11:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Hmm.  I am not sure about the addition of huge_pages_required, knowing
> that we would have shared_memory_size.  I'd rather leave the calculation
> part to the user, with a scan of /proc/meminfo.

I included this based on some feedback from Andres upthread [0].  I
went ahead and split the patch set into 3 pieces in case we end up
leaving it out.

> +#elif defined(WIN32)
> +   hp_size = GetLargePageMinimum();
> +#endif
> +
> +#if defined(MAP_HUGETLB) || defined(WIN32)
> +   hp_required = (size_b / hp_size) + 1;
> As of [1], there is the following description:
> "If the processor does not support large pages, the return value is
> zero."
> So there is a problem here.

I've fixed this in v4.

Nathan

[0] https://postgr.es/m/20210827193813.oqo5lamvyzahs35o%40alap3.anarazel.de


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Wed, Sep 01, 2021 at 06:28:21PM +0000, Bossart, Nathan wrote:
> On 8/31/21, 11:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> Hmm.  I am not sure about the addition of huge_pages_required, knowing
>> that we would have shared_memory_size.  I'd rather leave the calculation
>> part to the user, with a scan of /proc/meminfo.
>
> I included this based on some feedback from Andres upthread [0].  I
> went ahead and split the patch set into 3 pieces in case we end up
> leaving it out.

Thanks.  Anyway, we don't really need huge_pages_required on Windows,
do we?  The following docs of Windows tell what to do when using large
pages:
https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support

The backend code does that as in PGSharedMemoryCreate(), now that I
look at it.  And there is no way to change the minimum large page size
there as far as I can see because that's decided by the processor, no?
There is a case for shared_memory_size on Windows to be able to adjust
the sizing of the memory of the host, though.

>> +#elif defined(WIN32)
>> +   hp_size = GetLargePageMinimum();
>> +#endif
>> +
>> +#if defined(MAP_HUGETLB) || defined(WIN32)
>> +   hp_required = (size_b / hp_size) + 1;
>> As of [1], there is the following description:
>> "If the processor does not support large pages, the return value is
>> zero."
>> So there is a problem here.
>
> I've fixed this in v4.

At the end it would be nice to not finish with two GUCs.  Both depend
on the reordering of the actions done by the postmaster, so I'd be
curious to hear the thoughts of others on this particular point.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/2/21, 12:54 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Thanks.  Anyway, we don't really need huge_pages_required on Windows,
> do we?  The following docs of Windows tell what to do when using large
> pages:
> https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
>
> The backend code does that as in PGSharedMemoryCreate(), now that I
> look at it.  And there is no way to change the minimum large page size
> there as far as I can see because that's decided by the processor, no?
> There is a case for shared_memory_size on Windows to be able to adjust
> the sizing of the memory of the host, though.

Yeah, huge_pages_required might not serve much purpose for Windows.
We could always set it to -1 for Windows if it seems like it'll do
more harm than good.

> At the end it would be nice to not finish with two GUCs.  Both depend
> on the reordering of the actions done by the postmaster, so I'd be
> curious to hear the thoughts of others on this particular point.

Of course.  It'd be great to hear others' thoughts on this stuff.

Nathan


Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Thu, Sep 02, 2021 at 04:46:56PM +0000, Bossart, Nathan wrote:
> Yeah, huge_pages_required might not serve much purpose for Windows.
> We could always set it to -1 for Windows if it seems like it'll do
> more harm than good.

I'd be fine with this setup on environments where there is no need for
it.

>> At the end it would be nice to not finish with two GUCs.  Both depend
>> on the reordering of the actions done by the postmaster, so I'd be
>> curious to hear the thoughts of others on this particular point.
>
> Of course.  It'd be great to hear others' thoughts on this stuff.

Just to be clear here, the ordering of HEAD is that for the
postmaster:
- Load configuration.
- Handle -C config_param.
- checkDataDir(), to check permissions of the data dir, etc.
- checkControlFile(), to see if the control file exists.
- Switch to data directory as work dir.
- Lock file creation.
- Initial read of the control file (where the GUC data_checksums is
set).
- Register apply launcher
- shared_preload_libraries

With 0002, we have that:
- Load configuration.
- checkDataDir(), to check permissions of the data dir, etc.
- checkControlFile(), to see if the control file exists.
- Switch to data directory as work dir.
- Register apply launcher
- shared_preload_libraries
- Calculate the shmem GUCs (new step)
- Handle -C config_param.
- Lock file creation.
- Initial read of the control file (where the GUC data_checksums is
set).

One thing that would be incorrect upon more review is that we'd still
have data_checksums wrong with -C, meaning that the initial read of
the control file should be moved further up, and getting the control
file checks done before registering workers would be better.  Keeping
the lock file at the end would be fine AFAIK, but should we worry
about the interactions with _PG_init() here?

0001, that refactors the calculation of the shmem size into a
different routine, is fine as-is, so I'd like to move on and apply
it.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Kyotaro Horiguchi
Date:
At Thu, 2 Sep 2021 16:46:56 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 9/2/21, 12:54 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> > Thanks.  Anyway, we don't really need huge_pages_required on Windows,
> > do we?  The following docs of Windows tell what to do when using large
> > pages:
> > https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
> >
> > The backend code does that as in PGSharedMemoryCreate(), now that I
> > look at it.  And there is no way to change the minimum large page size
> > there as far as I can see because that's decided by the processor, no?
> > There is a case for shared_memory_size on Windows to be able to adjust
> > the sizing of the memory of the host, though.
> 
> Yeah, huge_pages_required might not serve much purpose for Windows.
> We could always set it to -1 for Windows if it seems like it'll do
> more harm than good.

I agreed to this.

> > At the end it would be nice to not finish with two GUCs.  Both depend
> > on the reordering of the actions done by the postmaster, so I'd be
> > curious to hear the thoughts of others on this particular point.
> 
> Of course.  It'd be great to hear others' thoughts on this stuff.

Honestly, I would be satisfied if the following error message
contained the required number of huge pages.

FATAL:  could not map anonymous shared memory: Cannot allocate memory
HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap
space, or huge pages. To reduce the request size (currently 148897792 bytes), reduce PostgreSQL's shared memory usage,
perhaps by reducing shared_buffers or max_connections.

Or emit a different message if huge_pages=on.

FATAL: could not map anonymous shared memory from huge pages
HINT:  This usually means that PostgreSQL's request for huge pages is more than what is available. The required 2048kB
huge pages for the required memory size (currently 148897792 bytes) is 71 pages.


Returning to this feature, even if I am informed of that via a GUC, I
won't add memory by looking at shared_memory_size.  Anyway, since
shared_buffers occupies almost all of the shared memory allocated to
postgres, we are not supposed to need such a precise adjustment of the
required size of shared memory.  On the other hand, the available
number of huge pages is configurable and we need to set it as required.
That said, it might seem to me a bit strange that there's only
huge_pages_required and not shared_memory_size, in the view of
comprehensiveness or completeness.  So my feeling at this point is "I
need only huge_pages_required but might want shared_memory_size just
for completeness".


By the way I noticed that postgres -C huge_page_size shows 0, which I
think should show the number used for the calculation if we show
huge_pages_required.

I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
whereas the error message above showed 148897792 bytes (142MB). So it
seems that something is forgotten while calculating
shared_memory_size.  As the consequence, launching postgres setting
huge_pages_required (69 pages) as vm.nr_hugepages ended up in the
"could not map anonymous shared memory" error.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Thu, Sep 02, 2021 at 04:46:56PM +0000, Bossart, Nathan wrote:
>> Yeah, huge_pages_required might not serve much purpose for Windows.
>> We could always set it to -1 for Windows if it seems like it'll do
>> more harm than good.
>
> I'd be fine with this setup on environments where there is no need for
> it.

I did this in v5.

> One thing that would be incorrect upon more review is that we'd still
> have data_checksums wrong with -C, meaning that the initial read of
> the control file should be moved further up, and getting the control
> file checks done before registering workers would be better.  Keeping
> the lock file at the end would be fine AFAIK, but should we worry
> about the interactions with _PG_init() here?

I think we can avoid so much reordering by moving the -C handling
instead.  That should also fix things like data_checksums.  I split
the reordering part out into its own patch in v5.

You bring up an interesting point about _PG_init().  Presently, you
can safely assume that the data directory is locked during _PG_init(),
so there's no need to worry about breaking something on a running
server.  I don't know how important this is.  Most _PG_init()
functions that I've seen will define some GUCs, request some shared
memory, register some background workers, and/or install some hooks.
Those all seem like safe things to do, but I wouldn't be at all
surprised to hear examples to the contrary.  In any case, it looks
like the current ordering of these two steps has been there for 15+
years.

If this is a concern, one option would be to disallow running "-C
shared_memory_size" on running servers.  That would have to extend to
GUCs like data_checksums and huge_pages_required, too.

> 0001, that refactors the calculation of the shmem size into a
> different routine, is fine as-is, so I'd like to move on and apply
> it.

Sounds good to me.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> By the way I noticed that postgres -C huge_page_size shows 0, which I
> think should show the number used for the calculation if we show
> huge_pages_required.

I would agree with this if huge_page_size was a runtime-computed GUC,
but since it's intended for users to explicitly request the huge page
size, it might be slightly confusing.  Perhaps another option would be
to create a new GUC for this.  Or maybe it's enough to note that the
value will be changed from 0 at runtime if huge pages are supported.
In any case, it might be best to handle this separately.

> I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
> whereas the error message above showed 148897792 bytes (142MB). So it
> seems that something is forgotten while calculating
> shared_memory_size.  As the consequence, launching postgres setting
> huge_pages_required (69 pages) as vm.nr_hugepages ended up in the
> "could not map anonymous shared memory" error.

Hm.  I'm not seeing this with the v5 patch set, so maybe I
inadvertently fixed it already.  Can you check this again with v5?

Nathan


Re: Estimating HugePages Requirements?

From
Andres Freund
Date:
Hi,

On 2021-09-01 15:53:52 +0900, Michael Paquier wrote:
> Hmm.  I am not sure about the addition of huge_pages_required, knowing
> that we would have shared_memory_size.  I'd rather leave the calculation
> part to the user, with a scan of /proc/meminfo.

-1. We can easily do better, what do we gain by making the user do this stuff?
Especially because the right value also depends on huge_page_size.

Greetings,

Andres Freund



Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Fri, Sep 03, 2021 at 05:36:43PM +0000, Bossart, Nathan wrote:
> On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> You bring up an interesting point about _PG_init().  Presently, you
> can safely assume that the data directory is locked during _PG_init(),
> so there's no need to worry about breaking something on a running
> server.  I don't know how important this is.  Most _PG_init()
> functions that I've seen will define some GUCs, request some shared
> memory, register some background workers, and/or install some hooks.
> Those all seem like safe things to do, but I wouldn't be at all
> surprised to hear examples to the contrary.  In any case, it looks
> like the current ordering of these two steps has been there for 15+
> years.

Yeah.  What you are describing here matches what I have seen in the
past and what we do in core for _PG_init().  Now extension developers
could do more fancy things, like dropping things on-disk to check the
load state, for whatever reasons.  And things could break in such
cases.  Perhaps people should not do that, but it is no fun either to
break code that has been working for years even if that's just a major
upgrade.

+        * We skip this step if we are just going to print a GUC's value and exit
+        * a few steps down.
         */
-       CreateDataDirLockFile(true);
+       if (output_config_variable == NULL)
+               CreateDataDirLockFile(true);

Anyway, 0002 gives me shivers.

> If this is a concern, one option would be to disallow running "-C
> shared_memory_size" on running servers.  That would have to extend to
> GUCs like data_checksums and huge_pages_required, too.

Just noting this bit from 0003 that would break without 0002:
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
+$ <userinput>postgres -D $PGDATA -C shared_memory_size</userinput>
+6339

>> 0001, that refactors the calculation of the shmem size into a
>> different routine, is fine as-is, so I'd like to move on and apply
>> it.
>
> Sounds good to me.

Applied this one.

Without concluding on 0002 yet, another thing that we could do is to
just add the GUCs.  These sound rather useful on their own (mixed
feelings about huge_pages_required but I can see why it is useful to
avoid the setup steps and the backend already grabs this information),
particularly when it comes to cloned setups that share a lot of
properties.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/5/21, 7:28 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Fri, Sep 03, 2021 at 05:36:43PM +0000, Bossart, Nathan wrote:
>> On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>>> 0001, that refactors the calculation of the shmem size into a
>>> different routine, is fine as-is, so I'd like to move on and apply
>>> it.
>> 
>> Sounds good to me.
>
> Applied this one.

Thanks!

> Without concluding on 0002 yet, another thing that we could do is to
> just add the GUCs.  These sound rather useful on their own (mixed
> feelings about huge_pages_required but I can see why it is useful to
> avoid the setup steps and the backend already grabs this information),
> particularly when it comes to cloned setups that share a lot of
> properties.

I think this is a good starting point, but I'd like to follow up on
making them visible without completely starting the server.  The main
purpose for adding these GUCs is to be able to set up huge pages
before server startup.  Disallowing "-C huge_pages_required" on a
running server to enable this use-case seems like a modest tradeoff.
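
In other words, the intended workflow would be roughly the following
(a sketch, assuming the -C support for runtime-computed GUCs discussed
here, with the server shut down):

        sysctl -w vm.nr_hugepages=$(postgres -D $PGDATA -C huge_pages_required)
        pg_ctl -D $PGDATA start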

Anyway, I'll restructure the remaining patches to add the GUCs first
and then address the 0002 business separately.

Nathan


Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/5/21, 9:26 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 9/5/21, 7:28 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> Without concluding on 0002 yet, another thing that we could do is to
>> just add the GUCs.  These sound rather useful on their own (mixed
>> feelings about huge_pages_required but I can see why it is useful to
>> avoid the setup steps and the backend already grabs this information),
>> particularly when it comes to cloned setups that share a lot of
>> properties.
>
> I think this is a good starting point, but I'd like to follow up on
> making them visible without completely starting the server.  The main
> purpose for adding these GUCs is to be able to set up huge pages
> before server startup.  Disallowing "-C huge_pages_required" on a
> running server to enable this use-case seems like a modest tradeoff.
>
> Anyway, I'll restructure the remaining patches to add the GUCs first
> and then address the 0002 business separately.

Attached is a new patch set.  The first two patches just add the new
GUCs, and the third is an attempt at providing useful values for those
GUCs via -C.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Mon, Sep 06, 2021 at 11:55:42PM +0000, Bossart, Nathan wrote:
> Attached is a new patch set.  The first two patches just add the new
> GUCs, and the third is an attempt at providing useful values for those
> GUCs via -C.

+   sprintf(buf, "%lu MB", size_mb);
+   SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
One small-ish comment about 0002: there is no need to add the unit
into the buffer set as GUC_UNIT_MB would take care of that.  The patch
looks fine.
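
To be clear, something like the following should be enough, as the unit
is implied by the GUC:

   sprintf(buf, "%lu", size_mb);
   SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);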

+#ifndef WIN32
+#include <sys/mman.h>
+#endif
So, this is needed in ipci.c to check for MAP_HUGETLB.  I am not much
a fan of moving around platform-specific checks when these have
remained local to each shmem implementation.  Could it be cleaner to
add GetHugePageSize() to win32_shmem.c and make it always declared in
the SysV implementation?
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Kyotaro Horiguchi
Date:
At Fri, 3 Sep 2021 17:46:05 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> > By the way I noticed that postgres -C huge_page_size shows 0, which I
> > think should show the number used for the calculation if we show
> > huge_pages_required.
> 
> I would agree with this if huge_page_size was a runtime-computed GUC,
> but since it's intended for users to explicitly request the huge page
> size, it might be slightly confusing.  Perhaps another option would be
> to create a new GUC for this.  Or maybe it's enough to note that the
> value will be changed from 0 at runtime if huge pages are supported.
> In any case, it might be best to handle this separately.

(Sorry, I was confused, but) yeah, agreed.

> > I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
> > whereas the error message above showed 148897792 bytes (142MB). So it
> > seems that something is forgotten while calculating
> > shared_memory_size.  As the consequence, launching postgres setting
> > huge_pages_required (69 pages) as vm.nr_hugepages ended up in the
> > "could not map anonymous shared memory" error.
> 
> Hm.  I'm not seeing this with the v5 patch set, so maybe I
> inadvertently fixed it already.  Can you check this again with v5?

Thanks! I confirmed that the numbers match with v5.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> +   sprintf(buf, "%lu MB", size_mb);
> +   SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
> One small-ish comment about 0002: there is no need to add the unit
> into the buffer set as GUC_UNIT_MB would take care of that.  The patch
> looks fine.

I fixed this in v7.

> +#ifndef WIN32
> +#include <sys/mman.h>
> +#endif
> So, this is needed in ipci.c to check for MAP_HUGETLB.  I am not much
> a fan of moving around platform-specific checks when these have
> remained local to each shmem implementation.  Could it be cleaner to
> add GetHugePageSize() to win32_shmem.c and make it always declared in
> the SysV implementation?

I don't know if it's really all that much cleaner, but I did it this
way in v7.  IMO it's a little weird that GetHugePageSize() doesn't
return the value from GetLargePageMinimum(), but that's what we'd need
to do to avoid setting huge_pages_required for Windows without any
platform-specific checks.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/6/21, 11:24 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> At Fri, 3 Sep 2021 17:46:05 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
>> On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
>> > I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
>> > whereas the error message above showed 148897792 bytes (142MB). So it
>> > seems that something is forgotten while calculating
>> > shared_memory_size.  As the consequence, launching postgres setting
>> > huge_pages_required (69 pages) as vm.nr_hugepages ended up in the
>> > "could not map anonymous shared memory" error.
>>
>> Hm.  I'm not seeing this with the v5 patch set, so maybe I
>> inadvertently fixed it already.  Can you check this again with v5?
>
> Thanks! I confirmed that the numbers match with v5.

Thanks for confirming.

Nathan


Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Tue, Sep 07, 2021 at 05:08:43PM +0000, Bossart, Nathan wrote:
> On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> +   sprintf(buf, "%lu MB", size_mb);
>> +   SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
>> One small-ish comment about 0002: there is no need to add the unit
>> into the buffer set as GUC_UNIT_MB would take care of that.  The patch
>> looks fine.
>
> I fixed this in v7.

Switched the variable name to shared_memory_size_mb for easier
grepping, moved it to a more correct location with the other read-only
GUCs, and applied 0002.  Well, 0001 here.

>> +#ifndef WIN32
>> +#include <sys/mman.h>
>> +#endif
>> So, this is needed in ipci.c to check for MAP_HUGETLB.  I am not much
>> a fan of moving around platform-specific checks when these have
>> remained local to each shmem implementation.  Could it be cleaner to
>> add GetHugePageSize() to win32_shmem.c and make it always declared in
>> the SysV implementation?
>
> I don't know if it's really all that much cleaner, but I did it this
> way in v7.  IMO it's a little weird that GetHugePageSize() doesn't
> return the value from GetLargePageMinimum(), but that's what we'd need
> to do to avoid setting huge_pages_required for Windows without any
> platform-specific checks.

Thanks.  Keeping MAP_HUGETLB within the SysV portions is an
improvement IMO.  That's subject to one's taste, perhaps.

After sleeping on it, I'd be fine living with the logic based on the
new GUC flag called GUC_RUNTIME_COMPUTED to control whether a parameter
can be inspected at an earlier or a later stage of the startup
sequence, with the later stage meaning that such parameters cannot be
checked while a server is running.  This case was originally broken
anyway, so it does not make things worse, just better.

+        This can be used on a running server for most parameters.  However,
+        the server must be shut down for some runtime-computed parameters
+        (e.g., <xref linkend="guc-huge-pages-required"/>).
Perhaps we should add a couple of extra parameters here, like
shared_memory_size, and perhaps wal_segment_size?  The list does not
have to be complete, just meaningful enough.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Fujii Masao
Date:

On 2021/09/08 12:50, Michael Paquier wrote:
> On Tue, Sep 07, 2021 at 05:08:43PM +0000, Bossart, Nathan wrote:
>> On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>>> +   sprintf(buf, "%lu MB", size_mb);
>>> +   SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
>>> One small-ish comment about 0002: there is no need to add the unit
>>> into the buffer set as GUC_UNIT_MB would take care of that.  The patch
>>> looks fine.
>>
>> I fixed this in v7.
> 
> Switched the variable name to shared_memory_size_mb for easier
> grepping, moved it to a more correct location with the other read-only
> GUCs, and applied 0002.  Well, 0001 here.

Thanks for adding this useful feature!

+        {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,

When reading the applied code, I found the category of shared_memory_size
is RESOURCES_MEM. Why? This seems right because the parameter is related
to memory resources. But since its context is PGC_INTERNAL, wouldn't
PRESET_OPTIONS be the more proper category? BTW, the category of the
other PGC_INTERNAL parameters seems to be PRESET_OPTIONS.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/7/21, 8:50 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Switched the variable name to shared_memory_size_mb for easier
> grepping, moved it to a more correct location with the other read-only
> GUCs, and applied 0002.  Well, 0001 here.

Thanks!  And thanks for cleaning up the small mistake in aa37a43.

> +        This can be used on a running server for most parameters.  However,
> +        the server must be shut down for some runtime-computed parameters
> +        (e.g., <xref linkend="guc-huge-pages-required"/>).
> Perhaps we should add a couple of extra parameters here, like
> shared_memory_size, and perhaps wal_segment_size?  The list does not
> have to be complete, just meaningful enough.

Good idea.

Nathan


Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/8/21, 12:11 AM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:
> Thanks for adding this useful feature!

:)

> +               {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
>
> When reading the applied code, I found the category of shared_memory_size
> is RESOURCES_MEM. Why? This seems right because the parameter is related
> to memory resources. But since its context is PGC_INTERNAL, wouldn't
> PRESET_OPTIONS be the more proper category? BTW, the category of the
> other PGC_INTERNAL parameters seems to be PRESET_OPTIONS.

Yeah, I did wonder about this.  We're even listing it in the "Preset
Options" section in the docs.  I updated this in the new patch set,
which is attached.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Wed, Sep 08, 2021 at 04:10:41PM +0900, Fujii Masao wrote:
> +        {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
>
> When reading the applied code, I found the category of shared_memory_size
> is RESOURCES_MEM. Why? This seems right because the parameter is related
> to memory resources. But since its context is PGC_INTERNAL, wouldn't
> PRESET_OPTIONS be the more proper category? BTW, the category of the
> other PGC_INTERNAL parameters seems to be PRESET_OPTIONS.

Yes, that's an oversight from me.  I was looking at that yesterday,
noticed some exceptions in the GUC list with things not allowed in
files and just concluded that RESOURCES_MEM should be fine, but the
docs tell a different story.  Thanks, fixed.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Wed, Sep 08, 2021 at 05:52:33PM +0000, Bossart, Nathan wrote:
> Yeah, I did wonder about this.  We're even listing it in the "Preset
> Options" section in the docs.  I updated this in the new patch set,
> which is attached.

I broke that again, so rebased as v9 attached.

FWIW, I don't have an environment at hand these days to test properly
0001, so this will have to wait a bit.  I really like the approach
taken by 0002, and it is independent of the other patch while
extending support for postgres -c to provide the correct runtime
values.  So let's wrap this part first.  No need to send a reorganized
patch set.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Thu, Sep 09, 2021 at 01:19:14PM +0900, Michael Paquier wrote:
> I broke that again, so rebased as v9 attached.

Well.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/8/21, 9:19 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> FWIW, I don't have an environment at hand these days to test properly
> 0001, so this will have to wait a bit.  I really like the approach
> taken by 0002, and it is independent of the other patch while
> extending support for postgres -c to provide the correct runtime
> values.  So let's wrap this part first.  No need to send a reorganized
> patch set.

Sounds good.

For 0001, the biggest thing on my mind at the moment is the name of
the GUC.  "huge_pages_required" feels kind of ambiguous.  From the
name alone, it could mean either "the number of huge pages required"
or "huge pages are required for the server to run."  Also, the number
of huge pages required is not actually required if you don't want to
run the server with huge pages.  I think it might be clearer to
somehow indicate that the value is essentially the size of the main
shared memory area in terms of the huge page size, but I'm not sure
how to do that concisely.  Perhaps it is enough to just make sure the
description of "huge_pages_required" is detailed enough.

For 0002, I have two small concerns.  My first concern is that it
might be confusing to customers when the runtime GUCs cannot be
returned for a running server.  We have the note in the docs, but if
you're encountering it on the command line, it's not totally clear
what the problem is.

        $ postgres -D . -C log_min_messages
        warning
        $ postgres -D . -C shared_memory_size
        2021-09-09 18:51:21.617 UTC [7924] FATAL:  lock file "postmaster.pid" already exists
        2021-09-09 18:51:21.617 UTC [7924] HINT:  Is another postmaster (PID 7912) running in data directory "/local/home/bossartn/pgdata"?

My other concern is that by default, viewing the runtime-computed GUCs
will also emit a LOG.

        $ postgres -D . -C shared_memory_size
        142
        2021-09-09 18:53:25.194 UTC [8006] LOG:  database system is shut down

Running these commands with log_min_messages=debug5 emits way more
information for the runtime-computed GUCs than for others, but IMO
that is alright.  However, perhaps we should adjust the logging in
0002 to improve the default user experience.  I attached an attempt at
that.

With the attached patch, trying to view a runtime-computed GUC on a
running server will look like this:

        $ postgres -D . -C shared_memory_size
        2021-09-09 21:24:21.552 UTC [6224] FATAL:  lock file "postmaster.pid" already exists
        2021-09-09 21:24:21.552 UTC [6224] DETAIL:  Runtime-computed GUC "shared_memory_size" cannot be viewed on a running server.
        2021-09-09 21:24:21.552 UTC [6224] HINT:  Is another postmaster (PID 3628) running in data directory "/local/home/bossartn/pgdata"?

And viewing a runtime-computed GUC on a server that is shut down will
look like this:

        $ postgres -D . -C shared_memory_size
        142

I'm not tremendously happy with the patch, but I hope that it at least
helps with the discussion.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Thu, Sep 09, 2021 at 09:53:22PM +0000, Bossart, Nathan wrote:
> For 0002, I have two small concerns.  My first concern is that it
> might be confusing to customers when the runtime GUCs cannot be
> returned for a running server.  We have the note in the docs, but if
> you're encountering it on the command line, it's not totally clear
> what the problem is.

Yeah, that's true.  There are more unlikely-to-happen errors that
could be triggered and prevent the command from working.  I have never
tried using error_context_stack in a code path as early as that, to be
honest.

> Running these commands with log_min_messages=debug5 emits way more
> information for the runtime-computed GUCs than for others, but IMO
> that is alright.  However, perhaps we should adjust the logging in
> 0002 to improve the default user experience.  I attached an attempt at
> that.

Registered bgworkers would generate a DEBUG entry, for one.

> I'm not tremendously happy with the patch, but I hope that it at least
> helps with the discussion.

As long as the behavior is documented, I'd be fine with the approach of
keeping the code in its simplest shape.  I agree that the message is
confusing, but it is not wrong either: we are trying to query a run-time
parameter, and we do need the lock for that.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/9/21, 7:03 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> As long as the behavior is documented, I'd be fine with the approach of
> keeping the code in its simplest shape.  I agree that the message is
> confusing, but it is not wrong either: we are trying to query a run-time
> parameter, and we do need the lock for that.

That seems alright to me.

Nathan


Re: Estimating HugePages Requirements?

From
Robert Haas
Date:
On Thu, Sep 9, 2021 at 5:53 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> For 0001, the biggest thing on my mind at the moment is the name of
> the GUC.  "huge_pages_required" feels kind of ambiguous.  From the
> name alone, it could mean either "the number of huge pages required"
> or "huge pages are required for the server to run." Also, the number
> of huge pages required is not actually required if you don't want to
> run the server with huge pages.

+1 to all of that.

> I think it might be clearer to
> somehow indicate that the value is essentially the size of the main
> shared memory area in terms of the huge page size, but I'm not sure
> how to do that concisely.  Perhaps it is enough to just make sure the
> description of "huge_pages_required" is detailed enough.

shared_memory_size_in_huge_pages? It's kinda long, but a long name
that you can understand without reading the docs is better than a
short one where you can't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/10/21, 1:02 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Thu, Sep 9, 2021 at 5:53 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> I think it might be clearer to
>> somehow indicate that the value is essentially the size of the main
>> shared memory area in terms of the huge page size, but I'm not sure
>> how to do that concisely.  Perhaps it is enough to just make sure the
>> description of "huge_pages_required" is detailed enough.
>
> shared_memory_size_in_huge_pages? It's kinda long, but a long name
> that you can understand without reading the docs is better than a
> short one where you can't.

I think that's an improvement.  The only other idea I have at the
moment is num_huge_pages_required_for_shared_memory.

Nathan


Re: Estimating HugePages Requirements?

From
Robert Haas
Date:
On Fri, Sep 10, 2021 at 7:43 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> > shared_memory_size_in_huge_pages? It's kinda long, but a long name
> > that you can understand without reading the docs is better than a
> > short one where you can't.
>
> I think that's an improvement.  The only other idea I have at the
> moment is num_huge_pages_required_for_shared_memory.

Hmm, that to me sounds like maybe only part of shared memory uses huge
pages and maybe we're just giving you the number required for that
part. I realize that it doesn't work that way but I don't know if
everyone will.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/13/21, 8:59 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Fri, Sep 10, 2021 at 7:43 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> I think that's an improvement.  The only other idea I have at the
>> moment is num_huge_pages_required_for_shared_memory.
>
> Hmm, that to me sounds like maybe only part of shared memory uses huge
> pages and maybe we're just giving you the number required for that
> part. I realize that it doesn't work that way but I don't know if
> everyone will.

Yeah, I agree.  What about
huge_pages_needed_for_shared_memory_size or
huge_pages_needed_for_main_shared_memory?  I'm still not stoked about
using "required" or "needed" in the name, as it sounds like huge pages
must be allocated for the server to run, which is only true if
huge_pages=on.  I haven't thought of a better word to use, though.

Nathan


Re: Estimating HugePages Requirements?

From
Robert Haas
Date:
On Mon, Sep 13, 2021 at 2:49 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Yeah, I agree.  What about
> huge_pages_needed_for_shared_memory_size or
> huge_pages_needed_for_main_shared_memory?  I'm still not stoked about
> using "required" or "needed" in the name, as it sounds like huge pages
> must be allocated for the server to run, which is only true if
> huge_pages=on.  I haven't thought of a better word to use, though.

I prefer the first of those to the second. I don't find it
particularly better or worse than my previous suggestion of
shared_memory_size_in_huge_pages.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Estimating HugePages Requirements?

From
Tom Lane
Date:
"Bossart, Nathan" <bossartn@amazon.com> writes:
> Yeah, I agree.  What about
> huge_pages_needed_for_shared_memory_size or
> huge_pages_needed_for_main_shared_memory?

Seems like "huge_pages_needed_for_shared_memory" would be sufficient.

            regards, tom lane



Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Mon, Sep 13, 2021 at 04:20:00PM -0400, Robert Haas wrote:
> On Mon, Sep 13, 2021 at 2:49 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> Yeah, I agree.  What about
>> huge_pages_needed_for_shared_memory_size or
>> huge_pages_needed_for_main_shared_memory?  I'm still not stoked about
>> using "required" or "needed" in the name, as it sounds like huge pages
>> must be allocated for the server to run, which is only true if
>> huge_pages=on.  I haven't thought of a better word to use, though.
>
> I prefer the first of those to the second. I don't find it
> particularly better or worse than my previous suggestion of
> shared_memory_size_in_huge_pages.

I am not particularly fond of the use of "needed" in this context, so
I'd be fine with your suggestion of "shared_memory_size_in_huge_pages".
Some other ideas I could think of:
- shared_memory_size_as_huge_pages
- huge_pages_for_shared_memory_size

Having shared_memory_size in the GUC name is kind of appealing though
in terms of grepping, and one gets the relationship with
shared_memory_size immediately.  If the consensus is
huge_pages_needed_for_shared_memory_size, I won't fight it, but IMO
that's too long.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Seems like "huge_pages_needed_for_shared_memory" would be sufficient.

I think we are down to either shared_memory_size_in_huge_pages or
huge_pages_needed_for_shared_memory.  Robert's argument against
huge_pages_needed_for_shared_memory was that it might sound like only
part of shared memory uses huge pages and we're only giving the number
required for that.  Speaking of which, isn't that technically true?
For shared_memory_size_in_huge_pages, the intent is to make it sound
like we are providing shared_memory_size in terms of the huge page
size, but I think it could also be interpreted as "the amount of
shared memory that is currently stored in huge pages."

I personally lean towards huge_pages_needed_for_shared_memory because
it feels the most clear and direct to me.  I'm not vehemently opposed
to shared_memory_size_in_huge_pages, though.  I don't think either one
is too misleading.

Nathan


Re: Estimating HugePages Requirements?

From
Kyotaro Horiguchi
Date:
At Tue, 14 Sep 2021 00:30:22 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> > Seems like "huge_pages_needed_for_shared_memory" would be sufficient.
> 
> I think we are down to either shared_memory_size_in_huge_pages or
> huge_pages_needed_for_shared_memory.  Robert's argument against
> huge_pages_needed_for_shared_memory was that it might sound like only
> part of shared memory uses huge pages and we're only giving the number
> required for that.  Speaking of which, isn't that technically true?
> For shared_memory_size_in_huge_pages, the intent is to make it sound
> like we are providing shared_memory_size in terms of the huge page
> size, but I think it could also be interpreted as "the amount of
> shared memory that is currently stored in huge pages."
> 
> I personally lean towards huge_pages_needed_for_shared_memory because
> it feels the most clear and direct to me.  I'm not vehemently opposed
> to shared_memory_size_in_huge_pages, though.  I don't think either one
> is too misleading.

I like 'in' slightly better than 'for' in this context. I agree with
Michael that the name looks somewhat too long, especially considering
that it won't be tab-completed on shell command lines, but I won't
fight it either.  On the other hand, a fully spelled-out name is easier
to reproduce from memory than one that is halfway shortened.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/13/21, 5:49 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> At Tue, 14 Sep 2021 00:30:22 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
>> On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> > Seems like "huge_pages_needed_for_shared_memory" would be sufficient.
>>
>> I think we are down to either shared_memory_size_in_huge_pages or
>> huge_pages_needed_for_shared_memory.  Robert's argument against
>> huge_pages_needed_for_shared_memory was that it might sound like only
>> part of shared memory uses huge pages and we're only giving the number
>> required for that.  Speaking of which, isn't that technically true?
>> For shared_memory_size_in_huge_pages, the intent is to make it sound
>> like we are providing shared_memory_size in terms of the huge page
>> size, but I think it could also be interpreted as "the amount of
>> shared memory that is currently stored in huge pages."
>>
>> I personally lean towards huge_pages_needed_for_shared_memory because
>> it feels the most clear and direct to me.  I'm not vehemently opposed
>> to shared_memory_size_in_huge_pages, though.  I don't think either one
>> is too misleading.
>
> I like 'in' slightly better than 'for' in this context. I agree with
> Michael that the name looks somewhat too long, especially considering
> that it won't be tab-completed on shell command lines, but I won't
> fight it either.  On the other hand, a fully spelled-out name is easier
> to reproduce from memory than one that is halfway shortened.

I think I see more support for shared_memory_size_in_huge_pages than
for huge_pages_needed_for_shared_memory at the moment.  I'll update
the patch set in the next day or two to use
shared_memory_size_in_huge_pages unless something changes in the
meantime.

Nathan


Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Tue, Sep 14, 2021 at 06:00:44PM +0000, Bossart, Nathan wrote:
> I think I see more support for shared_memory_size_in_huge_pages than
> for huge_pages_needed_for_shared_memory at the moment.  I'll update
> the patch set in the next day or two to use
> shared_memory_size_in_huge_pages unless something changes in the
> meantime.

I have been looking at the patch to add the new GUC flag and the new
sequence for postgres -C, and we could have some TAP tests.  There
were two places that made sense to me: pg_checksums/t/002_actions.pl
and recovery/t/017_shm.pl.  I have chosen the former as it has
coverage across more platforms, and used data_checksums for this
purpose, with an extra positive test to check for the case where a GUC
can be queried while the server is running.

There are four parameters that are updated here:
- shared_memory_size
- wal_segment_size
- data_checksums
- data_directory_mode
That looks sensible, after looking at the full set of GUCs.
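
For illustration, here is a stripped-down sketch of the kind of check
such a test can do.  This is a hypothetical example rather than the
committed test, and it assumes the usual TAP helpers (command_like and
the node module, under whatever names the tree at hand uses):

    use strict;
    use warnings;
    use PostgreSQL::Test::Cluster;
    use PostgreSQL::Test::Utils;
    use Test::More tests => 1;

    my $node = PostgreSQL::Test::Cluster->new('main');
    $node->init;

    # With the cluster shut down, "postgres -C" on a runtime-computed GUC
    # should print only the value on stdout.
    command_like(
        [ 'postgres', '-D', $node->data_dir, '-C', 'data_checksums' ],
        qr/^[01]$/,
        'data_checksums is reported while the server is offline');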

Attached is a refreshed patch (commit message is the same as v9 for
now), with some minor tweaks and the tests.

Thoughts?
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/14/21, 8:06 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Attached is a refreshed patch (commit message is the same as v9 for
> now), with some minor tweaks and the tests.
>
> Thoughts?

LGTM

+        This can be used on a running server for most parameters.  However,
+        the server must be shut down for some runtime-computed parameters
+        (e.g., <xref linkend="guc-shared-memory-size"/>, and
+        <xref linkend="guc-wal-segment-size"/>).

nitpick: I think you can remove the comma before the "and" in the list
of examples.

Nathan


Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Wed, Sep 15, 2021 at 10:31:20PM +0000, Bossart, Nathan wrote:
> +        This can be used on a running server for most parameters.  However,
> +        the server must be shut down for some runtime-computed parameters
> +        (e.g., <xref linkend="guc-shared-memory-size"/>, and
> +        <xref linkend="guc-wal-segment-size"/>).
>
> nitpick: I think you can remove the comma before the "and" in the list
> of examples.

Fixed that, and applied.  Could you rebase the last patch with the
name suggested for the last GUC, including the docs?  It looks like we
are heading for shared_memory_size_in_huge_pages.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/15/21, 7:42 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Fixed that, and applied.  Could you rebase the last patch with the
> name suggested for the last GUC, including the docs?  It looks like we
> are heading for shared_memory_size_in_huge_pages.

Thanks!  And done.

For the huge pages setup documentation, I considered sending stderr to
/dev/null to eliminate the LOG from the output, but I opted against
that.  That would've looked like this:

        postgres -D $PGDATA -C shared_memory_size_in_huge_pages 2> /dev/null

Otherwise, there aren't any significant changes in this version of the
patch besides the name change.
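
As an aside, the end-to-end use for provisioning would look roughly
like the following; this is only an illustrative sketch (not part of
the patch), using the standard Linux vm.nr_hugepages setting.  With the
default 2MB huge page size, the reported number is essentially
shared_memory_size in MB divided by 2, rounded up.

        $ pages=$(postgres -D "$PGDATA" -C shared_memory_size_in_huge_pages 2> /dev/null)
        $ sysctl -w vm.nr_hugepages="$pages"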

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/16/21, 10:17 AM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:
> + * and the hugepage-related mmap flags to use into *mmap_flags.  If huge pages
> + * is not supported, *hugepagesize and *mmap_flags will be set to 0.
>
> nitpick: *are* not supported, as you say elsewhere.

Updated.  I think I originally chose "is" because I was referring to
Huge Pages as a singular feature, but that sounds a bit awkward, and I
don't think there's any substantial difference either way.

> +                       gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
>
> Maybe this was already discussed, but "main" could be misleading.
>
> To me that sounds like there might be huge pages needed for something other
> than the "main" area and the reported value might turn out to be inadequate,
> (which is exactly the issue these patches are trying to address).
>
> In particular, this sounds like it's just going to report
> shared_buffers/huge_page_size (since shared buffers is usually the "main" use
> of shared memory) - rather than reporting the size of the entire shared memory
> in units of huge_pages.

I'm not sure I agree on this one.  The documentation for huge_pages
[0] and shared_memory_type [1] uses the same phrasing multiple times,
and the new shared_memory_size GUC uses it as well [2].  I don't see
anything in the documentation that suggests that shared_buffers is the
only thing in the main shared memory area, and the documentation for
setting up huge pages makes no mention of any extra memory that needs
to be considered, either.

Furthermore, I'm not sure what else we'd call it.  I don't think it's
accurate to suggest that it's the entirety of shared memory for the
server, as it's possible to dynamically allocate more as needed (which
IIUC won't use any explicitly allocated huge pages).

Nathan

[0] https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-HUGE-PAGES
[1] https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE
[2] https://www.postgresql.org/docs/devel/runtime-config-preset.html#GUC-SHARED-MEMORY-SIZE


Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Thu, Sep 16, 2021 at 09:26:56PM +0000, Bossart, Nathan wrote:
> I'm not sure I agree on this one.  The documentation for huge_pages
> [0] and shared_memory_type [1] uses the same phrasing multiple times,
> and the new shared_memory_size GUC uses it as well [2].  I don't see
> anything in the documentation that suggests that shared_buffers is the
> only thing in the main shared memory area, and the documentation for
> setting up huge pages makes no mention of any extra memory that needs
> to be considered, either.

Looks rather sane to me, FWIW.  I have not tested on Linux properly
yet (not tempted to take my bets on the buildfarm on a Friday,
either), but I should be able to handle that at the beginning of next
week.

+   GetHugePageSize(&hp_size, &unused);
+   if (hp_size != 0)
I'd rather change GetHugePageSize() to be able to accept NULL for the
parameter values, rather than declaring such variables.
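
Roughly, the suggestion is just NULL-guarded out-parameters, something
like this sketch (the actual probing of the huge page size is elided):

    void
    GetHugePageSize(Size *hugepagesize, int *mmap_flags)
    {
        Size    size = 0;
        int     flags = 0;

        /* probe the default huge page size from the OS here (elided) */

        if (hugepagesize)
            *hugepagesize = size;
        if (mmap_flags)
            *mmap_flags = flags;
    }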

+    To determine the number of huge pages needed, use the
+    <command>postgres</command> command to see the value of
+    <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
We may want to say as well here that the server should be offline?
It would not hurt to duplicate this information in
postgres-ref.sgml.

+        This setting is supported only on Linux.  It is always set to
Nit: This paragraph is missing two <productname>s for Linux.  The docs
are random about that, but these are new entries.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/16/21, 7:21 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> +   GetHugePageSize(&hp_size, &unused);
> +   if (hp_size != 0)
> I'd rather change GetHugePageSize() to be able to accept NULL for the
> parameter values, rather than declaring such variables.

Done.

> +    To determine the number of huge pages needed, use the
> +    <command>postgres</command> command to see the value of
> +    <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
> We may want to say as well here that the server should be offline?
> It would not hurt to duplicate this information in
> postgres-ref.sgml.

Done.

> +        This setting is supported only on Linux.  It is always set to
> Nit: This paragraph is missing two <productname>s for Linux.  The docs
> are random about that, but these are new entries.

Done.

Nathan


Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
Should we also initialize the shared memory GUCs in bootstrap and
single-user mode?  I think I missed this in bd17880.

Nathan

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..4c4cf44871 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -324,6 +324,12 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)

     InitializeMaxBackends();

+    /*
+     * Initialize runtime-computed GUCs that depend on the amount of shared
+     * memory required.
+     */
+    InitializeShmemGUCs();
+
     CreateSharedMemoryAndSemaphores();

     /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0775abe35d..cae0b079b9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3978,6 +3978,12 @@ PostgresSingleUserMain(int argc, char *argv[],
     /* Initialize MaxBackends */
     InitializeMaxBackends();

+    /*
+     * Initialize runtime-computed GUCs that depend on the amount of shared
+     * memory required.
+     */
+    InitializeShmemGUCs();
+
     CreateSharedMemoryAndSemaphores();

     /*


Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Fri, Sep 17, 2021 at 04:31:44PM +0000, Bossart, Nathan wrote:
> Done.

Thanks.  I have gone through the last patch this morning, did some
tests on all the platforms I have at hand (including Linux), and
applied it after doing some small tweaks.  First, I extended
GetHugePageSize() to accept NULL for its first argument,
hugepagesize.  A second thing was in the docs, where it is still
useful IMO to keep the reference to /proc/meminfo and
/sys/kernel/mm/hugepages to let users know how the system impacts the
calculation of the new GUC.

Let's see what the buildfarm thinks about it.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Tue, Sep 21, 2021 at 12:08:22AM +0000, Bossart, Nathan wrote:
> Should we also initialize the shared memory GUCs in bootstrap and
> single-user mode?  I think I missed this in bd17880.

Why would we need that for the bootstrap mode?

While looking at the patch for shared_memory_size, I have looked at
those code paths to note that some of the runtime GUCs would be set
thanks to the load of the control file, but supporting this case
sounded rather limited to me for --single when it came to shared
memory and huge page estimation and we don't load
shared_preload_libraries in this context either, which could lead to
wrong estimations.  Anyway, I am not going to fight hard if people
would like that for the --single mode, even if it may lead to an
underestimation of the shmem allocated.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/20/21, 6:48 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Thanks.  I have gone through the last patch this morning, did some
> tests on all the platforms I have at hand (including Linux), and
> applied it after doing some small tweaks.  First, I extended
> GetHugePageSize() to accept NULL for its first argument,
> hugepagesize.  A second thing was in the docs, where it is still
> useful IMO to keep the reference to /proc/meminfo and
> /sys/kernel/mm/hugepages to let users know how the system impacts the
> calculation of the new GUC.
>
> Let's see what the buildfarm thinks about it.

Thanks!

Nathan


Re: Estimating HugePages Requirements?

From
"Bossart, Nathan"
Date:
On 9/20/21, 7:29 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Tue, Sep 21, 2021 at 12:08:22AM +0000, Bossart, Nathan wrote:
>> Should we also initialize the shared memory GUCs in bootstrap and
>> single-user mode?  I think I missed this in bd17880.
>
> Why would we need that for the bootstrap mode?
>
> While looking at the patch for shared_memory_size, I have looked at
> those code paths to note that some of the runtime GUCs would be set
> thanks to the load of the control file, but supporting this case
> sounded rather limited to me for --single when it came to shared
> memory and huge page estimation and we don't load
> shared_preload_libraries in this context either, which could lead to
> wrong estimations.  Anyway, I am not going to fight hard if people
> would like that for the --single mode, even if it may lead to an
> underestimation of the shmem allocated.

I was looking at this from the standpoint of keeping the startup steps
consistent between the modes.  Looking again, I can't think of
a strong reason to add it to bootstrap mode.  I think the case for
adding it to single-user mode is a bit stronger, as commands like
"SHOW shared_memory_size;" currently return 0.  I lean in favor of
adding it for single-user mode, but it's probably fine either way.

Nathan


Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Tue, Sep 21, 2021 at 04:06:38PM +0000, Bossart, Nathan wrote:
> I was looking at this from the standpoint of keeping the startup steps
> consistent between the modes.  Looking again, I can't think of
> a strong reason to add it to bootstrap mode.  I think the case for
> adding it to single-user mode is a bit stronger, as commands like
> "SHOW shared_memory_size;" currently return 0.  I lean in favor of
> adding it for single-user mode, but it's probably fine either way.

I am not sure either as that's a tradeoff between an underestimation
and no information.  The argument of consistency indeed matters.
Let's see if others have any opinion to share on this point.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Robert Haas
Date:
On Tue, Sep 21, 2021 at 11:53 PM Michael Paquier <michael@paquier.xyz> wrote:
> I am not sure either as that's a tradeoff between an underestimation
> and no information.  The argument of consistency indeed matters.
> Let's see if others have any opinion to share on this point.

Well, if we think the information won't be safe to use, it's better to
report nothing than a wrong value, I think.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Estimating HugePages Requirements?

From
Magnus Hagander
Date:
On Thu, Sep 9, 2021 at 11:53 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 9/8/21, 9:19 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> > FWIW, I don't have an environment at hand these days to test properly
> > 0001, so this will have to wait a bit.  I really like the approach
> > taken by 0002, and it is independent of the other patch while
> > extending support for postgres -C to provide the correct runtime
> > values.  So let's wrap this part first.  No need to send a reorganized
> > patch set.
>
> Sounds good.
>
> For 0001, the biggest thing on my mind at the moment is the name of
> the GUC.  "huge_pages_required" feels kind of ambiguous.  From the
> name alone, it could mean either "the number of huge pages required"
> or "huge pages are required for the server to run."  Also, the number
> of huge pages required is not actually required if you don't want to
> run the server with huge pages.  I think it might be clearer to
> somehow indicate that the value is essentially the size of the main
> shared memory area in terms of the huge page size, but I'm not sure
> how to do that concisely.  Perhaps it is enough to just make sure the
> description of "huge_pages_required" is detailed enough.
>
> For 0002, I have two small concerns.  My first concern is that it
> might be confusing to customers when the runtime GUCs cannot be
> returned for a running server.  We have the note in the docs, but if
> you're encountering it on the command line, it's not totally clear
> what the problem is.
>
>         $ postgres -D . -C log_min_messages
>         warning
>         $ postgres -D . -C shared_memory_size
>         2021-09-09 18:51:21.617 UTC [7924] FATAL:  lock file "postmaster.pid" already exists
>         2021-09-09 18:51:21.617 UTC [7924] HINT:  Is another postmaster (PID 7912) running in data directory "/local/home/bossartn/pgdata"?
>
> My other concern is that by default, viewing the runtime-computed GUCs
> will also emit a LOG.
>
>         $ postgres -D . -C shared_memory_size
>         142
>         2021-09-09 18:53:25.194 UTC [8006] LOG:  database system is shut down
>
> Running these commands with log_min_messages=debug5 emits way more
> information for the runtime-computed GUCs than for others, but IMO
> that is alright.  However, perhaps we should adjust the logging in
> 0002 to improve the default user experience.  I attached an attempt at
> that.
>
> With the attached patch, trying to view a runtime-computed GUC on a
> running server will look like this:
>
>         $ postgres -D . -C shared_memory_size
>         2021-09-09 21:24:21.552 UTC [6224] FATAL:  lock file "postmaster.pid" already exists
>         2021-09-09 21:24:21.552 UTC [6224] DETAIL:  Runtime-computed GUC "shared_memory_size" cannot be viewed on a running server.
>         2021-09-09 21:24:21.552 UTC [6224] HINT:  Is another postmaster (PID 3628) running in data directory "/local/home/bossartn/pgdata"?
>
> And viewing a runtime-computed GUC on a server that is shut down will
> look like this:
>
>         $ postgres -D . -C shared_memory_size
>         142

Nothing fixing this ended up actually getting committed, right? That
is, we still get the extra log output?

And in fact, the command documented on
https://www.postgresql.org/docs/devel/kernel-resources.html doesn't
actually produce the output that the docs show, it also shows the log
line, in the default config? If we can't fix the extra logging we
should at least have our docs represent reality -- maybe by adding a
"2>/dev/null" entry? But it'd be better to have it not output that log
in the first place...

(Of course what I'd really want is to be able to run it on a cluster
that's running, but that was discussed downthread so I'm not bringing
that one up for changes now)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Estimating HugePages Requirements?

From
Nathan Bossart
Date:
On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote:
> Nothing fixing this ended up actually getting committed, right? That
> is, we still get the extra log output?

Correct.

> And in fact, the command documented on
> https://www.postgresql.org/docs/devel/kernel-resources.html doesn't
> actually produce the output that the docs show, it also shows the log
> line, in the default config? If we can't fix the extra logging we
> should at least have our docs represent reality -- maybe by adding a
> "2>/dev/null" entry? But it'd be better to have it not output that log
> in the first place...

I attached a patch to adjust the documentation for now.  This apparently
crossed my mind earlier [0], but I didn't follow through with it for some
reason.

> (Of course what I'd really want is to be able to run it on a cluster
> that's running, but that was discussed downthread so I'm not bringing
> that one up for changes now)

I think it is worth revisiting the extra logging and the ability to view
runtime-computed GUCs on a running server.  Should this be an open item for
v15, or do you think it's alright to leave it for the v16 development
cycle?

[0] https://postgr.es/m/C45224E1-29C8-414C-A8E6-B718C07ACB94%40amazon.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Mon, Mar 14, 2022 at 10:34:17AM -0700, Nathan Bossart wrote:
> On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote:
>> And in fact, the command documented on
>> https://www.postgresql.org/docs/devel/kernel-resources.html doesn't
>> actually produce the output that the docs show, it also shows the log
>> line, in the default config? If we can't fix the extra logging we
>> should at least have our docs represent reality -- maybe by adding a
>> "2>/dev/null" entry? But it'd be better to have it not output that log
>> in the first place...
>
> I attached a patch to adjust the documentation for now.  This apparently
> crossed my mind earlier [0], but I didn't follow through with it for some
> reason.

Another thing that we can add is -c log_min_messages=fatal, but my
method is more complicated than a simple redirection, of course :)
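
Spelled out, that alternative would look like this on the command line,
using the GUC discussed in this thread:

        postgres -D $PGDATA -C shared_memory_size_in_huge_pages -c log_min_messages=fatal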

>> (Of course what I'd really want is to be able to run it on a cluster
>> that's running, but that was discussed downthread so I'm not bringing
>> that one up for changes now)
>
> I think it is worth revisiting the extra logging and the ability to view
> runtime-computed GUCs on a running server.  Should this be an open item for
> v15, or do you think it's alright to leave it for the v16 development
> cycle?

Well, this is a completely new problem, as it opens the door to
potential concurrent access issues with the data directory lock file
while reading values from the control file.  And it is not required in
order to get those estimates without having to allocate a large chunk
of memory, which was the primary goal discussed upthread as far as I
recall.  So I would leave that as an item to potentially tackle in
future versions.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Magnus Hagander
Date:
On Tue, Mar 15, 2022 at 3:41 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Mar 14, 2022 at 10:34:17AM -0700, Nathan Bossart wrote:
> > On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote:
> >> And in fact, the command documented on
> >> https://www.postgresql.org/docs/devel/kernel-resources.html doesn't
> >> actually produce the output that the docs show, it also shows the log
> >> line, in the default config? If we can't fix the extra logging we
> >> should at least have our docs represent reality -- maybe by adding a
> >> "2>/dev/null" entry? But it'd be better to have it not output that log
> >> in the first place...
> >
> > I attached a patch to adjust the documentation for now.  This apparently
> > crossed my mind earlier [0], but I didn't follow through with it for some
> > reason.
>
> Another thing that we can add is -c log_min_messages=fatal, but my
> method is more complicated than a simple redirection, of course :)

Either does work, but yours has more characters :)


> >> (Of course what I'd really want is to be able to run it on a cluster
> >> that's running, but that was discussed downthread so I'm not bringing
> >> that one up for changes now)
> >
> > I think it is worth revisiting the extra logging and the ability to view
> > runtime-computed GUCs on a running server.  Should this be an open item for
> > v15, or do you think it's alright to leave it for the v16 development
> > cycle?
>
> Well, this is a completely new problem, as it opens the door to
> potential concurrent access issues with the data directory lock file
> while reading values from the control file.  And it is not required in
> order to get those estimates without having to allocate a large chunk
> of memory, which was the primary goal discussed upthread as far as I
> recall.  So I would leave that as an item to potentially tackle in
> future versions.

I think we're talking about two different things here.

I think the "avoid extra logging" would be worth seeing if we can
address for 15.

The "able to run on a live cluster" seems a lot bigger and more scary
and not 15 material.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Estimating HugePages Requirements?

From
Nathan Bossart
Date:
On Tue, Mar 15, 2022 at 11:02:37PM +0100, Magnus Hagander wrote:
> I think we're talking about two different things here.
> 
> I think the "avoid extra logging" would be worth seeing if we can
> address for 15.

A simple approach could be to just set log_min_messages to PANIC before
exiting.  I've attached a patch for this.  With this patch, we'll still see
a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
running server, and there will be no extra output as long as the user sets
log_min_messages to INFO or higher (i.e., not a DEBUG* value).  For
comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
extra output as long as the user sets log_min_messages to DEBUG2 or higher.

> The "able to run on a live cluster" seems a lot bigger and more scary
> and not 15 material.

+1

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Estimating HugePages Requirements?

From
Nathan Bossart
Date:
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
> A simple approach could be to just set log_min_messages to PANIC before
> exiting.  I've attached a patch for this.  With this patch, we'll still see
> a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
> running server, and there will be no extra output as long as the user sets
> log_min_messages to INFO or higher (i.e., not a DEBUG* value).  For
> comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
> extra output as long as the user sets log_min_messages to DEBUG2 or higher.

I created a commitfest entry for this:

    https://commitfest.postgresql.org/38/3596/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
> A simple approach could be to just set log_min_messages to PANIC before
> exiting.  I've attached a patch for this.  With this patch, we'll still see
> a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
> running server, and there will be no extra output as long as the user sets
> log_min_messages to INFO or higher (i.e., not a DEBUG* value).  For
> comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
> extra output as long as the user sets log_min_messages to DEBUG2 or higher.

>          puts(config_val ? config_val : "");
> +
> +        /* don't emit shutdown messages */
> +        SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE);
> +
>          ExitPostmaster(0);

That's fancy, but I don't like that much.  And this would not protect
against any messages generated before this code path, either, even if
that should be enough for the current HEAD.

My solution for the docs is perhaps too confusing for the end-user,
and we are talking about a Linux-only thing here anyway.  So, at the
end, I am tempted to just add the "2> /dev/null" as suggested upthread
by Nathan and call it a day.  Does that sound fine?
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Wed, Mar 23, 2022 at 03:25:48PM +0900, Michael Paquier wrote:
> My solution for the docs is perhaps too confusing for the end-user,
> and we are talking about a Linux-only thing here anyway.  So, at the
> end, I am tempted to just add the "2> /dev/null" as suggested upthread
> by Nathan and call it a day.

This still sounds like the best way to go for now, so done this way as
of bbd4951.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Magnus Hagander
Date:
On Wed, Mar 23, 2022 at 7:25 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
> > A simple approach could be to just set log_min_messages to PANIC before
> > exiting.  I've attached a patch for this.  With this patch, we'll still see
> > a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
> > running server, and there will be no extra output as long as the user sets
> > log_min_messages to INFO or higher (i.e., not a DEBUG* value).  For
> > comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
> > extra output as long as the user sets log_min_messages to DEBUG2 or higher.
>
> >               puts(config_val ? config_val : "");
> > +
> > +             /* don't emit shutdown messages */
> > +             SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE);
> > +
> >               ExitPostmaster(0);
>
> That's fancy, but I don't like that much.  And this would not protect
> against any messages generated before this code path, either,

But neither would the suggestion of redirecting stderr to /dev/null.
In fact, doing the redirect will *also* throw away any FATAL that
happens. In fact, using the 2>/dev/null method, we *also* remove the
message that says there's another postmaster running in this
directory, which is strictly worse than the override of
log_min_messages.

That said, the redirect can be removed without recompiling postgres,
so it is probably still the better choice as a temporary workaround.
But we should really look into getting a better solution in place once
we start on 16.



> My solution for the docs is perhaps too confusing for the end-user,
> and we are talking about a Linux-only thing here anyway.  So, at the
> end, I am tempted to just add the "2> /dev/null" as suggested upthread
> by Nathan and call it a day.  Does that sound fine?

What would be a linux only thing?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Estimating HugePages Requirements?

From
Nathan Bossart
Date:
On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote:
> On Wed, Mar 23, 2022 at 7:25 AM Michael Paquier <michael@paquier.xyz> wrote:
>> On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
>> > A simple approach could be to just set log_min_messages to PANIC before
>> > exiting.  I've attached a patch for this.  With this patch, we'll still see
>> > a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
>> > running server, and there will be no extra output as long as the user sets
>> > log_min_messages to INFO or higher (i.e., not a DEBUG* value).  For
>> > comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
>> > extra output as long as the user sets log_min_messages to DEBUG2 or higher.
>>
>> >               puts(config_val ? config_val : "");
>> > +
>> > +             /* don't emit shutdown messages */
>> > +             SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE);
>> > +
>> >               ExitPostmaster(0);
>>
>> That's fancy, but I don't like that much.  And this would not protect
>> against any messages generated before this code path, either,
> 
> But neither would the suggestion of redirecting stderr to /dev/null.
> In fact, doing the redirect will *also* throw away any FATAL that
> happens. In fact, using the 2>/dev/null method, we *also* remove the
> message that says there's another postmaster running in this
> directory, which is strictly worse than the override of
> log_min_messages.
> 
> That said, the redirect can be removed without recompiling postgres,
> so it is probably still the better choice as a temporary workaround.
> But we should really look into getting a better solution in place once
> we start on 16.

A couple of other options to consider:

1) Always set log_min_messages to WARNING/ERROR/FATAL for 'postgres -C'.
We might need some special logic for handling the case where the user is
inspecting the log_min_messages parameter.  With this approach, you'd
probably never get extra output unless something was wrong (e.g., database
already running when inspecting a runtime-computed GUC).  Also, this would
silence any extra output that you might see today with non-runtime-computed
GUCs.

2) Add some way to skip just the shutdown message (e.g., a variable set
when output_config_variable is true).  With this approach, you wouldn't get
extra output by default, but you still might if log_min_messages is set to
something like DEBUG3.  This wouldn't impact any extra output that you see
today with non-runtime-computed GUCs.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Estimating HugePages Requirements?

From
Nathan Bossart
Date:
On Thu, Mar 24, 2022 at 01:31:08PM -0700, Nathan Bossart wrote:
> A couple of other options to consider:
> 
> 1) Always set log_min_messages to WARNING/ERROR/FATAL for 'postgres -C'.
> We might need some special logic for handling the case where the user is
> inspecting the log_min_messages parameter.  With this approach, you'd
> probably never get extra output unless something was wrong (e.g., database
> already running when inspecting a runtime-computed GUC).  Also, this would
> silence any extra output that you might see today with non-runtime-computed
> GUCs.
> 
> 2) Add some way to skip just the shutdown message (e.g., a variable set
> when output_config_variable is true).  With this approach, you wouldn't get
> extra output by default, but you still might if log_min_messages is set to
> something like DEBUG3.  This wouldn't impact any extra output that you see
> today with non-runtime-computed GUCs.

I've attached a first attempt at option 1.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote:
> But neither would the suggestion of redirecting stderr to /dev/null.
> In fact, doing the redirect will *also* throw away any FATAL that
> happens. In fact, using the 2>/dev/null method, we *also* remove the
> message that says there's another postmaster running in this
> directory, which is strictly worse than the override of
> log_min_messages.

Well, we could also tweak the command further with a redirection of
stderr to a log file or such, and tell users to look at it for errors.

> That said, the redirect can be removed without recompiling postgres,
> so it is probably still the better choice as a temporary workaround.
> But we should really look into getting a better solution in place once
> we start on 16.

But do we really need a better or more invasive solution for already
running servers though?  A SHOW command would be able to do the work
already in this case.  This would lack consistency compared to the
offline case, but we are not without options either.  That leaves the
case where the server is running and has allocated memory but is not
ready to accept connections, like crash recovery, and that use case
looks rather thin to me.

>> My solution for the docs is perhaps too confusing for the end-user,
>> and we are talking about a Linux-only thing here anyway.  So, at the
>> end, I am tempted to just add the "2> /dev/null" as suggested upthread
>> by Nathan and call it a day.  Does that sound fine?
>
> What would be a linux only thing?

Perhaps not at some point in the future.  Now that's under a section
of the docs only for Linux.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Magnus Hagander
Date:


On Wed, Apr 20, 2022, 00:12 Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote:
>> But neither would the suggestion of redirecting stderr to /dev/null.
>> In fact, doing the redirect will *also* throw away any FATAL that
>> happens. In fact, using the 2>/dev/null method, we *also* remove the
>> message that says there's another postmaster running in this
>> directory, which is strictly worse than the override of
>> log_min_messages.

> Well, we could also tweak the command further with a redirection of
> stderr to a log file or such, and tell users to look at it for errors.

That would be a pretty terrible UX though.



>> That said, the redirect can be removed without recompiling postgres,
>> so it is probably still the better choice as a temporary workaround.
>> But we should really look into getting a better solution in place once
>> we start on 16.

> But do we really need a better or more invasive solution for already
> running servers though?  A SHOW command would be able to do the work
> already in this case.  This would lack consistency compared to the
> offline case, but we are not without options either.  That leaves the
> case where the server is running and has allocated memory but is not
> ready to accept connections, like crash recovery, and that use case
> looks rather thin to me.


I agree that that's a very narrow use case. And I'm not sure the use case of a running server is even that important here - it's really the offline one that's important. Or rather, the really compelling one is when there is a server running but I want to check the value offline because it will change. SHOW doesn't help there because it shows the value based on the currently running configuration, not the new one after a restart.

I don't agree that the redirect is a solution. It's a workaround. 


>> > My solution for the docs is perhaps too confusing for the end-user,
>> > and we are talking about a Linux-only thing here anyway.  So, at the
>> > end, I am tempted to just add the "2> /dev/null" as suggested upthread
>> > by Nathan and call it a day.  Does that sound fine?
>>
>> What would be a linux only thing?
>
> Perhaps not at some point in the future.  Now that's under a section
> of the docs only for Linux.


Hmm. So what's the solution on windows? I guess maybe it's not as important there because there is no limit on huge pages, but generally getting the expected shared memory usage might be useful? Just significantly less important. 

/Magnus 

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Fri, Apr 22, 2022 at 09:49:34AM +0200, Magnus Hagander wrote:
> I agree that that's a very narrow use case. And I'm not sure the use case of
> a running server is even that important here - it's really the offline one
> that's important. Or rather, the really compelling one is when there is a
> server running but I want to check the value offline because it will
> change. SHOW doesn't help there because it shows the value based on the
> currently running configuration, not the new one after a restart.

You mean the case where one would directly change postgresql.conf on a
running server, and use postgres -C to see how much the kernel settings
need to be changed before the restart?

> Hmm. So what's the solution on windows? I guess maybe it's not as important
> there because there is no limit on huge pages, but generally getting the
> expected shared memory usage might be useful? Just significantly less
> important.

Contrary to Linux, we don't need to care about the number of large
pages that are necessary because there is no equivalent of
vm.nr_hugepages on Windows (see [1]), do we?  If that were the case,
we'd have a use case for huge_page_size, additionally.

That's the case where shared_memory_size_in_huge_pages comes in
handy, as much as does huge_page_size, and note that
shared_memory_size works on WIN32.

[1]: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Magnus Hagander
Date:
On Mon, Apr 25, 2022 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Apr 22, 2022 at 09:49:34AM +0200, Magnus Hagander wrote:
>> I agree that that's a very narrow use case. And I'm not sure the use case of
>> a running server is even that important here - it's really the offline one
>> that's important. Or rather, the really compelling one is when there is a
>> server running but I want to check the value offline because it will
>> change. SHOW doesn't help there because it shows the value based on the
>> currently running configuration, not the new one after a restart.
>
> You mean the case where one would directly change postgresql.conf on a
> running server, and use postgres -C to see how much the kernel settings
> need to be changed before the restart?

Yes.

AIUI that was the original use-case for this feature. It certainly was for me :)



>> Hmm. So what's the solution on windows? I guess maybe it's not as important
>> there because there is no limit on huge pages, but generally getting the
>> expected shared memory usage might be useful? Just significantly less
>> important.
>
> Contrary to Linux, we don't need to care about the number of large
> pages that are necessary because there is no equivalent of
> vm.nr_hugepages on Windows (see [1]), do we?  If that were the case,
> we'd have a use case for huge_page_size, additionally.

Right, for this one in particular -- that's what I meant with my comment about there not being a limit. But this feature works for other settings as well, not just the huge pages one.  Exactly what the use-cases are can vary, but surely they would have the same problems wrt redirects?


--

Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Mon, Apr 25, 2022 at 04:55:25PM +0200, Magnus Hagander wrote:
> AIUI that was the original use-case for this feature. It certainly was for
> me :)

Perhaps we'd be fine with relaxing the requirements here knowing that
the control file should never be larger than PG_CONTROL_MAX_SAFE_SIZE
(aka the read should be atomic so it could be made lockless).  At the
end of the day, to be absolutely correct in the shmem size estimation,
I think that we are going to need what's proposed here or the sizing
may not be right depending on how extensions adjust GUCs after they
load their _PG_init():
https://www.postgresql.org/message-id/20220419154658.GA2487941@nathanxps13

That's a bit independent, but not completely unrelated either
depending on how exact you want your number of estimated huge pages to
be.  Just wanted to mention it.

>> Contrary to Linux, we don't need to care about the number of large
>> pages that are necessary because there is no equivalent of
>> vm.nr_hugepages on Windows (see [1]), do we?  If that were the case,
>> we'd have a use case for huge_page_size, additionally.
>
> Right, for this one in particular -- that's what I meant with my comment
> about there not being a limit. But this feature works for other settings as
> well, not just the huge pages one.  Exactly what the use-cases are can
> vary, but surely they would have the same problems wrt redirects?

Yes, the redirection issue would apply to all the run-time GUCs.
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Nathan Bossart
Date:
On Tue, Apr 26, 2022 at 10:34:06AM +0900, Michael Paquier wrote:
> Yes, the redirection issue would apply to all the run-time GUCs.

Should this be tracked as an open item for v15?  There was another recent
report about the extra log output [0].

[0] https://www.postgresql.org/message-id/YnARlI5nvbziobR4%40momjian.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Fri, May 06, 2022 at 10:13:18AM -0700, Nathan Bossart wrote:
> On Tue, Apr 26, 2022 at 10:34:06AM +0900, Michael Paquier wrote:
>> Yes, the redirection issue would apply to all the run-time GUCs.
>
> Should this be tracked as an open item for v15?  There was another recent
> report about the extra log output [0].

That makes two complaints on two separate threads.  So an open item to
adjust this behavior seems appropriate.

I have looked at the patch posted at [1], and I don't quite understand
why you need the extra dance with log_min_messages.  Why don't you
just set the GUC at the end of the code path in PostmasterMain() where
we print non-runtime-computed parameters?  I am not really worrying
about users deciding to set log_min_messages to PANIC in
postgresql.conf when it comes to postgres -C, TBH, as they'd miss the
FATAL messages if the command is attempted on a server already
starting.

Hence the attached.

[1]: https://www.postgresql.org/message-id/20220328173503.GA137769@nathanxps13
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Nathan Bossart
Date:
On Mon, May 09, 2022 at 03:53:24PM +0900, Michael Paquier wrote:
> I have looked at the patch posted at [1], and I don't quite understand
> why you need the extra dance with log_min_messages.  Why don't you
> just set the GUC at the end of the code path in PostmasterMain() where
> we print non-runtime-computed parameters?

The log_min_messages dance avoids extra output when inspecting
non-runtime-computed GUCs, like this:

    ~/pgdata$ postgres -D . -C log_min_messages -c log_min_messages=debug5
    debug5
    2022-05-10 09:06:04.728 PDT [3715607] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
    2022-05-10 09:06:04.728 PDT [3715607] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
    2022-05-10 09:06:04.728 PDT [3715607] DEBUG:  proc_exit(0): 0 callbacks to make
    2022-05-10 09:06:04.728 PDT [3715607] DEBUG:  exit(0)

AFAICT you need to set log_min_messages to at least DEBUG3 to see extra
output for the non-runtime-computed GUCs, so it might not be worth the
added complexity.

> I am not really worrying
> about users deciding to set log_min_messages to PANIC in
> postgresql.conf when it comes to postgres -C, TBH, as they'd miss the
> FATAL messages if the command is attempted on a server already
> starting.

I don't have a strong opinion on this one.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Estimating HugePages Requirements?

From
Michael Paquier
Date:
On Tue, May 10, 2022 at 09:12:49AM -0700, Nathan Bossart wrote:
> AFAICT you need to set log_min_messages to at least DEBUG3 to see extra
> output for the non-runtime-computed GUCs, so it might not be worth the
> added complexity.

This set of messages has been showing up for ages with zero complaints
from the field AFAIK, and nobody would use this level of logging except
developers.  One thing that overriding log_min_messages to FATAL does,
however, is that it no longer shows those debug3 messages when querying
a runtime-computed GUC, but that's the kind of thing we'd want to hide
anyway.  Your patch would hide those entries in both cases.  Perhaps we
could do that, but in the end I don't really see any need to complicate
this code path more than necessary, and this is enough to silence the
logs in the cases we care about basically all the time, even if the log
levels are reduced a bit on a given cluster.  Hence, I have applied the
simplest solution of just enforcing log_min_messages=FATAL when
requesting a runtime GUC.
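
For reference, the enforcement amounts to something like the following
in the postgres -C handling; this is a sketch of the idea rather than
the exact committed hunk:

    /*
     * If the requested parameter is runtime-computed, force
     * log_min_messages to FATAL so that the shutdown chatter is not
     * printed along with the value.
     */
    if (output_config_variable != NULL &&
        (GetConfigOptionFlags(output_config_variable, true) &
         GUC_RUNTIME_COMPUTED) != 0)
        SetConfigOption("log_min_messages", "fatal",
                        PGC_INTERNAL, PGC_S_OVERRIDE);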
--
Michael

Attachment

Re: Estimating HugePages Requirements?

From
Nathan Bossart
Date:
On Wed, May 11, 2022 at 02:34:25PM +0900, Michael Paquier wrote:
> Hence, I have applied
> the simplest solution to just enforce a log_min_messages=FATAL when
> requesting a runtime GUC.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com