Thread: Re: Estimating HugePages Requirements?
moving to pgsql-hackers@

On 6/9/21, 9:41 AM, "Don Seiler" <don@seiler.us> wrote:
> I'm trying to set up a chef recipe to reserve enough HugePages on a
> linux system for our PG servers. A given VM will only host one PG
> cluster and that will be the only thing on that host that uses
> HugePages. Blogs that I've seen suggest that it would be as simple
> as taking the shared_buffers setting and dividing that by 2MB (huge
> page size), however I found that I needed some more.
>
> In my test case, shared_buffers is set to 4003MB (calculated by
> chef) but PG failed to start until I reserved a few hundred more MB.
> When I checked VmPeak, it was 4321MB, so I ended up having to
> reserve over 2161 huge pages, over a hundred more than I had
> originally thought.
>
> I'm told other factors contribute to this additional memory
> requirement, such as max_connections, wal_buffers, etc. I'm
> wondering if anyone has been able to come up with a reliable method
> for determining the HugePages requirements for a PG cluster based on
> the GUC values (that would be known at deployment time).

In RDS, we've added a pg_ctl option that returns the amount of shared
memory required. Basically, we start up postmaster just enough to get
an accurate value from CreateSharedMemoryAndSemaphores() and then shut
down. The patch is quite battle-tested at this point (we first started
using it in 2017, and we've been enabling huge pages by default since
v10). I'd be happy to clean it up and submit it for discussion in
pgsql-hackers@ if there is interest.

Nathan
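The sizing described above can be sketched as a quick shell calculation. The numbers below are the ones from the report (VmPeak of 4321MB, 2MB huge pages); on a live system they would come from /proc/&lt;postmaster pid&gt;/status and /proc/meminfo.

```shell
# Hypothetical sketch of sizing nr_hugepages from the postmaster's peak
# virtual memory instead of from shared_buffers alone. Values are the
# ones reported in the message above.
vm_peak_kb=$((4321 * 1024))   # VmPeak: 4321MB, as reported
hugepage_kb=2048              # Hugepagesize: 2048 kB (common x86_64 default)

# Round up: a partially used huge page still has to be reserved.
pages=$(( (vm_peak_kb + hugepage_kb - 1) / hugepage_kb ))
echo "$pages"                 # 2161 with these inputs
```

This matches the "over 2161 huge pages" figure in the report, versus the ~2002 pages a naive shared_buffers/2MB estimate would give.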
> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>
> I'd be happy to clean it up and submit it for
> discussion in pgsql-hackers@ if there is interest.

Yes, I'd like to see it. Thanks for offering.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
>> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>
>> I'd be happy to clean it up and submit it for
>> discussion in pgsql-hackers@ if there is interest.
>
> Yes, I'd like to see it. Thanks for offering.

Here's the general idea. It still needs a bit of polishing, but I'm
hoping this is enough to spark some discussion on the approach.

Nathan
Attachment
On 6/9/21, 8:09 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
>>> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>>
>>> I'd be happy to clean it up and submit it for
>>> discussion in pgsql-hackers@ if there is interest.
>>
>> Yes, I'd like to see it. Thanks for offering.
>
> Here's the general idea. It still needs a bit of polishing, but I'm
> hoping this is enough to spark some discussion on the approach.

Here's a rebased version of the patch.

Nathan
Attachment
On Mon, Aug 9, 2021 at 3:57 PM Bossart, Nathan <bossartn@amazon.com> wrote:
On 6/9/21, 8:09 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
>>> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>>
>>> I'd be happy to clean it up and submit it for
>>> discussion in pgsql-hackers@ if there is interest.
>>
>> Yes, I'd like to see it. Thanks for offering.
>
> Here's the general idea. It still needs a bit of polishing, but I'm
> hoping this is enough to spark some discussion on the approach.
Here's a rebased version of the patch.
Nathan
Hi,
-extern void CreateSharedMemoryAndSemaphores(void);
+extern Size CreateSharedMemoryAndSemaphores(bool size_only);
Should the parameter be an enum / bitmask so that future additions would not change the method signature?
Cheers
On 8/9/21, 4:05 PM, "Zhihong Yu" <zyu@yugabyte.com> wrote:
> -extern void CreateSharedMemoryAndSemaphores(void);
> +extern Size CreateSharedMemoryAndSemaphores(bool size_only);
>
> Should the parameter be an enum / bitmask so that future additions
> would not change the method signature?

I don't have a strong opinion about this. I don't feel that it's
really necessary, but if reviewers want a bitmask instead, I can
change it.

Nathan
Hi,

On 2021-08-09 22:57:18 +0000, Bossart, Nathan wrote:
> @@ -1026,6 +1031,18 @@ PostmasterMain(int argc, char *argv[])
>  	 */
>  	InitializeMaxBackends();
>
> +	if (output_shmem)
> +	{
> +		char		output[64];
> +		Size		size;
> +
> +		size = CreateSharedMemoryAndSemaphores(true);
> +		sprintf(output, "%zu", size);
> +
> +		puts(output);
> +		ExitPostmaster(0);
> +	}

I don't like putting this into PostmasterMain(). Either
BootstrapMain() (specifically checker mode) or GucInfoMain() seem like
better places.

> -void
> -CreateSharedMemoryAndSemaphores(void)
> +Size
> +CreateSharedMemoryAndSemaphores(bool size_only)
>  {
>  	PGShmemHeader *shim = NULL;
>
> @@ -161,6 +161,9 @@ CreateSharedMemoryAndSemaphores(void)
>  	/* might as well round it off to a multiple of a typical page size */
>  	size = add_size(size, 8192 - (size % 8192));
>
> +	if (size_only)
> +		return size;
> +
>  	elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
>
>  	/*
> @@ -288,4 +291,6 @@ CreateSharedMemoryAndSemaphores(void)
>  	 */
>  	if (shmem_startup_hook)
>  		shmem_startup_hook();
> +
> +	return 0;
>  }

That seems like an ugly API to me. Why don't we split the size
determination and shmem creation functions into two?

Greetings,

Andres Freund
On 8/9/21, 8:43 PM, "Andres Freund" <andres@anarazel.de> wrote:
> I don't like putting this into PostmasterMain(). Either BootstrapMain()
> (specifically checker mode) or GucInfoMain() seem like better places.

I think BootstrapModeMain() makes the most sense. It fits in nicely
with the --check logic that's already there. With v3, the following
command can be used to retrieve the amount of shared memory required.

    postgres --output-shmem -D dir

While testing this new option, I noticed that you can achieve similar
results today with the following command, although this one will
actually try to create the shared memory, too.

    postgres --check -D dir -c log_min_messages=debug3 2> >(grep IpcMemoryCreate)

IMO the new option is still handy, but I can see the argument that it
might not be necessary.

> That seems like an ugly API to me. Why don't we split the size
> determination and shmem creation functions into two?

I did it this way in v3.

Nathan
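For reference, the size produced by the --check workaround has to be scraped out of the debug output. A sketch of that extraction follows; the sample log line is hard-coded here so the snippet is self-contained, and it mirrors the elog format shown in the patch upthread.

```shell
# Sketch: pull the byte count out of the DEBUG3 line that the --check
# workaround emits. The sample line is hard-coded for illustration; in
# practice it comes from the command's stderr.
line='DEBUG:  invoking IpcMemoryCreate(size=148897792)'
size=$(printf '%s\n' "$line" | sed -n 's/.*IpcMemoryCreate(size=\([0-9]*\)).*/\1/p')
echo "$size"   # 148897792
```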
Attachment
On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
> I think BootstrapModeMain() makes the most sense. It fits in nicely
> with the --check logic that's already there. With v3, the following
> command can be used to retrieve the amount of shared memory required.
>
>     postgres --output-shmem -D dir
>
> While testing this new option, I noticed that you can achieve similar
> results today with the following command, although this one will
> actually try to create the shared memory, too.

That may not be the best option.

> IMO the new option is still handy, but I can see the argument that it
> might not be necessary.

A separate option looks handy. Wouldn't it be better to document it
in postgres-ref.sgml then?
--
Michael
Attachment
On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
> > I think BootstrapModeMain() makes the most sense. It fits in nicely
> > with the --check logic that's already there. With v3, the following
> > command can be used to retrieve the amount of shared memory required.
> >
> >     postgres --output-shmem -D dir
> >
> > While testing this new option, I noticed that you can achieve similar
> > results today with the following command, although this one will
> > actually try to create the shared memory, too.
>
> That may not be the best option.

I would say that can be a disastrous option.

First of all, it would probably not work if you already have something
running, especially when using huge pages. And if it does work, in
that or other scenarios, it can potentially have a significant impact
on a running cluster to suddenly allocate many GB of additional
memory...

> > IMO the new option is still handy, but I can see the argument that it
> > might not be necessary.
>
> A separate option looks handy. Wouldn't it be better to document it
> in postgres-ref.sgml then?

I'd say a lot more than just handy. I don't think the workaround is
really all that useful.

(haven't looked at the actual patch yet, just commenting on the
principle)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On 8/27/21, 7:41 AM, "Magnus Hagander" <magnus@hagander.net> wrote:
> On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
>> On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
>> > While testing this new option, I noticed that you can achieve similar
>> > results today with the following command, although this one will
>> > actually try to create the shared memory, too.
>>
>> That may not be the best option.
>
> I would say that can be a disastrous option.
>
> First of all, it would probably not work if you already have something
> running, especially when using huge pages. And if it does work, in
> that or other scenarios, it can potentially have a significant impact
> on a running cluster to suddenly allocate many GB of additional
> memory...

The v3 patch actually didn't work if the server was already running.
I removed that restriction in v4.

>> > IMO the new option is still handy, but I can see the argument that it
>> > might not be necessary.
>>
>> A separate option looks handy. Wouldn't it be better to document it
>> in postgres-ref.sgml then?
>
> I'd say a lot more than just handy. I don't think the workaround is
> really all that useful.

I added some documentation in v4.

Nathan
Attachment
On 2021-08-27 16:40:27 +0200, Magnus Hagander wrote:
> On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> I'd say a lot more than just handy. I don't think the workaround is
> really all that useful.

+1
On 8/27/21, 11:16 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> I added some documentation in v4.

I realized that my attempt at documenting this new option was missing
some important context about the meaning of the return value when used
against a running server. I added that in v5.

Nathan
Attachment
Hi,

On 2021-08-27 19:27:18 +0000, Bossart, Nathan wrote:
> +     <varlistentry>
> +      <term><option>--output-shmem</option></term>
> +      <listitem>
> +       <para>
> +        Prints the amount of shared memory required for the given
> +        configuration and exits.  This can be used on a running server, but
> +        the return value reflects the amount of shared memory needed based
> +        on the current invocation.  It does not return the amount of shared
> +        memory in use by the running server.  This must be the first
> +        argument on the command line.
> +       </para>
> +
> +       <para>
> +        This option is useful for determining the number of huge pages
> +        needed for the server.  For more information, see
> +        <xref linkend="linux-huge-pages"/>.
> +       </para>
> +      </listitem>
> +     </varlistentry>
> +

One thing I wonder is if this wouldn't better be dealt with in a more
generic way. While this is the most problematic runtime computed GUC,
it's not the only one. What if we introduced a new shared_memory_size
GUC, and made --describe-config output it? Perhaps adding
--describe-config=guc-name?

I also wonder if we should output the number of hugepages needed
instead of the "raw" bytes of shared memory. The whole business about
figuring out the huge page size, dividing the shared memory size by
that and then rounding up could be removed in that case. Due to
huge_page_size it's not even immediately obvious which huge page size
one should use...
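The manual procedure in question here (find the huge page size, divide, round up) looks roughly like the following sketch. The byte count is a stand-in for whatever the server would report, and 2048kB is the common x86_64 default; on a real host hp_kb would come from Hugepagesize in /proc/meminfo, or from the huge_page_size GUC when it is nonzero.

```shell
# Sketch of the manual step this proposal would eliminate: convert a raw
# shared memory size in bytes into a huge page count, rounding up.
shmem_bytes=148897792     # stand-in for the server-reported value
hp_kb=2048                # e.g. awk '/^Hugepagesize/ {print $2}' /proc/meminfo
hp_bytes=$((hp_kb * 1024))
pages=$(( (shmem_bytes + hp_bytes - 1) / hp_bytes ))
echo "$pages"             # 71 with these inputs
```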
> diff --git a/src/backend/main/main.c b/src/backend/main/main.c
> index 3a2a0d598c..c141ae3d1c 100644
> --- a/src/backend/main/main.c
> +++ b/src/backend/main/main.c
> @@ -182,9 +182,11 @@ main(int argc, char *argv[])
>  	 */
>
>  	if (argc > 1 && strcmp(argv[1], "--check") == 0)
> -		BootstrapModeMain(argc, argv, true);
> +		BootstrapModeMain(argc, argv, true, false);
> +	else if (argc > 1 && strcmp(argv[1], "--output-shmem") == 0)
> +		BootstrapModeMain(argc, argv, false, true);
>  	else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
> -		BootstrapModeMain(argc, argv, false);
> +		BootstrapModeMain(argc, argv, false, false);
>  #ifdef EXEC_BACKEND
>  	else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
>  		SubPostmasterMain(argc, argv);

help() needs an update too.

> diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
> index 3e4ec53a97..b225b1ee70 100644
> --- a/src/backend/storage/ipc/ipci.c
> +++ b/src/backend/storage/ipc/ipci.c
> @@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
>  	total_addin_request = add_size(total_addin_request, size);
>  }
>
> +/*
> + * CalculateShmemSize
> + *		Calculates the amount of shared memory and number of semaphores needed.
> + *
> + * If num_semaphores is not NULL, it will be set to the number of semaphores
> + * required.
> + *
> + * Note that this function freezes the additional shared memory request size
> + * from loadable modules.
> + */
> +Size
> +CalculateShmemSize(int *num_semaphores)
> +{

Can you split this into a separate commit? It feels fairly
uncontroversial to me, so I think we could just apply it soon?

Greetings,

Andres Freund
On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
> One thing I wonder is if this wouldn't better be dealt with in a more generic
> way. While this is the most problematic runtime computed GUC, it's not the
> only one. What if we introduced a new shared_memory_size GUC, and made
> --describe-config output it? Perhaps adding --describe-config=guc-name?
>
> I also wonder if we should output the number of hugepages needed instead of
> the "raw" bytes of shared memory. The whole business about figuring out the
> huge page size, dividing the shared memory size by that and then rounding up
> could be removed in that case. Due to huge_page_size it's not even immediately
> obvious which huge page size one should use...

I like both of these ideas.

> Can you split this into a separate commit? It feels fairly uncontroversial to
> me, so I think we could just apply it soon?

I attached a patch for just the uncontroversial part, which is
unfortunately all I have time for today.

Nathan
Attachment
On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
> On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
>> One thing I wonder is if this wouldn't better be dealt with in a more generic
>> way. While this is the most problematic runtime computed GUC, it's not the
>> only one. What if we introduced a new shared_memory_size GUC, and made
>> --describe-config output it? Perhaps adding --describe-config=guc-name?
>>
>> I also wonder if we should output the number of hugepages needed instead of
>> the "raw" bytes of shared memory. The whole business about figuring out the
>> huge page size, dividing the shared memory size by that and then rounding up
>> could be removed in that case. Due to huge_page_size it's not even immediately
>> obvious which huge page size one should use...
>
> I like both of these ideas.

That pretty much looks like -C in concept, doesn't it? Except that
you cannot get the actual total shared memory value, because we'd do
this operation before loading shared_preload_libraries and miss any
amount asked for by extensions. There is a similar problem when
attempting to do postgres -C data_checksums, for example, which would
output an incorrect value even if the cluster has data checksums
enabled.
--
Michael
Attachment
On Sat, Aug 28, 2021 at 11:00:11AM +0900, Michael Paquier wrote:
> On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
> > On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
> >> One thing I wonder is if this wouldn't better be dealt with in a more generic
> >> way. While this is the most problematic runtime computed GUC, it's not the
> >> only one. What if we introduced a new shared_memory_size GUC, and made
> >> --describe-config output it? Perhaps adding --describe-config=guc-name?
> >>
> >> I also wonder if we should output the number of hugepages needed instead of
> >> the "raw" bytes of shared memory. The whole business about figuring out the
> >> huge page size, dividing the shared memory size by that and then rounding up
> >> could be removed in that case. Due to huge_page_size it's not even immediately
> >> obvious which huge page size one should use...
> >
> > I like both of these ideas.
>
> That pretty much looks like -C in concept, doesn't it? Except that you
> cannot get the actual total shared memory value because we'd do this
> operation before loading shared_preload_libraries and miss any amount
> asked for by extensions. There is a similar problem when attempting to do
> postgres -C data_checksums, for example, which would output an
> incorrect value even if the cluster has data checksums enabled.

Since we don't want to try to allocate the huge pages, and we also
don't want to compute based on shared_buffers alone, did anyone
consider if pg_controldata is the right place to put this? It
includes a lot of related stuff:

    max_connections setting:              100
    max_worker_processes setting:         8   (added in 2013: 6bc8ef0b7f1f1df3998745a66e1790e27424aa0c)
    max_wal_senders setting:              10
    max_prepared_xacts setting:           2
    max_locks_per_xact setting:           64

I'm not sure if there's any reason these aren't also shown (?)

    autovacuum_max_workers - added in 2007: e2a186b03
    max_predicate_locks_per_xact - added in 2011: dafaa3efb
    max_logical_replication_workers
    max_replication_slots

--
Justin
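Those pg_controldata fields are easy to scrape, for what it's worth. A sketch follows; the sample output is inlined so the snippet is self-contained, but on a real cluster the pipeline would start from `pg_controldata "$PGDATA"` instead.

```shell
# Sketch of pulling one of the settings listed above out of pg_controldata
# output. Sample output is hard-coded so this runs without a cluster.
sample='max_connections setting:              100
max_worker_processes setting:         8
max_prepared_xacts setting:           2
max_locks_per_xact setting:           64'
max_conn=$(printf '%s\n' "$sample" \
    | awk -F: '/^max_connections/ {gsub(/ /, "", $2); print $2}')
echo "$max_conn"   # 100
```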
On 8/27/21, 7:01 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
>> On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
>>> One thing I wonder is if this wouldn't better be dealt with in a more generic
>>> way. While this is the most problematic runtime computed GUC, it's not the
>>> only one. What if we introduced a new shared_memory_size GUC, and made
>>> --describe-config output it? Perhaps adding --describe-config=guc-name?
>>>
>>> I also wonder if we should output the number of hugepages needed instead of
>>> the "raw" bytes of shared memory. The whole business about figuring out the
>>> huge page size, dividing the shared memory size by that and then rounding up
>>> could be removed in that case. Due to huge_page_size it's not even immediately
>>> obvious which huge page size one should use...
>>
>> I like both of these ideas.
>
> That pretty much looks like -C in concept, doesn't it? Except that you
> cannot get the actual total shared memory value because we'd do this
> operation before loading shared_preload_libraries and miss any amount
> asked for by extensions. There is a similar problem when attempting to do
> postgres -C data_checksums, for example, which would output an
> incorrect value even if the cluster has data checksums enabled.

Attached is a hacky attempt at adding a shared_memory_size GUC in a
way that could be used with -C. This should include the amount of
shared memory requested by extensions, too. As long as huge_page_size
is nonzero, it seems easy enough to provide the number of huge pages
needed as well.

Nathan
Attachment
On Sat, Aug 28, 2021 at 05:36:37AM +0000, Bossart, Nathan wrote:
> Attached is a hacky attempt at adding a shared_memory_size GUC in a
> way that could be used with -C. This should include the amount of
> shared memory requested by extensions, too. As long as huge_page_size
> is nonzero, it seems easy enough to provide the number of huge pages
> needed as well.

Yes, the implementation is not good. The key thing is that by wanting
to support shared_memory_size with the -C switch of postgres, we need
to call process_shared_preload_libraries before output_config_variable.
This additionally means calling ApplyLauncherRegister() before that so
that the bgworker slots are not all taken first. Going through
_PG_init() also means that we'd better use ChangeToDataDir()
beforehand.

Attached is a WIP to show how the order of the operations could be
changed, as that's easier to grasp. Even if we don't do that, having
the GUC and the refactoring of CalculateShmemSize() would still be
useful, as one could just query an existing instance for an estimation
of huge pages for a cloned one.

The GUC shared_memory_size should have GUC_NOT_IN_SAMPLE and
GUC_DISALLOW_IN_FILE, with some documentation, of course. I added the
flags to the GUC, not the docs. The code setting up the GUC is not
good either. It would make sense to just have that in a small wrapper
of ipci.c, perhaps.
--
Michael
Attachment
On 8/30/21, 12:29 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Attached is a WIP to show how the order of the operations could be
> changed, as that's easier to grasp. Even if we don't do that, having
> the GUC and the refactoring of CalculateShmemSize() would still be
> useful, as one could just query an existing instance for an estimation
> of huge pages for a cloned one.
>
> The GUC shared_memory_size should have GUC_NOT_IN_SAMPLE and
> GUC_DISALLOW_IN_FILE, with some documentation, of course. I added the
> flags to the GUC, not the docs. The code setting up the GUC is not
> good either. It would make sense to just have that in a small wrapper
> of ipci.c, perhaps.

I moved the GUC calculation to ipci.c, adjusted the docs, and added a
huge_pages_required GUC. It's still a little rough around the edges,
and I haven't tested it on Windows, but this seems like the direction
the patch is headed.

Nathan
Attachment
On Tue, Aug 31, 2021 at 05:37:52AM +0000, Bossart, Nathan wrote:
> I moved the GUC calculation to ipci.c, adjusted the docs, and added a
> huge_pages_required GUC. It's still a little rough around the edges,
> and I haven't tested it on Windows, but this seems like the direction
> the patch is headed.

Hmm. I am not sure about the addition of huge_pages_required, knowing
that we would have shared_memory_size. I'd rather leave the
calculation to the user with a scan of /proc/meminfo.

+#elif defined(WIN32)
+	hp_size = GetLargePageMinimum();
+#endif
+
+#if defined(MAP_HUGETLB) || defined(WIN32)
+	hp_required = (size_b / hp_size) + 1;

As of [1], there is the following description:

"If the processor does not support large pages, the return value is
zero."

So there is a problem here.

[1]: https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-getlargepageminimum
--
Michael
Attachment
On 8/31/21, 11:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Hmm. I am not sure about the addition of huge_pages_required, knowing
> that we would have shared_memory_size. I'd rather leave the
> calculation to the user with a scan of /proc/meminfo.

I included this based on some feedback from Andres upthread [0]. I
went ahead and split the patch set into 3 pieces in case we end up
leaving it out.

> +#elif defined(WIN32)
> +	hp_size = GetLargePageMinimum();
> +#endif
> +
> +#if defined(MAP_HUGETLB) || defined(WIN32)
> +	hp_required = (size_b / hp_size) + 1;
> As of [1], there is the following description:
> "If the processor does not support large pages, the return value is
> zero."
> So there is a problem here.

I've fixed this in v4.

Nathan

[0] https://postgr.es/m/20210827193813.oqo5lamvyzahs35o%40alap3.anarazel.de
Attachment
On Wed, Sep 01, 2021 at 06:28:21PM +0000, Bossart, Nathan wrote:
> On 8/31/21, 11:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> Hmm. I am not sure about the addition of huge_pages_required, knowing
>> that we would have shared_memory_size. I'd rather leave the
>> calculation to the user with a scan of /proc/meminfo.
>
> I included this based on some feedback from Andres upthread [0]. I
> went ahead and split the patch set into 3 pieces in case we end up
> leaving it out.

Thanks. Anyway, we don't really need huge_pages_required on Windows,
do we? The following Windows docs tell what to do when using large
pages:
https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support

The backend code does that as in PGSharedMemoryCreate(), now that I
look at it. And there is no way to change the minimum large page size
there as far as I can see, because that's decided by the processor, no?
There is a case for shared_memory_size on Windows to be able to adjust
the sizing of the memory of the host, though.

>> +#elif defined(WIN32)
>> +	hp_size = GetLargePageMinimum();
>> +#endif
>> +
>> +#if defined(MAP_HUGETLB) || defined(WIN32)
>> +	hp_required = (size_b / hp_size) + 1;
>> As of [1], there is the following description:
>> "If the processor does not support large pages, the return value is
>> zero."
>> So there is a problem here.
>
> I've fixed this in v4.

At the end it would be nice to not finish with two GUCs. Both depend
on the reordering of the actions done by the postmaster, so I'd be
curious to hear the thoughts of others on this particular point.
--
Michael
Attachment
On 9/2/21, 12:54 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Thanks. Anyway, we don't really need huge_pages_required on Windows,
> do we? The following Windows docs tell what to do when using large
> pages:
> https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
>
> The backend code does that as in PGSharedMemoryCreate(), now that I
> look at it. And there is no way to change the minimum large page size
> there as far as I can see, because that's decided by the processor, no?
> There is a case for shared_memory_size on Windows to be able to adjust
> the sizing of the memory of the host, though.

Yeah, huge_pages_required might not serve much purpose for Windows.
We could always set it to -1 for Windows if it seems like it'll do
more harm than good.

> At the end it would be nice to not finish with two GUCs. Both depend
> on the reordering of the actions done by the postmaster, so I'd be
> curious to hear the thoughts of others on this particular point.

Of course. It'd be great to hear others' thoughts on this stuff.

Nathan
On Thu, Sep 02, 2021 at 04:46:56PM +0000, Bossart, Nathan wrote:
> Yeah, huge_pages_required might not serve much purpose for Windows.
> We could always set it to -1 for Windows if it seems like it'll do
> more harm than good.

I'd be fine with this setup on environments where there is no need for
it.

>> At the end it would be nice to not finish with two GUCs. Both depend
>> on the reordering of the actions done by the postmaster, so I'd be
>> curious to hear the thoughts of others on this particular point.
>
> Of course. It'd be great to hear others' thoughts on this stuff.

Just to be clear here, the ordering on HEAD for the postmaster is:
- Load configuration.
- Handle -C config_param.
- checkDataDir(), to check permissions of the data dir, etc.
- checkControlFile(), to see if the control file exists.
- Switch to data directory as work dir.
- Lock file creation.
- Initial read of the control file (where the GUC data_checksums is set).
- Register apply launcher.
- shared_preload_libraries.

With 0002, we have that:
- Load configuration.
- checkDataDir(), to check permissions of the data dir, etc.
- checkControlFile(), to see if the control file exists.
- Switch to data directory as work dir.
- Register apply launcher.
- shared_preload_libraries.
- Calculate the shmem GUCs (new step).
- Handle -C config_param.
- Lock file creation.
- Initial read of the control file (where the GUC data_checksums is set).

One thing that would be incorrect upon more review is that we'd still
have data_checksums wrong with -C, meaning that the initial read of
the control file should be moved further up, and getting the control
file checks done before registering workers would be better. Keeping
the lock file at the end would be fine AFAIK, but should we worry
about the interactions with _PG_init() here?

0001, that refactors the calculation of the shmem size into a
different routine, is fine as-is, so I'd like to move on and apply
it.
--
Michael
Attachment
At Thu, 2 Sep 2021 16:46:56 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
> On 9/2/21, 12:54 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> > Thanks. Anyway, we don't really need huge_pages_required on Windows,
> > do we? The following Windows docs tell what to do when using large
> > pages:
> > https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
> >
> > The backend code does that as in PGSharedMemoryCreate(), now that I
> > look at it. And there is no way to change the minimum large page size
> > there as far as I can see, because that's decided by the processor, no?
> > There is a case for shared_memory_size on Windows to be able to adjust
> > the sizing of the memory of the host, though.
>
> Yeah, huge_pages_required might not serve much purpose for Windows.
> We could always set it to -1 for Windows if it seems like it'll do
> more harm than good.

I agree with this.

> > At the end it would be nice to not finish with two GUCs. Both depend
> > on the reordering of the actions done by the postmaster, so I'd be
> > curious to hear the thoughts of others on this particular point.
>
> Of course. It'd be great to hear others' thoughts on this stuff.

Honestly, I would be satisfied if the following error message contained
the required number of huge pages.

  FATAL:  could not map anonymous shared memory: Cannot allocate memory
  HINT:  This error usually means that PostgreSQL's request for a shared
  memory segment exceeded available memory, swap space, or huge pages.
  To reduce the request size (currently 148897792 bytes), reduce
  PostgreSQL's shared memory usage, perhaps by reducing shared_buffers
  or max_connections.

Or emit a different message if huge_pages=on.

  FATAL:  could not map anonymous shared memory from huge pages
  HINT:  This usually means that PostgreSQL's request for huge pages
  exceeds the number available. The number of 2048kB huge pages required
  for the requested memory size (currently 148897792 bytes) is 71 pages.

Returning to this feature: even if I am informed of the value via a
GUC, I won't add memory just by looking at shared_memory_size. Anyway,
since shared_buffers occupies almost all of the shared memory allocated
to postgres, we are not supposed to need such a precise adjustment of
the required size of shared memory. On the other hand, the available
number of huge pages is configurable and we need to set it as required.
Then again, it might seem a bit strange that there's only
huge_pages_required and not shared_memory_size, in the view of
comprehensiveness or completeness. So my feeling at this point is "I
need only huge_pages_required but might want shared_memory_size just
for completeness".

By the way, I noticed that postgres -C huge_page_size shows 0, which I
think should have the number used for the calculation if we show
huge_pages_required.

I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
whereas the error message above showed 148897792 bytes (142MB). So it
seems that something is forgotten while calculating shared_memory_size.
As a consequence, launching postgres with huge_pages_required (69
pages) set as vm.nr_hugepages ended up in the "could not map anonymous
shared memory" error.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
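The mismatch reported above is visible with the same rounding arithmetic, using the numbers from the message and assuming 2048kB huge pages:

```shell
# Arithmetic behind the reported mismatch: the value shown by
# -C shared_memory_size (144703488 bytes) rounds to 69 huge pages, while
# the allocation in the error message (148897792 bytes) needs 71, so
# reserving only 69 pages via vm.nr_hugepages fails at startup.
hp_bytes=$((2048 * 1024))
reported=$(( (144703488 + hp_bytes - 1) / hp_bytes ))
actual=$(( (148897792 + hp_bytes - 1) / hp_bytes ))
echo "$reported $actual"   # 69 71
```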
On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Thu, Sep 02, 2021 at 04:46:56PM +0000, Bossart, Nathan wrote:
>> Yeah, huge_pages_required might not serve much purpose for Windows.
>> We could always set it to -1 for Windows if it seems like it'll do
>> more harm than good.
>
> I'd be fine with this setup on environments where there is no need for
> it.

I did this in v5.

> One thing that would be incorrect upon more review is that we'd still
> have data_checksums wrong with -C, meaning that the initial read of
> the control file should be moved further up, and getting the control
> file checks done before registering workers would be better. Keeping
> the lock file at the end would be fine AFAIK, but should we worry
> about the interactions with _PG_init() here?

I think we can avoid so much reordering by moving the -C handling
instead. That should also fix things like data_checksums. I split
the reordering part out into its own patch in v5.

You bring up an interesting point about _PG_init(). Presently, you
can safely assume that the data directory is locked during _PG_init(),
so there's no need to worry about breaking something on a running
server. I don't know how important this is. Most _PG_init()
functions that I've seen will define some GUCs, request some shared
memory, register some background workers, and/or install some hooks.
Those all seem like safe things to do, but I wouldn't be at all
surprised to hear examples to the contrary. In any case, it looks
like the current ordering of these two steps has been there for 15+
years.

If this is a concern, one option would be to disallow running "-C
shared_memory_size" on running servers. That would have to extend to
GUCs like data_checksums and huge_pages_required, too.

> 0001, that refactors the calculation of the shmem size into a
> different routine, is fine as-is, so I'd like to move on and apply
> it.

Sounds good to me.

Nathan
On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote: > By the way I noticed that postgres -C huge_page_size shows 0, which I > think should have the number used for the calculation if we show > huge_page_required. I would agree with this if huge_page_size was a runtime-computed GUC, but since it's intended for users to explicitly request the huge page size, it might be slightly confusing. Perhaps another option would be to create a new GUC for this. Or maybe it's enough to note that the value will be changed from 0 at runtime if huge pages are supported. In any case, it might be best to handle this separately. > I noticed that postgres -C shared_memory_size showed 137 (= 144703488) > whereas the error message above showed 148897792 bytes (142MB). So it > seems that something is forgotten while calculating > shared_memory_size. As the consequence, launching postgres setting > huge_pages_required (69 pages) as vm.nr_hugepages ended up in the > "could not map anonymous shared memory" error. Hm. I'm not seeing this with the v5 patch set, so maybe I inadvertently fixed it already. Can you check this again with v5? Nathan
Hi, On 2021-09-01 15:53:52 +0900, Michael Paquier wrote: > Hmm. I am not sure about the addition of huge_pages_required, knowing > that we would have shared_memory_size. I'd rather let the calculation > part to the user with a scan of /proc/meminfo. -1. We can easily do better, what do we gain by making the user do this stuff? Especially because the right value also depends on huge_page_size. Greetings, Andres Freund
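For context, the manual calculation being debated here (scanning /proc/meminfo rather than letting the server report a value) boils down to a ceiling division of the shared memory size by the huge page size. A minimal sketch, with the numbers taken from values quoted later in this thread (148897792 bytes of shared memory, 2MB huge pages) as illustrative inputs:

```shell
# Sketch of the manual calculation only, not PostgreSQL code.
# Round the shared memory size up to a whole number of huge pages,
# since a partial page still consumes one full huge page.
shmem_bytes=148897792                 # e.g., from the server's mmap failure message
huge_page_bytes=$((2 * 1024 * 1024))  # 2MB, a common default huge page size
pages=$(( (shmem_bytes + huge_page_bytes - 1) / huge_page_bytes ))
echo "$pages"                         # prints 71
```

On a live system, huge_page_bytes would have to be read from the Hugepagesize line of /proc/meminfo, which is exactly the per-user legwork being objected to above, and why the right answer also depends on huge_page_size.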
On Fri, Sep 03, 2021 at 05:36:43PM +0000, Bossart, Nathan wrote: > On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > You bring up an interesting point about _PG_init(). Presently, you > can safely assume that the data directory is locked during _PG_init(), > so there's no need to worry about breaking something on a running > server. I don't know how important this is. Most _PG_init() > functions that I've seen will define some GUCs, request some shared > memory, register some background workers, and/or install some hooks. > Those all seem like safe things to do, but I wouldn't be at all > surprised to hear examples to the contrary. In any case, it looks > like the current ordering of these two steps has been there for 15+ > years. Yeah. What you are describing here matches what I have seen in the past and what we do in core for _PG_init(). Now extension developers could do fancier things, like dropping things on-disk to check the load state, for whatever reasons. And things could break in such cases. Perhaps people should not do that, but it is no fun either to break code that has been working for years even if that's just a major upgrade. + * We skip this step if we are just going to print a GUC's value and exit + * a few steps down. */ - CreateDataDirLockFile(true); + if (output_config_variable == NULL) + CreateDataDirLockFile(true); Anyway, 0002 gives me shivers. > If this is a concern, one option would be to disallow running "-C > shared_memory_size" on running servers. That would have to extend to > GUCs like data_checksums and huge_pages_required, too. Just noting this bit from 0003 that would break without 0002: -$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput> -6490428K +$ <userinput>postgres -D $PGDATA -C shared_memory_size</userinput> +6339 >> 0001, that refactors the calculation of the shmem size into a >> different routine, is fine as-is, so I'd like to move on and apply >> it. > > Sounds good to me. 
Applied this one. Without concluding on 0002 yet, another thing that we could do is to just add the GUCs. These sound rather useful on their own (mixed feelings about huge_pages_required but I can see why it is useful to avoid the setup steps and the backend already grabs this information), particularly when it comes to cloned setups that share a lot of properties. -- Michael
On 9/5/21, 7:28 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Fri, Sep 03, 2021 at 05:36:43PM +0000, Bossart, Nathan wrote: >> On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote: >>> 0001, that refactors the calculation of the shmem size into a >>> different routine, is fine as-is, so I'd like to move on and apply >>> it. >> >> Sounds good to me. > > Applied this one. Thanks! > Without concluding on 0002 yet, another thing that we could do is to > just add the GUCs. These sound rather useful on their own (mixed > feelings about huge_pages_required but I can see why it is useful to > avoid the setup steps and the backend already grabs this information), > particularly when it comes to cloned setups that share a lot of > properties. I think this is a good starting point, but I'd like to follow up on making them visible without completely starting the server. The main purpose for adding these GUCs is to be able to set up huge pages before server startup. Disallowing "-C huge_pages_required" on a running server to enable this use-case seems like a modest tradeoff. Anyway, I'll restructure the remaining patches to add the GUCs first and then address the 0002 business separately. Nathan
On 9/5/21, 9:26 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > On 9/5/21, 7:28 PM, "Michael Paquier" <michael@paquier.xyz> wrote: >> Without concluding on 0002 yet, another thing that we could do is to >> just add the GUCs. These sound rather useful on their own (mixed >> feelings about huge_pages_required but I can see why it is useful to >> avoid the setup steps and the backend already grabs this information), >> particularly when it comes to cloned setups that share a lot of >> properties. > > I think this is a good starting point, but I'd like to follow up on > making them visible without completely starting the server. The main > purpose for adding these GUCs is to be able to set up huge pages > before server startup. Disallowing "-C huge_pages_required" on a > running server to enable this use-case seems like a modest tradeoff. > > Anyway, I'll restructure the remaining patches to add the GUCs first > and then address the 0002 business separately. Attached is a new patch set. The first two patches just add the new GUCs, and the third is an attempt at providing useful values for those GUCs via -C. Nathan
On Mon, Sep 06, 2021 at 11:55:42PM +0000, Bossart, Nathan wrote: > Attached is a new patch set. The first two patches just add the new > GUCs, and the third is an attempt at providing useful values for those > GUCs via -C. + sprintf(buf, "%lu MB", size_mb); + SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE); One small-ish comment about 0002: there is no need to add the unit into the buffer set as GUC_UNIT_MB would take care of that. The patch looks fine. +#ifndef WIN32 +#include <sys/mman.h> +#endif So, this is needed in ipci.c to check for MAP_HUGETLB. I am not much a fan of moving around platform-specific checks when these have remained local to each shmem implementation. Could it be cleaner to add GetHugePageSize() to win32_shmem.c and make it always declared in the SysV implementation? -- Michael
At Fri, 3 Sep 2021 17:46:05 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in > On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote: > > By the way I noticed that postgres -C huge_page_size shows 0, which I > > think should have the number used for the calculation if we show > > huge_page_required. > > I would agree with this if huge_page_size was a runtime-computed GUC, > but since it's intended for users to explicitly request the huge page > size, it might be slightly confusing. Perhaps another option would be > to create a new GUC for this. Or maybe it's enough to note that the > value will be changed from 0 at runtime if huge pages are supported. > In any case, it might be best to handle this separately. (Sorry, I was confused, but) yeah, agreed. > > I noticed that postgres -C shared_memory_size showed 137 (= 144703488) > > whereas the error message above showed 148897792 bytes (142MB). So it > > seems that something is forgotten while calculating > > shared_memory_size. As the consequence, launching postgres setting > > huge_pages_required (69 pages) as vm.nr_hugepages ended up in the > > "could not map anonymous shared memory" error. > > Hm. I'm not seeing this with the v5 patch set, so maybe I > inadvertently fixed it already. Can you check this again with v5? Thanks! I confirmed that the numbers match with v5. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > + sprintf(buf, "%lu MB", size_mb); > + SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE); > One small-ish comment about 0002: there is no need to add the unit > into the buffer set as GUC_UNIT_MB would take care of that. The patch > looks fine. I fixed this in v7. > +#ifndef WIN32 > +#include <sys/mman.h> > +#endif > So, this is needed in ipci.c to check for MAP_HUGETLB. I am not much > a fan of moving around platform-specific checks when these have > remained local to each shmem implementation. Could it be cleaner to > add GetHugePageSize() to win32_shmem.c and make it always declared in > the SysV implementation? I don't know if it's really all that much cleaner, but I did it this way in v7. IMO it's a little weird that GetHugePageSize() doesn't return the value from GetLargePageMinimum(), but that's what we'd need to do to avoid setting huge_pages_required for Windows without any platform-specific checks. Nathan
On 9/6/21, 11:24 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote: > At Fri, 3 Sep 2021 17:46:05 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in >> On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote: >> > I noticed that postgres -C shared_memory_size showed 137 (= 144703488) >> > whereas the error message above showed 148897792 bytes (142MB). So it >> > seems that something is forgotten while calculating >> > shared_memory_size. As the consequence, launching postgres setting >> > huge_pages_required (69 pages) as vm.nr_hugepages ended up in the >> > "could not map anonymous shared memory" error. >> >> Hm. I'm not seeing this with the v5 patch set, so maybe I >> inadvertently fixed it already. Can you check this again with v5? > > Thanks! I confirmed that the numbers match with v5. Thanks for confirming. Nathan
On Tue, Sep 07, 2021 at 05:08:43PM +0000, Bossart, Nathan wrote: > On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote: >> + sprintf(buf, "%lu MB", size_mb); >> + SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE); >> One small-ish comment about 0002: there is no need to add the unit >> into the buffer set as GUC_UNIT_MB would take care of that. The patch >> looks fine. > > I fixed this in v7. Switched the variable name to shared_memory_size_mb for easier grepping, moved it to a more correct location with the other read-only GUCS, and applied 0002. Well, 0001 here. >> +#ifndef WIN32 >> +#include <sys/mman.h> >> +#endif >> So, this is needed in ipci.c to check for MAP_HUGETLB. I am not much >> a fan of moving around platform-specific checks when these have >> remained local to each shmem implementation. Could it be cleaner to >> add GetHugePageSize() to win32_shmem.c and make it always declared in >> the SysV implementation? > > I don't know if it's really all that much cleaner, but I did it this > way in v7. IMO it's a little weird that GetHugePageSize() doesn't > return the value from GetLargePageMinimum(), but that's what we'd need > to do to avoid setting huge_pages_required for Windows without any > platform-specific checks. Thanks. Keeping MAP_HUGETLB within the SysV portions is an improvement IMO. That's subject to one's taste, perhaps. After sleeping on it, I'd be fine living with the logic based on the new GUC flag GUC_RUNTIME_COMPUTED, which controls whether a parameter can be looked up at an earlier or a later stage of the startup sequence, with the later stage meaning that such parameters cannot be checked while a server is running. This case was originally broken anyway, so it does not make it worse, just better. + This can be used on a running server for most parameters. However, + the server must be shut down for some runtime-computed parameters + (e.g., <xref linkend="guc-huge-pages-required"/>). 
Perhaps we should add a couple of extra parameters here, like shared_memory_size, and perhaps wal_segment_size? The list does not have to be complete, just meaningful enough. -- Michael
On 2021/09/08 12:50, Michael Paquier wrote: > On Tue, Sep 07, 2021 at 05:08:43PM +0000, Bossart, Nathan wrote: >> On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote: >>> + sprintf(buf, "%lu MB", size_mb); >>> + SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE); >>> One small-ish comment about 0002: there is no need to add the unit >>> into the buffer set as GUC_UNIT_MB would take care of that. The patch >>> looks fine. >> >> I fixed this in v7. > > Switched the variable name to shared_memory_size_mb for easier > grepping, moved it to a more correct location with the other read-only > GUCS, and applied 0002. Well, 0001 here. Thanks for adding useful feature! + {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM, When reading the applied code, I found the category of shared_memory_size is RESOURCES_MEM. Why? This seems right because the parameter is related to memory resource. But since its context is PGC_INTERNAL, PRESET_OPTIONS is more proper as the category? BTW, the category of any other PGC_INTERNAL parameters seems to be PRESET_OPTIONS. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 9/7/21, 8:50 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Switched the variable name to shared_memory_size_mb for easier > grepping, moved it to a more correct location with the other read-only > GUCS, and applied 0002. Well, 0001 here. Thanks! And thanks for cleaning up the small mistake in aa37a43. > + This can be used on a running server for most parameters. However, > + the server must be shut down for some runtime-computed parameters > + (e.g., <xref linkend="guc-huge-pages-required"/>). > Perhaps we should add a couple of extra parameters here, like > shared_memory_size, and perhaps wal_segment_size? The list does not > have to be complete, just meaningful enough. Good idea. Nathan
On 9/8/21, 12:11 AM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote: > Thanks for adding useful feature! :) > + {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM, > > When reading the applied code, I found the category of shared_memory_size > is RESOURCES_MEM. Why? This seems right because the parameter is related > to memory resource. But since its context is PGC_INTERNAL, PRESET_OPTIONS > is more proper as the category? BTW, the category of any other > PGC_INTERNAL parameters seems to be PRESET_OPTIONS. Yeah, I did wonder about this. We're even listing it in the "Preset Options" section in the docs. I updated this in the new patch set, which is attached. Nathan
On Wed, Sep 08, 2021 at 04:10:41PM +0900, Fujii Masao wrote: > + {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM, > > When reading the applied code, I found the category of shared_memory_size > is RESOURCES_MEM. Why? This seems right because the parameter is related > to memory resource. But since its context is PGC_INTERNAL, PRESET_OPTIONS > is more proper as the category? BTW, the category of any other > PGC_INTERNAL parameters seems to be PRESET_OPTIONS. Yes, that's an oversight from me. I was looking at that yesterday, noticed some exceptions in the GUC list with things not allowed in files and just concluded that RESOURCES_MEM should be fine, but the docs tell a different story. Thanks, fixed. -- Michael
On Wed, Sep 08, 2021 at 05:52:33PM +0000, Bossart, Nathan wrote: > Yeah, I did wonder about this. We're even listing it in the "Preset > Options" section in the docs. I updated this in the new patch set, > which is attached. I broke that again, so rebased as v9 attached. FWIW, I don't have an environment at hand these days to test properly 0001, so this will have to wait a bit. I really like the approach taken by 0002, and it is independent of the other patch while extending support for postgres -c to provide the correct runtime values. So let's wrap this part first. No need to send a reorganized patch set. -- Michael
On Thu, Sep 09, 2021 at 01:19:14PM +0900, Michael Paquier wrote: > I broke that again, so rebased as v9 attached. Well. -- Michael
On 9/8/21, 9:19 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > FWIW, I don't have an environment at hand these days to test properly > 0001, so this will have to wait a bit. I really like the approach > taken by 0002, and it is independent of the other patch while > extending support for postgres -c to provide the correct runtime > values. So let's wrap this part first. No need to send a reorganized > patch set. Sounds good. For 0001, the biggest thing on my mind at the moment is the name of the GUC. "huge_pages_required" feels kind of ambiguous. From the name alone, it could mean either "the number of huge pages required" or "huge pages are required for the server to run." Also, the number of huge pages required is not actually required if you don't want to run the server with huge pages. I think it might be clearer to somehow indicate that the value is essentially the size of the main shared memory area in terms of the huge page size, but I'm not sure how to do that concisely. Perhaps it is enough to just make sure the description of "huge_pages_required" is detailed enough. For 0002, I have two small concerns. My first concern is that it might be confusing to customers when the runtime GUCs cannot be returned for a running server. We have the note in the docs, but if you're encountering it on the command line, it's not totally clear what the problem is. $ postgres -D . -C log_min_messages warning $ postgres -D . -C shared_memory_size 2021-09-09 18:51:21.617 UTC [7924] FATAL: lock file "postmaster.pid" already exists 2021-09-09 18:51:21.617 UTC [7924] HINT: Is another postmaster (PID 7912) running in data directory "/local/home/bossartn/pgdata"? My other concern is that by default, viewing the runtime-computed GUCs will also emit a LOG. $ postgres -D . 
-C shared_memory_size 142 2021-09-09 18:53:25.194 UTC [8006] LOG: database system is shut down Running these commands with log_min_messages=debug5 emits way more information for the runtime-computed GUCs than for others, but IMO that is alright. However, perhaps we should adjust the logging in 0002 to improve the default user experience. I attached an attempt at that. With the attached patch, trying to view a runtime-computed GUC on a running server will look like this: $ postgres -D . -C shared_memory_size 2021-09-09 21:24:21.552 UTC [6224] FATAL: lock file "postmaster.pid" already exists 2021-09-09 21:24:21.552 UTC [6224] DETAIL: Runtime-computed GUC "shared_memory_size" cannot be viewed on a running server. 2021-09-09 21:24:21.552 UTC [6224] HINT: Is another postmaster (PID 3628) running in data directory "/local/home/bossartn/pgdata"? And viewing a runtime-computed GUC on a server that is shut down will look like this: $ postgres -D . -C shared_memory_size 142 I'm not tremendously happy with the patch, but I hope that it at least helps with the discussion. Nathan
On Thu, Sep 09, 2021 at 09:53:22PM +0000, Bossart, Nathan wrote: > For 0002, I have two small concerns. My first concern is that it > might be confusing to customers when the runtime GUCs cannot be > returned for a running server. We have the note in the docs, but if > you're encountering it on the command line, it's not totally clear > what the problem is. Yeah, that's true. There are more unlikely-to-happen errors that could be triggered and prevent the command from working. I have never tried using error_context_stack in a code path as early as that, to be honest. > Running these commands with log_min_messages=debug5 emits way more > information for the runtime-computed GUCs than for others, but IMO > that is alright. However, perhaps we should adjust the logging in > 0002 to improve the default user experience. I attached an attempt at > that. Registered bgworkers would generate a DEBUG entry, for one. > I'm not tremendously happy with the patch, but I hope that it at least > helps with the discussion. As far as the behavior is documented, I'd be fine with the approach to keep the code in its simplest shape. I agree that the message is confusing, still it is not wrong either as we try to query a run-time parameter, but we need the lock. -- Michael
On 9/9/21, 7:03 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > As far as the behavior is documented, I'd be fine with the approach to > keep the code in its simplest shape. I agree that the message is > confusing, still it is not wrong either as we try to query a run-time > parameter, but we need the lock. That seems alright to me. Nathan
On Thu, Sep 9, 2021 at 5:53 PM Bossart, Nathan <bossartn@amazon.com> wrote: > For 0001, the biggest thing on my mind at the moment is the name of > the GUC. "huge_pages_required" feels kind of ambiguous. From the > name alone, it could mean either "the number of huge pages required" > or "huge pages are required for the server to run." Also, the number > of huge pages required is not actually required if you don't want to > run the server with huge pages. +1 to all of that. > I think it might be clearer to > somehow indicate that the value is essentially the size of the main > shared memory area in terms of the huge page size, but I'm not sure > how to do that concisely. Perhaps it is enough to just make sure the > description of "huge_pages_required" is detailed enough. shared_memory_size_in_huge_pages? It's kinda long, but a long name that you can understand without reading the docs is better than a short one where you can't. -- Robert Haas EDB: http://www.enterprisedb.com
On 9/10/21, 1:02 PM, "Robert Haas" <robertmhaas@gmail.com> wrote: > On Thu, Sep 9, 2021 at 5:53 PM Bossart, Nathan <bossartn@amazon.com> wrote: >> I think it might be clearer to >> somehow indicate that the value is essentially the size of the main >> shared memory area in terms of the huge page size, but I'm not sure >> how to do that concisely. Perhaps it is enough to just make sure the >> description of "huge_pages_required" is detailed enough. > > shared_memory_size_in_huge_pages? It's kinda long, but a long name > that you can understand without reading the docs is better than a > short one where you can't. I think that's an improvement. The only other idea I have at the moment is num_huge_pages_required_for_shared_memory. Nathan
On Fri, Sep 10, 2021 at 7:43 PM Bossart, Nathan <bossartn@amazon.com> wrote: > > shared_memory_size_in_huge_pages? It's kinda long, but a long name > > that you can understand without reading the docs is better than a > > short one where you can't. > > I think that's an improvement. The only other idea I have at the > moment is num_huge_pages_required_for_shared_memory. Hmm, that to me sounds like maybe only part of shared memory uses huge pages and maybe we're just giving you the number required for that part. I realize that it doesn't work that way but I don't know if everyone will. -- Robert Haas EDB: http://www.enterprisedb.com
On 9/13/21, 8:59 AM, "Robert Haas" <robertmhaas@gmail.com> wrote: > On Fri, Sep 10, 2021 at 7:43 PM Bossart, Nathan <bossartn@amazon.com> wrote: >> I think that's an improvement. The only other idea I have at the >> moment is num_huge_pages_required_for_shared_memory. > > Hmm, that to me sounds like maybe only part of shared memory uses huge > pages and maybe we're just giving you the number required for that > part. I realize that it doesn't work that way but I don't know if > everyone will. Yeah, I agree. What about huge_pages_needed_for_shared_memory_size or huge_pages_needed_for_main_shared_memory? I'm still not stoked about using "required" or "needed" in the name, as it sounds like huge pages must be allocated for the server to run, which is only true if huge_pages=on. I haven't thought of a better word to use, though. Nathan
On Mon, Sep 13, 2021 at 2:49 PM Bossart, Nathan <bossartn@amazon.com> wrote: > Yeah, I agree. What about > huge_pages_needed_for_shared_memory_size or > huge_pages_needed_for_main_shared_memory? I'm still not stoked about > using "required" or "needed" in the name, as it sounds like huge pages > must be allocated for the server to run, which is only true if > huge_pages=on. I haven't thought of a better word to use, though. I prefer the first of those to the second. I don't find it particularly better or worse than my previous suggestion of shared_memory_size_in_huge_pages. -- Robert Haas EDB: http://www.enterprisedb.com
"Bossart, Nathan" <bossartn@amazon.com> writes: > Yeah, I agree. What about > huge_pages_needed_for_shared_memory_size or > huge_pages_needed_for_main_shared_memory? Seems like "huge_pages_needed_for_shared_memory" would be sufficient. regards, tom lane
On Mon, Sep 13, 2021 at 04:20:00PM -0400, Robert Haas wrote: > On Mon, Sep 13, 2021 at 2:49 PM Bossart, Nathan <bossartn@amazon.com> wrote: >> Yeah, I agree. What about >> huge_pages_needed_for_shared_memory_size or >> huge_pages_needed_for_main_shared_memory? I'm still not stoked about >> using "required" or "needed" in the name, as it sounds like huge pages >> must be allocated for the server to run, which is only true if >> huge_pages=on. I haven't thought of a better word to use, though. > > I prefer the first of those to the second. I don't find it > particularly better or worse than my previous suggestion of > shared_memory_size_in_huge_pages. I am not particularly fond of the use of "needed" in this context, so I'd be fine with your suggestion of "shared_memory_size_in_huge_pages". Some other ideas I could think of: - shared_memory_size_as_huge_pages - huge_pages_for_shared_memory_size Having shared_memory_size in the GUC name is kind of appealing though in terms of grepping, and one gets the relationship with shared_memory_size immediately. If the consensus is huge_pages_needed_for_shared_memory_size, I won't fight it, but IMO that's too long. -- Michael
On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Seems like "huge_pages_needed_for_shared_memory" would be sufficient. I think we are down to either shared_memory_size_in_huge_pages or huge_pages_needed_for_shared_memory. Robert's argument against huge_pages_needed_for_shared_memory was that it might sound like only part of shared memory uses huge pages and we're only giving the number required for that. Speaking of which, isn't that technically true? For shared_memory_size_in_huge_pages, the intent is to make it sound like we are providing shared_memory_size in terms of the huge page size, but I think it could also be interpreted as "the amount of shared memory that is currently stored in huge pages." I personally lean towards huge_pages_needed_for_shared_memory because it feels the most clear and direct to me. I'm not vehemently opposed to shared_memory_size_in_huge_pages, though. I don't think either one is too misleading. Nathan
At Tue, 14 Sep 2021 00:30:22 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in > On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > > Seems like "huge_pages_needed_for_shared_memory" would be sufficient. > > I think we are down to either shared_memory_size_in_huge_pages or > huge_pages_needed_for_shared_memory. Robert's argument against > huge_pages_needed_for_shared_memory was that it might sound like only > part of shared memory uses huge pages and we're only giving the number > required for that. Speaking of which, isn't that technically true? > For shared_memory_size_in_huge_pages, the intent is to make it sound > like we are providing shared_memory_size in terms of the huge page > size, but I think it could also be interpreted as "the amount of > shared memory that is currently stored in huge pages." > > I personally lean towards huge_pages_needed_for_shared_memory because > it feels the most clear and direct to me. I'm not vehemently opposed > to shared_memory_size_in_huge_pages, though. I don't think either one > is too misleading. I like 'in' slightly better than 'for' in this context. I agree with Michael that the name looks somewhat too long, especially considering that it won't be completed on shell command lines, but I won't fight it either. On the other hand, a fully spelled-out name may be easier to recall from memory than one that is halfway shortened. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 9/13/21, 5:49 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote: > At Tue, 14 Sep 2021 00:30:22 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in >> On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> > Seems like "huge_pages_needed_for_shared_memory" would be sufficient. >> >> I think we are down to either shared_memory_size_in_huge_pages or >> huge_pages_needed_for_shared_memory. Robert's argument against >> huge_pages_needed_for_shared_memory was that it might sound like only >> part of shared memory uses huge pages and we're only giving the number >> required for that. Speaking of which, isn't that technically true? >> For shared_memory_size_in_huge_pages, the intent is to make it sound >> like we are providing shared_memory_size in terms of the huge page >> size, but I think it could also be interpreted as "the amount of >> shared memory that is currently stored in huge pages." >> >> I personally lean towards huge_pages_needed_for_shared_memory because >> it feels the most clear and direct to me. I'm not vehemently opposed >> to shared_memory_size_in_huge_pages, though. I don't think either one >> is too misleading. > > I like 'in' slightly than 'for' in this context. I stand by Michael > that that name looks somewhat too long especially considering that > that name won't be completed on shell command lines, but won't fight > it, too. On the other hand the full-spelled name can be thought as > one can spell it out from memory easily than a name halfway shortened. I think I see more support for shared_memory_size_in_huge_pages than for huge_pages_needed_for_shared_memory at the moment. I'll update the patch set in the next day or two to use shared_memory_size_in_huge_pages unless something changes in the meantime. Nathan
On Tue, Sep 14, 2021 at 06:00:44PM +0000, Bossart, Nathan wrote: > I think I see more support for shared_memory_size_in_huge_pages than > for huge_pages_needed_for_shared_memory at the moment. I'll update > the patch set in the next day or two to use > shared_memory_size_in_huge_pages unless something changes in the > meantime. I have been looking at the patch to add the new GUC flag and the new sequence for postgres -C, and we could have some TAP tests. There were two places that made sense to me: pg_checksums/t/002_actions.pl and recovery/t/017_shm.pl. I have chosen the former as it has coverage across more platforms, and used data_checksums for this purpose, with an extra positive test to check for the case where a GUC can be queried while the server is running. There are four parameters that are updated here: - shared_memory_size - wal_segment_size - data_checksums - data_directory_mode That looks sensible, after looking at the full set of GUCs. Attached is a refreshed patch (commit message is the same as v9 for now), with some minor tweaks and the tests. Thoughts? -- Michael
On 9/14/21, 8:06 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Attached is a refreshed patch (commit message is the same as v9 for > now), with some minor tweaks and the tests. > > Thoughts? LGTM + This can be used on a running server for most parameters. However, + the server must be shut down for some runtime-computed parameters + (e.g., <xref linkend="guc-shared-memory-size"/>, and + <xref linkend="guc-wal-segment-size"/>). nitpick: I think you can remove the comma before the "and" in the list of examples. Nathan
On Wed, Sep 15, 2021 at 10:31:20PM +0000, Bossart, Nathan wrote: > + This can be used on a running server for most parameters. However, > + the server must be shut down for some runtime-computed parameters > + (e.g., <xref linkend="guc-shared-memory-size"/>, and > + <xref linkend="guc-wal-segment-size"/>). > > nitpick: I think you can remove the comma before the "and" in the list > of examples. Fixed that, and applied. Could you rebase the last patch with the name suggested for the last GUC, including the docs? It looks like we are heading for shared_memory_size_in_huge_pages. -- Michael
On 9/15/21, 7:42 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Fixed that, and applied. Could you rebase the last patch with the > name suggested for the last GUC, including the docs? It looks like we > are heading for shared_memory_size_in_huge_pages. Thanks! And done. For the huge pages setup documentation, I considered sending stderr to /dev/null to eliminate the LOG from the output, but I opted against that. That would've looked like this: postgres -D $PGDATA -C shared_memory_size_in_huge_pages 2> /dev/null Otherwise, there aren't any significant changes in this version of the patch besides the name change. Nathan
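Pulling the pieces of this subthread together, the intended Linux workflow looks roughly like the following sketch (assumptions: $PGDATA points at the cluster, the server is shut down, and the 2> /dev/null merely hides the shutdown LOG line discussed later in the thread):

```shell
# With the server stopped, ask postgres for the runtime-computed page count
# and hand it to the kernel; then enable huge_pages and start the server.
pages=$(postgres -D "$PGDATA" -C shared_memory_size_in_huge_pages 2> /dev/null)
sysctl -w vm.nr_hugepages="$pages"
# postgresql.conf: huge_pages = on   (or 'try' to fall back gracefully)
pg_ctl -D "$PGDATA" start
```

If the kernel cannot allocate that many pages at runtime (e.g., fragmented memory), vm.nr_hugepages can instead be set at boot time.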
On 9/16/21, 10:17 AM, "Justin Pryzby" <pryzby@telsasoft.com> wrote: > + * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages > + * is not supported, *hugepagesize and *mmap_flags will be set to 0. > > nitpick: *are* not supported, as you say elsewhere. Updated. I think I originally chose "is" because I was referring to Huge Pages as a singular feature, but that sounds a bit awkward, and I don't think there's any substantial difference either way. > + gettext_noop("Shows the number of huge pages needed for the main shared memory area."), > > Maybe this was already discussed, but "main" could be misleading. > > To me that sounds like there might be huge pages needed for something other > than the "main" area and the reported value might turn out to be inadequate, > (which is exactly the issue these patches are trying to address). > > In particular, this sounds like it's just going to report > shared_buffers/huge_page_size (since shared buffers is usually the "main" use > of shared memory) - rather than reporting the size of the entire shared memory > in units of huge_pages. I'm not sure I agree on this one. The documentation for huge_pages [0] and shared_memory_type [1] uses the same phrasing multiple times, and the new shared_memory_size GUC uses it as well [2]. I don't see anything in the documentation that suggests that shared_buffers is the only thing in the main shared memory area, and the documentation for setting up huge pages makes no mention of any extra memory that needs to be considered, either. Furthermore, I'm not sure what else we'd call it. I don't think it's accurate to suggest that it's the entirety of shared memory for the server, as it's possible to dynamically allocate more as needed (which IIUC won't use any explicitly allocated huge pages).
Nathan [0] https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-HUGE-PAGES [1] https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE [2] https://www.postgresql.org/docs/devel/runtime-config-preset.html#GUC-SHARED-MEMORY-SIZE
On Thu, Sep 16, 2021 at 09:26:56PM +0000, Bossart, Nathan wrote: > I'm not sure I agree on this one. The documentation for huge_pages > [0] and shared_memory_type [1] uses the same phrasing multiple times, > and the new shared_memory_size GUC uses it as well [2]. I don't see > anything in the documentation that suggests that shared_buffers is the > only thing in the main shared memory area, and the documentation for > setting up huge pages makes no mention of any extra memory that needs > to be considered, either. Looks rather sane to me, FWIW. I have not tested on Linux properly yet (not tempted to take my bets on the buildfarm on a Friday, either), but I should be able to handle that at the beginning of next week. + GetHugePageSize(&hp_size, &unused); + if (hp_size != 0) I'd rather change GetHugePageSize() to be able to accept NULL for the parameter values, rather than declaring such variables. + To determine the number of huge pages needed, use the + <command>postgres</command> command to see the value of + <xref linkend="guc-shared-memory-size-in-huge-pages"/>. We may want to say as well here that the server should be offline? It would not hurt to duplicate this information with postgres-ref.sgml. + This setting is supported only on Linux. It is always set to Nit: This paragraph is missing two <productname>s for Linux. The docs are random about that, but these are new entries. -- Michael
On 9/16/21, 7:21 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > + GetHugePageSize(&hp_size, &unused); > + if (hp_size != 0) > I'd rather change GetHugePageSize() to be able to accept NULL for the > parameter values, rather than declaring such variables. Done. > + To determine the number of huge pages needed, use the > + <command>postgres</command> command to see the value of > + <xref linkend="guc-shared-memory-size-in-huge-pages"/>. > We may want to say as well here that the server should be offline? > It would not hurt to duplicate this information with > postgres-ref.sgml. Done. > + This setting is supported only on Linux. It is always set to > Nit: This paragraph is missing two <productname>s for Linux. The docs > are random about that, but these are new entries. Done. Nathan
Should we also initialize the shared memory GUCs in bootstrap and single-user mode? I think I missed this in bd17880. Nathan diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index 48615c0ebc..4c4cf44871 100644 --- a/src/backend/bootstrap/bootstrap.c +++ b/src/backend/bootstrap/bootstrap.c @@ -324,6 +324,12 @@ BootstrapModeMain(int argc, char *argv[], bool check_only) InitializeMaxBackends(); + /* + * Initialize runtime-computed GUCs that depend on the amount of shared + * memory required. + */ + InitializeShmemGUCs(); + CreateSharedMemoryAndSemaphores(); /* diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 0775abe35d..cae0b079b9 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -3978,6 +3978,12 @@ PostgresSingleUserMain(int argc, char *argv[], /* Initialize MaxBackends */ InitializeMaxBackends(); + /* + * Initialize runtime-computed GUCs that depend on the amount of shared + * memory required. + */ + InitializeShmemGUCs(); + CreateSharedMemoryAndSemaphores(); /*
On Fri, Sep 17, 2021 at 04:31:44PM +0000, Bossart, Nathan wrote: > Done. Thanks. I have gone through the last patch this morning, did some tests on all the platforms I have at hand (including Linux) and finished by applying it after doing some small tweaks. First, I have finished by extending GetHugePageSize() to accept NULL for its first argument hugepagesize. A second thing was in the docs, where it is still useful IMO to keep the reference to /proc/meminfo and /sys/kernel/mm/hugepages to let users know how the system impacts the calculation of the new GUC. Let's see what the buildfarm thinks about it. -- Michael
On Tue, Sep 21, 2021 at 12:08:22AM +0000, Bossart, Nathan wrote: > Should we also initialize the shared memory GUCs in bootstrap and > single-user mode? I think I missed this in bd17880. Why would we need that for the bootstrap mode? While looking at the patch for shared_memory_size, I have looked at those code paths to note that some of the runtime GUCs would be set thanks to the load of the control file, but supporting this case sounded rather limited to me for --single when it came to shared memory and huge page estimation and we don't load shared_preload_libraries in this context either, which could lead to wrong estimations. Anyway, I am not going to fight hard if people would like that for the --single mode, even if it may lead to an underestimation of the shmem allocated. -- Michael
On 9/20/21, 6:48 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Thanks. I have gone through the last patch this morning, did some > tests on all the platforms I have at hand (including Linux) and > finished by applying it after doing some small tweaks. First, I have > finished by extending GetHugePageSize() to accept NULL for its first > argument hugepagesize. A second thing was in the docs, where it is > still useful IMO to keep the reference to /proc/meminfo and > /sys/kernel/mm/hugepages to let users know how the system impacts the > calculation of the new GUC. > > Let's see what the buildfarm thinks about it. Thanks! Nathan
On 9/20/21, 7:29 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Tue, Sep 21, 2021 at 12:08:22AM +0000, Bossart, Nathan wrote: >> Should we also initialize the shared memory GUCs in bootstrap and >> single-user mode? I think I missed this in bd17880. > > Why would we need that for the bootstrap mode? > > While looking at the patch for shared_memory_size, I have looked at > those code paths to note that some of the runtime GUCs would be set > thanks to the load of the control file, but supporting this case > sounded rather limited to me for --single when it came to shared > memory and huge page estimation and we don't load > shared_preload_libraries in this context either, which could lead to > wrong estimations. Anyway, I am not going to fight hard if people > would like that for the --single mode, even if it may lead to an > underestimation of the shmem allocated. I was looking at this from the standpoint of keeping the startup steps consistent between the modes. Looking again, I can't think of a strong reason to add it to bootstrap mode. I think the case for adding it to single-user mode is a bit stronger, as commands like "SHOW shared_memory_size;" currently return 0. I lean in favor of adding it for single-user mode, but it's probably fine either way. Nathan
On Tue, Sep 21, 2021 at 04:06:38PM +0000, Bossart, Nathan wrote: > I was looking at this from the standpoint of keeping the startup steps > consistent between the modes. Looking again, I can't think of > a strong reason to add it to bootstrap mode. I think the case for > adding it to single-user mode is a bit stronger, as commands like > "SHOW shared_memory_size;" currently return 0. I lean in favor of > adding it for single-user mode, but it's probably fine either way. I am not sure either as that's a tradeoff between an underestimation and no information. The argument of consistency indeed matters. Let's see if others have any opinion to share on this point. -- Michael
On Tue, Sep 21, 2021 at 11:53 PM Michael Paquier <michael@paquier.xyz> wrote: > I am not sure either as that's a tradeoff between an underestimation > and no information. The argument of consistency indeed matters. > Let's see if others have any opinion to share on this point. Well, if we think the information won't be safe to use, it's better to report nothing than a wrong value, I think. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Sep 9, 2021 at 11:53 PM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 9/8/21, 9:19 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > > FWIW, I don't have an environment at hand these days to test properly > > 0001, so this will have to wait a bit. I really like the approach > > taken by 0002, and it is independent of the other patch while > > extending support for postgres -c to provide the correct runtime > > values. So let's wrap this part first. No need to send a reorganized > > patch set. > > Sounds good. > > For 0001, the biggest thing on my mind at the moment is the name of > the GUC. "huge_pages_required" feels kind of ambiguous. From the > name alone, it could mean either "the number of huge pages required" > or "huge pages are required for the server to run." Also, the number > of huge pages required is not actually required if you don't want to > run the server with huge pages. I think it might be clearer to > somehow indicate that the value is essentially the size of the main > shared memory area in terms of the huge page size, but I'm not sure > how to do that concisely. Perhaps it is enough to just make sure the > description of "huge_pages_required" is detailed enough. > > For 0002, I have two small concerns. My first concern is that it > might be confusing to customers when the runtime GUCs cannot be > returned for a running server. We have the note in the docs, but if > you're encountering it on the command line, it's not totally clear > what the problem is. > > $ postgres -D . -C log_min_messages > warning > $ postgres -D . -C shared_memory_size > 2021-09-09 18:51:21.617 UTC [7924] FATAL: lock file "postmaster.pid" already exists > 2021-09-09 18:51:21.617 UTC [7924] HINT: Is another postmaster (PID 7912) running in data directory "/local/home/bossartn/pgdata"? > > My other concern is that by default, viewing the runtime-computed GUCs > will also emit a LOG. > > $ postgres -D . 
-C shared_memory_size > 142 > 2021-09-09 18:53:25.194 UTC [8006] LOG: database system is shut down > > Running these commands with log_min_messages=debug5 emits way more > information for the runtime-computed GUCs than for others, but IMO > that is alright. However, perhaps we should adjust the logging in > 0002 to improve the default user experience. I attached an attempt at > that. > > With the attached patch, trying to view a runtime-computed GUC on a > running server will look like this: > > $ postgres -D . -C shared_memory_size > 2021-09-09 21:24:21.552 UTC [6224] FATAL: lock file "postmaster.pid" already exists > 2021-09-09 21:24:21.552 UTC [6224] DETAIL: Runtime-computed GUC "shared_memory_size" cannot be viewed on a running server. > 2021-09-09 21:24:21.552 UTC [6224] HINT: Is another postmaster (PID 3628) running in data directory "/local/home/bossartn/pgdata"? > > And viewing a runtime-computed GUC on a server that is shut down will > look like this: > > $ postgres -D . -C shared_memory_size > 142 Nothing fixing this ended up actually getting committed, right? That is, we still get the extra log output? And in fact, the command documented on https://www.postgresql.org/docs/devel/kernel-resources.html doesn't actually produce the output that the docs show, it also shows the log line, in the default config? If we can't fix the extra logging we should at least have our docs represent reality -- maybe by adding a "2>/dev/null" entry? But it'd be better to have it not output that log in the first place... (Of course what I'd really want is to be able to run it on a cluster that's running, but that was discussed downthread so I'm not bringing that one up for changes now) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote: > Nothing fixing this ended up actually getting committed, right? That > is, we still get the extra log output? Correct. > And in fact, the command documented on > https://www.postgresql.org/docs/devel/kernel-resources.html doesn't > actually produce the output that the docs show, it also shows the log > line, in the default config? If we can't fix the extra logging we > should at least have our docs represent reality -- maybe by adding a > "2>/dev/null" entry? But it'd be better to have it not output that log > in the first place... I attached a patch to adjust the documentation for now. This apparently crossed my mind earlier [0], but I didn't follow through with it for some reason. > (Of course what I'd really want is to be able to run it on a cluster > that's running, but that was discussed downthread so I'm not bringing > that one up for changes now) I think it is worth revisiting the extra logging and the ability to view runtime-computed GUCs on a running server. Should this be an open item for v15, or do you think it's alright to leave it for the v16 development cycle? [0] https://postgr.es/m/C45224E1-29C8-414C-A8E6-B718C07ACB94%40amazon.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Mon, Mar 14, 2022 at 10:34:17AM -0700, Nathan Bossart wrote: > On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote: >> And in fact, the command documented on >> https://www.postgresql.org/docs/devel/kernel-resources.html doesn't >> actually produce the output that the docs show, it also shows the log >> line, in the default config? If we can't fix the extra logging we >> should at least have our docs represent reality -- maybe by adding a >> "2>/dev/null" entry? But it'd be better to have it not output that log >> in the first place... > > I attached a patch to adjust the documentation for now. This apparently > crossed my mind earlier [0], but I didn't follow through with it for some > reason. Another thing that we can add is -c log_min_messages=fatal, but my method is more complicated than a simple redirection, of course :) >> (Of course what I'd really want is to be able to run it on a cluster >> that's running, but that was discussed downthread so I'm not bringing >> that one up for changes now) > > I think it is worth revisiting the extra logging and the ability to view > runtime-computed GUCs on a running server. Should this be an open item for > v15, or do you think it's alright to leave it for the v16 development > cycle? Well, this is a completely new problem as it opens the door to potential concurrent access issues with the data directory lock file while reading values from the control file. And that's not mandatory to be able to get those estimations without having to allocate a large chunk of memory, which was the primary goal discussed upthread as far as I recall. So I would leave that as an item to potentially tackle in future versions. -- Michael
On Tue, Mar 15, 2022 at 3:41 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Mar 14, 2022 at 10:34:17AM -0700, Nathan Bossart wrote: > > On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote: > >> And in fact, the command documented on > >> https://www.postgresql.org/docs/devel/kernel-resources.html doesn't > >> actually produce the output that the docs show, it also shows the log > >> line, in the default config? If we can't fix the extra logging we > >> should at least have our docs represent reality -- maybe by adding a > >> "2>/dev/null" entry? But it'd be better to have it not output that log > >> in the first place... > > > > I attached a patch to adjust the documentation for now. This apparently > > crossed my mind earlier [0], but I didn't follow through with it for some > > reason. > > Another thing that we can add is -c log_min_messages=fatal, but my > method is more complicated than a simple redirection, of course :) Either does work, but yours has more characters :) > >> (Of course what I'd really want is to be able to run it on a cluster > >> that's running, but that was discussed downthread so I'm not bringing > >> that one up for changes now) > > > > I think it is worth revisiting the extra logging and the ability to view > > runtime-computed GUCs on a running server. Should this be an open item for > > v15, or do you think it's alright to leave it for the v16 development > > cycle? > > Well, this is a completely new problem as it opens the door of > potential concurrent access issues with the data directory lock file > while reading values from the control file. And that's not mandatory > to be able to get those estimations without having to allocate a large > chunk of memory, which was the primary goal discussed upthread as far > as I recall. So I would leave that as an item to potentially tackle > in future versions. I think we're talking about two different things here. 
I think the "avoid extra logging" would be worth seeing if we can address for 15. The "able to run on a live cluster" seems a lot bigger and more scary and not 15 material. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Tue, Mar 15, 2022 at 11:02:37PM +0100, Magnus Hagander wrote: > I think we're talking about two different things here. > > I think the "avoid extra logging" would be worth seeing if we can > address for 15. A simple approach could be to just set log_min_messages to PANIC before exiting. I've attached a patch for this. With this patch, we'll still see a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a running server, and there will be no extra output as long as the user sets log_min_messages to INFO or higher (i.e., not a DEBUG* value). For comparison, 'postgres -C' for a non-runtime-computed GUC does not emit extra output as long as the user sets log_min_messages to DEBUG2 or higher. > The "able to run on a live cluster" seems a lot bigger and more scary > and not 15 material. +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote: > A simple approach could be to just set log_min_messages to PANIC before > exiting. I've attached a patch for this. With this patch, we'll still see > a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a > running server, and there will be no extra output as long as the user sets > log_min_messages to INFO or higher (i.e., not a DEBUG* value). For > comparison, 'postgres -C' for a non-runtime-computed GUC does not emit > extra output as long as the user sets log_min_messages to DEBUG2 or higher. I created a commitfest entry for this: https://commitfest.postgresql.org/38/3596/ -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote: > A simple approach could be to just set log_min_messages to PANIC before > exiting. I've attached a patch for this. With this patch, we'll still see > a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a > running server, and there will be no extra output as long as the user sets > log_min_messages to INFO or higher (i.e., not a DEBUG* value). For > comparison, 'postgres -C' for a non-runtime-computed GUC does not emit > extra output as long as the user sets log_min_messages to DEBUG2 or higher. > puts(config_val ? config_val : ""); > + > + /* don't emit shutdown messages */ > + SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE); > + > ExitPostmaster(0); That's fancy, but I don't like that much. And this would not protect against any messages generated before this code path, either, even if that should be enough for the current HEAD. My solution for the docs is perhaps too confusing for the end-user, and we are talking about a Linux-only thing here anyway. So, at the end, I am tempted to just add the "2> /dev/null" as suggested upthread by Nathan and call it a day. Does that sound fine? -- Michael
On Wed, Mar 23, 2022 at 03:25:48PM +0900, Michael Paquier wrote: > My solution for the docs is perhaps too confusing for the end-user, > and we are talking about a Linux-only thing here anyway. So, at the > end, I am tempted to just add the "2> /dev/null" as suggested upthread > by Nathan and call it a day. This still sounds like the best way to go for now, so done this way as of bbd4951. -- Michael
On Wed, Mar 23, 2022 at 7:25 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote: > > A simple approach could be to just set log_min_messages to PANIC before > > exiting. I've attached a patch for this. With this patch, we'll still see > > a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a > > running server, and there will be no extra output as long as the user sets > > log_min_messages to INFO or higher (i.e., not a DEBUG* value). For > > comparison, 'postgres -C' for a non-runtime-computed GUC does not emit > > extra output as long as the user sets log_min_messages to DEBUG2 or higher. > > > puts(config_val ? config_val : ""); > > + > > + /* don't emit shutdown messages */ > > + SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE); > > + > > ExitPostmaster(0); > > That's fancy, but I don't like that much. And this would not protect > against any messages generated before this code path, either, But neither would the suggestion of redirecting stderr to /dev/null. In fact, doing the redirect will *also* throw away any FATAL that happens. In fact, using the 2>/dev/null method, we *also* remove the message that says there's another postmaster running in this directory, which is strictly worse than the override of log_min_messages. That said, the redirect can be removed without recompiling postgres, so it is probably still the better choice as a temporary workaround. But we should really look into getting a better solution in place once we start on 16. > My solution for the docs is perhaps too confusing for the end-user, > and we are talking about a Linux-only thing here anyway. So, at the > end, I am tempted to just add the "2> /dev/null" as suggested upthread > by Nathan and call it a day. Does that sound fine? What would be a linux only thing? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote: > On Wed, Mar 23, 2022 at 7:25 AM Michael Paquier <michael@paquier.xyz> wrote: >> On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote: >> > A simple approach could be to just set log_min_messages to PANIC before >> > exiting. I've attached a patch for this. With this patch, we'll still see >> > a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a >> > running server, and there will be no extra output as long as the user sets >> > log_min_messages to INFO or higher (i.e., not a DEBUG* value). For >> > comparison, 'postgres -C' for a non-runtime-computed GUC does not emit >> > extra output as long as the user sets log_min_messages to DEBUG2 or higher. >> >> > puts(config_val ? config_val : ""); >> > + >> > + /* don't emit shutdown messages */ >> > + SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE); >> > + >> > ExitPostmaster(0); >> >> That's fancy, but I don't like that much. And this would not protect >> against any messages generated before this code path, either, > > But neither would the suggestion of redirecting stderr to /dev/null. > In fact, doing the redirect will *also* throw away any FATAL that > happens. In fact, using the 2>/dev/null method, we *also* remove the > message that says there's another postmaster running in this > directory, which is strictly worse than the override of > log_min_messages. > > That said, the redirect can be removed without recompiling postgres, > so it is probably still the better choice as a temporary workaround. > But we should really look into getting a better solution in place once > we start on 16. A couple of other options to consider: 1) Always set log_min_messages to WARNING/ERROR/FATAL for 'postgres -C'. We might need some special logic for handling the case where the user is inspecting the log_min_messages parameter.
With this approach, you'd probably never get extra output unless something was wrong (e.g., database already running when inspecting a runtime-computed GUC). Also, this would silence any extra output that you might see today with non-runtime-computed GUCs. 2) Add some way to skip just the shutdown message (e.g., a variable set when output_config_variable is true). With this approach, you wouldn't get extra output by default, but you still might if log_min_messages is set to something like DEBUG3. This wouldn't impact any extra output that you see today with non-runtime-computed GUCs. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Mar 24, 2022 at 01:31:08PM -0700, Nathan Bossart wrote: > A couple of other options to consider: > > 1) Always set log_min_messages to WARNING/ERROR/FATAL for 'postgres -C'. > We might need some special logic for handling the case where the user is > inspecting the log_min_messages parameter. With this approach, you'd > probably never get extra output unless something was wrong (e.g., database > already running when inspecting a runtime-computed GUC). Also, this would > silence any extra output that you might see today with non-runtime-computed > GUCs. > > 2) Add some way to skip just the shutdown message (e.g., a variable set > when output_config_variable is true). With this approach, you wouldn't get > extra output by default, but you still might if log_min_messages is set to > something like DEBUG3. This wouldn't impact any extra output that you see > today with non-runtime-computed GUCs. I've attached a first attempt at option 1. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote: > But neither would the suggestion of redirecting stderr to /dev/null. > In fact, doing the redirect will *also* throw away any FATAL that > happens. In fact, using the 2>/dev/null method, we *also* remove the > message that says there's another postmaster running in this > directory, which is strictly worse than the override of > log_min_messages. Well, we could also tweak the command more with a redirection of stderr to a log file or such, and tell users to look at it for errors. > That said, the redirect can be removed without recompiling postgres, > so it is probably still the better choice as a temporary workaround. > But we should really look into getting a better solution in place once > we start on 16. But do we really need a better or more invasive solution for already running servers though? A SHOW command would be able to do the work already in this case. This would lack consistency compared to the offline case, but we are not without options either. That leaves the case where the server is running, has allocated memory but is not ready to accept connections, like crash recovery; still, this use case looks rather thin to me. >> My solution for the docs is perhaps too confusing for the end-user, >> and we are talking about a Linux-only thing here anyway. So, at the >> end, I am tempted to just add the "2> /dev/null" as suggested upthread >> by Nathan and call it a day. Does that sound fine? > > What would be a linux only thing? Perhaps not at some point in the future. Now that's under a section of the docs only for Linux. -- Michael
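For the running-server case raised in this subthread, the figure is indeed already reachable over SQL rather than postgres -C; e.g. (a sketch, assuming a patched server is up and psql can connect with default settings):

```shell
psql -Atc 'SHOW shared_memory_size_in_huge_pages;'
# equivalently, via the catalog view:
psql -Atc "SELECT setting FROM pg_settings WHERE name = 'shared_memory_size_in_huge_pages';"
```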
On Wed, Apr 20, 2022, 00:12 Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote:
>> But neither would the suggestion of redirecting stderr to /dev/null.
>> In fact, doing the redirect will *also* throw away any FATAL that
>> happens. In fact, using the 2>/dev/null method, we *also* remove the
>> message that says there's another postmaster running in this
>> directory, which is strictly worse than the override of
>> log_min_messages.
> Well, we could also tweak the command more with a redirection of
> stderr to a log file or such, and tell users to look at it for errors.
That would be a pretty terrible UX though.
>> That said, the redirect can be removed without recompiling postgres,
>> so it is probably still the better choice as a temporary workaround.
>> But we should really look into getting a better solution in place once
>> we start on 16.
> But do we really need a better or more invasive solution for already
> running servers though? A SHOW command would be able to do the work
> already in this case. This would lack consistency compared to the
> offline case, but we are not without options either. That leaves the
> case where the server is running, has allocated memory but is not
> ready to accept connections, like crash recovery; still, this use case
> looks rather thin to me.
I agree that that's a very narrow use case. And I'm not sure the use case of a running server is even that important here - it's really the offline one that's important. Or rather, the really compelling one is when there is a server running but I want to check the value offline because it will change. SHOW doesn't help there because it shows the value based on the currently running configuration, not the new one after a restart.
I don't agree that the redirect is a solution. It's a workaround.
>> My solution for the docs is perhaps too confusing for the end-user,
>> and we are talking about a Linux-only thing here anyway. So, at the
>> end, I am tempted to just add the "2> /dev/null" as suggested upthread
>> by Nathan and call it a day. Does that sound fine?
>
> What would be a linux only thing?
Perhaps not at some point in the future. Now that's under a section
of the docs only for Linux.
Hmm. So what's the solution on Windows? I guess maybe it's not as
important there because there is no limit on huge pages, but generally
getting the expected shared memory usage might be useful? Just
significantly less important.
/Magnus
On Fri, Apr 22, 2022 at 09:49:34AM +0200, Magnus Hagander wrote:
> I agree that that's a very narrow use case. And I'm not sure the use
> case of a running server is even that important here - it's really the
> offline one that's important. Or rather, the really compelling one is
> when there is a server running but I want to check the value offline
> because it will change. SHOW doesn't help there because it shows the
> value based on the currently running configuration, not the new one
> after a restart.

You mean the case of a server where one would directly change
postgresql.conf on a running server, and use postgres -C to see how
much the kernel settings need to be changed before the restart?

> Hmm. So what's the solution on Windows? I guess maybe it's not as
> important there because there is no limit on huge pages, but generally
> getting the expected shared memory usage might be useful? Just
> significantly less important.

Contrary to Linux, we don't need to care about the number of large
pages that are necessary because there is no equivalent of
vm.nr_hugepages on Windows (see [1]), do we? If that were the case,
we'd have a use case for huge_page_size as well. That's where
shared_memory_size_in_huge_pages comes in handy, as much as
huge_page_size does, and note that shared_memory_size works on WIN32.

[1]: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
--
Michael
On Mon, Apr 25, 2022 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Apr 22, 2022 at 09:49:34AM +0200, Magnus Hagander wrote:
>> I agree that that's a very narrow use case. And I'm not sure the use
>> case of a running server is even that important here - it's really the
>> offline one that's important. Or rather, the really compelling one is
>> when there is a server running but I want to check the value offline
>> because it will change. SHOW doesn't help there because it shows the
>> value based on the currently running configuration, not the new one
>> after a restart.
>
> You mean the case of a server where one would directly change
> postgresql.conf on a running server, and use postgres -C to see how
> much the kernel settings need to be changed before the restart?
Yes.
AIUI that was the original use-case for this feature. It certainly was for me :)
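That workflow can be sketched in shell, with hypothetical numbers (the shared memory footprint is taken from the VmPeak figure in the opening message; on releases with the shared_memory_size_in_huge_pages GUC discussed here, postgres -C would report the page count directly): once the shared memory size is known, deriving vm.nr_hugepages is a round-up division.

```shell
# Hypothetical values: a 4321MB shared memory footprint and the default 2MB
# huge page size on x86-64 Linux.
shmem_mb=4321
hugepage_mb=2

# Round up to whole huge pages; a truncating division would come up one short
# whenever the size is not an exact multiple of the page size.
pages=$(( (shmem_mb + hugepage_mb - 1) / hugepage_mb ))
echo "$pages"

# The result is what vm.nr_hugepages would be set to before the restart:
sysctl_cmd="sysctl -w vm.nr_hugepages=$pages"
echo "$sysctl_cmd"
```

This matches the arithmetic from the start of the thread: 4321MB rounds up to 2161 pages, one more than a naive shared_buffers/2MB estimate would give.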
>> Hmm. So what's the solution on Windows? I guess maybe it's not as
>> important there because there is no limit on huge pages, but generally
>> getting the expected shared memory usage might be useful? Just
>> significantly less important.
>
> Contrary to Linux, we don't need to care about the number of large
> pages that are necessary because there is no equivalent of
> vm.nr_hugepages on Windows (see [1]), do we? If that were the case,
> we'd have a use case for huge_page_size as well.
Right, for this one in particular -- that's what I meant with my comment about there not being a limit. But this feature works for other settings as well, not just the huge pages one. Exactly what the use-cases are can vary, but surely they would have the same problems wrt redirects?
On Mon, Apr 25, 2022 at 04:55:25PM +0200, Magnus Hagander wrote:
> AIUI that was the original use-case for this feature. It certainly was
> for me :)

Perhaps we'd be fine with relaxing the requirements here knowing that
the control file should never be larger than PG_CONTROL_MAX_SAFE_SIZE
(aka the read should be atomic, so it could be made lockless). At the
end of the day, to be absolutely correct in the shmem size estimation,
I think that we are going to need what's proposed here, or the sizing
may not be right depending on how extensions adjust GUCs after they
are loaded, in their _PG_init():
https://www.postgresql.org/message-id/20220419154658.GA2487941@nathanxps13
That's a bit independent, but not completely unrelated either,
depending on how exact you want your number of estimated huge pages to
be. Just wanted to mention it.

>> Contrary to Linux, we don't need to care about the number of large
>> pages that are necessary because there is no equivalent of
>> vm.nr_hugepages on Windows (see [1]), do we? If that were the case,
>> we'd have a use case for huge_page_size, additionally.
>
> Right, for this one in particular -- that's what I meant with my comment
> about there not being a limit. But this feature works for other settings
> as well, not just the huge pages one. Exactly what the use-cases are can
> vary, but surely they would have the same problems wrt redirects?

Yes, the redirection issue would apply to all the run-time GUCs.
--
Michael
On Tue, Apr 26, 2022 at 10:34:06AM +0900, Michael Paquier wrote:
> Yes, the redirection issue would apply to all the run-time GUCs.

Should this be tracked as an open item for v15? There was another
recent report about the extra log output [0].

[0] https://www.postgresql.org/message-id/YnARlI5nvbziobR4%40momjian.us
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, May 06, 2022 at 10:13:18AM -0700, Nathan Bossart wrote:
> On Tue, Apr 26, 2022 at 10:34:06AM +0900, Michael Paquier wrote:
>> Yes, the redirection issue would apply to all the run-time GUCs.
>
> Should this be tracked as an open item for v15? There was another recent
> report about the extra log output [0].

That makes two complaints on two separate threads, so an open item
seems appropriate for adjusting this behavior. I have looked at the
patch posted at [1], and I don't quite understand why you need the
extra dance with log_min_messages. Why don't you just set the GUC at
the end of the code path in PostmasterMain() where we print
non-runtime-computed parameters? I am not really worried about users
deciding to set log_min_messages to PANIC in postgresql.conf when it
comes to postgres -C, TBH, as they'd miss the FATAL messages if the
command is attempted on a server that is already starting. Per the
attached.

[1]: https://www.postgresql.org/message-id/20220328173503.GA137769@nathanxps13
--
Michael
On Mon, May 09, 2022 at 03:53:24PM +0900, Michael Paquier wrote:
> I have looked at the patch posted at [1], and I don't quite understand
> why you need the extra dance with log_min_messages. Why don't you
> just set the GUC at the end of the code path in PostmasterMain() where
> we print non-runtime-computed parameters?

The log_min_messages dance avoids extra output when inspecting
non-runtime-computed GUCs, like this:

~/pgdata$ postgres -D . -C log_min_messages -c log_min_messages=debug5
debug5
2022-05-10 09:06:04.728 PDT [3715607] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2022-05-10 09:06:04.728 PDT [3715607] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2022-05-10 09:06:04.728 PDT [3715607] DEBUG:  proc_exit(0): 0 callbacks to make
2022-05-10 09:06:04.728 PDT [3715607] DEBUG:  exit(0)

AFAICT you need to set log_min_messages to at least DEBUG3 to see
extra output for the non-runtime-computed GUCs, so it might not be
worth the added complexity.

> I am not really worried about users deciding to set log_min_messages
> to PANIC in postgresql.conf when it comes to postgres -C, TBH, as
> they'd miss the FATAL messages if the command is attempted on a server
> that is already starting.

I don't have a strong opinion on this one.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, May 10, 2022 at 09:12:49AM -0700, Nathan Bossart wrote:
> AFAICT you need to set log_min_messages to at least DEBUG3 to see extra
> output for the non-runtime-computed GUCs, so it might not be worth the
> added complexity.

This set of messages has been showing up for ages with zero complaints
from the field AFAIK, and nobody would use this level of logging
except developers. One thing that overriding log_min_messages to
FATAL does, however, is that it no longer shows those debug3 messages
when querying a runtime-computed GUC, but that's the kind of thing
we'd want to hide anyway. Your patch would hide those entries in both
cases. Perhaps we could do that, but at the end, I don't really see
any need to complicate this code path more than necessary, and this is
enough to silence the logs in the cases we care about basically all
the time, even if the log levels are reduced a bit on a given cluster.
Hence, I have applied the simplest solution to just enforce
log_min_messages=FATAL when requesting a runtime GUC.
--
Michael
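As a rough model of why forcing FATAL is sufficient, here is a shell stand-in for the severity filtering (this is not the server's elog machinery, and the numeric ranks are purely illustrative): anything below the forced threshold is dropped, while a genuine FATAL still comes through.

```shell
# Illustrative severity ranks; only their relative order matters here.
rank() {
    case "$1" in
        debug5) echo 1 ;;
        debug3) echo 2 ;;
        fatal)  echo 9 ;;
    esac
}

# The threshold the fix forces while a runtime-computed GUC is requested.
log_min_messages=fatal

# emit prints a message only when its severity reaches the threshold.
emit() {
    if [ "$(rank "$1")" -ge "$(rank "$log_min_messages")" ]; then
        echo "$1: $2"
    fi
}

out=$(
    emit debug3 "shmem_exit(0): 0 callbacks to make"
    emit fatal  'lock file "postmaster.pid" already exists'
)
echo "$out"
```

Only the fatal line survives, which matches the behavior described above: debug chatter is silenced without hiding the message about an already-running postmaster.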
On Wed, May 11, 2022 at 02:34:25PM +0900, Michael Paquier wrote:
> Hence, I have applied the simplest solution to just enforce
> log_min_messages=FATAL when requesting a runtime GUC.

Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com