Thread: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes

BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17757
Logged by:          David Angel
Email address:      david_sisson@dell.com
PostgreSQL version: 14.5
Operating system:   Linux
Description:

On an OS where hugepages are enabled, if no hugepages resources are assigned
in Kubernetes and the postgres instance is set to hugepages = off in the
config then one would assume that the DB would not use hugepages.
However, because the initdb process uses postgresql.conf.sample or
postgresql.conf.template instead of the actual specified configuration the
applied setting is actually hugepages = try during initdb.
In these cases, the initdb phase will attempt to allocate huge pages that
are available in the OS, but it will be denied access by Kubernetes and
fail.

Here is a PR with a possible fix:
https://github.com/postgres/postgres/pull/114/files

Here are some links for further information
Ours: https://github.com/CrunchyData/postgres-operator/issues/3477

Others with the same having no solution to disable huge pages.
https://github.com/CrunchyData/postgres-operator/issues/3039
https://github.com/CrunchyData/postgres-operator/issues/2258
https://github.com/CrunchyData/postgres-operator/issues/3126
https://github.com/CrunchyData/postgres-operator/issues/3421

Bitnami
https://github.com/bitnami/charts/issues/7901



On 1/20/23 23:48, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      17757
> Logged by:          David Angel
> Email address:      david_sisson@dell.com
> PostgreSQL version: 14.5
> Operating system:   Linux
> Description:        
> 
> On an OS where hugepages are enabled, if no hugepages resources are assigned
> in Kubernetes and the postgres instance is set to hugepages = off in the
> config then one would assume that the DB would not use hugepages.

There's no config at that point - it's initdb that creates it, by
copying the .sample file, IIRC. So not sure which file you're modifying.

> However, because the initdb process uses postgresql.conf.sample or
> postgresql.conf.template instead of the actual specified configuration the
> applied setting is actually hugepages = try during initdb.

Specified how?

> In these cases, the initdb phase will attempt to allocate huge pages that
> are available in the OS, but it will be denied access by Kubernetes and
> fail.

Well, so how exactly this fails? Does that mean Kubernetes broke mmap()
with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
not available, or what? Because that's the only explanation I can see,
looking at the code.

Or it just does not realize there are no hugepages, returns something
and then crashes with SIGBUS later when trying to access it?

> 
> Here is a PR with a possible fix:
> https://github.com/postgres/postgres/pull/114/files
> 

I doubt we want to just go straight to changing the default value for
everyone. IMHO if the "try" logic is somehow broken, we should fix the
try logic, not mess with the defaults.

In the worst case, the operator can probably tweak the .sample config
before calling initdb.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Hi,

On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> On 1/20/23 23:48, PG Bug reporting form wrote:
> > In these cases, the initdb phase will attempt to allocate huge pages that
> > are available in the OS, but it will be denied access by Kubernetes and
> > fail.
> 
> Well, so how exactly this fails? Does that mean Kubernetes broke mmap()
> with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
> not available, or what? Because that's the only explanation I can see,
> looking at the code.

Yea, that's what I was wondering about as well.


> Or it just does not realize there are no hugepages, returns something
> and then crashes with SIGBUS later when trying to access it?

I assume that that's the case. There's references to bus errors in a bunch of
the linked issues. E.g.
https://github.com/CrunchyData/postgres-operator/issues/413

selecting default max_connections ... sh: line 1:    60 Bus error               (core dumped)
"/usr/pgsql-10/bin/postgres"--boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c
dynamic_shared_memory_type=none< "/dev/null" > "/dev/null" 2>&1
 

It's possible that the problem would go away if we used MAP_POPULATE for the
allocation.

I'd guess that this is annoying cgroups stuff :(


> I doubt we want to just go straight to changing the default value for
> everyone. IMHO if the "try" logic is somehow broken, we should fix the
> try logic, not mess with the defaults.

Agreed. But we could disable huge pages explicitly inside initdb - there's
really no point in using it there...

Greetings,

Andres Freund



Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> On 1/20/23 23:48, PG Bug reporting form wrote:
>> Here is a PR with a possible fix:
>> https://github.com/postgres/postgres/pull/114/files

> I doubt we want to just go straight to changing the default value for
> everyone.

Yeah, that proposal is a non-starter.  I could see providing an
initdb option to adjust the value applied during initdb, though.

Ideally, maybe what we want is a generalized switch that could
replace any variable in the sample config, along the lines of
the server's "-c foo=bar".  I recall having tried to do that and
having run into quoting hazards, but I did not try very hard.

            regards, tom lane



Andres Freund <andres@anarazel.de> writes:
> On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
>> I doubt we want to just go straight to changing the default value for
>> everyone. IMHO if the "try" logic is somehow broken, we should fix the
>> try logic, not mess with the defaults.

> Agreed. But we could disable huge pages explicitly inside initdb - there's
> really no point in using it there...

One of the things initdb is trying to do is establish a set of values
that is known to allow the server to start.  Not using the same settings
that the server is expected to use would break that idea completely.

            regards, tom lane



Hi,

On 2023-01-21 18:33:03 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> >> I doubt we want to just go straight to changing the default value for
> >> everyone. IMHO if the "try" logic is somehow broken, we should fix the
> >> try logic, not mess with the defaults.

> > Agreed. But we could disable huge pages explicitly inside initdb - there's
> > really no point in using it there...
> 
> One of the things initdb is trying to do is establish a set of values
> that is known to allow the server to start.  Not using the same settings
> that the server is expected to use would break that idea completely.

Yea, I'm not saying like the approach. OTOH, we don't provide a proper way to
influence the configuration, which is bad, as this issue shows.

Perhaps we should add an option to force MAP_POPULATE being used? I'm fairly
certain that'd avoid the SIGBUS in this case. And it'd make sense to ensure
that we can actually use the memory in initdb.

Unfortunately it's not unproblematic to use it in general, because with large
shared_buffers values it can be quite slow, because the kernel initializes the
memory in a single thread. I've seen ~3GB/s on multi-socket machines.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> Perhaps we should add an option to force MAP_POPULATE being used? I'm fairly
> certain that'd avoid the SIGBUS in this case. And it'd make sense to ensure
> that we can actually use the memory in initdb.

> Unfortunately it's not unproblematic to use it in general, because with large
> shared_buffers values it can be quite slow, because the kernel initializes the
> memory in a single thread. I've seen ~3GB/s on multi-socket machines.

Hmm ... but if we can't use it by default, we're still back to the
problem of needing a way to tell initdb to do things differently.
I'd just as soon keep that to "set huge_pages = off" rather than
inventing whole new things.

            regards, tom lane



Hi,

On 2023-01-21 15:29:22 -0800, Andres Freund wrote:
> On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> > On 1/20/23 23:48, PG Bug reporting form wrote:
> > > In these cases, the initdb phase will attempt to allocate huge pages that
> > > are available in the OS, but it will be denied access by Kubernetes and
> > > fail.
> >
> > Well, so how exactly this fails? Does that mean Kubernetes broke mmap()
> > with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
> > not available, or what? Because that's the only explanation I can see,
> > looking at the code.
>
> Yea, that's what I was wondering about as well.
>
>
> > Or it just does not realize there are no hugepages, returns something
> > and then crashes with SIGBUS later when trying to access it?
>
> I assume that that's the case. There's references to bus errors in a bunch of
> the linked issues. E.g.
> https://github.com/CrunchyData/postgres-operator/issues/413
>
> selecting default max_connections ... sh: line 1:    60 Bus error               (core dumped)
"/usr/pgsql-10/bin/postgres"--boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c
dynamic_shared_memory_type=none< "/dev/null" > "/dev/null" 2>&1
 
>
> It's possible that the problem would go away if we used MAP_POPULATE for the
> allocation.

> I'd guess that this is annoying cgroups stuff :(

Ah, the fun:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/hugetlb.html

  The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per
  control group and enforces the limit during page fault. Since HugeTLB
  doesn't support page reclaim, enforcing the limit at page fault time implies
  that, the application will get SIGBUS signal if it tries to fault in HugeTLB
  pages beyond its limit. Therefore the application needs to know exactly how many
  HugeTLB pages it uses before hand, and the sysadmin needs to make sure that
  there are enough available on the machine for all the users to avoid processes
  getting SIGBUS.

but there's also

      Reservation accounting

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
hugetlb.<hugepagesize>.rsvd.usage_in_byteshugetlb.<hugepagesize>.rsvd.failcnt
 

  The HugeTLB controller allows to limit the HugeTLB reservations per control
  group and enforces the controller limit at reservation time and at the fault
  of HugeTLB memory for which no reservation exists. Since reservation limits
  are enforced at reservation time (on mmap or shget), reservation limits
  never causes the application to get SIGBUS signal if the memory was reserved
  before hand. For MAP_NORESERVE allocations, the reservation limit behaves
  the same as the fault limit, enforcing memory usage at fault time and
  causing the application to receive a SIGBUS if it’s crossing its limit.

  Reservation limits are superior to page fault limits described above, since
  reservation limits are enforced at reservation time (on mmap or shget), and
  never causes the application to get SIGBUS signal if the memory was reserved
  before hand. This allows for easier fallback to alternatives such as
  non-HugeTLB memory for example. In the case of page fault accounting, it’s
  very hard to avoid processes getting SIGBUS since the sysadmin needs
  precisely know the HugeTLB usage of all the tasks in the system and make
  sure there is enough pages to satisfy all requests. Avoiding tasks getting
  SIGBUS on overcommited systems is practically impossible with page fault
  accounting.

So the problem is that the wrong time of cgroup limits are used. I don't know
if that's a kubernetes or a postgres-operator issue.

Greetings,

Andres Freund




On 1/22/23 00:30, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> On 1/20/23 23:48, PG Bug reporting form wrote:
>>> Here is a PR with a possible fix:
>>> https://github.com/postgres/postgres/pull/114/files
> 
>> I doubt we want to just go straight to changing the default value for
>> everyone.
> 
> Yeah, that proposal is a non-starter.  I could see providing an
> initdb option to adjust the value applied during initdb, though.
> 
> Ideally, maybe what we want is a generalized switch that could
> replace any variable in the sample config, along the lines of
> the server's "-c foo=bar".  I recall having tried to do that and
> having run into quoting hazards, but I did not try very hard.
> 

Yeah, I was looking for something like "-c" in initdb, only to realize
there's nothing like that. The main "problem" with adding that is that
we're unlikely to backpatch that (I guess), and thus it does not really
solve the issue for the OP.

I'm not sure we'd be keen to backpatch a change of the default, but
maybe we would ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> On 1/22/23 00:30, Tom Lane wrote:
>> Yeah, that proposal is a non-starter.  I could see providing an
>> initdb option to adjust the value applied during initdb, though.
>> Ideally, maybe what we want is a generalized switch that could
>> replace any variable in the sample config, along the lines of
>> the server's "-c foo=bar".  I recall having tried to do that and
>> having run into quoting hazards, but I did not try very hard.

> Yeah, I was looking for something like "-c" in initdb, only to realize
> there's nothing like that. The main "problem" with adding that is that
> we're unlikely to backpatch that (I guess), and thus it does not really
> solve the issue for the OP.

> I'm not sure we'd be keen to backpatch a change of the default, but
> maybe we would ...

Back-patching a change of default seems like REALLY a non-starter.
Perhaps adding a switch (which would break nothing if not used)
could be discussed, though.

            regards, tom lane



Hi,

On 2023-01-22 01:55:01 +0100, Tomas Vondra wrote:
> I'm not sure we'd be keen to backpatch a change of the default, but
> maybe we would ...

After figuring out that it's clearly a configuration issue *somewhere* outside
of postgres's remit, I'm not that sure it's worth doing something concretely
to avoid the SIGBUS issue.


But if we end up doing something, I think a parameter triggering use of
MAP_POPULATE would be a good idea. It's actually useful outside of the SIGBUS
issue, because benchmarks reach a steady state noticably more quickly when
using it.

OTOH, in a production scenario with large shared_buffers I'd probably not want
to use it, because getting up more quickly and and distributing the memory
initialization across across cores is more important.


I think it'd be ok to explicitly specify such an option in initdb - after all,
initdb does do work to determine the correct shared buffers size etc, and
MAP_POPULATE will lead to a more reliable determination.  Not just with huge
pages, but also with "small" pages and system-level memory overcommit.

Greetings,

Andres Freund



I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard
"postgresql.conf"file.
 
huge_pages can be turned on through outside manipulation but it can't be turned off.
Not without altering the sample config file.

Thanks,
David Angel   😊



Internal Use - Confidential

-----Original Message-----
From: Andres Freund <andres@anarazel.de> 
Sent: Saturday, January 21, 2023 8:08 PM
To: Tomas Vondra
Cc: Tom Lane; Sisson, David; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes


[EXTERNAL EMAIL] 

Hi,

On 2023-01-22 01:55:01 +0100, Tomas Vondra wrote:
> I'm not sure we'd be keen to backpatch a change of the default, but 
> maybe we would ...

After figuring out that it's clearly a configuration issue *somewhere* outside of postgres's remit, I'm not that sure
it'sworth doing something concretely to avoid the SIGBUS issue.
 


But if we end up doing something, I think a parameter triggering use of MAP_POPULATE would be a good idea. It's
actuallyuseful outside of the SIGBUS issue, because benchmarks reach a steady state noticably more quickly when using
it.

OTOH, in a production scenario with large shared_buffers I'd probably not want to use it, because getting up more
quicklyand and distributing the memory initialization across across cores is more important.
 


I think it'd be ok to explicitly specify such an option in initdb - after all, initdb does do work to determine the
correctshared buffers size etc, and MAP_POPULATE will lead to a more reliable determination.  Not just with huge pages,
butalso with "small" pages and system-level memory overcommit.
 

Greetings,

Andres Freund


> On Jan 23, 2023, at 11:26, Sisson, David <David.Sisson@dell.com> wrote:
>
> I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard
"postgresql.conf"file. 

We are?  I believe the default is "huge_pages = try", not off.


The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file.
When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdb
runs.
There is no way to turn it off without altering the sample config file.

It is quite difficult to nearly impossible to alter the "postgresql.conf.sample" file using a 3rd party controller.
The file is read-only at runtime within Kubernetes.
Only some controllers let you modify the sample file without rebuilding their code.

You guys are awesome with truly outstanding responses.
I certainly didn't expect my initial solution to be used but to help in finding a good solution.  😊

Thanks,
David Angel





Internal Use - Confidential

-----Original Message-----
From: Christophe Pettus <xof@thebuild.com> 
Sent: Monday, January 23, 2023 1:38 PM
To: Sisson, David
Cc: Andres Freund; Tomas Vondra; Tom Lane; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes


[EXTERNAL EMAIL] 



> On Jan 23, 2023, at 11:26, Sisson, David <David.Sisson@dell.com> wrote:
> 
> I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard
"postgresql.conf"file.
 

We are?  I believe the default is "huge_pages = try", not off.

Hi,

On 2023-01-23 19:26:09 +0000, Sisson, David wrote:
> I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard
"postgresql.conf"file.
 
> huge_pages can be turned on through outside manipulation but it can't be
> turned off.

It's a fault of the environment if mmap(MAP_HUGETLB) causes a SIGBUS. Normally
huge_pages = try is harmless, because it'll just fall back. That source of
SIGBUSes needs to be fixed regardless of anything else - plenty allocators try
to use huge pages for example, so you'll run into problems regardless of
postgres' default.

That said, I'm for allowing to specify options to initdb.

Greetings,

Andres Freund



"Sisson, David" <David.Sisson@dell.com> writes:
> The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file.
> When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when
initdbruns. 

What "standard postgresql.conf file"?  There is no such thing until
initdb creates it.

> There is no way to turn it off without altering the sample config file.

Yup, that's exactly why we are having this discussion.

            regards, tom lane



On Mon, Jan 23, 2023 at 12:51 PM Sisson, David <David.Sisson@dell.com> wrote:
The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file.
When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdb runs.
There is no way to turn it off without altering the sample config file.


Right, the present way to control what is seen by initdb is postgresql.conf.sample since that is the template that initdb uses to then produce an actual postgresql.conf for the newly created instance.  postgresql.conf is only ever a per-instance configuration file.  It doesn't make sense to "change postgresql.conf in hopes of influencing some future initdb run."

David J.

That makes sense, the PostgreSQL controllers are calling initdb to create the "postgresql.conf" file before they apply
customizationsto it. 
To the consumer, it is just yaml to be added to the "postgresql.conf" file.

That makes it much harder to fix and means it is really the controllers at fault.

This probably needs to be explicitly documented when creating a HA cluster or within initdb docs.
https://www.postgresql.org/docs/15/app-initdb.html

Maybe something about how initdb uses sample and what configuration settings must be pre-configured.

Thanks,
David Angel








Internal Use - Confidential

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Monday, January 23, 2023 1:56 PM
To: Sisson, David
Cc: Christophe Pettus; Andres Freund; Tomas Vondra; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes


[EXTERNAL EMAIL]

"Sisson, David" <David.Sisson@dell.com> writes:
> The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file.
> When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when
initdbruns. 

What "standard postgresql.conf file"?  There is no such thing until initdb creates it.

> There is no way to turn it off without altering the sample config file.

Yup, that's exactly why we are having this discussion.

            regards, tom lane



A quick and dirty solution could be to alter initdb to catch the exception and retry using a copy of the sample with
"huge_pages=false".
Would that be acceptable?

Passing in a config setting into initdb would still require a rebuild of all controllers.
That could take months to years at best.

Thanks,
David Angel




Internal Use - Confidential

-----Original Message-----
From: Sisson, David <David_Sisson@Dell.com>
Sent: Monday, January 23, 2023 2:12 PM
To: Tom Lane
Cc: Christophe Pettus; Andres Freund; Tomas Vondra; pgsql-bugs@lists.postgresql.org; Sisson, David; Howell, Stephen
Subject: RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes

That makes sense, the PostgreSQL controllers are calling initdb to create the "postgresql.conf" file before they apply
customizationsto it. 
To the consumer, it is just yaml to be added to the "postgresql.conf" file.

That makes it much harder to fix and means it is really the controllers at fault.

This probably needs to be explicitly documented when creating a HA cluster or within initdb docs.
https://www.postgresql.org/docs/15/app-initdb.html

Maybe something about how initdb uses sample and what configuration settings must be pre-configured.

Thanks,
David Angel








Internal Use - Confidential

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Monday, January 23, 2023 1:56 PM
To: Sisson, David
Cc: Christophe Pettus; Andres Freund; Tomas Vondra; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes


[EXTERNAL EMAIL]

"Sisson, David" <David.Sisson@dell.com> writes:
> The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file.
> When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when
initdbruns. 

What "standard postgresql.conf file"?  There is no such thing until initdb creates it.

> There is no way to turn it off without altering the sample config file.

Yup, that's exactly why we are having this discussion.

            regards, tom lane



Hi,

On 2023-01-23 20:35:17 +0000, Sisson, David wrote:
> A quick and dirty solution could be to alter initdb to catch the exception and retry using a copy of the sample with
"huge_pages=false".
> Would that be acceptable?

This is a kubernetes or postgres-operator bug (setting up the wrong cgroup
limit, which the docs explicitly warn against doing). I don't think we want to
accumulate workarounds like that in postgres.


> Passing in a config setting into initdb would still require a rebuild of all controllers.
> That could take months to years at best.

Huh. I don't know anything about the controller, but that seems problematic
independent of this specific issue. And you'd still need to deploy a new
version of postgres to get such changes...


> Internal Use - Confidential

Hardly.

Greetings,

Andres Freund



The controllers generally always pull in the latest PostgreSQL.
It is easy to get the latest version with PostgreSQL updated.

Unfortunately, getting a bug fix is a lot harder.
One controller currently holding this defect for over a year with no end in sight.


Found this:
https://github.com/opencontainers/runtime-spec/issues/1050

Looks like a PR exists for it but the solution is invalid.
https://github.com/kailun-qin/runtime-spec/commit/a6505339204535150260d8e4f0bc112628f1fa87


More info:
https://www.postgresql.org/message-id/flat/20200218093240.jd3lgoxmisyl2tt5%40localhost#61c2c7fc3d3dd80512c9130b6967be16


It would be nice if "try" worked as expected.
I totally understand it is not a PostgreSQL issue but any assistance would be very appreciated.


Thanks,
David Angel




Internal Use - Confidential

-----Original Message-----
From: Andres Freund <andres@anarazel.de>
Sent: Monday, January 23, 2023 3:10 PM
To: Sisson, David
Cc: Tom Lane; Christophe Pettus; Tomas Vondra; pgsql-bugs@lists.postgresql.org; Howell, Stephen
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes


[EXTERNAL EMAIL]

Hi,

On 2023-01-23 20:35:17 +0000, Sisson, David wrote:
> A quick and dirty solution could be to alter initdb to catch the exception and retry using a copy of the sample with
"huge_pages=false".
> Would that be acceptable?

This is a kubernetes or postgres-operator bug (setting up the wrong cgroup limit, which the docs explicitly warn
againstdoing). I don't think we want to accumulate workarounds like that in postgres. 


> Passing in a config setting into initdb would still require a rebuild of all controllers.
> That could take months to years at best.

Huh. I don't know anything about the controller, but that seems problematic independent of this specific issue. And
you'dstill need to deploy a new version of postgres to get such changes... 


> Internal Use - Confidential

Hardly.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> It's a fault of the environment if mmap(MAP_HUGETLB) causes a SIGBUS. Normally
> huge_pages = try is harmless, because it'll just fall back. That source of
> SIGBUSes needs to be fixed regardless of anything else - plenty allocators try
> to use huge pages for example, so you'll run into problems regardless of
> postgres' default.

That seems likely to me too.

> That said, I'm for allowing to specify options to initdb.

Yeah, I think that has enough other potential applications to be worth
doing.  Here's a quick draft patch (sans user-facing docs as yet).
It injects any given values into postgresql.auto.conf, not
postgresql.conf proper.  I did that mainly because the latter looked
beyond the abilities of the primitive string-munging code we have in
there, but I think it can be argued to be a reasonable choice anyway.

            regards, tom lane

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7a58c33ace..e6f1f42cae 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -82,6 +82,13 @@
 /* Ideally this would be in a .h file, but it hardly seems worth the trouble */
 extern const char *select_default_timezone(const char *share_path);

+/* simple list of strings */
+typedef struct _stringlist
+{
+    char       *str;
+    struct _stringlist *next;
+} _stringlist;
+
 static const char *const auth_methods_host[] = {
     "trust", "reject", "scram-sha-256", "md5", "password", "ident", "radius",
 #ifdef ENABLE_GSS
@@ -142,6 +149,8 @@ static char *pwfilename = NULL;
 static char *superuser_password = NULL;
 static const char *authmethodhost = NULL;
 static const char *authmethodlocal = NULL;
+static _stringlist *extra_guc_names = NULL;
+static _stringlist *extra_guc_values = NULL;
 static bool debug = false;
 static bool noclean = false;
 static bool noinstructions = false;
@@ -253,6 +262,7 @@ static void check_input(char *path);
 static void write_version_file(const char *extrapath);
 static void set_null_conf(void);
 static void test_config_settings(void);
+static bool test_specific_config_settings(int test_conns, int test_buffs);
 static void setup_config(void);
 static void bootstrap_template1(void);
 static void setup_auth(FILE *cmdfd);
@@ -359,6 +369,27 @@ escape_quotes_bki(const char *src)
     return result;
 }

+/*
+ * Add an item at the end of a stringlist.
+ */
+static void
+add_stringlist_item(_stringlist **listhead, const char *str)
+{
+    _stringlist *newentry = pg_malloc(sizeof(_stringlist));
+    _stringlist *oldentry;
+
+    newentry->str = pg_strdup(str);
+    newentry->next = NULL;
+    if (*listhead == NULL)
+        *listhead = newentry;
+    else
+    {
+        for (oldentry = *listhead; oldentry->next; oldentry = oldentry->next)
+             /* skip */ ;
+        oldentry->next = newentry;
+    }
+}
+
 /*
  * make a copy of the array of lines, with token replaced by replacement
  * the first time it occurs on each line.
@@ -873,11 +904,9 @@ test_config_settings(void)
         400, 300, 200, 100, 50
     };

-    char        cmd[MAXPGPATH];
     const int    connslen = sizeof(trial_conns) / sizeof(int);
     const int    bufslen = sizeof(trial_bufs) / sizeof(int);
     int            i,
-                status,
                 test_conns,
                 test_buffs,
                 ok_buffers = 0;
@@ -903,19 +932,7 @@ test_config_settings(void)
         test_conns = trial_conns[i];
         test_buffs = MIN_BUFS_FOR_CONNS(test_conns);

-        snprintf(cmd, sizeof(cmd),
-                 "\"%s\" --check %s %s "
-                 "-c max_connections=%d "
-                 "-c shared_buffers=%d "
-                 "-c dynamic_shared_memory_type=%s "
-                 "< \"%s\" > \"%s\" 2>&1",
-                 backend_exec, boot_options, extra_options,
-                 test_conns, test_buffs,
-                 dynamic_shared_memory_type,
-                 DEVNULL, DEVNULL);
-        fflush(NULL);
-        status = system(cmd);
-        if (status == 0)
+        if (test_specific_config_settings(test_conns, test_buffs))
         {
             ok_buffers = test_buffs;
             break;
@@ -940,19 +957,7 @@ test_config_settings(void)
             break;
         }

-        snprintf(cmd, sizeof(cmd),
-                 "\"%s\" --check %s %s "
-                 "-c max_connections=%d "
-                 "-c shared_buffers=%d "
-                 "-c dynamic_shared_memory_type=%s "
-                 "< \"%s\" > \"%s\" 2>&1",
-                 backend_exec, boot_options, extra_options,
-                 n_connections, test_buffs,
-                 dynamic_shared_memory_type,
-                 DEVNULL, DEVNULL);
-        fflush(NULL);
-        status = system(cmd);
-        if (status == 0)
+        if (test_specific_config_settings(n_connections, test_buffs))
             break;
     }
     n_buffers = test_buffs;
@@ -968,6 +973,48 @@ test_config_settings(void)
     printf("%s\n", default_timezone ? default_timezone : "GMT");
 }

+/*
+ * Test a specific combination of configuration settings.
+ */
+static bool
+test_specific_config_settings(int test_conns, int test_buffs)
+{
+    PQExpBuffer cmd = createPQExpBuffer();
+    int            status;
+    _stringlist *gnames,
+               *gvalues;
+
+    /* Set up the test postmaster invocation */
+    printfPQExpBuffer(cmd,
+                      "\"%s\" --check %s %s "
+                      "-c max_connections=%d "
+                      "-c shared_buffers=%d "
+                      "-c dynamic_shared_memory_type=%s",
+                      backend_exec, boot_options, extra_options,
+                      test_conns, test_buffs,
+                      dynamic_shared_memory_type);
+
+    /* Add any user-given setting overrides */
+    for (gnames = extra_guc_names, gvalues = extra_guc_values;
+         gnames != NULL;        /* assume lists have the same length */
+         gnames = gnames->next, gvalues = gvalues->next)
+    {
+        appendPQExpBuffer(cmd, " -c %s=", gnames->str);
+        appendShellString(cmd, gvalues->str);
+    }
+
+    appendPQExpBuffer(cmd,
+                      " < \"%s\" > \"%s\" 2>&1",
+                      DEVNULL, DEVNULL);
+
+    fflush(NULL);
+    status = system(cmd->data);
+
+    destroyPQExpBuffer(cmd);
+
+    return (status == 0);
+}
+
 /*
  * Calculate the default wal_size with a "pretty" unit.
  */
@@ -994,7 +1041,10 @@ setup_config(void)
     char      **conflines;
     char        repltok[MAXPGPATH];
     char        path[MAXPGPATH];
-    char       *autoconflines[3];
+    char      **autoconflines;
+    int            numautoconflines;
+    _stringlist *gnames,
+               *gvalues;

     fputs(_("creating configuration files ... "), stdout);
     fflush(stdout);
@@ -1152,15 +1202,32 @@ setup_config(void)
     if (chmod(path, pg_file_create_mode) != 0)
         pg_fatal("could not change permissions of \"%s\": %m", path);

+    free(conflines);
+
+
     /*
-     * create the automatic configuration file to store the configuration
-     * parameters set by ALTER SYSTEM command. The parameters present in this
-     * file will override the value of parameters that exists before parse of
-     * this file.
+     * Create the automatic configuration file that stores the configuration
+     * parameters set by the ALTER SYSTEM command.  We also place any
+     * parameter values set with initdb's -c option into this file.
      */
+    numautoconflines = 2;        /* fixed header lines */
+    for (gnames = extra_guc_names; gnames != NULL; gnames = gnames->next)
+        numautoconflines++;
+    autoconflines = (char **) pg_malloc((numautoconflines + 1) * sizeof(char *));
+
     autoconflines[0] = pg_strdup("# Do not edit this file manually!\n");
     autoconflines[1] = pg_strdup("# It will be overwritten by the ALTER SYSTEM command.\n");
-    autoconflines[2] = NULL;
+    numautoconflines = 2;
+    for (gnames = extra_guc_names, gvalues = extra_guc_values;
+         gnames != NULL;        /* assume lists have the same length */
+         gnames = gnames->next, gvalues = gvalues->next)
+    {
+        autoconflines[numautoconflines++] =
+            psprintf("%s = '%s'\n",
+                     gnames->str,
+                     escape_quotes(gvalues->str));
+    }
+    autoconflines[numautoconflines] = NULL;

     sprintf(path, "%s/postgresql.auto.conf", pg_data);

@@ -1168,7 +1235,7 @@ setup_config(void)
     if (chmod(path, pg_file_create_mode) != 0)
         pg_fatal("could not change permissions of \"%s\": %m", path);

-    free(conflines);
+    free(autoconflines);


     /* pg_hba.conf */
@@ -2103,6 +2170,7 @@ usage(const char *progname)
     printf(_("  -A, --auth=METHOD         default authentication method for local connections\n"));
     printf(_("      --auth-host=METHOD    default authentication method for local TCP/IP connections\n"));
     printf(_("      --auth-local=METHOD   default authentication method for local-socket connections\n"));
+    printf(_("  -c NAME=VALUE             override default setting for server parameter\n"));
     printf(_(" [-D, --pgdata=]DATADIR     location for this database cluster\n"));
     printf(_("  -E, --encoding=ENCODING   set default encoding for new databases\n"));
     printf(_("  -g, --allow-group-access  allow group read/execute on data directory\n"));
@@ -2808,7 +2876,8 @@ main(int argc, char *argv[])

     /* process command-line options */

-    while ((c = getopt_long(argc, argv, "A:dD:E:gkL:nNsST:U:WX:", long_options, &option_index)) != -1)
+    while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+                            long_options, &option_index)) != -1)
     {
         switch (c)
         {
@@ -2831,6 +2900,24 @@ main(int argc, char *argv[])
             case 11:
                 authmethodhost = pg_strdup(optarg);
                 break;
+            case 'c':
+                {
+                    char       *buf = pg_strdup(optarg);
+                    char       *equals = strchr(buf, '=');
+
+                    if (!equals)
+                    {
+                        pg_log_error("-c %s requires a value", buf);
+                        pg_log_error_hint("Try \"%s --help\" for more information.",
+                                          progname);
+                        exit(1);
+                    }
+                    *equals++ = '\0';    /* terminate variable name */
+                    add_stringlist_item(&extra_guc_names, buf);
+                    add_stringlist_item(&extra_guc_values, equals);
+                    pfree(buf);
+                }
+                break;
             case 'D':
                 pg_data = pg_strdup(optarg);
                 break;

Hi,

On 2023-01-23 17:51:46 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > That said, I'm for allowing to specify options to initdb.
> 
> Yeah, I think that has enough other potential applications to be worth
> doing.  Here's a quick draft patch (sans user-facing docs as yet).
> It injects any given values into postgresql.auto.conf, not
> postgresql.conf proper.  I did that mainly because the latter looked
> beyond the abilities of the primitive string-munging code we have in
> there, but I think it can be argued to be a reasonable choice anyway.

Oh, I had thought we'd just pass them on with -c to the processes that initdb
starts. But perhaps just persisting them isn't a bad idea...

- Andres



Andres Freund <andres@anarazel.de> writes:
> On 2023-01-23 17:51:46 -0500, Tom Lane wrote:
>> Yeah, I think that has enough other potential applications to be worth
>> doing.  Here's a quick draft patch (sans user-facing docs as yet).
>> It injects any given values into postgresql.auto.conf, not
>> postgresql.conf proper.  I did that mainly because the latter looked
>> beyond the abilities of the primitive string-munging code we have in
>> there, but I think it can be argued to be a reasonable choice anyway.

> Oh, I had thought we'd just pass them on with -c to the processes that initdb
> starts. But perhaps just persisting them isn't a bad idea...

It certainly seems to me that that would be the mainstream use-case,
so why not fill in the file as the user probably wants?  They can
always change it.  Also, as I mentioned, the expectation is that
initdb will set up a known-working combination of settings; and
we don't really know that if we leave off whatever was injected by
"-c".  In the case at hand, if we don't propagate "huge_pages = off"
to the installed configuration, the server still won't work.

            regards, tom lane



Hi,

On 2023-01-23 19:45:19 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2023-01-23 17:51:46 -0500, Tom Lane wrote:
> >> Yeah, I think that has enough other potential applications to be worth
> >> doing.  Here's a quick draft patch (sans user-facing docs as yet).
> >> It injects any given values into postgresql.auto.conf, not
> >> postgresql.conf proper.  I did that mainly because the latter looked
> >> beyond the abilities of the primitive string-munging code we have in
> >> there, but I think it can be argued to be a reasonable choice anyway.
> 
> > Oh, I had thought we'd just pass them on with -c to the processes that initdb
> > starts. But perhaps just persisting them isn't a bad idea...
> 
> It certainly seems to me that that would be the mainstream use-case,
> so why not fill in the file as the user probably wants?  They can
> always change it.  Also, as I mentioned, the expectation is that
> initdb will set up a known-working combination of settings; and
> we don't really know that if we leave off whatever was injected by
> "-c".  In the case at hand, if we don't propagate "huge_pages = off"
> to the installed configuration, the server still won't work.

Yea, makes sense.

Greetings,

Andres Freund