Thread: PMChildFlags array
Hi,
Observed below errors in logfile
2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07 04:21:14 UTC,,0,FATAL,XX000,"no free slots in PMChildFlags array",,,,,,,,,""
2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
What could be the possible reasons for this to occur, and is there any chance of database corruption after this event?
Regards,
Bhargav
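The log lines above are in PostgreSQL's csvlog format, so the severity, SQLSTATE and message can be pulled out mechanically rather than read by eye. The following is a minimal sketch, assuming the 23-column csvlog layout documented for PostgreSQL 10; the helper name `parse_csvlog` and the `SAMPLE` string (copied from the log excerpt above) are introduced here for illustration only.

```python
import csv
import io

# Column names of the csvlog format as documented for PostgreSQL 10
# (23 columns). Later major versions add more columns.
CSVLOG_COLUMNS = [
    "log_time", "user_name", "database_name", "process_id",
    "connection_from", "session_id", "session_line_num", "command_tag",
    "session_start_time", "virtual_transaction_id", "transaction_id",
    "error_severity", "sql_state_code", "message", "detail", "hint",
    "internal_query", "internal_query_pos", "context", "query",
    "query_pos", "location", "application_name",
]

# Two of the lines from the report, copied verbatim.
SAMPLE = (
    '2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,'
    '2019-09-07 04:21:14 UTC,,0,FATAL,XX000,'
    '"no free slots in PMChildFlags array",,,,,,,,,""\n'
    '2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,'
    '2019-09-20 02:00:24 UTC,,0,ERROR,58P01,'
    '"could not open shared memory segment ""/PostgreSQL.2520932"": '
    'No such file or directory",,,,,,,,,""\n'
)


def parse_csvlog(text):
    """Return one dict per log entry, keyed by CSVLOG_COLUMNS."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(CSVLOG_COLUMNS, row)) for row in reader]


entries = parse_csvlog(SAMPLE)
for entry in entries:
    print(entry["process_id"], entry["session_start_time"],
          entry["error_severity"], entry["sql_state_code"], entry["message"])
```

Parsed this way, it is easy to see that the FATAL entry comes from a process whose session started on 2019-09-07, while the two ERROR entries come from processes started in the same second as the failure.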
Any suggestions on this ?
On Thu, 3 Oct 2019 at 16:27, bhargav kamineni <bhargavpostgres@gmail.com> wrote:
> Hi,
> Observed below errors in logfile
> 2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07 04:21:14 UTC,,0,FATAL,XX000,"no free slots in PMChildFlags array",,,,,,,,,""
> 2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
> 2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
> what could be the possible reasons for this to occur and is there any chance of database corruption after this event ?
> Regards,
> Bhargav
On 10/3/19 3:57 AM, bhargav kamineni wrote:
> Hi,
>
> Observed below errors in logfile
>
> 2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07
> 04:21:14 UTC,,0,FATAL,XX000,"no free slots in PMChildFlags array",,,,,,,,,""
> 2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
> 2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
>
Postgres version?
OS and version?
What was the database doing just before the FATAL line?
> what could be the possible reasons for this to occur and is there any
> chance of database corruption after this event ?
The source (backend/storage/ipc/pmsignal.c) says:
"/* Out of slots ... should never happen, else postmaster.c messed up */
elog(FATAL, "no free slots in PMChildFlags array");
"
Someone else will need to comment on what 'messed up' could be.
>
>
> Regards,
> Bhargav
>
--
Adrian Klaver
adrian.klaver@aklaver.com
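To make the quoted comment concrete: the message is raised when a fixed-size table of child-process slots has no unused entry left. The following is a toy model of that idea only, not the PostgreSQL source; the class `ChildSlots`, its methods and the exception `NoFreeSlots` are hypothetical names chosen for illustration.

```python
class NoFreeSlots(Exception):
    """Raised when every slot in the fixed-size table is in use."""


class ChildSlots:
    """Toy model of a fixed-size child-slot table (hypothetical, simplified)."""

    def __init__(self, size):
        # False = slot unused, True = slot assigned to a child process.
        self.flags = [False] * size

    def assign(self):
        """Claim the first unused slot and return its 1-based number."""
        for index, in_use in enumerate(self.flags):
            if not in_use:
                self.flags[index] = True
                return index + 1
        # The caller is expected to stop creating children before this point;
        # getting here means the caller's bookkeeping went wrong.
        raise NoFreeSlots("no free slots in PMChildFlags array")

    def release(self, slot):
        """Mark a previously assigned 1-based slot as unused again."""
        self.flags[slot - 1] = False


slots = ChildSlots(size=2)
print(slots.assign())  # first free slot
print(slots.assign())  # second free slot
try:
    slots.assign()     # table is full, so this raises
except NoFreeSlots as exc:
    print("FATAL:", exc)
```

In this model the error is not a sign that the table itself is damaged; it only means that whoever hands out slots allowed more children than the table was sized for.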
> Hi,
>
> Observed below errors in logfile
>
> 2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07
> 04:21:14 UTC,,0,FATAL,XX000,"no free slots in PMChildFlags array",,,,,,,,,""
> 2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
> 2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
>
>Postgres version?
PostgreSQL 10.8
>OS and version?
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
>What was the database doing just before the FATAL line?
Postgres was rejecting a bunch of connections from a user who has a connection limit set. That was the FATAL error that I could see in the log file:
FATAL,53300,"too many connections for role ""user_app"""
db=\du user_app
List of roles
Role name | Attributes | Member of
--------------+-------------------------------+--------------------
user_app | No inheritance +| {application_role}
| 100 connections +|
| Password valid until infinity |
> what could be the possible reasons for this to occur and is there any
> chance of database corruption after this event ?
The source(backend/storage/ipc/pmsignal.c ) says:
"/* Out of slots ... should never happen, else postmaster.c messed up */
elog(FATAL, "no free slots in PMChildFlags array");
"
Someone else will need to comment on what 'messed up' could be
On Thu, 3 Oct 2019 at 18:56, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/3/19 3:57 AM, bhargav kamineni wrote:
> Hi,
>
> Observed below errors in logfile
>
> 2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07
> 04:21:14 UTC,,0,FATAL,XX000,"no free slots in PMChildFlags array",,,,,,,,,""
> 2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
> 2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",,,,,,,,,""
>
Postgres version?
OS and version?
What was the database doing just before the FATAL line?
> what could be the possible reasons for this to occur and is there any
> chance of database corruption after this event ?
The source(backend/storage/ipc/pmsignal.c ) says:
"/* Out of slots ... should never happen, else postmaster.c messed up */
elog(FATAL, "no free slots in PMChildFlags array");
"
Someone else will need to comment on what 'messed up' could be.
>
>
> Regards,
> Bhargav
>
--
Adrian Klaver
adrian.klaver@aklaver.com
bhargav kamineni <bhargavpostgres@gmail.com> writes:
> Postgres was rejecting a bunch of connections from a user who is having a
> connection limit set. that was the the FATAL error that i could see in log
> file.
> FATAL,53300,"too many connections for role ""user_app"""
> db=\du user_app
> List of roles
> Role name | Attributes | Member of
> --------------+-------------------------------+--------------------
> user_app | No inheritance +| {application_role}
> | 100 connections +|
> | Password valid until infinity |
Hm, what's the overall max_connections limit? (I'm wondering
in particular if it's more or less than 100.)
regards, tom lane
bhargav kamineni <bhargavpostgres@gmail.com> writes:
> Postgres was rejecting a bunch of connections from a user who is having a
> connection limit set. that was the the FATAL error that i could see in log
> file.
> FATAL,53300,"too many connections for role ""user_app"""
> db=\du user_app
> List of roles
> Role name | Attributes | Member of
> --------------+-------------------------------+--------------------
> user_app | No inheritance +| {application_role}
> | 100 connections +|
> | Password valid until infinity |
>Hm, what's the overall max_connections limit? (I'm wondering
>in particular if it's more or less than 100.)
It's set to 500:
show max_connections ;
max_connections
-----------------
500
On Thu, 3 Oct 2019 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
bhargav kamineni <bhargavpostgres@gmail.com> writes:
> Postgres was rejecting a bunch of connections from a user who is having a
> connection limit set. that was the the FATAL error that i could see in log
> file.
> FATAL,53300,"too many connections for role ""user_app"""
> db=\du user_app
> List of roles
> Role name | Attributes | Member of
> --------------+-------------------------------+--------------------
> user_app | No inheritance +| {application_role}
> | 100 connections +|
> | Password valid until infinity |
Hm, what's the overall max_connections limit? (I'm wondering
in particular if it's more or less than 100.)
regards, tom lane
bhargav kamineni <bhargavpostgres@gmail.com> writes:
>> What was the database doing just before the FATAL line?
> Postgres was rejecting a bunch of connections from a user who is having a
> connection limit set. that was the the FATAL error that i could see in log
> file.
> FATAL,53300,"too many connections for role ""user_app"""
So ... how many is "a bunch"?
Looking at the code, it seems like it'd be possible for a sufficiently
aggressive spawner of incoming connections to reach the
MaxLivePostmasterChildren limit. While the postmaster would correctly
reject additional connection attempts after that, what it would not do
is ensure that any child slots are left for new parallel worker processes.
So we could hypothesize that the error you're seeing in the log is from
failure to spawn a parallel worker process, due to being out of child
slots.
However, given that max_connections = 500, MaxLivePostmasterChildren()
would be 1000-plus. This would mean that reaching this condition would
require *at least* 500 concurrent connection-attempts-that-haven't-yet-
been-rejected, maybe well more than that if you didn't have close to
500 legitimately open sessions. That seems like a lot, enough to suggest
that you've got some pretty serious bug in your client-side logic.
Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
number of acceptable connections is identical to the number of allowed
child processes; it needs to be less, by the number of background
processes we want to support. But it seems like a darn hard-to-hit bug,
so I'm not quite sure that that explains your observation.
regards, tom lane
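The "1000-plus" figure can be checked with a little arithmetic. The sketch below assumes that in PostgreSQL 10 the child limit is twice the sum of max_connections, autovacuum_max_workers, one autovacuum launcher and max_worker_processes; that formula is recalled from the source rather than quoted in this thread, and the values 3 and 8 are the stock defaults, not settings reported by the original poster. The function name is a hypothetical Python stand-in for MaxLivePostmasterChildren().

```python
def max_live_postmaster_children(max_connections,
                                 autovacuum_max_workers=3,
                                 max_worker_processes=8):
    """Approximate child-slot limit (assumed PostgreSQL 10 formula).

    The factor of two leaves room for children that have been forked
    but not yet rejected or cleaned up.
    """
    return 2 * (max_connections + autovacuum_max_workers + 1
                + max_worker_processes)


limit = max_live_postmaster_children(max_connections=500)

# Slots left over if all 500 regular connections were legitimately open.
headroom = limit - 500

print("child slot limit:", limit)
print("slots beyond max_connections:", headroom)
```

Under these assumptions the limit comes out at 1024, so roughly 500 extra not-yet-rejected connection attempts would have to exist at the same moment to use up every slot, which is in line with the estimate in the message above.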
On 2019-Oct-03, bhargav kamineni wrote:
> bhargav kamineni <bhargavpostgres@gmail.com> writes:
> > Postgres was rejecting a bunch of connections from a user who is having a
> > connection limit set. that was the the FATAL error that i could see in log
> > file.
> > FATAL,53300,"too many connections for role ""user_app"""
> >
> > db=\du user_app
> > List of roles
> > Role name | Attributes | Member of
> > --------------+-------------------------------+--------------------
> > user_app | No inheritance +| {application_role}
> > | 100 connections +|
> > | Password valid until infinity |
Was the machine overloaded at the time the problem occurred?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks Tom Lane for detailing the issue.
>So ... how many is "a bunch"?
more than 85
>Looking at the code, it seems like it'd be possible for a sufficiently
>aggressive spawner of incoming connections to reach the
>MaxLivePostmasterChildren limit. While the postmaster would correctly
>reject additional connection attempts after that, what it would not do
>is ensure that any child slots are left for new parallel worker processes.
>So we could hypothesize that the error you're seeing in the log is from
>failure to spawn a parallel worker process, due to being out of child
>slots.
We enabled "max_parallel_workers_per_gather = 4" 20 days before we ran into this issue.
>However, given that max_connections = 500, MaxLivePostmasterChildren()
>would be 1000-plus. This would mean that reaching this condition would
>require *at least* 500 concurrent connection-attempts-that-haven't-yet-
>been-rejected, maybe well more than that if you didn't have close to
>500 legitimately open sessions. That seems like a lot, enough to suggest
>that you've got some pretty serious bug in your client-side logic.
The errors below were observed in the postgres logfile after the crash:
ERROR: xlog flush request is not satisfied -- seen for a couple of tables; we ran VACUUM FULL on those tables and the error went away after that.
ERROR: right sibling's left-link doesn't match: block 273660 links to 273500 instead of expected 273661 in index -- observed while doing a vacuum freeze on the database; we dropped this index and created a new one.
Observations:
The vacuum freeze analyze job, which is started through a cron job, gets stuck on the database side. pg_cancel_backend() and pg_terminate_backend() are not able to terminate the stuck processes; only restarting the database clears them. I think this is happening due to corruption (if so, how can I detect it? pg_dump?). Is there any way to overcome this problem?
Would migrating the database to a new instance (pg_basebackup and switching over to the new instance) solve this issue?
>Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
>number of acceptable connections is identical to the number of allowed
>child processes; it needs to be less, by the number of background
>processes we want to support. But it seems like a darn hard-to-hit bug,
>so I'm not quite sure that that explains your observation.
On Fri, 4 Oct 2019 at 03:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
bhargav kamineni <bhargavpostgres@gmail.com> writes:
>> What was the database doing just before the FATAL line?
> Postgres was rejecting a bunch of connections from a user who is having a
> connection limit set. that was the the FATAL error that i could see in log
> file.
> FATAL,53300,"too many connections for role ""user_app"""
So ... how many is "a bunch"?
Looking at the code, it seems like it'd be possible for a sufficiently
aggressive spawner of incoming connections to reach the
MaxLivePostmasterChildren limit. While the postmaster would correctly
reject additional connection attempts after that, what it would not do
is ensure that any child slots are left for new parallel worker processes.
So we could hypothesize that the error you're seeing in the log is from
failure to spawn a parallel worker process, due to being out of child
slots.
However, given that max_connections = 500, MaxLivePostmasterChildren()
would be 1000-plus. This would mean that reaching this condition would
require *at least* 500 concurrent connection-attempts-that-haven't-yet-
been-rejected, maybe well more than that if you didn't have close to
500 legitimately open sessions. That seems like a lot, enough to suggest
that you've got some pretty serious bug in your client-side logic.
Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
number of acceptable connections is identical to the number of allowed
child processes; it needs to be less, by the number of background
processes we want to support. But it seems like a darn hard-to-hit bug,
so I'm not quite sure that that explains your observation.
regards, tom lane
bhargav kamineni <bhargavpostgres@gmail.com> writes:
>> So ... how many is "a bunch"?
> more than 85
Hm. That doesn't seem like it'd be enough to trigger the problem;
you'd need about max_connections excess connections (that are shortly
going to be rejected) to run into this problem, and you said you had
max_connections = 500. Maybe several different clients were all doing
this at once?
But anyway, AFAICS there is only one code path that could lead to the
reported error message, so one way or another you got there. I've
pushed a fix for this, which will be in next month's releases.
> below errors observed after crash in postgres logfile :
> ERROR: xlog flush request is not satisfied for couple of tables , we have
> initiated the vacuum full on those tables and the error went off after that.
> ERROR: right sibling's left-link doesn't match: block 273660 links to
> 273500 instead of expected 273661 in index -- observed this error while
> doing vacuum freeze on databsase , we have dropped this index and created a
> new one
That seems unrelated. A postmaster crash shouldn't have any
data-corruption consequences, since it never touches any relation
files directly.
regards, tom lane