Thread: Multiple postmasters running from same directory
Hi,
We are running PostgreSQL 9.4 with streaming replication and repmgr. The operating system is RHEL 6.8.
On the master I can see multiple postmaster processes from the same data directory.
ps -ef |grep -i postgres|grep postm
postgres 81440 1 0 Jan31 ? 00:11:37 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
postgres 97072 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
postgres 97074 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
The streaming replication with one standby looks fine.
I was expecting to see only one postmaster process instead of three, and the time shown in the ps output for the two extra processes changes to the current time with every ps command I enter. Secondly, the logfile is full of "incomplete startup packet" messages.
I need help from you experts: is this the right behaviour for Postgres? What could have gone wrong in my case?
Best Regards
Vikas
Vikas Sharma wrote:
> We are running Postgresql 9.4 with streaming replication and repmgr. Operating system is RHEL6.8
>
> On the master I can see multiple postmaster processes from the same data directory.
>
> ps -ef |grep -i postgres|grep postm
> postgres 81440 1 0 Jan31 ? 00:11:37 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
> postgres 97072 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
> postgres 97074 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
>
> The streaming replication with one standby looks fine.
>
> I was expecting to see only one postmaster process instead of three and the time shown in
> PS output for two extra processes changes to current time with every PS command I enter.
> Secondly, I logfile is full of "Incomplete startup packet" message.
>
> I need help from you experts, Is this the right behaviour of postgres? what could have gone wrong in my case.

That looks ok.

The two other processes are children of the postmaster.
It is strange that their process title did not get updated.
What do you see for the processes with "pid" 97072 and 97074 in pg_stat_activity?

The "incomplete startup packet" is caused by processes that connect to the
PostgreSQL TCP port, but don't complete a database connection.
Often these are monitoring or load balancing programs.

Yours,
Laurenz Albe
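The pg_stat_activity check asked about above could be run roughly as follows. This is only a sketch: `backend_query` is a hypothetical helper name, the column list is one reasonable choice among the columns that view has in 9.4, and the commented psql invocation assumes a local superuser connection as `postgres`.

```shell
# backend_query is a hypothetical helper (not part of PostgreSQL).
# It prints a pg_stat_activity query for the PIDs given as arguments.
backend_query() {
  pids=$(printf '%s,' "$@")   # "97072,97074,"
  printf 'SELECT pid, usename, datname, client_addr, backend_start, state FROM pg_stat_activity WHERE pid IN (%s);\n' "${pids%,}"
}

# Example invocation (assumes psql can connect locally as a superuser):
#   psql -X -U postgres -d postgres -c "$(backend_query 97072 97074)"
```

As far as I know, a child that has not yet finished connection startup is not listed in pg_stat_activity, so an empty result for these PIDs would not be surprising.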
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> Vikas Sharma wrote:
>> On the master I can see multiple postmaster processes from the same data directory.
>> ps -ef |grep -i postgres|grep postm
>> postgres 81440 1 0 Jan31 ? 00:11:37 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
>> postgres 97072 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
>> postgres 97074 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data

> The two other processes are children of the postmaster.
> It is strange that their process title did not get updated.

Seeing that they're showing zero runtime, I bet that these are just-forked
children that have not had time to change their process title yet.
The thing that is strange is that you have a steady enough flow of new
connections that there are usually some children like that.

> The "incomplete startup packet" is caused by processes that connect to the
> PostgreSQL TCP port, but don't complete a database connection.
> Often these are monitoring or load balancing programs.

Putting two and two together, you have some monitoring program that is
hitting the postmaster with a constant stream of TCP connection requests
none of which get completed, resulting in a whole lot of useless fork
activity. Dial down the monitoring.

regards, tom lane
On Tue, Feb 13, 2018 at 4:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
>> Vikas Sharma wrote:
>>> On the master I can see multiple postmaster processes from the same data directory.
>>> ps -ef |grep -i postgres|grep postm
>>> postgres 81440 1 0 Jan31 ? 00:11:37 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
>>> postgres 97072 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
>>> postgres 97074 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
>
>> The two other processes are children of the postmaster.
>> It is strange that their process title did not get updated.
>
> Seeing that they're showing zero runtime, I bet that these are just-forked
> children that have not had time to change their process title yet.
> The thing that is strange is that you have a steady enough flow of new
> connections that there are usually some children like that.

I assume the process title is changed after full startup, since it shows the database and user.

>> The "incomplete startup packet" is caused by processes that connect to the
>> PostgreSQL TCP port, but don't complete a database connection.
>> Often these are monitoring or load balancing programs.
>
> Putting two and two together, you have some monitoring program that is
> hitting the postmaster with a constant stream of TCP connection requests
> none of which get completed, resulting in a whole lot of useless fork
> activity. Dial down the monitoring.

Adding the incomplete startup to the mix, it may be a misconfigured
monitoring program sending just a byte or two, or zero, and then
waiting for a response, which will give ps more time to catch the child
in that state. I haven't looked at the code, but given that messages start with
1 identifier byte plus a 4-byte length, many of the ways of reading
that would lead to a long wait for at least 5 bytes, or for the first
byte.

Francisco Olarte.
Francisco Olarte <folarte@peoplecall.com> writes:
> On Tue, Feb 13, 2018 at 4:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Putting two and two together, you have some monitoring program that is
>> hitting the postmaster with a constant stream of TCP connection requests
>> none of which get completed, resulting in a whole lot of useless fork
>> activity. Dial down the monitoring.

> Adding the incomplete startup to the mix, it may be a misconfigured
> monitoring program sending just a byte or two, or zero, and then
> waiting for response, which will give ps more time to catch the child
> in that state. Haven't look at the code, but given messages state with
> 1 identifier byte plus a 4 byte length, many of the forms of reading
> that would lead to a big wait for at least 5 bytes, or for the first
> byte.

Hm, yeah. From memory, the child process will wait a maximum of 60
seconds to receive a startup packet. If the hypothesized probing
program sends nothing, or just a small number of bytes, and then
sits rather than closing the connection, then this state would
easily persist long enough to be observable in ps.

If you're not sure where these probes are coming from, turning on
log_connections should help: the "connection received" message
comes out before waiting for the startup packet.

regards, tom lane
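Once log_connections is on, the "connection received" lines can be tallied per client host to see where the probes come from. The following is a minimal sketch, not a tested procedure for this system: `count_connection_sources` is a hypothetical helper name, and the log file path in the comments is an assumed default for a RHEL package install, so adjust it to your log_directory and log_filename.

```shell
# Assumed setup (not taken from the thread):
#   1. set "log_connections = on" in postgresql.conf
#   2. reload: pg_ctl reload -D /var/lib/pgsql/9.4/data
#      (the setting should then apply to new connections)

# count_connection_sources is a hypothetical helper.
# Reads PostgreSQL log lines on stdin and prints "count host",
# most frequent host first.
count_connection_sources() {
  grep -o 'connection received: host=[^ ]*' \
    | sed 's/.*host=//' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'
}

# Example (log path is an assumption):
#   count_connection_sources < /var/lib/pgsql/9.4/data/pg_log/postgresql-Tue.log
```

A host that shows a large count here but never appears in pg_stat_activity would be a good candidate for the monitoring or load-balancing probe.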
Thanks Tom,
So is it normal for postgres to fork out new postmaster processes from the same data directory? I haven't seen this earlier.
I will check where those connection requests are coming from.
Best Regards
Vikas
On Feb 13, 2018 15:50, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> Vikas Sharma wrote:
>> On the master I can see multiple postmaster processes from the same data directory.
>> ps -ef |grep -i postgres|grep postm
>> postgres 81440 1 0 Jan31 ? 00:11:37 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
>> postgres 97072 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
>> postgres 97074 81440 0 12:17 ? 00:00:00 /usr/pgsql-9.4/bin/postmaster -D /var/lib/pgsql/9.4/data
> The two other processes are children of the postmaster.
> It is strange that their process title did not get updated.
Seeing that they're showing zero runtime, I bet that these are just-forked
children that have not had time to change their process title yet.
The thing that is strange is that you have a steady enough flow of new
connections that there are usually some children like that.
> The "incomplete startup packet" is caused by processes that connect to the
> PostgreSQL TCP port, but don't complete a database connection.
> Often these are monitoring or load balancing programs.
Putting two and two together, you have some monitoring program that is
hitting the postmaster with a constant stream of TCP connection requests
none of which get completed, resulting in a whole lot of useless fork
activity. Dial down the monitoring.
regards, tom lane
Vikas Sharma <shavikas@gmail.com> writes:
> So is it normal for postgres to fork out new postmaster processes from the
> same data directory? I haven't seen this earlier.

They're not postmasters, they're child processes, as you can easily
tell from the PID/PPID columns of your ps output. But a process
inherits its title from the parent at fork(), and per this discussion,
they haven't changed it yet.

regards, tom lane
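The PID/PPID check described above can be scripted if you want to repeat it. This is a small sketch under stated assumptions: `classify_postmaster_procs` is a hypothetical helper name, and it assumes the standard `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD) shown in the original post.

```shell
# classify_postmaster_procs is a hypothetical helper.
# Reads `ps -ef` lines on stdin. For every process whose command (column 8)
# is a postmaster binary, it reports whether the parent is also in that set
# (a forked child) or not (the real postmaster).
classify_postmaster_procs() {
  awk '
    $8 ~ /(^|\/)postmaster$/ { ppid[$2] = $3 }   # remember PPID per PID
    END {
      for (p in ppid) {
        if (ppid[p] in ppid) print p, "child of", ppid[p]
        else                 print p, "postmaster"
      }
    }' | sort -n
}

# Example:
#   ps -ef | classify_postmaster_procs
```

On the ps output from the original post, this should label 81440 as the postmaster and 97072 and 97074 as its children.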