Thread: postgresql systemd service fails to start only on boot but notmanually
Hello everyone, I'm having trouble with my PostgreSQL Systemd service and I wonder whether this is a problem with the software or some problem with the packaging or build. My server fails to start PostgreSQL only on boot, if I restart it manually afterwards it doesn't have any problem starting. Here is the log extracted from the journal: ``` 2018-09-21 20:46:40.028 CEST [306] LOG: listening on IPv4 address "127.0.0.1", port 5432 2018-09-21 20:46:40.036 CEST [306] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432" 2018-09-21 20:46:40.233 CEST [337] LOG: database system was shut down at 2018-09-21 20:46:21 CEST 2018-09-21 20:48:10.441 CEST [352] WARNING: worker took too long to start; canceled 2018-09-21 20:49:10.469 CEST [352] WARNING: worker took too long to start; canceled 2018-09-21 20:49:10.478 CEST [306] LOG: database system is ready to accept connections 2018-09-21 20:49:10.486 CEST [306] LOG: received fast shutdown request 2018-09-21 20:49:10.497 CEST [306] LOG: aborting any active transactions 2018-09-21 20:49:10.498 CEST [637] FATAL: terminating autovacuum process due to administrator command 2018-09-21 20:49:10.500 CEST [306] LOG: worker process: logical replication launcher (PID 636) exited with code 1 2018-09-21 20:49:10.501 CEST [349] LOG: shutting down 2018-09-21 20:49:10.518 CEST [306] LOG: database system is shut down ``` Any Ideas? I'd really like to fix this since now after every reboot I have to remember to manually restart it. Thanks.
On 9/22/18 3:44 AM, Doron Behar wrote: > Hello everyone, > > I'm having trouble with my PostgreSQL Systemd service and I wonder > whether this is a problem with the software or some problem with the > packaging or build. > > My server fails to start PostgreSQL only on boot, if I restart it > manually afterwards it doesn't have any problem starting. Here is the > log extracted from the journal: > > ``` > 2018-09-21 20:46:40.028 CEST [306] LOG: listening on IPv4 address "127.0.0.1", port 5432 > 2018-09-21 20:46:40.036 CEST [306] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432" > 2018-09-21 20:46:40.233 CEST [337] LOG: database system was shut down at 2018-09-21 20:46:21 CEST > 2018-09-21 20:48:10.441 CEST [352] WARNING: worker took too long to start; canceled > 2018-09-21 20:49:10.469 CEST [352] WARNING: worker took too long to start; canceled > 2018-09-21 20:49:10.478 CEST [306] LOG: database system is ready to accept connection > 2018-09-21 20:49:10.486 CEST [306]LOG: received fast shutdown request > 2018-09-21 20:49:10.497 CEST [306] LOG: aborting any active transactions > 2018-09-21 20:49:10.498 CEST [637] FATAL: terminating autovacuum process due to administrator command > 2018-09-21 20:49:10.500 CEST [306] LOG: worker process: logical replication launcher (PID 636) exited with code 1 > 2018-09-21 20:49:10.501 CEST [349] LOG: shutting down > 2018-09-21 20:49:10.518 CEST [306] LOG: database system is shut down > ``` > > Any Ideas? I'd really like to fix this since now after every reboot I have to > remember to manually restart it. Linux distro and version? Assuming Postgres version 10+ given logical replication warning. Still actual version would be nice. How did you install Postgres? Where did systemd script come from? What is in the systemd script? What does the log show when you do a successful manual start? What does the system log show when the Postgres reboot startup fails? > > Thanks. > > -- Adrian Klaver adrian.klaver@aklaver.com
Re: postgresql systemd service fails to start only on boot but notmanually
From
Christoph Moench-Tegeder
Date:
## Doron Behar (doron.behar@gmail.com): > My server fails to start PostgreSQL only on boot, if I restart it > manually afterwards it doesn't have any problem starting. Here is the > log extracted from the journal: > > ``` > 2018-09-21 20:46:40.028 CEST [306] LOG: listening on IPv4 address "127.0.0.1", port 5432 > 2018-09-21 20:46:40.036 CEST [306] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432" > 2018-09-21 20:46:40.233 CEST [337] LOG: database system was shut down at 2018-09-21 20:46:21 CEST > 2018-09-21 20:48:10.441 CEST [352] WARNING: worker took too long to start; canceled > 2018-09-21 20:49:10.469 CEST [352] WARNING: worker took too long to start; canceled This would indicate that your machine is overloaded during start - perhaps there's just too much being started at the same time? ObRant: that's what happens if people take "system startup duration" as a benchmark and optimize for that - sure, running one clumsy shell script after another isn't effective usage of today's systems, but starting eight dozens programs all at once may have other side effects. Really, with the hardware taking small ages to find it's own arse before even loading the boot loader, those few seconds weren't worth optimizing - and if people reboot their computers so often that startup time takes a measurable toll on their productive day, perhaps they should rather spend their time thinking about their usage pattern than "optimizing" the startup process. So, now that I've got that off my chest... your machine propably tries to do too much at the same time when booting: the worker processes take longer than 90 seconds to start. Slow CPU or storage maybe? > 2018-09-21 20:49:10.478 CEST [306] LOG: database system is ready to accept connections > 2018-09-21 20:49:10.486 CEST [306] LOG: received fast shutdown request And in the mean time, systemd has lost it's patience, declares the start as failed and terminates the process group. (The default systemd timeout is 90 seconds, at least in some releases of systemd, so this fits quite nicely). You could try to work around this by increasing TimeoutStartSec in postgresql's systemd unit (or even globally), which perhaps only hides the problem until the next service suddenly doesn't start anymore. You could move postgresql to the end of the boot order by adding "After=..." to the Unit section of the systemd service file, the value behind "After=" being all the other services in the same target, which should reduce parallelism and improve PostgreSQL's startup behaviour. A more advanced variant of that would be to create a new systemd target, make that start "After" multiuser.target or even graphical.target (depending on your setup), make sure it "Requires" the current default systemd target and make postgresql the only additional service in that target. (This would be the cleanest solution, but you should get some grasp of systemd and how your specific distribution uses it before meddling with the default targets; I don't know every distribution/version variant of systemd integration, so I can't give that specific instructions here). Or you figure out what the heck your machine is running during startup any why it is that slow, and try to fix that. Regards, Christoph -- Spare Space
On Sat, Sep 22, 2018 at 07:14:33AM -0700, Adrian Klaver wrote: > > Linux distro and version? Arch Linux $ uname -a Linux vps 4.18.9-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 19 21:19:17 UTC 2018 x86_64 GNU/Linux It's a VPS with one CPU core on it and 2G RAM - not very much I know, hosted on OVH > > Assuming Postgres version 10+ given logical replication warning. Still > actual version would be nice. > > How did you install Postgres? With the package manager and the files used in it's build are viewable from here: https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/postgresql > > Where did systemd script come from? > > What is in the systemd script? The systemd service was installed with the package [Unit] Description=PostgreSQL database server After=network.target [Service] Type=notify TimeoutSec=120 User=postgres Group=postgres Environment=PGROOT=/var/lib/postgres SyslogIdentifier=postgres PIDFile=/var/lib/postgres/data/postmaster.pid RuntimeDirectory=postgresql RuntimeDirectoryMode=755 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGROOT}/data ExecStart=/usr/bin/postgres -D ${PGROOT}/data ExecReload=/bin/kill -HUP ${MAINPID} KillMode=mixed KillSignal=SIGINT # Due to PostgreSQL's use of shared memory, OOM killer is often overzealous in # killing Postgres, so adjust it downward OOMScoreAdjust=-200 # Additional security-related features PrivateTmp=true ProtectHome=true ProtectSystem=full NoNewPrivileges=true [Install] WantedBy=multi-user.target > > What does the log show when you do a successful manual start? 2018-09-22 09:38:44.470 CEST [15251] LOG: listening on IPv4 address "127.0.0.1", port 5432 2018-09-22 09:38:44.472 CEST [15251] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432" 2018-09-22 09:38:44.485 CEST [15253] LOG: database system was shut down at 2018-09-21 20:49:10 CEST 2018-09-22 09:38:44.490 CEST [15251] LOG: database system is ready to accept connections > > What does the system log show when the Postgres reboot startup fails? This was posted in my 1st message.
On Sat, Sep 22, 2018 at 04:58:18PM +0200, Christoph Moench-Tegeder wrote: > ## Doron Behar (doron.behar@gmail.com): > > > My server fails to start PostgreSQL only on boot, if I restart it > > manually afterwards it doesn't have any problem starting. Here is the > > log extracted from the journal: > > > > ``` > > 2018-09-21 20:46:40.028 CEST [306] LOG: listening on IPv4 address "127.0.0.1", port 5432 > > 2018-09-21 20:46:40.036 CEST [306] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432" > > 2018-09-21 20:46:40.233 CEST [337] LOG: database system was shut down at 2018-09-21 20:46:21 CEST > > 2018-09-21 20:48:10.441 CEST [352] WARNING: worker took too long to start; canceled > > 2018-09-21 20:49:10.469 CEST [352] WARNING: worker took too long to start; canceled > > This would indicate that your machine is overloaded during start - > perhaps there's just too much being started at the same time? > ObRant: that's what happens if people take "system startup duration" > as a benchmark and optimize for that - sure, running one clumsy shell > script after another isn't effective usage of today's systems, > but starting eight dozens programs all at once may have other > side effects. Really, with the hardware taking small ages to find > it's own arse before even loading the boot loader, those few seconds > weren't worth optimizing - and if people reboot their computers so > often that startup time takes a measurable toll on their productive > day, perhaps they should rather spend their time thinking about their > usage pattern than "optimizing" the startup process. > > So, now that I've got that off my chest... your machine propably tries to > do too much at the same time when booting: the worker processes take > longer than 90 seconds to start. Slow CPU or storage maybe? > > > 2018-09-21 20:49:10.478 CEST [306] LOG: database system is ready to accept connections > > 2018-09-21 20:49:10.486 CEST [306] LOG: received fast shutdown request > > And in the mean time, systemd has lost it's patience, declares the > start as failed and terminates the process group. (The default systemd > timeout is 90 seconds, at least in some releases of systemd, so > this fits quite nicely). > > You could try to work around this by increasing TimeoutStartSec > in postgresql's systemd unit (or even globally), which perhaps > only hides the problem until the next service suddenly doesn't > start anymore. > You could move postgresql to the end of the boot order by > adding "After=..." to the Unit section of the systemd service > file, the value behind "After=" being all the other services in > the same target, which should reduce parallelism and improve > PostgreSQL's startup behaviour. > A more advanced variant of that would be to create a new > systemd target, make that start "After" multiuser.target > or even graphical.target (depending on your setup), make sure > it "Requires" the current default systemd target and make > postgresql the only additional service in that target. > (This would be the cleanest solution, but you should get some > grasp of systemd and how your specific distribution uses it > before meddling with the default targets; I don't know every > distribution/version variant of systemd integration, so I > can't give that specific instructions here). > Or you figure out what the heck your machine is running > during startup any why it is that slow, and try to fix that. > > Regards, > Christoph Thanks for your very detailed answer, that helped me a lot. I've increased `TimeoutSec=` to infinity in the systemd service since it was set initially to 120 seconds which apparently wasn't enough for my poor VPS with 2G RAM and 1 CPU core. That worked great, I still feel like I have slow startups but at least PostgreSQL doesn't totally fail to start on boot. I'll try to debug the slow startups on my own, thanks again for everything! Doron.