Thread: Standby Server Bus 7 error
Hi,
Any assistance in interpreting the logs is much appreciated.
Replication server crashes. Below is snippet from log.
2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr
imary at 12/B0000000 on timeline 1
2014-07-22 23:43:12 EDT FATAL: could not receive data from
WAL stream: server closed the connection unexpectedly
This probably means the server terminated a
bnormally
before or while processing the request.
cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D0': No such file or directory
2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000
00 in log segment 0000000100000012000000D0, offset 0
2014-07-22 23:43:28 EDT LOG: restored log file "0000000100
000012000000D0" from archive
cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D1': No such file or directory
2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000
00 in log segment 0000000100000012000000D1, offset 0
2014-07-22 23:43:28 EDT LOG: started streaming WAL from primary at 12/D1000000 on timeline 1
2014-07-22 23:53:13 EDT FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
cp: cannot stat '/var/lib/postgresql/archive/0000000100000012000000F1': No such file or directory
2014-07-22 23:53:13 EDT LOG: unexpected pageaddr 12/AA000000 in log segment 0000000100000012000000F1, offset 0
2014-07-22 23:53:13 EDT LOG: started streaming WAL from primary at 12/F1000000 on timeline 1
Upon trying a /etc/init.d/postgresql restart
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D" from archive
2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at 1B/1DFC64C0
2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only connections
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E" from archive
2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by signal 7: Bus error
2014-07-23 14:47:56 EDT LOG: terminating any other acti server processes
Recovery.conf
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
standby_mode = 'on'
primary_conninfo = 'host=[ipaddress] port=5432 user=sherweb_standby_server sslmode=require'
archive_cleanup_command = '/usr/lib/postgresql/9.3/bin/pg_archivecleanup /var/lib/postgresql/9.3/archive/ %r'
On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote: > Any assistance in interpreting the logs is much appreciated. > Replication server crashes. Below is snippet from log. > 2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr > imary at 12/B0000000 on timeline 1 > 2014-07-22 23:43:12 EDT FATAL: could not receive data from > WAL stream: server closed the connection unexpectedly > This probably means the server terminated a > bnormally > before or while processing the request. The replication stream has finished. Did you root server crash in some way? > cp: cannot stat '/var/lib/postgresql/archive/00000001000000 > 12000000D0': No such file or directory > 2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000 > 00 in log segment 0000000100000012000000D0, offset 0 This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycled and is not yet overwritten with new data. Did your archives get corrupted in some way by a crash of master? > 2014-07-22 23:43:28 EDT LOG: restored log file "0000000100 > 000012000000D0" from archive > cp: cannot stat '/var/lib/postgresql/archive/00000001000000 > 12000000D1': No such file or directory > 2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000 > 00 in log segment 0000000100000012000000D1, offset 0 Same here. > Upon trying a /etc/init.d/postgresql restart > 2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D" > from archive > 2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at > 1B/1DFC64C0 > 2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only > connections > 2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E" > from archive > 2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by > signal 7: Bus error This may find its root to a hardware problem... What is the version of 9.3 you are using btw? -- Michael
Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> >> 2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by signal 7: Bus error > This may find its root to a hardware problem... ... or a VM bug, which is the most frequent cause I have seen. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
-----Original Message----- From: Michael Paquier [mailto:michael.paquier@gmail.com] Sent: Wednesday, July 23, 2014 10:25 PM To: Fabio Milano Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Standby Server Bus 7 error On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote: > Any assistance in interpreting the logs is much appreciated. > Replication server crashes. Below is snippet from log. > 2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr imary at > 12/B0000000 on timeline 1 > 2014-07-22 23:43:12 EDT FATAL: could not receive data from WAL > stream: server closed the connection unexpectedly > This probably means the server terminated a bnormally > before or while processing the request. The replication stream has finished. Did you root server crash in some way? > cp: cannot stat '/var/lib/postgresql/archive/00000001000000 > 12000000D0': No such file or directory > 2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000 > 00 in log segment 0000000100000012000000D0, offset 0 This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycledand is not yet overwritten with new data. Did your archives get corrupted in some way by a crash of master? > 2014-07-22 23:43:28 EDT LOG: restored log file "0000000100 > 000012000000D0" from archive > cp: cannot stat '/var/lib/postgresql/archive/00000001000000 > 12000000D1': No such file or directory > 2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000 > 00 in log segment 0000000100000012000000D1, offset 0 Same here. > Upon trying a /etc/init.d/postgresql restart > 2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D" > from archive > 2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at > 1B/1DFC64C0 > 2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only > connections > 2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E" > from archive > 2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by > signal 7: Bus error This may find its root to a hardware problem... What is the version of 9.3 you are using btw? -- Michael Thank you Michael, No report of crashing on the Master. >> This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycledand is not yet overwritten with new data. Could this be caused by something happening by the developers on the Master? The Archive command on the master is: 'rsync -az %p postgres@[ipaddress]:archive/%f' >> This may find its root to a hardware problem... The standby server is a VPS. Previously 9.0.13 replication was working . 9.0.13 was removed. OS upgraded to Ubuntu 14.04 and Postgres 9.3.4 installed Master new install OS Ubuntu 14.04 and new postgres 9.3.4 install Version is: psql (PostgreSQL) 9.3.4 Performed a manual resync yesterday however another crash this morning. Standby - /etc/init.d/postgresql stop Master - psql -c "select pg_start_backup('label', true)" Master - rsync -avc --exclude postmaster.pid --exclude postgresql.pid /var/lib/postgresql/9.3/main/* [ip address]:/var/lib/postgresql/9.3/main/ Master - psql -c "SELECT pg_stop_backup()" Standby - /etc/init.d/postgresql start Another crash. 2014-07-24 09:58:31 EDT LOG: restored log file "000000010000003300000096" from archive cp: cannot stat '/var/lib/postgresql/archive/000000010000003300000097': No such file or directory 2014-07-24 09:58:31 EDT LOG: unexpected pageaddr 33/5C000000 in log segment 000000010000003300000097, offset 0 2014-07-24 09:58:32 EDT LOG: started streaming WAL from primary at 33/97000000 on timeline 1 2014-07-24 10:03:50 EDT FATAL: could not receive data from WAL stream: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B7': No such file or directory 2014-07-24 10:03:50 EDT LOG: unexpected pageaddr 33/6A000000 in log segment 0000000100000033000000B7, offset 0 2014-07-24 10:04:06 EDT LOG: restored log file "0000000100000033000000B7" from archive cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B8': No such file or directory 2014-07-24 10:04:06 EDT LOG: unexpected pageaddr 33/76000000 in log segment 0000000100000033000000B8, offset 0 2014-07-24 10:04:06 EDT LOG: started streaming WAL from primary at 33/B8000000 on timeline 1 2014-07-24 10:06:24 EDT LOG: startup process (PID 18905) was terminated by signal 7: Bus error 2014-07-24 10:06:24 EDT LOG: terminating any other active server processes /etc/init.d/postgresql restart 2014-07-24 15:19:55 EDT LOG: restored log file "000000010000003400000009" from archive 2014-07-24 15:19:57 EDT LOG: startup process (PID 22458) was terminated by signal 7: Bus error 2014-07-24 15:19:57 EDT LOG: terminating any other active server processes
-----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Fabio Milano Sent: Thursday, July 24, 2014 12:40 PM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Standby Server Bus 7 error -----Original Message----- From: Michael Paquier [mailto:michael.paquier@gmail.com] Sent: Wednesday, July 23, 2014 10:25 PM To: Fabio Milano Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Standby Server Bus 7 error On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote: > Any assistance in interpreting the logs is much appreciated. > Replication server crashes. Below is snippet from log. > 2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr imary at > 12/B0000000 on timeline 1 > 2014-07-22 23:43:12 EDT FATAL: could not receive data from WAL > stream: server closed the connection unexpectedly > This probably means the server terminated a bnormally > before or while processing the request. The replication stream has finished. Did you root server crash in some way? > cp: cannot stat '/var/lib/postgresql/archive/00000001000000 > 12000000D0': No such file or directory > 2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000 > 00 in log segment 0000000100000012000000D0, offset 0 This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycledand is not yet overwritten with new data. Did your archives get corrupted in some way by a crash of master? > 2014-07-22 23:43:28 EDT LOG: restored log file "0000000100 > 000012000000D0" from archive > cp: cannot stat '/var/lib/postgresql/archive/00000001000000 > 12000000D1': No such file or directory > 2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000 > 00 in log segment 0000000100000012000000D1, offset 0 Same here. > Upon trying a /etc/init.d/postgresql restart > 2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D" > from archive > 2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at > 1B/1DFC64C0 > 2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only > connections > 2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E" > from archive > 2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by > signal 7: Bus error This may find its root to a hardware problem... What is the version of 9.3 you are using btw? -- Michael Thank you Michael, No report of crashing on the Master. >> This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycledand is not yet overwritten with new data. Could this be caused by something happening by the developers on the Master? The Archive command on the master is: 'rsync -az %p postgres@[ipaddress]:archive/%f' >> This may find its root to a hardware problem... The standby server is a VPS. Previously 9.0.13 replication was working . 9.0.13 was removed. OS upgraded to Ubuntu 14.04 and Postgres 9.3.4 installed Master new install OS Ubuntu 14.04 and new postgres9.3.4 install Version is: psql (PostgreSQL) 9.3.4 Performed a manual resync yesterday however another crash this morning. Standby - /etc/init.d/postgresql stop Master - psql -c "select pg_start_backup('label', true)" Master - rsync -avc --exclude postmaster.pid --exclude postgresql.pid /var/lib/postgresql/9.3/main/* [ip address]:/var/lib/postgresql/9.3/main/ Master - psql -c "SELECT pg_stop_backup()" Standby - /etc/init.d/postgresql start Another crash. 2014-07-24 09:58:31 EDT LOG: restored log file "000000010000003300000096" from archive cp: cannot stat '/var/lib/postgresql/archive/000000010000003300000097': No such file or directory 2014-07-24 09:58:31 EDT LOG: unexpected pageaddr 33/5C000000 in log segment 000000010000003300000097, offset 0 2014-07-24 09:58:32 EDT LOG: started streaming WAL from primary at 33/97000000 on timeline 1 2014-07-24 10:03:50 EDT FATAL: could not receive data from WAL stream: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B7': No such file or directory 2014-07-24 10:03:50 EDT LOG: unexpected pageaddr 33/6A000000 in log segment 0000000100000033000000B7, offset 0 2014-07-24 10:04:06 EDT LOG: restored log file "0000000100000033000000B7" from archive cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B8': No such file or directory 2014-07-24 10:04:06 EDT LOG: unexpected pageaddr 33/76000000 in log segment 0000000100000033000000B8, offset 0 2014-07-24 10:04:06 EDT LOG: started streaming WAL from primary at 33/B8000000 on timeline 1 2014-07-24 10:06:24 EDT LOG: startup process (PID 18905) was terminated by signal 7: Bus error 2014-07-24 10:06:24 EDT LOG: terminating any other active server processes /etc/init.d/postgresql restart 2014-07-24 15:19:55 EDT LOG: restored log file "000000010000003400000009" from archive 2014-07-24 15:19:57 EDT LOG: startup process (PID 22458) was terminated by signal 7: Bus error 2014-07-24 15:19:57 EDT LOG: terminating any other active server processes -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general >> Confirmed with Developers: >> We haven't done any rollbacks, and haven't seen any crashes.
On Thu, Jul 24, 2014 at 9:48 AM, Kevin Grittner <kgrittn@ymail.com> wrote: > Michael Paquier <michael.paquier@gmail.com> wrote: > >> On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> > >>> 2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by signal 7: Bus error >> This may find its root to a hardware problem... > > ... or a VM bug, which is the most frequent cause I have seen. yeah, in particular, with OpenVZ. Is this server virtualized? merlin
Yes. The Standby/Slave server is a VPS. Virtuozzo. -----Original Message----- From: Merlin Moncure [mailto:mmoncure@gmail.com] Sent: Monday, July 28, 2014 9:47 AM To: Kevin Grittner Cc: Michael Paquier; Fabio Milano; pgsql-general@postgresql.org Subject: Re: [GENERAL] Standby Server Bus 7 error On Thu, Jul 24, 2014 at 9:48 AM, Kevin Grittner <kgrittn@ymail.com> wrote: > Michael Paquier <michael.paquier@gmail.com> wrote: > >> On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> > >>> 2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by signal 7: Bus error >> This may find its root to a hardware problem... > > ... or a VM bug, which is the most frequent cause I have seen. yeah, in particular, with OpenVZ. Is this server virtualized? merlin
On Mon, Jul 28, 2014 at 11:50 AM, Fabio Milano <fmilano@crdsgroup.com> wrote: > Yes. > > The Standby/Slave server is a VPS. Virtuozzo. Virtuozzo is based on OpenVZ. So it's time to look for OpenVZ bus errors and what people did to get around them. In at least one case reducing shared buffers fixed the problem. merlin
Thanks you! -----Original Message----- From: Merlin Moncure [mailto:mmoncure@gmail.com] Sent: Monday, July 28, 2014 9:54 AM To: Fabio Milano Cc: Kevin Grittner; Michael Paquier; pgsql-general@postgresql.org Subject: Re: [GENERAL] Standby Server Bus 7 error On Mon, Jul 28, 2014 at 11:50 AM, Fabio Milano <fmilano@crdsgroup.com> wrote: > Yes. > > The Standby/Slave server is a VPS. Virtuozzo. Virtuozzo is based on OpenVZ. So it's time to look for OpenVZ bus errors and what people did to get around them. In at leastone case reducing shared buffers fixed the problem. merlin