Thread: Standby Server Bus 7 error

Standby Server Bus 7 error

From
Fabio Milano
Date:

Hi,

 

Any assistance in interpreting the logs is much appreciated.

 

Replication server crashes. Below is snippet from log.

 

2014-07-22 23:36:23 EDT LOG:  started streaming WAL from pr

imary at 12/B0000000 on timeline 1

2014-07-22 23:43:12 EDT FATAL:  could not receive data from

WAL stream: server closed the connection unexpectedly

                This probably means the server terminated a

bnormally

                before or while processing the request.

 

cp: cannot stat '/var/lib/postgresql/archive/00000001000000

12000000D0': No such file or directory

2014-07-22 23:43:12 EDT LOG:  unexpected pageaddr 12/8D0000

00 in log segment 0000000100000012000000D0, offset 0

2014-07-22 23:43:28 EDT LOG:  restored log file "0000000100

000012000000D0" from archive

cp: cannot stat '/var/lib/postgresql/archive/00000001000000

12000000D1': No such file or directory

2014-07-22 23:43:28 EDT LOG:  unexpected pageaddr 12/A50000

00 in log segment 0000000100000012000000D1, offset 0

2014-07-22 23:43:28 EDT LOG:  started streaming WAL from primary at 12/D1000000 on timeline 1

2014-07-22 23:53:13 EDT FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly

                This probably means the server terminated abnormally

                before or while processing the request.

 

cp: cannot stat '/var/lib/postgresql/archive/0000000100000012000000F1': No such file or directory

2014-07-22 23:53:13 EDT LOG:  unexpected pageaddr 12/AA000000 in log segment 0000000100000012000000F1, offset 0

2014-07-22 23:53:13 EDT LOG:  started streaming WAL from primary at 12/F1000000 on timeline 1

 

 

 

 

Upon trying a /etc/init.d/postgresql restart

 

2014-07-23 14:47:55 EDT LOG:  restored log file "0000000   00001B0000001D" from archive

2014-07-23 14:47:55 EDT LOG:  consistent recovery state    ched at 1B/1DFC64C0

2014-07-23 14:47:55 EDT LOG:  database system is ready t   ccept read only connections

2014-07-23 14:47:55 EDT LOG:  restored log file "0000000   00001B0000001E" from archive

2014-07-23 14:47:56 EDT LOG:  startup process (PID 730)     terminated by signal 7: Bus error

2014-07-23 14:47:56 EDT LOG:  terminating any other acti   server processes

 

Recovery.conf

 

restore_command = 'cp /var/lib/postgresql/archive/%f %p'

standby_mode = 'on'

primary_conninfo = 'host=[ipaddress] port=5432 user=sherweb_standby_server sslmode=require'

archive_cleanup_command = '/usr/lib/postgresql/9.3/bin/pg_archivecleanup /var/lib/postgresql/9.3/archive/ %r'

Re: Standby Server Bus 7 error

From
Michael Paquier
Date:
On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote:
> Any assistance in interpreting the logs is much appreciated.
> Replication server crashes. Below is snippet from log.
> 2014-07-22 23:36:23 EDT LOG:  started streaming WAL from pr
> imary at 12/B0000000 on timeline 1
> 2014-07-22 23:43:12 EDT FATAL:  could not receive data from
> WAL stream: server closed the connection unexpectedly
>                 This probably means the server terminated a
> bnormally
>                 before or while processing the request.
The replication stream has finished. Did you root server crash in some way?

> cp: cannot stat '/var/lib/postgresql/archive/00000001000000
> 12000000D0': No such file or directory
> 2014-07-22 23:43:12 EDT LOG:  unexpected pageaddr 12/8D0000
> 00 in log segment 0000000100000012000000D0, offset 0
This means that server expected new WAL, but found instead an old
record, something that happens when a WAL file is recycled and is not
yet overwritten with new data. Did your archives get corrupted in some
way by a crash of master?

> 2014-07-22 23:43:28 EDT LOG:  restored log file "0000000100
> 000012000000D0" from archive
> cp: cannot stat '/var/lib/postgresql/archive/00000001000000
> 12000000D1': No such file or directory
> 2014-07-22 23:43:28 EDT LOG:  unexpected pageaddr 12/A50000
> 00 in log segment 0000000100000012000000D1, offset 0
Same here.

> Upon trying a /etc/init.d/postgresql restart
> 2014-07-23 14:47:55 EDT LOG:  restored log file "0000000   00001B0000001D"
> from archive
> 2014-07-23 14:47:55 EDT LOG:  consistent recovery state    ched at
> 1B/1DFC64C0
> 2014-07-23 14:47:55 EDT LOG:  database system is ready t   ccept read only
> connections
> 2014-07-23 14:47:55 EDT LOG:  restored log file "0000000   00001B0000001E"
> from archive
> 2014-07-23 14:47:56 EDT LOG:  startup process (PID 730)     terminated by
> signal 7: Bus error
This may find its root to a hardware problem...

What is the version of 9.3 you are using btw?
--
Michael


Re: Standby Server Bus 7 error

From
Kevin Grittner
Date:
Michael Paquier <michael.paquier@gmail.com> wrote:

> On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com>

>>  2014-07-23 14:47:56 EDT LOG:  startup process (PID 730)     terminated by  signal 7: Bus error
> This may find its root to a hardware problem...

... or a VM bug, which is the most frequent cause I have seen.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Standby Server Bus 7 error

From
Fabio Milano
Date:

-----Original Message-----
From: Michael Paquier [mailto:michael.paquier@gmail.com] 
Sent: Wednesday, July 23, 2014 10:25 PM
To: Fabio Milano
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote:
> Any assistance in interpreting the logs is much appreciated.
> Replication server crashes. Below is snippet from log.
> 2014-07-22 23:36:23 EDT LOG:  started streaming WAL from pr imary at 
> 12/B0000000 on timeline 1
> 2014-07-22 23:43:12 EDT FATAL:  could not receive data from WAL 
> stream: server closed the connection unexpectedly
>                 This probably means the server terminated a bnormally
>                 before or while processing the request.
The replication stream has finished. Did you root server crash in some way?

> cp: cannot stat '/var/lib/postgresql/archive/00000001000000
> 12000000D0': No such file or directory
> 2014-07-22 23:43:12 EDT LOG:  unexpected pageaddr 12/8D0000
> 00 in log segment 0000000100000012000000D0, offset 0
This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is
recycledand is not yet overwritten with new data. Did your archives get corrupted in some way by a crash of master?
 

> 2014-07-22 23:43:28 EDT LOG:  restored log file "0000000100 
> 000012000000D0" from archive
> cp: cannot stat '/var/lib/postgresql/archive/00000001000000
> 12000000D1': No such file or directory
> 2014-07-22 23:43:28 EDT LOG:  unexpected pageaddr 12/A50000
> 00 in log segment 0000000100000012000000D1, offset 0
Same here.

> Upon trying a /etc/init.d/postgresql restart
> 2014-07-23 14:47:55 EDT LOG:  restored log file "0000000   00001B0000001D"
> from archive
> 2014-07-23 14:47:55 EDT LOG:  consistent recovery state    ched at
> 1B/1DFC64C0
> 2014-07-23 14:47:55 EDT LOG:  database system is ready t   ccept read only
> connections
> 2014-07-23 14:47:55 EDT LOG:  restored log file "0000000   00001B0000001E"
> from archive
> 2014-07-23 14:47:56 EDT LOG:  startup process (PID 730)     terminated by
> signal 7: Bus error
This may find its root to a hardware problem...

What is the version of 9.3 you are using btw?
--
Michael

Thank you Michael,

No report of crashing on the Master.

>> This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is
recycledand is not yet overwritten with new data.
 

Could this be caused by something happening by the developers on the Master?

 The Archive command on the master is: 
'rsync -az %p postgres@[ipaddress]:archive/%f'


>> This may find its root to a hardware problem...
The standby server is a VPS. Previously 9.0.13 replication was working . 
9.0.13 was removed. OS upgraded to Ubuntu 14.04 and Postgres 9.3.4 installed
Master new install OS Ubuntu 14.04 and new postgres 9.3.4 install

Version is:

psql (PostgreSQL) 9.3.4

Performed a manual resync yesterday however another crash this morning.

Standby - /etc/init.d/postgresql stop
Master - psql -c "select pg_start_backup('label', true)"
Master - rsync -avc --exclude postmaster.pid --exclude postgresql.pid /var/lib/postgresql/9.3/main/* [ip
address]:/var/lib/postgresql/9.3/main/
Master - psql -c "SELECT pg_stop_backup()"
Standby - /etc/init.d/postgresql start

Another crash.

2014-07-24 09:58:31 EDT LOG:  restored log file "000000010000003300000096" from archive
cp: cannot stat '/var/lib/postgresql/archive/000000010000003300000097': No such file or directory
2014-07-24 09:58:31 EDT LOG:  unexpected pageaddr 33/5C000000 in log segment 000000010000003300000097, offset 0
2014-07-24 09:58:32 EDT LOG:  started streaming WAL from primary at 33/97000000 on timeline 1
2014-07-24 10:03:50 EDT FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.

cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B7': No such file or directory
2014-07-24 10:03:50 EDT LOG:  unexpected pageaddr 33/6A000000 in log segment 0000000100000033000000B7, offset 0
2014-07-24 10:04:06 EDT LOG:  restored log file "0000000100000033000000B7" from archive
cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B8': No such file or directory
2014-07-24 10:04:06 EDT LOG:  unexpected pageaddr 33/76000000 in log segment 0000000100000033000000B8, offset 0
2014-07-24 10:04:06 EDT LOG:  started streaming WAL from primary at 33/B8000000 on timeline 1
2014-07-24 10:06:24 EDT LOG:  startup process (PID 18905) was terminated by signal 7: Bus error
2014-07-24 10:06:24 EDT LOG:  terminating any other active server processes

/etc/init.d/postgresql restart

2014-07-24 15:19:55 EDT LOG:  restored log file "000000010000003400000009" from archive
2014-07-24 15:19:57 EDT LOG:  startup process (PID 22458) was terminated by signal 7: Bus error
2014-07-24 15:19:57 EDT LOG:  terminating any other active server processes

Re: Standby Server Bus 7 error

From
Fabio Milano
Date:

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Fabio Milano
Sent: Thursday, July 24, 2014 12:40 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error



-----Original Message-----
From: Michael Paquier [mailto:michael.paquier@gmail.com]
Sent: Wednesday, July 23, 2014 10:25 PM
To: Fabio Milano
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote:
> Any assistance in interpreting the logs is much appreciated.
> Replication server crashes. Below is snippet from log.
> 2014-07-22 23:36:23 EDT LOG:  started streaming WAL from pr imary at
> 12/B0000000 on timeline 1
> 2014-07-22 23:43:12 EDT FATAL:  could not receive data from WAL
> stream: server closed the connection unexpectedly
>                 This probably means the server terminated a bnormally
>                 before or while processing the request.
The replication stream has finished. Did you root server crash in some way?

> cp: cannot stat '/var/lib/postgresql/archive/00000001000000
> 12000000D0': No such file or directory
> 2014-07-22 23:43:12 EDT LOG:  unexpected pageaddr 12/8D0000
> 00 in log segment 0000000100000012000000D0, offset 0
This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is
recycledand is not yet overwritten with new data. Did your archives get corrupted in some way by a crash of master?
 

> 2014-07-22 23:43:28 EDT LOG:  restored log file "0000000100 
> 000012000000D0" from archive
> cp: cannot stat '/var/lib/postgresql/archive/00000001000000
> 12000000D1': No such file or directory
> 2014-07-22 23:43:28 EDT LOG:  unexpected pageaddr 12/A50000
> 00 in log segment 0000000100000012000000D1, offset 0
Same here.

> Upon trying a /etc/init.d/postgresql restart
> 2014-07-23 14:47:55 EDT LOG:  restored log file "0000000   00001B0000001D"
> from archive
> 2014-07-23 14:47:55 EDT LOG:  consistent recovery state    ched at
> 1B/1DFC64C0
> 2014-07-23 14:47:55 EDT LOG:  database system is ready t   ccept read only
> connections
> 2014-07-23 14:47:55 EDT LOG:  restored log file "0000000   00001B0000001E"
> from archive
> 2014-07-23 14:47:56 EDT LOG:  startup process (PID 730)     terminated by
> signal 7: Bus error
This may find its root to a hardware problem...

What is the version of 9.3 you are using btw?
--
Michael

Thank you Michael,

No report of crashing on the Master.

>> This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is
recycledand is not yet overwritten with new data.
 

Could this be caused by something happening by the developers on the Master?

 The Archive command on the master is: 
'rsync -az %p postgres@[ipaddress]:archive/%f'


>> This may find its root to a hardware problem...
The standby server is a VPS. Previously 9.0.13 replication was working . 
9.0.13 was removed. OS upgraded to Ubuntu 14.04 and Postgres 9.3.4 installed Master new install OS Ubuntu 14.04 and new
postgres9.3.4 install
 

Version is:

psql (PostgreSQL) 9.3.4

Performed a manual resync yesterday however another crash this morning.

Standby - /etc/init.d/postgresql stop
Master - psql -c "select pg_start_backup('label', true)"
Master - rsync -avc --exclude postmaster.pid --exclude postgresql.pid /var/lib/postgresql/9.3/main/* [ip
address]:/var/lib/postgresql/9.3/main/
Master - psql -c "SELECT pg_stop_backup()"
Standby - /etc/init.d/postgresql start

Another crash.

2014-07-24 09:58:31 EDT LOG:  restored log file "000000010000003300000096" from archive
cp: cannot stat '/var/lib/postgresql/archive/000000010000003300000097': No such file or directory
2014-07-24 09:58:31 EDT LOG:  unexpected pageaddr 33/5C000000 in log segment 000000010000003300000097, offset 0
2014-07-24 09:58:32 EDT LOG:  started streaming WAL from primary at 33/97000000 on timeline 1
2014-07-24 10:03:50 EDT FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.

cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B7': No such file or directory
2014-07-24 10:03:50 EDT LOG:  unexpected pageaddr 33/6A000000 in log segment 0000000100000033000000B7, offset 0
2014-07-24 10:04:06 EDT LOG:  restored log file "0000000100000033000000B7" from archive
cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B8': No such file or directory
2014-07-24 10:04:06 EDT LOG:  unexpected pageaddr 33/76000000 in log segment 0000000100000033000000B8, offset 0
2014-07-24 10:04:06 EDT LOG:  started streaming WAL from primary at 33/B8000000 on timeline 1
2014-07-24 10:06:24 EDT LOG:  startup process (PID 18905) was terminated by signal 7: Bus error
2014-07-24 10:06:24 EDT LOG:  terminating any other active server processes

/etc/init.d/postgresql restart

2014-07-24 15:19:55 EDT LOG:  restored log file "000000010000003400000009" from archive
2014-07-24 15:19:57 EDT LOG:  startup process (PID 22458) was terminated by signal 7: Bus error
2014-07-24 15:19:57 EDT LOG:  terminating any other active server processes

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

>> Confirmed with Developers:

>> We haven't done any rollbacks, and haven't seen any crashes.

Re: Standby Server Bus 7 error

From
Merlin Moncure
Date:
On Thu, Jul 24, 2014 at 9:48 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Michael Paquier <michael.paquier@gmail.com> wrote:
>
>> On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com>
>
>>>  2014-07-23 14:47:56 EDT LOG:  startup process (PID 730)     terminated by  signal 7: Bus error
>> This may find its root to a hardware problem...
>
> ... or a VM bug, which is the most frequent cause I have seen.

yeah, in particular, with OpenVZ.  Is this server virtualized?

merlin


Re: Standby Server Bus 7 error

From
Fabio Milano
Date:
Yes.

The Standby/Slave server is a VPS. Virtuozzo.

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com] 
Sent: Monday, July 28, 2014 9:47 AM
To: Kevin Grittner
Cc: Michael Paquier; Fabio Milano; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 9:48 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Michael Paquier <michael.paquier@gmail.com> wrote:
>
>> On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com>
>
>>>  2014-07-23 14:47:56 EDT LOG:  startup process (PID 730)     terminated by  signal 7: Bus error
>> This may find its root to a hardware problem...
>
> ... or a VM bug, which is the most frequent cause I have seen.

yeah, in particular, with OpenVZ.  Is this server virtualized?

merlin

Re: Standby Server Bus 7 error

From
Merlin Moncure
Date:
On Mon, Jul 28, 2014 at 11:50 AM, Fabio Milano <fmilano@crdsgroup.com> wrote:
> Yes.
>
> The Standby/Slave server is a VPS. Virtuozzo.

Virtuozzo is based on OpenVZ.  So it's time to look for OpenVZ bus
errors and what people did to get around them. In at least one case
reducing shared buffers fixed the problem.

merlin


Re: Standby Server Bus 7 error

From
Fabio Milano
Date:
Thanks you!

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com] 
Sent: Monday, July 28, 2014 9:54 AM
To: Fabio Milano
Cc: Kevin Grittner; Michael Paquier; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Mon, Jul 28, 2014 at 11:50 AM, Fabio Milano <fmilano@crdsgroup.com> wrote:
> Yes.
>
> The Standby/Slave server is a VPS. Virtuozzo.

Virtuozzo is based on OpenVZ.  So it's time to look for OpenVZ bus errors and what people did to get around them. In at
leastone case reducing shared buffers fixed the problem.
 

merlin