Thread: BUG #8701: recover process hang on slave

BUG #8701: recover process hang on slave

From
amutu@amutu.com
Date:
The following bug has been logged on the website:

Bug reference:      8701
Logged by:          amutu
Email address:      amutu@amutu.com
PostgreSQL version: 9.1.9
Operating system:   CentOS 6 x86-64
Description:

We have a master and two streaming slave PostgreSQL servers. We found that
one slave's replay_location is far behind the other's.


On both, sent_location is BF1/921F6000, and write_location and flush_location
are similar, but one server's replay_location is BF1/9210DD10 while the
other's is 6DE/D958E8.
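To get a sense of how large that gap is, the two replay locations can be converted to byte positions and subtracted. The following is a minimal sketch. It assumes the pre-9.3 WAL addressing with the default 16 MB segment size, where each log id (the hex part before the slash) covers 0xFF000000 bytes because its last segment is never used. The helper names (`parse_lsn`, `lsn_to_bytes`, `lag_bytes`) are hypothetical and are not PostgreSQL functions. From 9.2 onward, the server-side pg_xlog_location_diff() does this calculation, but it is not available in 9.1.

```python
# Hypothetical helpers for estimating the lag between two WAL locations.
# ASSUMPTION: pre-9.3 WAL addressing with the default 16 MB segment size,
# where each log id (the part before the "/") spans 0xFF000000 bytes
# because its last segment is skipped.

SEG_SIZE = 16 * 1024 * 1024                 # assumed default XLOG segment size
SEGS_PER_LOGID = 0xFFFFFFFF // SEG_SIZE     # 255 usable segments per log id
LOGID_SPAN = SEGS_PER_LOGID * SEG_SIZE      # 0xFF000000 bytes per log id


def parse_lsn(lsn: str) -> tuple[int, int]:
    """Split an 'XXX/YYYYYYYY' location into (xlogid, xrecoff)."""
    hi, lo = lsn.split("/")
    return int(hi, 16), int(lo, 16)


def lsn_to_bytes(lsn: str) -> int:
    """Linear byte position of a location under the assumed layout."""
    xlogid, xrecoff = parse_lsn(lsn)
    return xlogid * LOGID_SPAN + xrecoff


def lag_bytes(ahead: str, behind: str) -> int:
    """Bytes of WAL between two locations (ahead minus behind)."""
    return lsn_to_bytes(ahead) - lsn_to_bytes(behind)


if __name__ == "__main__":
    # Replay locations of the healthy and the lagging standby from the report.
    lag = lag_bytes("BF1/9210DD10", "6DE/D958E8")
    print(f"lag: {lag} bytes (~{lag / 2**40:.2f} TiB)")
```

Under these assumptions the lagging standby is several terabytes of WAL behind. By contrast, the healthy standby's replay position is only a small distance behind the common sent_location.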


On the abnormal server, top shows that a postgres process is replaying WAL
segment 00000001000006DE00000000, and the process takes up 100% of one CPU
core.
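The segment name that top shows can be cross-checked against the stuck replay_location. This minimal sketch builds a file name in the timeline/log id/segment form (three 8-digit hex fields). It assumes timeline 1, which matches the leading 00000001 of the name in the report, and the default 16 MB segment size. `wal_file_name` is a hypothetical helper, not a PostgreSQL API.

```python
# Hypothetical helper: map a WAL location to the segment file containing it.
# ASSUMPTIONS: default 16 MB segments; timeline 1 unless told otherwise.

SEG_SIZE = 16 * 1024 * 1024


def wal_file_name(lsn: str, timeline: int = 1) -> str:
    xlogid_hex, xrecoff_hex = lsn.split("/")
    xlogid = int(xlogid_hex, 16)
    segno = int(xrecoff_hex, 16) // SEG_SIZE   # segment number within this log id
    # Three zero-padded 8-digit uppercase hex fields: timeline, log id, segment.
    return f"{timeline:08X}{xlogid:08X}{segno:08X}"


if __name__ == "__main__":
    # replay_location of the lagging standby from the report.
    print(wal_file_name("6DE/D958E8"))
```

For 6DE/D958E8 this prints 00000001000006DE00000000, which is the same segment the startup process is shown recovering. That is consistent with replay being stuck inside that segment.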


I tried to restart the slave, but it failed.
I got a core dump of the process, and it shows:


Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `postgres: startup process   recovering 00000001000006DE00000000'.
#0  0x00000000006264e8 in smgrclose ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.49.tl1.x86_64
(gdb) bt
#0  0x00000000006264e8 in smgrclose ()
#1  0x00000000006265c8 in smgrcloseall ()
#2  0x0000000000495322 in XLogDropDatabase ()
#3  0x0000000000516253 in dbase_redo ()
#4  0x0000000000492d40 in StartupXLOG ()
#5  0x0000000000495148 in StartupProcessMain ()
#6  0x00000000004ac26f in AuxiliaryProcessMain ()
#7  0x00000000005eb383 in StartChildProcess ()
#8  0x00000000005ef3dc in PostmasterMain ()
#9  0x0000000000590fe8 in main ()

Re: BUG #8701: recover process hang on slave

From
Alvaro Herrera
Date:
amutu@amutu.com wrote:

> We have a master and two streaming slave PostgreSQL servers. We found that
> one slave's replay_location is far behind the other's.
>
>
> On both, sent_location is BF1/921F6000, and write_location and flush_location
> are similar, but one server's replay_location is BF1/9210DD10 while the
> other's is 6DE/D958E8.
>
> On the abnormal server, top shows that a postgres process is replaying WAL
> segment 00000001000006DE00000000, and the process takes up 100% of one CPU
> core.

Perhaps you could try running pg_xlogdump on the offending pg_xlog file?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #8701: recover process hang on slave

From
Sergey Konoplev
Date:
Hi,

On Wed, Dec 25, 2013 at 6:47 PM,  <amutu@amutu.com> wrote:
> PostgreSQL version: 9.1.9

The latest minor release, 9.1.11, contains a number of important fixes
affecting replication, so I would suggest upgrading first, as soon as
possible.

http://www.postgresql.org/docs/current/static/release-9-1-11.html
http://www.databasesoup.com/2013/12/why-you-need-to-apply-todays-update.html

> I tried to restart the slave, but it failed.

If that doesn't help, look at the thread below. It might be the same issue.

http://www.postgresql.org/message-id/flat/E1VtTni-00082E-Jv@wrigleys.postgresql.org

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com