Thread: Replication terminated due to PANIC
Hi all,
I have a PostgreSQL 9.2 instance running on a CentOS 6.3 box. Yesterday I set up a hot standby using pg_basebackup. Today I got the alert below from the standby box:
[1] (from line 412,723)
2013-04-24 23:07:18 UTC [13445]: [6-1] user= db= host= PANIC: _bt_restore_page: cannot add item to page
When I checked, replication had stopped because the standby database had shut down. The logs show the messages below:
2013-04-24 23:17:16 UTC [26989]: [5360083-1] user= db= host= ERROR: could not open file "global/14078": No such file or directory
2013-04-24 23:17:16 UTC [26989]: [5360084-1] user= db= host= CONTEXT: writing block 0 of relation global/14078
2013-04-24 23:17:16 UTC [26989]: [5360085-1] user= db= host= WARNING: could not write block 0 of global/14078
2013-04-24 23:17:16 UTC [26989]: [5360086-1] user= db= host= DETAIL: Multiple failures --- write error might be permanent.
I checked the global directory on the master, and 14078 doesn't exist there.
Has anyone faced this issue?

On Wed, Apr 24, 2013 at 5:05 PM, Adarsh Sharma <eddy.adarsh@gmail.com> wrote:
> I have a Postgresql 9.2 instance running on a CentOS6.3 box.Yesterday i
> setup a hot standby by using pgbasebackup. Today i got the below alert from
> standby box :
>
> [1] (from line 412,723)
> 2013-04-24 23:07:18 UTC [13445]: [6-1] user= db= host= PANIC:
> _bt_restore_page: cannot add item to page
>
> When i check, the replication is terminated due to slave DB shutdown. From
> the logs i can see below messages :-

I am not sure that it is your situation but take a look at this thread:

http://www.postgresql.org/message-id/CAL_0b1t=WuM6roO8dki=w8DhH8P8whhohbPjReymmQUrOcNT2A@mail.gmail.com

There is a patch by Andres Freund at the end of the discussion. Three weeks have passed since I installed the patched version and it looks like the patch fixed my issue.

>
> 2013-04-24 23:17:16 UTC [26989]: [5360083-1] user= db= host= ERROR: could
> not open file "global/14078": No such file or directory
> 2013-04-24 23:17:16 UTC [26989]: [5360084-1] user= db= host= CONTEXT:
> writing block 0 of relation global/14078
> 2013-04-24 23:17:16 UTC [26989]: [5360085-1] user= db= host= WARNING: could
> not write block 0 of global/14078
> 2013-04-24 23:17:16 UTC [26989]: [5360086-1] user= db= host= DETAIL:
> Multiple failures --- write error might be permanent.
>
> I checked in global directory of master, the directory 14078 doesn't exist.
>
> Anyone has faced above issue ?
>
> Thanks

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Thanks, Sergey, for such a quick response, but I don't think this is a patch problem, because we have other DB servers running fine on the same version, and the message is also different:
host= PANIC: _bt_restore_page: cannot add item to page
Replication works fine the whole day, but at midnight, when the log rotates, it shows the messages below:
2013-04-24 00:00:00 UTC [26989]: [4945032-1] user= db= host= LOG: checkpoint starting: time
2013-04-24 00:00:00 UTC [26989]: [4945033-1] user= db= host= ERROR: could not open file "global/14078": No such file or directory
2013-04-24 00:00:00 UTC [26989]: [4945034-1] user= db= host= CONTEXT: writing block 0 of relation global/14078
2013-04-24 00:00:00 UTC [26989]: [4945035-1] user= db= host= WARNING: could not write block 0 of global/14078
2013-04-24 00:00:00 UTC [26989]: [4945036-1] user= db= host= DETAIL: Multiple failures --- write error might be permanent.
Looks like some index corruption.
Thanks
On Thu, Apr 25, 2013 at 8:14 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> I am not sure that it is your situation but take a look at this thread:
>
> http://www.postgresql.org/message-id/CAL_0b1t=WuM6roO8dki=w8DhH8P8whhohbPjReymmQUrOcNT2A@mail.gmail.com
>
> There is a patch by Andres Freund in the end of the discussion. Three
> weeks have passed after I installed the patched version and it looks
> like the patch fixed my issue.
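
The log excerpts quoted in this thread all share one prefix layout (timestamp, PID, line counter, then empty user/db/host fields). As a rough sketch only, the Python below parses lines of that shape and counts them by severity, so that a lone PANIC stands out from the repeated checkpoint write errors. The regular expression is inferred from the excerpts and assumes a `log_line_prefix` along the lines of `%t [%p]: [%l-1] user=%u db=%d host=%h `; the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the excerpts in this thread (an assumption, not
# taken from the poster's postgresql.conf).
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<pid>\d+)\]: \[(?P<line>\d+)-\d+\] "
    r"user=(?P<user>\S*) db=(?P<db>\S*) host=(?P<host>\S*) "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count entries per severity and collect the PANIC/FATAL records."""
    counts = Counter()
    serious = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # continuation lines or other formats are skipped
        counts[record["level"]] += 1
        if record["level"] in ("PANIC", "FATAL"):
            serious.append(record)
    return counts, serious


# Lines copied from the first message of the thread.
SAMPLE = [
    "2013-04-24 23:07:18 UTC [13445]: [6-1] user= db= host= PANIC: _bt_restore_page: cannot add item to page",
    '2013-04-24 23:17:16 UTC [26989]: [5360083-1] user= db= host= ERROR: could not open file "global/14078": No such file or directory',
    "2013-04-24 23:17:16 UTC [26989]: [5360085-1] user= db= host= WARNING: could not write block 0 of global/14078",
]

counts, serious = summarize(SAMPLE)
```

On the sample, the PANIC comes from PID 13445 while the write errors come from PID 26989, which is one way to see that the two symptoms were logged by different processes.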
If it's really index corruption, then you should be able to fix it by reindexing. However, that doesn't explain what caused the corruption. Perhaps your hardware is bad in some way?

On Wed, Apr 24, 2013 at 10:46 PM, Adarsh Sharma <eddy.adarsh@gmail.com> wrote:
> Thanks Sergey for such a quick response, but i dont think this is some patch
> problem because we have other DB servers also running fine on same version
> and message is also different :
>
> host= PANIC: _bt_restore_page: cannot add item to page
>
> And the whole day replication is working fine but at midnight when log
> rotates it shows belows msg :
>
> 2013-04-24 00:00:00 UTC [26989]: [4945032-1] user= db= host= LOG:
> checkpoint starting: time
> 2013-04-24 00:00:00 UTC [26989]: [4945033-1] user= db= host= ERROR: could
> not open file "global/14078": No such file or directory
>
> 2013-04-24 00:00:00 UTC [26989]: [4945034-1] user= db= host= CONTEXT:
> writing block 0 of relation global/14078
> 2013-04-24 00:00:00 UTC [26989]: [4945035-1] user= db= host= WARNING: could
> not write block 0 of global/14078
>
> 2013-04-24 00:00:00 UTC [26989]: [4945036-1] user= db= host= DETAIL:
> Multiple failures --- write error might be permanent.
>
> Looks like some index corruption.
>
> Thanks

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand https://netllama.linux-sxs.org

On 2013-04-24 19:44:25 -0700, Sergey Konoplev wrote:
> On Wed, Apr 24, 2013 at 5:05 PM, Adarsh Sharma <eddy.adarsh@gmail.com> wrote:
> > I have a Postgresql 9.2 instance running on a CentOS6.3 box.Yesterday i
> > setup a hot standby by using pgbasebackup. Today i got the below alert from
> > standby box :
> >
> > [1] (from line 412,723)
> > 2013-04-24 23:07:18 UTC [13445]: [6-1] user= db= host= PANIC:
> > _bt_restore_page: cannot add item to page
> >
> > When i check, the replication is terminated due to slave DB shutdown. From
> > the logs i can see below messages :-

Does the global/14078 file exist on the primary? What exact commandline
were you using to restore? Which exact version of postgres?

> I am not sure that it is your situation but take a look at this thread:
>
> http://www.postgresql.org/message-id/CAL_0b1t=WuM6roO8dki=w8DhH8P8whhohbPjReymmQUrOcNT2A@mail.gmail.com
>
> There is a patch by Andres Freund in the end of the discussion.

The issues don't look related.

> Three
> weeks have passed after I installed the patched version and it looks
> like the patch fixed my issue.

Oh, cool! Thanks for verifying.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

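
The first question above (does `global/14078` exist on the primary?) can be checked by hand with `ls`, but when several clusters are involved a small script keeps the answers side by side. The following is a minimal sketch, assuming the data directories are readable from where the script runs; the function names and the example paths in the comment are hypothetical and not taken from the thread.

```python
import os


def relation_file_exists(data_dir, rel_path="global/14078"):
    """Return True if the relation file named in the error exists under data_dir."""
    return os.path.isfile(os.path.join(data_dir, *rel_path.split("/")))


def compare_clusters(data_dirs, rel_path="global/14078"):
    """Map each cluster label to whether the relation file is present there."""
    return {
        label: relation_file_exists(path, rel_path)
        for label, path in data_dirs.items()
    }


# Example call with made-up paths (assumptions, adjust to the real layout):
# compare_clusters({
#     "master": "/var/lib/pgsql/9.2/data",
#     "standby_a": "/data/standby_a",
#     "standby_b": "/data/standby_b",
# })
```

A result where the file is present in one data directory and missing in another would suggest that the process reporting the error is not working against the data directory it is assumed to belong to.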
Sorry, my bad; I didn't mention the full DB version:
9.2.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit
Apart from that, I am happy to report that the issue is fixed now. There are actually two standby setups on the standby box, on different ports, and there were two stale processes (logger and writer) running with different parent IDs on the box. After killing those processes and reloading the conf file, the DB server is replaying logs properly.
@Andres: No, the directory doesn't exist on the master, but it exists on the other standby.
@Lonni: I was guessing because of the message below in the logs:
_bt_restore_page: cannot add item to page
http://en.verysource.com/code/5191515_1/nbtxlog.c.html
Yes, we faced hardware issues on the master, so we failed over to the slave and set up a new SR, which is where we are facing this issue.
I still don't know why this PANIC message appeared. Anyway, thank you all for giving your valuable time to this.
Thanks
On Thu, Apr 25, 2013 at 7:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Does the global/14078 file exist on the primary? What exact commandline
> were you using to restore? Which exact version of postgres?
>
> > I am not sure that it is your situation but take a look at this thread:
> >
> > http://www.postgresql.org/message-id/CAL_0b1t=WuM6roO8dki=w8DhH8P8whhohbPjReymmQUrOcNT2A@mail.gmail.com
> >
> > There is a patch by Andres Freund in the end of the discussion.
>
> The issues don't look related.
>
> > Three
> > weeks have passed after I installed the patched version and it looks
> > like the patch fixed my issue.
>
> Oh, cool! Thanks for verifying.
>
> Greetings,
>
> Andres Freund
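
The fix reported in the last message was to kill two stale helper processes (logger and writer) whose parent was not the expected server process. The Python below is a hedged sketch of how such processes might be spotted from `ps -eo pid,ppid,args` output. It assumes that helper processes show up with a title starting with `postgres: ` and that the main server process runs a binary named `postgres` or `postmaster`. The sample listing is invented for illustration and is not output from the poster's machine.

```python
import os
import subprocess


def parse_ps(output):
    """Parse `ps -eo pid,ppid,args` text into (pid, ppid, args) tuples."""
    procs = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3:
            procs.append((int(parts[0]), int(parts[1]), parts[2]))
    return procs


def is_helper(args):
    """Helper processes set a title such as 'postgres: logger process'."""
    return args.startswith("postgres: ")


def is_postmaster(args):
    """The main server process runs the postgres/postmaster binary itself."""
    exe = os.path.basename(args.split()[0])
    return not is_helper(args) and exe in ("postgres", "postmaster")


def find_stale_helpers(procs):
    """Return helpers whose parent PID is not a running postmaster."""
    postmasters = {pid for pid, _, args in procs if is_postmaster(args)}
    return [p for p in procs if is_helper(p[2]) and p[1] not in postmasters]


def current_processes():
    """Collect the live process table (Linux procps `ps` assumed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


# Invented listing: one healthy standby plus two leftover helpers.
SAMPLE_PS = """
  PID  PPID COMMAND
 2001     1 /usr/pgsql-9.2/bin/postmaster -p 5432 -D /data/standby1
 2002  2001 postgres: logger process
 2003  2001 postgres: startup process
 3105     1 postgres: logger process
 3106     1 postgres: writer process
"""

stale = find_stale_helpers(parse_ps(SAMPLE_PS))
```

On the invented listing, `stale` holds PIDs 3105 and 3106. On a real box, `find_stale_helpers(current_processes())` would give the candidates to inspect before killing anything.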