Thread: Data Recovery

Data Recovery

From: Alex Turner
Date: 16 March 2005, 19:43:16

I have a crashed database fileset, and I'm wondering if there is any
way to recover the data from a specific table.  I know which table got
corrupted, and it's not the table I am trying to recover.

I know this is a little vague, but I'm not really sure what
information would be pertinent.

Any help would be greatly appreciated!

Thanks very much,

Alex Turner
netEconomist

Re: Data Recovery

From: Lonni J Friedman
Date: 16 March 2005, 20:14:21

On Wed, 16 Mar 2005 14:43:09 -0500, Alex Turner <armtuk@gmail.com> wrote:
> I have a crashed database fileset, and I'm wondering if there is any
> way to recover the data from a specific table.  I know which table got
> corrupted, and it's not the table I am trying to recover.
>
> I know this is a little vague, but I'm not really sure what
> information would be pertinent.

Crashed how exactly?   If you can explain what led to the current
state, we could likely assist in recovering.  Which version of
PostgreSQL are you running, and on which OS?

I'm guessing that you don't have recent reliable backups of this data?


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: Data Recovery

From: Alex Turner
Date: 16 March 2005, 20:47:17

It's PostgreSQL 8.0.1 on AMD64 SUSE 9.2.  The database didn't dump
successfully for several days in a row, so the backup is corrupted also.

The controller card crashed and, we think, caused data corruption.  I
rebooted the system the following day, and it came back up, but all
was not well: pg_dumpall failed that day and the following day.

Thanks,

Alex Turner
netEconomist



Re: Data Recovery

From: Lonni J Friedman
Date: 16 March 2005, 20:51:23

On Wed, 16 Mar 2005 15:46:16 -0500, Alex Turner <armtuk@gmail.com> wrote:
> It's PostgreSQL 8.0.1 on AMD64 SUSE 9.2.  The database didn't dump
> successfully for several days in a row, so the backup is corrupted also.
>
> The controller card crashed and, we think, caused data corruption.  I
> rebooted the system the following day, and it came back up, but all
> was not well: pg_dumpall failed that day and the following day.

Failed how?  What options are you using and what kind of output are
you seeing?

What makes you think you had data corruption?  What kind of filesystem
are you using?


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: Data Recovery

From: Alex Turner
Date: 16 March 2005, 21:03:11

Fsync was off - we are using XFS, and the microcontroller on the RAID
card crashed and took two filesystems offline at about 2 a.m.

There were some error messages in the postgres log, something like
"Update failed - right part of branch is wrong" (I'm guessing - I'm
trying to find the exact error - but we do 10 hits/second and there
are a lot of logs).

pg_dumpall would get to a certain table and crap out - it would
just not read any more data.

Thanks,

Alex Turner
netEconomist


Re: Data Recovery

From: Alex Turner
Date: 16 March 2005, 21:05:26

Ok - I found the log messages:

ERROR:  duplicate key violates unique constraint "features_pkey"
STATEMENT:  insert into features
(propid,dtmodified,proptype,groupid,featid,group_desc,feat_desc)
values (449356005,'3/9/2005 12:03:59 AM',1,26,1,'Water','PublicWater')
PANIC:  right sibling's left-link doesn't match
STATEMENT:  insert into features
(propid,dtmodified,proptype,groupid,featid,group_desc,feat_desc)
values (449356005,'3/9/2005 12:04:00 AM',1,27,1,'Sewer','PublicSewer')
LOG:  server process (PID 13129) was terminated by signal 6
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
STATEMENT:  update features set dtmodified='9/16/2004 7:28:42
AM',proptype=3,group_desc='Primary Heating',feat_desc='GasHeat' where
propid=442448204 and groupid=15 and featid=2
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
STATEMENT:  select count(C.agentcode)  from propmain A,areacodes
B,members C,office D,type_xref E,prop_extra F where A.listprice<350000
and A.listprice>100000 and A.approx_age<300 and A.approx_age>0 and
A.areacode=B.areacode and A.listagent=C.agentcode and
C.officecode=D.officecode and A.type_of_prop=E.type_of_prop and
A.propid=F.propid  and lower(B.group_name) in ('bucks') and
A.school_dist in ('quakertown comm','u perkiomen') and A.type_of_prop
in ('SNG')
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2005-03-12 01:36:41 EST
LOG:  checkpoint record is at 2B/553CED30
LOG:  redo record is at 2B/553CED30; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 118419066; next OID: 50155349
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 2B/553CED78
FATAL:  the database system is starting up
FATAL:  the database system is starting up
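
A rough way to pull the serious entries out of a log this large is sketched below in Python. It is not part of the original exchange, and the names `parse_entries`, `serious_events`, and `SAMPLE` are made up for illustration. It assumes every log entry starts with a severity prefix such as `ERROR:` or `PANIC:` and treats any line without one as a continuation of the previous entry, which matches the excerpt above but may not hold for every log format.

```python
import re

# Severity prefixes as they appear in the log excerpt above.
PREFIX = re.compile(
    r"^(ERROR|PANIC|FATAL|WARNING|LOG|DETAIL|HINT|STATEMENT):\s+(.*)$"
)


def parse_entries(lines):
    """Join continuation lines and return a list of (severity, text) pairs."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = PREFIX.match(line)
        if match:
            entries.append([match.group(1), match.group(2)])
        elif entries and line.strip():
            # No severity prefix: assume this continues the previous entry.
            entries[-1][1] += " " + line.strip()
    return [tuple(entry) for entry in entries]


def serious_events(lines, levels=("ERROR", "PANIC")):
    """Pair each ERROR/PANIC entry with the STATEMENT entry right after it."""
    entries = parse_entries(lines)
    events = []
    for i, (severity, text) in enumerate(entries):
        if severity not in levels:
            continue
        following = entries[i + 1] if i + 1 < len(entries) else None
        statement = None
        if following is not None and following[0] == "STATEMENT":
            statement = following[1]
        events.append((severity, text, statement))
    return events


# The first few lines of the excerpt above, copied as posted.
SAMPLE = """\
ERROR:  duplicate key violates unique constraint "features_pkey"
STATEMENT:  insert into features
(propid,dtmodified,proptype,groupid,featid,group_desc,feat_desc)
values (449356005,'3/9/2005 12:03:59 AM',1,26,1,'Water','PublicWater')
PANIC:  right sibling's left-link doesn't match
STATEMENT:  insert into features
(propid,dtmodified,proptype,groupid,featid,group_desc,feat_desc)
values (449356005,'3/9/2005 12:04:00 AM',1,27,1,'Sewer','PublicSewer')
LOG:  server process (PID 13129) was terminated by signal 6
"""

events = serious_events(SAMPLE.splitlines())
```

For the sample, `events` holds the duplicate-key ERROR and the PANIC, each paired with the insert into features that triggered it. For a real log file, passing the open file object instead of `SAMPLE.splitlines()` should work the same way.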


Re: Data Recovery

From: Lonni J Friedman
Date: 16 March 2005, 21:05:41

On Wed, 16 Mar 2005 16:02:58 -0500, Alex Turner <armtuk@gmail.com> wrote:
> Fsync was off - we are using XFS, and the microcontroller on the RAID
> card crashed and took two filesystems offline at about 2 a.m.

Did you run xfs_repair afterwards?

> There were some error messages in the postgres log, something like
> "Update failed - right part of branch is wrong" (I'm guessing - I'm
> trying to find the exact error - but we do 10 hits/second and there
> are a lot of logs).

Knowing/seeing those errors would be useful.

>
> pg_dumpall would get to a certain table and crap out - it would
> just not read any more data.

Crap out meaning what exactly?  Is it hanging?  Is there an error message?

Seriously, you need to provide information here.  Too much is better
than the trickle that you've provided thus far.


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: Data Recovery

From: Lonni J Friedman
Date: 16 March 2005, 21:11:05

On Wed, 16 Mar 2005 16:05:16 -0500, Alex Turner <armtuk@gmail.com> wrote:
> Ok - I found the log messages:
>
> ERROR:  duplicate key violates unique constraint "features_pkey"
> STATEMENT:  insert into features
> (propid,dtmodified,proptype,groupid,featid,group_desc,feat_desc)
> values (449356005,'3/9/2005 12:03:59 AM',1,26,1,'Water','PublicWater')
> PANIC:  right sibling's left-link doesn't match

See
http://groups-beta.google.com/group/comp.databases.postgresql.hackers/browse_thread/thread/115e69a0e5a66bb5/ed3bf8b7de3a6cc0?q=%22right+sibling%27s+left-link+doesn%27t+match%22#ed3bf8b7de3a6cc0

In short, you need to drop and rebuild the index to address that
error.  But this assumes that you've already successfully run
xfs_repair on the filesystem.  If your FS is hosed, all the recovery
in the world isn't going to help the DB.
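
As a rough illustration of the rebuild step, the Python sketch below generates the SQL one might run once the filesystem has been checked. It is an editorial sketch, not something posted in the thread. It uses PostgreSQL's `REINDEX` command rather than an explicit drop and re-create, and the helper name `reindex_sql` is hypothetical. The log does not say which index is damaged; `features_pkey` and the `features` table are guesses based on the statements shown in the log.

```python
def reindex_sql(kind, name):
    """Return a REINDEX statement for one index or for all indexes on a table."""
    if kind not in ("INDEX", "TABLE"):
        raise ValueError("kind must be 'INDEX' or 'TABLE'")
    # Double-quote the identifier and double any embedded quotes.
    quoted = '"' + name.replace('"', '""') + '"'
    return "REINDEX %s %s;" % (kind, quoted)


# Assumption: the damaged index belongs to the features table seen in the log.
statements = [
    reindex_sql("INDEX", "features_pkey"),  # rebuild only the primary key index
    reindex_sql("TABLE", "features"),       # or rebuild every index on the table
]
```

The resulting statements can be pasted into psql. If the table itself now holds duplicate keys, rebuilding a unique index is expected to fail until the duplicates are removed.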

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: Data Recovery

From: Alex Turner
Date: 16 March 2005, 21:15:23

This is the message I get when I try to start the database:

LOG:  database system was interrupted while in recovery at 2005-03-16
16:07:58 EST
HINT:  This probably means that some data is corrupted and you will
have to use the last backup for recovery.
LOG:  checkpoint record is at 2B/553CED30
LOG:  redo record is at 2B/553CED30; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 118419066; next OID: 50155349
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 2B/553CED78
LOG:  record with zero length at 2B/62360AB8
LOG:  redo done at 2B/62360A88
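
To get a feel for how much write-ahead log was replayed between the "redo starts at" and "redo done at" lines, the two locations can be subtracted. The Python sketch below is an editorial aside, not part of the original message, and its function names are illustrative. It assumes each location is a pair of hexadecimal numbers, a log file id before the slash and a byte offset after it, and it only handles two locations that share the same log file id.

```python
def parse_location(location):
    """Split a log location such as '2B/553CED78' into two integers."""
    high, low = location.split("/")
    return int(high, 16), int(low, 16)


def bytes_between(start, end):
    """Return the byte distance between two locations with the same log file id."""
    start_high, start_low = parse_location(start)
    end_high, end_low = parse_location(end)
    if start_high != end_high:
        # Crossing into another log file id is not handled in this sketch.
        raise ValueError("locations are in different log files")
    return end_low - start_low


# Values copied from the startup log above.
replayed = bytes_between("2B/553CED78", "2B/62360A88")
```

For the values above, `replayed` works out to 217652496 bytes, roughly 208 MiB of log replayed before the zero-length record was reached.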

Thanks,

Alex Turner
netEconomist

