Thread: missing data/global

missing data/global

From
Daniel Kalchev
Date:
Hello,

Is there ANY chance to recover data from a database system that suffered disk 
crash, and is not missing the data/global directory?

Version is 7.2.4. Database files seem to be intact as well as pg_clog and 
pg_xlog directories.

Thanks in advance for any ideas.

Daniel



Re: missing data/global

From
Tom Lane
Date:
Daniel Kalchev <daniel@digsys.bg> writes:
> Is there ANY chance to recover data from a database system that suffered disk
> crash, and is not missing the data/global directory?
> Version is 7.2.4. Database files seem to be intact as well as pg_clog and 
> pg_xlog directories.

The hard part I think would be reconstructing pg_database, because you'd
need to get the database OIDs right.  I can't think of any way to do
that that doesn't involve poking at the file with a hex editor.

Here's a sketch of how I'd proceed:

1. Make a tar backup of what you have!  That way you can start over
after you screw up ;-)

2. I assume you know the names and properties of your databases, users,
and groups if any; also the SYSID numbers for the users and groups.
A recent pg_dumpall script would be a good place to get this info.

3. You're also going to need to figure out the OIDs of your databases
(the OIDs are the same as the names of their subdirectories under
$PGDATA/base).  Possibly you can do this just from directory/file sizes.
Note that template1 should be OID 1, and template0 will have the next
lowest number (probably 16555, in 7.2).

4. Initdb a scratch database in some other place (or move aside your
existing files, if that seems safer).  In this scratch DB, create
databases, users, and groups to match your old setup.  You should be
able to duplicate everything except the database OIDs using standard
SQL commands.

5. Shut down scratch postmaster, then hex-edit pg_database to insert the
correct OIDs.  Use pg_filedump or a similar tool to verify that you did
this properly.

6. Restart scratch postmaster, and VACUUM FREEZE pg_database, pg_shadow,
and pg_group (from any database).  This will make the next step safe.

7. Stop scratch postmaster, and then copy over its $PGDATA/global
directory into the old DB.

8. Cross your fingers and start postmaster ...

This will probably *not* work if you had been doing anything to
pg_database, pg_shadow, or pg_group between your last checkpoint and the
crash, because the reconstructed tables are not going to be physically
identical to what they were before, so any actions replayed from WAL
against those tables will be wrong.  Hopefully you won't have that
problem.  If you do, it might work to shut down the postmaster and again
copy the scratch $PGDATA/global directory into the old DB, thereby
overwriting what the WAL replay did.  This is getting into the realm of
speculation though.
        regards, tom lane


Re: missing data/global

From
Daniel Kalchev
Date:
Tom,

This is basically what I had in mind, but you described it better than I ever 
could :)

What I need from this database system is just one database and probably not 
all of the tables anyway (but some do seem valuable). This database happens to 
be second in the pg_dumpall script. The next databases are rather big (and I 
actually have more recent backup and could eventually recreate the data) The 
valuable database hasn't had significant structure changes since the backup.

Looking at the files, I am confident which is the proper database oid - if 
this cannot be properly fixed, is there .. reasonable way to dump data from 
the (heap) files?

Here is what I have:

su-2.02# du
1747    ./base/1
1693    ./base/16555
1       ./base/77573557/pgsql_tmp
127036  ./base/77573557
1       ./base/13255137/pgsql_tmp
1379190 ./base/13255137
11246   ./base/95521309
1781    ./base/96388007
1       ./base/133512058/pgsql_tmp
11933861        ./base/133512058
13456555        ./base
98209   ./pg_xlog
41315   ./pg_clog
13596100        .

My database should be with oid 77573557, template0 is apparently 16555

Let's see how all this works.

Daniel

>>>Tom Lane said:> Daniel Kalchev <daniel@digsys.bg> writes:> > Is there ANY chance to recover data from a database
systemthat suffered d    isk> > crash, and is not missing the data/global directory?> > Version is 7.2.4. Database
filesseem to be intact as well as pg_clog and > > pg_xlog directories.> > The hard part I think would be reconstructing
pg_database,because you'd> need to get the database OIDs right.  I can't think of any way to do> that that doesn't
involvepoking at the file with a hex editor.> > Here's a sketch of how I'd proceed:> > 1. Make a tar backup of what you
have! That way you can start over> after you screw up ;-)> > 2. I assume you know the names and properties of your
databases,users,> and groups if any; also the SYSID numbers for the users and groups.> A recent pg_dumpall script would
bea good place to get this info.> > 3. You're also going to need to figure out the OIDs of your databases> (the OIDs
arethe same as the names of their subdirectories under> $PGDATA/base).  Possibly you can do this just from
directory/filesizes.> Note that template1 should be OID 1, and template0 will have the next> lowest number (probably
16555,in 7.2).> > 4. Initdb a scratch database in some other place (or move aside your> existing files, if that seems
safer). In this scratch DB, create> databases, users, and groups to match your old setup.  You should be> able to
duplicateeverything except the database OIDs using standard> SQL commands.> > 5. Shut down scratch postmaster, then
hex-editpg_database to insert the> correct OIDs.  Use pg_filedump or a similar tool to verify that you did> this
properly.>> 6. Restart scratch postmaster, and VACUUM FREEZE pg_database, pg_shadow,> and pg_group (from any database).
This will make the next step safe.> > 7. Stop scratch postmaster, and then copy over its $PGDATA/global> directory into
theold DB.> > 8. Cross your fingers and start postmaster ...> > This will probably *not* work if you had been doing
anythingto> pg_database, pg_shadow, or pg_group between your last checkpoint and the> crash, because the reconstructed
tablesare not going to be physically> identical to what they were before, so any actions replayed from WAL> against
thosetables will be wrong.  Hopefully you won't have that> problem.  If you do, it might work to shut down the
postmasterand again> copy the scratch $PGDATA/global directory into the old DB, thereby> overwriting what the WAL
replaydid.  This is getting into the realm of> speculation though.> >             regards, tom lane
 




Re: missing data/global

From
Christopher Kings-Lynne
Date:
If you're not missing your data dir, clog or xlog then what's the problem?

Daniel Kalchev wrote:
> Hello,
> 
> Is there ANY chance to recover data from a database system that suffered disk 
> crash, and is not missing the data/global directory?
> 
> Version is 7.2.4. Database files seem to be intact as well as pg_clog and 
> pg_xlog directories.
> 
> Thanks in advance for any ideas.
> 
> Daniel
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Re: missing data/global

From
Daniel Kalchev
Date:
data/base/global is missing and this is where postgres gets all it's startup 
data from (database oids, next oid, transaction id etc).

Let's see how easy to recover from this it will turn to be.

Daniel

>>>Christopher Kings-Lynne said:> If you're not missing your data dir, clog or xlog then what's the problem?> > Daniel
Kalchevwrote:> > Hello,> > > > Is there ANY chance to recover data from a database system that suffered d    isk > >
crash,and is not missing the data/global directory?> > > > Version is 7.2.4. Database files seem to be intact as well
aspg_clog and > > pg_xlog directories.> > > > Thanks in advance for any ideas.> > > > Daniel> > > > > >
---------------------------(endof broadcast)---------------------------> > TIP 1: subscribe and unsubscribe commands go
tomajordomo@postgresql.org
 




Re: missing data/global

From
Christopher Kings-Lynne
Date:
Ah, you said 'is NOT missing'.

Chris

Daniel Kalchev wrote:

> data/base/global is missing and this is where postgres gets all it's startup 
> data from (database oids, next oid, transaction id etc).
> 
> Let's see how easy to recover from this it will turn to be.
> 
> Daniel
> 
> 
>>>>Christopher Kings-Lynne said:
> 
>  > If you're not missing your data dir, clog or xlog then what's the problem?
>  > 
>  > Daniel Kalchev wrote:
>  > > Hello,
>  > > 
>  > > Is there ANY chance to recover data from a database system that suffered d
>      isk 
>  > > crash, and is not missing the data/global directory?
>  > > 
>  > > Version is 7.2.4. Database files seem to be intact as well as pg_clog and 
>  > > pg_xlog directories.
>  > > 
>  > > Thanks in advance for any ideas.
>  > > 
>  > > Daniel
>  > > 
>  > > 
>  > > ---------------------------(end of broadcast)---------------------------
>  > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 


Re: missing data/global

From
Daniel Kalchev
Date:
Tom I did the following:

(found out 7.2.3 does not have pg_database)

1. saved old data etc.

2. created new database, and the database. database oid was 16556;

3. moved data/global to the old data directory.

4. though, that postmaster would actually use the database oid to locate the 
directory, then load everything from there.. old database oid was 77573557, so 
I just linked this to 16556 in the data/base direcotry. (this might be the 
first possible error)

Now I can connect to the 'old' database, but get the error 

FATAL 1:  Index pg_operator_oid_index is not a btree

(if I run postmaster with -P I get not errors, but no tables as well).

By the way, I had to copy over the 'new' files from pg_clog and pg_xlog (this 
is the second possible error) to get the postmaster running. Perhaps better 
would be to use pg_resetxlog or similar?

Daniel

>>>Tom Lane said:> Daniel Kalchev <daniel@digsys.bg> writes:> > Is there ANY chance to recover data from a database
systemthat suffered d    isk> > crash, and is not missing the data/global directory?> > Version is 7.2.4. Database
filesseem to be intact as well as pg_clog and > > pg_xlog directories.> > The hard part I think would be reconstructing
pg_database,because you'd> need to get the database OIDs right.  I can't think of any way to do> that that doesn't
involvepoking at the file with a hex editor.> > Here's a sketch of how I'd proceed:> > 1. Make a tar backup of what you
have! That way you can start over> after you screw up ;-)> > 2. I assume you know the names and properties of your
databases,users,> and groups if any; also the SYSID numbers for the users and groups.> A recent pg_dumpall script would
bea good place to get this info.> > 3. You're also going to need to figure out the OIDs of your databases> (the OIDs
arethe same as the names of their subdirectories under> $PGDATA/base).  Possibly you can do this just from
directory/filesizes.> Note that template1 should be OID 1, and template0 will have the next> lowest number (probably
16555,in 7.2).> > 4. Initdb a scratch database in some other place (or move aside your> existing files, if that seems
safer). In this scratch DB, create> databases, users, and groups to match your old setup.  You should be> able to
duplicateeverything except the database OIDs using standard> SQL commands.> > 5. Shut down scratch postmaster, then
hex-editpg_database to insert the> correct OIDs.  Use pg_filedump or a similar tool to verify that you did> this
properly.>> 6. Restart scratch postmaster, and VACUUM FREEZE pg_database, pg_shadow,> and pg_group (from any database).
This will make the next step safe.> > 7. Stop scratch postmaster, and then copy over its $PGDATA/global> directory into
theold DB.> > 8. Cross your fingers and start postmaster ...> > This will probably *not* work if you had been doing
anythingto> pg_database, pg_shadow, or pg_group between your last checkpoint and the> crash, because the reconstructed
tablesare not going to be physically> identical to what they were before, so any actions replayed from WAL> against
thosetables will be wrong.  Hopefully you won't have that> problem.  If you do, it might work to shut down the
postmasterand again> copy the scratch $PGDATA/global directory into the old DB, thereby> overwriting what the WAL
replaydid.  This is getting into the realm of> speculation though.> >             regards, tom lane> >
---------------------------(endof broadcast)---------------------------> TIP 2: you can get off all lists at once with
theunregister command>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
 




Re: missing data/global

From
Tom Lane
Date:
Daniel Kalchev <daniel@digsys.bg> writes:
> (found out 7.2.3 does not have pg_database)

You think not?

> By the way, I had to copy over the 'new' files from pg_clog and pg_xlog (this
> is the second possible error) to get the postmaster running.

That was *not* part of the recipe, and is guaranteed *not* to work.

It seems likely though that you are wasting your time --- the index
failure suggests strongly that you have more corruption than just the
loss of the /global subdirectory :-(
        regards, tom lane


Re: missing data/global

From
Daniel Kalchev
Date:
>>>Tom Lane said:> Daniel Kalchev <daniel@digsys.bg> writes:> > (found out 7.2.3 does not have pg_database)> > You
thinknot?
 

Not as a file similar to pg_control. pg_database is indeed table in the system 
catalog.
> > By the way, I had to copy over the 'new' files from pg_clog and pg_xlog (t    his> > is the second possible error)
toget the postmaster running.> > That was *not* part of the recipe, and is guaranteed *not* to work.
 

I know that, but wondered if it would help in any way.. By the way, what would 
be the solution to sync WAL with the pg_control contents?
> > It seems likely though that you are wasting your time --- the index> failure suggests strongly that you have more
corruptionthan just the> loss of the /global subdirectory :-(
 

After spending some time to find possible ways to adjust pointers (could 
eventually save part of the data), I decided to move to plan B, which is to 
have few people manually re-enter the data - would have been more effective to 
waste my time anyway - but not if it will take days and the result be not 
guaranteed to be consistent.

Does such toll exist, that could dump data (records?) from the heap files 
given the table structure?

Regards,
Daniel



Re: missing data/global

From
Alvaro Herrera
Date:
On Wed, Aug 25, 2004 at 07:07:23PM +0300, Daniel Kalchev wrote:

> Does such toll exist, that could dump data (records?) from the heap files 
> given the table structure?

You may want to check pg_filedump (from http://sources.redhat.com/rhdb
IIRC).

(What happened to pg_fsck BTW?)

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)