Thread: strange disappearence of postgres file

strange disappearence of postgres file

From
Harald Armin Massa
Date:
I ran into a very strange disappearance of a postgresql data file.

The environment:

Windows XP professional

select version();
PostgreSQL 8.0.1 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2(mingw-special)

All datafiles are within c:\ghum\data

that part of the harddrive is only accessable to user account postgres (=3D
the one running the service) and Administrator.

the logentries:

postgresql-2005-11-21_080758.log
2005-11-21 08:08:05 LOG: startup process (PID 1220) exited with unexpected
status 128
2005-11-21 08:08:05 LOG: aborting startup due to startup process failure
2005-11-21 08:08:06 LOG: logger shutting down

<file://///Svs00004/K__Anw_Allg/massa/haase/pglog/postgresql-2005-11-21_084=
022.log>
postgresql-2005-11-21_084022.log
2005-11-21 08:40:24 LOG: database system was shut down at 2005-11-20
16:46:29 Westeurop=E4ische Normalzeit
2005-11-21 08:40:24 LOG: checkpoint record is at 4/2C880BC0
2005-11-21 08:40:24 LOG: redo record is at 4/2C880BC0; undo record is at
0/0; shutdown TRUE
2005-11-21 08:40:24 LOG: next transaction ID: 421189; next OID: 381050
2005-11-21 08:40:24 LOG: database system is ready
2005-11-21 09:45:50 ERROR: could not open relation 1663/17253/43471: No such
file or directory
2005-11-21 09:45:52 ERROR: could not open relation 1663/17253/43471: No such
file or directory
2005-11-21 09:45:59 ERROR: could not open relation 1663/17253/43471: No such
file or directory
2005-11-21 09:46:09 ERROR: could not open relation 1663/17253/43471: No such
file or directory
2005-11-21 09:46:30 ERROR: could not open relation 1663/17253/43471: No such
file or directory

... and on and on and on...

My checking showed that... within directory data/base/17253 there is indeed
no file named 43471; but ones named 43470,43472,43473 .....

Additional Information: that is a laptop-computer, and it is quite possible
that it was restarted between 8:08 and 8:40 this morning.

But ... I see no line mentioning a "killing of a file". Can I do something
more to find out WHY this file disappeared? (and esp. make sure that no more
files disappear :( )

Harald


--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstra=DFe 202b
70197 Stuttgart
0173/9409607

Re: strange disappearence of postgres file

From
"Qingqing Zhou"
Date:
"Harald Armin Massa" <haraldarminmassa@gmail.com> wrote
>
> 2005-11-21 09:45:50 ERROR: could not open relation 1663/17253/43471: No
> such
> file or directory
>
> My checking showed that... within directory data/base/17253 there is
> indeed
> no file named 43471; but ones named 43470,43472,43473 .....
>

This may be not a direct step to solve the problem, but let's get some
feelings first:

Is 43471 a temp table? To find out this: "select relname, nspname from
pg_class, pg_namespace where pg_namespace.oid=pg_class.relnamespace and
relfilenode = 43471;"

Regards,
Qingqing

Re: strange disappearence of postgres file

From
Harald Armin Massa
Date:
Quingqing,

no, it is definitely NOT a temp table. I know which table it is, because:

that "could not open relation ..." error message was exactly provokable with
"select * from repofeld", which is one of my tables in that database.

(And that was the way it was passed on to me: my application put that error
on screen)


Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstra=DFe 202b
70197 Stuttgart
0173/9409607

Re: strange disappearence of postgres file

From
Qingqing Zhou
Date:
On Wed, 23 Nov 2005, Harald Armin Massa wrote:
>
> no, it is definitely NOT a temp table. I know which table it is,
> because: that "could not open relation ..." error message was exactly
> provokable with "select * from repofeld", which is one of my tables in
> that database.
>

What is your file system, NTFS or FAT32? Is that table newly created?

So there is a valid record in pg_class but the representing data file is
lost ... a possible theory of what's happened could be:

1) create the table;
2) a checkpoint happens;
3) lost power;
[ restar the machine and database ]
4) file system recovery - unable to recovery your data file;
5) database recovery - don't play WAL and recreate your data file because
of the checkpoint;

Is that possible?

Regards,
Qingqing

Re: strange disappearence of postgres file

From
Harald Armin Massa
Date:
Hello QuingQing,

>
> What is your file system, NTFS or FAT32? Is that table newly created?


File System is NTFS. That table was created with database installation,
which was short after after release of PostgreSQL 8.0 - so that database was
in use for > 9 months. That table is a central table of the application, and
is used very often ... the application was running for the whole 9 months :)

So there is a valid record in pg_class but the representing data file is
> lost ... a possible theory of what's happened could be:
>
> 1) create the table;
> 2) a checkpoint happens;
> 3) lost power;
> [ restar the machine and database ]
> 4) file system recovery - unable to recovery your data file;
> 5) database recovery - don't play WAL and recreate your data file because
> of the checkpoint;


Is that possible?
>
Rather not ... that table was there from the beginning and for some months.
(it gets created with initial database install)

The only strange thing is that short 3 lines log before the big error. As to
power loss: that is a laptop; so "loosing power" would need to take out the
accumulator. Which is quite possible, but rather unlikely given the
technical level of the user.

postgresql-2005-11-21_080758.log
2005-11-21 08:08:05 LOG:  startup process (PID 1220) exited with unexpected
status 128
2005-11-21 08:08:05 LOG:  aborting startup due to startup process failure
2005-11-21 08:08:06 LOG:  logger shutting down

postgresql-2005-11-21_084022.log
2005-11-21 08:40:24 LOG:  database system was shut down at 2005-11-20
16:46:29 Westeurop=E4ische Normalzeit

--> can that exiting process with 128 be a sign of "system gets killed while
postgres is starting up"?  (and taking a file with it????)

As much as I learned, PostgreSQL only APPENDS to the data files; and only a
vacuum full can truncate them ... and on that machine there is no vacuum
full happening.

It still keeps being a mystery.

Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstra=DFe 202b
70197 Stuttgart
0173/9409607

Re: strange disappearence of postgres file

From
Alvaro Herrera
Date:
Harald Armin Massa wrote:
> Hello QuingQing,
>
> > What is your file system, NTFS or FAT32? Is that table newly created?
>
> File System is NTFS. That table was created with database installation,
> which was short after after release of PostgreSQL 8.0 - so that database was
> in use for > 9 months. That table is a central table of the application, and
> is used very often ... the application was running for the whole 9 months :)

How many database-wide vacuums did you run during these 9 months?  I'm
smelling transaction Id wraparound in pg_class or some other system
catalog.

This has been known to happen.  Please see the archives.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: strange disappearence of postgres file

From
Harald Armin Massa
Date:
Alvaro,

How many database-wide vacuums did you run during these 9 months?  I'm
> smelling transaction Id wraparound in pg_class or some other system
> catalog.
>
> This has been known to happen.  Please see the archives.
>

"database wide" vacuums have been very very seldom; I fear the only one was
at initial db bulk load. at the moment I am on a way to upgrade to 8.1, with
autovacuum enabled ... so in the long run THAT possible reason should
vanish.

And I understand correctly that your suspicion is: file gets killed because
of transaction id wraparound in system table?

I will scan teh archives (as soon as postgresql.org provices access to them
again :)

Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstra=DFe 202b
70197 Stuttgart
0173/9409607

Re: strange disappearence of postgres file

From
Alvaro Herrera
Date:
Harald Armin Massa wrote:
> Alvaro,
>
> > How many database-wide vacuums did you run during these 9 months?  I'm
> > smelling transaction Id wraparound in pg_class or some other system
> > catalog.
>
> "database wide" vacuums have been very very seldom; I fear the only one was
> at initial db bulk load. at the moment I am on a way to upgrade to 8.1, with
> autovacuum enabled ... so in the long run THAT possible reason should
> vanish.
>
> And I understand correctly that your suspicion is: file gets killed because
> of transaction id wraparound in system table?

No, the file doesn't get killed, but an old tuple on the pg_class system
catalog magically appears because it was deleted and marked with a very
old Xmax, that after the wraparound is seen as future; and thus the
tuple that was deleted is suddenly visible.  For this to happen you need
to have failed to vacuum pg_class in a very long time.

There may be other explanations, but we'd need more data, like how much
tuples for that table you got on pg_class and what do they look like.
Try something like

set enable_indexscan to off;
select xmin, xmax, relname from pg_class where relname = 'the table';

You can search the archives using http://www.pgsql.ru

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.