Thread: strange disappearence of postgres file
I ran into a very strange disappearance of a postgresql data file. The environment: Windows XP professional select version(); PostgreSQL 8.0.1 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2(mingw-special) All datafiles are within c:\ghum\data that part of the harddrive is only accessable to user account postgres (=3D the one running the service) and Administrator. the logentries: postgresql-2005-11-21_080758.log 2005-11-21 08:08:05 LOG: startup process (PID 1220) exited with unexpected status 128 2005-11-21 08:08:05 LOG: aborting startup due to startup process failure 2005-11-21 08:08:06 LOG: logger shutting down <file://///Svs00004/K__Anw_Allg/massa/haase/pglog/postgresql-2005-11-21_084= 022.log> postgresql-2005-11-21_084022.log 2005-11-21 08:40:24 LOG: database system was shut down at 2005-11-20 16:46:29 Westeurop=E4ische Normalzeit 2005-11-21 08:40:24 LOG: checkpoint record is at 4/2C880BC0 2005-11-21 08:40:24 LOG: redo record is at 4/2C880BC0; undo record is at 0/0; shutdown TRUE 2005-11-21 08:40:24 LOG: next transaction ID: 421189; next OID: 381050 2005-11-21 08:40:24 LOG: database system is ready 2005-11-21 09:45:50 ERROR: could not open relation 1663/17253/43471: No such file or directory 2005-11-21 09:45:52 ERROR: could not open relation 1663/17253/43471: No such file or directory 2005-11-21 09:45:59 ERROR: could not open relation 1663/17253/43471: No such file or directory 2005-11-21 09:46:09 ERROR: could not open relation 1663/17253/43471: No such file or directory 2005-11-21 09:46:30 ERROR: could not open relation 1663/17253/43471: No such file or directory ... and on and on and on... My checking showed that... within directory data/base/17253 there is indeed no file named 43471; but ones named 43470,43472,43473 ..... Additional Information: that is a laptop-computer, and it is quite possible that it was restarted between 8:08 and 8:40 this morning. But ... I see no line mentioning a "killing of a file". Can I do something more to find out WHY this file disappeared? (and esp. make sure that no more files disappear :( ) Harald -- GHUM Harald Massa persuasion python postgresql Harald Armin Massa Reinsburgstra=DFe 202b 70197 Stuttgart 0173/9409607
"Harald Armin Massa" <haraldarminmassa@gmail.com> wrote > > 2005-11-21 09:45:50 ERROR: could not open relation 1663/17253/43471: No > such > file or directory > > My checking showed that... within directory data/base/17253 there is > indeed > no file named 43471; but ones named 43470,43472,43473 ..... > This may be not a direct step to solve the problem, but let's get some feelings first: Is 43471 a temp table? To find out this: "select relname, nspname from pg_class, pg_namespace where pg_namespace.oid=pg_class.relnamespace and relfilenode = 43471;" Regards, Qingqing
Quingqing, no, it is definitely NOT a temp table. I know which table it is, because: that "could not open relation ..." error message was exactly provokable with "select * from repofeld", which is one of my tables in that database. (And that was the way it was passed on to me: my application put that error on screen) Harald -- GHUM Harald Massa persuasion python postgresql Harald Armin Massa Reinsburgstra=DFe 202b 70197 Stuttgart 0173/9409607
On Wed, 23 Nov 2005, Harald Armin Massa wrote: > > no, it is definitely NOT a temp table. I know which table it is, > because: that "could not open relation ..." error message was exactly > provokable with "select * from repofeld", which is one of my tables in > that database. > What is your file system, NTFS or FAT32? Is that table newly created? So there is a valid record in pg_class but the representing data file is lost ... a possible theory of what's happened could be: 1) create the table; 2) a checkpoint happens; 3) lost power; [ restar the machine and database ] 4) file system recovery - unable to recovery your data file; 5) database recovery - don't play WAL and recreate your data file because of the checkpoint; Is that possible? Regards, Qingqing
Hello QuingQing, > > What is your file system, NTFS or FAT32? Is that table newly created? File System is NTFS. That table was created with database installation, which was short after after release of PostgreSQL 8.0 - so that database was in use for > 9 months. That table is a central table of the application, and is used very often ... the application was running for the whole 9 months :) So there is a valid record in pg_class but the representing data file is > lost ... a possible theory of what's happened could be: > > 1) create the table; > 2) a checkpoint happens; > 3) lost power; > [ restar the machine and database ] > 4) file system recovery - unable to recovery your data file; > 5) database recovery - don't play WAL and recreate your data file because > of the checkpoint; Is that possible? > Rather not ... that table was there from the beginning and for some months. (it gets created with initial database install) The only strange thing is that short 3 lines log before the big error. As to power loss: that is a laptop; so "loosing power" would need to take out the accumulator. Which is quite possible, but rather unlikely given the technical level of the user. postgresql-2005-11-21_080758.log 2005-11-21 08:08:05 LOG: startup process (PID 1220) exited with unexpected status 128 2005-11-21 08:08:05 LOG: aborting startup due to startup process failure 2005-11-21 08:08:06 LOG: logger shutting down postgresql-2005-11-21_084022.log 2005-11-21 08:40:24 LOG: database system was shut down at 2005-11-20 16:46:29 Westeurop=E4ische Normalzeit --> can that exiting process with 128 be a sign of "system gets killed while postgres is starting up"? (and taking a file with it????) As much as I learned, PostgreSQL only APPENDS to the data files; and only a vacuum full can truncate them ... and on that machine there is no vacuum full happening. It still keeps being a mystery. Harald -- GHUM Harald Massa persuasion python postgresql Harald Armin Massa Reinsburgstra=DFe 202b 70197 Stuttgart 0173/9409607
Harald Armin Massa wrote: > Hello QuingQing, > > > What is your file system, NTFS or FAT32? Is that table newly created? > > File System is NTFS. That table was created with database installation, > which was short after after release of PostgreSQL 8.0 - so that database was > in use for > 9 months. That table is a central table of the application, and > is used very often ... the application was running for the whole 9 months :) How many database-wide vacuums did you run during these 9 months? I'm smelling transaction Id wraparound in pg_class or some other system catalog. This has been known to happen. Please see the archives. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro, How many database-wide vacuums did you run during these 9 months? I'm > smelling transaction Id wraparound in pg_class or some other system > catalog. > > This has been known to happen. Please see the archives. > "database wide" vacuums have been very very seldom; I fear the only one was at initial db bulk load. at the moment I am on a way to upgrade to 8.1, with autovacuum enabled ... so in the long run THAT possible reason should vanish. And I understand correctly that your suspicion is: file gets killed because of transaction id wraparound in system table? I will scan teh archives (as soon as postgresql.org provices access to them again :) Harald -- GHUM Harald Massa persuasion python postgresql Harald Armin Massa Reinsburgstra=DFe 202b 70197 Stuttgart 0173/9409607
Harald Armin Massa wrote: > Alvaro, > > > How many database-wide vacuums did you run during these 9 months? I'm > > smelling transaction Id wraparound in pg_class or some other system > > catalog. > > "database wide" vacuums have been very very seldom; I fear the only one was > at initial db bulk load. at the moment I am on a way to upgrade to 8.1, with > autovacuum enabled ... so in the long run THAT possible reason should > vanish. > > And I understand correctly that your suspicion is: file gets killed because > of transaction id wraparound in system table? No, the file doesn't get killed, but an old tuple on the pg_class system catalog magically appears because it was deleted and marked with a very old Xmax, that after the wraparound is seen as future; and thus the tuple that was deleted is suddenly visible. For this to happen you need to have failed to vacuum pg_class in a very long time. There may be other explanations, but we'd need more data, like how much tuples for that table you got on pg_class and what do they look like. Try something like set enable_indexscan to off; select xmin, xmax, relname from pg_class where relname = 'the table'; You can search the archives using http://www.pgsql.ru -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.