Thread: Loosing files after backend crash
Konstantin Solodovnikov (ks@tcnet.ru) reports a bug with a severity of 2 The lower the number the more severe it is. Short Description Loosing files after backend crash Long Description After a backend crash the file for my sequence cannot be found. I use PostgreSQL 7.1RC1 compiled by GCC 2.95.2 on FreeBSD 4.2-STABLE and that's what I encountered. During execution of some sql code backend crashed (the reason for it's crashing is not the actual topic of this report). I successfully re-run psql without restarting the postmaster. And tried to delete the sequence, which i created in the previous session. That's what pgsql told me: ---8<--- Netflow_Test=# drop SEQUENCE seq_i___data_buffer ; ERROR: cannot open seq_i___data_buffer: No such file or directory ---8<--- I examined the system tables (ps_class) and found the tuple, describing my sequence. The "relfilenode" attribute of the tuple contained the value "326210". I tried to find the file with that name in the databasedirectory - it really was not there. Thus, the backend crash has led not only to impossibility to complete my task, but also to loosing some data. The sequencewas actually created in an already committed transaction, not in the one, that crashed. Another problem is that I cannot fix the situation (drop that sequence) by standard PGSQL means. I had to create an empty file with that name ("326210") for "DROP SEQUENCE" command to successfully complete. As i guess, the situation is not reprodusable. Backend crashed many times before w/o such problems. A little additional question/proposition: can there be any tool, which could find and resolve the errors in the system tables? Best Regards, Konstantin Solodovnikov. Sample Code No file was uploaded with this report
pgsql-bugs@postgresql.org writes: > During execution of some sql code backend crashed (the reason for it's > crashing is not the actual topic of this report). > And tried to delete the sequence, which i created in the previous session. > That's what pgsql told me: > ---8<--- > Netflow_Test=# drop SEQUENCE seq_i___data_buffer ; > ERROR: cannot open seq_i___data_buffer: No such file or directory Was that a system-wide crash, or just a backend crash? It seems to me that your kernel must have lost that file for you; Postgres wouldn't have gone out and deleted it. > Another problem is that I cannot fix the situation (drop that > sequence) by standard PGSQL means. I had to create an empty file with > that name ("326210") for "DROP SEQUENCE" command to successfully > complete. This used to work pre-7.1, and it seems to me it should still work; relcache.c is being inappropriately paranoid about finding the file. I have corrected that part of the problem. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, April 03, 2001 3:35 AM > To: ks@tcnet.ru; pgsql-bugs@postgresql.org > Subject: Re: [BUGS] Loosing files after backend crash > > > pgsql-bugs@postgresql.org writes: > > During execution of some sql code backend crashed (the > reason for it's > > crashing is not the actual topic of this report). > > And tried to delete the sequence, which i created in the > previous session. > > That's what pgsql told me: > > ---8<--- > > Netflow_Test=# drop SEQUENCE seq_i___data_buffer ; > > ERROR: cannot open seq_i___data_buffer: No such file or directory > > Was that a system-wide crash, or just a backend crash? It was just a backend crash. > It seems to me that your kernel must have lost that file for you; Postgres > wouldn't have gone out and deleted it. How could kernel _loose_ the file, which was already created? I would agree with the possibility of loosing some data in the file during the backend crash. But loosing the whole file ... The sequence was previously created in an already committed transaction. Correct me if I am wrong, but after I said "commit" the file would be on my disk. If so, how can backend's crash "erase" the file from disk? BTW, I have some more facts to report. Just about half an hour ago the backend crashed one more time in the same situation. Now it has not only lost the file for my sequence (the same very sequence), but it also has TWO tuples in the pg_class table with the name of my sequence. All other attribites of the tuples also hold identical values. And PGSQL doesn't consider it to be a sequence. When I say 'DROP SEQUENCE ...' it says that it is not a sequence and advises me to use 'DROP TABLE'. When I say 'DROP TABLE ...' it says something like 'cannot drop system table pg_temp.XXXXX.X' (I changed the actial digits into 'X'-es. Not sure about exact words.) The "relkind" attribute of the corresponding two tuples in the pg_class table hold letter 'S' (The uppercase 's'). Regards, Konstantin Solodovnikov.
Солодовников Константин <ks@tcnet.ru> writes: >> It seems to me that your kernel must have lost that file for you; Postgres >> wouldn't have gone out and deleted it. > How could kernel _loose_ the file, which was already created? > I would agree with the possibility of loosing some data in the file during > the backend crash. > But loosing the whole file ... > The sequence was previously created in an already committed transaction. > Correct me if I am wrong, but after I said "commit" the file would be on my > disk. Sooner than that: a create call is issued to the kernel as soon as you say CREATE SEQUENCE. > If so, how can backend's crash "erase" the file from disk? I'd like to know that too. > BTW, I have some more facts to report. > Just about half an hour ago the backend crashed one more time in the same > situation. Backtrace from core file, please? > Now it has not only lost the file for my sequence (the same very sequence), > but it also has TWO tuples in the pg_class table with the name of my > sequence. > All other attribites of the tuples also hold identical values. > And PGSQL doesn't consider it to be a sequence. Could we see the exact contents of pg_class for the sequence? (both tuples including OIDs) > When I say 'DROP SEQUENCE ...' it says that it is not a sequence and advises > me to use 'DROP TABLE'. > When I say 'DROP TABLE ...' it says something like 'cannot drop system table > pg_temp.XXXXX.X' (I changed the actial digits into 'X'-es. Not sure about > exact words.) Hm, that sounds like some sort of conflict with a temp table. Is it possible that you have been using a temp table name that matches the sequence name? Are there any pg_class entries whose names begin with pg_temp, and if so could we see details on those too? regards, tom lane
> The "relfilenode" attribute of the tuple contained the value > "326210". I tried to find the file with that name in the > database directory - it really was not there. Did you look in lost+found? > Thus, the backend crash has led not only to impossibility to > complete my task, but also to loosing some data. The sequence > was actually created in an already committed transaction, not > in the one, that crashed. How much time passed after sequence creation till crash? Did you use -F (fsync = NO)? Vadim