Thread: Missing tables in postgresql 7.2.4
Hi, we recently discovered on our production database an a little bit bizarre problem (after two years stable operations). Some tables are simply missing, or sometimes the affected table(s) is/are there but not listed in pg_tables. An example (I am looking for the table kog_blasthit_tables): dev_db=# select version(); version ------------------------------------------------------------------------- PostgreSQL 7.2.4 on sparc-sun-solaris2.8, compiled by GCC gcc (GCC) 3.1 (1 row) dev_db=# \d kog_ <TAB EXPANSION> kog_blasthits_template kog_hits_obj_2374_q411_db7 kog_hits_obj_1016_q344_db7 kog_hits_obj_2396_q833_db7 kog_hits_obj_1341_q475_db7 kog_hits_obj_2491_q505_db7 kog_hits_obj_1362_q412_db7 kog_hits_obj_2516_q900_db7 kog_hits_obj_1364_q413_db7 kog_hits_obj_2559_q78_db7 kog_hits_obj_1584_q570_db7 kog_hits_obj_822_q369_db7 kog_hits_obj_1604_q574_db7 kog_hits_obj_834_q371_db7 kog_hits_obj_1660_q608_db7 kog_hits_obj_846_q375_db7 kog_hits_obj_1725_q650_db7 kog_hits_obj_880_q339_db7 kog_hits_obj_1737_q605_db7 kog_info kog_hits_obj_2186_q686_db7 kog_org // No kog_blasthit_tables !!! BUT it seems to be there: dev_db=# select * FROM kog_blasthit_tables LIMIT 1; id | object_id | query_set_id | db_set_id | num_of_query_sequences | table_name | table_desc | number_of_entries | blast_parameters | blast_prg | evalue_cutoff | owner | project | created | status | number_of_unique_entries | reciprocal_first_id | reciprocal_second_id -----+-----------+--------------+-----------+------------------------+----------------------------+------------+-------------------+-------------------------------------------+-----------+---------------+----------+---------+-------------------------------+----------+--------------------------+---------------------+---------------------- 398 | 1341 | 475 | 7 | 26187 | kog_hits_obj_1341_q475_db7 | na | 0 | -a 1 -m 7 -T F -F T -p blastx -e 0.100000 | kogblast | 0.1 | mbeckste | 23 | 2004-02-26 18:36:48.048881+01 | finished | 27426 | -1 | -1 (1 row) dev_db=# // The table seems to be there but not in pg_tables dev_db=# select * FROM pg_tables WHERE tablename='kog_blasthit_tables'; tablename | tableowner | hasindexes | hasrules | hastriggers -----------+------------+------------+----------+------------- (0 rows) dev_db=# \d pg_ta <TAB EXPANSION> // nothing but dev_db=# select count(*) FROM pg_tables; count ------- 875 (1 row) The case which is even worth, is that the table is missing completely. We have a catalog table that keeps track of generated tables, so we know that it has to be there. The postmaster logs (with high debug level) show no DROP TABLE between the time of the CREATE TABLE statement of the affected table and the time where we noticed that it is missing. I know that this is probably not enough information for a precise analysis. May be someone can give me a hint how to further investigate this issue. regards Michael -- ------------------------------------------------------------------------------ Dipl.-Inform. Michael Beckstette Office: M3-129 AG-PI / Technische Fakultaet EMail:mbeckste@techfak.uni-bielefeld.de Universitaet Bielefeld Fon: +49-521-106-2914 Postfach 100131 Fax: +49-521-106-6411 D-33501 BIELEFELD Germany
"Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes: > we recently discovered on our production database an a little bit bizarre > problem (after two years stable operations). Some tables are simply missing, or > sometimes the affected table(s) is/are there but not listed in pg_tables. This sounds a bit like a transaction ID wraparound problem. Have you been vacuuming your whole database on a reasonable schedule? The missing tables might conceivably be old enough that their pg_class rows have wrapped around "into the future". It'd be useful to look at SELECT datname, age(datfrozenxid) FROM pg_database; regards, tom lane
Hi Tom, this is the output from 'SELECT datname, age(datfrozenxid) FROM pg_database;' datname | age -------------+------------- xgc | -1950241750 dev_db | -1886587214 template1 | -1884294460 template0 | -1884294460 promo_db | -1884294460 snap_db_new | -1884294460 gendev_db | 1887538137 (7 rows) dev_db=# The affected DB is 'dev_db', although it looks like the others except 'gendev_db' have a wraparound problem too (?). To answer your question about the VACUUM: We VACUUM FULL a few tables with a high amount of INSERT/DEL operations once per hour, but I have to admit that we perform a VACUUM of the whole DB not on a regulary basis. I think the last one was several monthes ago. Further on we use transactions at several places and we have at least 20 transactions per minute. Does now a normal VACUUM FULL of the whole DB(s) fix our problem? Michael On May 11, 11:51am, Tom Lane wrote: > Subject: Re: [BUGS] Missing tables in postgresql 7.2.4 > "Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes: > > we recently discovered on our production database an a little bit bizarre > > problem (after two years stable operations). Some tables are simply missing, or > > sometimes the affected table(s) is/are there but not listed in pg_tables. > > This sounds a bit like a transaction ID wraparound problem. Have you > been vacuuming your whole database on a reasonable schedule? The > missing tables might conceivably be old enough that their pg_class rows > have wrapped around "into the future". It'd be useful to look at > SELECT datname, age(datfrozenxid) FROM pg_database; > > regards, tom lane -- ------------------------------------------------------------------------------ Dipl.-Inform. Michael Beckstette Office: M3-129 AG-PI / Technische Fakultaet EMail:mbeckste@techfak.uni-bielefeld.de Universitaet Bielefeld Fon: +49-521-106-2914 Postfach 100131 Fax: +49-521-106-6411 D-33501 BIELEFELD Germany
Hi, after reading the docs (I know it was a little bit late), I am now relatively sure that I trapped into a transaction ID wraparound problem. For me its now a little bit unclear, how to proceed in order to minimize the caused damage. I checked all tables in the affected DB. Till now, the results are as follows: 6 user tables are completely lost. 8 user tables are not listed in pg_tables but still accessible by a SELECT. The 6 completely lost tables are not so dramatical, because they contain only static data, that I can restore from the development system. But what happens with the 8 tables that are still accessable, but not listed in pg_tables, after a VACUUM? Will they be removed completely or 'reinserted' into pg_tables? Does anyone has an advise how to proceed in this situation? Regards Michael On May 11, 6:11pm, Michael Beckstette wrote: > Subject: Re: [BUGS] Missing tables in postgresql 7.2.4 > Hi Tom, > > this is the output from 'SELECT datname, age(datfrozenxid) FROM pg_database;' > > datname | age > -------------+------------- > xgc | -1950241750 > dev_db | -1886587214 > template1 | -1884294460 > template0 | -1884294460 > promo_db | -1884294460 > snap_db_new | -1884294460 > gendev_db | 1887538137 > (7 rows) > > dev_db=# > > The affected DB is 'dev_db', although it looks like the others except > 'gendev_db' have a wraparound problem too (?). To answer your question about > the VACUUM: We VACUUM FULL a few tables with a high amount of INSERT/DEL > operations once per hour, but I have to admit that we perform a VACUUM of the > whole DB not on a regulary basis. I think the last one was several monthes ago. > Further on we use transactions at several places and we have at least 20 > transactions per minute. > > Does now a normal VACUUM FULL of the whole DB(s) fix our problem? > > Michael > > > On May 11, 11:51am, Tom Lane wrote: > > Subject: Re: [BUGS] Missing tables in postgresql 7.2.4 > > "Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes: > > > we recently discovered on our production database an a little bit bizarre > > > problem (after two years stable operations). Some tables are simply > missing, or > > > sometimes the affected table(s) is/are there but not listed in pg_tables. > > > > This sounds a bit like a transaction ID wraparound problem. Have you > > been vacuuming your whole database on a reasonable schedule? The > > missing tables might conceivably be old enough that their pg_class rows > > have wrapped around "into the future". It'd be useful to look at > > SELECT datname, age(datfrozenxid) FROM pg_database; > > > > regards, tom lane > > > > -- > > ------------------------------------------------------------------------------ > Dipl.-Inform. Michael Beckstette Office: M3-129 > AG-PI / Technische Fakultaet EMail:mbeckste@techfak.uni-bielefeld.de > Universitaet Bielefeld Fon: +49-521-106-2914 > Postfach 100131 Fax: +49-521-106-6411 > D-33501 BIELEFELD > Germany > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >-- End of excerpt from Michael Beckstette -- ------------------------------------------------------------------------------ Dipl.-Inform. Michael Beckstette Office: M3-129 AG-PI / Technische Fakultaet EMail:mbeckste@techfak.uni-bielefeld.de Universitaet Bielefeld Fon: +49-521-106-2914 Postfach 100131 Fax: +49-521-106-6411 D-33501 BIELEFELD Germany
"Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes: > The 6 completely lost tables are not so dramatical, because they contain only > static data, that I can restore from the development system. But what happens > with the 8 tables that are still accessable, but not listed in pg_tables, after > a VACUUM? Will they be removed completely or 'reinserted' into pg_tables? > Does anyone has an advise how to proceed in this situation? What I would recommend as a first step is to stop the postmaster and then take a tarball backup of the entire $PGDATA tree. This will at least provide a chance to go back if subsequent tries mess things up completely. After that, I'd try a plain VACUUM (not FULL, not FREEZE) of pg_class and see if that restores the missing tables to view in pg_tables. If it does, go ahead and do a database-wide plain VACUUM, and you should be OK. If it doesn't, we'll need to think of another plan. regards, tom lane
Hi, On May 11, 2:39pm, Tom Lane wrote: > What I would recommend as a first step is to stop the postmaster and > then take a tarball backup of the entire $PGDATA tree. This will at > least provide a chance to go back if subsequent tries mess things up > completely. Done. This was probably the biggest tar ball I have ever build (~450GB) ;) > > After that, I'd try a plain VACUUM (not FULL, not FREEZE) of pg_class > and see if that restores the missing tables to view in pg_tables. Done. The tables that were missing in pg_tables (but accessible with SELECT) are now listed in pg_tables. > If it does, go ahead and do a database-wide plain VACUUM, and you > should be OK. Done. As far as I can tell, everything is OK again. Thanks a lot Tom! P.S.:A TODO for me: CRON Script for weekly VACUUM ;) Michael -- ------------------------------------------------------------------------------ Dipl.-Inform. Michael Beckstette Office: M3-129 AG-PI / Technische Fakultaet EMail:mbeckste@techfak.uni-bielefeld.de Universitaet Bielefeld Fon: +49-521-106-2914 Postfach 100131 Fax: +49-521-106-6411 D-33501 BIELEFELD Germany
> P.S.:A TODO for me: CRON Script for weekly VACUUM ;) on heavy use databases, mine generally does a light vacuum every 4 hours, and a once a day full on everything. also, a weekly full reindex on a really really heavy use systems like this one message board server I ad-mangle something like... 7 */4 * * * vacuumdb busy_database >/dev/null 2>/dev/null 37 0 * * * vacuumdb -a -z >/dev/null 2>/dev/null 37 2 * * Sun reindexdb -a > /dev/null 2>/dev/null many might say this is total overkill, I dunno. keeps this server happy. btw, reindexdb is a script from contrib.
"Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes: > On May 11, 2:39pm, Tom Lane wrote: >> If it does, go ahead and do a database-wide plain VACUUM, and you >> should be OK. > Done. As far as I can tell, everything is OK again. Sweet ;-) In the words of my former business partner, a private pilot with more hours aloft than many airline captains: "Walked away from another one ..." For the benefit of onlookers, the gambit being played here went like this: the missing pg_class rows must have fairly recently wrapped around the 2G transaction mark with respect to the current XID counter. That made them "in the future" not "in the past" as far as normal queries go. However, a VACUUM will freeze-as-good any tuples that are "in the past" with respect to the vacuum freeze time, which for a plain VACUUM is 1G transactions ago. So as long as Michael notices he has a problem within 1 billion transactions of having a problem, he can get out of it. I cannot claim that this behavior was operating-as-designed, because I'm pretty sure we hadn't thought it through when planning the wraparound XID behavior. But we walked away from another one. regards, tom lane
Hello, two last questions about the transaction ID wraparound problem of my DBs I had yesterday (see former postings). After applying the 'recovery procedure' Tom suggested now my XIDs are looking almost fine again, except for the template0 DB. dev_db=# SELECT datname, age(datfrozenxid) FROM pg_database; datname | age -------------+------------- xgc | 1075434814 dev_db | 1074296718 template1 | 1075486644 template0 | -1882457315 promo_db | 1075528357 snap_db_new | 1075528467 gendev_db | 1075512627 Is this negligible or can it cause any harm in the future? Is there a way to VACUUM template0 as well? My second point is more a suggestion. After Toms strategy worked quite well (at least for me) it is maybe worthwhile to put it somewhere in the docs. Maybe together with other things in a section called 'Disaster recovery strategies'? regards Michael -- ------------------------------------------------------------------------------ Dipl.-Inform. Michael Beckstette Office: M3-129 AG-PI / Technische Fakultaet EMail:mbeckste@techfak.uni-bielefeld.de Universitaet Bielefeld Fon: +49-521-106-2914 Postfach 100131 Fax: +49-521-106-6411 D-33501 BIELEFELD Germany
"Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes: > two last questions about the transaction ID wraparound problem of my DBs I had > yesterday (see former postings). After applying the 'recovery procedure' Tom > suggested now my XIDs are looking almost fine again, except for the template0 > DB. You don't have to worry about template0 -- it was "frozen" during initdb and does not need vacuuming. > My second point is more a suggestion. After Toms strategy worked quite > well (at least for me) it is maybe worthwhile to put it somewhere in > the docs. Maybe together with other things in a section called > 'Disaster recovery strategies'? I'm not sure it can be claimed to be tested well enough to publish as a recovery strategy. I'm glad it worked for you, but ... regards, tom lane