Thread: recovery from xid wraparound
Thread: recovery from xid wraparound

Hi,

I'm running 7.4 on RHAS 4, and I think I've had a transaction id wraparound
issue. Running the command below gives the suitably worrying negative number:

emystery=# SELECT datname, age(datfrozenxid) FROM pg_database;
     datname      |     age
------------------+-------------
 [maindbname]     | -2081610471
 [otherdbname]    |  1075601025
 [otherdbname]    |  1257289757
 [otherdbname]    |  1074582099
 [otherdbname]    |  1257289757

Which is weird - because I have vacuumed the database quite a lot - both
individual tables, and I thought a vacuum of the whole database a month or
so ago.

Anyway - I haven't noticed any data loss yet, and was hoping that if all
tables had been vacuumed recently (including system catalog tables), there
would be no remaining rows that would appear to have a future xid, and so
the database should be ok?

Obviously I'm now doing the right thing with a vacuumdb -a - however this
has been running 9 hours now and looks like at least 7 hours to go just on
this one monstrous table. In the interests of risk reduction I've just
knocked up a script to run ahead and quickly vacuum all the other tables
(a sketch of the idea is below).

But my questions are thus...

a) is my assumption about the database being ok correct - assuming all
tables have been vacuumed recently, including catalog tables?

b) is it possible to safely abort my whole table vacuum now so I can run it
at the weekend when there's less traffic?

c) if I have experienced data loss, on the assumption all the table
structure remains (looks like it does), and I have a working backup from
before the xid wraparound (I do), can I just reinsert any detected-missing
data at the application level without needing a dump/reload?

Any help appreciated in this really not-fun time, thanks

S
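[For reference, a minimal sketch of what such a "run ahead and vacuum
everything else" script could look like - the database name, file names,
and 'monster_table' are placeholders, not details from the thread. It
generates one VACUUM statement per user table, skipping the big one, then
feeds the result back to psql:]

    -- gen_vacuums.sql: emit one VACUUM per user table, skipping the big one
    SELECT 'VACUUM ' || schemaname || '.' || tablename || ';'
    FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
      AND tablename <> 'monster_table';

    -- then, from the shell:
    --   psql -At -d mydb -f gen_vacuums.sql -o vacuums.sql
    --   psql -d mydb -f vacuums.sql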
Incidentally, how many passes of a table can vacuum make? It's currently on
its third trip through the 20Gb of indices, meaning another 7 hours till
completion [of this table]!

I assume it only does three passes? (it chooses based on the table
continuing to be updated while vacuum is running)

S

-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: 24 October 2006 10:24
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound

On Tue, Oct 24, 2006 at 07:43:15AM +0100, Shane Wright wrote:
> Anyway - I haven't noticed any data loss yet, and was hoping that if all
> tables had been vacuumed recently (including system catalog tables),
> there would be no remaining rows that would appear to have a future xid,
> and so the database should be ok?

Running vacuum is the right solution, but I think you have to let it
finish. In particular, in that version a database-wide vacuum has to
complete before it will update the datfrozenxid (it's not tracked per
table).

> a) is my assumption about the database being ok correct - assuming all
> tables have been vacuumed recently, including catalog tables?

Should be ok, but apparently you missed one, or didn't do a database-wide
vacuum.

> b) is it possible to safely abort my whole table vacuum now so I can run
> it at the weekend when there's less traffic?

Aborting vacuum is safe, but you have to do a database-wide vacuum at some
point.

> c) if I have experienced data loss, on the assumption all the table
> structure remains (looks like it does), and I have a working backup from
> before the xid wraparound (I do), can I just reinsert any
> detected-missing data at the application level without needing a
> dump/reload?

A VACUUM will recover any data that slipped beyond the horizon less than 1
billion transactions ago, which I think covers you completely. The only
issue is that unique indexes may be confused because new conflicting data
may have been inserted while the old data was invisible. Only you can say
if that's going to be an issue.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Shane Wright wrote:
>
> Incidentally, how many passes of a table can vacuum make? It's currently
> on its third trip through the 20Gb of indices, meaning another 7 hours
> till completion [of this table]!
>
> I assume it only does three passes? (it chooses based on the table
> continuing to be updated while vacuum is running)

As many passes as it needs to. It is limited by maintenance_work_mem (in
7.4 I think it was sort_mem). It needs to collect an array of tuple
pointers, and it keeps them in memory. When the array grows to
maintenance_work_mem, it stops scanning the table and scans the indexes,
removing everything that points to those tuple pointers. Then it goes back
to scanning the table.

So the problem is that it scans the whole indexes many times. If you
increase maintenance_work_mem way up for this vacuum task, it will need to
scan the indexes fewer times (hopefully only once), making the process a
lot faster.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
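[Rough arithmetic on why that matters - the ~6 bytes per dead-tuple pointer
is an assumption about 7.4's ItemPointerData array entries, not a figure
from the thread:]

    -- dead tuples removable per index pass ~= vacuum_mem / 6 bytes
    --   vacuum_mem =   8 MB (the default) -> ~1.4 million tuples per pass
    --   vacuum_mem = 256 MB               -> ~45 million tuples per pass
    -- e.g. ~50 million dead rows would mean ~36 full index scans at the
    -- default, but only 2 at 256 MB.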
"Shane Wright" <shane.wright@edigitalresearch.com> writes: > Incidentally, how many passes of a table can vacuum make! Lots, especially if the table hasn't been vacuumed in a long time... Perhaps you should be using a higher maintenance_work_mem? (Um, in 7.4 make that vacuum_mem.) Larger work memory translates directly to fewer passes over the indexes. regards, tom lane
On Tue, Oct 24, 2006 at 03:47:52PM +0100, Shane Wright wrote:
> Incidentally, how many passes of a table can vacuum make? It's currently
> on its third trip through the 20Gb of indices, meaning another 7 hours
> till completion [of this table]!
>
> I assume it only does three passes? (it chooses based on the table
> continuing to be updated while vacuum is running)

It depends on how many tuples it needs to process and how much memory you
gave it (the maintenance_work_mem setting). The more memory you give it,
the fewer passes it needs to do...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Aw :(

It's at the default of 8Mb. The table contains 220 million rows and 6
indices. It has a few deleted rows...

If I change vacuum_mem I'll need to at least 'pg_ctl reload' - will it
apply straightaway with the next vacuum query or does it need a full
restart?

Does vacuum_mem need shared memory? (i.e. is it subject to the OS's limit)
- I have looked in the docs and googled but can't see detail on this.

If I have managed to vacuum all the catalog tables, and my script has
ensured all user tables other than this one have been vacuumed, then...
will the first pass of vacuum on this have set the xid to FrozenXID for
all rows - i.e. is the table safe?

What's the relative safety of restarting this vacuum with a bigger
vacuum_mem, say at the end of the week when traffic is quieter?

Basically if it's just datfrozenxid that's not updated I can live with
delaying the vacuum a few days. But if things are more serious then
obviously I can't wait.

Is it safe to say that if the catalog tables are ok and an individual
table has been vacuumed then its data is safe?

S

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 24 October 2006 15:52
To: Shane Wright
Cc: pgsql-general@postgresql.org; Martijn van Oosterhout
Subject: Re: [GENERAL] recovery from xid wraparound

"Shane Wright" <shane.wright@edigitalresearch.com> writes:
> Incidentally, how many passes of a table can vacuum make?

Lots, especially if the table hasn't been vacuumed in a long time...
Perhaps you should be using a higher maintenance_work_mem? (Um, in 7.4
make that vacuum_mem.) Larger work memory translates directly to fewer
passes over the indexes.

			regards, tom lane
On Tue, Oct 24, 2006 at 04:18:09PM +0100, Shane Wright wrote:
> If I change vacuum_mem I'll need to at least 'pg_ctl reload' - will
> it apply straightaway with the next vacuum query or does it need a
> full restart?

You can control it per session I think. So you can start psql and type:

# set vacuum_mem=<huge>;
SET
# VACUUM VERBOSE;
<blah>

You don't have to change the main config, unless you want it to apply
forever. Although, 8MB is small in general so you might want to up it
anyway. But for this one-off vacuum of this large table you could give it
a much larger amount of memory.

> Does vacuum_mem need shared memory? (i.e. is it subject to the OS's
> limit) - I have looked in the docs and googled but can't see detail on
> this.

It's just ordinary memory. If you have a few gig to spare, you can give it
all to the vacuum.

> If I have managed to vacuum all the catalog tables, and my script has
> ensured all user tables other than this one have been vacuumed,
> then... will the first pass of vacuum on this have set the xid to
> FrozenXID for all rows - i.e. is the table safe?

Pass. Although I think the point is that it hasn't scanned the whole table
yet because it ran out of memory...

> Is it safe to say that if the catalog tables are ok and an individual
> table has been vacuumed then its data is safe?

Yes...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
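[Concretely, a hypothetical session for the 220-million-row table - the
512MB figure and the table name are illustrative, not from the thread; in
7.4 the setting is in KB and takes effect for the next VACUUM in the
session:]

    SET vacuum_mem = 524288;    -- 512MB, this session only; ordinary
                                -- backend memory, not shared memory
    VACUUM VERBOSE mybigtable;  -- 'mybigtable' stands in for the real table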
"Shane Wright" <shane.wright@edigitalresearch.com> writes: > If I change vacuum_mem I'll need to at least 'pg_ctl reload' - will it apply straightaway with the next vacuum query ordoes it need a full restart? reload is enough. > Basically if its just datfrozenxid that's not updated I can live with delaying the vacuum a few days. But if things aremore serious then obviously I can't wait. The question is how close to the wraparound horizon is any of your data. We don't really know that --- the datfrozenxid provides a lower bound but we don't know where things are in reality. Also, are you prepared to tolerate wrong answers (missing rows) for awhile? As Martijn mentioned, the vacuum will retrieve rows that have slid past the wrap horizon, but they'll have been invisible to your queries meanwhile. regards, tom lane