Re: Speeding up an in-progress wraparound-preventing vacuum - Mailing list pgsql-general
| From | Vincent de Phily |
|---|---|
| Subject | Re: Speeding up an in-progress wraparound-preventing vacuum |
| Date | |
| Msg-id | 3303554.B2OmElZI50@moltowork |
| In response to | Re: Speeding up an in-progress wraparound-preventing vacuum (Jeff Janes <jeff.janes@gmail.com>) |
| Responses | Re: Speeding up an in-progress wraparound-preventing vacuum; Re: Speeding up an in-progress wraparound-preventing vacuum |
| List | pgsql-general |
On Monday 08 December 2014 10:17:37 Jeff Janes wrote:
> On Mon, Dec 8, 2014 at 4:54 AM, Vincent de Phily <
> > bloat, which I'd like to get back asap). Currently about 80% of the IO is
> > devoted to the vacuum process (on average throughout the day, as
> > extrapolated from atop output).
>
> Is that 80% of the actually occurring IO, or 80% of the maximum possible IO?

80% of max possible IO, although I'm not sure how atop determines what the max is. It's a fairly reliable metric of "the drive is as busy as can be" in my experience.

> Increasing maintenance_work_mem even further, at least temporarily and
> locally for this operation, might be a good idea.

Ok, I thought 512M was already overgenerous, but I'll try increasing that too.

> > My first question is: is there a way to speed up the currently-running
> > vacuum process ? I guess killing it to let it be replaced by a more
> > aggressively-configured one would work, but I'd hate to lose 2 weeks of
> > processing (is there a way to estimate how much more vacuuming work
> > remains to be done ?), and I'm being a bit more cautious with the
> > wraparound-preventing kind.
>
> I don't know of a way to speed it up gracefully. That has frustrated me a
> few times, and a solution would really be nice.
>
> If you kill it, the new process will have to re-read the entire table, but
> it will have much less work to do since the killed process already 'plowed
> a path' for it. In a sense, killing the process will throw away all of the
> sequential read work on the table that has already been done, but the index
> reads and the writing workload is not all lost, it will save the new
> process time on those.

Ah good point, even if it has to re-read all the pages, it'll only have to write frozenxid for the pages that weren't processed before, that's good to hear. Isn't there also something about vacuum marking a page as "all empty", so that it can be skipped by the next run ?

I don't get what index reads are saved between vacuum runs ? I have 64G of RAM on this box, so there's no hope of the index staying in memory.

> You can `strace` for the lseek command to see which file handles it is
> currently working on, and use lsof to turn those into names. You want to
> look at where it is in the table files, not the index files.

Neat trick, thanks. I'll try it in the morning when I'm fresh. Are the files read in C-sorting order ?

> > Second question is: how come we reached the wraparound threshold on this
> > table at all ? We've only been keeping 28 days of data in this table,
> > doing incremental deletes every day (there are no updates at all). I find
> > it very unlikely that we'd go through 2G transactions in that timespan
> > (that'd need 890 transactions per second, we're well below 100). The
> > pg_class.relfrozenxid on that table is at 680M, while most other tables
> > are around 860M. Could it be that routine vacuums haven't been able to
> > update the relfrozenxid in a long long time, or am I missing something
> > else ?
>
> PostgreSQL doesn't know what your delete routine is like. It has to verify
> with its own eyes that there are no rows over a certain age.

My assumption was that routine vacuuming was able to update the table's relfrozenxid, but I realize now that PG needs to do a full scan before updating that.
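(For the record, relfrozenxid ages can be eyeballed with a catalog query along these lines; just a sketch, the ordering and LIMIT are arbitrary:)

```sql
-- Rough sketch: list tables by relfrozenxid age, next to the
-- autovacuum_freeze_max_age threshold that triggers wraparound vacuums.
SELECT c.oid::regclass      AS table_name,
       age(c.relfrozenxid)  AS xid_age,
       current_setting('autovacuum_freeze_max_age') AS freeze_max_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```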
> I don't think that routine vacuums even attempts to update relfrozenxid, or
> at least doesn't try very hard.

AFAIU it does (the 6th paragraph in 23.1.4 seems to imply it), and vacuum_freeze_min_age controls when. Perhaps lowering that value would help prepare a lot of the anti-wraparound work. Pity there's no "frozen pages map" (or is there ?) to keep track of pages with all-frozen tuples, it could speed up that anti-wraparound vacuum.

> Are you sure that routine vacuums have been running to completion on this
> table, as opposed to getting interrupted by something before finishing each
> time?

I checked pg_stat_user_tables.last_{auto_,}vacuum for this table and they are both null. This is seriously worrying. I've seen autovacuum take a few days on this table but always assumed that it finished properly. And I'm pretty sure vacuuming does some work, otherwise my disk usage would grow much faster. I have no idea what could cause vacuuming to systematically stop before the end. Maybe I'll find something in the logs.
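For reference, the check is something along these lines ('my_big_table' standing in for the real table name); on 9.1+ the vacuum_count/autovacuum_count columns in the same view are worth a look too:

```sql
-- Rough sketch: when this table was last (auto)vacuumed, plus run
-- counters (9.1+). 'my_big_table' is a placeholder name.
SELECT relname, last_vacuum, last_autovacuum,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'my_big_table';
```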
Thanks for your answers.

-- 
Vincent de Phily