Re: Resumable vacuum proposal and design overview - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Resumable vacuum proposal and design overview
Msg-id: 1172651791.3760.828.camel@silverbirch.site
In response to: Re: Resumable vacuum proposal and design overview (Galy Lee <lee.galy@oss.ntt.co.jp>)
Responses: Re: Resumable vacuum proposal and design overview
           Re: Resumable vacuum proposal and design overview
           Re: Resumable vacuum proposal and design overview
List: pgsql-hackers
On Wed, 2007-02-28 at 13:53 +0900, Galy Lee wrote:
> Tom Lane wrote:
> > Huh? There is no extra cost in what I suggested; it'll perform
> > exactly the same number of index scans that it would do anyway.
>
> The things I wanted to say is that:
> If we can stop at any point, we can make maintenance memory large
> sufficient to contain all of the dead tuples, then we only need to
> clean index for once. No matter how many times vacuum stops,
> indexes are cleaned for once.

I agree that the cycle-at-a-time approach could perform more poorly with
repeated stop-start. The reason for the suggestion was robustness, not
performance.

If you provide the wrong dead-tuple-list to VACUUM, you will destroy the
integrity of a table, which can result in silent data loss. You haven't
explained how saving the dead-tuple-list could be done in a safe manner,
and it seems risky to me.

> But in your proposal, indexes will be scan as many as vacuum stops.
> Those extra indexes cleaning are thought as the extra cost compared
> with stop-on-dime approach. To vacuum a large table by stopping 8
> times, tests show the extra cost can be one third of the stop-on-dime
> approach.

But the VACUUM is being run during your maintenance window, so why do
you care about the performance of VACUUM during that time? There is some
inefficiency in the VACUUM process, but that seems a fair price to pay
for the extra robustness.

Does the loss of efficiency during VACUUM translate directly into
reduced performance during operational periods? I think not. Deferring
completion of VACUUM means deferring refreshing of the FSM. Allowing
cycle-at-a-time VACUUM would allow the FSM to be updated after each run,
thus releasing space for reuse again. ISTM that the saving-dead-list
approach would defer the updating of the FSM for many days in your
situation.

If you would like to reduce VACUUM times, have you considered
partitioning? It can be very effective at isolating changes and is
designed specifically to cope with large data maintenance issues. If
there are issues that prevent the use of partitioning in your case,
perhaps we should be discussing those instead? Migration from a
non-partitioned environment to a partitioned one is quite simple from
8.2 onwards.

> > So I'm not really convinced that being able to stop a table
> > vacuum halfway is critical.
>
> To run vacuum on the same table for a long period, it is critical
> to be sure:
> 1. not to eat resources that foreground processes needs
> 2. not to block vacuuming of hot-updated tables
> 3. not to block any transaction, not to block any backup activities
>
> In the current implementation of concurrent vacuum, the third is not
> satisfied obviously, the first issue comes to my mind is the
> lazy_truncate_heap, it takes AccessExclusiveLock for a long time,
> that is problematic.

Are you saying you know for certain this lock is held for a long time,
or are you just saying you think it is? If you have some evidence for
long truncation times then that would be a separate issue of concern,
since that might starve out normal users. Please say more?

ISTM that if you can refresh the FSM more frequently, you will have less
need to truncate the relation at the end of each run. After some time, I
would expect that no truncation would be required, because of the cyclic
reuse of space within the table rather than extension/truncation.

--
 Simon Riggs
 EnterpriseDB   http://www.enterprisedb.com
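For context, the partitioning available from 8.2 that Simon refers to is the
inheritance-plus-constraint-exclusion scheme. A minimal sketch of that
approach follows; the table and column names here are illustrative, not taken
from this thread:

    -- Parent table; children inherit its columns.
    CREATE TABLE measurement (
        id      bigint NOT NULL,
        logdate date   NOT NULL,
        payload text
    );

    -- Each child carries a CHECK constraint describing the rows it holds,
    -- so the planner can skip irrelevant children when constraint
    -- exclusion is enabled.
    CREATE TABLE measurement_2007_02 (
        CHECK (logdate >= DATE '2007-02-01' AND logdate < DATE '2007-03-01')
    ) INHERITS (measurement);

    CREATE TABLE measurement_2007_03 (
        CHECK (logdate >= DATE '2007-03-01' AND logdate < DATE '2007-04-01')
    ) INHERITS (measurement);

    SET constraint_exclusion = on;

    -- VACUUM can then be run one partition at a time, which keeps each
    -- run short and bounds the work lost if a run is interrupted.
    VACUUM ANALYZE measurement_2007_02;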