Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update - Mailing list pgsql-hackers
From | Merlin Moncure |
---|---|
Subject | Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update |
Date | |
Msg-id | as2rhg$178u$1@news.hub.org Whole thread Raw |
In response to | [PERFORM] Realtime VACUUM, was: performance of insert/delete/update ("Curtis Faith" <curtis@galtair.com>) |
List | pgsql-hackers |
How about OPTIMIZE? eg. optimize customers instead of analyze, could be paired with agressive so, OPTIMIZE AGREESSIVE very much a glass half empty, half full type thing. vacuum is not a problem, its a solution. Merlin ""Curtis Faith"" <curtis@galtair.com> wrote in message news:DMEEJMCDOJAKPPFACMPMIEIDCFAA.curtis@galtair.com... > tom lane wrote: > > Sure, it's just shuffling the housekeeping work from one place to > > another. The thing that I like about Postgres' approach is that we > > put the housekeeping in a background task (VACUUM) rather than in the > > critical path of foreground transaction commit. > > Thinking with my marketing hat on, MVCC would be a much bigger win if VACUUM > was not required (or was done automagically). The need for periodic VACUUM > just gives ammunition to the PostgreSQL opponents who can claim we are > deferring work but that it amounts to the same thing. > > A fully automatic background VACUUM will significantly reduce but will not > eliminate this perceived weakness. > > However, it always seemed to me there should be some way to reuse the space > more dynamically and quickly than a background VACUUM thereby reducing the > percentage of tuples that are expired in heavy update cases. If only a very > tiny number of tuples on the disk are expired this will reduce the aggregate > performance/space penalty of MVCC into insignificance for the majority of > uses. > > Couldn't we reuse tuple and index space as soon as there are no transactions > that depend on the old tuple or index values. I have imagined that this was > always part of the long-term master plan. > > Couldn't we keep a list of dead tuples in shared memory and look in the list > first when deciding where to place new values for inserts or updates so we > don't have to rely on VACUUM (even a background one)? If there are expired > tuple slots in the list these would be used before allocating a new slot from > the tuple heap. > > The only issue is determining the lowest transaction ID for in-process > transactions which seems relatively easy to do (if it's not already done > somewhere). > > In the normal shutdown and startup case, a tuple VACUUM could be performed > automatically. This would normally be very fast since there would not be many > tuples in the list. > > Index slots would be handled differently since these cannot be substituted > one for another. However, these could be recovered as part of every index > page update. Pages would be scanned before being written and any expired > slots that had transaction ID's lower than the lowest active slot would be > removed. This could be done for non-leaf pages as well and would result in > only reorganizing a page that is already going to be written thereby not > adding much to the overall work. > > I don't think that internal pages that contain pointers to values in nodes > further down the tree that are no longer in the leaf nodes because of this > partial expired entry elimination will cause a problem since searches and > scans will still work fine. > > Does VACUUM do something that could not be handled in this realtime manner? > > - Curtis > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
pgsql-hackers by date: