Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update - Mailing list pgsql-hackers
From | Jim Beckstrom |
---|---|
Subject | Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update |
Date | |
Msg-id | 3DE4D9F5.2070203@sbcglobal.net Whole thread Raw |
In response to | Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
Just for the humor of it, as well as to confirm Nick's perspective, years ago on our inhouse developed Burroughs mainframe dbms, we had a process called "garbage collect". Nicolai Tufar wrote: >I always wandered if VACUUM is the right name for the porcess. Now, when >PostgreSQL >is actively challenging in Enterprise space, it might be a good idea to give >it a more >enterprise-like name. Try to think how it is looking for an outside person >to see >us, database professionals hold lenghty discussions about the ways we >vacuum a database. Why should you need to vacuum a database? Is it >dirty? In my personal opinion, something like "space reclaiming daemon", >"free-list organizer", "tuple recyle job" or "segment coalesce process" >would >sound more business-like . > >Regards, >Nick > > >----- Original Message ----- >From: "Bruce Momjian" <pgman@candle.pha.pa.us> >To: "Curtis Faith" <curtis@galtair.com> >Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; "Ron Johnson" <ron.l.johnson@cox.net>; >"PgSQL Performance ML" <pgsql-performance@postgresql.org>; ><pgsql-hackers@postgresql.org> >Sent: Tuesday, November 26, 2002 9:09 PM >Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of >insert/delete/update > > > > >>Good ideas. I think the master solution is to hook the statistics >>daemon information into an automatic vacuum that could _know_ which >>tables need attention. >> >>-------------------------------------------------------------------------- >> >> >- > > >>Curtis Faith wrote: >> >> >>>tom lane wrote: >>> >>> >>>>Sure, it's just shuffling the housekeeping work from one place to >>>>another. The thing that I like about Postgres' approach is that we >>>>put the housekeeping in a background task (VACUUM) rather than in the >>>>critical path of foreground transaction commit. >>>> >>>> >>>Thinking with my marketing hat on, MVCC would be a much bigger win if >>> >>> >VACUUM > > >>>was not required (or was done automagically). The need for periodic >>> >>> >VACUUM > > >>>just gives ammunition to the PostgreSQL opponents who can claim we are >>>deferring work but that it amounts to the same thing. >>> >>>A fully automatic background VACUUM will significantly reduce but will >>> >>> >not > > >>>eliminate this perceived weakness. >>> >>>However, it always seemed to me there should be some way to reuse the >>> >>> >space > > >>>more dynamically and quickly than a background VACUUM thereby reducing >>> >>> >the > > >>>percentage of tuples that are expired in heavy update cases. If only a >>> >>> >very > > >>>tiny number of tuples on the disk are expired this will reduce the >>> >>> >aggregate > > >>>performance/space penalty of MVCC into insignificance for the majority >>> >>> >of > > >>>uses. >>> >>>Couldn't we reuse tuple and index space as soon as there are no >>> >>> >transactions > > >>>that depend on the old tuple or index values. I have imagined that this >>> >>> >was > > >>>always part of the long-term master plan. >>> >>>Couldn't we keep a list of dead tuples in shared memory and look in the >>> >>> >list > > >>>first when deciding where to place new values for inserts or updates so >>> >>> >we > > >>>don't have to rely on VACUUM (even a background one)? If there are >>> >>> >expired > > >>>tuple slots in the list these would be used before allocating a new slot >>> >>> >from > > >>>the tuple heap. >>> >>>The only issue is determining the lowest transaction ID for in-process >>>transactions which seems relatively easy to do (if it's not already done >>>somewhere). >>> >>>In the normal shutdown and startup case, a tuple VACUUM could be >>> >>> >performed > > >>>automatically. This would normally be very fast since there would not be >>> >>> >many > > >>>tuples in the list. >>> >>>Index slots would be handled differently since these cannot be >>> >>> >substituted > > >>>one for another. However, these could be recovered as part of every >>> >>> >index > > >>>page update. Pages would be scanned before being written and any expired >>>slots that had transaction ID's lower than the lowest active slot would >>> >>> >be > > >>>removed. This could be done for non-leaf pages as well and would result >>> >>> >in > > >>>only reorganizing a page that is already going to be written thereby not >>>adding much to the overall work. >>> >>>I don't think that internal pages that contain pointers to values in >>> >>> >nodes > > >>>further down the tree that are no longer in the leaf nodes because of >>> >>> >this > > >>>partial expired entry elimination will cause a problem since searches >>> >>> >and > > >>>scans will still work fine. >>> >>>Does VACUUM do something that could not be handled in this realtime >>> >>> >manner? > > >>>- Curtis >>> >>> >>> >>>---------------------------(end of broadcast)--------------------------- >>>TIP 4: Don't 'kill -9' the postmaster >>> >>> >>> >>-- >> Bruce Momjian | http://candle.pha.pa.us >> pgman@candle.pha.pa.us | (610) 359-1001 >> + If your life is a hard drive, | 13 Roberts Road >> + Christ can be your backup. | Newtown Square, Pennsylvania >> >> >19073 > > >>---------------------------(end of broadcast)--------------------------- >>TIP 2: you can get off all lists at once with the unregister command >> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >> >> >> > > >---------------------------(end of broadcast)--------------------------- >TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > > >
pgsql-hackers by date: