Re: Plans for solving the VACUUM problem - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Plans for solving the VACUUM problem |
Date | |
Msg-id | 27248.990232035@sss.pgh.pa.us |
In response to | RE: Plans for solving the VACUUM problem ("Mikheev, Vadim" <vmikheev@SECTORBASE.COM>) |
List | pgsql-hackers |

"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> If a tuple is dead, we care not whether its index entries are still
>> around or not; so there's no risk to logical consistency.

> What does this sentence mean? We canNOT remove dead heap tuple until
> we know that there are no index tuples referencing it and your A,B,C
> reflect this, so ..?

Sorry if it wasn't clear. I meant that if the vacuum process fails
after removing an index tuple but before removing the (dead) heap tuple
it points to, there's no need to try to undo. That state is OK, and
when we next get a chance to vacuum we'll still be able to finish
removing the heap tuple.

>> Another place where lazy VACUUM may be unable to do its job completely
>> is in compaction of space on individual disk pages. It can physically
>> move tuples to perform compaction only if there are not currently any
>> other backends with pointers into that page (which can be tested by
>> looking to see if the buffer reference count is one). Again, we punt
>> and leave the space to be compacted next time if we can't do it right
>> away.

> We could keep share buffer lock (or add some other kind of lock)
> until tuple projected - after projection we need not to read data
> for fetched tuple from shared buffer and time between fetching
> tuple and projection is very short, so keeping lock on buffer will
> not impact concurrency significantly.

Or drop the pin on the buffer to show we no longer have a pointer to it.
I'm not sure that the time to do projection is short though --- what
if there are arbitrary user-defined functions in the quals or the
projection targetlist?

> Or we could register callback cleanup function with buffer so bufmgr
> would call it when refcnt drops to 0.

Hmm ... might work. There's no guarantee that the refcnt would drop to
zero before the current backend exits, however. Perhaps set a flag in
the shared buffer header, and the last guy to drop his pin is supposed
to do the cleanup? But then you'd be pushing VACUUM's work into
productive transactions, which is probably not the way to go.

>> This is mainly a problem of a poorly chosen API. The index AMs
>> should offer a "bulk delete" call, which is passed a sorted array
>> of main-table TIDs. The loop over the index tuples should happen
>> internally to the index AM.

> I agreed with others who think that the main problem of index cleanup
> is reading all index data pages to remove some index tuples.

For very small numbers of tuples that might be true. But I'm not
convinced it's worth worrying about. If there aren't many tuples to
be freed, perhaps VACUUM shouldn't do anything at all.

> Well, probably it's ok for first implementation and you'll win some CPU
> with "bulk delete" - I'm not sure how much, though, and there is more
> significant issue with index cleanup if table is not locked exclusively:
> concurrent index scan returns tuple (and unlock index page), heap_fetch
> reads table row and find that it's dead, now index scan *must* find
> current index tuple to continue, but background vacuum could already
> remove that index tuple => elog(FATAL, "_bt_restscan: my bits moved...");

Hm. Good point ...

> Two ways: hold index page lock until heap tuple is checked or (rough
> schema)
> store info in shmem (just IndexTupleData.t_tid and flag) that an index tuple
> is used by some scan so cleaner could change stored TID (get one from prev
> index tuple) and set flag to help scan restore its current position on
> return.

Another way is to mark the index tuple "gone but not forgotten", so to
speak --- mark it dead without removing it. (We could know that we need
to do that if we see someone else has a buffer pin on the index page.)
In this state, the index scan coming back to work would still be allowed
to find the index tuple, but no other index scan would stop on the tuple.
Later passes of vacuum would eventually remove the index tuple, whenever
vacuum happened to pass through at an instant where no one has a pin on
that index page.

None of these seem real clean though. Needs more thought.

> Well, my current TODO looks as (ORDER BY PRIORITY DESC):
> 1. UNDO;
> 2. New SMGR;
> 3. Space reusing.
> and I cannot commit at this point anything about 3. So, why not to refine
> vacuum if you want it. I, personally, was never able to convince myself
> to spend time for this.

Okay, good. I was worried that this idea would conflict with what you
were doing, but it seems it won't.

regards, tom lane
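
For illustration only, the "bulk delete" call described in the message could look roughly like the sketch below. It is not code from the thread or from PostgreSQL: the types `DemoTid` and `DemoIndexEntry` and the functions `demo_tid_cmp` and `demo_bulk_delete` are hypothetical stand-ins, and the "index" is just an in-memory array. The point it tries to show is that the caller hands over a sorted array of dead main-table TIDs once, and the loop over index entries runs inside the index code, with a binary search per entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a main-table tuple identifier (block, offset). */
typedef struct {
    uint32_t block;
    uint16_t offset;
} DemoTid;

/* Hypothetical index entry: a key plus the heap TID it points to. */
typedef struct {
    int     key;
    DemoTid heap_tid;
} DemoIndexEntry;

/* Order TIDs by block number, then by offset within the block. */
static int demo_tid_cmp(const void *a, const void *b)
{
    const DemoTid *x = a;
    const DemoTid *y = b;

    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

/*
 * Make one pass over the index entries and drop every entry whose heap TID
 * appears in dead_tids, which must already be sorted by demo_tid_cmp.
 * Surviving entries are compacted to the front of the array and *n_entries
 * is updated. Returns the number of entries removed.
 */
static size_t demo_bulk_delete(DemoIndexEntry *entries, size_t *n_entries,
                               const DemoTid *dead_tids, size_t n_dead)
{
    size_t kept = 0;

    for (size_t i = 0; i < *n_entries; i++) {
        /* Binary search in the sorted dead-TID array. */
        bool dead = n_dead > 0 &&
            bsearch(&entries[i].heap_tid, dead_tids, n_dead,
                    sizeof(DemoTid), demo_tid_cmp) != NULL;

        if (!dead)
            entries[kept++] = entries[i];
    }

    size_t removed = *n_entries - kept;
    *n_entries = kept;
    return removed;
}
```

A real index AM would walk its own pages rather than a flat array, and would still have to deal with the concurrent-scan problem raised above; the sketch ignores locking and pins entirely.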