Thread: Re: online debloatification (was: extending relations more efficiently)
On Wed, May 2, 2012 at 1:06 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of Wed May 02 12:55:17 -0400 2012:
>> On Wed, May 2, 2012 at 12:46 PM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>> > Agreed.  Perhaps to solve this issue what we need is a way to migrate
>> > tuples from later pages into earlier ones.  (This was one of the points,
>> > never resolved, that we discussed during the VACUUM FULL rework.)
>>
>> Yeah, I agree.  And frankly, we need to find a way to make it work
>> without taking AccessExclusiveLock on the relation.  Having to run
>> VACUUM FULL is killing actual users and scaring off potential ones.
>
> And ideally without bloating the indexes while at it.

Yeah.  Brainstorming wildly, how about something like this:

1. Insert a new copy of the tuple onto some other heap page.  The new tuple's xmin will be that of the process doing the tuple move, and we'll also set a flag indicating that a move is in progress.

2. Set a flag on the old tuple, indicating that a tuple move is in progress.  Set its TID to the new location of the tuple.  Set xmax to the tuple mover's XID.  Optionally, truncate away the old tuple data, leaving just the tuple header.

3. Scan all indexes and replace any references to the old tuple's TID with references to the new tuple's TID.

4. Commit.

5. Once the XID of the tuple mover is all-visible, nuke the old TID and clear the flag on the new tuple indicating a move-in-progress (these two operations must be done together, atomically, with a single WAL record covering both).

Any scan that encounters the old tuple will decide whether or not it can see the tuple based on the xmin & xmax in the old tuple's header.  If it decides it can see it, it follows the TID pointer and does its work using the new tuple instead.  Scans that encounter the new tuple need no special handling; the existing visibility rules are fine for that case.
Prune operations must not truncate away tuples that are being moved into or out of the page, and vacuum must not mark pages containing such tuples as all-visible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
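[Editor's note: the five steps above can be sketched as a toy simulation in C. Every type and name here (TupleSketch, TidSketch, move_in/move_out flags, etc.) is invented for illustration; none of it is real PostgreSQL code, and WAL, locking, and the index rewrite of step 3 are omitted entirely.]

```c
/* Illustrative simulation of the proposed tuple-move protocol.
 * All identifiers are hypothetical stand-ins, not PostgreSQL internals. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t block; uint16_t offset; } TidSketch;

typedef struct {
    uint32_t  xmin;       /* inserting XID */
    uint32_t  xmax;       /* deleting XID, 0 if none */
    bool      move_out;   /* step 2: this slot is now a redirect */
    bool      move_in;    /* step 1: this tuple is a move target */
    TidSketch ctid;       /* for a redirect, the tuple's new location */
    char      data[32];   /* payload (truncated away on the old tuple) */
} TupleSketch;

/* Steps 1-2: insert the new copy, then turn the old tuple into a redirect. */
static void
move_tuple(TupleSketch *old_tup, TupleSketch *new_tup,
           TidSketch new_tid, uint32_t mover_xid)
{
    /* Step 1: new copy, flagged as a move target, xmin = mover's XID. */
    *new_tup = *old_tup;
    new_tup->xmin = mover_xid;
    new_tup->move_in = true;
    new_tup->move_out = false;

    /* Step 2: old tuple becomes a redirect; xmax = mover's XID. */
    old_tup->move_out = true;
    old_tup->xmax = mover_xid;
    old_tup->ctid = new_tid;
    memset(old_tup->data, 0, sizeof old_tup->data); /* optional truncation */
}

/* A scan landing on a redirect follows ctid to the new copy.
 * (Single-block "heap" here, so only ctid.offset is used.) */
static const TupleSketch *
resolve(const TupleSketch *tup, const TupleSketch *heap, bool mover_committed)
{
    if (tup->move_out && mover_committed)
        return &heap[tup->ctid.offset];
    return tup;
}

/* Step 5, post-commit, once the mover's XID is all-visible:
 * free the old TID and clear the move-in flag together
 * (in the real proposal, atomically under one WAL record). */
static void
finish_move(TupleSketch *old_tup, TupleSketch *new_tup)
{
    new_tup->move_in = false;
    memset(old_tup, 0, sizeof *old_tup);
}
```

The sketch only shows the heap-side state transitions; the hard parts the thread argues about (crash recovery between steps, concurrent updaters, index retargeting) are exactly what it leaves out.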
Robert Haas <robertmhaas@gmail.com> writes:
> Brainstorming wildly, how about something like this:

> 1. Insert a new copy of the tuple onto some other heap page.  The new
> tuple's xmin will be that of the process doing the tuple move, and
> we'll also set a flag indicating that a move is in progress.
> 2. Set a flag on the old tuple, indicating that a tuple move is in
> progress.  Set its TID to the new location of the tuple.  Set xmax to
> the tuple mover's XID.  Optionally, truncate away the old tuple data,
> leaving just the tuple header.
> 3. Scan all indexes and replace any references to the old tuple's TID
> with references to the new tuple's TID.
> 4. Commit.

What happens when you crash partway through that?  Also, what happens if somebody wishes to update the tuple before the last step is complete?

In any case, this doesn't address the fundamental problem with unlocked tuple movement, which is that you can't just arbitrarily change a tuple's TID when there might be other operations relying on the TID to hold still.

			regards, tom lane
On Wed, May 2, 2012 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Brainstorming wildly, how about something like this:
>>
>> 1. Insert a new copy of the tuple onto some other heap page.  The new
>> tuple's xmin will be that of the process doing the tuple move, and
>> we'll also set a flag indicating that a move is in progress.
>> 2. Set a flag on the old tuple, indicating that a tuple move is in
>> progress.  Set its TID to the new location of the tuple.  Set xmax to
>> the tuple mover's XID.  Optionally, truncate away the old tuple data,
>> leaving just the tuple header.
>> 3. Scan all indexes and replace any references to the old tuple's TID
>> with references to the new tuple's TID.
>> 4. Commit.
>
> What happens when you crash partway through that?

Well, there are probably a few loose ends here, but the idea is that if we crash after step 2 is complete, the next vacuum is responsible for performing steps 3 and 4.  As written, there's probably a problem if we crash between (1) and (2); I think those would need to be done atomically, or at least we need to make sure that the moving-in flag is set on the new tuple if and only if there is actually a redirect pointing to it.

> Also, what happens if
> somebody wishes to update the tuple before the last step is complete?

Then we let them.  The idea is that they see the redirect tuple at the old TID, follow it to the new copy of the tuple, and update that instead.

> In any case, this doesn't address the fundamental problem with unlocked
> tuple movement, which is that you can't just arbitrarily change a
> tuple's TID when there might be other operations relying on the TID
> to hold still.

Well, that's why I invented the redirect tuple, so that anyone who was relying on the TID to hold still would see the redirect and say, oh, I need to go look at this other TID instead.  It's entirely possible there's some reason why that can't work, but at the moment I'm not seeing it.
I see that there's a problem if the old TID gets freed while someone's relying on it, but replacing it with a pointer to some other TID seems like it ought to be workable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
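[Editor's note: the "follow the redirect and update that instead" behavior Robert describes can be shown with a minimal C sketch. The `Tup` struct and `follow_redirect` helper are hypothetical illustrations; a real implementation would chase the on-disk ctid under buffer locks, not an in-memory pointer.]

```c
/* Hypothetical sketch: an updater that lands on the old TID sees the
 * moved-out flag and applies its change to the new copy instead, so the
 * old TID keeps "working" even though the tuple has physically moved. */
#include <stdbool.h>

typedef struct Tup {
    bool        moved_out;  /* this slot is now just a redirect */
    struct Tup *redirect;   /* new location (stands in for the ctid link) */
    int         value;      /* the user-visible payload */
} Tup;

/* Follow at most one hop of redirection, as in the proposal. */
static Tup *
follow_redirect(Tup *t)
{
    return t->moved_out ? t->redirect : t;
}

/* An UPDATE arriving at the old TID transparently hits the new copy. */
static void
update_value(Tup *t, int new_value)
{
    follow_redirect(t)->value = new_value;
}
```

Tom's objection still applies at the endpoint: the scheme is only safe as long as the redirect at the old TID survives until no one can be relying on it, which is what step 5's "all-visible" condition is for.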