Re: UNDO and in-place update - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: UNDO and in-place update |
Date | |
Msg-id | CAA4eK1JMw=Eupx9OXB808fQ6kn3jAgBMUu=XaAKSDis5fJpXnA@mail.gmail.com Whole thread Raw |
In response to | Re: UNDO and in-place update (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: UNDO and in-place update
|
List | pgsql-hackers |
On Thu, Dec 1, 2016 at 8:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Nov 29, 2016 at 12:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > I see what you're going for, but I'm not sure it's worth it. I mean, > say you just have one bit per index tuple. If it's set, the heap > tuple is definitely all-visible; if it's clear, then the tuple may or > may not be visible to your snapshot. You insert tuples with the bit > clear, and clear the bit again when the tuple is deleted or the > indexed column is updated. You set the bit when you follow the index > pointer and discover that the tuple is all-visible, so that next time > you don't need to follow it again. Of the advantages you mention > above, this still allows (a), but less efficiently. It allows (b). > It does not allow (c). So, it's not as good. But, it saves you the > overhead of storing xmin and/or xmax for each tuple in the page. Even > if you have an optimization to list the XIDs once per page and then > refer to them from the tuples, you are going to need a byte in each > tuple for the xmin-index and a byte for the xmax-index, or something > like that, plus the space for the XID list itself. That's not a ton > of space, maybe, but it's definitely more. It's also a significantly > bigger change to the tuple format, I think. What I'm proposing could > be done by repurposing the LP_* bits available in the line pointer. > So I think the questions are (1) will the extra space consumption hurt > us more than the benefit we will get? and (2) is it really worth the > work to change the page format to that degree? > > As you can probably tell, I'm leaning toward "no". I think a bit > per-tuple -- whether we call it delete-mark or definitely-all-visible > -- should be enough to get most of the benefit that is available here > for substantially less work and no extra space. > I can very well understand the reason for not doing so (IIUC, it is about complexity and time to get it right when we are already trying to solve one big and complex problem of the system), but saying most of the benefit can be realized by having simple optimization seems less convincing. I think having simple optimizations won't solve the multi-pass vacuum problem. However, having the visibility information in the index will solve that and avoid index bloat much aggressively. >> I think with proposed design there is a good chance that for many of >> the index scans, there is a need for the extra hop to decide >> visibility of index tuple. I think as of today, during index scan if >> the corresponding heap page is not-all-visible, then we check the >> corresponding heap tuple to decide about the visibility of index >> tuple, whereas with proposed design, I think it first needs to check >> heap page and then TPD, if there is more than one transaction >> operating on page. > > There's a couple of possible designs here, but there is the > possibility for extra hops in some cases. But there are things we can > do to mitigate that. > > 1. If each tuple has a definitely-all-visible bit, you can check that > first; if it's set, the tuple is definitely all-visible you don't need > to do anything further. > > 2. If we still maintain a visibility map, you can check that next. If > the bit is set for the target page, then the tuple is definitely > all-visible and you don't need to do anything further. > > 3. Otherwise, you need to look at the heap page. But if the heap > page's UNDO pointer is invalid, then the whole page is all-visible, so > you're done. (You can also set the visibility map bit and/or the > index tuple's definitely-all-visible bit to avoid having to recheck > the heap page in the future.) > > 4. Each tuple will have a bit indicating whether it's been recently > modified; that is, whether the transaction whose UNDO log is > referenced by the page's UNDO pointer did anything to that tuple; or > if the page points to a TPD entry, whether any of the transactions in > the TPD modified that tuple. If that bit is not set for that > particular tuple, it's all-visible and you're done. > I think 4th optimization is especially good and can cover many cases, but I am not sure how do we unset it once it is set. For example, once the effect of transaction that has touched that row is all-visible, how will we unset it. It might be better to store the offset of transaction that has last modified the tuple, if the last transaction that has touched the row is visible to snapshot, then we don't need to process the UNDO. > 5. Otherwise you have to look at the TPD entry (if any) and UNDO logs. > > If we put in the visibility information in the index as you are > proposing, then we can always resolve the visibility of the index > tuple at step 1; we just test xmin and xmax against the snapshot. For > index-only scans, that's great, because we never need to consult the > heap at all. For regular index scans, it's not nearly as great, > because we're still going to need to consult the TPD and UNDO logs to > determine which version of the tuple is visible to that snapshot. > Yeah, but we might think of extending it in future so that index also has UNDO logs which can help such a case. Also, I think the cases where the tuple is visible but not marked as definitely-all-visible will be much cheaper. One more thing that needs some thoughts is how will the rollback work if keep just one bit for delete marking. I mean for heap we can directly apply from UNDO, but for the index, I think we need to also take care of undoing it in some way. For example, search the tree to remove the deleting marking from old value and delete marking the new value. Now if that is what we need to do then first we need to traverse the tree twice which is okay considering rollback is a seldom operation, the second challenge would be how to make such an operation atomic with respect to WAL. >>>>> That's OK, but we're in real trouble if >>>>> step-3 inserted an additional index tuple pointing to that chain >>>>> rather than simply noting that one already exists. If it adds an >>>>> additional one, then we'll visit two index tuples for that TID. Each >>>>> time, the visibility information in UNDO will cause us to return the >>>>> correct tuple, but we've erred by returning it twice (once per index >>>>> entry) rather than just once. >>>> >>>> Why will scan traverse the UNDO chain if it starts after step-3 commit? >>> >>> Why wouldn't it? I think that if an index scan hits a page with an >>> UNDO chain, it always need to traverse the whole thing. >> >> If you notice the scan has started after all the writes have >> committed, so what is the guarantee that there will be a UNDO chain? >> Yes, it could be there if there is some live transaction which started >> before step, otherwise, I am not sure if we can guarantee that UNDO >> chain will be present. > > The UNDO chain has to remain until all transactions that modified the > page are all-visible, not just until the transactions are committed. > Okay, I think this will work if we avoid the second insertion of the same value-TID pair in the index as you mentioned above. Without that, I think we might not distinguish which of the two rows are visible for a transaction to which the latest (step-3) row is visible. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: