Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Deleting older versions in unique indexes to avoid page splits |
Date | |
Msg-id | CAH2-Wz=Ypc1PDdk8OVJDChGJBjT06=A0Mbv9HyTLCsOknGcUFg@mail.gmail.com |
In response to | Re: Deleting older versions in unique indexes to avoid page splits (Victor Yegorov <vyegorov@gmail.com>) |
Responses | Re: Deleting older versions in unique indexes to avoid page splits (x2) |
List | pgsql-hackers |
On Wed, Nov 11, 2020 at 6:17 AM Victor Yegorov <vyegorov@gmail.com> wrote:
> I've looked at the latest (v7) patchset.
> I've decided to use a quite common (in my practice) setup with an indexed mtime column over scale 1000 set:

Thanks for testing!

> We can see that:
> - unused index is not suffering from not-HOT updates at all, which is the point of the patch
> - we have ordinary queries performing on the same level as on master
> - we have 5,2% slowdown in UPDATE speed

I think that I made a mistake with v7: I changed the way that we detect low cardinality data during bottom-up deletion, which made us do extra/early deduplication in more cases than we really should. I suspect that this partially explains the slowdown in UPDATE latency that you reported. I will fix this in v8.

I don't think that the solution is to go back to the v6 behavior in this area, though. I now believe that this whole "proactive deduplication for low cardinality data" thing only made sense as a way of compensating for deficiencies in earlier versions of the patch -- deficiencies that I've since fixed anyway. The best solution now is to simplify. We can have generic criteria for "should we dedup the page early after bottom-up deletion finishes without freeing up very much space?". This seemed to work well during my latest testing, probably because heapam.c is now smart about the requirements from nbtree, as well as the cost of accessing heap pages.

> I'm not sure if this should be counted as regression, though, as graphs go on par pretty much.
> Still, I would like to understand why this combination of indexes and queries slows down UPDATEs.

Another thing that I'll probably add to v8: prefetching. This is probably necessary just so I can have parity with the existing heapam.c function that the new code is based on, heap_compute_xid_horizon_for_tuples(). That will probably help here, too.

> During compilation I got one warning for make -C contrib:

Oops.

> I agree with the rename to "bottom-up index deletion", using "vacuuming" generally makes users think
> that functionality is used only during VACUUM (misleading).

Yeah. That's kind of a problem already, because sometimes we use the word VACUUM when talking about the long-established LP_DEAD deletion stuff. But I see that as a problem to be fixed. Actually, I would like to fix it very soon.

> I haven't looked at the code yet.

It would be helpful if you could take a look at the nbtree patch -- particularly the changes related to deprecating the page-level BTP_HAS_GARBAGE flag. I would like to break those parts out into a separate patch, and commit it in the next week or two. It's just refactoring, really. (This commit can also make nbtree only use the word VACUUM for things that strictly involve VACUUM. For example, it'll rename _bt_vacuum_one_page() to _bt_delete_or_dedup_one_page().)

We almost don't care about the flag already, so there is almost no behavioral change from deprecating BTP_HAS_GARBAGE in this way. Indexes that use deduplication have not relied on BTP_HAS_GARBAGE being set ever since deduplication was added to Postgres 13 (the deduplication code doesn't want to deal with LP_DEAD bits, and cannot trust that no LP_DEAD bits are set just because BTP_HAS_GARBAGE isn't set in the special area).

Trusting the BTP_HAS_GARBAGE flag can cause us to miss out on deleting items with their LP_DEAD bits set -- we're better off "assuming that BTP_HAS_GARBAGE is always set", and finding out for ourselves whether there really are LP_DEAD bits set each time.
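To make that concrete, here is a rough sketch of what I mean (illustrative only, not code from the patch -- the helper name and caller details are invented, and the real structure may differ). It just walks the leaf page's line pointers and checks each one directly, rather than consulting the page flag first:

```c
/*
 * Illustrative only: hypothetical helper, not taken from the patch.
 * Collects the offsets of all LP_DEAD items on a B-Tree leaf page,
 * without looking at the BTP_HAS_GARBAGE flag at all.
 */
#include "postgres.h"

#include "access/nbtree.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

static int
collect_lp_dead_offsets(Buffer buf, OffsetNumber *deletable)
{
	Page		page = BufferGetPage(buf);
	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	OffsetNumber offnum;
	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
	int			ndeletable = 0;

	/*
	 * No P_HAS_GARBAGE(opaque) test here -- behave as though the flag were
	 * always set, and look at every line pointer.  The scan is practically
	 * free, because the caller only reaches this point when it is already
	 * about to split (or deduplicate) this same page.
	 */
	for (offnum = P_FIRSTDATAKEY(opaque);
		 offnum <= maxoff;
		 offnum = OffsetNumberNext(offnum))
	{
		ItemId		itemid = PageGetItemId(page, offnum);

		if (ItemIdIsDead(itemid))
			deletable[ndeletable++] = offnum;
	}

	/*
	 * The caller (think _bt_delete_or_dedup_one_page()) would hand any
	 * collected offsets to the existing LP_DEAD deletion path, and only
	 * fall back to bottom-up deletion, deduplication, and finally a page
	 * split when nothing was found.  The "deletable" array is sized
	 * MaxIndexTuplesPerPage by the caller.
	 */
	return ndeletable;
}
```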
Missing LP_DEAD bits like this can happen when VACUUM unsets the page-level flag without actually deleting the items at the same time, which is expected when the items became garbage (and actually had their LP_DEAD bits set) after VACUUM began, but before VACUUM reached the leaf pages.

That's really wasteful, and doesn't actually have any upside -- we're scanning all of the line pointers only when we're about to split (or dedup) the same page anyway, so the extra work is practically free.

--
Peter Geoghegan