Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Deleting older versions in unique indexes to avoid page splits
Date:
Msg-id: CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
Responses: Re: Deleting older versions in unique indexes to avoid page splits
List: pgsql-hackers

Attached is a POC patch that teaches nbtree to delete old duplicate versions from unique indexes. The optimization targets non-HOT duplicate version bloat. Although the patch is rather rough, it nevertheless manages to more or less eliminate a whole class of index bloat: unique index bloat from non-HOT updates in workloads where no transaction lasts for more than a few seconds.

For example, it eliminates index bloat with a custom pgbench workload that uses an INCLUDE unique index on pgbench_accounts.aid (with abalance as the non-key attribute) instead of the usual accounts primary key. Similarly, with a standard pgbench_accounts primary key alongside an extra non-unique index on abalance, the primary key never suffers a page split with the patch applied. It's almost as if the updates were actually HOT updates, at least if you focus on the unique index (and assuming that there are no long-running transactions). The two index setups are sketched further down.

The patch repurposes the deduplication infrastructure to delete duplicates within unique indexes, provided they're actually safe to VACUUM. This is somewhat different from the _bt_check_unique() LP_DEAD bit setting stuff, in that we have to access heap pages that we probably would not have to access otherwise -- it's something that we go out of our way to make happen at the point that the page is about to split, not something that happens in passing at no extra cost.

The general idea is to exploit the fact that duplicates in unique indexes are usually deadwood. We only need to "stay one step ahead" of the bloat to avoid all page splits in many important cases, so we usually only have to access a couple of heap pages to avoid each page split. In traditional serial/identity column primary key indexes, any page split that isn't a split of the current rightmost page must be caused by version churn. It should be possible to avoid these "unnecessary" page splits altogether (again, barring long-running transactions).

I would like to get early feedback on the high-level direction. While the patch seems quite promising, I am uncertain about my general approach, and about how it might fit into some much broader effort to control bloat in general.

There are some clear downsides to my approach. The patch has grotty heuristics that try to limit the extra work performed to avoid page splits -- the cost of accessing additional heap pages while a buffer lock is held on the leaf page needs to be kept under control. No doubt this regresses some workloads without giving them a clear benefit. Also, the optimization only ever gets used with unique indexes, since they're the only case where a duplicate naturally suggests version churn, which can be targeted fairly directly, and without wasting too many cycles when it doesn't work out. It's not at all clear how we could do something like this with non-unique indexes.

One related-though-distinct idea that might be worth considering occurs to me: teach nbtree to try to set LP_DEAD bits in non-unique indexes, in about the same way as it will in _bt_check_unique() for unique indexes. Perhaps the executor could hint to btinsert()/aminsert() that it's inserting a duplicate caused by a non-HOT update, so it's worth trying to LP_DEAD nearby duplicates -- especially if they're on the same heap page as the incoming item.
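To make the pgbench setups mentioned above concrete, they look roughly like this (just a sketch; the index names are arbitrary, and I'm assuming the default pgbench_accounts_pkey constraint name that pgbench creates):

-- Variant 1: replace the accounts primary key with an INCLUDE unique index,
-- so the standard pgbench UPDATE of abalance can never be a HOT update
ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
CREATE UNIQUE INDEX pgbench_accounts_aid_incl
    ON pgbench_accounts (aid) INCLUDE (abalance);

-- Variant 2: keep the standard primary key, but add an extra non-unique
-- index on abalance, which likewise disqualifies the UPDATE from being HOT
CREATE INDEX pgbench_accounts_abalance_idx ON pgbench_accounts (abalance);

Either way, every standard pgbench transaction that updates abalance has to insert a successor version into the unique index on aid, even though the key value never changes -- which is exactly the duplicate version bloat that the patch targets.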
There is a wholly separate question about index bloat that is of long term strategic importance to the Postgres project: what should we do about long-running transactions?

I tend to think that we can address problems in that area by indicating that it is safe to delete "intermediate" versions -- tuples that are not yet old enough to be removable under the oldest transaction's cutoff, but that are nevertheless not needed by anybody (they're too new to be interesting to the old transaction's snapshot, but also too old to be interesting to any other snapshot). Perhaps this optimization could be pursued in a phased fashion, starting with index AMs, where it seems less scary.

I recently read a paper that had some ideas about what we could do here [1]. IMV it is past time that we thrashed out a "remove useless intermediate versions" design that is compatible with the current heapam design.

[1] https://dl.acm.org/doi/pdf/10.1145/3318464.3389714

--
Peter Geoghegan