Re: partial heap only tuples - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: partial heap only tuples |
Date | |
Msg-id | 20210210224344.GB22163@momjian.us Whole thread Raw |
In response to | partial heap only tuples ("Bossart, Nathan" <bossartn@amazon.com>) |
Responses |
Re: partial heap only tuples
|
List | pgsql-hackers |
On Tue, Feb 9, 2021 at 06:48:21PM +0000, Bossart, Nathan wrote: > Hello, > > I'm hoping to gather some early feedback on a heap optimization I've > been working on. In short, I'm hoping to add "partial heap only > tuple" (PHOT) support, which would allow you to skip updating indexes > for unchanged columns even when other indexes require updates. Today, I think it is great you are working on this. I think it is a major way to improve performance and I have been disappointed it has not moved forward since 2016. > HOT works wonders when no indexed columns are updated. However, as > soon as you touch one indexed column, you lose that optimization > entirely, as you must update every index on the table. The resulting > performance impact is a pain point for many of our (AWS's) enterprise > customers, so we'd like to lend a hand for some improvements in this > area. For workloads involving a lot of columns and a lot of indexes, > an optimization like PHOT can make a huge difference. I'm aware that > there was a previous attempt a few years ago to add a similar > optimization called WARM [0] [1]. However, I only noticed this > previous effort after coming up with the design for PHOT, so I ended > up taking a slightly different approach. I am also aware of a couple > of recent nbtree improvements that may mitigate some of the impact of > non-HOT updates [2] [3], but I am hoping that PHOT serves as a nice > complement to those. I've attached a very early proof-of-concept > patch with the design described below. How is your approach different from those of [0] and [1]? It is interesting you still see performance benefits even after the btree duplication improvements. Did you test with those improvements? > As far as performance is concerned, it is simple enough to show major > benefits from PHOT by tacking on a large number of indexes and columns > to a table. For a short pgbench run where each table had 5 additional > text columns and indexes on every column, I noticed a ~34% bump in > TPS with PHOT [4]. Theoretically, the TPS bump should be even higher That's a big improvement. > Next, I'll go into the design a bit. I've commandeered the two > remaining bits in t_infomask2 to use as HEAP_PHOT_UPDATED and > HEAP_PHOT_TUPLE. These are analogous to the HEAP_HOT_UPDATED and > HEAP_ONLY_TUPLE bits. (If there are concerns about exhausting the > t_infomask2 bits, I think we could only use one of the remaining bits > as a "modifier" bit on the HOT ones. I opted against that for the > proof-of-concept patch to keep things simple.) When creating a PHOT > tuple, we only create new index tuples for updated columns. These new > index tuples point to the PHOT tuple. Following is a simple > demonstration with a table with two integer columns, each with its own > index: Whatever solution you have, you have to be able to handle adding/removing columns, and adding/removing indexes. > When it is time to scan through a PHOT chain, there are a couple of > things to account for. Sequential scans work out-of-the-box thanks to > the visibility rules, but other types of scans like index scans > require additional checks. If you encounter a PHOT chain when > performing an index scan, you should only continue following the chain > as long as none of the columns the index indexes are modified. If the > scan does encounter such a modification, we stop following the chain > and continue with the index scan. Even if there is a tuple in that I think in patch [0] and [1], if an index column changes, all the indexes had to be inserted into, while you seem to require inserts only into the index that needs it. Is that correct? > PHOT chain that should be returned by our index scan, we will still > find it, as there will be another matching index tuple that points us > to later in the PHOT chain. My initial idea for determining which > columns were modified was to add a new bitmap after the "nulls" bitmap > in the tuple header. However, the attached patch simply uses > HeapDetermineModifiedColumns(). I've yet to measure the overhead of > this approach versus the bitmap approach, but I haven't noticed > anything too detrimental in the testing I've done so far. A bitmap is an interesting approach, but you are right it will need benchmarking. I wonder if you should create a Postgres wiki page to document all of this. I agree PG 15 makes sense. I would like to help with this if I can. I will need to study this email more later. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
pgsql-hackers by date: