Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada
---|---
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum
Date |
Msg-id | CAD21AoDqO7v5cvFUyWD7_pwceJ1XfhA81FVLTspKn0sFxtqNPA@mail.gmail.com
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
Responses | Re: [PoC] Improve dead tuple storage for lazy vacuum
List | pgsql-hackers
On Sat, Feb 11, 2023 at 2:33 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> I didn't get any closer to radix-tree regression,

Me neither. It seems that in v26, inserting chunks into node-32 is slow, but it needs more analysis. I'll share if I find something interesting.

> but I did find some inefficiencies in tidstore_add_tids() that are worth talking about first, addressed in a rough fashion in the attached .txt addendums that I can clean up and incorporate later.
>
> To start, I can reproduce the regression with this test as well:
>
> select * from bench_tidstore_load(0, 10 * 1000 * 1000);
>
> v15 + v26 store + adjustments:
>  mem_allocated | load_ms
> ---------------+---------
>       98202152 |    1676
>
> v26 0001-0008
>  mem_allocated | load_ms
> ---------------+---------
>       98202032 |    1826
>
> ...and reverting to the alternate way to update the parent didn't help:
>
> v26 0001-6, 0008, insert_inner w/ null parent
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202032 |    1825
>
> ...and I'm kind of glad that wasn't the problem, because going back to that would be a pain for the shmem case.
>
> Running perf doesn't show anything much different in the proportions (note that rt_set must have been inlined when declared locally in v26):
>
> v15 + v26 store + adjustments:
>   65.88%  postgres  postgres  [.] tidstore_add_tids
>   10.74%  postgres  postgres  [.] rt_set
>    9.20%  postgres  postgres  [.] palloc0
>    6.49%  postgres  postgres  [.] rt_node_insert_leaf
>
> v26 0001-0008
>   78.50%  postgres  postgres  [.] tidstore_add_tids
>    8.88%  postgres  postgres  [.] palloc0
>    6.24%  postgres  postgres  [.] local_rt_node_insert_leaf
>
> v2699-0001: The first thing I noticed is that palloc0 is taking way more time than it should, and it's because the compiler doesn't know the values[] array is small. One reason we need to zero the array is to make the algorithm agnostic about what order the offsets come in, as I requested in a previous review. Thinking some more, I was way too paranoid about that. As long as access methods scan the line pointer array in the usual way, maybe we can just assert that the keys we create are in order, and zero any unused array entries as we find them. (I admit I can't actually think of a reason we would ever encounter offsets out of order.)

I can think of one case: something like traversing a HOT chain could visit offsets out of order. But fortunately, in the heap case, we prune such collected TIDs before heap vacuum.

> Also, we can keep track of the last key we need to consider for insertion into the radix tree, and ignore the rest. That might shave a few cycles during the exclusive lock when the max offset of an LP_DEAD item < 64 on a given page, which I think would be common in the wild. I also got rid of the special case for non-encoding, since shifting by zero should work the same way. These together led to a nice speedup on the v26 branch:
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202032 |    1386
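Just to make sure I understand the v2699-0001 idea, here is a minimal sketch of how the lazy zeroing plus the ordering assertion could look. The function name, the one-word-per-key layout, and the constant below are assumptions for illustration, not the actual patch code:

#include "postgres.h"
#include "storage/off.h"

#define BITS_PER_BITMAPWORD 64      /* assumed: one 64-bit value per key */

/*
 * Illustration only: build the bitmap words for one block's dead offsets,
 * relying on the offsets arriving in ascending order and zeroing unused
 * words lazily instead of zero-initializing the whole values[] array.
 * Returns the number of words actually used, so the caller can stop at
 * the last key it needs to insert into the radix tree.
 */
static int
build_block_bitmap(uint64 *words, const OffsetNumber *offsets, int noffsets)
{
    int         highest_word = -1;

    for (int i = 0; i < noffsets; i++)
    {
        int         wordnum = offsets[i] / BITS_PER_BITMAPWORD;
        int         bitnum = offsets[i] % BITS_PER_BITMAPWORD;

        /* assert ascending order instead of tolerating out-of-order offsets */
        Assert(i == 0 || offsets[i] > offsets[i - 1]);

        /* zero only the words we are about to touch for the first time */
        while (highest_word < wordnum)
            words[++highest_word] = 0;

        words[wordnum] |= UINT64CONST(1) << bitnum;
    }

    /* words past highest_word are never read, so they stay uninitialized */
    return highest_word + 1;
}

The returned count also matches the "keep track of the last key" point: in the common case where the max LP_DEAD offset is below 64, only words[0] is ever written or inserted.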
> v2699-0002: The next thing I noticed is forming a full ItemPointer to pass to tid_to_key_off(). That's bad for tidstore_add_tids() because ItemPointerSetBlockNumber() must do this in order to allow the struct to be SHORTALIGN'd:
>
> static inline void
> BlockIdSet(BlockIdData *blockId, BlockNumber blockNumber)
> {
>     blockId->bi_hi = blockNumber >> 16;
>     blockId->bi_lo = blockNumber & 0xffff;
> }
>
> Then, tid_to_key_off() calls ItemPointerGetBlockNumber(), which must reverse the above process:
>
> static inline BlockNumber
> BlockIdGetBlockNumber(const BlockIdData *blockId)
> {
>     return (((BlockNumber) blockId->bi_hi) << 16) | ((BlockNumber) blockId->bi_lo);
> }
>
> There is no reason to do any of this if we're not reading/writing directly to/from an on-disk tid etc. To avoid this, I created a new function encode_key_off() [name could be better], which deals with the raw block number that we already have. Then turn tid_to_key_off() into a wrapper around that, since we still need the full conversion for tidstore_lookup_tid().
>
> v2699-0003: Get rid of all the remaining special cases for encoding (or not). I am unaware of the need to optimize that case or treat it in any way differently. I haven't tested this on an installation with non-default blocksize and didn't measure this separately, but 0002+0003 gives:
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202032 |    1259
>
> If these are acceptable, I can incorporate them into a later patchset.

These are nice improvements! I agree with all of these changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
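For reference, a minimal sketch of the encode_key_off() / tid_to_key_off() split described in v2699-0002 above. The bit layout of the key (how many low bits pick the position inside a value) and the SKETCH_* constants are assumptions for illustration, not the actual patch:

#include "postgres.h"
#include "storage/itemptr.h"

/* Assumed layout, for illustration only */
#define SKETCH_VALUE_NBITS  6       /* low bits pick a bit in a 64-bit value */
#define SKETCH_OFFSET_NBITS 11      /* bits reserved for the offset number */

/*
 * Take the raw block number and offset we already have, so we never pack
 * the block number into BlockIdData (bi_hi/bi_lo) only to unpack it again.
 */
static inline uint64
encode_key_off(BlockNumber block, OffsetNumber offset, uint32 *off)
{
    uint64      tid_i = ((uint64) block << SKETCH_OFFSET_NBITS) | offset;

    *off = (uint32) (tid_i & ((UINT64CONST(1) << SKETCH_VALUE_NBITS) - 1));
    return tid_i >> SKETCH_VALUE_NBITS;
}

/* The full conversion stays available for lookups that start from a TID */
static inline uint64
tid_to_key_off(ItemPointer tid, uint32 *off)
{
    return encode_key_off(ItemPointerGetBlockNumber(tid),
                          ItemPointerGetOffsetNumber(tid),
                          off);
}

With this shape, tidstore_add_tids() can pass the BlockNumber it already holds, and only paths such as tidstore_lookup_tid() pay for the ItemPointer round trip.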