Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation |
Date | |
Msg-id | CAEze2Whwvr8aYcBf0BeBuPy8mJGtwxGvQYA9OGR5eLFh6Q_ZvA@mail.gmail.com Whole thread Raw |
Responses |
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
|
List | pgsql-hackers |
-- Targeting PG15; if too early / noise then please ignore. I've noticed there are a lot of places in the btree index infrastructure (and also some other index AMs) that effectively iterate over the attributes of the index tuple, but won't use index_deform_tuple for reasons. However, this implies that they must repeatedly call index_getattr, which in the worst case is O(n) for the n-th attribute, slowing down extraction of multi-column indexes significantly. As such, I've added some API that allows for iteration (ish) over index attributes. Please find attached patch 0001 that improves the runtime complexity of many of these places by storing and reusing the offset of the last extracted attribute. This improves the worst-case runtime of extracting all attributes to O(n) for incremental attribute extraction (from O(n*n)). Note that finding the first offsets is still an O(n) worst case for starting at the n-th attribute, but nothing can be done about that. Almost all workloads for multi-column nbtree indexes that cannot use attcacheoff should see a benefit from this patch; only those that only use row scans cannot use this optimization. Additionally, multi-column gist indexes could also see some (albeit limited) benefit, which is indeed useful when considering the newly added INCLUDE support in the gist AM. Also attached is 0002, which dynamically truncates attribute prefixes of tuples whilst _binsrch-ing through a nbtree page. It greatly uses the improved performance of 0001; they work very well together. The problems that Peter (cc-ed) mentions in [0] only result in invalid search bounds when traversing the tree, but on the page level valid bounds can be constructed. This is patchset 1 of a series of patches I'm starting for eventually adding static prefix truncation into nbtree infrastructure in PostgreSQL. I've put up a wiki page [1] with my current research and thoughts on that topic. Performance ----------- I've run some tests with regards to performance on my laptop; which tests nbtree index traversal. The test is based on a recent UK land registry sales prices dataset (25744780 rows), being copied from one table into an unlogged table with disabled autovacuum, with one index as specified by the result. Master @ 99964c4a, patched is with both 0001 and 0002. The results are averages over 3 runs, with plain configure, compiled by gcc (Debian 6.3.0-18+deb9u1). INSERT (index definition) | master (s) | patched (s) | improv(%) UNIQUE (transaction) | 256851 | 251705 | 2.00 (county, city, locality) | 154529 | 147495 | 4.55 (county COLLATE "en_US", city, locality) | 174028 | 164165 | 5.67 (always_null, county, city, locality) | 173090 | 166851 | 3.60 Some testing for reindex indicates improvements there as well: Same compiled version; all indexes on an unlogged table; REINDEX run 4 times on each index, last 3 were averaged. REINDEX (index definition) | master (s) | patched (s) | improv(%) UNIQUE (transaction) | 11623 | 11692 | -0.6 (county, city, locality) | 58299 | 54770 | 6.1 (county COLLATE "en_US", city, locality) | 61790 | 55887 | 9.6 (always_null, county, city, locality) | 69703 | 63925 | 8.3 I am quite suprised with the results for the single-column unique index insertions, as that was one of the points where I was suspecting a slight decrease in performance for inserts. I haven't really checked why the performance increased, but I suspect it has to do with an improved fast-path for finding the first attribute (we know it always starts at offset 0 of the data section), but it might also just as well be due to throttling (sadly, I do not have a stable benchmarking machine, so my laptop will do). I'm also slightly disappointed with the results of the always_null insert load; I had hoped for better results there, seeing the results for the other 2 multi-column indexes. With regards, Matthias van de Meent. [0] https://www.postgresql.org/message-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com [1] https://wiki.postgresql.org/wiki/NBTree_Prefix_Truncation
Attachment
pgsql-hackers by date: