[patch] _bt_binsrch* improvements - equal-prefix-skip binary search - Mailing list pgsql-hackers

From Matthias van de Meent
Subject [patch] _bt_binsrch* improvements - equal-prefix-skip binary search
Date
Msg-id CAEze2WjnrkzT9tfqFhSCPeH6LV-UUj5_oO5vX038dvfK7fDuyQ@mail.gmail.com
Whole thread Raw
Responses Re: [patch] _bt_binsrch* improvements - equal-prefix-skip binary search  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

I've not yet been involved in postgresql's development process, so here's a first. Please find attached a patch for improving the BT binary search speeds, with accompanying performance data.

v1-0001 adapts _bt_compare and _bt_binsrch* to make use of static properties of L&Y-style nbtree indexes to speed up finding an initial position in the nbtee.

I alter the logic from _bt_compare to accept a 'start-comparing-at-column' argument, and to also return which column the comparison result came from. This allows us to keep track of which columns we've already checked for equality, and when getting deeper into the index this allows us to skip checking the already equal key columns.

Specifically, when binary-searching through a page, we keep track of which column was checked for the high-key, and which for the low-key. The first column of these that is not yet equal will be used for the next comparison. Any columns before this column must be equal, as both the high- and the low-keys in that page have already been validated as having an equal prefix. The column number is then passed on through the insertion key, allowing the optimization to be used in the climbing of the nbtree index.

v1-0001 will be especially performant in nbtree indexes with a relatively low initial cardinality and high compare cost. My own performance testing was done on a laptop (4c/8t), after first getting it warm with other benchmarks & compilations, so the results are a bit unstable.

While testing this I also noticed that there were a lot of pipeline stalls around the arguments and results of _bt_compare in the hot loops of _bt_binsrch* where the new functionality would be used, so I've moved the core logic of _bt_compare to a static inline function _bt_compare_inline, which helped the code to go from a 2% TPS decrease for single-column indexes to ~ 8% TPS increase for an insert-only workload, and for 3-column text all-collated indexes a TPS increase of 20% on my system. This also allowed me to keep the signature of _bt_compare the same as it was, limiting the scope of the changes to only the named functions.

Finally, this could be a great start on prefix truncation for btree indexes, though that is _not_ the goal of this patch. This patch skips, but does not truncate, the common prefix.

Kind regards,

Matthias van de Meent

P.S. One more thing I noticed in benchmarking is that a significant part of the costs of non-default collations is in looking up the collation twice in the collation cache. My guess from flame graphs is that there could be a large throughput increase for short strings if the collation lookup from lc_collate_is_c() in varstr_cmp could be reused in the subsequent call to pg_newlocale_from_collation()

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Logical Replication - detail message with names of missing columns
Next
From: 曾文旌
Date:
Subject: Re: [Proposal] Global temporary tables