On Fri, 9 Apr 2021 at 16:58, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 4/9/21 7:33 PM, James Coleman wrote:
> > A specific area where this is particularly painful is btree index reads.
> > Walking the tree to leaf pages isn't naturally prefetchable, and so for
> > each level you pay the random page cost. Of course higher levels in the
> > tree will almost certainly exhibit emergent behavior such that they
> > (just by virtue of LRU caching) will be in the buffer cache, but for a
> > large index lower levels likely won't be.
We've talked before about buffering inserts even just for disk-based
indexes, much like how GIN buffers inserts and periodically flushes
them out. We talked about keeping a local buffer in each session, since
no other session even needs to see these buffered inserts until commit
anyway. And we can merge multiple keys into the tree more efficiently
in one pass than inserting them one by one.
But that was just for disk I/O. For something with longer latency it
would be an even bigger win. Buffer the inserted keys in local memory
so that lookups in the same session still see them, start the I/O to
insert the entries into the index, but handle it in the background or
in a separate process, without blocking the transaction until commit.
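A minimal sketch of the session-local buffer idea, in Python for
brevity (the list-backed "index", the `flush_threshold` parameter, and
the class name are all illustrative stand-ins, not anything that exists
in PostgreSQL):

```python
import bisect

class BufferedIndex:
    """Sketch: a session-local insert buffer in front of a sorted index.

    The sorted list stands in for the on-disk B-tree; a real
    implementation would write index pages and respect WAL/commit rules.
    """
    def __init__(self, flush_threshold=4):
        self.flush_threshold = flush_threshold
        self.index = []    # stand-in for the shared, on-disk index
        self.buffer = []   # session-local, not visible to other sessions

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def lookup(self, key):
        # Same-session lookups must see buffered keys too.
        i = bisect.bisect_left(self.index, key)
        if i < len(self.index) and self.index[i] == key:
            return True
        return key in self.buffer

    def flush(self):
        # Merge many keys in one sorted pass instead of paying one
        # tree descent (and its random page reads) per key.
        self.index = sorted(self.index + self.buffer)
        self.buffer = []
```

The point of the batch flush is the amortization: one merge pass over
the buffered keys replaces a separate root-to-leaf descent for each
insert.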
> What do you consider a large index level?
>
> Consider a 1TB table, with just a single UUID column - that's ~25B rows,
> give or take. Real tables will have more columns, so this seems like a
> reasonable model of the largest number of rows per relation. With ~32B
> per index tuple, that's about 100M leaf pages, and with ~256 branches
> per internal page, that's still only ~5 levels. I think it's quite rare
> to see indexes with more than 6 or 7 levels.
That's a good model for a well-designed schema with an efficient
index. There are plenty of less-than-optimal schemas with indexes on
longer column lists or fairly large text fields...
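For what it's worth, the level arithmetic above checks out, and it's
fairly insensitive to tuple width. A quick back-of-the-envelope script
(8KB pages and the 256-way internal fanout are taken from Tomas's
example; the 200-byte tuple is a hypothetical wide index):

```python
import math

def btree_levels(n_rows, tuple_bytes, page_bytes=8192, branch_factor=256):
    """Estimate B-tree height: leaf page count, then internal levels above."""
    leaf_tuples = page_bytes // tuple_bytes
    leaf_pages = math.ceil(n_rows / leaf_tuples)
    levels = 1           # count the leaf level itself
    pages = leaf_pages
    while pages > 1:     # add one level per internal layer
        pages = math.ceil(pages / branch_factor)
        levels += 1
    return leaf_pages, levels

# Tomas's example: 25B rows, ~32B index tuples -> ~100M leaf pages, 5 levels
print(btree_levels(25_000_000_000, 32))
# A hypothetical wide index (200B tuples) over the same rows
print(btree_levels(25_000_000_000, 200))
```

Because height grows with the log of the leaf page count, even the
200-byte-tuple index stays at 5 levels here, which is consistent with
6- or 7-level indexes being rare.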
--
greg