From: Greg Stark
Subject: Re: Processing btree walks as a batch to parallelize IO
Msg-id: CAM-w4HNJN7HvYkB+W_WB5BSQLEd=VjQe3rXd+1e08ZtqnD_a7g@mail.gmail.com
In response to: Re: Processing btree walks as a batch to parallelize IO (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: Processing btree walks as a batch to parallelize IO (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Fri, 9 Apr 2021 at 16:58, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 4/9/21 7:33 PM, James Coleman wrote:

> > A specific area where this is particularly painful is btree index reads.
> > Walking the tree to leaf pages isn't naturally prefetchable, and so for
> > each level you pay the random page cost. Of course higher levels in the
> > tree will almost certainly exhibit emergent behavior such that they
> > (just by fact of the LRU caching) will be in the buffer cache, but for a
> > large index lower levels likely won't be.

We've talked before about buffering inserts even just for disk-based
indexes, much like the way GIN buffers inserts and periodically
flushes them out. The idea was a local buffer in each session, since
no other session even needs to see these buffered inserts until
commit anyway. And merging multiple keys into the tree at once can be
done more efficiently than inserting them one by one.
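
To make that concrete, here's a minimal sketch in C of what a
session-local buffer could look like. All the names are invented for
illustration (SessionInsertBuffer, index_insert_key, and so on are
not actual nbtree or GIN interfaces), and the flush here just inserts
in sorted order; a real merge would descend the tree once per leaf
rather than once per key:

    #include <stdint.h>
    #include <stdlib.h>

    #define BUF_CAP 1024

    typedef struct SessionInsertBuffer
    {
        int64_t keys[BUF_CAP];  /* keys not yet merged into the index */
        int     nkeys;
    } SessionInsertBuffer;

    /* stand-in for a real single-key index insert (hypothetical) */
    static void
    index_insert_key(int64_t key)
    {
        (void) key;             /* the real thing would descend the tree */
    }

    static int
    cmp_int64(const void *a, const void *b)
    {
        int64_t x = *(const int64_t *) a;
        int64_t y = *(const int64_t *) b;

        return (x > y) - (x < y);
    }

    static void
    buffer_flush(SessionInsertBuffer *buf)
    {
        /*
         * Insert in sorted order so consecutive keys usually land on
         * the same leaf page; a smarter merge would amortize one tree
         * descent over each run of adjacent keys.
         */
        qsort(buf->keys, buf->nkeys, sizeof(int64_t), cmp_int64);
        for (int i = 0; i < buf->nkeys; i++)
            index_insert_key(buf->keys[i]);
        buf->nkeys = 0;
    }

    static void
    buffer_insert(SessionInsertBuffer *buf, int64_t key)
    {
        if (buf->nkeys == BUF_CAP)
            buffer_flush(buf);  /* or hand the batch to a background worker */
        buf->keys[buf->nkeys++] = key;
    }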

But that was just for disk I/O. For something with longer latency it
would be an even bigger win: buffer the inserted keys in local memory
in case you do lookups in the same session, start the I/O to insert
the rows into the index, and handle it in the background or in a
separate process, without blocking the transaction until commit.
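
Continuing the same invented sketch, the lookup side would check the
session-local buffer before falling back to the on-disk index, so
buffered keys stay visible to this session even while a background
process is draining them:

    #include <stdbool.h>

    /* stand-in for a real on-disk index search (hypothetical) */
    static bool
    index_search_disk(int64_t key)
    {
        (void) key;
        return false;
    }

    static bool
    index_lookup(SessionInsertBuffer *buf, int64_t key)
    {
        /*
         * Unflushed keys are visible only to this session, so a linear
         * scan of the small local buffer is enough before falling back
         * to the on-disk index.
         */
        for (int i = 0; i < buf->nkeys; i++)
            if (buf->keys[i] == key)
                return true;
        return index_search_disk(key);
    }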

> What do you consider a large index level?
>
> Consider a 1TB table, with just a single UUID column - that's ~25B rows,
> give or take. Real tables will have more columns, so this seems like a
> reasonable model of the largest number of rows per relation. With ~32B
> per index tuple, that's about 100M leaf pages, and with ~256 branches
> per internal page, that's still only ~5 levels. I think it's quite rare
> to see indexes with more than 6 or 7 levels.

That's a good model for a well-designed schema with an efficient
index. There are plenty of less-than-optimal schemas with indexes on
longer column lists or fairly large text fields...
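
For what it's worth, the arithmetic is easy to check with a toy
calculation (8KB pages; fill factor, page overhead, and the
leaf/internal fanout difference all ignored):

    #include <stdio.h>

    static int
    btree_levels(double nrows, double tuple_bytes)
    {
        double fanout = 8192.0 / tuple_bytes;   /* tuples per 8KB page */
        double leaf_pages = nrows / fanout;
        int    levels = 1;                      /* the leaf level itself */

        for (double pages = leaf_pages; pages > 1.0; pages /= fanout)
            levels++;
        return levels;
    }

    int
    main(void)
    {
        /* Tomas's model: ~25B rows, ~32-byte index tuples */
        printf("32B tuples:  %d levels\n", btree_levels(25e9, 32));
        /* a fatter key, say a longish text column */
        printf("256B tuples: %d levels\n", btree_levels(25e9, 256));
        return 0;
    }

That reproduces the ~5 levels above, and even an 8x fatter tuple only
adds a couple more, which squares with indexes over 6 or 7 levels
being rare.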

-- 
greg


