Re: BTScanOpaqueData size slows down tests - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: BTScanOpaqueData size slows down tests
Msg-id CAH2-WznTtOn9Tek409P8YynXsrPD7NsZHq194M9o81QXQN78+Q@mail.gmail.com
In response to Re: BTScanOpaqueData size slows down tests  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Apr 2, 2025 at 11:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ouch!  I had no idea it had gotten that big.  Yeah, we ought to
> do something about that.

Tomas Vondra talked about this recently, in the context of his work on
prefetching.

> > And/or perhaps we could allocate BTScanOpaqueData.markPos as a whole
> > only when mark/restore are used?
>
> That'd be an easy way of removing about half of the problem, but
> 14kB is still too much.  How badly do we need this items array?
> Couldn't we just reference the on-page items?

I'm not sure what you mean by that. The whole design of _bt_readpage
is based on the idea that we read a whole page, in one go. It has to
batch up the items that are to be returned from the page somewhere.
The worst case is that there are about 1350 TIDs to return from any
single page (assuming default BLCKSZ). It's very pessimistic to start
from the assumption that that worst case will be hit, but I don't see
a way around doing it at least some of the time.
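
For a rough sense of where "about 1350" and the 14kB figure come from,
here is a back-of-the-envelope sketch. Every constant in it is an
assumption about a default build (8kB BLCKSZ, a 24-byte page header, 16
bytes of nbtree special space, 6-byte TIDs, and a 10-byte per-item
struct); none of these numbers are stated in this thread, and the names
are made up for illustration.

```c
#include <stddef.h>

/* Assumed sizes for a default build; not taken from the thread. */
enum {
    SKETCH_BLCKSZ      = 8192, /* default block size */
    SKETCH_PAGE_HEADER = 24,   /* assumed page header size */
    SKETCH_BT_OPAQUE   = 16,   /* assumed nbtree special-space size */
    SKETCH_TID_SIZE    = 6,    /* assumed heap TID size */
    SKETCH_ITEM_SIZE   = 10    /* assumed per-item size: TID plus two 2-byte offsets */
};

/* Upper bound on TIDs that fit on one page if it held nothing but TIDs. */
static size_t
sketch_max_tids_per_page(void)
{
    return (SKETCH_BLCKSZ - SKETCH_PAGE_HEADER - SKETCH_BT_OPAQUE) / SKETCH_TID_SIZE;
}

/* Bytes needed for a fixed items array sized for that worst case. */
static size_t
sketch_items_array_bytes(void)
{
    return sketch_max_tids_per_page() * SKETCH_ITEM_SIZE;
}
```

Under those assumptions the worst case is 1358 TIDs and 13580 bytes per
position, which is consistent with the roughly 14kB mentioned above.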

The first thing I'd try is some kind of simple dynamic allocation
scheme, with a small built-in array that avoided any allocation
penalty in the common case where there weren't too many tuples to
return from the page.
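
To make that concrete, here is a minimal standalone sketch of the idea,
not a patch. It uses plain malloc rather than palloc, and the struct
names, the item layout, and the inline size of 32 are all hypothetical,
chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_INLINE_ITEMS 32      /* hypothetical small built-in capacity */

typedef struct SketchItem
{
    unsigned int   block;           /* stand-in for a heap TID's block number */
    unsigned short offset;          /* stand-in for a heap TID's offset */
} SketchItem;

typedef struct SketchItemBuf
{
    SketchItem *items;              /* points at inline_items or a heap array */
    size_t      nitems;
    size_t      capacity;
    SketchItem  inline_items[SKETCH_INLINE_ITEMS];
} SketchItemBuf;

static void
sketch_buf_init(SketchItemBuf *buf)
{
    assert(buf != NULL);
    buf->items = buf->inline_items;
    buf->nitems = 0;
    buf->capacity = SKETCH_INLINE_ITEMS;
}

/* Append one item; returns 0 on success, -1 if memory could not be obtained. */
static int
sketch_buf_append(SketchItemBuf *buf, SketchItem item)
{
    if (buf->nitems == buf->capacity)
    {
        size_t      newcap = buf->capacity * 2;
        SketchItem *newitems;

        if (buf->items == buf->inline_items)
        {
            /* First spill: move from the built-in array to the heap. */
            newitems = malloc(newcap * sizeof(SketchItem));
            if (newitems == NULL)
                return -1;
            memcpy(newitems, buf->inline_items, buf->nitems * sizeof(SketchItem));
        }
        else
        {
            newitems = realloc(buf->items, newcap * sizeof(SketchItem));
            if (newitems == NULL)
                return -1;
        }
        buf->items = newitems;
        buf->capacity = newcap;
    }
    buf->items[buf->nitems++] = item;
    return 0;
}

/* Release any heap array and fall back to the built-in one. */
static void
sketch_buf_free(SketchItemBuf *buf)
{
    if (buf->items != buf->inline_items)
        free(buf->items);
    sketch_buf_init(buf);
}
```

Pages that return at most SKETCH_INLINE_ITEMS items never touch the
allocator. One caveat with this layout: the struct holds a pointer into
itself, so it must not be copied by value while the inline array is in
use.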

The way that we allocate BLCKSZ bytes twice for index-only scans (once
for so->currTuples, once for so->markTuples) is also pretty
inefficient, especially because any kind of use of mark and restore is
exceedingly rare.
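
One possible shape for deferring the second allocation, as a standalone
sketch only: the mark workspace stays NULL until the first mark request.
The struct and function names are hypothetical, it uses malloc rather
than palloc, and it ignores everything else that real mark/restore
handling has to save.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192          /* assumed default block size */

typedef struct SketchScan
{
    char   *curr_tuples;            /* always allocated for an index-only scan */
    char   *mark_tuples;            /* NULL until the first mark request */
    size_t  curr_used;
    size_t  mark_used;
} SketchScan;

/* Start a scan with only the current-page workspace; 0 on success. */
static int
sketch_scan_begin(SketchScan *scan)
{
    scan->curr_tuples = malloc(SKETCH_BLCKSZ);
    scan->mark_tuples = NULL;
    scan->curr_used = 0;
    scan->mark_used = 0;
    return scan->curr_tuples != NULL ? 0 : -1;
}

/* Remember the current workspace, allocating the mark copy on first use. */
static int
sketch_scan_mark(SketchScan *scan)
{
    if (scan->mark_tuples == NULL)
    {
        scan->mark_tuples = malloc(SKETCH_BLCKSZ);
        if (scan->mark_tuples == NULL)
            return -1;
    }
    memcpy(scan->mark_tuples, scan->curr_tuples, scan->curr_used);
    scan->mark_used = scan->curr_used;
    return 0;
}

/* Restore the marked workspace; only valid after a successful mark. */
static void
sketch_scan_restore(SketchScan *scan)
{
    assert(scan->mark_tuples != NULL);
    memcpy(scan->curr_tuples, scan->mark_tuples, scan->mark_used);
    scan->curr_used = scan->mark_used;
}

static void
sketch_scan_end(SketchScan *scan)
{
    free(scan->curr_tuples);
    free(scan->mark_tuples);        /* free(NULL) is a no-op */
    scan->curr_tuples = NULL;
    scan->mark_tuples = NULL;
}
```

A scan that never marks pays for one BLCKSZ-sized buffer instead of two.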

--
Peter Geoghegan


