Hi,
On 2025-04-02 11:36:33 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Looking at the size of BTScanOpaqueData I am less surprised:
> > /* size: 27352, cachelines: 428, members: 17 */
> > allocating, zeroing and freeing 28kB of memory for every syscache miss, yea,
> > that's gonna hurt.
>
> Ouch! I had no idea it had gotten that big. Yeah, we ought to
> do something about that.
It got a bit bigger a few years back, in

commit 0d861bbb702
Author: Peter Geoghegan <pg@bowt.ie>
Date:   2020-02-26 13:05:30 -0800

    Add deduplication to nbtree.

Because the posting list is a lot more dense, more items can be stored on each
page.
Not that it was small before either:

    BTScanPosData currPos __attribute__((__aligned__(8))); /* 88 4128 */
    /* --- cacheline 65 boundary (4160 bytes) was 56 bytes ago --- */
    BTScanPosData markPos __attribute__((__aligned__(8))); /* 4216 4128 */

    /* size: 8344, cachelines: 131, members: 16 */
    /* sum members: 8334, holes: 3, sum holes: 10 */
    /* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
    /* last cacheline: 24 bytes */
} __attribute__((__aligned__(8)));

But obviously a ~3.3x increase can make a qualitative difference.
> > And/or perhaps we could allocate BTScanOpaqueData.markPos as a whole
> > only when mark/restore are used?
>
> That'd be an easy way of removing about half of the problem, but
> 14kB is still too much. How badly do we need this items array?
> Couldn't we just reference the on-page items?
I think that'd require acquiring the buffer lock and/or pin more frequently.
But I know very little about nbtree.
I'd assume it's extremely rare for there to be this many items on a page. I'd
guess that something like having BTScanPosData->items point to a small inline
array of 4-16 items (BTScanPosItem items_inline[N]), and dynamically
allocating a full-length BTScanPosItem[MaxTIDsPerBTreePage] only in the cases
where it's actually needed, would work.
I'm a bit confused by the "MUST BE LAST" comment:
BTScanPosItem items[MaxTIDsPerBTreePage]; /* MUST BE LAST */
It's not clear to me why. It seems to date from rather long ago:

commit 09cb5c0e7d6
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2006-05-07 01:21:30 +0000

    Rewrite btree index scans to work a page at a time in all cases (both

Greetings,
Andres Freund