Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: BitmapHeapScan streaming read user and prelim refactoring
Date
Msg-id CAAKRu_Z8nTdGEqNUFGgCaAvRjEv=2RzEt5MKYft9u_BL==cFBA@mail.gmail.com
Whole thread Raw
In response to Re: BitmapHeapScan streaming read user and prelim refactoring  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Thanks for the review!

While editing this, I found a few things I could improve. For one, I
actually wasn't using the parallel state member that I had added as a
parameter to one of the table AM callbacks. So, I was able to remove
that.

On Thu, Feb 20, 2025 at 7:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, Feb 21, 2025 at 11:17 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > [v31-0001-Delay-extraction-of-TIDBitmap-per-page-offsets.patch]
>
> +/*
> + * The tuple OffsetNumbers extracted from a single page in the bitmap.
> + */
> +typedef OffsetNumber TBMOffsets[TBM_MAX_TUPLES_PER_PAGE];
>
> It's used for a stack object, but it's also used as a degraded pointer here:
>
> +extern void tbm_extract_page_tuple(TBMIterateResult *iteritem,
> +
> TBMOffsets offsets);
>
> Maybe a personal style question but degraded pointers aren't my
> favourite C feature.  A couple of alternatives, I am not sure how good
> they are:  Would this output variable be better wrapped up in a new
> struct for better API clarity?  If so, shouldn't such a struct also
> hold ntuples itself, since that really goes with the offsets it's
> holding?  If so, could iteritem->ntuples then be replaced with a
> boolean iteritem->lossy, since all you really need to be able to check
> is whether it has offsets to give you, instead of using -1 and -2 for
> that?

Yea, I think this is a good point. I made a new struct which has
ntuples and the offsets array as you suggested. I think it's much
nicer than using -2 :)

I do worry a bit about the ntuples value in the new struct being
uninitialized. Before, tbm_iterate() filled in
TBMIterateResult.ntuples, so you knew that once you had a
TBMIterateResult at all, ntuples was valid. Now you make a
TBMPageOffsets and until you call tbm_extract_page_tuple() on it,
TBMPageOffsets->ntuples is garbage. It's probably okay though. I have
the caller initialize TBMPageOffsets to -1 when I think this might be
an issue.

> Or as a variation without a new struct, would it be better to
> take that output array as a plain pointer to OffsetNumber and
> max_offsets, and return ntuples (ie how many were filled in, cf commit
> f6bef362), for the caller to use in a loop or stash somewhere, with an
> additional comment that TBM_MAX_TUPLES_PER_PAGE is guaranteed to be
> enough, for convenience eg for arrays on the stack?

I thought about doing this, but in the end I decided against this
approach. If we wanted to make it easy to palloc arrays of different
sizes and have tbm_extract_page_tuple extract that many tuples into
the array, we'd have to change more of the code anyway because
tbm_extract_page_tuple is what tells us how many tuples there are to
begin with. We could save this somewhere while filling in the
PagetableEntries initially, but that is a bigger change.

And, passing a length would also mean more callers would have to know
about TBM_MAX_TUPLES_PER_PAGE, which I think is already kind of a hack
since it is defined as MaxHeapTuplesPerPage.

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: explain analyze rows=%.0f
Next
From: Corey Huinker
Date:
Subject: Re: Statistics Import and Export