Re: Tid scan improvements - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Tid scan improvements
Date
Msg-id 20210203211919.u7q3vq24g4pd63gv@alap3.anarazel.de
Whole thread Raw
In response to Re: Tid scan improvements  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Tid scan improvements
List pgsql-hackers
Hi,

On 2021-01-26 14:22:42 +1300, David Rowley wrote:
> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 33bffb6815..d1c608b176 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -325,6 +325,26 @@ typedef struct TableAmRoutine
>                                       ScanDirection direction,
>                                       TupleTableSlot *slot);
>  
> +    /*
> +     * Return next tuple from `scan` where TID is within the defined range.
> +     * This behaves like scan_getnextslot but only returns tuples from the
> +     * given range of TIDs.  Ranges are inclusive.  This function is optional
> +     * and may be set to NULL if TID range scans are not supported by the AM.
> +     *
> +     * Implementations of this function must themselves handle ItemPointers
> +     * of any value. i.e, they must handle each of the following:
> +     *
> +     * 1) mintid or maxtid is beyond the end of the table; and
> +     * 2) mintid is above maxtid; and
> +     * 3) item offset for mintid or maxtid is beyond the maximum offset
> +     * allowed by the AM.
> +     */
> +    bool        (*scan_getnextslot_inrange) (TableScanDesc scan,
> +                                             ScanDirection direction,
> +                                             TupleTableSlot *slot,
> +                                             ItemPointer mintid,
> +                                             ItemPointer maxtid);
> +
>  
>      /* ------------------------------------------------------------------------
>       * Parallel table scan related functions.
> @@ -1015,6 +1035,36 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
>      return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
>  }
>  
> +/*
> + * Return next tuple from defined TID range from `scan` and store in slot.
> + */
> +static inline bool
> +table_scan_getnextslot_inrange(TableScanDesc sscan, ScanDirection direction,
> +                               TupleTableSlot *slot, ItemPointer mintid,
> +                               ItemPointer maxtid)
> +{
> +    /*
> +     * The planner should never make a plan which uses this function when the
> +     * table AM has not defined any function for this callback.
> +     */
> +    Assert(sscan->rs_rd->rd_tableam->scan_getnextslot_inrange != NULL);
> +
> +    slot->tts_tableOid = RelationGetRelid(sscan->rs_rd);
> +
> +    /*
> +     * We don't expect direct calls to table_scan_getnextslot_inrange with
> +     * valid CheckXidAlive for catalog or regular tables.  See detailed
> +     * comments in xact.c where these variables are declared.
> +     */
> +    if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
> +        elog(ERROR, "unexpected table_scan_getnextslot_inrange call during logical decoding");
> +
> +    return sscan->rs_rd->rd_tableam->scan_getnextslot_inrange(sscan,
> +                                                              direction,
> +                                                              slot,
> +                                                              mintid,
> +                                                              maxtid);
> +}


I don't really like that this API relies on mintid/maxtid to stay the
same across multiple scan_getnextslot_inrange() calls. I think we'd at
least need to document that that's required and what needs to be done to
scan a different set of min/max tid (or re-scan the same min/max from
scratch).

Perhaps something like

typedef struct TableScanTidRange TableScanTidRange;

TableScanTidRange* table_scan_tid_range_start(TableScanDesc sscan, ItemPointer mintid, ItemPointer maxtid);
bool table_scan_tid_range_nextslot(TableScanDesc sscan, TableScanTidRange *range, TupleTableSlot *slot);
void table_scan_tid_range_end(TableScanDesc sscan, TableScanTidRange* range);

would work better? That'd allow an AM to have arbitrarily large state
for a tid range scan, would make it clear what the lifetime of the
ItemPointer mintid, ItemPointer maxtid are etc.  Wouldn't, on the API
level, prevent multiple tid range scans from being in progress at the
same times though :(. Perhaps we could add a TableScanTidRange* pointer
to TableScanDesc which'd be checked/set by tableam.h which'd prevent that?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Can we have a new SQL callable function to get Postmaster PID?
Next
From: David Rowley
Date:
Subject: Re: Tid scan improvements