Re: track needed attributes in plan nodes for executor use - Mailing list pgsql-hackers

From: Amit Langote
Subject: Re: track needed attributes in plan nodes for executor use
Msg-id: CA+HiwqFLBzdY7tNSPKa3t8o-bqJmTQoeJT9YxCkJv0jNAEh+2A@mail.gmail.com
In response to: Re: track needed attributes in plan nodes for executor use (Andrei Lepikhov <lepihov@gmail.com>)
Responses: Re: track needed attributes in plan nodes for executor use
List: pgsql-hackers
Thanks for the comments.

On Fri, Jul 11, 2025 at 11:09 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 11/7/2025 10:16, Amit Langote wrote:
> > Hi,
> >
> > I’ve been experimenting with an optimization that reduces executor
> > overhead by avoiding unnecessary attribute deformation. Specifically,
> > if the executor knows which attributes are actually needed by a plan
> > node’s targetlist and qual, it can skip deforming unused columns
> > entirely.
> Sounds promising. However, I'm not sure we're on the same page. Do you
> mean by the proposal an optimisation of slot_deform_heap_tuple() by
> providing it with a bitmapset of requested attributes? In this case, the
> tuple header requires one additional flag to indicate a not-null but
> unfilled column, to detect potential issues.

Not quite -- the optimization doesn’t require changes to the tuple
header or representation. The existing deforming code already stops
once all requested attributes are filled, using tts_nvalid to track
that. What I’m proposing is to additionally allow the slot to skip
ahead to the first needed attribute, rather than always starting
deformation from attno 0. That lets us avoid alignment/null checks for
preceding fixed-width attributes that are guaranteed to be unused.

To support that efficiently, the slot can store a new tts_min_valid
field to indicate the lowest attno that needs deforming.
Alternatively, we could use a per-attribute flag array (with
TupleDesc->natts elements), though that adds some memory and
complexity. The first option seems simpler and should be sufficient in
most cases.
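
To make that concrete, here is a simplified sketch of the deforming loop
with the skip applied. This is not the actual slot_deform_heap_tuple();
the tts_min_valid-derived first_needed and start_off parameters are
hypothetical, missing-attribute defaults are omitted, and a real version
would fall back to deforming from attno 0 when the skip precondition
doesn't hold:

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "access/tupmacs.h"
    #include "executor/tuptable.h"

    /*
     * Deform attributes [first_needed, natts) of a heap tuple, starting
     * at a precomputed byte offset.  Skipping is only safe when
     * attributes 0 .. first_needed - 1 are fixed-width and known
     * not-null, so that start_off is a constant derivable from the
     * TupleDesc alone.  Values before first_needed are left undefined;
     * the proposed tts_min_valid records that they must not be accessed.
     */
    static void
    deform_from_attno(TupleTableSlot *slot, HeapTuple tuple,
                      int first_needed, uint32 start_off)
    {
        TupleDesc       tupleDesc = slot->tts_tupleDescriptor;
        HeapTupleHeader tup = tuple->t_data;
        bool            hasnulls = HeapTupleHasNulls(tuple);
        bits8          *bp = tup->t_bits;   /* null bitmap, if any */
        char           *tp = (char *) tup + tup->t_hoff;  /* tuple data */
        uint32          off = start_off;
        int             natts = Min(HeapTupleHeaderGetNatts(tup),
                                    tupleDesc->natts);

        for (int attnum = first_needed; attnum < natts; attnum++)
        {
            Form_pg_attribute thisatt = TupleDescAttr(tupleDesc, attnum);

            if (hasnulls && att_isnull(attnum, bp))
            {
                slot->tts_values[attnum] = (Datum) 0;
                slot->tts_isnull[attnum] = true;
                continue;     /* nulls consume no space in the tuple */
            }
            slot->tts_isnull[attnum] = false;

            if (thisatt->attlen == -1)  /* varlena may be unaligned */
                off = att_align_pointer(off, thisatt->attalign, -1,
                                        tp + off);
            else
                off = att_align_nominal(off, thisatt->attalign);

            slot->tts_values[attnum] = fetchatt(thisatt, tp + off);
            off = att_addlength_pointer(off, thisatt->attlen, tp + off);
        }

        slot->tts_nvalid = natts;
    }

The start offset for a given first attribute should be cheap to obtain
when all preceding attributes are fixed-width, similar to what the
existing attcacheoff logic already assumes.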

> > In a proof-of-concept patch, I initially computed the needed
> > attributes during ExecInitSeqScan by walking the plan’s qual and
> > targetlist to support deforming only what’s needed when evaluating
> > expressions in ExecSeqScan() or the variant thereof (I started with
> > SeqScan to keep the initial patch minimal). However, adding more work
> > to ExecInit* adds to executor startup cost, which we should generally
> > try to reduce. It also makes it harder to apply the optimization
> > uniformly across plan types.
>
> I'm not sure if a lot of work will be added. However, cached generic
> plan execution should avoid any unnecessary overhead.

True, and that's exactly why I moved the computation from ExecInit* to
the planner. Doing it during plan construction ensures we avoid the
cost even in generic plan execution, which wouldn’t benefit if the
work were deferred to executor startup.

> > I’d now like to propose computing the needed attributes at planning
> > time instead. This can be done at the bottom of create_plan_recurse,
> > after the plan node has been constructed. A small helper like
> > record_needed_attrs(plan) can walk the node’s targetlist and qual
> > using pull_varattnos() and store the result in a new Bitmapset
> > *attr_used field in the Plan struct. System attributes returned by
> > pull_varattnos() can be filtered out during this step, since they're
> > either not relevant to deformation or not performance sensitive.
>
> Why do you choose the Plan node? It seems it is relevant to only Scan
> nodes. Does it mean extension of the CustomScan API?

It’s true that the biggest win is for Scan nodes, since that’s where
the tuple is fetched from storage and first deformed. But upper nodes
like Agg also deform tuples to evaluate expressions. For example, in a
plan like Agg over Sort over SeqScan, the Agg node will receive
MinimalTuples from the Sort and need to deform them to extract just
the attributes required for aggregation. So the optimization could
help there too.
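
For reference, here is roughly what the helper could look like for the
scan case. This is a sketch, not the PoC patch itself: attr_used is the
proposed new Plan field, and whole-row Vars (attno 0), which would force
deforming all columns, are not handled here:

    #include "postgres.h"
    #include "access/sysattr.h"
    #include "nodes/bitmapset.h"
    #include "nodes/plannodes.h"
    #include "optimizer/optimizer.h"

    /* Record the attributes used by a scan node's targetlist and qual. */
    static void
    record_needed_attrs(Plan *plan, Index scanrelid)
    {
        Bitmapset  *attrs = NULL;
        Bitmapset  *result = NULL;
        int         x = -1;

        /* Collect the attnos of all Vars of this relation. */
        pull_varattnos((Node *) plan->targetlist, scanrelid, &attrs);
        pull_varattnos((Node *) plan->qual, scanrelid, &attrs);

        /*
         * pull_varattnos() offsets attnos by
         * FirstLowInvalidHeapAttributeNumber so that system attributes
         * (negative attnos) fit in the bitmapset.  Shift back and drop
         * system attributes, which don't matter for deforming.
         */
        while ((x = bms_next_member(attrs, x)) >= 0)
        {
            AttrNumber  attno = x + FirstLowInvalidHeapAttributeNumber;

            if (attno > 0)
                result = bms_add_member(result, attno);
        }

        plan->attr_used = result;   /* proposed new field */
    }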

I wasn’t quite sure what you meant about the CustomScan API; could you
elaborate?

> > With both patches in place, heap tuple deforming can skip over unused
> > attributes entirely. For example, on a 30-column table where the first
> > 15 columns are fixed-width, the query:
> >
> > select sum(a_1) from foo where a_10 = $1;
> >
> > which references only two fixed-width columns, ran nearly 2x faster
> > with the optimization in place (with heap pages prewarmed into
> > shared_buffers).
> It may be profitable. However, I often encounter cases where a table
> has 20-40 columns, with fixed- and variable-width columns arbitrarily
> mixed, and fetching columns by index in a 30-something-column table is
> painful. In this area, Postgres may gain more by adding a cost based on
> the column number in order_qual_clauses() - in [1] I attempted to
> explain how and why that should work.
>
> [1]
> https://open.substack.com/pub/danolivo/p/on-expressions-reordering-in-postgres

Thanks, Andrei. Yes, I agree that clause ordering to minimize
deformation cost is a worthwhile idea and I appreciate the pointer to
your post. This patch instead aims to eliminate the unnecessary work
mechanically, without depending on clause order or planner heuristics.
It's motivated by recent discussions I've been following about improving
the CPU characteristics of query execution, especially shaving off
predictable overheads in tight loops like tuple deformation.

--
Thanks, Amit Langote


