Re: track needed attributes in plan nodes for executor use - Mailing list pgsql-hackers
From | Japin Li |
---|---|
Subject | Re: track needed attributes in plan nodes for executor use |
Date | |
Msg-id | ME0P300MB044517051BC22AC60DA2EAAEB64BA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM |
In response to | track needed attributes in plan nodes for executor use (Amit Langote <amitlangote09@gmail.com>) |
List | pgsql-hackers |
On Fri, 11 Jul 2025 at 17:16, Amit Langote <amitlangote09@gmail.com> wrote:
> Hi,
>
> I’ve been experimenting with an optimization that reduces executor
> overhead by avoiding unnecessary attribute deformation. Specifically,
> if the executor knows which attributes are actually needed by a plan
> node’s targetlist and qual, it can skip deforming unused columns
> entirely.
>
> In a proof-of-concept patch, I initially computed the needed
> attributes during ExecInitSeqScan by walking the plan’s qual and
> targetlist to support deforming only what’s needed when evaluating
> expressions in ExecSeqScan() or the variant thereof (I started with
> SeqScan to keep the initial patch minimal). However, adding more work
> to ExecInit* adds to executor startup cost, which we should generally
> try to reduce. It also makes it harder to apply the optimization
> uniformly across plan types.
>
> I’d now like to propose computing the needed attributes at planning
> time instead. This can be done at the bottom of create_plan_recurse,
> after the plan node has been constructed. A small helper like
> record_needed_attrs(plan) can walk the node’s targetlist and qual
> using pull_varattnos() and store the result in a new Bitmapset
> *attr_used field in the Plan struct. System attributes returned by
> pull_varattnos() can be filtered out during this step, since they're
> either not relevant to deformation or not performance sensitive.
>
> This also lays the groundwork for a related executor-side optimization
> that David Rowley suggested to me off-list. The idea is to remember,
> in the TupleDesc, either the attribute number or the byte offset of
> the first variable-length attribute. Then, if the minimum required
> attribute (as provided by attr_used) lies before that, the executor
> can safely jump directly to it using the cached offset, rather than
> starting deformation from attno 0 as it currently does. That avoids
> walking through fixed-length attributes that aren't needed --
> specifically, skipping per-attribute alignment, null checking, and
> offset tracking for unused columns -- which reduces CPU work and
> avoids loading irrelevant tuple bytes into cache.
>
> With both patches in place, heap tuple deforming can skip over unused
> attributes entirely. For example, on a 30-column table where the first
> 15 columns are fixed-width, the query:
>
> select sum(a_1) from foo where a_10 = $1;
>
> which references only two fixed-width columns, ran nearly 2x faster
> with the optimization in place (with heap pages prewarmed into
> shared_buffers).
>
> In more complex plans, for example those involving a Sort or Join
> between the scan and aggregation, the CPU cost of the intermediate
> node may dominate, making deforming-related savings at the top less
> visible in overall performance. Still, I don't think that's a reason
> to avoid enabling this optimization more broadly across plan nodes.
>
> I'll post the PoC patches and performance measurements. Posting this
> in advance to get feedback on the proposed direction and where best to
> place attr_used.
>

That's interesting. If I understand correctly, this approach wouldn't
work if the first attribute is variable-length, right?

-- 
Regards,
Japin Li
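To make the prefix-offset idea in the quoted mail (and the question about a leading variable-length attribute) concrete, here is a minimal C sketch. Everything in it is hypothetical: SketchTupleDesc, sketch_init_offsets(), sketch_deform_start(), and the uint32_t mask standing in for the proposed Bitmapset *attr_used are illustrative names, not PostgreSQL APIs. It assumes 0-based attribute indexes, at most 32 attributes, attlen == -1 for variable-length attributes, and, as a conservative simplification, trusts cached offsets only for tuples without NULLs (a NULL occupies no bytes in a heap tuple, so it shifts every later offset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ATTS 32

/* Hypothetical, simplified attribute descriptor (not PostgreSQL's real
 * struct): attlen > 0 is a fixed width in bytes, attlen == -1 marks a
 * variable-length attribute. */
typedef struct SketchAttr
{
    int16_t attlen;
    int16_t attalign;           /* alignment in bytes: 1, 2, 4 or 8 */
} SketchAttr;

typedef struct SketchTupleDesc
{
    int         natts;
    SketchAttr  attrs[SKETCH_MAX_ATTS];
    int32_t     cacheoff[SKETCH_MAX_ATTS];  /* valid only for i < first_varlena */
    int         first_varlena;              /* natts if there is no varlena */
} SketchTupleDesc;

/* Round off up to the next multiple of align (a power of two). */
static int32_t
sketch_align(int32_t off, int16_t align)
{
    return (off + align - 1) & ~(int32_t) (align - 1);
}

/* Once per descriptor: cache the byte offset of every attribute in the
 * fixed-width prefix and remember the index of the first varlena. */
static void
sketch_init_offsets(SketchTupleDesc *desc)
{
    int32_t     off = 0;

    assert(desc->natts <= SKETCH_MAX_ATTS);
    desc->first_varlena = desc->natts;
    for (int i = 0; i < desc->natts; i++)
    {
        if (desc->attrs[i].attlen < 0)
        {
            desc->first_varlena = i;
            break;
        }
        off = sketch_align(off, desc->attrs[i].attalign);
        desc->cacheoff[i] = off;
        off += desc->attrs[i].attlen;
    }
}

/* Per tuple: pick the attribute index at which deformation may begin.
 * Bit i of attr_used is set when attribute i is referenced by the plan
 * node (hypothetical stand-in for the proposed attr_used).
 * Returns natts when no attribute is needed. */
static int
sketch_deform_start(const SketchTupleDesc *desc, uint32_t attr_used,
                    bool tuple_has_nulls)
{
    int         min_needed = 0;

    if (attr_used == 0)
        return desc->natts;
    while ((attr_used & ((uint32_t) 1 << min_needed)) == 0)
        min_needed++;

    /* Jump straight to cacheoff[min_needed] only when the target lies in
     * the fixed-width prefix and no NULL can have shifted the offsets. */
    if (!tuple_has_nulls && min_needed < desc->first_varlena)
        return min_needed;

    return 0;                   /* fall back to walking from the first attribute */
}
```

For example, with columns (int4, int8, int2, int4, text, int4) and attr_used = {1, 3}, deformation may begin at attribute 1, whose cached offset is 8. If the first attribute is variable-length, first_varlena is 0, so under this sketch's rule no needed attribute can lie before it and deformation always falls back to the ordinary walk. The sketch only chooses a start point; needed attributes at or after the first varlena would still require walking the variable-length part of the tuple.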