Hi,
On 2023-11-19 14:41:47 -0800, Andres Freund wrote:
> It doesn't make it algorithmically better, you're right - but I think it's
> quite noticeable even in the case of the other columns having values.
>
> I changed the test to insert 0 insto all columns other than y, and changed the
> WHERE clause to IS NOT NULL, to avoid the overhead of the aggregation
> path. Profile:
>
> - 91.27% 0.00% postgres postgres [.] ExecProcNode (inlined)
> ExecProcNode (inlined)
> - ExecScan
> - 60.21% ExecQual (inlined)
> - ExecEvalExprSwitchContext (inlined)
> - 59.70% ExecInterpExpr
> - 54.05% slot_getsomeattrs (inlined)
> - 53.52% slot_getsomeattrs_int
> - 52.93% tts_buffer_heap_getsomeattrs
> - 52.88% slot_deform_heap_tuple (inlined)
> + 12.39% fetch_att (inlined)
> + 12.26% att_isnull (inlined)
> + 0.14% asm_sysvec_apic_timer_interrupt
> + 1.60% BoolGetDatum (inlined)
> + 0.01% asm_sysvec_apic_timer_interrupt
> 0.35% MemoryContextSwitchTo (inlined)
> - 30.47% ExecScanFetch (inlined)
> + 29.87% SeqNext
> + 0.01% asm_sysvec_apic_timer_interrupt
> 0.23% MemoryContextReset
> + 0.01% asm_sysvec_apic_timer_interrupt
>
>
> So even here we spend a decent amount of the time in null bitmap handling.
If I put prewarm the data into shared buffers and change the table so there
is a NULL in one of the leading columns, this changes to:
- 94.15% 0.00% postgres postgres [.] ExecProcNode (inlined)
ExecProcNode (inlined)
- ExecScan
- 79.15% ExecQual (inlined)
- ExecEvalExprSwitchContext (inlined)
- 78.50% ExecInterpExpr
- 71.62% slot_getsomeattrs (inlined)
- 70.52% slot_getsomeattrs_int
- 69.79% tts_buffer_heap_getsomeattrs
- 69.56% slot_deform_heap_tuple (inlined)
+ 20.25% att_isnull (inlined)
+ 9.45% fetch_att (inlined)
+ 0.39% asm_sysvec_apic_timer_interrupt
+ 1.88% BoolGetDatum (inlined)
+ 0.09% asm_sysvec_apic_timer_interrupt
0.51% MemoryContextSwitchTo (inlined)
+ 0.02% asm_sysvec_apic_timer_interrupt
+ 14.39% ExecScanFetch (inlined)
0.14% MemoryContextReset
+ 0.03% asm_sysvec_apic_timer_interrupt
Reducing the time spent in att_isnull() wouldn't get us to < 10 timings, but
it'd certainly help to close the gap. Of course you can make the difference
more extreme by adding a lot more leading columns, but still.
Greetings,
Andres Freund