Re: BUG #18205: Performance regression with NOT NULL checks. - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #18205: Performance regression with NOT NULL checks.
Date
Msg-id 20231119224145.jurvrc73gb66l7xl@awork3.anarazel.de
Whole thread Raw
In response to Re: BUG #18205: Performance regression with NOT NULL checks.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #18205: Performance regression with NOT NULL checks.  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Hi,

On 2023-11-19 16:30:49 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > We actually did add fastpaths for a few similar cases: ExecJustInnerVar() etc
> > will just use slot_getattr(). These can be used when the result is just a
> > single variable. However, the goal there was more to avoid "interpreter
> > startup" overhead, rather than evaluation overhead.
>
> Yeah.  Also, if I'm reading the example appropriately, Daniel's case
> *does* involve fetching more than a single column --- but the other ones
> are up near the start so we didn't use to have to deform very much of
> the tuple.

Hm - I see. The initial response didn't make it sound like that, but the
follow-up clarified that aspect.


> > What if we instead load 8 bytes of the bitmap into a uint64 before entering
> > the loop, and shift an "index" mask into the bitmap by one each iteration
> > through the loop?
>
> Meh.  Seems like a micro-optimization that does nothing for the big-O
> problem.  One thing to think about is that I suspect "all the columns
> are null" is just a simple test case and not very representative of
> the real-world problem.  In the real case, probably quite a few of
> the leading columns are non-null, which would make Daniel's issue
> even worse because slot_deform_tuple would have to do significantly
> more work that it didn't do before.  Shaving cycles off the null-column
> fast path would be proportionally less useful too.

It doesn't make it algorithmically better, you're right - but I think it's
quite noticeable even in the case of the other columns having values.

I changed the test to insert 0 insto all columns other than y, and changed the
WHERE clause to IS NOT NULL, to avoid the overhead of the aggregation
path. Profile:

-   91.27%     0.00%  postgres  postgres                 [.] ExecProcNode (inlined)
     ExecProcNode (inlined)
   - ExecScan
      - 60.21% ExecQual (inlined)
         - ExecEvalExprSwitchContext (inlined)
            - 59.70% ExecInterpExpr
               - 54.05% slot_getsomeattrs (inlined)
                  - 53.52% slot_getsomeattrs_int
                     - 52.93% tts_buffer_heap_getsomeattrs
                        - 52.88% slot_deform_heap_tuple (inlined)
                           + 12.39% fetch_att (inlined)
                           + 12.26% att_isnull (inlined)
                           + 0.14% asm_sysvec_apic_timer_interrupt
               + 1.60% BoolGetDatum (inlined)
               + 0.01% asm_sysvec_apic_timer_interrupt
              0.35% MemoryContextSwitchTo (inlined)
      - 30.47% ExecScanFetch (inlined)
         + 29.87% SeqNext
         + 0.01% asm_sysvec_apic_timer_interrupt
        0.23% MemoryContextReset
      + 0.01% asm_sysvec_apic_timer_interrupt


So even here we spend a decent amount of the time in null bitmap handling.


I also am not so sure that you're right accessing multiple columns makes the
slot_attisnull() path even more advantageous - it's considerably slower when
accessing multiple columns, so if you actually access more columns, the
benefit will shrink. Particularly if there's variable width columns that need
to be skipped over and such.


> I have a wide table with 81 columns, and I am using column 1,43,18,75 and filter by attribute 82,42, and 24.

Projecting four columns and filtering by three gets a fair bit more expensive
if you actually deform them individually.


> It might well be that what you suggest is worth doing just to cut
> the cost of slot_deform_tuple across the board, but I don't think
> it's an answer to this complaint specifically.

I did suggest multiple things for a reason :).  slot_deform_tuple() is
generally a pretty big bottleneck, so efficiency improvements would be
interesting... And if were to only deform columns we actually need, handling
the null bitmap more efficiently becomes more important, so it could even help
cases like this.

Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #18205: Performance regression with NOT NULL checks.
Next
From: Andres Freund
Date:
Subject: Re: BUG #18205: Performance regression with NOT NULL checks.