Re: Query is slower with a large proportion of NULLs in several columns - Mailing list pgsql-performance

From Tom Lane
Subject Re: Query is slower with a large proportion of NULLs in several columns
Date
Msg-id 2688019.1640051519@sss.pgh.pa.us
Whole thread Raw
In response to Query is slower with a large proportion of NULLs in several columns  (Lars Bergeson <larsavatar@gmail.com>)
Responses Re: Query is slower with a large proportion of NULLs in several columns
List pgsql-performance
Lars Bergeson <larsavatar@gmail.com> writes:
> What is it about null values in the table that slows down the full table
> scan?

If a row has any nulls, then it contains a "nulls bitmap" [1] that says
which columns are null, and that bitmap has to be consulted while
walking through the row contents.  So the most obvious theory here
is that that adds overhead that's significant in your case.  But there
are some holes in that theory, mainly that the I/O timings you are
showing don't seem very consistent:

no nulls:
>   I/O Timings: read=1884365.335
> Execution Time: 11135.368 ms

with nulls:
>   I/O Timings: read=17141420.771
> Execution Time: 25407.318 ms

Regardless of CPU time required, it should not take 10X less I/O
time to read a physically larger table.  So there's something
fairly bogus going on there.  One thing you might try is disabling
parallelism (set max_parallel_workers_per_gather = 0) to see if
that's confusing the numbers somehow.

            regards, tom lane

[1] https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-TUPLE-LAYOUT



pgsql-performance by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Query is slower with a large proportion of NULLs in several columns
Next
From: Lars Bergeson
Date:
Subject: Re: Query is slower with a large proportion of NULLs in several columns