Thread: No access to TOAST tables shown in EXPLAIN ( ANALYZE, BUFFERS )

No access to TOAST tables shown in EXPLAIN ( ANALYZE, BUFFERS )

From
Christophe Courtois
Date:
Hi,

I've found that EXPLAIN (ANALYZE,BUFFERS) does not show any access to the TOAST tables when a toasted column is only
SELECTedand not used in any way in the query.
 
pg_statio_all_tables only shows access to the heap.
I imagine that some optimization does not bother to go into the TOAST table, as its content is neither printed neither
used.

As a consequence, EXPLAIN plans can show huge differences against the actual query results, in buffers count and query
time.As EXPLAIN is supposed to  "display the true row counts and true run time accumulated within each plan node", I
thinkthere may be a problem.
 
I didn't see any related caveats in https://www.postgresql.org/docs/14/using-explain.html#USING-EXPLAIN-CAVEATS



For example, a simple "SELECT" from my one-column table takes ~ 20 ms and reads 9000 buffers, but the EXPLAIN ANALYZE
ismuch much quicker:
 

                               QUERY PLAN                               
------------------------------------------------------------------------
 Seq Scan on public.noises (actual time=0.025..0.393 rows=3000 loops=1)
   Output: noise
   Buffers: shared hit=20
 Query Identifier: 2512658633781838584
 Planning Time: 0.030 ms
 Execution Time: 0.689 ms


If I force the use of the result in the query (concatenation, length…), I see the 9000 missing hits again in the
EXPLAINplan.
 

This seems to be this way since 9.1 at least.
I'm wondering if this can be changed, or at least documented.



SQL script for a full demo is joined. The main parts of the output are below (I hope this is readable):


"8 MB test table"
                                    List of relations
 Schema |  Name  | Type  |  Owner   | Persistence | Access method |  Size   | Description 
--------+--------+-------+----------+-------------+---------------+---------+-------------
 public | noises | table | postgres | unlogged    | heap          | 8384 kB | 

"** Reset stats **"
 

"**"
"** Query : SELECT noise FROM noises  : ~ about 10-20 ms **"
"**"

Time: 16.350 ms
 

  relid  | schemaname |     relname      | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit |
toast_blks_read| toast_blks_hit | tidx_blks_read | tidx_blks_hit 
 

---------+------------+------------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
 4697174 | public     | noises           |              0 |            20 |               |              |
0 |           3000 |              0 |          6001
 
 4697177 | pg_toast   | pg_toast_4697174 |              0 |          3000 |             0 |         6001 |
  |                |                |              
 

"** Reset stats **"
 

"**"
"** Same query + EXPLAIN ANALYZE : < 1 ms , 20 hits **"
"**"
                               QUERY PLAN                               
------------------------------------------------------------------------
 Seq Scan on public.noises (actual time=0.006..0.199 rows=3000 loops=1)
   Output: noise
   Buffers: shared hit=20
 Query Identifier: 1313741123942831781
 Planning Time: 0.028 ms
 Execution Time: 0.334 ms

 pg_sleep 
----------
 

  relid  | schemaname |     relname      | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit |
toast_blks_read| toast_blks_hit | tidx_blks_read | tidx_blks_hit 
 

---------+------------+------------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
 4697174 | public     | noises           |              0 |            20 |               |              |
0 |              0 |              0 |             0
 
 4697177 | pg_toast   | pg_toast_4697174 |              0 |             0 |             0 |            0 |
  |                |                |              
 

"** Reset stats **"

"**"
"** Force use of toast data + EXPLAIN ANALYZE : 10-20 ms, 9020 hits **"
"**"
                               QUERY PLAN                               
------------------------------------------------------------------------
 Seq Scan on public.noises (actual time=0.023..9.679 rows=3000 loops=1)
   Output: (noise || ''::text)
   Buffers: shared hit=9020
 Query Identifier: -6122032322937607654
 Planning Time: 0.026 ms
 Execution Time: 9.843 ms

 

  relid  | schemaname |     relname      | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit |
toast_blks_read| toast_blks_hit | tidx_blks_read | tidx_blks_hit 
 

---------+------------+------------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
 4697174 | public     | noises           |              0 |            20 |               |              |
0 |           3000 |              0 |          6000
 
 4697177 | pg_toast   | pg_toast_4697174 |              0 |          3000 |             0 |         6000 |
  |                |                |              
 





-- 
Christophe Courtois
Consultant Dalibo
https://dalibo.com/
Attachment

Re: No access to TOAST tables shown in EXPLAIN ( ANALYZE, BUFFERS )

From
Tom Lane
Date:
Christophe Courtois <christophe.courtois@dalibo.com> writes:
> I've found that EXPLAIN (ANALYZE,BUFFERS) does not show any access to the TOAST tables when a toasted column is only
SELECTedand not used in any way in the query. 

This is probably because of the documented, long-standing behavior that
EXPLAIN does not convert the query's output rows to text, nor send them
to the client.  If the datatype output functions aren't called, the
output datums won't get detoasted either.

I don't see anything here we want to change.  The argument for not
bothering to model output costs has always been that they'd be the
same for every possible query plan, and I think that that applies to
detoasting costs as much as the actual output conversions.  Moreover,
if we're not sending data to the client, the skipped network I/O could
easily represent a larger cost than anything else --- but there's no
reasonable way to account for that.

If you do care about those costs, a possible answer is to run
auto_explain, allowing you to capture data behind-the-scenes for
queries that really are sending data to clients.

            regards, tom lane