Re: EXPLAIN (ANALYZE, BUFFERS) - puzzling numbers for a simple query. - Mailing list pgsql-general

From Vijaykumar Jain
Subject Re: EXPLAIN (ANALYZE, BUFFERS) - puzzling numbers for a simple query.
Date
Msg-id CAM+6J97chj6Gyea_h03ec2YyN-3LeF4d0_NdFE0NRT_PY5pTJQ@mail.gmail.com
Whole thread Raw
In response to EXPLAIN (ANALYZE, BUFFERS) - puzzling numbers for a simple query.  (Pól Ua Laoínecháin <linehanp@tcd.ie>)
List pgsql-general
how is cost calculated?

row estimation 

FYI, I just read it when I had queries like why rows were estimated when the table has now rows.
but i think it requires a deeper reading into the source for cost calculation which i did not do at that time as i did not understand :)
but incase above helps to start.



On Fri, 4 Jun 2021 at 16:29, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
Hi all,

Noticed this today - relatively simple query - table with 7 records
(all code is shown at the bottom of this post and on the fiddle here):

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=efe73a37d29af43f33d2bc79de2b6c97

Sample (2 of 7 records);

======================
INSERT INTO test
VALUES
(1, 'Pól'  , '2021-06-01', '2021-06-06'),
(2, 'Bill' , '2021-06-02', '2021-06-10');
=========================

Query:

========================
SELECT
  id,
  GENERATE_SERIES
    (t.start_date, t.end_date, '1 DAY')::DATE AS sd,
  t.end_date
FROM test t
ORDER BY t.id, t.start_date;
=========================

Now, the EXPLAIN (ANALYZE, BUFFERS) of this query is as follow - see
fiddle - with 7 records:

====================
QUERY PLAN
Sort (cost=165708.62..168608.62 rows=1160000 width=10) (actual
time=0.083..0.087 rows=42 loops=1)
Sort Key: id, (((generate_series((start_date)::timestamp with time
zone, (end_date)::timestamp with time zone, '1
day'::interval)))::date)
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=1
-> Result (cost=0.00..29036.10 rows=1160000 width=10) (actual
time=0.019..0.056 rows=42 loops=1)
Buffers: shared hit=1
-> ProjectSet (cost=0.00..5836.10 rows=1160000 width=14) (actual
time=0.018..0.042 rows=42 loops=1)
Buffers: shared hit=1
-> Seq Scan on test t (cost=0.00..21.60 rows=1160 width=10) (actual
time=0.008..0.010 rows=7 loops=1)
Buffers: shared hit=1
Planning Time: 0.061 ms
Execution Time: 0.131 ms
12 rows
===========================

Now, the first line of the PLAN has

> Sort (cost=165708.62..168608.62 rows=1160000 width=10)

and two other lines below this also contain the figure 1,160,000.

Where *_on earth_* is PostgreSQL obtaining 1.16M rows? And where do
the cost numbers come from for a query on 7 records?

This query - a recursive CTE (no GENERATE_SERIES) as follows:

===================
WITH RECURSIVE cte (id, sd, ed) AS
(
  SELECT t.id, t.start_date, t.end_date
  FROM test t
  UNION ALL
  SELECT  c.id, (c.sd + INTERVAL '1 DAY')::DATE, c.ed
  FROM cte c
  WHERE c.sd < (SELECT z.end_date FROM test z WHERE z.id = c.id)
)
SELECT * FROM cte c2
ORDER BY c2.id, c2.sd, c2.ed;
=============================

gives a PLAN as follows:

==========================
QUERY PLAN
Sort (cost=955181.47..955281.05 rows=39830 width=10) (actual
time=0.262..0.266 rows=42 loops=1)
Sort Key: c2.id, c2.sd, c2.ed
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=85
CTE cte
-> Recursive Union (cost=0.00..951341.55 rows=39830 width=10) (actual
time=0.010..0.216 rows=42 loops=1)
Buffers: shared hit=85
-> Seq Scan on test t (cost=0.00..21.60 rows=1160 width=10) (actual
time=0.009..0.010 rows=7 loops=1)
Buffers: shared hit=1
-> WorkTable Scan on cte c (cost=0.00..95052.33 rows=3867 width=10)
(actual time=0.006..0.020 rows=4 loops=9)
Filter: (sd < (SubPlan 1))
Rows Removed by Filter: 1
Buffers: shared hit=84
SubPlan 1
-> Index Scan using test_pkey on test z (cost=0.15..8.17 rows=1
width=4) (actual time=0.003..0.003 rows=1 loops=42)
Index Cond: (id = c.id)
Buffers: shared hit=84
-> CTE Scan on cte c2 (cost=0.00..796.60 rows=39830 width=10) (actual
time=0.011..0.233 rows=42 loops=1)
Buffers: shared hit=85
Planning Time: 0.137 ms
Execution Time: 0.324 ms
21 rows
===============================

Now, this PLAN is more complicated - and I totally get that! However,
where do these numbers:

> (cost=955181.47..955281.05 rows=39830 width=10)

come from?

Now, we've gone from 1,160,000 to 39,830 rows (progress? :-) ) and a
cost of ~ 1M (compared with 168k for the first query).

I probably need to read up on the EXPLAIN (ANALYZE, BUFFERS)
functionality - but I would appreciate:

a) an (overview) explanation of what's going on here in particular and

b) any good references to URLs, papers whatever which would be of
benefit to a (hopefully) reasonably competent SQL programmer with a
desire to grasp internals - how to interpret PostgreSQL's EXPLAIN
functionality.

TIA and rgs,


Pól...




--
Thanks,
Vijay
Mumbai, India

pgsql-general by date:

Previous
From: Pól Ua Laoínecháin
Date:
Subject: EXPLAIN (ANALYZE, BUFFERS) - puzzling numbers for a simple query.
Next
From: Ram Pratap Maurya
Date:
Subject: RE: BUG #17046: Upgrade postgres 11 to 13 version