Re: fix cost subqueryscan wrong parallel cost - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: fix cost subqueryscan wrong parallel cost
Msg-id CAKFQuwYqXCS=Hu4=kXmKwactpqK2v9cqJifz1gWX-RniFJRnnw@mail.gmail.com
In response to Re: fix cost subqueryscan wrong parallel cost  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: fix cost subqueryscan wrong parallel cost
List pgsql-hackers
On Fri, Apr 29, 2022 at 11:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> In short, these SubqueryScans are being labeled as producing 60000 rows
> when their input only produces 25000 rows, which is surely insane.
>
> So: even though the SubqueryScan itself isn't parallel-aware, the number
> of rows it processes has to be de-rated according to the number of workers
> involved.

Right, so why does baserel->rows show 60,000 here when path->subpath->rows shows only 25,000?  If you substitute path->subpath->rows for baserel->rows in cost_subqueryscan, you get (with your cost change above):

 Incremental Sort  (cost=27875.50..45577.57 rows=120000 width=12) (actual time=165.285..235.749 rows=60000 loops=1)
   Sort Key: "*SELECT* 1".a, "*SELECT* 1".c
   Presorted Key: "*SELECT* 1".a
   Full-sort Groups: 10  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
   Pre-sorted Groups: 10  Sort Method: quicksort  Average Memory: 521kB  Peak Memory: 521kB
   ->  Unique  (cost=27794.85..28994.85 rows=120000 width=12) (actual time=157.882..220.501 rows=60000 loops=1)
         ->  Sort  (cost=27794.85..28094.85 rows=120000 width=12) (actual time=157.881..187.232 rows=120000 loops=1)
               Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
               Sort Method: external merge  Disk: 2600kB
               ->  Gather  (cost=0.00..1400.00 rows=120000 width=12) (actual time=0.197..22.705 rows=120000 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Parallel Append  (cost=0.00..1400.00 rows=50000 width=12) (actual time=0.015..13.101 rows=40000 loops=3)
                           ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.014..6.864 rows=30000 loops=2)
                                 ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.014..3.708 rows=30000 loops=2)
                           ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.010..6.918 rows=30000 loops=2)
                                 ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.010..3.769 rows=30000 loops=2)
 Planning Time: 0.137 ms
 Execution Time: 239.958 ms
(19 rows)

This plan shows your cost goal of 1400 from the UNION ALL case, along with the expected row counts, for Gather-atop-Append.
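
To make the row arithmetic concrete, here is a minimal standalone sketch of the de-rating. It assumes the leader participates and that its contribution is discounted by 0.3 per worker, which is how I understand the planner's get_parallel_divisor to behave; the function names below are hypothetical and are not taken from the PostgreSQL source.

```c
/*
 * Sketch only: approximates how a per-worker row estimate is derived.
 * Assumption: the leader contributes (1.0 - 0.3 * workers) of a worker
 * when that value is positive, and nothing otherwise.
 */
static double
parallel_divisor_sketch(int workers, int leader_participates)
{
    double divisor = (double) workers;

    if (leader_participates)
    {
        double leader_contribution = 1.0 - (0.3 * workers);

        if (leader_contribution > 0)
            divisor += leader_contribution;
    }
    return divisor;
}

/* Divide a total row estimate by the divisor to get per-process rows. */
static double
derate_rows_sketch(double total_rows, int workers, int leader_participates)
{
    return total_rows / parallel_divisor_sketch(workers, leader_participates);
}
```

With two planned workers the divisor comes out to about 2.4, so 60,000 total rows de-rate to roughly 25,000 per process and 120,000 to roughly 50,000, which lines up with the Parallel Seq Scan, Subquery Scan, and Parallel Append estimates in the plan above.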

The fact that baserel->rows > path->subpath->rows here looks like a straightforward bug: there are no filters involved in this case, and even in the presence of filters baserel->rows should always be <= path->subpath->rows, right?

David J.
