On Mon, Apr 1, 2019 at 12:00 PM Andres Freund <andres@anarazel.de> wrote:
> > Nested Loop Semi Join (cost=0.00..90020417940.08 rows=30005835 width=8) (actual time=0.034..24981.895 rows=90017507 loops=1)
> >   Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
> >   -> Seq Scan on order_line ref_0 (cost=0.00..2011503.04 rows=90017504 width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)
> >   -> Materialize (cost=0.00..2771.00 rows=100000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)
> >         -> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.006..0.006 rows=1 loops=1)
>
> note the estimated rows=100000 vs the actual rows=1 in the seqscan /
> materialize. That's what makes the planner think this is much more
> expensive than it is, which in turn triggers the use of a parallel scan.

Yeah, I just noticed that. The sequential scan on the inner side of
the nestloop join is a problem.
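
To illustrate why the Materialize node reports rows=1 per loop, here is a
toy Python sketch of a nested loop semi join: each outer row is emitted at
most once, and the inner scan stops at the first row that passes the join
filter. This is not PostgreSQL's executor code, and the data are
hypothetical stand-ins (small district ids on the outer side, larger image
ids on the inner side); the final line is plain arithmetic for comparison,
not a claim about how the planner costs semi joins.

```python
def nested_loop_semi_join(outer, inner, join_filter):
    """Toy semi join: emit each outer row at most once, stopping the
    inner scan at the first matching inner row.

    Returns the emitted outer rows and the total number of inner rows read.
    """
    emitted = []
    inner_fetches = 0
    for o in outer:
        for i in inner:
            inner_fetches += 1
            if join_filter(o, i):
                emitted.append(o)
                break  # one match is enough; the rest of inner is never read
    return emitted, inner_fetches


# Hypothetical data: every image id exceeds every district id, so the
# first inner row always satisfies "d_id <= im_id".
order_line_d_ids = list(range(1, 11))
item_im_ids = list(range(100, 1100))

rows, fetches = nested_loop_semi_join(
    order_line_d_ids, item_im_ids, lambda d_id, im_id: d_id <= im_id
)

# One inner row read per outer row, mirroring "rows=1 loops=90017507".
print(len(rows), fetches)  # 10 10
# Inner rows read if every rescan returned the whole inner side instead.
print(len(order_line_d_ids) * len(item_im_ids))  # 10000
```

With the real plan's numbers, the gap between the estimated 100000 inner
rows and the actual 1 per rescan is a factor of 100000.
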

More generally, as somebody familiar with the TPC-C schema, I cannot
make sense of the query itself. Why would anybody want to join "Image
ID associated to Item" from the item table to the district column of
the order_line table? It simply makes no sense.

--
Peter Geoghegan