Re: Parallel Query - Mailing list pgsql-performance

From Luís Roberto Weck
Subject Re: Parallel Query
Date
Msg-id d5a6a219-48c7-6da3-f96f-afac9cd38a1a@siscobra.com.br
In response to Re: Parallel Query  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Parallel Query  (Jeff Janes <jeff.janes@gmail.com>)

On 13/11/2019 17:40, Jeff Janes wrote:
On Wed, Nov 13, 2019 at 3:11 PM Luís Roberto Weck <luisroberto@siscobra.com.br> wrote:
Hi!

Is there a reason query 3 can't use parallel workers? Queries q1 and q2
seem very similar, but they can use up to 4 workers to run faster:

q1: https://pastebin.com/ufkbSmfB
q2: https://pastebin.com/Yt32zRNX
q3: https://pastebin.com/dqh7yKPb

The sort node on q3 takes almost 12 seconds, making the query run for about
68 seconds, even if I had set enough work_mem to do the sort entirely in memory.

The third one thinks it is going to find 3454539 output rows. If it ran in parallel, it thinks it would be passing lots of rows up from the parallel workers, and it charges a high price (parallel_tuple_cost = 0.1) for doing so. So you can try lowering parallel_tuple_cost, or figuring out why the estimate is so bad.

Cheers,

Jeff
Hi Jeff,

I don't think the "HAVING" clause is having any effect on the estimates; compare the two plans below.
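
For reference, this is roughly the shape of the query, reconstructed from the plans below; the table, column, and filter names come from the EXPLAIN output, so treat it as a sketch rather than the exact statement:

SELECT c.concod, cp.conparnum, cp.conpardatven, count(*)
  FROM contrato c
  JOIN contrato_parcela cp ON cp.concod = c.concod
 WHERE c.carcod = 100                       -- matches "Index Cond: (carcod = 100)"
 GROUP BY c.concod, cp.conparnum, cp.conpardatven
HAVING count(*) > 1;                        -- matches "Filter: (count(*) > 1)"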

WITHOUT "HAVING":
Group  (cost=1245134.08..1279680.28 rows=3454620 width=14)
  Group Key: c.concod, cp.conparnum, cp.conpardatven
  ->  Sort  (cost=1245134.08..1253770.63 rows=3454620 width=14)
        Sort Key: c.concod, cp.conparnum, cp.conpardatven
        ->  Hash Join  (cost=34366.64..869960.70 rows=3454620 width=14)
              Hash Cond: (cp.concod = c.concod)
              ->  Seq Scan on contrato_parcela cp  (cost=0.00..804248.08 rows=11941308 width=14)
              ->  Hash  (cost=23436.55..23436.55 rows=874407 width=8)
                    ->  Index Only Scan using contrato_iu0004 on contrato c  (cost=0.43..23436.55 rows=874407 width=8)
                          Index Cond: (carcod = 100)

WITH "HAVING":
GroupAggregate  (cost=1245144.88..1322874.51 rows=3454650 width=14)
  Group Key: c.concod, cp.conparnum, cp.conpardatven
  Filter: (count(*) > 1)
  ->  Sort  (cost=1245144.88..1253781.51 rows=3454650 width=14)
        Sort Key: c.concod, cp.conparnum, cp.conpardatven
        ->  Hash Join  (cost=34366.64..869968.02 rows=3454650 width=14)
              Hash Cond: (cp.concod = c.concod)
              ->  Seq Scan on contrato_parcela cp  (cost=0.00..804255.13 rows=11941413 width=14)
              ->  Hash  (cost=23436.55..23436.55 rows=874407 width=8)
                    ->  Index Only Scan using contrato_iu0004 on contrato c  (cost=0.43..23436.55 rows=874407 width=8)
                          Index Cond: (carcod = 100)
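
Both plans estimate roughly 3.45 million groups, before the HAVING filter is even considered. As a sanity check on the group estimate itself (which is what you suspect is bad), I can count the actual groups; this reuses the reconstructed query above, so the same caveats apply:

SELECT count(*)
  FROM (SELECT c.concod, cp.conparnum, cp.conpardatven
          FROM contrato c
          JOIN contrato_parcela cp ON cp.concod = c.concod
         WHERE c.carcod = 100
         GROUP BY 1, 2, 3) g;   -- one row per (concod, conparnum, conpardatven) group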

Maybe PostgreSQL has no way to estimate the selectivity of a HAVING clause?
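
In the meantime I'll try your parallel_tuple_cost suggestion. This is a sketch of the session-level experiment I have in mind (0.01 is just a guess at a value low enough to change the plan):

SET parallel_tuple_cost = 0.01;  -- default is 0.1; makes worker-to-leader tuple transfer look cheaper
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.concod, cp.conparnum, cp.conpardatven, count(*)
  FROM contrato c
  JOIN contrato_parcela cp ON cp.concod = c.concod
 WHERE c.carcod = 100
 GROUP BY c.concod, cp.conparnum, cp.conpardatven
HAVING count(*) > 1;
RESET parallel_tuple_cost;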
