Re: parallel joins, and better parallel explain - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: parallel joins, and better parallel explain
Date
Msg-id CAFiTN-ti8PS7Ku8a63P=ePVWtjCAYvpidfD1+sEs+GAfjeJeKw@mail.gmail.com
In response to Re: parallel joins, and better parallel explain  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Jan 6, 2016 at 10:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 4, 2016 at 8:52 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > One strange behaviour, after increasing number of processor for VM,
> > max_parallel_degree=0; is also performing better.
>
> So, you went from 6 vCPUs to 8?  In general, adding more CPUs means
> that there is less contention for CPU time, but if you already had 6
> CPUs and nothing else running, I don't know why the backend running
> the query would have had a problem getting a whole CPU to itself.  If
> you previously only had 1 or 2 CPUs then there might have been some
> CPU competition with background processes, but if you had 6 then I
> don't know why the max_parallel_degree=0 case got faster with 8.

I am really not sure about this case. Maybe CPU allocation in the virtual machine had a problem, but I can't say anything for certain.
 
> Anyway, I humbly suggest that this query isn't the right place to put
> our attention.  There's no reason why we can't improve things further
> in the future, and if it turns out that lots of people have problems
> with the cost estimates on multi-batch parallel hash joins, then we
> can revise the cost model.  We wouldn't treat a single query where a
> non-parallel multi-batch hash join run slower than the costing would
> suggest as a reason to revise the cost model for that case, and I
> don't think this patch should be held to a higher standard.  In this
> particular case, you can easily make the problem go away by tuning
> configuration parameters, which seems like an acceptable answer for
> people who run into this,

Yes, I agree with this point; the cost model can always be improved. And anyway, in most of the queries, even in the TPC-H benchmark, we have seen a big improvement with parallel join.

I have done further testing to observe planning time, using TPC-H queries and some other many-table join queries (7-8 tables).

I did not find any visible regression in planning time.

*I have tested many combinations of queries. Because the queries and their results are large, I have not attached them to this mail. Let me know if anybody wants the details of the queries.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
