Thread: enable parallel query by default?

enable parallel query by default?

From: Robert Haas

Hi,

One of the questions I have about parallel query is whether it should
be enabled by default.  That is, should we make the default value of
max_parallel_degree to a value higher than 0?  Perhaps 1, say?

There are some good reasons why this might be a bad idea, such as:

- As discussed on a nearby thread, there is every possibility of nasty bugs.
- Parallel query uses substantially more resources than a regular
query, which might overload your system.

On the other hand:

- Features that are on by default get more testing and thus might get
less buggy more quickly.
- A lot of people don't change the default configuration and thus
wouldn't get any benefit from the feature if it's not on by default.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: enable parallel query by default?

From: "Joshua D. Drake"

On 02/08/2016 01:07 PM, Robert Haas wrote:
> Hi,
>
> One of the questions I have about parallel query is whether it should
> be enabled by default.  That is, should we make the default value of
> max_parallel_degree to a value higher than 0?  Perhaps 1, say?

OK, after some googling, where I found your fantastic blog post on the
subject, max_parallel_degree looks like it should be named
max_parallel_workers. Am I correct in assuming that, given the
opportunity to use parallel workers, postgres will launch up to
max_parallel_degree workers?

If so, then I think 1 or 2 would be reasonable. By far the majority of 
servers are going to have at least two cores.

>
> There are some good reasons why this might be a bad idea, such as:
>
> - As discussed on a nearby thread, there is every possibility of nasty bugs.

Which we won't find if it doesn't get turned on.


> - Parallel query uses substantially more resources than a regular
> query, which might overload your system.

How much of a reality is that? Isn't this something we could just cover 
in the release notes?

>
> On the other hand:
>
> - Features that are on by default get more testing and thus might get
> less buggy more quickly.

Correct.

> - A lot of people don't change the default configuration and thus
> wouldn't get any benefit from the feature if it's not on by default.
>

Correct.

> Thoughts?
>

+1 on enabling by default.

Sincerely,

JD


-- 
Command Prompt, Inc.  http://the.postgres.company/  +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: enable parallel query by default?

From: Andres Freund

Hi,

On 2016-02-08 16:07:05 -0500, Robert Haas wrote:
> One of the questions I have about parallel query is whether it should
> be enabled by default.  That is, should we make the default value of
> max_parallel_degree to a value higher than 0?  Perhaps 1, say?
> 
> There are some good reasons why this might be a bad idea, such as:
> 
> - As discussed on a nearby thread, there is every possibility of nasty
> bugs.

I think that's an argument to enable it till at least beta1. Let's
change the default, and add an item to the open items list to reconsider
then.

Andres



Re: enable parallel query by default?

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> One of the questions I have about parallel query is whether it should
> be enabled by default.  That is, should we make the default value of
> max_parallel_degree to a value higher than 0?  Perhaps 1, say?

I'm not sure I'm on board with that as a releasable default, but there
certainly would be an argument for turning it on from now to, say, mid
beta, so as to improve test coverage.  I think we've done similar things
in the past.

I don't understand, however, how that doesn't break the regression tests.
Surely we've got lots of EXPLAIN queries that would change.

        regards, tom lane



Re: enable parallel query by default?

From: Peter Geoghegan

On Mon, Feb 8, 2016 at 1:24 PM, Andres Freund <andres@anarazel.de> wrote:
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.

+1.

Reminds me of what happened with the num_xloginsert_locks GUC (it was
eventually replaced with a #define before release, though).



-- 
Peter Geoghegan



Re: enable parallel query by default?

From: "David G. Johnston"

On Monday, February 8, 2016, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-02-08 16:07:05 -0500, Robert Haas wrote:
> > One of the questions I have about parallel query is whether it should
> > be enabled by default.  That is, should we make the default value of
> > max_parallel_degree to a value higher than 0?  Perhaps 1, say?
> >
> > There are some good reasons why this might be a bad idea, such as:
> >
> > - As discussed on a nearby thread, there is every possibility of nasty
> > bugs.
>
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.


I'd rather phrase that as: I cannot think of any reason not to make it on by default, so let's do so until experience tells us differently.  If experience says it is too buggy, then I'd be concerned about allowing it to be enabled at all, let alone by default.  If there are usage concerns, then ideally the postmaster could detect them and configure itself rather than provide yet another knob for people to research.  I don't know enough about the specifics on that end myself.  IOW, I could not convince myself that the end of beta was somehow important to this decision, but maybe the hash join bug has me soured...

As a user, I'd like it to just work within the confines of the system resources I've told PostgreSQL as a whole it can use, and those it detects as available by asking the O/S.

So, for me, the quality of the feature is an on/off decision and not a "defaults" one.  The argument that our default configuration is geared toward low-resource setups, and that this feature should therefore be disabled because it is resource intensive, resonates, though I'd rather it work reasonably (or self-toggle off) in both low- and large-resource environments.  Is that possible?

David J.

Re: enable parallel query by default?

From: Corey Huinker

> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.



+1 during the beta, +0.95 for default thereafter.

I think that most databases in the past have defaulted to single-core execution unless told otherwise because machines with multiple cores were uncommon, and queries that could intelligently use parallelism were even more uncommon. So for them, the name of the game was "plan stability".

Machines are architected to be multicore now, and that will be the norm going forward, as will larger workloads that can easily overwhelm a single CPU. So I think Postgres should just enable parallel out of the box.

Having said that, it seems like the sort of thing I'd want to be settable on a per-user basis.

ALTER ROLE overly_needy_web_client SET max_parallel_degree = 1;
ALTER ROLE moar_powarrr SET max_parallel_degree = 32;

And of course this is my chance to re-ask that we not block the possibility of one day being able to set this value relative to the number of cores available on the machine, e.g. one user could have a parallel degree of 2x the number of CPUs, while another could have only 0.25x. It would be nice to have our configurations adapt to the hardware.
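
To make the "relative to the number of cores" idea concrete, here is a rough sketch of how a per-role multiplier could be turned into an absolute max_parallel_degree today, from outside the server. Nothing like this exists in PostgreSQL: the helper name relative_parallel_degree, the multiplier table, and the choice to round down are all assumptions made purely for illustration.

```python
import math
import os


def relative_parallel_degree(multiplier, cores=None):
    """Hypothetical helper: scale a parallel degree by the machine's core count.

    multiplier: e.g. 2.0 for "2x the number of CPUs", 0.25 for a quarter of them.
    cores: override for the detected core count (handy for testing).
    """
    if multiplier < 0:
        raise ValueError("multiplier must be non-negative")
    if cores is None:
        # os.cpu_count() can return None when the count is undetermined.
        cores = os.cpu_count() or 1
    # Rounding down is an assumption; a small multiplier on a small
    # machine yields 0, i.e. no parallel workers for that role.
    return math.floor(cores * multiplier)


# Assumed per-role multipliers, reusing the role names from the example above.
role_multipliers = {
    "overly_needy_web_client": 0.25,
    "moar_powarrr": 2.0,
}

if __name__ == "__main__":
    for role, multiplier in role_multipliers.items():
        degree = relative_parallel_degree(multiplier)
        print(f"ALTER ROLE {role} SET max_parallel_degree = {degree};")
```

Run on the target machine, this only prints ALTER ROLE statements with the computed values; they would still need to be regenerated whenever the hardware changes, which is exactly the manual step a built-in relative setting would avoid.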