Thread: enable parallel query by default?
Hi,

One of the questions I have about parallel query is whether it should be enabled by default. That is, should we make the default value of max_parallel_degree to a value higher than 0? Perhaps 1, say?

There are some good reasons why this might be a bad idea, such as:

- As discussed on a nearby thread, there is every possibility of nasty bugs.
- Parallel query uses substantially more resources than a regular query, which might overload your system.

On the other hand:

- Features that are on by default get more testing and thus might get less buggy more quickly.
- A lot of people don't change the default configuration and thus wouldn't get any benefit from the feature if it's not on by default.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 02/08/2016 01:07 PM, Robert Haas wrote:
> Hi,
>
> One of the questions I have about parallel query is whether it should
> be enabled by default. That is, should we make the default value of
> max_parallel_degree to a value higher than 0? Perhaps 1, say?

O.k. after some googling where I found your fantastic blog on the subject, max_parallel_degree looks like it should be max_parallel_workers. Am I correct in assuming that given the opportunity to use parallel workers, postgres will launch N number of workers up to the value of max_parallel_degree?

If so, then I think 1 or 2 would be reasonable. By far the majority of servers are going to have at least two cores.

> There are some good reasons why this might be a bad idea, such as:
>
> - As discussed on a nearby thread, there is every possibility of nasty bugs.

Which we won't find if it doesn't get turned on.

> - Parallel query uses substantially more resources than a regular
> query, which might overload your system.

How much of a reality is that? Isn't this something we could just cover in the release notes?

> On the other hand:
>
> - Features that are on by default get more testing and thus might get
> less buggy more quickly.

Correct.

> - A lot of people don't change the default configuration and thus
> wouldn't get any benefit from the feature if it's not on by default.

Correct.

> Thoughts?

+1 on enabling by default.

Sincerely,

JD

--
Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Hi,

On 2016-02-08 16:07:05 -0500, Robert Haas wrote:
> One of the questions I have about parallel query is whether it should
> be enabled by default. That is, should we make the default value of
> max_parallel_degree to a value higher than 0? Perhaps 1, say?
>
> There are some good reasons why this might be a bad idea, such as:
>
> - As discussed on a nearby thread, there is every possibility of nasty
> bugs.

I think that's an argument to enable it till at least beta1. Let's change the default, and add an item to the open items list to reconsider then.

Andres
Robert Haas <robertmhaas@gmail.com> writes:
> One of the questions I have about parallel query is whether it should
> be enabled by default. That is, should we make the default value of
> max_parallel_degree to a value higher than 0? Perhaps 1, say?

I'm not sure I'm on board with that as a releasable default, but there certainly would be an argument for turning it on from now to, say, mid beta, so as to improve test coverage. I think we've done similar things in the past.

I don't understand, however, how that doesn't break the regression tests? Surely we've got lots of EXPLAIN queries that would change.

regards, tom lane
On Mon, Feb 8, 2016 at 1:24 PM, Andres Freund <andres@anarazel.de> wrote:
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.

+1. Reminds me of what happened with the num_xloginsert_locks GUC (it was eventually replaced with a #define before release, though).

--
Peter Geoghegan
On Monday, February 8, 2016, Andres Freund <andres@anarazel.de> wrote:
So, for me the quality of the feature is an on/off decision and not a "defaults" one. The argument that our default configuration is geared toward low-resource setups, and that since this is resource intensive it should be disabled, resonates, though I'd rather it work reasonably (or self-toggle off) in both low- and large-resource environments. Is that possible?
> Hi,
>
> On 2016-02-08 16:07:05 -0500, Robert Haas wrote:
> > One of the questions I have about parallel query is whether it should
> > be enabled by default. That is, should we make the default value of
> > max_parallel_degree to a value higher than 0? Perhaps 1, say?
> >
> > There are some good reasons why this might be a bad idea, such as:
> >
> > - As discussed on a nearby thread, there is every possibility of nasty
> > bugs.
>
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.
I'd rather phrase that as: I cannot think of any reason not to make it on by default, so let's do so until experience tells us differently. If experience says it is too buggy, then I'd be concerned about allowing it to be enabled at all, let alone by default. If there are usage concerns, then ideally the postmaster could detect them and configure itself rather than provide yet another knob for people to research. I don't know enough about the specifics on that end myself. IOW I could not convince myself that the end of beta was somehow important to this decision - but maybe the hash join bug has me soured...
As a user I'd like it to just work within the confines of what system resources I've told PostgreSQL as a whole it can use and that it detects it has available to it by asking the O/S.
David J.
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.
+1 during the beta, +0.95 for default thereafter.
I think that most databases in the past have defaulted to single-core unless otherwise stated because machines that had multiple cores were uncommon, and the query that could intelligently use parallel was even more uncommon. So for them, the name of the game was "plan stability".
Machines are architected to be multicore now, and that will be the norm going forward, as will larger workloads that can easily overwhelm a single CPU. So I think Postgres should just enable parallel out of the box.
Having said that, it seems like the sort of thing I'd want set-able on a per-user basis.
ALTER ROLE overly_needy_web_client SET max_parallel_degree = 1;
ALTER ROLE moar_powarrr SET max_parallel_degree = 32;
And of course this is my chance to re-ask that we not block the possibility of one day being able to set this value relative to the number of cores available on the machine, i.e. this user can have a parallel degree of 2x the number of CPUs, while that one can have only 0.25x. It would be nice to have our configurations adapt to the hardware.
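
To make the "relative to the number of cores" idea concrete, here is a minimal sketch in Python of how such a value might be derived outside the server. Nothing here is an existing PostgreSQL feature: the helper name relative_parallel_degree, the multiplier argument, and the optional hard_cap are all assumptions made for illustration, and the result would still have to be applied through an ordinary setting such as the ALTER ROLE examples above.

```python
import math
import os


def relative_parallel_degree(multiplier, cpu_count=None, hard_cap=None):
    """Scale a parallel degree to the CPU count (hypothetical helper).

    multiplier: fraction or multiple of the CPU count, e.g. 0.25 or 2.0.
    cpu_count:  override for testing; defaults to what the O/S reports.
    hard_cap:   optional upper bound, an assumption added for safety.
    """
    if multiplier < 0:
        raise ValueError("multiplier must be non-negative")
    if cpu_count is None:
        # os.cpu_count() can return None if the count is undetermined;
        # fall back to a single CPU in that case.
        cpu_count = os.cpu_count() or 1
    # Round down so a small machine with a small multiplier gets 0,
    # which in the thread's terms means parallel query stays off.
    degree = math.floor(cpu_count * multiplier)
    if hard_cap is not None:
        degree = min(degree, hard_cap)
    return degree


if __name__ == "__main__":
    # Role names are taken from the examples above; the multipliers
    # are the 0.25x and 2x figures mentioned in the message.
    for role, factor in [("overly_needy_web_client", 0.25),
                         ("moar_powarrr", 2.0)]:
        degree = relative_parallel_degree(factor)
        print(f"ALTER ROLE {role} SET max_parallel_degree = {degree};")
```

Rounding down is one design choice among several; it means a two-core machine with a 0.25x multiplier ends up at 0, which matches the wish expressed earlier in the thread that the feature toggle itself off on low-resource systems.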