Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Parallel Seq Scan
Msg-id 20141219194929.GC29570@tamriel.snowman.net
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Dec 19, 2014 at 9:39 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > Perhaps we should reconsider our general position on hints then and
> > add them so users can define the plan to be used.  For my part, I don't
> > see this as all that much different.
> >
> > Consider if we were just adding HashJoin support today as an example.
> > Would we be happy if we had to default to enable_hashjoin = off?  Or if
> > users had to do that regularly because our costing was horrid?  It's bad
> > enough that we have to resort to those tweaks today in rare cases.
>
> If you're proposing that it is not reasonable to have a GUC that
> limits the degree of parallelism, then I think that's outright crazy:

I'm pretty sure that I didn't say anything along those lines.  I'll try
to be clearer.

What I'd like is such a GUC that we can set at a reasonable default of,
say, 4, and trust that our planner will generally do the right thing.
Clearly, this may be something which admins have to tweak but what I
would really like to avoid is users having to set this GUC explicitly
for each of their queries.

> that is probably the very first GUC we need to add.  New query
> processing capabilities can entail new controlling GUCs, and
> parallelism, being as complex as it is, will probably add several of
> them.

That's fine if they're intended for debugging or for working around
unexpected bugs, but let's not go into this thinking we should
add GUCs with the expectation that users will be tweaking them
regularly.

> But the big picture here is that if you want to ever have parallelism
> in PostgreSQL at all, you're going to have to live with the first
> version being pretty crude.  I think it's quite likely that the first
> version of parallel sequential scan will be just as buggy as Hot
> Standby was when we first added it, or as buggy as the multi-xact code
> was when it went in, and probably subject to an even greater variety
> of taxing limitations than any feature we've committed in the 6 years
> I've been involved in the project.  We get to pick between that and
> not having it at all.

If it's disabled by default then I'm worried it won't really improve
until it is enabled.  Perhaps that's setting a higher bar than you feel is
necessary but, for my part at least, it doesn't feel like a very high
bar.

> I'll take a look at the papers you sent about parallel query
> optimization, but personally I think that's putting the cart not only
> before the horse but also before the road.  For V1, we need a query
> optimization model that does not completely suck - no more.  The key
> criterion here is that this has to WORK.  There will be time enough to
> improve everything else once we reach that goal.

I agree that it's got to work, but it also needs to be generally well
designed, and have the expectation of being on by default.

Thanks,
    Stephen
