Re: Parallel tuplesort, partitioning, merging, and the future - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel tuplesort, partitioning, merging, and the future
Msg-id CA+Tgmob=Pas24FiJ9M24+=3e_8Dtz8i8i3aHjZJ83P+HJianyw@mail.gmail.com
In response to Re: Parallel tuplesort, partitioning, merging, and the future  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Wed, Aug 10, 2016 at 4:54 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> My view on this - currently anyway - is that we shouldn't conflate the
>> tuplesort with the subsequent index generation, but that we should try
>> to use parallelism within the tuplesort itself to the greatest extent
>> possible.  If there is a single output stream that the leader uses to
>> generate the final index, then none of the above problems arise.  They
>> only arise if you've got multiple processes actually writing to the
>> index.
>
> I'm not sure if you're agreeing with my contention about parallel
> CREATE INDEX not being a good target for partitioning here. Are you?

No.  I agree that writing to the index in parallel is bad, but I think
it's entirely reasonable to try to set things up so that the leader
does as little of the final merge work itself as possible, instead
offloading that to workers.  Unless, of course, we can prove that the
overhead of the final merge pass is so low that it doesn't matter
whether we offload it.
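To make "the final merge work" concrete: if each worker only sorts its share of the input into a run, someone still has to merge k sorted runs into the single output stream. That costs roughly log2(k) comparisons per tuple. The sketch below is a generic k-way heap merge in C over plain int arrays. `Run`, `merge_runs` and `MAX_RUNS` are hypothetical names for illustration only; this is not PostgreSQL's tuplesort.c code or API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RUNS 64 /* hypothetical cap on the number of worker runs */

/* One sorted run, standing in for the output of a single worker's sort. */
typedef struct {
    const int *vals;
    size_t len;
    size_t pos; /* next unread position in vals */
} Run;

/* Current head value of run idx (caller guarantees it is not exhausted). */
static int head(const Run *runs, size_t idx) {
    return runs[idx].vals[runs[idx].pos];
}

/* Restore the min-heap property (ordered by head value) below position i. */
static void sift_down(const Run *runs, size_t *heap, size_t n, size_t i) {
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && head(runs, heap[l]) < head(runs, heap[m])) m = l;
        if (r < n && head(runs, heap[r]) < head(runs, heap[m])) m = r;
        if (m == i) return;
        size_t tmp = heap[i]; heap[i] = heap[m]; heap[m] = tmp;
        i = m;
    }
}

/* Serial k-way merge: the work left to a leader if workers only sort runs.
 * Each value emitted costs O(log k) comparisons. Returns values written. */
static size_t merge_runs(Run *runs, size_t k, int *out) {
    size_t heap[MAX_RUNS];
    size_t n = 0, written = 0;
    assert(k <= MAX_RUNS);
    for (size_t i = 0; i < k; i++)
        if (runs[i].pos < runs[i].len) heap[n++] = i; /* skip empty runs */
    for (size_t i = n / 2; i-- > 0;)
        sift_down(runs, heap, n, i); /* heapify */
    while (n > 0) {
        Run *top = &runs[heap[0]];
        out[written++] = top->vals[top->pos++];
        if (top->pos == top->len)
            heap[0] = heap[--n]; /* run exhausted: drop it from the heap */
        sift_down(runs, heap, n, 0);
    }
    return written;
}
```

Offloading this step to workers could mean, for example, partitioning the key space so that each worker merges one key range and the leader merely concatenates the results. That is one possible design, not one settled in this thread.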

> While all this speculation about choice of algorithm is fun,
> realistically I'm not going to write the patch for a rainy day (nor for
> parallel CREATE INDEX, at least until we become very comfortable with
> all the issues I raise, which could never happen). I'd be happy to
> consider helping you improve parallel query by providing
> infrastructure like this, but I need someone else to write the client
> of the infrastructure (e.g. a parallel merge join patch), or to at
> least agree to meet me half way with an interdependent prototype of
> their own. It's going to be messy, and we'll have to do a bit of
> stumbling to get to a good place. I can sign up to that if I'm not the
> only one that has to stumble.

Fair enough.

> Serial merging still needs work, it seems.

At the risk of stating the obvious, improving serial execution
performance is always superior to comparable gains originating from
parallelism, so no complaints here about work in that area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


