Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers

From Robert Haas
Subject Re: group locking: incomplete patch, just for discussion
Msg-id CA+TgmoZmhfqzeTJsPma9ZkkCFGK0s+i3cHcXLk8WUtcLSApdfA@mail.gmail.com
In response to Re: group locking: incomplete patch, just for discussion  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Wed, Oct 29, 2014 at 4:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> If you do wish to pursue || Seq Scan, then a working prototype would
> help. It allows us to see that there is an open source solution we are
> working to solve the problems for. People can benchmark it, understand
> the benefits and issues it raises and that would help focus attention
> on the problems you are trying to solve in infrastructure. People may
> have suggestions on how to solve or avoid those that you hadn't
> thought of.

I've mulled that over a bit and it might be worth pursuing further.
Of course there's always the trade-off: doing that means not doing
something else.

> As I mentioned previously when you started discussing shared memory
> segments, parallel sort does NOT require shared memory. The only thing
> you need to share are files. Split the problem into N pieces, sort
> them to produce N files and then merge the files using existing code.
> That only applies to large sorts, but then those are the ones you
> cared about doing in parallel anyway.

A simple implementation of this would work only for simple
pass-by-value types, like integers.  Pass-by-reference types require
the comparator to de-TOAST, and some other types require catalog
lookups.  I don't think that's very useful: Noah previously did some
analysis of this problem and concluded (with apologies if I'm
remembering the details incorrectly here) that the comparator for strings was
something like 1000x as expensive as the comparator for integers, and
that you basically couldn't get the latter to take enough time to be
worth parallelizing.
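
For readers who want to see the shape of the file-based approach quoted
above (split into N pieces, sort each to a file, merge the files), here
is a minimal sketch.  It is an illustration only, not PostgreSQL's sort
code: the function names are hypothetical, it handles only integers
(the simple pass-by-value case), and the pieces are sorted one after
another here, although each call is independent and could in principle
be given to a separate worker.

```python
import heapq
import os
import tempfile


def sort_chunk_to_file(values, directory):
    """Sort one piece in memory and write it as a run file, one integer per line."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for v in sorted(values):
            f.write(f"{v}\n")
    return path


def read_run(path):
    """Stream the integers of a sorted run file back in order."""
    with open(path) as f:
        for line in f:
            yield int(line)


def file_based_sort(values, n_pieces, directory):
    """Split the input into at most n_pieces, sort each to a file, then merge."""
    values = list(values)
    size = max(1, -(-len(values) // n_pieces))  # ceiling division
    pieces = [values[i:i + size] for i in range(0, len(values), size)]

    # Hypothetical parallel step: each piece needs nothing but its own
    # data and an output file, so no shared memory is involved.
    paths = [sort_chunk_to_file(piece, directory) for piece in pieces]

    try:
        # heapq.merge performs a k-way merge of already-sorted inputs.
        return list(heapq.merge(*(read_run(p) for p in paths)))
    finally:
        for p in paths:
            os.remove(p)
```

Note that the comparison in this sketch is a plain integer comparison;
nothing here addresses comparators that need to de-TOAST values or do
catalog lookups, which is the limitation discussed above.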

I care much more about getting the general infrastructure in place to
make parallel programming feasible in PostgreSQL than I do about
getting one particular case working.  And more than feasible: I want
it to be relatively straightforward.  That's not simple, but the
potential rewards are great.  Let's face it: there are people here who
are much better than I am at hacking on the planner and especially the
executor.  Why haven't any of those people implemented
parallel anything?  I think it's because, right now, it's just too
darn hard.  I'm trying to reduce that to something approaching the
difficulty of writing normal PostgreSQL backend code, and I think I'm
6-12 patches away from that.  This is one of them and, yeah, it's not
done, and, yeah, we might not get to parallel anything this release
and, yeah, things would be going faster if I could work on parallelism
full time.  But I think that the progress we are making is meaningful
and the goal is within sight.

I appreciate that you'd probably attack this problem from a different
direction than I'm attacking it from, but I still think that what I'm
trying to do is a legitimate direction of attack which, by the way,
does not preclude anybody else from attacking it from a different
direction and, indeed, such a development would be most welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


