Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Parallel Seq Scan
Date
Msg-id 20141219142710.GA29570@tamriel.snowman.net
Whole thread Raw
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan  (Marko Tiikkaja <marko@joh.to>)
Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Dec 19, 2014 at 7:51 AM, Stephen Frost <sfrost@snowman.net> wrote:
> >> 3. After certain point, increasing having more number of workers won't
> >> help and rather have negative impact, refer Test-4.
> >
> > Yes, I see that too and it's also interesting- have you been able to
> > identify why?  What is the overhead (specifically) which is causing
> > that?
>
> Let's rewind.  Amit's results show that, with a naive algorithm
> (pre-distributing equal-sized chunks of the relation to every worker)
> and a fairly-naive first cut at how to pass tuples around (I believe
> largely from what I did in pg_background) he can sequential-scan a
> table with 8 workers at 6.4 times the speed of a single process, and
> you're complaining because it's not efficient enough?  It's a first
> draft!  Be happy we got 6.4x, for crying out loud!

He also showed cases where parallelizing a query even with just two
workers caused a serious increase in the total runtime (Test 6).  Even
having four workers was slower in that case, but a modest performance
improvment was reached at eight but then no improvement from that was
seen when running with 16.

Being able to understand what's happening will inform how we cost this
to, hopefully, achieve the 6.4x gains where we can and avoid the
pitfalls of performing worse than a single thread in cases where
parallelism doesn't help.  What would likely be very helpful in the
analysis would be CPU time information- when running with eight workers,
were we using 800% CPU (8x 100%), or something less (perhaps due to
locking, i/o, or other processes).

Perhaps it's my fault for not being surprised that a naive first cut
gives us such gains as my experience with parallel operations and PG has
generally been very good (through the use of multiple connections to the
DB and therefore independent transactions, of course).  I'm very excited
that we're making such great progress towards having parallel execution
in the DB as I've often used PG in data warehouse use-cases.

> The barrier to getting parallel sequential scan (or any parallel
> feature at all) committed is not going to be whether an 8-way scan is
> 6.4 times faster or 7.1 times faster or 7.8 times faster.  It's going
> to be whether it's robust and won't break things.  We should be
> focusing most of our effort here on identifying and fixing robustness
> problems.  I'd vote to commit a feature like this with a 3x
> performance speedup if I thought it was robust enough.

I don't have any problem if an 8-way scan is 6.4x faster or if it's 7.1
times faster, but what if that 3x performance speedup is only achieved
when running with 8 CPUs at 100%?  We'd have to coach our users to
constantly be tweaking the enable_parallel_query (or whatever) option
for the queries where it helps and turning it off for others.  I'm not
so excited about that.

> I'm not saying we shouldn't try to improve the performance here - we
> definitely should.  But I don't think we should say, oh, an 8-way scan
> isn't good enough, we need a 16-way or 32-way scan in order for this
> to be efficient.  That is getting your priorities quite mixed up.

I don't think I said that.  What I was getting at is that we need a cost
system which accounts for the costs accurately enough that we don't end
up with worse performance than single-threaded operation.  In general, I
don't expect that to be very difficult and we can be conservative in the
initial releases to hopefully avoid regressions, but it absolutely needs
consideration.

> >> I think as discussed previously we need to introduce 2 additional cost
> >> variables (parallel_startup_cost, cpu_tuple_communication_cost) to
> >> estimate the parallel seq scan cost so that when the tables are small
> >> or selectivity is high, it should increase the cost of parallel plan.
> >
> > I agree that we need to figure out a way to cost out parallel plans, but
> > I have doubts about these being the right way to do that.  There has
> > been quite a bit of literature regarding parallel execution and
> > planning- have you had a chance to review anything along those lines?
> > We certainly like to draw on previous experiences and analysis rather
> > than trying to pave our own way.
>
> I agree that it would be good to review the literature, but am not
> aware of anything relevant.  Could you (or can anyone) provide some
> links?

There's certainly documentation available from the other RDBMS' which
already support parallel query, as one source.  Other academic papers
exist (and once you've linked into one, the references and prior work
helps bring in others).  Sadly, I don't currently have ACM access (might
have to change that..), but there are publicly available papers also,
such as:

http://i.stanford.edu/pub/cstr/reports/cs/tr/96/1570/CS-TR-96-1570.pdf
http://www.vldb.org/conf/1998/p251.pdf
http://www.cs.uiuc.edu/class/fa05/cs591han/sigmodpods04/sigmod/pdf/I-001c.pdf

> > With these additional costs comes the consideration that we're looking
> > for a wall-clock runtime proxy and therefore, while we need to add costs
> > for parallel startup and tuple communication, we have to reduce the
> > overall cost because of the parallelism or we'd never end up choosing a
> > parallel plan.  Is the thought to simply add up all the costs and then
> > divide?  Or perhaps to divide the cost of the actual plan but then add
> > in the parallel startup cost and the tuple communication cost?
>
> This has been discussed, on this thread.

Fantastic.  What I found in the patch was:

+   /*
+    * We simply assume that cost will be equally shared by parallel
+    * workers which might not be true especially for doing disk access.
+    * XXX - We would like to change these values based on some concrete
+    * tests.
+    */

What I asked for was:

----
I'm thinking we need a README or similar which discusses all of this and
includes any references out to academic papers or similar as appropriate.
----

Perhaps it doesn't deserve its own README, but we clearly need more.
Thanks!
    Stephen

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: parallel mode and parallel contexts
Next
From: Marko Tiikkaja
Date:
Subject: Re: Parallel Seq Scan