Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Horizontal scalability/sharding
Date
Msg-id CA+TgmoaH8hLOJB4tMjBmBst+tswVe3kUBmaJhiJB3cLQ1ksRpg@mail.gmail.com
Whole thread Raw
In response to Re: Horizontal scalability/sharding  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Horizontal scalability/sharding  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Horizontal scalability/sharding  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Tue, Sep 1, 2015 at 1:06 PM, Josh Berkus <josh@agliodbs.com> wrote:
> You're assuming that our primary bottleneck for writes is IO.  It's not
> at present for most users, and it certainly won't be in the future.  You
> need to move your thinking on systems resources into the 21st century,
> instead of solving the resource problems from 15 years ago.

Your experience doesn't match mine.  I find that it's frequently
impossible to get the system to use all of the available CPU capacity,
either because you're bottlenecked on locks or because you are
bottlenecked on the  I/O subsystem, and with the locking improvements
in newer versions, the former is becoming less and less common.
Amit's recent work on scalability demonstrates this trend: he goes
looking for lock bottlenecks, and finds problems that only occur at
128+ concurrent connections running full tilt.  The patches show
limited benefit - a few percentage points - at lesser concurrency
levels.  Either there are other locking bottlenecks that limit
performance at lower client counts but which mysteriously disappear as
concurrency increases, which I would find surprising, or the limit is
somewhere else.  I haven't seen any convincing evidence that the I/O
subsystem is the bottleneck, but I'm having a hard time figuring out
what else it could be.

> Our real future bottlenecks are:
>
> * ability to handle more than a few hundred connections
> * locking limits on the scalability of writes
> * ability to manage large RAM and data caches

I do agree that all of those things are problems, FWIW.

> Any sharding solution worth bothering with will solve some or all of the
> above by extending our ability to process requests across multiple
> nodes.  Any solution which does not is merely an academic curiosity.

I think the right solution to those problems is to attack them
head-on.  Sharding solutions should cater to use cases where using all
the resources of one machine isn't sufficient no matter how
efficiently we do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: perlcritic
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Horizontal scalability/sharding