Thread: How clustering for scale out works in PostgreSQL
I am *expecting 1000+ hits to my PostgreSQL DB* and I doubt my standalone DB will be able to handle it.

So I want to *scale out by adding more servers to share the load*. For this, I want to do clustering.

I am *curious to know how clustering works in PostgreSQL.* (I don't want to know how to set up a cluster - as of now. Just want to know how clustering works.)

When I look at some of the content available while googling, I am getting more and more confused, as I find that on most of the sites, clustering is used interchangeably with replication.

*My purpose is scale out to handle more load, not high availability.*

Can anyone please help me with the details or guide me to useful URLs.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/How-clustering-for-scale-out-works-in-PostgreSQL-tp5768917.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On 29/08/13 13:14, bsreejithin wrote:
> I am *expecting 1000+ hits to my PostgreSQL DB* and I doubt my standalone DB
> will be able to handle it.

OMG! 1000 hits every year! And "hits" too - not just any type of query!!!! :-)

Seriously, if you try describing your setup, what queries make up your "hits" and what you mean by 1000, then there are people on this list who can tell you what sort of setup you'll need.

While you're away googling though, "replication" is indeed the term you want. In particular "hot standby", which lets you run read-only queries on the replicas.

--
Richard Huxton
Archonet Ltd
On 08/29/2013 07:59 AM, Richard Huxton wrote:
> On 29/08/13 13:14, bsreejithin wrote:
>> I am *expecting 1000+ hits to my PostgreSQL DB* and I doubt my
>> standalone DB will be able to handle it.

We are going to need a little more detail here. In a normal environment 1000+ "hits" isn't that much, even if each hit generates a dozen queries per page. A more appropriate approach would be to consider the number of transactions per second and the type of queries the machine will be doing. You will want to look into replication and hot standby, as well as read-only scaling with pgpool-II.

> OMG! 1000 hits every year! And "hits" too - not just any type of
> query!!!! :-)
>
> Seriously, if you try describing your setup, what queries make up your
> "hits" and what you mean by 1000 then there are people on this list who
> can tell you what sort of setup you'll need.
>
> While you're away googling though, "replication" is indeed the term you
> want. In particular "hot standby" which lets you run read-only queries
> on the replicas.

Sarcasm with new recruits to the community is not the way to go.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc

For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats
Thanks a lot Joshua and others who have responded.

I am sorry about not putting more details in my initial post.

What I posted is about a new setup that's going to come up. Discussions are on whether to set up a DB cluster to handle 1000 concurrent users.

A DB cluster was thought of for the following reasons: a performance test of the product was done, and it was found that DB-side utilization was on the higher side with 125 concurrent users. The application server was 4 core / 12 GB RAM, and the DB server was 4 core / 12 GB RAM. The PostgreSQL version was *8.2*. Also, I noted that most of the column data types were declared as bigint, quite unnecessarily. (I am trying to put in all the details here.)

The product team is working on migrating to version 9.2 and on identifying areas where the bigint data type can be changed to smaller data types.

The product owner wants to scale up to 1000 concurrent users. So one of the discussions was to set up application-level clustering and probably DB-level clustering to share the load. (We don't need fail-over or HA here.) For application clustering, what we thought is to have EIGHT 4 core / 12 GB machines. Currently for the DB, the new server we have is 8 core / 32 GB RAM.

Many here have doubts about whether the DB server will be able to handle 1000 concurrent user connections coming in from the application cluster nodes.

I know this is still a generalised question - but this is the scenario we have here. I am putting the question in this forum so that I can get as many opinions as possible before we decide what to do next.

To those replying - thanks a lot - any help will be very useful. And the sarcasm is OK, as long as I get to learn :)
On Thu, Aug 29, 2013 at 11:13 AM, bsreejithin <bsreejithin@gmail.com> wrote:
> Thanks a lot Joshua and others who have responded..
> I am sorry about not putting in more details in my initial post.
> What I posted is about a new setup that's going to come up..Discussions are
> on whether to setup DB cluster to handle 1000 concurrent users.
Ok. That's a start. Can you tell us more about what these users are doing? What kind of queries are being issued to the database? How often (per user or total per time)?
__________________________________________________________________________________
Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
Mike.Blackwell@rrd.com
http://www.rrdonnelley.com
The performance test that was conducted was for 1 hour.

There are 6 transactions: 2 DB inserts and 4 SELECTs. Every 2 minutes there will be 4 SELECTs, and every 3 minutes there will be 2 DB inserts.
On 08/29/2013 09:42 AM, bsreejithin wrote:
> The performance test that was conducted was for 1 Hour.
>
> There are 6 transactions. 2 DB inserts and 4 SELECTs.
> Every 2 minutes there will be 4 SELECTs. And every 3 minutes there will be 2
> DB inserts.

This shouldn't be a problem with proper hardware and a connection pooler. The concern isn't the 1000 sessions; it is the creating and destroying of 1000 connections in rapid succession. A connection pooler will resolve that issue.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc

For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-owner@postgresql.org] On Behalf Of bsreejithin
> Sent: Thursday, August 29, 2013 12:42 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] How clustering for scale out works in PostgreSQL
>
> The performance test that was conducted was for 1 Hour.
>
> There are 6 transactions. 2 DB inserts and 4 SELECTs.
> Every 2 minutes there will be 4 SELECTs. And every 3 minutes there will be 2
> DB inserts.

With that kind of activity, you don't need clustering for your 1000 users. What you need is PgBouncer; it should solve your problem. Please read some docs on PgBouncer - it's light-weight and very easy to set up.

Regards,
Igor Neyman
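As a concrete illustration of the suggestion above, a minimal pgbouncer.ini sketch follows. The host, database name, auth file path, and pool sizes are illustrative placeholders, not values from the thread:

```ini
[databases]
; Route the application's database through the pooler.
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Transaction pooling: a server connection is held only for the duration
; of a transaction, so 1000 client sessions can share a few dozen real
; backend connections.
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 35
```

The application then connects to port 6432 instead of 5432; PostgreSQL itself only ever sees `default_pool_size` connections from each database/user pair.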
Ok Igor..Will check out PgBouncer..Thanks a lot.
Thanks Joshua..Will look to use connection pooler which Igor mentioned..
bsreejithin wrote on 29.08.2013 18:13:
> PostgreSQL version was* 8.2*.

8.2 has long been deprecated. For a new system you should use 9.2 (or at least 9.1).

Thomas
Ya..sure...Migration to 9.2 is one of the activities planned, and in fact it's already on track. Thanks Thomas
bsreejithin <bsreejithin@gmail.com> wrote:

> What I posted is about a new setup that's going to come
> up..Discussions are on whether to setup DB cluster to handle 1000
> concurrent users.

I previously worked for Wisconsin Courts, where we had a single server which handled about 3000 web users collectively generating hundreds of web hits per second and thousands of queries per second, while at the same time functioning as a replication target from 80 sources sending about 20 transactions per second which modified data (many having a large number of DML statements per transaction) against a 3 TB database. The same machine also hosted a transaction repository for all modifications to the database, indexed for audit reports and ad hoc queries; that was another 3 TB. Each of these was running on a 40-drive RAID.

Shortly before I left we upgraded from a machine with 16 cores and 256 GB RAM to one with 32 cores and 512 GB RAM, because there is constant growth in both database size and load. Performance was still good on the smaller machine, but monitoring showed we were approaching saturation. We had started to see some performance degradation on the old machine, but were able to buy time by reducing the size of the web connection pool (in the Java application code) from 65 to 35. Testing different connection pool sizes showed that pool size to be optimal for our workload on that machine; your ideal pool size can only be determined through testing.

You can poke around in this application here, if you like:
http://wcca.wicourts.gov/

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
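The point above stands: the right pool size can only be found by testing. Still, a commonly cited rule of thumb (from the PostgreSQL wiki's guidance on connection counts) gives a starting point for that testing. A small sketch; the core and spindle counts passed in are hypothetical examples, not measurements from the thread:

```python
def suggested_pool_size(core_count: int, effective_spindle_count: int) -> int:
    """Rule-of-thumb starting point for a connection pool size:
    (cores * 2) + effective spindle count. Only a first guess;
    load testing finds the real optimum for a given workload."""
    return core_count * 2 + effective_spindle_count

# e.g. the 8-core DB server discussed in the thread, assuming a 4-spindle array
print(suggested_pool_size(8, 4))   # 20

# e.g. a 16-core machine with a 40-drive RAID
print(suggested_pool_size(16, 40))  # 72
```

Note how far a formula can drift from reality: testing on the 16-core machine described above found 35 to be optimal, roughly half what the rule of thumb suggests.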
On Thu, Aug 29, 2013 at 6:14 AM, bsreejithin <bsreejithin@gmail.com> wrote:
> I am *expecting 1000+ hits to my PostgreSQL DB* and I doubt my standalone DB
> will be able to handle it.
>
> So I want to *scale out by adding more servers to share the load*. For this,
> I want to do clustering.
>
> I am *curious to know how clustering works in PostgreSQL.* (I don't want to
> know how to setup cluster - as of now. Just want to know how clustering
> works).
>
> When I look at some of the content available while googling, I am getting
> more and more confused, as I find that in most of the sites, clustering is
> used interchangeably with replication.
>
> *My purpose is scale out to handle more load, not high availability.*

What you are doing is called capacity planning, and it's a vital step before deploying an app and the servers to support it. Look at several things:

* How many WRITEs you need to make per second.
* How many READs you need to make per second.
* How big your data set will be.
* How many clients you'll have concurrently.

Your first post pretty much just has how many concurrent users. Later posts had reads and writes but didn't specify whether that's per user or in total. I'm guessing per user. Either way, the total load you listed was pretty small, so yeah, the pgbouncer pooling solution looks optimal.

But you might want to look at how big your data set is, how fast it will grow, and what kind of indexes it'll need for good performance as well. If your data set is likely to get REALLY big then that's another issue to tackle as well.

To understand recursion, one must first understand recursion.
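Applying that checklist to the workload described earlier in the thread (per user: 4 SELECTs every 2 minutes and 2 inserts every 3 minutes, with 1000 concurrent users, and assuming those figures are indeed per user), the back-of-envelope arithmetic looks like this:

```python
# Back-of-envelope load estimate for the workload described in the thread.
users = 1000
selects_per_user_per_sec = 4 / 120.0   # 4 SELECTs every 2 minutes
inserts_per_user_per_sec = 2 / 180.0   # 2 inserts every 3 minutes

reads_per_sec = users * selects_per_user_per_sec
writes_per_sec = users * inserts_per_user_per_sec
total_per_sec = reads_per_sec + writes_per_sec

print(f"~{reads_per_sec:.0f} reads/s, ~{writes_per_sec:.0f} writes/s, "
      f"~{total_per_sec:.0f} queries/s in total")
```

Roughly 44 simple queries per second is comfortably within reach of a single mid-range server, which is consistent with the "pretty small" assessment above.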
On 08/30/2013 01:48 AM, bsreejithin wrote:
> Ya..sure...Migration to 9.2 is one of the activities planned and in fact
> it's already on track.Thanks Thomas

You'll want to re-do your performance testing; a huge amount has changed since 8.2.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
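For re-doing that performance testing, pgbench (which ships with PostgreSQL) is a common starting point. A sketch; the database name, scale factor, and client count here are illustrative choices, not values from the thread:

```shell
# Initialize a pgbench test database (scale factor 100 = 10 million account rows).
pgbench -i -s 100 testdb

# Run 35 concurrent clients across 4 worker threads for 10 minutes,
# reporting transactions per second at the end.
pgbench -c 35 -j 4 -T 600 testdb
```

pgbench's built-in TPC-B-like workload is only a rough proxy; replaying a script of the application's actual SELECTs and inserts gives far more representative numbers.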
bsreejithin wrote:
> I am *expecting 1000+ hits to my PostgreSQL DB* and I doubt my standalone DB
> will be able to handle it.
>
> So I want to *scale out by adding more servers to share the load*. For this,
> I want to do clustering.
>
> DB server was 4 Core 12GB RAM.

You're jumping way ahead here. You have a medium-sized server that should effortlessly handle most loads if its I/O subsystem is up to it. It's a good idea to plan for what you'll do as load grows, but it's not necessary to jump straight to engineering some "web scale" monstrosity if you don't have to.

> The performance test that was conducted was for 1 Hour.
> There are 6 transactions. 2 DB inserts and 4 SELECTs.
> Every 2 minutes there will be 4 SELECTs. And every 3 minutes there will be 2
> DB inserts.

It's not possible to give useful specific advice without knowing what the "selects" and "updates" you're dealing with are. After all, a single-tuple update of a non-indexed field with no trigger side-effects will be way sub-millisecond. On the other hand, a big update over a multi-table join that causes updates on several multi-column indexes / GIN indexes / etc, a cascade update, etc, might take hours.

You need to work out what the actual load is. Determine whether you're bottlenecked on disk reads, disk writes, disk flushes (fsync), CPU, etc.

Ask some basic tuning questions. Does your DB fit in RAM? Do at least the major indexes and "hot" smaller tables fit in RAM? Is effective_cache_size set appropriately to tell the query planner that?

Look at the query plans. Is there anything grossly unreasonable? Do you need to adjust any tuning params (random_page_cost, etc)?

Figure out whether there are any indexes worth creating that won't make the write load too much worse.

Find the point where throughput stops scaling up with load on the server.
Put a connection pooler in place and limit concurrent working connections to PostgreSQL to about that level; overall performance will be greatly improved by not trying to do too much all at once.

> I am *curious to know how clustering works in PostgreSQL.* (I don't want to
> know how to setup cluster - as of now. Just want to know how clustering
> works).

The "clustering" available in PostgreSQL is a variety of forms of replication. It is important to understand that attempting to scale out to multi-server setups requires significant changes to many applications. There is no transparent multi-master clustering for PostgreSQL.

If you're on a single server, you can rely on the strict rules PostgreSQL follows for transaction isolation. It will ensure that two updates can't conflict, with row-level locking. In SERIALIZABLE isolation it'll protect against a variety of concurrency problems. Most of that goes away when you go multi-server.

If you're using a single master and multiple read-replicas, you have to deal with lag, where the replicas haven't yet seen / replayed transactions performed on the master. So you might UPDATE a row in one transaction, only to find that when you SELECT it the update isn't there... then it suddenly appears when you SELECT again. Additionally, long-running queries on the read-only replicas can be aborted to allow the replica to continue replaying changes from the master. You can work around that one with synchronous replication, but then you create another set of performance challenges on the master.

There are also a variety of logical / row-level replication options. They have their own trade-offs in terms of impact on master performance, transaction consistency, etc.

It only gets more "fun" when you want multiple masters, where you can write to more than one server. Don't go there unless you have to.
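The replica lag described above can be measured rather than guessed at. A sketch using functions available in the 9.x series; run the first query on a standby and the second on the master:

```sql
-- On the standby: how far behind the master is WAL replay?
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

-- On the master (9.1+): the state of each connected standby.
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;
```

An application routing reads to replicas can use checks like these to fall back to the master when lag exceeds what it can tolerate.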
> When I look at some of the content available while googling, I am getting
> more and more confused, as I find that in most of the sites, clustering is
> used interchangeably with replication.

Well, a cluster of replicas is still a cluster. If you mean "transparent multi-master clustering", well, that's another thing entirely.

I strongly recommend you go back to basics. Evaluate the capacity of the server you've got, update PostgreSQL, characterize the load, do some basic tuning, benchmark based on a simulation of your load, get a connection pooler in place, do some basic query pattern and plan analysis, etc.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
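To make that tuning checklist concrete, a postgresql.conf sketch for the 8-core / 32 GB server mentioned earlier in the thread. These values are illustrative starting points for a 9.2-era install, not recommendations; each should be validated against the actual workload:

```ini
shared_buffers = 8GB            # often ~25% of RAM as a starting point
effective_cache_size = 24GB     # tells the planner how much data the OS cache can hold
random_page_cost = 2.0          # lower than the 4.0 default if storage is fast
work_mem = 16MB                 # per sort/hash node, so multiplied under concurrency
checkpoint_segments = 32        # 9.2-era setting; spreads out checkpoint I/O
```

After changing any of these, re-run the load simulation and compare query plans before and after.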
On 8/31/13 9:44 AM, Kevin Grittner wrote:
> bsreejithin <bsreejithin@gmail.com> wrote:
>> What I posted is about a new setup that's going to come
>> up..Discussions are on whether to setup DB cluster to handle 1000
>> concurrent users.
>
> I previously worked for Wisconsin Courts, where we had a single
> server which handled about 3000 web users collectively generating
> hundreds of web hits per second generating thousands of queries per
> second, while at the same time functioning as a replication target
> from 80 sources sending about 20 transactions per second which
> modified data (many having a large number of DML statements per
> transaction) against a 3 TB database. The same machine also hosted
> a transaction repository for all modifications to the database,
> indexed for audit reports and ad hoc queries; that was another 3 TB.
> Each of these was running on a 40-drive RAID.
>
> Shortly before I left we upgraded from a machine with 16 cores and
> 256 GB RAM to one with 32 cores and 512 GB RAM, because there is
> constant growth in both database size and load. Performance was
> still good on the smaller machine, but monitoring showed we were
> approaching saturation. We had started to see some performance
> degradation on the old machine, but were able to buy time by
> reducing the size of the web connection pool (in the Java
> application code) from 65 to 35. Testing different connection pool
> sizes showed that pool size to be optimal for our workload on that
> machine; your ideal pool size can only be determined through
> testing.
>
> You can poke around in this application here, if you like:
> http://wcca.wicourts.gov/

Just to add another data point... We run multiple ~2TB databases that see an average workload of ~700 transactions per second with peaks well above 4000 TPS. This is on servers with 512G of memory and varying numbers of cores.
We probably wouldn't need such beefy hardware for this, except our IO performance (as seen by the server) is pretty pathetic, there's some flaws in the data model (that I inherited), and Rails likes to do some things that are patently stupid. Were it not for those issues we could probably get by with 256G or even less.

Granted, the servers we're running on cost around $30k a pop and there's a SAN behind them. But by the time you get to that kind of volume you should be able to afford good hardware... if not, you should be rethinking your business model! ;)

If you set up some form of replication it's very easy to move to larger servers as you grow. I'm sure that when Kevin moved their database it was a complete non-event.

--
Jim C. Nasby, Data Architect       jim@nasby.net
512.569.9461 (cell)                http://jim.nasby.net
Jim Nasby <jim@nasby.net> wrote:

> If you setup some form of replication it's very easy to move to
> larger servers as you grow. I'm sure that when Kevin moved their
> database it was a complete non-event.

Yeah, replication was turned on for the new server in addition to the old one. When everything was ready, the web application configuration was updated so that it started using the new server for new requests and disconnected from the old server as requests completed. Zero down time. No user-visible impact, other than things running a little faster because of the better hardware.

One generation of old hardware is kept in replication for running ad hoc queries and to provide availability in case the new one crashes.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company