Thread: Re: Data split -- Creating a copy of database without outage

Re: Data split -- Creating a copy of database without outage

From

"Kevin Grittner"

Date:

02 June 2012, 15:11:56

"Igor Shmain"  wrote:

> I need to design a solution for a database which will grow and will
> require horizontal split at some moment.

Just one more bit of "food for thought" -- we have a database with
3TB processing approximately 50 million database transactions per day
(some with a great many statements or affecting many rows) running
quite comfortably on a single machine (actually sharing that machine
with a 2TB database on a separate drive array), without partitioning.

We have done a lot of tuning.

I'm not sure what your basis is for the assumption that you will need
to split the database across machines; you might be right, but you
might be engaging in "premature optimization".

-Kevin

Re: Data split -- Creating a copy of database without outage

From

"Igor Shmain"

Date:

03 June 2012, 00:57:47

Thank you for the "food for thoughts", Kevin :-)  Would it be possible for
you to mention what hardware (cpu, ram, disks, etc.) and software your
system uses to support this db size and number of transactions?

Regarding the original question: It was not a question of potency of
postgres. The architecture I am working on is intended to be used by a web
startup. If the product is successfully, many users can start using the
service in a very short time. If that happens, there will be not time to
re-architect the database and the applications; the database will need to be
scaled almost overnight. Total number of requests per day is not a major
criterion for this system. The response time for multiple hits in a short
period of time is more important. The requirement is to serve thousands of
requests per second.

Buying a "super" computer, hoping that one day it will run at full throttle
is not for startups. Getting such a powerful computer quickly and moving the
database there is unrealistic. It makes more sense to design the system in a
way so it can be easily and quickly distributed across many relatively
inexpensive servers. That is why the sharding is needed.

If you say something like "this is just purely theoretical", "what are the
chances to get all those users", "things like that does not happen
overnight", I would totally agree. But look at it from another angle: If
only a few people use the application, the company will stay with a small
server and will not lose much. But if the service is successful, the company
will deploy a whole bunch of servers in a few hours and will be able to
serve all the users quickly :-)

It is a trade-off. More work now in exchange for having a scalable system
tomorrow (yes, yes, it is also called premature optimization :-)   And you
know what, it does not look like too much extra work now :-)

If you see real or potential problems in this logic, or heard about similar
implementations, please mention that. I would appreciate it very much.

Best wishes,
-igor

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: June-02-12 11:12 AM
To: igor.shmain@gmail.com; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Data split -- Creating a copy of database without
outage

"Igor Shmain"  wrote:

> I need to design a solution for a database which will grow and will
> require horizontal split at some moment.

Just one more bit of "food for thought" -- we have a database with 3TB
processing approximately 50 million database transactions per day (some with
a great many statements or affecting many rows) running quite comfortably on
a single machine (actually sharing that machine with a 2TB database on a
separate drive array), without partitioning.

We have done a lot of tuning.

I'm not sure what your basis is for the assumption that you will need to
split the database across machines; you might be right, but you might be
engaging in "premature optimization".

-Kevin

Re: Data split -- Creating a copy of database without outage

From

"Kevin Grittner"

Date:

06 June 2012, 16:38:53

"Igor Shmain" <igor.shmain@gmail.com> wrote:

> Would it be possible for you to mention what hardware (cpu, ram,
> disks, etc.) and software your system uses to support this db size
> and number of transactions?

We have 4 Intel Xeon  X7350 @ 2.93GHz for 16 cores with 128GB RAM.
We've got a pair of drives in RAID 1 for OS on its own controller,
four drives in RAID 10 for xlog directories on its own controller,
and a couple RAID 5 arrays, each about 40 drives, for our two
databases (3TB and 2TB).  I'm not exactly clear on the controller
configuration there except that I understand there are separate
paths from two controllers to each drive.  All controllers are using
battery-backed cache configured for write-back.

A machine like that is still capable of handling our current load;
but the load is always increasing so we step up the hardware each
time we replace a machine.  The new server (able to handle about
twice the load of the one I just described for our normal
transaction mix) has 4 Intel Xeon X7560 @ 2.27GHz for 32 cores with
256GB RAM.

We are replicating to each of the databases on these boxes using a
pool of 6 database connections to process data from 72 circuit court
databases and on the 2TB from other sources, like Supreme Court and
Court of Appeals, Board of Bar Examiners, etc.  For the read-only
web load we have a pool of 30 database connections.  Checking the
monitoring system for the read-only web application, at the moment
we are showing:

Active Requests: 3
Requests Per Second: 148.66
Active Sessions: 9081

This is running through a firewall to an apache web server in our
DMZ which just redirects through another firewall to a an apache web
server which just functions as a load balancer which sends the
requests to renderers (well, currently just one, since on the latest
hardware one renderer handles the load) which runs Tomcat connecting
to our custom Java middle tier on the database server machine which
provides the connection pooling and manages each database
transaction.  Requests for "boilerplate" content are served before
it gets to this point where it would show in this monitoring; this
is just requests which require database content.  One "request"
above may run up to about 15 queries, many of which contain a large
number of joins.

While the load I show above would amount to about 13 million web
requests if it went on 24 hours per day, load does drop at night.
Last I heard, we had about 5 million requests per day, but that was
a couple years ago and it seems to grow pretty steadily.

Last I checked, the replication consisted of about two million
database transactions per day, many of which have dozens (or
hundreds) of statements modifying data.  When idle time is detected
on a replication source, it is used to compare source data to
target, apply fixes to the target, and log the fixes for review.
(These are infrequent, but I'm not comfortable running multi-master
replication without such automated review.)

> Buying a "super" computer, hoping that one day it will run at full
> throttle is not for startups. Getting such a powerful computer
> quickly and moving the database there is unrealistic. It makes
> more sense to design the system in a way so it can be easily and
> quickly distributed across many relatively inexpensive servers.
> That is why the sharding is needed.

I understand the scaling need, and certainly don't want to discount
that.  Cloud resources might be an alternative to sharding in that
fashion, at least to a point.

Before we moved to PostgreSQL we were using a commercial database
which could not keep up with demand using just one box, so we load
balanced between identical servers.  Since the replication is
asynchronous and we didn't want people potentially jumping around in
time, we used session affinity from the renderers to particular
database servers to keep a consistent timeline for each user
session.  This sort of approach is a viable alternative to sharding
in some cases.

I hope that helps.

-Kevin

Re: Data split -- Creating a copy of database without outage

From

"Igor Shmain"

Date:

09 June 2012, 02:21:55

Thank you, Kevin, for this detailed info. It was very helpful.

Best wishes,
-igor

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: June-06-12 12:39 PM
To: Igor Shmain; pgsql-admin@postgresql.org
Subject: RE: [ADMIN] Data split -- Creating a copy of database without
outage

"Igor Shmain" <igor.shmain@gmail.com> wrote:

> Would it be possible for you to mention what hardware (cpu, ram,
> disks, etc.) and software your system uses to support this db size and
> number of transactions?

We have 4 Intel Xeon  X7350 @ 2.93GHz for 16 cores with 128GB RAM.
We've got a pair of drives in RAID 1 for OS on its own controller, four
drives in RAID 10 for xlog directories on its own controller, and a couple
RAID 5 arrays, each about 40 drives, for our two databases (3TB and 2TB).
I'm not exactly clear on the controller configuration there except that I
understand there are separate paths from two controllers to each drive.  All
controllers are using battery-backed cache configured for write-back.

A machine like that is still capable of handling our current load; but the
load is always increasing so we step up the hardware each time we replace a
machine.  The new server (able to handle about twice the load of the one I
just described for our normal transaction mix) has 4 Intel Xeon X7560 @
2.27GHz for 32 cores with 256GB RAM.

We are replicating to each of the databases on these boxes using a pool of 6
database connections to process data from 72 circuit court databases and on
the 2TB from other sources, like Supreme Court and Court of Appeals, Board
of Bar Examiners, etc.  For the read-only web load we have a pool of 30
database connections.  Checking the monitoring system for the read-only web
application, at the moment we are showing:

Active Requests: 3
Requests Per Second: 148.66
Active Sessions: 9081

This is running through a firewall to an apache web server in our DMZ which
just redirects through another firewall to a an apache web server which just
functions as a load balancer which sends the requests to renderers (well,
currently just one, since on the latest hardware one renderer handles the
load) which runs Tomcat connecting to our custom Java middle tier on the
database server machine which provides the connection pooling and manages
each database transaction.  Requests for "boilerplate" content are served
before it gets to this point where it would show in this monitoring; this is
just requests which require database content.  One "request"
above may run up to about 15 queries, many of which contain a large number
of joins.

While the load I show above would amount to about 13 million web requests if
it went on 24 hours per day, load does drop at night.
Last I heard, we had about 5 million requests per day, but that was a couple
years ago and it seems to grow pretty steadily.

Last I checked, the replication consisted of about two million database
transactions per day, many of which have dozens (or
hundreds) of statements modifying data.  When idle time is detected on a
replication source, it is used to compare source data to target, apply fixes
to the target, and log the fixes for review.
(These are infrequent, but I'm not comfortable running multi-master
replication without such automated review.)

> Buying a "super" computer, hoping that one day it will run at full
> throttle is not for startups. Getting such a powerful computer quickly
> and moving the database there is unrealistic. It makes more sense to
> design the system in a way so it can be easily and quickly distributed
> across many relatively inexpensive servers.
> That is why the sharding is needed.

I understand the scaling need, and certainly don't want to discount that.
Cloud resources might be an alternative to sharding in that fashion, at
least to a point.

Before we moved to PostgreSQL we were using a commercial database which
could not keep up with demand using just one box, so we load balanced
between identical servers.  Since the replication is asynchronous and we
didn't want people potentially jumping around in time, we used session
affinity from the renderers to particular database servers to keep a
consistent timeline for each user session.  This sort of approach is a
viable alternative to sharding in some cases.

I hope that helps.

-Kevin