Thread: scaling postgres

scaling postgres

From
Chris Withers
Date:
Hi All,

I'm curious if there are recommendations for scaling postgres to what,
for me, seems like "a lot" of data...

The app in question currently writes around 1.5 billion rows into a
table before rolling them up into tables that have a few million roll up
rows each. That 1.5 billion row table is emptied and refilled each day,
so we're talking about quite high write as well as quite high read.
Where can I find could examples/docs of how to scale postgres for this
kind of data load? What sort of hardware would I be looking to spec?

Okay, now this app may well eventually want to progress to storing those
1.5 billion rows per day. Is that feasible with postgres? If not, what
storage and processing solutions would people recommend for that kind of
data load?

cheers,

Chris


Re: scaling postgres

From
Jony Cohen
Date:
Hi,
I'd recommend looking into 2 solutions here, depending if you want to keep the data or not and how "fault tolerant" is the app (could you loose data from the last hour?)
If you could loose some data and you don't intend keeping the whole data set - I'd look at using RAM as your storage - it's the fastest place to store data and you can easily get servers with lot's of RAM these days.
If you can't loose data then you'll need to use disks - depending on how big each row is, compute your expected read/write throughput and go shopping :) 
for 1kb rows you get 60GB per hour = 16MB per sec - simple disks can handle this.
for 10kb rows you get 160MB/s - now you need better disks :)

SSD disks are cheep these days but they don't like repeated writes/deletes so it might cause problems down the line (hence my first RAM recommendation)

as for keeping the raw data - you could easily do it if you use partitions, if you have daily partitions inheriting from a master table you can quickly access the last day (or even several days)
but do take note that a full table scan takes time and you'll need to manage writing to the right partition yourself (not that hard - simply insert to <tablename>_<date>)

if you can write the data daily, keeping it will not add any real load (a little on parse times for queries that access the master table)

Just my 2 cents,
Regards,
 - Jony


On Mon, Aug 3, 2015 at 9:53 AM, Chris Withers <chris@simplistix.co.uk> wrote:
Hi All,

I'm curious if there are recommendations for scaling postgres to what, for me, seems like "a lot" of data...

The app in question currently writes around 1.5 billion rows into a table before rolling them up into tables that have a few million roll up rows each. That 1.5 billion row table is emptied and refilled each day, so we're talking about quite high write as well as quite high read. Where can I find could examples/docs of how to scale postgres for this kind of data load? What sort of hardware would I be looking to spec?

Okay, now this app may well eventually want to progress to storing those 1.5 billion rows per day. Is that feasible with postgres? If not, what storage and processing solutions would people recommend for that kind of data load?

cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: scaling postgres

From
Chris Withers
Date:
On 03/08/2015 08:15, Jony Cohen wrote:
> SSD disks are cheep these days but they don't like repeated
> writes/deletes so it might cause problems down the line (hence my
> first RAM recommendation)
>
> as for keeping the raw data - you could easily do it if you use
> partitions, if you have daily partitions inheriting from a master
> table you can quickly access the last day (or even several days)
> but do take note that a full table scan takes time and you'll need to
> manage writing to the right partition yourself (not that hard - simply
> insert to <tablename>_<date>)
>
> if you can write the data daily, keeping it will not add any real load
> (a little on parse times for queries that access the master table)

Interesting, you seem a lot less fussed by these numbers than I am,
which is good to hear!

At what point does postgres stop scaling?
What happens when the computational load no longer fits on one machine?
What are the options then?

cheers,

Chris



Re: scaling postgres

From
Seref Arikan
Date:


On Mon, Aug 3, 2015 at 8:22 AM, Chris Withers <chris@simplistix.co.uk> wrote:
On 03/08/2015 08:15, Jony Cohen wrote:
SSD disks are cheep these days but they don't like repeated writes/deletes so it might cause problems down the line (hence my first RAM recommendation)

as for keeping the raw data - you could easily do it if you use partitions, if you have daily partitions inheriting from a master table you can quickly access the last day (or even several days)
but do take note that a full table scan takes time and you'll need to manage writing to the right partition yourself (not that hard - simply insert to <tablename>_<date>)

if you can write the data daily, keeping it will not add any real load (a little on parse times for queries that access the master table)

Interesting, you seem a lot less fussed by these numbers than I am, which is good to hear!

At what point does postgres stop scaling?
What happens when the computational load no longer fits on one machine? What are the options then?

I think it is hard to come up with blanket responses to generic questions such as  "What happens when the computational load no longer fits on one machine?"
I'd say consider a scaling strategy that may be able to make use of your domain model first, if that is possible of course. 

I work in healthcare and patient centric records let me consider multiple servers for lots and lots of patients. The engineering team from instagram has been sharing their experience with postgres, which is possible due to their domain model. 

So my humble suggestion is: start from the simplest scenario, ask yourself if you can use multiple, independent postgres installations, if your domain model and its data model allow you to do this. If not, you may still have other options, but it all depends on your access patterns etc. 

All the best
Seref



cheers,

Chris



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: scaling postgres - can child tables be in a different tablespace?

From
Chris Withers
Date:
On 03/08/2015 08:40, Jony Cohen wrote:
> Servers now days reach very impressive write speeds and at rather low
> prices - it's simpler to split the write to 2 tables on different
> tablespaces/devices than 2 servers.
This raises an interesting question: can a child table be in a different
tablespace to its parent and other children of that parent?

cheers,

Chris



Re: scaling postgres

From
Chris Withers
Date:
On 03/08/2015 08:34, Seref Arikan wrote:
>
>
>     At what point does postgres stop scaling?
>     What happens when the computational load no longer fits on one
>     machine? What are the options then?
>
>
> I think it is hard to come up with blanket responses to generic
> questions such as  "What happens when the computational load no longer
> fits on one machine?"

Of course, I guess I'm looking out for the pain points that people
commonly hit with chunky postgres installs...

> I work in healthcare and patient centric records let me consider
> multiple servers for lots and lots of patients. The engineering team
> from instagram has been sharing their experience with postgres, which is
> possible due to their domain model.

I'll have a google, but do you have the Instagram links you're thinking of?

cheers,

Chris


Re: scaling postgres

From
Seref Arikan
Date:


On Tue, Aug 4, 2015 at 9:06 AM, Chris Withers <chris@simplistix.co.uk> wrote:
On 03/08/2015 08:34, Seref Arikan wrote:


    At what point does postgres stop scaling?
    What happens when the computational load no longer fits on one
    machine? What are the options then?


I think it is hard to come up with blanket responses to generic
questions such as  "What happens when the computational load no longer
fits on one machine?"

Of course, I guess I'm looking out for the pain points that people commonly hit with chunky postgres installs...

I work in healthcare and patient centric records let me consider
multiple servers for lots and lots of patients. The engineering team
from instagram has been sharing their experience with postgres, which is
possible due to their domain model.

I'll have a google, but do you have the Instagram links you're thinking of?


Nope, sorry, Google is your friend :)
 
cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: scaling postgres

From
Chris Withers
Date:
On 04/08/2015 09:11, Seref Arikan wrote:


I work in healthcare and patient centric records let me consider
multiple servers for lots and lots of patients. The engineering team
from instagram has been sharing their experience with postgres, which is
possible due to their domain model.

I'll have a google, but do you have the Instagram links you're thinking of?


Nope, sorry, Google is your friend :)

For the benefit of the archives:

http://instagram-engineering.tumblr.com/post/40781627982/handling-growth-with-postgres-5-tips-from

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

cheers,

Chris

Re: scaling postgres - can child tables be in a different tablespace?

From
Albe Laurenz
Date:
Chris Withers wrote:
> This raises an interesting question: can a child table be in a different
> tablespace to its parent and other children of that parent?

Yes.

Inheritance is a logical concept and is independent of physical placement.

Yours,
Laurenz Albe

Re: scaling postgres - can child tables be in a different tablespace?

From
Melvin Davidson
Date:
As additional advice, to get the best performance, you will want all your tablespaces to be on separate spindles/disks.
EG: disk1/tblspc1
        disk2/tblspc2
        disk3/tblspc3
        ...
        disk99/tblspc99

On Tue, Aug 4, 2015 at 5:31 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Chris Withers wrote:
> This raises an interesting question: can a child table be in a different
> tablespace to its parent and other children of that parent?

Yes.

Inheritance is a logical concept and is independent of physical placement.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: scaling postgres - can child tables be in a different tablespace?

From
John R Pierce
Date:
On 8/4/2015 6:14 AM, Melvin Davidson wrote:
> As additional advice, to get the best performance, you will want all
> your tablespaces to be on separate spindles/disks.
> EG: disk1/tblspc1
>         disk2/tblspc2
>         disk3/tblspc3
>         ...
>         disk99/tblspc99


actually, I find to get best performance most often, stripe all the
disks together and put everything on the same big volume, that way all
IO is evenly distributed.   otherwise you'll find too much IO on some
tables/partitions, and not enoguh on others, so most of the disks are
idle most of the time.


--
john r pierce, recycling bits in santa cruz