Re: multi billion row tables: possible or insane? - Mailing list pgsql-performance

From Vig, Sandor (G/FI-2)
Subject Re: multi billion row tables: possible or insane?
Msg-id 977921B17B2F2048AA5AAE9B4CBB713E01423D75@huaudigs0035.audi.de
In response to multi billion row tables: possible or insane?  (Ramon Bastiaans <bastiaans@sara.nl>)
List pgsql-performance
385 transactions/sec?

fsync = false

risky but fast.
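
A minimal postgresql.conf sketch of that trade-off (the group-commit
settings below are illustrative, not something from this thread):

    # trade durability for commit throughput -- a crash can corrupt the cluster
    fsync = false

    # a safer middle ground: keep fsync on and group commits instead
    # fsync = true
    # commit_delay = 100      # microseconds to wait for concurrent commits
    # commit_siblings = 5     # only delay if this many transactions are active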

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org]On Behalf Of John Arbash
Meinel
Sent: Tuesday, March 01, 2005 4:19 PM
To: Ramon Bastiaans
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] multi billion row tables: possible or insane?


Ramon Bastiaans wrote:

> Hi all,
>
> I am doing research for a project of mine where I need to store
> several billion values for a monitoring and historical tracking system
> for a big computer system. My current estimate is that I have to store
> (somehow) around 1 billion values each month (possibly more).
>
If you have that 1 billion perfectly distributed over all hours of the
day, then you need 1e9/30/24/3600 = 385 transactions per second.

Which I'm pretty sure is possible with postgres; you just need pretty
beefy hardware. And like Jeff said, lots of disks for lots of IO.
Something like a quad Opteron with 16GB of RAM and around 14-20 very fast
disks, RAID 10 not RAID 5, etc. To improve query performance, you can do
some load balancing by adding replica machines with Slony.

Or if you can do batch processing, you could split up the work into a
few update machines, which then do bulk updates on the master database.
This lets you get more machines into the job, since you can't share a
database across multiple machines.
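
A rough sketch of what that bulk path could look like, loading batches
with COPY through a staging table (the measurements/measurements_stage
names and columns are invented for the example):

    -- load a whole batch file in one statement instead of row-by-row INSERTs
    COPY measurements_stage (host_id, metric, value, logged_at)
        FROM '/data/batch.csv' WITH CSV;

    -- fold the batch into the main table in one transaction
    BEGIN;
    INSERT INTO measurements (host_id, metric, value, logged_at)
        SELECT host_id, metric, value, logged_at FROM measurements_stage;
    COMMIT;
    TRUNCATE measurements_stage;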

> I was wondering if anyone has had any experience with these kind of
> big numbers of data in a postgres sql database and how this affects
> database design and optimization.
>
Well, one of the biggest things is whether you can get bulk updates, or if
clients can handle data being slightly out of date, so you can use
caching. Can you segregate your data into separate tables as much as
possible? Are your clients okay if aggregate information takes a little
while to update?

One trick is to use semi-lazy materialized views to get your updates to
be fast.
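
Since postgres has no built-in materialized views, the "semi-lazy"
version is usually just a summary table that a cron job rebuilds every
few minutes; a bare-bones sketch with made-up names:

    -- aggregate once, read many times
    CREATE TABLE measurements_hourly (
        host_id   integer,
        hour      timestamptz,
        avg_value numeric,
        samples   bigint
    );

    -- periodic refresh, e.g. driven by cron
    BEGIN;
    DELETE FROM measurements_hourly;
    INSERT INTO measurements_hourly
        SELECT host_id, date_trunc('hour', logged_at), avg(value), count(*)
        FROM measurements
        GROUP BY host_id, date_trunc('hour', logged_at);
    COMMIT;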

> What would be important issues when setting up a database this big,
> and is it at all doable? Or would it be a insane to think about
> storing up to 5-10 billion rows in a postgres database.

I think if you can design the db properly, it is doable. But if you
have clients saying "I need up-to-the-second information on 1 billion
rows", you're never going to get it.

>
> The database's performance is important. There would be no use in
> storing the data if a query will take ages. Queries should be quite
> fast if possible.
>
Again, it depends on the queries being done.
There are some nice tricks you can use, like doing month-by-month
partitioning (if you are getting 1G inserts a month, you might want
week-by-week partitioning). With a date column index on each partition
and a union all view over them, you should be able to get pretty good
insert speed and still keep fast *recent* queries. Going through 1
billion rows is always going to be expensive.
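
A skeletal version of that partition-plus-view layout (table and column
names are invented for the example):

    -- one table per month (or per week at this volume), each with its own
    -- index on the date column
    CREATE TABLE measurements_2005_03 (
        host_id   integer,
        value     numeric,
        logged_at timestamptz
    );
    CREATE INDEX measurements_2005_03_logged_at_idx
        ON measurements_2005_03 (logged_at);

    CREATE TABLE measurements_2005_02 (LIKE measurements_2005_03);
    CREATE INDEX measurements_2005_02_logged_at_idx
        ON measurements_2005_02 (logged_at);

    -- a union all view stitches the partitions back together for reads
    CREATE VIEW measurements_all AS
        SELECT * FROM measurements_2005_02
        UNION ALL
        SELECT * FROM measurements_2005_03;

    -- inserts go straight to the current month's table; a recent-range
    -- query probes each partition through its date index
    SELECT count(*) FROM measurements_all
    WHERE logged_at >= now() - interval '1 day';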

> I would really like to hear people's thoughts/suggestions or "go see a
> shrink, you must be mad" statements ;)
>
> Kind regards,
>
> Ramon Bastiaans

I think it would be possible, but there are a lot of design issues with
a system like this. You can't go into it thinking that you can design a
multi billion row database the same way you would design a million row db.

John
=:->


