Re: multi billion row tables: possible or insane? - Mailing list pgsql-performance

From Alan Stange
Subject Re: multi billion row tables: possible or insane?
Msg-id 42249102.2060700@rentec.com
In response to Re: multi billion row tables: possible or insane?  ("Vig, Sandor (G/FI-2)" <Sandor.Vig@audi.hu>)
List pgsql-performance
Isn't that 385 rows/second?  Presumably one can insert more than one
row in a transaction?
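
For example (a rough, untested sketch; the samples table and its
columns are invented, and COPY would be faster still):

    BEGIN;
    INSERT INTO samples (ts, host, value) VALUES (now(), 'web1', 0.93);
    INSERT INTO samples (ts, host, value) VALUES (now(), 'web2', 0.87);
    -- ... hundreds more rows in the same transaction ...
    COMMIT;

Batching like this pays the per-commit fsync cost once per batch
instead of once per row, so you don't need fsync = false to keep up.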

-- Alan

Vig, Sandor (G/FI-2) wrote:

>385 transactions/sec?
>
>fsync = false
>
>risky but fast.
>
>-----Original Message-----
>From: pgsql-performance-owner@postgresql.org
>[mailto:pgsql-performance-owner@postgresql.org]On Behalf Of John Arbash
>Meinel
>Sent: Tuesday, March 01, 2005 4:19 PM
>To: Ramon Bastiaans
>Cc: pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] multi billion row tables: possible or insane?
>
>
>Ramon Bastiaans wrote:
>
>
>
>>Hi all,
>>
>>I am doing research for a project of mine where I need to store
>>several billion values for a monitoring and historical tracking system
>>for a big computer system. My current estimate is that I have to store
>>(somehow) around 1 billion values each month (possibly more).
>>
>>
>>
>If you have that 1 billion spread evenly over the whole month, then
>you need 1e9/30/24/3600, or roughly 385 transactions per second.
>
>Which I'm pretty sure is possible with postgres; you just need pretty
>beefy hardware. And like Jeff said, lots of disks for lots of IO:
>something like a quad Opteron with 16GB of RAM and around 14-20 very
>fast disks, raid10 not raid5, etc. To improve query performance, you
>can do some load balancing across replica machines using Slony.
>
>Or if you can do batch processing, you could split the work across a
>few update machines, which then do bulk updates on the master database.
>This lets you get more machines into the job, since you can't share a
>single database across multiple machines.
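>
>For what it's worth, the bulk-load step could be as simple as this
>(untested sketch; table and file names are invented):
>
>    -- each update machine ships a batch file to the master, then:
>    COPY samples FROM '/data/batch_20050301.dat';
>
>COPY is much faster than row-by-row INSERTs at this kind of volume.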
>
>
>
>>I was wondering if anyone has had any experience with this kind of
>>data volume in a PostgreSQL database and how it affects database
>>design and optimization.
>>
>>
>>
>Well, one of the biggest things is whether you can get bulk updates, or
>whether clients can handle data being slightly out of date, so you can
>use caching. Can you segregate your data into separate tables as much
>as possible? Are your clients okay if aggregate information takes a
>little while to update?
>
>One trick is to use semi-lazy materialized views to keep your updates
>fast.
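>
>Since there's no built-in materialized view, that ends up as a summary
>table refreshed on a schedule. A rough, untested sketch (names
>invented):
>
>    CREATE TABLE hourly_summary AS
>      SELECT date_trunc('hour', ts) AS hour, host, avg(value) AS avg_value
>      FROM samples
>      GROUP BY date_trunc('hour', ts), host;
>
>    -- refresh from cron; clients tolerate a few minutes of staleness
>    BEGIN;
>    TRUNCATE hourly_summary;
>    INSERT INTO hourly_summary
>      SELECT date_trunc('hour', ts), host, avg(value)
>      FROM samples
>      GROUP BY date_trunc('hour', ts), host;
>    COMMIT;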
>
>
>
>>What would be important issues when setting up a database this big,
>>and is it at all doable? Or would it be insane to think about
>>storing up to 5-10 billion rows in a postgres database.
>>
>>
>
>I think if you can design the db properly, it is doable. But if you
>have clients saying "I need up-to-the-second information on 1 billion
>rows", you're never going to get it.
>
>
>
>>The database's performance is important. There would be no use in
>>storing the data if a query takes ages. Queries should be quite
>>fast if possible.
>>
>>
>>
>Again, it depends on the queries being done.
>There are some nice tricks you can use, like doing month-by-month
>partitioning (if you are getting 1 billion inserts a month, you might
>want week-by-week partitioning), and then with a date column index and
>a UNION ALL view you should be able to get pretty good insert speed and
>still keep *recent* queries fast. Going through 1 billion rows is
>always going to be expensive.
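>
>That UNION ALL approach might look like this (untested sketch, names
>invented; one table and one date index per month):
>
>    CREATE TABLE samples_2005_02 (ts timestamptz, host text, value float8);
>    CREATE TABLE samples_2005_03 (ts timestamptz, host text, value float8);
>    CREATE INDEX samples_2005_03_ts_idx ON samples_2005_03 (ts);
>
>    CREATE VIEW samples_all AS
>      SELECT * FROM samples_2005_02
>      UNION ALL
>      SELECT * FROM samples_2005_03;
>
>Inserts go straight into the current month's table, which stays small
>and well cached, and queries that only need recent data can hit that
>table directly instead of going through the view.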
>
>
>
>>I would really like to hear people's thoughts/suggestions or "go see a
>>shrink, you must be mad" statements ;)
>>
>>Kind regards,
>>
>>Ramon Bastiaans
>>
>>
>
>I think it would be possible, but there are a lot of design issues with
>a system like this. You can't go into it thinking that you can design a
>multi-billion-row database the same way you would design a million-row db.
>
>John
>=:->
>
>

