Re: multi billion row tables: possible or insane? - Mailing list pgsql-performance
From:           Alex Turner
Subject:        Re: multi billion row tables: possible or insane?
Date:
Msg-id:         33c6269f05030416151b826a9c@mail.gmail.com
In response to: Re: multi billion row tables: possible or insane?
                ("Vig, Sandor (G/FI-2)" <Sandor.Vig@audi.hu>)
List:           pgsql-performance
Not true - with fsync on I get nearly 500 tx/s, with it off I'm as high
as 1600/sec with dual opteron, 14xSATA drives, and 4GB RAM on a 3ware
Escalade. Database has 3 million rows.

As long as queries use indexes, multi-billion-row tables shouldn't be
too bad. Full table scans will suck, though.

Alex Turner
netEconomist

On Tue, 1 Mar 2005 16:40:29 +0100, Vig, Sandor (G/FI-2)
<Sandor.Vig@audi.hu> wrote:
> 385 transactions/sec?
>
> fsync = false
>
> risky but fast.
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of John
> Arbash Meinel
> Sent: Tuesday, March 01, 2005 4:19 PM
> To: Ramon Bastiaans
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] multi billion row tables: possible or insane?
>
> Ramon Bastiaans wrote:
>
> > Hi all,
> >
> > I am doing research for a project of mine where I need to store
> > several billion values for a monitoring and historical tracking
> > system for a big computer system. My current estimate is that I
> > have to store (somehow) around 1 billion values each month
> > (possibly more).
>
> If you have that 1 billion perfectly distributed over all hours of
> the day, then you need 1e9/30/24/3600 = ~385 transactions per second.
>
> Which I'm pretty sure is possible with postgres, you just need pretty
> beefy hardware. And like Jeff said, lots of disks for lots of I/O.
> Like a quad opteron with 16GB of RAM and around 14-20 very fast
> disks. RAID 10, not RAID 5, etc. To improve query performance, you
> can do some load balancing by adding replica machines using Slony.
>
> Or, if you can do batch processing, you could split the work across a
> few update machines, which then do bulk updates on the master
> database. This lets you get more machines into the job, since you
> can't share one database across multiple machines.
>
> > I was wondering if anyone has had any experience with this kind of
> > data volume in a postgres sql database and how it affects database
> > design and optimization.
>
> Well, one of the biggest things is whether you can do bulk updates,
> or whether clients can handle data being slightly out of date, so you
> can use caching. Can you segregate your data into separate tables as
> much as possible? Are your clients okay if aggregate information
> takes a little while to update?
>
> One trick is to use semi-lazy materialized views to keep your updates
> fast.
>
> > What would be the important issues when setting up a database this
> > big, and is it at all doable? Or would it be insane to think about
> > storing up to 5-10 billion rows in a postgres database?
>
> I think if you can design the db properly, it is doable. But if you
> have clients saying "I need up-to-the-second information on 1 billion
> rows", you're never going to get it.
>
> > The database's performance is important. There would be no use in
> > storing the data if a query takes ages. Queries should be quite
> > fast if possible.
>
> Again, it depends on the queries being done.
> There are some nice tricks you can use, like doing month-by-month
> partitioning (if you are getting 1G inserts a month, you might want
> week-by-week partitioning), and then with an index on the date column
> and a UNION ALL view you should be able to get pretty good insert
> speed and still keep fast *recent* queries. Going through 1 billion
> rows is always going to be expensive.
> > I would really like to hear people's thoughts/suggestions or "go
> > see a shrink, you must be mad" statements ;)
> >
> > Kind regards,
> >
> > Ramon Bastiaans
>
> I think it would be possible, but there are a lot of design issues
> with a system like this. You can't go into it thinking that you can
> design a multi-billion-row database the same way you would design a
> million-row db.
>
> John
> =:->
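
A minimal sketch of the UNION ALL partitioning trick John describes,
as it was typically done before native partitioning existed. The table
and column names here are invented for illustration:

    -- One table per month, each with its own index on the date column.
    CREATE TABLE samples_2005_01 (
        ts    timestamptz NOT NULL
              CHECK (ts >= '2005-01-01' AND ts < '2005-02-01'),
        value numeric     NOT NULL
    );
    CREATE TABLE samples_2005_02 (
        ts    timestamptz NOT NULL
              CHECK (ts >= '2005-02-01' AND ts < '2005-03-01'),
        value numeric     NOT NULL
    );
    CREATE INDEX samples_2005_01_ts ON samples_2005_01 (ts);
    CREATE INDEX samples_2005_02_ts ON samples_2005_02 (ts);

    -- Readers query one view; writers insert straight into the
    -- current month's table, keeping insert overhead low.
    CREATE VIEW samples AS
        SELECT * FROM samples_2005_01
        UNION ALL
        SELECT * FROM samples_2005_02;

    -- The planner can usually push a date restriction into each
    -- branch of the UNION ALL, so each month's table is answered
    -- from its own small index.
    SELECT count(*) FROM samples WHERE ts >= '2005-02-20';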
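
John's "semi-lazy materialized view" can be as simple as a summary
table that is rebuilt on a schedule instead of on every insert. A
minimal sketch, again with invented names, assuming the samples table
from the sketch above:

    -- Build the aggregate once...
    CREATE TABLE samples_hourly AS
        SELECT date_trunc('hour', ts) AS hour,
               count(*)               AS n,
               avg(value)             AS avg_value
        FROM   samples
        GROUP  BY date_trunc('hour', ts);

    -- ...then refresh it from cron every few minutes. Clients read
    -- only samples_hourly, so their queries stay fast even though
    -- the numbers can be a few minutes stale.
    TRUNCATE samples_hourly;
    INSERT INTO samples_hourly
        SELECT date_trunc('hour', ts), count(*), avg(value)
        FROM   samples
        GROUP  BY date_trunc('hour', ts);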