Re: multi billion row tables: possible or insane? - Mailing list pgsql-performance

From Alex Turner
Subject Re: multi billion row tables: possible or insane?
Date
Msg-id 33c6269f05030416151b826a9c@mail.gmail.com
In response to Re: multi billion row tables: possible or insane?  ("Vig, Sandor (G/FI-2)" <Sandor.Vig@audi.hu>)
List pgsql-performance
Not true - with fsync on I get nearly 500 tx/s; with it off I'm as
high as 1600/sec on a dual opteron with 4GB of RAM and 14 SATA drives
on a 3ware Escalade.  The database has 3 million rows.

As long as queries use indexes, a multi-billion-row table shouldn't be
too bad.  A full table scan will suck, though.
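
For example - a toy sketch, with the samples table and its columns
invented purely for illustration:

    -- hypothetical monitoring table, names illustrative only
    CREATE INDEX samples_time_idx ON samples (sample_time);

    -- index range scan: reads only the matching slice of rows
    EXPLAIN SELECT * FROM samples
     WHERE sample_time >= '2005-02-01' AND sample_time < '2005-03-01';

    -- no usable index on the predicate: sequential scan of every row
    EXPLAIN SELECT * FROM samples WHERE value > 42;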

Alex Turner
netEconomist


On Tue, 1 Mar 2005 16:40:29 +0100, Vig, Sandor (G/FI-2)
<Sandor.Vig@audi.hu> wrote:
> 385 transactions/sec?
>
> fsync = false
>
> risky but fast.
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of John Arbash
> Meinel
> Sent: Tuesday, March 01, 2005 4:19 PM
> To: Ramon Bastiaans
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] multi billion row tables: possible or insane?
>
> Ramon Bastiaans wrote:
>
> > Hi all,
> >
> > I am doing research for a project of mine where I need to store
> > several billion values for a monitoring and historical tracking system
> > for a big computer system. My current estimate is that I have to store
> > (somehow) around 1 billion values each month (possibly more).
> >
> If you have that 1 billion spread perfectly over every second of a
> 30-day month, then you need 1e9/30/24/3600 = ~385 transactions per second.
>
> Which I'm pretty sure is possible with postgres; you just need pretty
> beefy hardware. And like Jeff said, lots of disks for lots of I/O.
> Like a quad opteron with 16GB of RAM and around 14-20 very fast disks,
> RAID 10 not RAID 5, etc. To improve query performance, you can do some
> load balancing with replica machines using Slony.
>
> Or if you can do batch processing, you could split up the work into a
> few update machines, which then do bulk updates on the master database.
> This lets you get more machines into the job, since you can't share a
> database across multiple machines.
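
Concretely, the bulk path usually means COPY rather than row-by-row
INSERTs - a sketch, with the samples table and the file path invented
for illustration:

    -- accumulate rows on the update machines, then load in one shot;
    -- COPY is far cheaper per row than individual INSERTs
    COPY samples (sample_time, value)
        FROM '/data/batches/20050301.csv' WITH CSV;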
>
> > I was wondering if anyone has had any experience with these kinds of
> > data volumes in a postgres database and how this affects
> > database design and optimization.
> >
> Well, one of the biggest things is if you can get bulk updates, or if
> clients can handle data being slightly out of date, so you can use
> caching. Can you segregate your data into separate tables as much as
> possible? Are your clients okay if aggregate information takes a little
> while to update?
>
> One trick is to use semi-lazy materialized views to get your updates to
> be fast.
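
Postgres has no built-in materialized views, so "semi-lazy" here means
a summary table refreshed on a schedule - a sketch, all names invented:

    -- summary table standing in for a materialized view
    CREATE TABLE samples_hourly_mv (
        hour  timestamptz PRIMARY KEY,
        n     bigint,
        avg_v numeric
    );

    -- cron job: re-aggregate only the recent window, so clients see
    -- data that is at most a few minutes stale
    BEGIN;
    DELETE FROM samples_hourly_mv
     WHERE hour >= date_trunc('hour', now() - interval '2 hours');
    INSERT INTO samples_hourly_mv
        SELECT date_trunc('hour', sample_time), count(*), avg(value)
          FROM samples
         WHERE sample_time >= date_trunc('hour', now() - interval '2 hours')
         GROUP BY 1;
    COMMIT;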
>
> > What would be important issues when setting up a database this big,
> > and is it at all doable? Or would it be insane to think about
> > storing up to 5-10 billion rows in a postgres database.
>
> I think if you can design the db properly, it is doable. But if you
> have clients saying "I need up-to-the-second information on 1 billion
> rows", you're never going to get it.
>
> >
> > The database's performance is important. There would be no use in
> > storing the data if a query takes ages. Queries should be quite
> > fast if possible.
> >
> Again, it depends on the queries being done.
> There are some nice tricks you can use, like doing month-by-month
> partitioning (if you are getting 1G inserts a month, you might want
> week-by-week partitioning); with a date-column index and a UNION ALL
> view you should be able to get pretty good insert speed and still keep
> *recent* queries fast. Going through 1 billion rows is always going to
> be expensive.
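
A minimal sketch of that UNION ALL scheme (invented names; note that
the 8.0-era planner generally won't prune partitions for you, so
recent-data queries are best pointed at the newest table directly):

    -- one physical table per month, each with its own date index
    CREATE TABLE samples_2005_02 (sample_time timestamptz, value numeric);
    CREATE TABLE samples_2005_03 (sample_time timestamptz, value numeric);
    CREATE INDEX samples_2005_02_time ON samples_2005_02 (sample_time);
    CREATE INDEX samples_2005_03_time ON samples_2005_03 (sample_time);

    -- one logical table for queries that span months
    CREATE VIEW samples_all AS
        SELECT * FROM samples_2005_02
        UNION ALL
        SELECT * FROM samples_2005_03;

    -- inserts target the current month's (small) table, and recent
    -- queries stay fast by hitting it and its index directly
    SELECT count(*) FROM samples_2005_03
     WHERE sample_time >= now() - interval '1 day';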
>
> > I would really like to hear people's thoughts/suggestions or "go see a
> > shrink, you must be mad" statements ;)
> >
> > Kind regards,
> >
> > Ramon Bastiaans
>
> I think it would be possible, but there are a lot of design issues with
> a system like this. You can't go into it thinking that you can design a
> multi-billion-row database the same way you would design a million-row db.
>
> John
> =:->
