Re: multi billion row tables: possible or insane? - Mailing list pgsql-performance
From:           Alan Stange
Subject:        Re: multi billion row tables: possible or insane?
Date:
Msg-id:         42249102.2060700@rentec.com
In response to: Re: multi billion row tables: possible or insane?
                ("Vig, Sandor (G/FI-2)" <Sandor.Vig@audi.hu>)
List:           pgsql-performance
Isn't that 385 rows/second?  Presumably one can insert more than one
row in a transaction?

-- Alan

Vig, Sandor (G/FI-2) wrote:

>385 transactions/sec?
>
>fsync = false
>
>risky but fast.
>
>-----Original Message-----
>From: pgsql-performance-owner@postgresql.org
>[mailto:pgsql-performance-owner@postgresql.org]On Behalf Of John Arbash Meinel
>Sent: Tuesday, March 01, 2005 4:19 PM
>To: Ramon Bastiaans
>Cc: pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] multi billion row tables: possible or insane?
>
>Ramon Bastiaans wrote:
>
>>Hi all,
>>
>>I am doing research for a project of mine where I need to store
>>several billion values for a monitoring and historical tracking system
>>for a big computer system. My current estimate is that I have to store
>>(somehow) around 1 billion values each month (possibly more).
>
>If you have that 1 billion perfectly distributed over all hours of the
>day, then you need 1e9/30/24/3600 = 385 transactions per second.
>
>Which I'm pretty sure is possible with postgres; you just need pretty
>beefy hardware. And like Jeff said, lots of disks for lots of IO.
>Like a quad Opteron with 16GB of RAM and around 14-20 very fast disks,
>RAID 10 not RAID 5, etc. To improve query performance, you can do some
>load balancing by having replica machines using Slony.
>
>Or if you can do batch processing, you could split up the work into a
>few update machines, which then do bulk updates on the master database.
>This lets you get more machines into the job, since you can't share a
>database across multiple machines.
>
>>I was wondering if anyone has had any experience with these kinds of
>>big data volumes in a postgres sql database and how this affects
>>database design and optimization.
>
>Well, one of the biggest things is whether you can get bulk updates, or
>if clients can handle data being slightly out of date, so you can use
>caching. Can you segregate your data into separate tables as much as
>possible? Are your clients okay if aggregate information takes a little
>while to update?
>
>One trick is to use semi-lazy materialized views to get your updates to
>be fast.
>
>>What would be important issues when setting up a database this big,
>>and is it at all doable? Or would it be insane to think about
>>storing up to 5-10 billion rows in a postgres database?
>
>I think if you can design the db properly, it is doable. But if you
>have clients saying "I need up-to-the-second information on 1 billion
>rows", you're never going to get it.
>
>>The database's performance is important. There would be no use in
>>storing the data if a query will take ages. Queries should be quite
>>fast if possible.
>
>Again, it depends on the queries being done.
>There are some nice tricks you can use, like doing month-by-month
>partitioning (if you are getting 1G inserts, you might want week-by-week
>partitioning), and then with a date column index and a UNION ALL view
>you should be able to get pretty good insert speed and still keep fast
>*recent* queries. Going through 1 billion rows is always going to be
>expensive.
>
>>I would really like to hear people's thoughts/suggestions or "go see a
>>shrink, you must be mad" statements ;)
>>
>>Kind regards,
>>
>>Ramon Bastiaans
>
>I think it would be possible, but there are a lot of design issues with
>a system like this. You can't go into it thinking that you can design a
>multi-billion-row database the same way you would design a million-row db.
>
>John
>=:->
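
The batching Alan alludes to is what lifts the 385/sec ceiling: the cost
is per commit (the fsync), not per row, so grouping many rows into one
transaction, or better still using COPY, multiplies throughput without
resorting to fsync = false. A minimal sketch, assuming a hypothetical
measurements(ts, sensor_id, value) table; none of these names come from
the thread:

    BEGIN;
    -- Each INSERT is one row; the commit (and its fsync) is paid once
    -- for the whole batch instead of once per row.
    INSERT INTO measurements (ts, sensor_id, value)
        VALUES ('2005-03-01 16:19:00', 17, 42.0);
    INSERT INTO measurements (ts, sensor_id, value)
        VALUES ('2005-03-01 16:19:01', 18, 17.5);
    -- ...a few hundred more single-row INSERTs...
    COMMIT;

    -- For true bulk loading, COPY is faster still:
    COPY measurements (ts, sensor_id, value)
        FROM '/tmp/batch.csv' WITH CSV;

At a few hundred rows per transaction, the same 385 commits/second
becomes on the order of 100,000 rows/second, before hardware limits.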
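John's partitioning trick, sketched concretely. In PostgreSQL of this
era the idiom is one table per month glued together by a UNION ALL view
(built-in table partitioning came later); table and column names here
are illustrative, not from the thread:

    -- One table per month; the CHECK constraint documents the range.
    CREATE TABLE measurements_2005_03 (
        ts        timestamptz NOT NULL,
        sensor_id integer     NOT NULL,
        value     double precision,
        CHECK (ts >= '2005-03-01' AND ts < '2005-04-01')
    );
    CREATE INDEX measurements_2005_03_ts_idx
        ON measurements_2005_03 (ts);

    -- Repeat for each month, then combine them for querying:
    CREATE VIEW measurements_all AS
        SELECT * FROM measurements_2005_02
        UNION ALL
        SELECT * FROM measurements_2005_03;

Inserts go straight into the current month's table, whose index stays
small and hot, which is how you keep both insert speed and fast
*recent* queries while the older months sit untouched.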
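The "semi-lazy materialized view" is likewise hand-rolled at this
vintage: a plain summary table that a cron job rebuilds over a recent
window, so aggregate queries never scan the billion-row raw tables.
A sketch under the same assumed schema; the two-hour refresh window is
an arbitrary choice:

    CREATE TABLE hourly_summary (
        hour      timestamptz PRIMARY KEY,
        n_rows    bigint,
        avg_value double precision
    );

    -- Re-run periodically (e.g. from cron): rebuild only the recent
    -- window, so clients read this small table instead of raw data.
    BEGIN;
    DELETE FROM hourly_summary
        WHERE hour >= now() - interval '2 hours';
    INSERT INTO hourly_summary (hour, n_rows, avg_value)
        SELECT date_trunc('hour', ts), count(*), avg(value)
        FROM measurements_2005_03
        WHERE ts >= now() - interval '2 hours'
        GROUP BY 1;
    COMMIT;

This is exactly the "clients okay if aggregate information takes a
little while to update" trade-off John raises: the summary lags by at
most one refresh interval.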