Re: Large tables (was: RAID 0 not as fast as expected) - Mailing list pgsql-performance
From: Alex Turner
Subject: Re: Large tables (was: RAID 0 not as fast as expected)
Msg-id: 33c6269f0609181614h32d2d379k99b313530ce2ab0f@mail.gmail.com
In response to: Re: Large tables (was: RAID 0 not as fast as expected) ("Bucky Jordan" <bjordan@lumeta.com>)
List: pgsql-performance
Do the basic math:

If you have a table with 100 million records, each of which is 200 bytes long, that gives you roughly 20 GB of data (assuming it was all written neatly and hasn't been updated much). If you have to do a full table scan, it will take roughly 400 seconds on a single 10k RPM SCSI drive with an average read speed of 50 MB/sec. If you are going to read indexes, figure out how big your index is and how many blocks it will return, figure out how many blocks that will require transferring from the main table, make an estimate of the seeks, add in the transfer total, and you have a time to get your data. A big array with a good controller can pass 1000 MB/sec on the right bus if you buy the right technologies. But be warned: if you buy the wrong ones, your big array can end up being slower than a single drive for sequential transfer. At 1000 MB/sec your scan would take 20 seconds.
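To run the same arithmetic against a live table, you can get a rough sequential-scan estimate from pg_class (a sketch only, assuming the default 8 kB block size and ~50 MB/sec sequential read; 'bigtable' is a hypothetical table name):

    -- back-of-envelope scan-time estimate: pages * 8 kB / assumed read rate
    SELECT relname,
           relpages * 8192.0 / (1024 * 1024 * 1024) AS approx_size_gb,
           relpages * 8192.0 / (50 * 1024 * 1024)   AS est_seqscan_seconds
    FROM pg_class
    WHERE relname = 'bigtable';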
Be warned, the tech specs page:
http://www.sun.com/servers/x64/x4500/specs.xml#anchor3
doesn't mention RAID 10 as a possibility, and that is probably what most would recommend for fast data access if you are doing both read and write operations. If you are doing mostly reads, then RAID 5 is passable, but its redundancy with large numbers of drives is not so great.
Alex.
On 9/18/06, Bucky Jordan <bjordan@lumeta.com> wrote:
> good normalization skills are really important for large databases,
> along with materialization strategies for 'denormalized sets'.
Good points - thanks. I'm especially curious what others have done for
the materialization. The matview project on gborg appears dead, and I've
only found a smattering of references on Google. My guess is, you roll
your own for optimal performance...
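A minimal hand-rolled materialization along those lines is just a summary table rebuilt inside a transaction (a sketch only; the scan_results table and the aggregate are hypothetical):

    -- build the "materialized view" once
    CREATE TABLE scan_summary AS
        SELECT scan_id, count(*) AS n_records
        FROM scan_results
        GROUP BY scan_id;

    -- refresh by rebuilding inside a single transaction (note: TRUNCATE
    -- takes an exclusive lock, so concurrent readers block until COMMIT)
    BEGIN;
    TRUNCATE scan_summary;
    INSERT INTO scan_summary
        SELECT scan_id, count(*)
        FROM scan_results
        GROUP BY scan_id;
    COMMIT;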
> regarding the number of rows, there is no limit to how much pg can
> handle per se, just some practical limitations, especially vacuum and
> reindex times. these are important because they are required to keep
> a handle on mvcc bloat, and it's very nice to be able to vacuum bits of
> your database at a time.
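Concretely, vacuuming "bits of your database at a time" just means hitting individual tables on their own schedules rather than the whole database at once (a sketch; the table name is hypothetical):

    -- vacuum and re-analyze one large table on its own schedule
    VACUUM VERBOSE ANALYZE scan_results;

    -- rebuild its indexes separately when index bloat warrants it
    REINDEX TABLE scan_results;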
I was hoping for some actual numbers on "practical". Hardware isn't too
much of an issue (within reason - we're not talking an Amazon or Google
here... the SunFire X4500 looks interesting...) - if a customer wants to
store that much data, and pay for it, we'll figure out how to do it. I'd
just rather not have to re-design the database. Say the requirement is
to keep 12 months of data accessible, each "scan" produces 100M records,
and I run one per month. What happens if the customer wants to run it
once a week? I was more trying to figure out at what point (ballpark)
I'm going to have to look into archive tables and things of that nature
(or at Bizgres/MPP). It's easier for us to add more/bigger hardware, but
not so easy to redesign/add history tables...
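When that point arrives, the usual PostgreSQL approach to archive/history tables is table inheritance with constraint exclusion (8.1 and later), one child table per month (a sketch only; the table and scan_date column names are hypothetical):

    -- the parent stays empty; each month gets a child with a CHECK
    -- constraint the planner can use to skip irrelevant partitions
    CREATE TABLE scan_results_2006_09 (
        CHECK (scan_date >= DATE '2006-09-01'
           AND scan_date <  DATE '2006-10-01')
    ) INHERITS (scan_results);

    -- allow the planner to prune partitions by their CHECK constraints
    SET constraint_exclusion = on;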
>
> just another fyi, if you have a really big database, you can forget
> about doing pg_dump for backups (unless you really don't care about
> being a day or more behind)... you simply have to do some type of
> replication/failover strategy. i would start with pitr.
>
> merlin
I was originally thinking replication, but I did notice some nice PITR
features in 8.x - I'll have to look into that some more.
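For reference, the core of a PITR setup in 8.x is WAL archiving (archive_command in postgresql.conf) plus a base backup taken between pg_start_backup() and pg_stop_backup(); a minimal sketch, assuming WAL archiving is already configured:

    -- mark the start of a base backup (the label is arbitrary)
    SELECT pg_start_backup('weekly_base');

    -- ...take a filesystem-level copy of the data directory here...

    -- mark the end of the base backup; archived WAL from this window
    -- onward lets you recover to any later point in time
    SELECT pg_stop_backup();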
Thanks for the pointers though...
- Bucky