Thread: Fast Inserts and Hardware Questions
I am spec'ing out a database for my company and I need some advice. The database in mind will consist of one table, with about 300 bytes per record. The table will be getting two or three million inserts a week, and it would be nice if it could handle a sustained 30 to 50 a second. The table will have (ACK!) 4 indexes (if anyone can explain to me how I can get away with fewer, please let me know).

The indexes will be:
  int8      (primary key)
  int4      (group number)
  timestamp (creation date)
  int4      (customer id)

The customers want to be able to query their data whenever and have it be snappy.

So here are my questions:
* Is there an OS that is best suited for Postgres? All things being equal I would like to run this on Linux.
* Is there an architecture best suited for this? Should I shell out the extra cash for a 64-bit box over a 32-bit one? Quad Xeon or Quad Alpha? Quad Alpha or Quad UltraSparc?
* Since most of what I am doing is inserts, I will assume that the disk will be my bottleneck over anything else. Is this correct?
* Will the 7.1 WAL save me when it comes to insert times?
* I read something about disabling fsync() to speed up inserts. How much will this speed things up? I would consider it, as I COULD rebuild lost data from my logs in the event of a system crash, and one night a year of panic on my part is worth saving $100,000 in drive arrays.

Oh, and if any of you SQL gurus are still reading, I'll show you the queries I will be running to see if I can really ditch an index:

  select * from table where customer_id = ? and primary_key = ?::int8
  select * from table where customer_id = ? and group_number = ?
  select * from table where customer_id = ? and creation > ? and creation < ?

Thanks for all your help,

Orion Henry
CTO TrustCommerce
orion@trustcommerce.com
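For concreteness, the table and indexes described above might be sketched like this. The column and index names are my own guesses, not the poster's actual schema, and SQLite is used here only so the sketch is self-contained; the equivalent PostgreSQL DDL is nearly identical.

```python
import sqlite3

# A guess at the schema from the post: one ~300-byte record, four
# indexed columns. SQLite stands in for PostgreSQL so this runs
# anywhere; CREATE INDEX syntax is the same in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE big_table (
    id           INTEGER PRIMARY KEY,   -- int8 primary key in the real schema
    group_number INTEGER,               -- int4
    creation     TIMESTAMP,             -- timestamp (creation date)
    customer_id  INTEGER,               -- int4
    payload      TEXT                   -- the rest of the ~300-byte record
);
CREATE INDEX big_table_group_idx    ON big_table (group_number);
CREATE INDEX big_table_creation_idx ON big_table (creation);
CREATE INDEX big_table_customer_idx ON big_table (customer_id);
""")
# List the secondary indexes (the primary key is implicit in SQLite
# and does not appear in sqlite_master).
indexes = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(indexes)
```

Every insert has to update all four index structures as well as the heap, which is where the write cost discussed later in the thread comes from.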
Can someone tell me what I am missing here... Thanks

Michael Fork - CCNA - MCP - A+
Network Support - Toledo Internet Access - Toledo Ohio

radius=# BEGIN;
BEGIN
radius=# select age(tstamp, now()) from radacct limit 1;
     age
--------------
 03:37:08 ago
(1 row)

radius=# select age(now(), tstamp) from radacct limit 1;
   age
----------
 03:37:08
(1 row)

radius=# SELECT count(*) FROM radacct WHERE age(now(), tstamp) > '6 months'::interval;
 count
--------
 128378
(1 row)

-- Up until this point everything makes sense; however, what follows does not

radius=# SELECT count(*) FROM radacct WHERE age(tstamp, now()) > '6 months ago'::interval;
  count
---------
 1988641
(1 row)

-- Shouldn't this be equal to the previous query (flipped the arguments and added 'ago')?

radius=# SELECT count(*) FROM radacct WHERE age(tstamp, now()) > '6 months'::interval;
 count
-------
 0
(1 row)

-- Since the previous query didn't work, I thought that this would return
-- the right number of rows, but I was wrong
Orion Henry writes:

> The indexes will be
> int8 (primary key)
> int4 (group number)
> timestamp (creation date)
> int4 (customer id)

Since one query can only use one index per table, you should only need the customer_id index, given the queries you listed.

> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.
> * Is there an architecture best suited for this. Should I shell out the
> extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha?
> Quad Alpha or Quad UltraSparc?
> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else. Is this correct?

Probably. The file system is also going to be relevant. Linux's ext2fs is not the best possible choice here. The CPU is probably going to be the least of your problems.

> * Will the 7.1 WAL save me, when it comes to insert times?

To be expected.

> * I read something about disabling fsync() to speed up inserts. How
> much will this speed things up? I would consider it as I COULD rebuild
> lost data from my logs in the event of a system crash and one night a
> year of panic on my part is worth saving $100,000 in drive arrays.

Disabling fsync can be considered obsolete with 7.1's WAL.

--
Peter Eisentraut   peter_e@gmx.net   http://yi.org/peter-e/
On Wed, 14 Mar 2001, Michael Fork wrote:

> radius=# SELECT count(*) FROM radacct WHERE age(now(), tstamp) > '6 months'::interval;
>  count
> --------
>  128378
> (1 row)
>
> -- Up until this point everything makes sense, however what follows
> -- does not
>
> radius=# SELECT count(*) FROM radacct WHERE age(tstamp, now()) > '6 months ago'::interval;
>  count
> ---------
>  1988641
> (1 row)
>
> -- Shouldn't this be equal to the previous query (flipped the arguments
> -- and added 'ago')?

I'd guess that since '6 months ago' is effectively a negative interval, you would want to compare < '6 months ago'::interval for the same effect? (If a - b > c, then b - a < -c.)
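The sign-flip rule at the end can be checked with plain timestamp arithmetic. This is a generic sketch in Python (age(x, y) behaves like x - y; the dates and the 182-day stand-in for '6 months' are made up for illustration):

```python
from datetime import datetime, timedelta

now = datetime(2001, 3, 14, 12, 0, 0)
tstamp = now - timedelta(days=250)   # a record roughly 8 months old
six_months = timedelta(days=182)     # rough stand-in for '6 months'::interval

# age(now(), tstamp) > '6 months'  -- the query that returned 128378 rows
assert (now - tstamp) > six_months

# Flipping the arguments negates the difference, so the comparison must
# flip too: age(tstamp, now()) is negative for old records.
assert (tstamp - now) < -six_months

# Comparing the flipped (negative) age against a *positive* interval is
# why the last query in the post returned 0 rows: a negative value is
# never greater than a positive one.
assert not ((tstamp - now) > six_months)
print("a - b > c  <=>  b - a < -c holds for these values")
```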
Peter Eisentraut wrote:
>
> Orion Henry writes:
>
> > The indexes will be
> > int8 (primary key)
> > int4 (group number)
> > timestamp (creation date)
> > int4 (customer id)
>
> Since one query can only use one index per table, you should only need the
> customer_id index, given the queries you listed.

Isn't that what multi-column indexes are for?

I have hundreds of customers but millions of database entries. If I have to, I'll make a table for each customer (big_table_5, big_table_6, etc.), but this seems to be a terrible hack, and the database should be able to do things like this for me.

Orion Henry
CTO TrustCommerce
orion@trustcommerce.com
On 3/14/01, 7:08:48 PM, Orion Henry <orion@trustcommerce.com> wrote regarding [GENERAL] Fast Inserts and Hardware Questions:

Please bear in mind that you are operating well above anything I do, so I'm not speaking from experience.

> I am spec'ing out a database for my company and I need some advice. The
> database in mind will consist of one table, with about 300 bytes per
> record. The table will be getting two or three million inserts a week
> and it would be nice if it could handle a sustained 30 to 50 a second.
> The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> can get away with less please let me know)

Three million inserts a week is about 5 per second, so you want to sustain 10 times the average rate (sounds reasonable). Have you considered separating inserts from reads? The inserts wouldn't even need to go into a database initially: just log them to a file and feed them in at a steady 10 per second (allows for downtime).

> The indexes will be
> int8 (primary key)
> int4 (group number)
> timestamp (creation date)
> int4 (customer id)
>
> The customers want to be able to query their data whenever and have it
> be snappy.

And they don't see why they need to pay so much for something so simple, either ;-)

> So here is my question:
> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.

Can't comment - I've only used PG on Linux. People who use the various xxxBSDs swear by them (rather than at them) and Solaris has some excellent support (at a cost).

Of course, you could try MS-Access on Win-ME ;-)

> * Is there an architecture best suited for this. Should I shell out the
> extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha?
> Quad Alpha or Quad UltraSparc?

Again hearsay, but Sun's hardware has excellent I/O throughput and it's difficult to get people to abandon it in favour of Intel once they've tried it.
> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else. Is this correct?

Yes.

> * Will the 7.1 WAL save me, when it comes to insert times?
> * I read something about disabling fsync() to speed up inserts. How
> much will this speed things up? I would consider it as I COULD rebuild
> lost data from my logs in the event of a system crash and one night a
> year of panic on my part is worth saving $100,000 in drive arrays.

Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the WAL stuff means you can do so safely. Basically PG should rebuild itself from the WAL file in the event of system failure. Have to admit I've not tried it myself yet.

> Oh, and if any of you SQL gurus are still reading I'll show you the
> queries I will be running to see if I can really ditch an index.
>
> select * from table where customer_id = ? and primary_key = ?::int8

Well, if primary_key really is a primary key, you don't need customer_id.

> select * from table where customer_id = ? and group_number = ?
> select * from table where customer_id = ? and creation > ? and creation < ?

How many customers do you have? Are the customers going to want a specific time period? If so, you could have one table per week (say) and use a view to conceal the fact for the occasional query that crosses week boundaries.

If these are actually going to be GROUP BY queries totalling figures, perhaps try calculating totals beforehand.

If these are representative of your main queries, I'd be tempted to buy 10 cheap machines (+ a spare) and split the customers among the machines. Proxy this stuff at the application level and they'll never know. For your management stats you'd have to write a script to summarise stuff from several machines, but that shouldn't be too difficult. The spare machine can be over-specced and have the data from all the others ready for a hot swapover.
The beauty of multiple machines is that it should scale well (so long as you are sufficiently fascist about keeping the configs the same).

> Thanks for all your help,
> Orion Henry

Hope it was -

Richard Huxton
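Richard's "log them to a file and feed them in at a steady rate" suggestion could be sketched roughly like this. The function name, batch size, and rate are all made up for illustration; `apply_insert` stands in for whatever actually issues the INSERT:

```python
import time

def feed_from_log(lines, apply_insert, rate_per_sec=10, batch_size=10):
    """Replay logged records into the database at a steady rate.

    `lines` is an iterable of raw log lines (one record each);
    `apply_insert` performs the actual INSERT. Grouping batch_size
    rows between sleeps throttles the feed to roughly rate_per_sec.
    """
    batch = []
    fed = 0
    for line in lines:
        batch.append(line.rstrip("\n"))
        if len(batch) >= batch_size:
            for record in batch:
                apply_insert(record)
            fed += len(batch)
            batch.clear()
            time.sleep(batch_size / rate_per_sec)  # throttle to target rate
    for record in batch:                           # flush the remainder
        apply_insert(record)
    return fed + len(batch)

# Tiny demonstration against an in-memory "database":
inserted = []
total = feed_from_log((f"record-{i}\n" for i in range(25)),
                      inserted.append, rate_per_sec=1000, batch_size=10)
print(total, len(inserted))   # 25 25
```

The win is that a burst of incoming traffic only lengthens the log file; the database itself never sees more than the steady rate it is provisioned for.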
Orion Henry writes:

> > Since one query can only use one index per table, you should only need the
> > customer_id index, given the queries you listed.
>
> Isn't that what multi-column indexes are for?
>
> I have hundreds of customers but millions of database entries. If I have
> to I'll make a table for each customer, big_table_5 and big_table_6
> etc... but this seems to be a terrible hack and the database should be
> able to do things like this for me.

Multi-column indices may be appropriate. You will have to do some experimenting with actual data to determine the selectivity of the queries. Indices only make sense for queries that retrieve less than about 5% of the data (with lots of other deciding factors involved). Since your application seems to be insert-heavy, you should try to avoid too many and too complicated indices.

--
Peter Eisentraut   peter_e@gmx.net   http://yi.org/peter-e/
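A quick way to see a multi-column index at work, using SQLite purely so the sketch is self-contained (the CREATE INDEX syntax carries over to PostgreSQL, though its planner makes its own choices); the table and index names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE big_table (
    id           INTEGER PRIMARY KEY,
    group_number INTEGER,
    customer_id  INTEGER
);
-- One index covering both columns used by the second query in the thread:
--   select * from table where customer_id = ? and group_number = ?
CREATE INDEX big_table_cust_group_idx ON big_table (customer_id, group_number);
""")
conn.executemany(
    "INSERT INTO big_table (group_number, customer_id) VALUES (?, ?)",
    [(i % 10, i % 100) for i in range(1000)])

# Ask the planner how it would run the two-column lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM big_table WHERE customer_id = ? AND group_number = ?",
    (5, 5)).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # should mention big_table_cust_group_idx, not a full scan
```

One such two-column index can serve both the (customer_id, group_number) query and a plain customer_id lookup, since customer_id is the leading column, which is how an index could be saved on this insert-heavy table.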
Have you done any benchmarks with a prototype of your application?

Based on some of the numbers I've been seeing with my testing, I would not be surprised if a single PIII 1GHz box with a decent disk (ATA 100 or SCSI 160) would handle the load you describe, and it is way cheaper than some big SMP box. A dual PIII with a decent RAID card would get you even farther. Certainly these are not as expandable as other options out there, but they are worth a look.

What I would be most concerned about is a table that grows by a few million rows a week; that to me seems like a liability in itself, since maintenance on that table will get pretty slow after a few months.

Alex.

On Wed, 14 Mar 2001, Orion Henry wrote:

> I am spec'ing out a database for my company and I need some advice. The
> database in mind will consist of one table, with about 300 bytes per
> record. The table will be getting two or three million inserts a week
> and it would be nice if it could handle a sustained 30 to 50 a second.
> The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> can get away with less please let me know)
>
> The indexes will be
> int8 (primary key)
> int4 (group number)
> timestamp (creation date)
> int4 (customer id)
>
> The customers want to be able to query their data whenever and have it
> be snappy.
>
> So here is my question:
> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.
> * Is there an architecture best suited for this. Should I shell out the
> extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha?
> Quad Alpha or Quad UltraSparc?
> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else. Is this correct?
> * Will the 7.1 WAL save me, when it comes to insert times?
> * I read something about disabling fsync() to speed up inserts. How
> much will this speed things up? I would consider it as I COULD rebuild
> lost data from my logs in the event of a system crash and one night a
> year of panic on my part is worth saving $100,000 in drive arrays.
>
> Oh, and if any of you SQL gurus are still reading I'll show you the
> queries I will be running to see if I can really ditch an index.
>
> select * from table where customer_id = ? and primary_key = ?::int8
> select * from table where customer_id = ? and group_number = ?
> select * from table where customer_id = ? and creation > ? and creation < ?
>
> Thanks for all your help,
>
> Orion Henry
> CTO TrustCommerce
> orion@trustcommerce.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Richard H wrote:
>
> On 3/14/01, 7:08:48 PM, Orion Henry <orion@trustcommerce.com> wrote
> regarding [GENERAL] Fast Inserts and Hardware Questions:
>
> Please bear in mind that you are operating well above anything I do, so
> I'm not speaking from experience.

I appreciate the advice in any case.

> > I am spec'ing out a database for my company and I need some advice. The
> > database in mind will consist of one table, with about 300 bytes per
> > record. The table will be getting two or three million inserts a week
> > and it would be nice if it could handle a sustained 30 to 50 a second.
> > The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> > can get away with less please let me know)
>
> Three million inserts a week is about 5 per second, so you want to
> sustain 10 times the average rate (sounds reasonable). Have you
> considered separating inserts from reads? The inserts wouldn't even need
> to go into a database initially, just log them to a file and feed them in
> at a steady 10 per second (allows for downtime).

Already done. It's just that the less time between getting the data and it being available to the customers, the better we look.

> > The indexes will be
> > int8 (primary key)
> > int4 (group number)
> > timestamp (creation date)
> > int4 (customer id)
>
> > The customers want to be able to query their data whenever and have it
> > be snappy.
>
> And they don't see why they need to pay so much for something so simple,
> either ;-)

Tell me about it ;)

> > So here is my question:
> > * Is there an OS that is best suited for postgres. All things being
> > equal I would like to run this on Linux.
>
> Can't comment - I've only used PG on Linux. People who use the various
> xxxBSDs swear by them (rather than at them) and Solaris has some
> excellent support (at a cost)
>
> Of course, you could try MS-Access on Win-ME ;-)

Or... not. ;)

> > * Is there an architecture best suited for this. Should I shell out the
> > extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha?
> > Quad Alpha or Quad UltraSparc?
>
> Again hearsay, but Sun's hardware has excellent I/O throughput and it's
> difficult to get people to abandon it in favour of Intel once they've
> tried it.

What I was hoping to hear was something along the lines of "An int8 index? Dear God! You better be using a 64-bit machine!" or "The only way you will be able to handle a load like that will be with UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly-level optimizations for the Alpha processors, they are the fastest!" From the sounds of it, what I need is lots of fast disks and to hell with the CPU. I might as well get an Athlon and a Network Appliance.

> > * Since most of what I am doing is inserts I will assume that the disk
> > will be my bottleneck over anything else. Is this correct?
>
> Yes.
>
> > * Will the 7.1 WAL save me, when it comes to insert times?
> > * I read something about disabling fsync() to speed up inserts. How
> > much will this speed things up? I would consider it as I COULD rebuild
> > lost data from my logs in the event of a system crash and one night a
> > year of panic on my part is worth saving $100,000 in drive arrays.
>
> Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the
> WAL stuff means you can do so safely. Basically PG should rebuild itself
> from the WAL file in the event of system failure. Have to admit I've not
> tried it myself yet.

A tenfold increase in insert speed over what I have now would make me a very happy man. That would put me in the order of magnitude of speed I need with the hardware I have now.

> > Oh, and if any of you SQL gurus are still reading I'll show you the
> > queries I will be running to see if I can really ditch an index.
>
> > select * from table where customer_id = ? and primary_key = ?::int8
>
> Well, if primary_key really is a primary key, you don't need customer_id.

True.

> > select * from table where customer_id = ? and group_number = ?

The groups are less than 10 objects each, so the customer index here does no good.

> > select * from table where customer_id = ? and creation > ? and creation < ?

This is where the customer index comes in handy. I'll explain below.

> How many customers do you have? Are the customers going to want a
> specific time period? If so, you could have one table per week (say) and
> use a view to conceal the fact for the occasional query that crosses week
> boundaries.
>
> If these are actually going to be GROUP BY queries totalling figures,
> perhaps try calculating totals beforehand.
>
> If these are representative of your main queries, I'd be tempted to buy
> 10 cheap machines (+ a spare) and split the customers among the machines.
> Proxy this stuff at the application level and they'll never know. For
> your management stats you'd have to write a script to summarise stuff
> from several machines, but that shouldn't be too difficult. The spare
> machine can be over-specced and have the data from all the others ready
> for a hot swapover.
>
> The beauty of multiple machines is it should scale well (so long as you
> are sufficiently fascist about keeping the configs the same).

I have about 100 customers, but one of them is responsible for generating over half the database entries. Without the customer index, the queries of the smaller customers who have a dozen entries are slow, because the database is sorting through millions of entries from other customers. I suppose the thing to do is to get the big customer their own box and drop the customer index altogether.
What extra overhead do you have? Are there any triggers on the table? Can you run some insert tests on your main table, starting with no index and adding one index at a time, so you can see exactly what the effect of each index is?

Also, what are the specs on your SCSI RAID? Is it 0, 1, 0+1, or 5? How many drives? Which type of SCSI (SCSI-II, Ultra, Wide 160, etc.)?

Can you install 7.1beta5 in a separate dir and run a test on that?

Alex.

On Wed, 14 Mar 2001, Orion Henry wrote:

> adb wrote:
> >
> > Have you done any benchmarks with a prototype of your application.
> >
> > Based off of some of the numbers I've been seeing with my testing,
> > I would not be surprised if a single pIII 1ghz box with a decent disk
> > (ata 100 or scsi 160) would handle the load you describe and is way
> > cheaper than some big smp box. A dual pIII with a decent raid card
> > would get you even farther. Certainly these are not as expandable as
> > other options out there but they are worth a look.
> >
> > What I would be most concerned about is a table that grows by a few million
> > rows a week, that to me seems like a liability in itself since
> > maintenance on that table will get pretty slow after a few months.
> >
> > Alex.
>
> Right now I have a prototype running on a dual PIII 533 MHz with a scsi
> raid and I am getting 2 inserts / second maximum. (To be fair there is
> some overhead I did not mention on each insert but not a lot). Now if
> WAL or disabling fsync() can get me a 10x speedup that's 20/second which
> is almost what I need. Perhaps finding a way to drop some of my indexes
> and other overhead could get me to 30/second - my minimum.
>
> But yes - it seems that CPU power is the least of my worries. I just
> need really fast disks and 7.1 to come out soon.
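Alex's "add one index at a time" test can be sketched as a small timing loop. SQLite is used here so the sketch runs anywhere; the absolute numbers are meaningless, but the relative slowdown as each index is added is the point. All table and column names are guesses:

```python
import sqlite3
import time

INDEX_COLUMNS = ["group_number", "creation", "customer_id"]  # guessed names
ROWS = 5000

def time_inserts(n_indexes):
    """Time ROWS inserts with the first n_indexes secondary indexes present."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, "
                 "group_number INTEGER, creation TEXT, customer_id INTEGER)")
    for col in INDEX_COLUMNS[:n_indexes]:
        conn.execute(f"CREATE INDEX idx_{col} ON big_table ({col})")
    rows = [(i % 10, f"2001-03-{i % 28 + 1:02d}", i % 100) for i in range(ROWS)]
    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO big_table "
                         "(group_number, creation, customer_id) VALUES (?, ?, ?)",
                         rows)
    return time.perf_counter() - start

# 0, 1, 2, then 3 secondary indexes:
timings = {n: time_inserts(n) for n in range(len(INDEX_COLUMNS) + 1)}
for n, t in timings.items():
    print(f"{n} indexes: {ROWS / t:8.0f} inserts/sec")
```

Run against the real schema (and real PostgreSQL), the same loop would show exactly how much each of the four indexes costs per insert, which is what Alex is asking Orion to measure.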
adb wrote:
>
> Have you done any benchmarks with a prototype of your application?
>
> Based on some of the numbers I've been seeing with my testing,
> I would not be surprised if a single PIII 1GHz box with a decent disk
> (ATA 100 or SCSI 160) would handle the load you describe and is way
> cheaper than some big SMP box. A dual PIII with a decent RAID card
> would get you even farther. Certainly these are not as expandable as
> other options out there but they are worth a look.
>
> What I would be most concerned about is a table that grows by a few million
> rows a week, that to me seems like a liability in itself since
> maintenance on that table will get pretty slow after a few months.
>
> Alex.

Right now I have a prototype running on a dual PIII 533 MHz with a SCSI RAID and I am getting 2 inserts/second maximum. (To be fair, there is some overhead I did not mention on each insert, but not a lot.) Now if WAL or disabling fsync() can get me a 10x speedup, that's 20/second, which is almost what I need. Perhaps finding a way to drop some of my indexes and other overhead could get me to 30/second, my minimum.

But yes, it seems that CPU power is the least of my worries. I just need really fast disks and 7.1 to come out soon.
> > > * Is there an architecture best suited for this. Should I shell out the
> > > extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha?
> > > Quad Alpha or Quad UltraSparc?
> >
> > Again hearsay, but Sun's hardware has excellent I/O throughput and it's
> > difficult to get people to abandon it in favour of Intel once they've
> > tried it.
>
> What I was hoping to hear was something along the lines of "An int8
> index? Dear God! You better be using a 64-bit machine!" or "The only
> way you will be able to handle a load like that will be with
> UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly-level
> optimizations for the Alpha processors, they are the fastest!" From the
> sounds of it, what I need is lots of fast disks and to hell with the
> CPU. I might as well get an Athlon and a Network Appliance.

I have a Quad Xeon that I could do some testing for you, if you wanted. I'm awfully lazy, though, so you'll have to send me a schema and some test data. ;) I can give it a run with a single process, or with multiple simultaneous processes over the network.

steve

Alphas may be terrific (amazing, actually) for floating-point work, but for Postgres work, I'm not terribly impressed with them. A Compaq rep let us borrow a $25,000 dual Alpha for a week, and I pitted it in a severe stress test against our quad Xeon. At the time, because of a lack of money and an obscure bug, we had PII Xeon 400s in it, *without* the L2 cache (talk about cutting the hamstrings). The Compaq machine was only about 1.5 times faster than the Xeon. Now that we have PIII Xeon 700s in it (with the cache working), the machine has probably run us about $12,000 (including hardware RAID and the lot), and I'm sure that it would at least equal the dual Alpha now.

steve
> Right now I have a prototype running on a dual PIII 533 MHz with a scsi
> raid and I am getting 2 inserts / second maximum. (To be fair there is
> some overhead I did not mention on each insert but not a lot).

Two per second!? I can _type_ faster than that! :)

I just ran a simple benchmark on two different systems. With the following small table and two-column index, I timed the insertion of 10,000 rows.

Machine #1: Dual PIII/733, 512MB RAM, hardware RAID with 10K RPM Ultra3 SCSI drives. Running PostgreSQL 7.0.3 and RedHat 7.0. Additional CPU load was minimal, additional disk load was light. Elapsed time was 56 seconds, for an average rate of 182 insertions per second. (Ok, ok, keep the sex jokes to yourself, please. :)

Machine #2: Single Pentium 200 (non-MMX), 64MB RAM, 5400 RPM IDE drive. Running PostgreSQL 7.1beta5 and RedHat 6.2. Additional disk load was minimal; additional CPU load included continuous decoding of 192K MP3s, which is not such a trivial task for the ole' 200. Time was 44 seconds, for an average rate of 226 per second!

Wow, does WAL make so much of a difference as to make my two-generations-old, built-from-spare-parts scrap box faster than my latest and greatest production server? Yikes!

CREATE TABLE "thing" (
    "date"     date,
    "id"       integer,
    "addr"     text,
    "contract" integer,
    "type"     integer,
    "count"    integer
);

CREATE INDEX "thing_index" on "thing" using btree
    ( "date" "date_ops", "id" "int4_ops" );

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/
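A large part of the gap Alex is measuring comes from paying a synchronous flush on every commit; grouping many inserts into one transaction amortizes that cost, which is also why WAL (one sequential, batched flush) helps so much. A self-contained sketch of the effect, with SQLite on disk standing in for PostgreSQL (absolute numbers will differ wildly from the 2001 hardware in the thread):

```python
import os
import sqlite3
import tempfile
import time

def bench(one_commit_per_row, n=200):
    """Insert n rows either with a commit (and flush) per row, or in
    a single transaction with one commit at the end."""
    path = os.path.join(tempfile.mkdtemp(), "bench.db")
    conn = sqlite3.connect(path, isolation_level=None)  # manage txns by hand
    conn.execute("PRAGMA synchronous = FULL")           # flush on each commit
    conn.execute("CREATE TABLE thing (d TEXT, id INTEGER, addr TEXT)")
    start = time.perf_counter()
    if not one_commit_per_row:
        conn.execute("BEGIN")
    for i in range(n):
        conn.execute("INSERT INTO thing VALUES ('2001-03-14', ?, 'x')", (i,))
    if not one_commit_per_row:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

per_row = bench(one_commit_per_row=True)   # like autocommit + fsync per insert
one_txn = bench(one_commit_per_row=False)  # like batching inserts per commit
print(f"per-row commits: {per_row:.3f}s, single transaction: {one_txn:.3f}s")
```

On most disks the single-transaction run is dramatically faster, which is the same lever Orion can pull today by batching his log replays into one transaction, independent of the WAL upgrade.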
What you may have to do some research on is how to spread the disk writes around, as it sounds like you will be write-bound. WAL writes always go to the end of a file. From memory, RAID-4 is better than RAID-5 for this kind of write, but as the WAL will remain reasonably small in comparison to the db, it may be best to stripe and mirror disks for the best performance.

Is it possible to separate the index file from the database file and place these on separate disk sets and SCSI channels? This should give faster I/O and index generation (anyone?). What is the ideal block size on disk for use with PostgreSQL with your database (anyone?).

--
Ian Willis

-----Original Message-----
From: Orion Henry [mailto:orion@trustcommerce.com]
Sent: Thursday, 15 March 2001 11:43 AM
To: adb
Cc: pgsql-general@postgresql.org
Subject: [GENERAL] Re: Fast Inserts and Hardware Questions

adb wrote:
>
> Have you done any benchmarks with a prototype of your application.
>
> Based off of some of the numbers I've been seeing with my testing,
> I would not be surprised if a single pIII 1ghz box with a decent disk
> (ata 100 or scsi 160) would handle the load you describe and is way
> cheaper than some big smp box. A dual pIII with a decent raid card
> would get you even farther. Certainly these are not as expandable as
> other options out there but they are worth a look.
>
> What I would be most concerned about is a table that grows by a few million
> rows a week, that to me seems like a liability in itself since
> maintenance on that table will get pretty slow after a few months.
>
> Alex.

Right now I have a prototype running on a dual PIII 533 MHz with a scsi raid and I am getting 2 inserts / second maximum. (To be fair there is some overhead I did not mention on each insert but not a lot). Now if WAL or disabling fsync() can get me a 10x speedup that's 20/second which is almost what I need. Perhaps finding a way to drop some of my indexes and other overhead could get me to 30/second, my minimum.

But yes, it seems that CPU power is the least of my worries. I just need really fast disks and 7.1 to come out soon.
> > * Since most of what I am doing is inserts I will assume that the disk
> > will be my bottleneck over anything else. Is this correct?
>
> Yes.
>
> > * Will the 7.1 WAL save me, when it comes to insert times?
> > * I read something about disabling fsync() to speed up inserts. How
> > much will this speed things up? I would consider it as I COULD rebuild
> > lost data from my logs in the event of a system crash and one night a
> > year of panic on my part is worth saving $100,000 in drive arrays.
>
> Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the
> WAL stuff means you can do so safely. Basically PG should rebuild itself
> from the WAL file in the event of system failure. Have to admit I've not
> tried it myself yet.

No. You should get good performance in 7.1 without using -F. Using -F in 7.1 makes the system unreliable in case of an OS crash.

--
Bruce Momjian                         |  http://candle.pha.pa.us
pgman@candle.pha.pa.us                |  (610) 853-3000
+  If your life is a hard drive,      |  830 Blythe Avenue
+  Christ can be your backup.         |  Drexel Hill, Pennsylvania 19026
In article <3AAFC1C0.108A493@trustcommerce.com>, "Orion Henry" <orion@trustcommerce.com> wrote:

> I am specing out a database for my company and I need some advice. The
> database in mind will consist of one table, with about 300 bytes per
> record. The table will be getting two or three million inserts a week
> and it would be nice if it could handle a sustained 30 to 50 a second.

[snip]

> The customers want to be able to query their data whenever and have it
> be snappy.

How many weeks' worth of data will you be keeping? How many customers at a time will be querying?

> So here is my question:
> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.
> * Is there an architecture best suited for this. Should I shell out the
> extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha?
> Quad Alpha or Quad UltraSparc?

Linux should be fine. 32-bit will also be fine. The real question is how vital the data is. If it's important, you'll want RAID. You'll want 64-bit PCI, too. I'd go with a Tier-1 vendor, such as IBM, Compaq, or Dell (in that order).

> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else. Is this correct?
> * Will the 7.1 WAL save me, when it comes to insert times?

I haven't pounded on 7.1, so I can't say. What I can say is that your RAID level will affect this quite a bit. RAID-5 is the cheapest way to go, but also the slowest. IBM offers RAID-5E, which gives about a 20% improvement over RAID-5 at the cost of one more disk. Best performance, though, is with RAID 1+0 or 0+1. As you might have guessed, it's the most expensive, too.

If you have any boxen available to test on, I'd suggest doing some runs with simulated data driven from multiple clients. That would give you a good feel for it. If you have any "hefty" boxen, so much the better.

Re: your indexes. It looks like the first query would best optimize on primary_key, so it looks covered.
What is the relationship (if any) between customers and groups? If the third query is being run a lot, clustering on (creation_date, customer_id) would help. The downside is that it will probably be the worst for inserts. Which takes us back to running some largish tests using realistic data (you'd be aiming to have roughly the same selectivity in your test data as your real-world app will have).

Gordon.
--
It doesn't get any easier, you just go faster.
-- Greg LeMond
I'm a big fan of commodity hardware however I think that some of the newer suns and alphas have a crossbar based memory compared to a bus which if for db can be a big win. Also having the cache disabled can sometimes benefit applications like dbs doing large queries. -- Ian Willis -----Original Message----- From: Steve Wolfe [mailto:steve@iboats.com] Sent: Thursday, 15 March 2001 12:21 PM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Re: Fast Inserts and Hardware Questions > > > * Is there an architecture best suited for this. Should I shell out the > > > extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha? > > > Quad Alpha or Quad UltraSparc? > > > > Again heresay, but Sun's hardware has excellent I/O throughput and it's > > difficult to get people to abandon it in favour of Intel once they've > > tried it. > > What I was hoping to hear was something along the line of "An int8 > index? Dear God! You better be using a 64 bit machine!" or "The only > way you will be able to handle a load like that will be with > UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly level > optimizations for the Alpha processors, they are the fastest!" From the > sounds of it what I need is lots of fast disks and to hell with the > CPU. I might as well get an Athlon and a Network Appliance. I have a Quad Xeon that I could do some testing for you, if you wanted. I'm awfully lazy, though, so you'll have to send me a schema and some test data. ; ) I can give it a run with a single process, or with multiple simultaneous processes over the network. steve Alphas may be terrific (amazing, actually) for floating-point work, but for Postgres work, I'm not terribly impressed with them. A Compaq rep let us borrow a $25,000 dual-alpha for a week, and I pitted in in a severe stress-test against our quad Xeon. At the time, because of a lack of money and an obscure bug, we had PII Xeon 400's in it, *without* the L2 cache - talk about cutting the hamstrings. 
The Compaq machine was only about 1.5 times faster than the Xeon. Now that we have PIII Xeon 700s in it (with the cache working), the machine has probably run us about $12,000 (including hardware RAID and the lot), and I'm sure that it would at least equal the dual Alpha now.

steve

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On Wed, Mar 14, 2001 at 11:08:48AM -0800, Orion Henry wrote:
> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else. Is this correct?

Recent personal experience has sold me on hardware RAID controllers and lots of RAM. Dell's relatively low-end PERC/2 (which is an AMI MegaRAID) on FreeBSD improved our create-and-populate scripts by at least an order of magnitude.
--
Christopher Masto        Senior Network Monkey        NetMonger Communications
chris@netmonger.net      info@netmonger.net           http://www.netmonger.net

Free yourself, free your machine, free the daemon -- http://www.freebsd.org/
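Independent of hardware, create-and-populate scripts often gain an order of magnitude simply by batching many inserts into one transaction, so the commit (and its disk flush) happens once per batch instead of once per row. A minimal sketch of the batching pattern, using SQLite so it runs anywhere (table name and sizes are invented to match the ~300-byte records in the original question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(i, "x" * 300) for i in range(10_000)]  # ~300 bytes per record

# One explicit transaction around the whole batch: the database only has
# to make the data durable once at COMMIT, not once per INSERT.
with conn:
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

count = conn.execute("SELECT count(*) FROM records").fetchone()[0]
print(count)  # 10000
```

In Postgres the same idea is BEGIN ... many INSERTs ... COMMIT, or better still a bulk COPY.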
In article <Pine.LNX.4.30.0103141957490.14894-100000@net-srv-0001.bvrd.com>, "Alex Howansky" <alex@wankwood.com> wrote:

[Interesting stats]

> Wow, does WAL make so much of a difference as to make my
> two-generations-old, built-from-spare-parts scrap box faster than my
> latest and greatest production server? Yikes!

What RAID level are you running? RAID-5 isn't always very fast on writes. That and WAL might be what's up. Have you tried 7.1b5 on the big box?

Gordon.
--
It doesn't get any easier, you just go faster. -- Greg LeMond
> [Interesting stats]
>
> > Wow, does WAL make so much of a difference as to make my
> > two-generations-old, built-from-spare-parts scrap box faster than my
> > latest and greatest production server? Yikes!
>
> What RAID level are you running? RAID-5 isn't always very
> fast on writes. That and WAL might be what's up.

Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare RAID-equipped boxes sitting around, so I can't experiment with the different RAID levels. Still, you'd think that even a "slow" RAID-5 configuration would be faster than a $98 IDE drive...

> Have you tried 7.1b5 on the big box?

Not yet, I'm waiting for a quiet weekend. I'll post my stats when (if...) I get around to doing that.

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/
> Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare RAID
> equipped boxes sitting around, so I can't experiment with the different RAID
> levels. Still, you'd think that even a "slow" RAID-5 configuration would be
> faster than a $98 IDE drive...

Yes, it certainly should be. Right now I have a Mylex 170 in my machine for testing, hooked to four IBM 9-gig drives. Three of them are in a RAID 5 array; the last is a hot spare. Copying data from the IDE drive to the RAID array, the IDE drive reads at full speed, while the lights on the RAID array just blink quickly about once per second. The controller has 64 megs of cache on it, but I've copied far larger data sets than that (several gigabytes), and the behavior has been the same. So... yes, RAID 5 is slower than RAID 0 or 1 for writes. But it's still dang fast, especially compared to a single IDE drive.

steve
On Fri, Mar 16, 2001 at 03:53:22PM -0600, Alex Howansky wrote:
> levels. Still, you'd think that even a "slow" RAID-5 configuration would be
> faster than a $98 IDE drive...

I wouldn't.

mrc
--
Mike Castle              Life is like a clock: You can work constantly
dalgoda@ix.netcom.com    and be right all the time, or not work at all
www.netcom.com/~dalgoda/ and be right at least twice a day. -- mrc

We are all of us living in the shadow of Manhattan. -- Watchmen
It all depends on the number of drives, the type of drives, and whether RAID is being done in hardware or software. A three-drive RAID 5 array in software with older drives is probably going to be slower than a single 7200 RPM ATA100 drive. Not that this is what you have; just pointing out that it's possible.

Alex.

On Fri, 16 Mar 2001, Alex Howansky wrote:
> > [Interesting stats]
> >
> > > Wow, does WAL make so much of a difference as to make my
> > > two-generations-old, built-from-spare-parts scrap box faster than my
> > > latest and greatest production server? Yikes!
> >
> > What RAID level are you running? RAID-5 isn't always very
> > fast on writes. That and WAL might be what's up.
>
> Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare RAID
> equipped boxes sitting around, so I can't experiment with the different RAID
> levels. Still, you'd think that even a "slow" RAID-5 configuration would be
> faster than a $98 IDE drive...
>
> > Have you tried 7.1b5 on the big box?
>
> Not yet, I'm waiting for a quiet weekend. I'll post my stats when (if...) I get
> around to doing that.
>
> --
> Alex Howansky
> Wankwood Associates
> http://www.wankwood.com/
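The intuition that a small RAID-5 array can lose to a single drive on writes comes from the classic small-write penalty: each logical write costs four disk I/Os (read old data, read old parity, write new data, write new parity). A back-of-the-envelope model (textbook approximation, not measured numbers; the function and figures are illustrative only):

```python
# Textbook small-random-write cost model for common RAID levels.
def effective_write_iops(disk_iops: int, disks: int, level: int) -> int:
    if level == 0:
        return disk_iops * disks       # pure striping, no redundancy cost
    if level == 1:
        return disk_iops * disks // 2  # every write goes to both mirrors
    if level == 5:
        # read old data + read old parity + write new data + write new parity
        return disk_iops * disks // 4
    raise ValueError(f"unsupported RAID level: {level}")

# Three 100-IOPS drives in RAID 5 vs. one bare 100-IOPS drive:
print(effective_write_iops(100, 3, 5))  # 75 -- less than a single drive's 100
```

Controller write-back cache and the sequential, append-only pattern of WAL both blunt this penalty in practice, which is why measured results vary so much across this thread.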
> On Fri, Mar 16, 2001 at 03:53:22PM -0600, Alex Howansky wrote:
> > levels. Still, you'd think that even a "slow" RAID-5 configuration would be
> > faster than a $98 IDE drive...
>
> I wouldn't.

You'd be wrong. : ) I've also copied large amounts of data from an IDE drive to an old AMI MegaRAID controller with some old Quantum drives in a RAID 5 configuration. The RAID was still faster at writing than the relatively modern IDE drive could read. Not to mention that in an environment like a database server, where you may have many processes accessing the disk at once, a SCSI RAID array makes IDE look like a toy.

steve
On Fri, Mar 16, 2001 at 02:44:57PM -0800, adb wrote:
> and if raid is being done in hardware or software.

It's not surprising to see software RAID outperforming hardware RAID (both of modern vintage).

mrc
--
Mike Castle              Life is like a clock: You can work constantly
dalgoda@ix.netcom.com    and be right all the time, or not work at all
www.netcom.com/~dalgoda/ and be right at least twice a day. -- mrc

We are all of us living in the shadow of Manhattan. -- Watchmen
> It's not surprising to see software raid outperforming hardware raid (both
> of modern vintage).

That's provided, of course, that you have a good amount of free CPU cycles. On machines that are otherwise heavily CPU-loaded, software RAID has been (in my experience) horrible.

steve
The advantage of SCSI RAID over a new IDE drive may not be as big as you'd think in some situations. Where the writes are always at the tail of a file, as with WAL, IDE may perform really well, provided these are the only writes occurring on the disk. The heads will always be in the right place for the write, so latency is just spin time, not positioning time. Compare this to all the writes occurring on a big, fast RAID system, where the system's writes land at different positions on the same disk set: seek time increases because of the multiplicity of writes in different areas, and even SCSI smarts with out-of-order writes can only do so much to optimize the process. Dedicating physical disk sets can eliminate some of the problems.

--
Ian Willis

-----Original Message-----
From: Alex Howansky [mailto:alex@wankwood.com]
Sent: Saturday, 17 March 2001 8:53 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Re: Re: Fast Inserts and Hardware Questions

> [Interesting stats]
>
> > Wow, does WAL make so much of a difference as to make my
> > two-generations-old, built-from-spare-parts scrap box faster than my
> > latest and greatest production server? Yikes!
>
> What RAID level are you running? RAID-5 isn't always very
> fast on writes. That and WAL might be what's up.

Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare RAID-equipped boxes sitting around, so I can't experiment with the different RAID levels. Still, you'd think that even a "slow" RAID-5 configuration would be faster than a $98 IDE drive...

> Have you tried 7.1b5 on the big box?

Not yet, I'm waiting for a quiet weekend. I'll post my stats when (if...) I get around to doing that.

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/
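The tail-of-file pattern described above is easy to see in miniature: a WAL-style log only ever appends and flushes once per batch, so the head never seeks between writes. A hypothetical sketch of that access pattern (file name and record format invented for illustration):

```python
import os
import tempfile

def append_records(path, records):
    """Append records sequentially and flush once at the end, the way a
    write-ahead log keeps every write at the tail of the file."""
    with open(path, "ab") as log:
        for rec in records:
            log.write(rec + b"\n")
        log.flush()
        os.fsync(log.fileno())  # a single durable flush for the whole batch

path = os.path.join(tempfile.mkdtemp(), "wal.log")
append_records(path, [b"insert id=%d" % i for i in range(3)])
print(os.path.getsize(path))
```

Disabling fsync() (as discussed earlier in the thread) removes even that one flush, trading crash safety for speed; putting the WAL on its own dedicated spindle gets much of the benefit without the risk.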
On Linux, software RAID will use the MMX instructions; can't these run in parallel with most other operations? Memory bandwidth may be the real killer.

--
Ian Willis
Systems Administrator
Division of Entomology
CSIRO
GPO Box 1700 Canberra ACT 2601
ph 02 6246 4391  fax 02 6246 4000

-----Original Message-----
From: Steve Wolfe [mailto:steve@iboats.com]
Sent: Saturday, 17 March 2001 10:48 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Re: Re: Fast Inserts and Hardware Questions

> It's not surprising to see software raid outperforming hardware raid (both
> of modern vintage).

That's provided, of course, that you have a good amount of free CPU cycles. On machines that are otherwise heavily CPU-loaded, software raid has been (in my experience) horrible.

steve