Re: Fast Inserts and Hardware Questions - Mailing list pgsql-general
From | Orion Henry
---|---
Subject | Re: Fast Inserts and Hardware Questions
Msg-id | 3AB00E01.6C4F7333@trustcommerce.com
In response to | Fast Inserts and Hardware Questions (Orion Henry &lt;orion@trustcommerce.com&gt;)
List | pgsql-general
Richard H wrote:
> On 3/14/01, 7:08:48 PM, Orion Henry <orion@trustcommerce.com> wrote
> regarding [GENERAL] Fast Inserts and Hardware Questions:
>
> Please bear in mind that you are operating well above anything I do, so
> I'm not speaking from experience.

I appreciate the advice in any case.

> > I am speccing out a database for my company and I need some advice.
> > The database in mind will consist of one table, with about 300 bytes
> > per record. The table will be getting two or three million inserts a
> > week and it would be nice if it could handle a sustained 30 to 50 a
> > second. The table will have (ACK!) 4 indexes (if anyone can explain
> > to me how I can get away with fewer, please let me know).
>
> Three million inserts a week is about 5 per second, so you want to
> sustain 10 times the average rate (sounds reasonable). Have you
> considered separating inserts from reads? The inserts wouldn't even
> need to go into a database initially; just log them to a file and feed
> them in at a steady 10 per second (allows for downtime).

Already done. It's just that the less time between getting the data and
it being available to the customers, the better we look.

> > The indexes will be
> >     int8 (primary key)
> >     int4 (group number)
> >     timestamp (creation date)
> >     int4 (customer id)
> >
> > The customers want to be able to query their data whenever and have
> > it be snappy.
>
> And they don't see why they need to pay so much for something so
> simple, either ;-)

Tell me about it ;)

> > So here is my question:
> >
> > * Is there an OS that is best suited for Postgres? All things being
> >   equal I would like to run this on Linux.
>
> Can't comment - I've only used PG on Linux. People who use the various
> xxxBSDs swear by them (rather than at them) and Solaris has some
> excellent support (at a cost).
>
> Of course, you could try MS-Access on Win-ME ;-)

Or... not. ;)

> > * Is there an architecture best suited for this? Should I shell out
> >   the extra cash for a 64-bit box over a 32-bit one? Quad Xeon or
> >   Quad Alpha? Quad Alpha or Quad UltraSparc?
>
> Again, hearsay, but Sun's hardware has excellent I/O throughput and
> it's difficult to get people to abandon it in favour of Intel once
> they've tried it.

What I was hoping to hear was something along the lines of "An int8
index? Dear God! You'd better be using a 64-bit machine!" or "The only
way you will be able to handle a load like that will be with
UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly-level
optimizations for the Alpha processors; they are the fastest!" From the
sounds of it, what I need is lots of fast disks, and to hell with the
CPU. I might as well get an Athlon and a Network Appliance.

> > * Since most of what I am doing is inserts, I will assume that the
> >   disk will be my bottleneck over anything else. Is this correct?
>
> Yes.
>
> > * Will the 7.1 WAL save me when it comes to insert times?
> > * I read something about disabling fsync() to speed up inserts. How
> >   much will this speed things up? I would consider it, as I COULD
> >   rebuild lost data from my logs in the event of a system crash, and
> >   one night a year of panic on my part is worth saving $100,000 in
> >   drive arrays.
>
> Turning fsync off gave about a factor of 10 increase (!) before 7.1 -
> the WAL stuff means you can do so safely. Basically PG should rebuild
> itself from the WAL file in the event of system failure. Have to admit
> I've not tried it myself yet.

A tenfold increase in insert speed over what I have now would make me a
very happy man. That would put me in the order of magnitude of speed I
need with the hardware I have now.

> > Oh, and if any of you SQL gurus are still reading, I'll show you the
> > queries I will be running to see if I can really ditch an index.
> >
> >     select * from table where customer_id = ? and primary_key = ?::int8
>
> Well, if primary_key is one, you don't need customer_id.

True.
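[For reference, the single table and four indexes described above might be declared roughly as follows. This is a sketch only; the table and column names are invented, not taken from the poster's actual schema:]

```sql
-- Illustrative schema; names are invented. Each row carries
-- roughly 300 bytes of payload in addition to the keyed columns.
CREATE TABLE records (
    id          int8 PRIMARY KEY,   -- index 1: primary key
    group_no    int4,
    created     timestamp,
    customer_id int4
);

CREATE INDEX records_group_idx    ON records (group_no);     -- index 2
CREATE INDEX records_created_idx  ON records (created);      -- index 3
CREATE INDEX records_customer_idx ON records (customer_id);  -- index 4
```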
> >     select * from table where customer_id = ? and group_number = ?

The groups are fewer than 10 objects each, so the customer index here
does no good.

> >     select * from table where customer_id = ? and creation > ? and creation < ?

This is where the customer index comes in handy. I'll explain below.

> How many customers do you have? Are the customers going to want a
> specific time period? If so, you could have one table per week (say)
> and use a view to conceal the fact for the occasional query that
> crosses week boundaries.
>
> If these are actually going to be GROUP BY queries totalling figures,
> perhaps try calculating totals beforehand.
>
> If these are representative of your main queries, I'd be tempted to buy
> 10 cheap machines (+ a spare) and split the customers among the
> machines. Proxy this stuff at the application level and they'll never
> know. For your management stats you'd have to write a script to
> summarise stuff from several machines, but that shouldn't be too
> difficult. The spare machine can be over-specced and have the data from
> all the others ready for a hot swap-over.
>
> The beauty of multiple machines is that it should scale well (so long
> as you are sufficiently fascist about keeping the configs the same).

I have about 100 customers, but one of them is responsible for
generating over half the database entries. Without the customer index,
the queries of the smaller customers who have a dozen entries are slow
because the database is sorting through millions of entries from other
customers. I suppose the thing to do is to get the big customer their
own box and drop the customer index altogether.
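[Richard's one-table-per-week suggestion could look something like the sketch below. Names are invented, and this assumes a PostgreSQL version whose views support UNION ALL; 7.x had no declarative partitioning, so the split is hidden behind a plain view:]

```sql
-- One physical table per week; the view conceals the split.
-- Table and index names are illustrative.
CREATE TABLE records_2001w11 (
    id          int8 PRIMARY KEY,
    group_no    int4,
    created     timestamp,
    customer_id int4
);
CREATE INDEX records_2001w11_cust ON records_2001w11 (customer_id, created);

CREATE TABLE records_2001w12 (
    id          int8 PRIMARY KEY,
    group_no    int4,
    created     timestamp,
    customer_id int4
);
CREATE INDEX records_2001w12_cust ON records_2001w12 (customer_id, created);

-- The occasional query that crosses a week boundary goes through here.
CREATE VIEW records AS
    SELECT * FROM records_2001w11
    UNION ALL
    SELECT * FROM records_2001w12;
```

A per-customer query for one week can then hit the small weekly table directly, while ad-hoc reporting uses the view.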