Re: Fast Inserts and Hardware Questions - Mailing list pgsql-general

From Gordon A. Runkle
Subject Re: Fast Inserts and Hardware Questions
Date
Msg-id 98p3rq$4of$1@news.tht.net
Whole thread Raw
In response to Fast Inserts and Hardware Questions  (Orion Henry <orion@trustcommerce.com>)
List pgsql-general
In article <3AAFC1C0.108A493@trustcommerce.com>, "Orion Henry"
<orion@trustcommerce.com> wrote:


> I am specing out a database for my company and I need some advice.  The
> database in mind will consist of one table, with about 300 bytes per
> record.  The table will be getting two or three million inserts a week
> and it would be nice if it could handle a sustained 30 to 50 a second.
[snip]
> The customers want to be able to query their data whenever and have it
> be snappy.

How many week's worth of data will you be keeping?

How many customers at a time will be querying?

> So here is my question:
> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.
> * Is there an architecture best
> suited for this.  Should I shell out the extra cash for a 64 bit box
> over a 32bit one.  Quad Xeon or Quad Alpha? Quad Alpha or Quad
> UltraSparc?

Linux should be fine.  32-bit will also be fine.  The real question
is how vital is the data.  If it's important, you'll want RAID.
You'll want 64-bit PCI, too.  I'd go with a Tier-1 vendor, such
as IBM, Compaq, or Dell (in that order).

> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else.  Is this correct? * Will the
> 7.1 WAL save me, when it comes to insert times?

I haven't pounded on 7.1, so I can't say.  What I can say is that
your RAID will affect this quite a bit.  RAID-5 is the cheapest
way to go, but also the slowest.  IBM offers RAID-5E, which gives
about 20% improvement over RAID-5 at the cost of one more disk.
Best performance though is with RAID 1+0 or 0+1.  As you might
have guessed, it's the most expensive, too.

If you have any boxen available to test on, I'd suggest doing
some runs with simulated data driven from multiple clients.  That
would give you a good feel for it.  If you have any "hefty" boxen,
so much the better.

Re: your indexes.  It looks like the first would best optimize
on primary_key, so it looks covered.

What is the relationship (if any) between customers and groups?

If the third query is being run a lot, clustering on creation_date,
customer_id would help.  The downside is will probably be the
worst for inserts.

Which would take us back to running some largish tests using
realistic data (you'd be aiming to have roughly the same selectivity
in your test data as your real-world app will have).

Gordon.
--
It doesn't get any easier, you just go faster.
   -- Greg LeMond

pgsql-general by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Fast Inserts and Hardware Questions
Next
From: "Willis, Ian (Ento, Canberra)"
Date:
Subject: RE: Re: Fast Inserts and Hardware Questions