Thread: Fast Inserts and Hardware Questions

Fast Inserts and Hardware Questions

From
Orion Henry
Date:
I am specing out a database for my company and I need some advice.  The
database in mind will consist of one table, with about 300 bytes per
record.  The table will be getting two or three million inserts a week
and it would be nice if it could handle a sustained 30 to 50 a second.
The table will have (ACK!) 4 indexes (if anyone can explain to me how I
can get away with less please let me know)

The indexes will be
int8 (primary key)
int4 (group number)
timestamp (creation date)
int4 (customer id)

The customers want to be able to query their data whenever and have it
be snappy.

So here is my question:
* Is there an OS that is best suited for postgres. All things being
equal I would like to run this on Linux.
* Is there an architecture best suited for this.  Should I shell out the
extra cash for a 64 bit box over a 32bit one.  Quad Xeon or Quad Alpha?
Quad Alpha or Quad UltraSparc?
* Since most of what I am doing is inserts I will assume that the disk
will be my bottleneck over anything else.  Is this correct?
* Will the 7.1 WAL save me, when it comes to insert times?
* I read something about disabling fsync() to speed up inserts.  How
much will this speed things up?  I would consider it as I COULD rebuild
lost data from my logs in the event of a system crash and one night a
year of panic on my part is worth saving $100,000 in drive arrays.

Oh, and if any of you SQL guru's are still reading I'll show you the
queries I will be running to see if I can really ditch an index.

select * from table where customer_id = ? and primary_key = ?::int8
select * from table where customer_id = ? and group_number = ?
select * from table where customer_id = ? and creation > ? and creation
< ?

Thanks for all your help,

    Orion Henry
    CTO TrustCommerce
    orion@trustcommerce.com

AGE() function

From
Michael Fork
Date:
Can someone tell me what I am missing here...

Thanks

Michael Fork - CCNA - MCP - A+
Network Support - Toledo Internet Access - Toledo Ohio

radius=# BEGIN;
BEGIN
radius=# select age(tstamp, now()) from radacct limit 1;
     age
--------------
 03:37:08 ago
(1 row)

radius=# select age(now(), tstamp) from radacct limit 1;
   age
----------
 03:37:08
(1 row)

radius=# SELECT count(*) FROM radacct WHERE age(now(), tstamp) > '6
months'::interval;
 count
--------
 128378
(1 row)

-- Up until this points everything makes sense, however what follows
-- does not

radius=# SELECT count(*) FROM radacct WHERE age(tstamp, now()) > '6 months
ago'::interval;
  count
---------
 1988641
(1 row)

-- Shouldn't this be equal to the previous query (flipped the arguments
-- and added 'ago'

radius=# SELECT count(*) FROM radacct WHERE age(tstamp, now()) > '6
months'::interval;
 count
-------
     0
(1 row)

-- Since the previous query didn't work, I thought that this would return
-- the right number of rows, but I was wrong



Re: Fast Inserts and Hardware Questions

From
Peter Eisentraut
Date:
Orion Henry writes:

> The indexes will be
> int8 (primary key)
> int4 (group number)
> timestamp (creation date)
> int4 (customer id)

Since one query can only use one index per table, you should only need the
customer_id index, given the queries you listed.

> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.
> * Is there an architecture best suited for this.  Should I shell out the
> extra cash for a 64 bit box over a 32bit one.  Quad Xeon or Quad Alpha?
> Quad Alpha or Quad UltraSparc?
> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else.  Is this correct?

Probably.  The file system is also going to be relevant.  Linux' ext2fs is
not the best possible choice here.  The CPU is probably going to be the
least of your problems.

> * Will the 7.1 WAL save me, when it comes to insert times?

To be expected.

> * I read something about disabling fsync() to speed up inserts.  How
> much will this speed things up?  I would consider it as I COULD rebuild
> lost data from my logs in the event of a system crash and one night a
> year of panic on my part is worth saving $100,000 in drive arrays.

Disabling fsync can be considered obsolete with 7.1 WAL.

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/


Re: AGE() function

From
Stephan Szabo
Date:
On Wed, 14 Mar 2001, Michael Fork wrote:

> radius=# SELECT count(*) FROM radacct WHERE age(now(), tstamp) > '6
> months'::interval;
>  count
> --------
>  128378
> (1 row)
>
> -- Up until this points everything makes sense, however what follows
> -- does not
>
> radius=# SELECT count(*) FROM radacct WHERE age(tstamp, now()) > '6 months
> ago'::interval;
>   count
> ---------
>  1988641
> (1 row)
>
> -- Shouldn't this be equal to the previous query (flipped the arguments
> -- and added 'ago'

I'd guess that since 6 months ago is effectively a negative interval,
wouldn't you want to be comparing  <'6 months ago'::interval for the
same effect?
(a-b>c -> b-a<-c or something)


Re: Fast Inserts and Hardware Questions

From
Orion Henry
Date:
Peter Eisentraut wrote:
>
> Orion Henry writes:
>
> > The indexes will be
> > int8 (primary key)
> > int4 (group number)
> > timestamp (creation date)
> > int4 (customer id)
>
> Since one query can only use one index per table, you should only need the
> customer_id index, given the queries you listed.

Isn't that was multi-column indexes are for?

I have 100's of customers but millions of database entries.  If I have
to I'll make a table for each customer, big_table_5 and big_table_6
etc... but this seems to be a terrible hack and the database should be
able to do things like this for me.

Orion Henry
CTO TrustCommerce
orion@trustcommerce.com

Re: Fast Inserts and Hardware Questions

From
Richard H
Date:
On 3/14/01, 7:08:48 PM, Orion Henry <orion@trustcommerce.com> wrote
regarding [GENERAL] Fast Inserts and Hardware Questions:

Please bear in mind that you are operating well above anything I do, so
I'm not speaking from experience.

> I am specing out a database for my company and I need some advice.  The
> database in mind will consist of one table, with about 300 bytes per
> record.  The table will be getting two or three million inserts a week
> and it would be nice if it could handle a sustained 30 to 50 a second.
> The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> can get away with less please let me know)

Three million inserts a week is about 5 per second, so you want to
sustain 10 times the average rate (sounds reasonable). Have you
considered separating inserts from reads? The inserts wouldn't even need
to go into a database initially, just log them to a file and feed them in
at a steady 10 per second (allows for downtime).

> The indexes will be
> int8 (primary key)
> int4 (group number)
> timestamp (creation date)
> int4 (customer id)

> The customers want to be able to query their data whenever and have it
> be snappy.

And they don't see why they need to pay so much for something so simple,
either ;-)

> So here is my question:
> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.

Can't comment - I've only used PG on Linux. People who use the various
xxxBSDs swear by them (rather than at them) and Solaris has some
excellent support (at a cost)

Of course, you could try MS-Access on Win-ME ;-)

> * Is there an architecture best suited for this.  Should I shell out the
> extra cash for a 64 bit box over a 32bit one.  Quad Xeon or Quad Alpha?
> Quad Alpha or Quad UltraSparc?

Again heresay, but Sun's hardware has excellent I/O throughput and it's
difficult to get people to abandon it in favour of Intel once they've
tried it.

> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else.  Is this correct?

Yes.

> * Will the 7.1 WAL save me, when it comes to insert times?
> * I read something about disabling fsync() to speed up inserts.  How
> much will this speed things up?  I would consider it as I COULD rebuild
> lost data from my logs in the event of a system crash and one night a
> year of panic on my part is worth saving $100,000 in drive arrays.

Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the
WAL stuff means you can do so safely. Basically PG should rebuild itself
from the WAL file in the event of system failure. Have to admit I've not
tried it myself yet.

> Oh, and if any of you SQL guru's are still reading I'll show you the
> queries I will be running to see if I can really ditch an index.

> select * from table where customer_id = ? and primary_key = ?::int8

Well if primary_key is one, you don't need customer_id.

> select * from table where customer_id = ? and group_number = ?
> select * from table where customer_id = ? and creation > ? and creation
> < ?

How many customers do you have? Are the customers going to want a
specific time period? If so, you could have one table per week (say) and
use a view to conceal the fact for the occasional query that crosses week
boundaries.

If these are actually going to by GROUP BY queries totalling figures,
perhaps try calculating totals beforehand.

If these are representative of your main queries, I'd be tempted to buy
10 cheap machines (+ a spare) and split the customers among the machines.
Proxy this stuff at the application level and they'll never know. For
your management stats you'd have to write a script to summarise stuff
from several machines, but that shouldn't be too difficult. The spare
machine can be over-specced and have the data from all the other ready
for a hot-swapover.

The beauty of multiple machines is it should scale well (so long as you
are sufficiently fascist about keeping the configs the same).

> Thanks for all your help,

>       Orion Henry

Hope it was

- Richard Huxton

Re: Fast Inserts and Hardware Questions

From
Peter Eisentraut
Date:
Orion Henry writes:

> > Since one query can only use one index per table, you should only need the
> > customer_id index, given the queries you listed.
>
> Isn't that was multi-column indexes are for?
>
> I have 100's of customers but millions of database entries.  If I have
> to I'll make a table for each customer, big_table_5 and big_table_6
> etc... but this seems to be a terrible hack and the database should be
> able to do things like this for me.

Multi-column indices may be appropriate.  You will have to do some
experimenting with actual data to determine the selectivity of the
queries.  Indices only make sense for queries that retrieve less than
about 5% of the data (with lots of other deciding factors involved).
Since your application seems to be insert heavy you should try to avoid
too many and too complicated indices.

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/


Re: Fast Inserts and Hardware Questions

From
adb
Date:
Have you done any benchmarks with a prototype of your application.

Based off of some of the numbers I've been seeing with my testing,
I would not be surprised if a single pIII 1ghz box with a decent disk
(ata 100 or scsi 160) would handle the load you describe and is way
cheaper than some big smp box.  A dual pIII with a decent raid card
would get you even farther.  Certainly these are not as expandable as
other options out there but they are worth a look.

What I would be most concered about is a table that grows by a few million
rows a week, that to me seems like a liability in itself since
maintenance on that table will get pretty slow after a few months.

Alex.


On Wed, 14 Mar 2001, Orion Henry wrote:

> I am specing out a database for my company and I need some advice.  The
> database in mind will consist of one table, with about 300 bytes per
> record.  The table will be getting two or three million inserts a week
> and it would be nice if it could handle a sustained 30 to 50 a second.
> The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> can get away with less please let me know)
>
> The indexes will be
> int8 (primary key)
> int4 (group number)
> timestamp (creation date)
> int4 (customer id)
>
> The customers want to be able to query their data whenever and have it
> be snappy.
>
> So here is my question:
> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.
> * Is there an architecture best suited for this.  Should I shell out the
> extra cash for a 64 bit box over a 32bit one.  Quad Xeon or Quad Alpha?
> Quad Alpha or Quad UltraSparc?
> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else.  Is this correct?
> * Will the 7.1 WAL save me, when it comes to insert times?
> * I read something about disabling fsync() to speed up inserts.  How
> much will this speed things up?  I would consider it as I COULD rebuild
> lost data from my logs in the event of a system crash and one night a
> year of panic on my part is worth saving $100,000 in drive arrays.
>
> Oh, and if any of you SQL guru's are still reading I'll show you the
> queries I will be running to see if I can really ditch an index.
>
> select * from table where customer_id = ? and primary_key = ?::int8
> select * from table where customer_id = ? and group_number = ?
> select * from table where customer_id = ? and creation > ? and creation
> < ?
>
> Thanks for all your help,
>
>     Orion Henry
>     CTO TrustCommerce
>     orion@trustcommerce.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


Re: Fast Inserts and Hardware Questions

From
Orion Henry
Date:
Richard H wrote:
>
> On 3/14/01, 7:08:48 PM, Orion Henry <orion@trustcommerce.com> wrote
> regarding [GENERAL] Fast Inserts and Hardware Questions:
>
> Please bear in mind that you are operating well above anything I do, so
> I'm not speaking from experience.

I appricate the advice in anycase.

> > I am specing out a database for my company and I need some advice.  The
> > database in mind will consist of one table, with about 300 bytes per
> > record.  The table will be getting two or three million inserts a week
> > and it would be nice if it could handle a sustained 30 to 50 a second.
> > The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> > can get away with less please let me know)
>
> Three million inserts a week is about 5 per second, so you want to
> sustain 10 times the average rate (sounds reasonable). Have you
> considered separating inserts from reads? The inserts wouldn't even need
> to go into a database initially, just log them to a file and feed them in
> at a steady 10 per second (allows for downtime).

Already done.  It's just that the less time between getting the data and
it being avaliable to the customers the better we look.

> > The indexes will be
> > int8 (primary key)
> > int4 (group number)
> > timestamp (creation date)
> > int4 (customer id)
>
> > The customers want to be able to query their data whenever and have it
> > be snappy.
>
> And they don't see why they need to pay so much for something so simple,
> either ;-)

Tell me about it ;)

> > So here is my question:
> > * Is there an OS that is best suited for postgres. All things being
> > equal I would like to run this on Linux.
>
> Can't comment - I've only used PG on Linux. People who use the various
> xxxBSDs swear by them (rather than at them) and Solaris has some
> excellent support (at a cost)
>
> Of course, you could try MS-Access on Win-ME ;-)

Or... not. ;)

> > * Is there an architecture best suited for this.  Should I shell out the
> > extra cash for a 64 bit box over a 32bit one.  Quad Xeon or Quad Alpha?
> > Quad Alpha or Quad UltraSparc?
>
> Again heresay, but Sun's hardware has excellent I/O throughput and it's
> difficult to get people to abandon it in favour of Intel once they've
> tried it.

What I was hoping to hear was something along the line of "An int8
index?  Dear God!  You better be using a 64 bit machine!"  or  "The only
way you will be able to handle a load like that will be with
UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly level
optimizations for the Alpha processors, they are the fastest!"  From the
sounds of it what I need is lots of fast disks and to hell with the
CPU.  I might as well get an Athlon and a Network Appliance.

> > * Since most of what I am doing is inserts I will assume that the disk
> > will be my bottleneck over anything else.  Is this correct?
>
> Yes.
>
> > * Will the 7.1 WAL save me, when it comes to insert times?
> > * I read something about disabling fsync() to speed up inserts.  How
> > much will this speed things up?  I would consider it as I COULD rebuild
> > lost data from my logs in the event of a system crash and one night a
> > year of panic on my part is worth saving $100,000 in drive arrays.
>
> Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the
> WAL stuff means you can do so safely. Basically PG should rebuild itself
> from the WAL file in the event of system failure. Have to admit I've not
> tried it myself yet.

A tenfold increase in insert speed over what I have now would make me a
very happy man.  That would put me in the order of magnitude of speed I
need with the hardware I have now.

> > Oh, and if any of you SQL guru's are still reading I'll show you the
> > queries I will be running to see if I can really ditch an index.
>
> > select * from table where customer_id = ? and primary_key = ?::int8
>
> Well if primary_key is one, you don't need customer_id.

true.

> > select * from table where customer_id = ? and group_number = ?

The groups are less than 10 objects each so ther customer index here
does no good.

> > select * from table where customer_id = ? and creation > ? and creation < ?

This is where the customer index comes in handy.  I'll explain below.

> How many customers do you have? Are the customers going to want a
> specific time period? If so, you could have one table per week (say) and
> use a view to conceal the fact for the occasional query that crosses week
> boundaries.
>
> If these are actually going to by GROUP BY queries totalling figures,
> perhaps try calculating totals beforehand.
>
> If these are representative of your main queries, I'd be tempted to buy
> 10 cheap machines (+ a spare) and split the customers among the machines.
> Proxy this stuff at the application level and they'll never know. For
> your management stats you'd have to write a script to summarise stuff
> from several machines, but that shouldn't be too difficult. The spare
> machine can be over-specced and have the data from all the other ready
> for a hot-swapover.
>
> The beauty of multiple machines is it should scale well (so long as you
> are sufficiently fascist about keeping the configs the same).

I have about 100 customers but one of them is repsonsible for generating
over half the database entries.  Without the customer index the queries
of the smaller customers who have a dozen entries are slow becasue the
database is sorting through millions of entries from other customers.  I
suppose the thing to do is to get the big customer their own box and
drop the customer index all together.

Re: Fast Inserts and Hardware Questions

From
adb
Date:
What extra overhead do you have?  Are there any triggers on the table?

Can you run some insert tests on your main table, starting with no
index and adding one index at a time so you can see exactly what the
effect of each index is.

Also what are the specs on your scsi raid?  is it 0, 1, 0+1 or 5?
How many drives? which type of scsi, (scsi-II, ultra, wide 160, etc...)

Can you install 7.1beta5 in a separate dir and run a test on that?

Alex.

On Wed, 14 Mar 2001, Orion Henry wrote:

> adb wrote:
> >
> > Have you done any benchmarks with a prototype of your application.
> >
> > Based off of some of the numbers I've been seeing with my testing,
> > I would not be surprised if a single pIII 1ghz box with a decent disk
> > (ata 100 or scsi 160) would handle the load you describe and is way
> > cheaper than some big smp box.  A dual pIII with a decent raid card
> > would get you even farther.  Certainly these are not as expandable as
> > other options out there but they are worth a look.
> >
> > What I would be most concered about is a table that grows by a few million
> > rows a week, that to me seems like a liability in itself since
> > maintenance on that table will get pretty slow after a few months.
> >
> > Alex.
>
> Right now I have a prototype running on a dual PIII 533 MHz with a scsi
> raid and I am getting 2 inserts / second maximum. (To be fair there is
> some overhead I did not mention on each insert but not a lot). Now if
> WAL or disabling fsync() can get me a 10x speedup that's 20/second which
> is almost what I need.  Perhaps finding a way to drop some of my index's
> and other overhead could get me to 30/second - my minimum.
>
> But yes - it seems that CPU power is the least of my worries.  I just
> need really fast disks and 7.1 to come out soon.
>


Re: Fast Inserts and Hardware Questions

From
Orion Henry
Date:
adb wrote:
>
> Have you done any benchmarks with a prototype of your application.
>
> Based off of some of the numbers I've been seeing with my testing,
> I would not be surprised if a single pIII 1ghz box with a decent disk
> (ata 100 or scsi 160) would handle the load you describe and is way
> cheaper than some big smp box.  A dual pIII with a decent raid card
> would get you even farther.  Certainly these are not as expandable as
> other options out there but they are worth a look.
>
> What I would be most concered about is a table that grows by a few million
> rows a week, that to me seems like a liability in itself since
> maintenance on that table will get pretty slow after a few months.
>
> Alex.

Right now I have a prototype running on a dual PIII 533 MHz with a scsi
raid and I am getting 2 inserts / second maximum. (To be fair there is
some overhead I did not mention on each insert but not a lot). Now if
WAL or disabling fsync() can get me a 10x speedup that's 20/second which
is almost what I need.  Perhaps finding a way to drop some of my index's
and other overhead could get me to 30/second - my minimum.

But yes - it seems that CPU power is the least of my worries.  I just
need really fast disks and 7.1 to come out soon.

Re: Re: Fast Inserts and Hardware Questions

From
"Steve Wolfe"
Date:
> > > * Is there an architecture best suited for this.  Should I shell out
the
> > > extra cash for a 64 bit box over a 32bit one.  Quad Xeon or Quad
Alpha?
> > > Quad Alpha or Quad UltraSparc?
> >
> > Again heresay, but Sun's hardware has excellent I/O throughput and it's
> > difficult to get people to abandon it in favour of Intel once they've
> > tried it.
>
> What I was hoping to hear was something along the line of "An int8
> index?  Dear God!  You better be using a 64 bit machine!"  or  "The only
> way you will be able to handle a load like that will be with
> UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly level
> optimizations for the Alpha processors, they are the fastest!"  From the
> sounds of it what I need is lots of fast disks and to hell with the
> CPU.  I might as well get an Athlon and a Network Appliance.

     I have a Quad Xeon that I could do some testing for you, if you wanted.
I'm awfully lazy, though, so you'll have to send me a schema and some test
data. ; )  I can give it a run with a single process, or with multiple
simultaneous processes over the network.

steve

    Alphas may be terrific (amazing, actually) for floating-point work, but
for Postgres work, I'm not terribly impressed with them.  A Compaq rep let
us borrow a $25,000 dual-alpha for a week, and I pitted in in a severe
stress-test against our quad Xeon.  At the time, because of a lack of money
and an obscure bug, we had PII Xeon 400's in it, *without* the L2 cache -
talk about cutting the hamstrings.  The Compaq machine was only about 1.5
times faster than the Xeon.  Now that we have PIII Xeon700's in it (with the
cache working), the machine has probably run us about $12,000 (including
hardware RAID and the lot), and I'm sure that it would at least equal the
dual Alpha now.

steve



Re: Fast Inserts and Hardware Questions

From
Alex Howansky
Date:
> Right now I have a prototype running on a dual PIII 533 MHz with a scsi
> raid and I am getting 2 inserts / second maximum. (To be fair there is
> some overhead I did not mention on each insert but not a lot).

Two per second!? I can _type_ faster than that! :)

I just ran a simple benchmark on two different systems. With the following
small table and two-column index, I timed the insertion of 10,000 rows.

Machine #1: Dual PIII/733, 512Mb RAM, hardware RAID with 10K RPM Ultra3 SCSI
drives. Running PostgreSQL 7.0.3 and RedHat 7.0. Additional CPU load was
minimal, additional disk load was light. Elapsed time was 56 seconds, for an
average rate of 182 insertions per second. (Ok, ok, keep the sex jokes to
youself, please. :)

Machine #2: Single Pentium 200 (non-MMX), 64Mb RAM, 5400 RPM IDE drive. Running
PostgreSQL 7.1beta5 and RedHat 6.2. Additional diskload was minimal, additional
CPU load included continuous decoding of 192K MP3s, which is not such a trivial
task for the ole' 200. Time was 44 seconds, for an average rate of 226 per
second!

Wow, does WAL make so much of a difference as to make my two-generations-old,
built-from-spare-parts scrap box faster than my latest and greatest production
server? Yikes!

CREATE TABLE "thing" (
        "date" date,
        "id" integer,
        "addr" text,
        "contract" integer,
        "type" integer,
        "count" integer
);

CREATE INDEX "thing_index" on "thing" using btree ( "date" "date_ops", "id" "int4_ops" );

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/


RE: Re: Fast Inserts and Hardware Questions

From
"Willis, Ian (Ento, Canberra)"
Date:
What you may have to do some research on is how to spread the disk writes
around as you sound like you will be write bound.
WAL will be an writes always at the end of a file. From memory raid4 is
better than raid5 for this kind of write but as the WAL will remain
resonably small in comparison to the db it may be best to stripe and mirror
disks for the best performance.
Is is possible to separate the index file from the database file and place
these on separted disk sets and scsi channels this should create faster io
and index generation (anyone?).
What is the ideal block size on disk for use with postgresql with your
database (anyone?).

--
Ian Willis

-----Original Message-----
From: Orion Henry [mailto:orion@trustcommerce.com]
Sent: Thursday, 15 March 2001 11:43 AM
To: adb
Cc: pgsql-general@postgresql.org
Subject: [GENERAL] Re: Fast Inserts and Hardware Questions


adb wrote:
>
> Have you done any benchmarks with a prototype of your application.
>
> Based off of some of the numbers I've been seeing with my testing,
> I would not be surprised if a single pIII 1ghz box with a decent disk
> (ata 100 or scsi 160) would handle the load you describe and is way
> cheaper than some big smp box.  A dual pIII with a decent raid card
> would get you even farther.  Certainly these are not as expandable as
> other options out there but they are worth a look.
>
> What I would be most concered about is a table that grows by a few million
> rows a week, that to me seems like a liability in itself since
> maintenance on that table will get pretty slow after a few months.
>
> Alex.

Right now I have a prototype running on a dual PIII 533 MHz with a scsi
raid and I am getting 2 inserts / second maximum. (To be fair there is
some overhead I did not mention on each insert but not a lot). Now if
WAL or disabling fsync() can get me a 10x speedup that's 20/second which
is almost what I need.  Perhaps finding a way to drop some of my index's
and other overhead could get me to 30/second - my minimum.

But yes - it seems that CPU power is the least of my worries.  I just
need really fast disks and 7.1 to come out soon.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: Fast Inserts and Hardware Questions

From
Bruce Momjian
Date:
> > * Since most of what I am doing is inserts I will assume that the disk
> > will be my bottleneck over anything else.  Is this correct?
>
> Yes.
>
> > * Will the 7.1 WAL save me, when it comes to insert times?
> > * I read something about disabling fsync() to speed up inserts.  How
> > much will this speed things up?  I would consider it as I COULD rebuild
> > lost data from my logs in the event of a system crash and one night a
> > year of panic on my part is worth saving $100,000 in drive arrays.
>
> Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the
> WAL stuff means you can do so safely. Basically PG should rebuild itself
> from the WAL file in the event of system failure. Have to admit I've not
> tried it myself yet.

No.  You should get good performance in 7.1 without using -F.  Using -F
in 7.1 makes the system unreliable in case of an OS crash.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Fast Inserts and Hardware Questions

From
"Gordon A. Runkle"
Date:
In article <3AAFC1C0.108A493@trustcommerce.com>, "Orion Henry"
<orion@trustcommerce.com> wrote:


> I am specing out a database for my company and I need some advice.  The
> database in mind will consist of one table, with about 300 bytes per
> record.  The table will be getting two or three million inserts a week
> and it would be nice if it could handle a sustained 30 to 50 a second.
[snip]
> The customers want to be able to query their data whenever and have it
> be snappy.

How many week's worth of data will you be keeping?

How many customers at a time will be querying?

> So here is my question:
> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.
> * Is there an architecture best
> suited for this.  Should I shell out the extra cash for a 64 bit box
> over a 32bit one.  Quad Xeon or Quad Alpha? Quad Alpha or Quad
> UltraSparc?

Linux should be fine.  32-bit will also be fine.  The real question
is how vital is the data.  If it's important, you'll want RAID.
You'll want 64-bit PCI, too.  I'd go with a Tier-1 vendor, such
as IBM, Compaq, or Dell (in that order).

> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else.  Is this correct? * Will the
> 7.1 WAL save me, when it comes to insert times?

I haven't pounded on 7.1, so I can't say.  What I can say is that
your RAID will affect this quite a bit.  RAID-5 is the cheapest
way to go, but also the slowest.  IBM offers RAID-5E, which gives
about 20% improvement over RAID-5 at the cost of one more disk.
Best performance though is with RAID 1+0 or 0+1.  As you might
have guessed, it's the most expensive, too.

If you have any boxen available to test on, I'd suggest doing
some runs with simulated data driven from multiple clients.  That
would give you a good feel for it.  If you have any "hefty" boxen,
so much the better.

Re: your indexes.  It looks like the first would best optimize
on primary_key, so it looks covered.

What is the relationship (if any) between customers and groups?

If the third query is being run a lot, clustering on creation_date,
customer_id would help.  The downside is will probably be the
worst for inserts.

Which would take us back to running some largish tests using
realistic data (you'd be aiming to have roughly the same selectivity
in your test data as your real-world app will have).

Gordon.
--
It doesn't get any easier, you just go faster.
   -- Greg LeMond

RE: Re: Fast Inserts and Hardware Questions

From
"Willis, Ian (Ento, Canberra)"
Date:
I'm a big fan of commodity hardware however I think that some of the newer
suns and alphas have a crossbar based memory compared to a bus which if for
db can be a big win. Also having the cache disabled can sometimes benefit
applications like dbs doing large queries.

--
Ian Willis

-----Original Message-----
From: Steve Wolfe [mailto:steve@iboats.com]
Sent: Thursday, 15 March 2001 12:21 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Re: Fast Inserts and Hardware Questions



> > > * Is there an architecture best suited for this.  Should I shell out
the
> > > extra cash for a 64 bit box over a 32bit one.  Quad Xeon or Quad
Alpha?
> > > Quad Alpha or Quad UltraSparc?
> >
> > Again heresay, but Sun's hardware has excellent I/O throughput and it's
> > difficult to get people to abandon it in favour of Intel once they've
> > tried it.
>
> What I was hoping to hear was something along the line of "An int8
> index?  Dear God!  You better be using a 64 bit machine!"  or  "The only
> way you will be able to handle a load like that will be with
> UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly level
> optimizations for the Alpha processors, they are the fastest!"  From the
> sounds of it what I need is lots of fast disks and to hell with the
> CPU.  I might as well get an Athlon and a Network Appliance.

     I have a Quad Xeon that I could do some testing for you, if you wanted.
I'm awfully lazy, though, so you'll have to send me a schema and some test
data. ; )  I can give it a run with a single process, or with multiple
simultaneous processes over the network.

steve

    Alphas may be terrific (amazing, actually) for floating-point work, but
for Postgres work, I'm not terribly impressed with them.  A Compaq rep let
us borrow a $25,000 dual-alpha for a week, and I pitted in in a severe
stress-test against our quad Xeon.  At the time, because of a lack of money
and an obscure bug, we had PII Xeon 400's in it, *without* the L2 cache -
talk about cutting the hamstrings.  The Compaq machine was only about 1.5
times faster than the Xeon.  Now that we have PIII Xeon700's in it (with the
cache working), the machine has probably run us about $12,000 (including
hardware RAID and the lot), and I'm sure that it would at least equal the
dual Alpha now.

steve



---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Fast Inserts and Hardware Questions

From
Christopher Masto
Date:
On Wed, Mar 14, 2001 at 11:08:48AM -0800, Orion Henry wrote:
> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else.  Is this correct?

Recent personal experience has sold me on hardware RAID controllers
and lots of RAM.  Dell's relatively low-end PERC/2 (which is an AMD
MegaRAID) on FreeBSD improved our create-and-populate scripts by at
least an order of magnitude.
--
Christopher Masto         Senior Network Monkey      NetMonger Communications
chris@netmonger.net        info@netmonger.net        http://www.netmonger.net

Free yourself, free your machine, free the daemon -- http://www.freebsd.org/

Re: Re: Fast Inserts and Hardware Questions

From
"Gordon A. Runkle"
Date:
In article
<Pine.LNX.4.30.0103141957490.14894-100000@net-srv-0001.bvrd.com>, "Alex
Howansky" <alex@wankwood.com> wrote:

[Interesting stats]

> Wow, does WAL make so much of a difference as to make my
> two-generations-old, built-from-spare-parts scrap box faster than my
> latest and greatest production server? Yikes!

What RAID level are you running?  RAID-5 isn't always very
fast on writes.  That and WAL might be what's up.

Have you tried 7.1b5 on the big box?

Gordon.
--
It doesn't get any easier, you just go faster.
   -- Greg LeMond

Re: Re: Fast Inserts and Hardware Questions

From
Alex Howansky
Date:
> [Interesting stats]
>
> > Wow, does WAL make so much of a difference as to make my
> > two-generations-old, built-from-spare-parts scrap box faster than my
> > latest and greatest production server? Yikes!
>
> What RAID level are you running?  RAID-5 isn't always very
> fast on writes.  That and WAL might be what's up.

Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare RAID
equipped boxes sitting around, so I can't experiment with the different RAID
levels. Still, you'd think that even a "slow" RAID-5 configuration would be
faster than a $98 IDE drive...

> Have you tried 7.1b5 on the big box?

Not yet, I'm waiting for a quiet weekend. I'll post my stats when (if...) I get
around to doing that.

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/


Re: Re: Re: Fast Inserts and Hardware Questions

From
"Steve Wolfe"
Date:
> Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare
RAID
> equipped boxes sitting around, so I can't experiment with the different
RAID
> levels. Still, you'd think that even a "slow" RAID-5 configuration would
be
> faster than a $98 IDE drive...

  Yes, it certainly should be.  Right now I have a Mylex 170 in my machine
for testing, hooked to 4 IBM 9-gig drives.  Three of them are in a RAID 5
array, the last is a hot-spare.  Copying data from the IDE drive to the RAID
array, the IDE drive reads at full speed, the lights on the RAID array just
blink quickly about once per second.  The controller has 64 megs of cache on
it, but I've copied far larger data sets than that (several gigabytes), and
the behavior has been the same.

  So... yes, RAID 5 is slower than RAID 0 or 1 for writes.  But it's still
dang fast, especially compared to a single IDE drive.

steve



Re: Re: Re: Fast Inserts and Hardware Questions

From
Mike Castle
Date:
On Fri, Mar 16, 2001 at 03:53:22PM -0600, Alex Howansky wrote:
> levels. Still, you'd think that even a "slow" RAID-5 configuration would be
> faster than a $98 IDE drive...

I wouldn't.

mrc
--
       Mike Castle       Life is like a clock:  You can work constantly
  dalgoda@ix.netcom.com  and be right all the time, or not work at all
www.netcom.com/~dalgoda/ and be right at least twice a day.  -- mrc
    We are all of us living in the shadow of Manhattan.  -- Watchmen

Re: Re: Re: Fast Inserts and Hardware Questions

From
adb
Date:
it all depends on the number of drives, the type of drives
and if raid is being done in hardware or software.

A three drive raid 5 array in software with older drives
is probably going to be slower than a single 7200rpm ata100 drive.

Not that this is what you have, just pointing out that it's possible.

Alex.

On Fri, 16 Mar 2001, Alex Howansky wrote:

> > [Interesting stats]
> >
> > > Wow, does WAL make so much of a difference as to make my
> > > two-generations-old, built-from-spare-parts scrap box faster than my
> > > latest and greatest production server? Yikes!
> >
> > What RAID level are you running?  RAID-5 isn't always very
> > fast on writes.  That and WAL might be what's up.
>
> Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare RAID
> equipped boxes sitting around, so I can't experiment with the different RAID
> levels. Still, you'd think that even a "slow" RAID-5 configuration would be
> faster than a $98 IDE drive...
>
> > Have you tried 7.1b5 on the big box?
>
> Not yet, I'm waiting for a quiet weekend. I'll post my stats when (if...) I get
> around to doing that.
>
> --
> Alex Howansky
> Wankwood Associates
> http://www.wankwood.com/
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://www.postgresql.org/search.mpl
>


Re: Re: Re: Fast Inserts and Hardware Questions

From
"Steve Wolfe"
Date:
> On Fri, Mar 16, 2001 at 03:53:22PM -0600, Alex Howansky wrote:
> > levels. Still, you'd think that even a "slow" RAID-5 configuration would
be
> > faster than a $98 IDE drive...
>
> I wouldn't.

  You'd be wrong. : )

   I've also copied large amounts of data from an IDE drive to an old AMI
MegaRAID controller with some old Quantum drives in a RAID 5 configuration.
The RAID was still faster at writing than the relatively modern IDE drive
could read.  Not to mention that in an environment like a database server
where you may have many processes accessing the disk at once, a SCSI RAID
array makes IDE look like a toy.

steve



Re: Re: Re: Fast Inserts and Hardware Questions

From
Mike Castle
Date:
On Fri, Mar 16, 2001 at 02:44:57PM -0800, adb wrote:
> and if raid is being done in hardware or software.

It's not surprising to see software raid outperforming hardware raid (both
of modern vintage).

mrc
--
       Mike Castle       Life is like a clock:  You can work constantly
  dalgoda@ix.netcom.com  and be right all the time, or not work at all
www.netcom.com/~dalgoda/ and be right at least twice a day.  -- mrc
    We are all of us living in the shadow of Manhattan.  -- Watchmen

Re: Re: Re: Fast Inserts and Hardware Questions

From
"Steve Wolfe"
Date:
> It's not surprising to see software raid outperforming hardware raid (both
> of modern vintage).

  That's provided, of course, that you have a good amount of free CPU
cycles.  On machines that are otherwise heavily CPU-loaded, software raid
has been (in my experience) horrible.

steve



RE: Re: Re: Fast Inserts and Hardware Questions

From
"Willis, Ian (Ento, Canberra)"
Date:
The use of scsi raid over a new ide drive may not be as much as you think in
some situations. In situations where the writes will always be at the tail
of the file like WAL ide may perform really well if this is the only writes
that are occuring on the disk. The heads will alway be in the right place
for the write so latency is just going to be spin time not positioning. This
may be compared to all the writes occuring a big fast raid system where all
the system writes are occuring in different positions on the same disk set.
Seek time increases due to the multiplicity of writes occuring in different
area. Even scsi smarts with out of order writes can only make this process
optimal. Dedicating physical disk sets can eliminate some of the problems.
--
Ian Willis

-----Original Message-----
From: Alex Howansky [mailto:alex@wankwood.com]
Sent: Saturday, 17 March 2001 8:53 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Re: Re: Fast Inserts and Hardware Questions


> [Interesting stats]
>
> > Wow, does WAL make so much of a difference as to make my
> > two-generations-old, built-from-spare-parts scrap box faster than my
> > latest and greatest production server? Yikes!
>
> What RAID level are you running?  RAID-5 isn't always very
> fast on writes.  That and WAL might be what's up.

Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare RAID
equipped boxes sitting around, so I can't experiment with the different RAID
levels. Still, you'd think that even a "slow" RAID-5 configuration would be
faster than a $98 IDE drive...

> Have you tried 7.1b5 on the big box?

Not yet, I'm waiting for a quiet weekend. I'll post my stats when (if...) I
get
around to doing that.

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

RE: Re: Re: Fast Inserts and Hardware Questions

From
"Willis, Ian (Ento, Canberra)"
Date:
On linux the software raid will use the MMX instructions, can't these be
done in parrell to most other operations? Memory bandwidth may be the real
killer


--
Ian Willis
Systems Administrator
Division of Entomology CSIRO
GPO Box 1700
Canberra ACT 2601
ph  02 6246 4391
fax 02 6246 4000


-----Original Message-----
From: Steve Wolfe [mailto:steve@iboats.com]
Sent: Saturday, 17 March 2001 10:48 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Re: Re: Fast Inserts and Hardware Questions


> It's not surprising to see software raid outperforming hardware raid (both
> of modern vintage).

  That's provided, of course, that you have a good amount of free CPU
cycles.  On machines that are otherwise heavily CPU-loaded, software raid
has been (in my experience) horrible.

steve



---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl