Disk performance, was Re: tablespaces and DB administration - Mailing list pgsql-performance

From Marty Scholes
Subject Disk performance, was Re: tablespaces and DB administration
Date
Msg-id 40BD1A82.7010001@outputservices.com
Whole thread Raw
List pgsql-performance
This was a lively debate on what was faster, single spindles or RAID.

This is important, because I keep running into people who do not
understand the performance dynamics of a RDBMS like Oracle or Pg.

Pg and Oracle make a zillion tiny reads and writes and fsync()
regularly.  If your drive will copy a 4GB file at a sustained rate of 72
MB/s, that tells you nothing about how it will do with an RDBMS.

I will throw in my experience on RAID vs spindles.

With the RAID write cache disabled, a well balanced set of spindles will
kill a RAID system any day.

Enable the cache, and the RAID starts inching ahead.  My experience is
that no one can continuously keep I/O properly balanced across several
spindles on a production system.  Things change and the I/O mix changes.
  Then, the RAID is outperforming the spindles.  If you want to spend
the rest of your career constantly balancing I/O across spindles, then
do so.

For the rest of us, with a write cache, a hardware RAID wins hands down
over the long haul.

It might make sense to provide some sort of benchmarking tool for
various systems so that we can predict I/O performance.

Run the following code both on a hardware RAID and on a single spindle.

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>

void makefile(int fs)
{
         int i;
         char    buf[8192];
         int     ld;
         int     blocks=4096;
         int     pos;
         time_t  stim;
         time_t  etim;
         float   avg;

         unlink("dump.out");
         ld=open("dump.out", O_WRONLY | O_CREAT);

         printf("Writing %d blocks sequentially\n", blocks);
         time(&stim);
         for (i=0; i<blocks; i++) {
                 write(ld, buf, sizeof(buf));
                 if (fs) {
                         fsync(ld);
                         }
                 }
         time(&etim);
         avg = (blocks+0.0)/(etim-stim-0.0);
         printf("Took %d seconds, avg %f iops\n\n", etim-stim, avg);

         // purge the write cache
         fsync(ld);

         printf("Writing %d blocks (somewhat randomly)\n", blocks);
         time(&stim);
         for (i=0; i<blocks; i++) {
                 pos = (rand()%blocks)*sizeof(buf);
                 lseek(ld, pos, SEEK_SET);
                 write(ld, buf, sizeof(buf));
                 if (fs) {
                         fsync(ld);
                         }
                 }
         time(&etim);
         avg = (blocks+0.0)/(etim-stim-0.0);
         printf("Took %d seconds, avg %f iops\n\n", etim-stim, avg);

         close(ld);
         unlink("dump.out");
}

int main()
{
         printf("No fsync()\n");
         makefile(0);

         printf("With fsync()\n");
         makefile(1);

         return 0;
}


The first operation shows how well the OS write cache is doing.  The
second shows how poorly everything runs with fsync(), which is what Pg
and Oracle do.

My RAID produced the following, but was also running production when I
ran it:

No fsync()
Writing 4096 blocks sequentially
Took 1 seconds, avg 4096.000000 iops

Writing 4096 blocks (somewhat randomly)
Took 4 seconds, avg 1024.000000 iops

With fsync()
Writing 4096 blocks sequentially
Took 40 seconds, avg 102.400002 iops

Writing 4096 blocks (somewhat randomly)
Took 66 seconds, avg 62.060608 iops

When I ran this on a decent fibre channel drive, I got:

No fsync()
Writing 4096 blocks sequentially
Took 1 seconds, avg 4096.000000 iops

Writing 4096 blocks (somewhat randomly)
Took 7 seconds, avg 585.142883 iops

With fsync()
Writing 4096 blocks sequentially
Took 106 seconds, avg 38.641510 iops

Writing 4096 blocks (somewhat randomly)
Took 115 seconds, avg 35.617390 iops


You can see that the RAID array really helps out with small writes.



pgsql-performance by date:

Previous
From: Vitaly Belman
Date:
Subject: PostgreSQL on VMWare vs Windows vs CoLinux
Next
From: "V i s h a l Kashyap @ [Sai Hertz And Control Systems]"
Date:
Subject: PostgreSQL and Kernel 2.6.x