Re: RAID controllers for Postgresql on large setups - Mailing list pgsql-performance

From PFC
Subject Re: RAID controllers for Postgresql on large setups
Date
Msg-id op.ua31a20rcigqcu@apollo13.peufeu.com
Whole thread Raw
In response to Re: RAID controllers for Postgresql on large setups  (Francisco Reyes <lists@stringsutils.com>)
List pgsql-performance
>
>>     You say that like you don't mind having PCI in a server whose job is
>> to  perform massive query over large data sets.
>
> I am in my 4th week at a new job. Trying to figure what I am working
> with.

    LOOL, ok, hehe, not exactly the time to have a "let's change everything"
fit ;)

> From what I see I will likely get as much improvement from new hardware
> as from re-doing some of the database design. Can't get everything done
> at once, not to mention I have to redo one machine sooner rather than
> later so I need to prioritize.
>
>> In fact for bulk IO a box with 2 SATA drives would be just as fast as
>> your monster RAID, lol.
>
> I am working on setting up a standard test based on the type of
> operations that the company does. This will give me a beter idea.
> Specially I will work with the developers to make sure the queries I
> create for the benchmark are representative of the workload.

    watching vmstat (or iostat) while running a very big seq scan query will
give you information about the reading speed of your drives.
    Same for writes, during one of your big updates, watch vmstat, you'll
know if you are CPU bound or IO bound...

- one core at 100% -> CPU bound
- lots of free CPU but lots of iowait -> disk bound
    - disk throughput decent (in your setup, 100 MB/s) -> PCI bus saturation
    - disk throughput miserable (< 10 MB/s) -> random IO bound (either random
reads or fsync() or random writes depending on the case)

> In your opinion if we get a new machine with PCI-e, at how many spindles
> will the  SCSI random access superiority start to be less notable?
> Specially given the low number of connections we usually have running
> against these machines.

    Sorting of random reads depends on multiple concurrent requests (which
you don't have). Sorting of random writes does not depend on concurrent
requests so, you'll benefit on your updates. About SCSI vs SATA vs number
of spindles : can't answer this one.

> We are using one single SQL statement.

    OK, so forget about fsync penalty, but do tune your checkpoints so they
are not happening all the time... and bgwriter etc.




pgsql-performance by date:

Previous
From: James Mansion
Date:
Subject: Re: RAID controllers for Postgresql on large setups
Next
From: fernando castano
Date:
Subject: can I move sort to first outer join ?