Thread: Ideal disk setup for Postgresql 7.4?

Ideal disk setup for Postgresql 7.4?

From
Steve Poe
Date:
I help manage an animal hospital of 100-employees Linux servers. I am
new to database setup and tuning, I was hoping I could get some
direction on a setting up drive array we're considering moving our
database to.

They're currently on a two-disk Adaptec RAID1 with Postgresql 7.4.2.

The drive array is a 7-disk fibre channel on a Qlogic 2100 controller. I
am currently testing RAID5 (sw).

The main reason of moving to a drive array is the high level of context
switches we get during the day (>30K for 20 mins per hour). The OS and
database exist on the same disk but seperate parition (which probably
makes little difference)


additional info:

On average, 30-35 vets/doctors are connecting to the database at any
time from 7am - 7pm. The database is very active for the small company.

Server Info:
Centos 3.3 (RHEL 3.x equivelent)
4GB RAM
Adaptec 2100S RAID
Qlogic QLA2100 Fibre

Any feedback/suggestions are greatly appreciated.

Thanks.

Steve Poe



Re: Ideal disk setup for Postgresql 7.4?

From
Josh Berkus
Date:
Steve,

> I help manage an animal hospital of 100-employees Linux servers. I am
> new to database setup and tuning, I was hoping I could get some
> direction on a setting up drive array we're considering moving our
> database to.

Check what I have to say at http://www.powerpostgresql.com/PerfList

> They're currently on a two-disk Adaptec RAID1 with Postgresql 7.4.2.

And you've not upgraded to 7.4.6 because .... ?

> The drive array is a 7-disk fibre channel on a Qlogic 2100 controller. I
> am currently testing RAID5 (sw).

In general, RAID 5 is not so great for databases.  See the article for more.

> The main reason of moving to a drive array is the high level of context
> switches we get during the day (>30K for 20 mins per hour). The OS and
> database exist on the same disk but seperate parition (which probably
> makes little difference)

Unfortunately, the context switches are probably due to a known issue in
PostgreSQL, and changing the drive array won't help this issue (it may help
other issues though).    Search the archives of this list, and pgsql-hackers,
for "Context Switch Bug".

For the CS bug, the only workaround right now is to avoid the query structures
that trigger it.

> Server Info:
> Centos 3.3 (RHEL 3.x equivelent)
> 4GB RAM
> Adaptec 2100S RAID
> Qlogic QLA2100 Fibre

CPU?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Ideal disk setup for Postgresql 7.4?

From
Steve Poe
Date:
Josh,

Thanks for your feedback, I appreciate it.

>Check what I have to say at http://www.powerpostgresql.com/PerfList
>
>
Will do.

>>They're currently on a two-disk Adaptec RAID1 with Postgresql 7.4.2.
>>
>>
>
>And you've not upgraded to 7.4.6 because .... ?
>
>
>
Because the proprietary application running the business has not
certified on it.  Unfortunately, I am at the mercy of their support in
case something goes wrong.

>>The drive array is a 7-disk fibre channel on a Qlogic 2100 controller. I
>>am currently testing RAID5 (sw).
>>
>>
>
>In general, RAID 5 is not so great for databases.  See the article for more.
>
>
>
Okay, thanks. Even with 7-disks? I trust that. So, RAID 1+0 (sw) is
probably the best option. I've run sw RAID personally for years without
issue. I am a bit hesitant in doing sw RAID for a production server for
a database --- probably because its not my server. Any thoughts on sw
RAID for Postgresql?

>>The main reason of moving to a drive array is the high level of context
>>switches we get during the day (>30K for 20 mins per hour). The OS and
>>database exist on the same disk but seperate parition (which probably
>>makes little difference)
>>
>>
>
>Unfortunately, the context switches are probably due to a known issue in
>PostgreSQL, and changing the drive array won't help this issue (it may help
>other issues though).    Search the archives of this list, and pgsql-hackers,
>for "Context Switch Bug".
>
>For the CS bug, the only workaround right now is to avoid the query structures
>that trigger it.
>
>
Okay. Darn. While I don't write the queries for the application, I do
interact with the company frequently. Their considering moving the
queries into the database with PL/pgSQL. Currently their queries are
done through ProvIV development using ODBC. Will context switching be
minimized here by using PL/pgSQL?

>
>
>>Server Info:
>>Centos 3.3 (RHEL 3.x equivelent)
>>4GB RAM
>>Adaptec 2100S RAID
>>Qlogic QLA2100 Fibre
>>
>>
>
>CPU?
>
>
Dual Xeon 2.8 CPUs with HT turned off.


Thanks again.

Steve Poe

Re: Ideal disk setup for Postgresql 7.4?

From
Josh Berkus
Date:
Steve,

> Because the proprietary application running the business has not
> certified on it.  Unfortunately, I am at the mercy of their support in
> case something goes wrong.

FWIW, 7.4.6 is a binary, drop-in place upgrade for 7.4.2.  And 7.4.2 has known
bugs.   However, I understand your situation.

> Okay, thanks. Even with 7-disks? I trust that.

Well, it's less bad with 7 disks than it is with 3, certainly.   However,there
is an obvious and quick gain to be had by splitting off the WAL logs onto
their own disk resource ... up to 14%+ performance in some applications.

> So, RAID 1+0 (sw) is
> probably the best option. I've run sw RAID personally for years without
> issue. I am a bit hesitant in doing sw RAID for a production server for
> a database --- probably because its not my server. Any thoughts on sw
> RAID for Postgresql?

Yes.   See my article for one.  In generaly, SW RAID on BSD or Linux works
well for PostgreSQL ... UNLESS your machine is already CPU-bound, in which
case it's a bad idea.   If you're hitting the CS bug, it's definitely a bad
idea, because the SW RAID will increase context switching.

So if your choice, on your system, is between sw RAID 10, and hw RAID 5, and
you're having excessive CSes, I'd stick with the HW RAID.

> Okay. Darn. While I don't write the queries for the application, I do
> interact with the company frequently. Their considering moving the
> queries into the database with PL/pgSQL. Currently their queries are
> done through ProvIV development using ODBC. Will context switching be
> minimized here by using PL/pgSQL?

Won't make a difference, actually.   Should improve performance in other ways,
though, by reducing round-trip time on procedures.  Feel free to recommend
the company to this list.

> Dual Xeon 2.8 CPUs with HT turned off.

Yeah, thought it was a Xeon.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Ideal disk setup for Postgresql 7.4?

From
Steve Poe
Date:
>FWIW, 7.4.6 is a binary, drop-in place upgrade for 7.4.2.  And 7.4.2 has known
>bugs.   However, I understand your situation.
>
>
>
As soon as we get the go-ahead, I will upgrade. I think the company is
actually looking towards 8.0 certification.

>>Okay, thanks. Even with 7-disks? I trust that.
>>
>>
>
>Well, it's less bad with 7 disks than it is with 3, certainly.   However,there
>is an obvious and quick gain to be had by splitting off the WAL logs onto
>their own disk resource ... up to 14%+ performance in some applications.
>
>
>
Pardon my ignorance, but the WAL logs are comprised of pg_xlog and
pg_clog? Their own disk resource, but not within the same channel of
disks the database is on, right?

>>So, RAID 1+0 (sw) is
>>probably the best option. I've run sw RAID personally for years without
>>issue. I am a bit hesitant in doing sw RAID for a production server for
>>a database --- probably because its not my server. Any thoughts on sw
>>RAID for Postgresql?
>>
>>
>
>Yes.   See my article for one.  In generaly, SW RAID on BSD or Linux works
>well for PostgreSQL ... UNLESS your machine is already CPU-bound, in which
>case it's a bad idea.   If you're hitting the CS bug, it's definitely a bad
>idea, because the SW RAID will increase context switching.
>
>So if your choice, on your system, is between sw RAID 10, and hw RAID 5, and
>you're having excessive CSes, I'd stick with the HW RAID.
>
>
>
Okay. InCPU-bound servers, use hw RAID. Any hw raids to avoid?

>>Okay. Darn. While I don't write the queries for the application, I do
>>interact with the company frequently. Their considering moving the
>>queries into the database with PL/pgSQL. Currently their queries are
>>done through ProvIV development using ODBC. Will context switching be
>>minimized here by using PL/pgSQL?
>>
>>
>
>Won't make a difference, actually.   Should improve performance in other ways,
>though, by reducing round-trip time on procedures.  Feel free to recommend
>the company to this list.
>
>
>
I think their too busy to monitor/watch this list. Not a put-down to
them, but I have to do my own leg work to help decide what we're going
to do.

>>Dual Xeon 2.8 CPUs with HT turned off.
>>
>>
>
>Yeah, thought it was a Xeon.
>
>
>
If we went with a single CPU, like Athlon/Opertron64,  would CS
storming  go away?


Thanks.

Steve Poe

Re: Ideal disk setup for Postgresql 7.4?

From
Tom Lane
Date:
Steve Poe <spoe@sfnet.cc> writes:
>> Well, it's less bad with 7 disks than it is with 3, certainly.   However,there
>> is an obvious and quick gain to be had by splitting off the WAL logs onto
>> their own disk resource ... up to 14%+ performance in some applications.
>>
> Pardon my ignorance, but the WAL logs are comprised of pg_xlog and
> pg_clog? Their own disk resource, but not within the same channel of
> disks the database is on, right?

Just pg_xlog.  Ideally you don't want any other traffic on the physical
disk pg_xlog is on --- the idea is that the write heads need to stay
over the current xlog file.  I don't think it hurts too much to share a
controller channel though.

            regards, tom lane

Re: Ideal disk setup for Postgresql 7.4?

From
Josh Berkus
Date:
Steve,

> Okay. InCPU-bound servers, use hw RAID. Any hw raids to avoid?

Well, the list of ones which are good is shorter: pretty much LSI and 3Ware
(for SATA).  You can suffer with Adaptec if you have to.

> If we went with a single CPU, like Athlon/Opertron64,  would CS
> storming  go away?

Yes.  And then you might be able to use SW Raid.   Of course, you may lose
performance in other areas with the 1 processor.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Ideal disk setup for Postgresql 7.4?

From
Steve Poe
Date:
Josh,

Thanks again for the feedback.

>Well, the list of ones which are good is shorter: pretty much LSI and 3Ware
>(for SATA).  You can suffer with Adaptec if you have to.
>
>
>
Good. We don't plan on using IDE, but I've pondered Firewire.

>>If we went with a single CPU, like Athlon/Opertron64,  would CS
>>storming  go away?
>>
>>
>
>Yes.  And then you might be able to use SW Raid.   Of course, you may lose
>performance in other areas with the 1 processor.
>
>
>
Good to know.

You mentioned earlier that to get around the CS bug, avoid the query
structures which trigger it. Dumb question: How do you isolate this?

Is there a way in a Postgresql query to only look at 1 processor only in
a dual-CPU setup?

FYI:Our company has an near-identical server (SCSI and IDE)for testing
purposes of the animal hopsital application that is used. If there are
any test patches to Postgresql to deal with CS storm, we can test it out
if this is possible.

Any likelyhood this CS storm will be understood in the next couple months?

Thanks.

Steve Poe

Re: Ideal disk setup for Postgresql 7.4?

From
Josh Berkus
Date:
Steve,

> You mentioned earlier that to get around the CS bug, avoid the query
> structures which trigger it. Dumb question: How do you isolate this?

In real terms, it's generally triggered by a query joining against a very
large table requiring a seq scan.

You can probably find the "bad queries" just by using PQA, and looking for
select, delete and update queries which last over 60 seconds.

> Is there a way in a Postgresql query to only look at 1 processor only in
> a dual-CPU setup?

That would be an OS question.    I personally can't see how.

> Any likelyhood this CS storm will be understood in the next couple months?

It's well understood.   See the archives of this list.   The problem is that
implementing the solution is very, very hard -- 100+ hours from a top-notch
programmer.  I'm still hoping to find a corporate sponsor for the issue ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Ideal disk setup for Postgresql 7.4?

From
"Steinar H. Gunderson"
Date:
On Thu, Jan 27, 2005 at 08:56:03AM -0800, Josh Berkus wrote:
> It's well understood.   See the archives of this list.   The problem is that
> implementing the solution is very, very hard -- 100+ hours from a top-notch
> programmer.  I'm still hoping to find a corporate sponsor for the issue ...

Hm, I must have missed something -- all I read earlier (and in the archives)
indicated that it was _not_ well understood... Care to give URLs giving the
answer away?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Ideal disk setup for Postgresql 7.4?

From
"Merlin Moncure"
Date:
Steve wrote:
> Okay. Darn. While I don't write the queries for the application, I do
> interact with the company frequently. Their considering moving the
> queries into the database with PL/pgSQL. Currently their queries are
> done through ProvIV development using ODBC. Will context switching be
> minimized here by using PL/pgSQL?

Yes, yes, yes!  :-)
Or maybe, depending on what you are doing.  Moving application code into
the database has the potential to supercharge your system depending on
how it is structured.

Our company has done very detailed performance measurements on the
subject.  We converted our COBOL based ERP to PostgreSQL by writing a
libpq wrapper to allow our COBOL runtime to read/write queries to the
database.  If you don't know much about COBOL, let's just say it has a
'one record at a time' mentality. (read a record...do something...read a
record...do something...).  It is these cases that really want to be
moved into the server.

Here are some rough performance numbers, but they are a pretty good
reflection why pl/pgsql is so good.  The procedure in question here will
build a bill of materials for a fairly complex product assembly in an
order entry system.  Since all users hate waiting for things, this is a
performance sensitive operation.

The baseline time is the COBOL app's pre-conversion-to-sql time to build
the BOM.
BOM-ISAM: 8 seconds

Using SQL queries instead of ISAM statements, our time suddenly leaps to
BOM-SQL: 20 seocnds.

A long, long, time ago, we implemented prepared statements into our
driver using the parameterized interface.
BOM-SQL (prepared): 10 seconds

We converted the COBOL code to pl/pgsql.  The logic is the same, however
easy record aggregations were taken via refcursors were made where
possible.
BOM-PL/PGSQL: 1 second

Even the commercial COBOL vendor's file system driver can't beat that
time when the application is running on the server.  Also, pl/pgsql
routines are not latency sensitive, so they can be run over the internet
etc.  In addition, having the server execute the business logic actually
*reduced* the cpu load on the server by greatly reducing the time the
server spent switching back and forth from network/processing.

Of course, ours is an extreme case but IMO, the benefits are real.

Merlin








Re: Ideal disk setup for Postgresql 7.4?

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Steve,
>
> > I help manage an animal hospital of 100-employees Linux servers. I am
> > new to database setup and tuning, I was hoping I could get some
> > direction on a setting up drive array we're considering moving our
> > database to.
>
> Check what I have to say at http://www.powerpostgresql.com/PerfList
>

Added to our FAQ.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073