Re: Which hardware ? - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Which hardware ?
Date
Msg-id dcc563d10806171007o719a2171o34e492f463a6c2f2@mail.gmail.com
In response to Re: Which hardware ?  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-performance
On Tue, Jun 17, 2008 at 10:56 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Tue, 17 Jun 2008, Scott Marlowe wrote:
>
>> We had a reporting server with about 80G of data on a machine with 4G
>> ram last place I worked, and it could take it a few extra seconds to
>> hit the old data, but the SW RAID-10 on it made it much faster at
>> reporting than it would have been with a single disk.
>
> I agree with your statement above, that query time could likely be dropped a
> few seconds with a better disk setup.  I just question whether that's
> necessary given the performance target here.
>
> Right now the app is running on an underpowered Windows box and is returning
> results in around 10s, on a sample data set that sounds like 1/8 of a year
> worth of data (1/40 of the total).  It is seemingly CPU bound with not
> enough processor to handle concurrent queries being the source of the
> worst-case behavior.  The target is keeping that <30s on more powerful
> hardware, with at least 6X as much processor power and a more efficient OS,
> while using yearly partitions to keep the amount of data to juggle at once
> under control.  That seems reasonable to me, and while better disks would be
> nice I don't see any evidence they're really needed here.  This application
> sounds like a batch processing/reporting one where plus or minus a few seconds
> doesn't have a lot of business value.

I think you're making a big assumption that this is CPU bound.  It may
well be, as long as all the queries are operating on current data.  But
as soon as a few ugly queries fire that need to read tens of gigs of
data off the drives, the system will switch to being I/O bound and slow
down a lot.
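
As a rough back-of-the-envelope sketch of why that switch hurts, the
time to pull a given amount of data off a drive can be estimated from
its sequential throughput.  The 60 MB/s figure below is an assumed,
hypothetical number for a single drive, not something measured in this
thread, and random access would be a good deal slower than this
best case.

```python
def scan_seconds(data_gb: float, throughput_mb_s: float) -> float:
    """Seconds to read data_gb gigabytes sequentially at throughput_mb_s MB/s."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    # 1 GB is treated as 1024 MB here.
    return data_gb * 1024 / throughput_mb_s


# Assumed figure for illustration only; measure your own drives.
SINGLE_DISK_MB_S = 60.0

for gb in (1, 20, 80):
    minutes = scan_seconds(gb, SINGLE_DISK_MB_S) / 60
    print(f"{gb:>3} GB at {SINGLE_DISK_MB_S:.0f} MB/s: {minutes:.1f} min")
```

Once a query has to read more than fits in RAM, this disk time is a
floor on how fast it can finish, no matter how much CPU is available.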

We had a single-drive box doing work on an 80G data set that was just
fine with the most recent bits, until I ran a report that covered the
last year instead of the last two days and took 2 hours to run.

All the queries that had been running really quickly on the recent data
suddenly went from 1 or 2 seconds to 2 or 3 minutes, and I'd have to
kill my reporting query.

I moved it to the exact same hardware but with a 4-disk RAID-10, and
the little queries stayed at 1-2 seconds while the reporting queries
were cut down by factors of about 4 to 10.  RAID-1 would land somewhere
between the two, I'd imagine.  RAID-10 has an amazing ability to handle
parallel accesses without falling over performance-wise.
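
To put those factors in concrete terms, here is a small sketch that
applies them to the 2-hour report mentioned earlier.  It assumes that
report is representative of the reporting queries that saw the 4x to
10x improvement, which is an assumption rather than a measurement.

```python
def sped_up_minutes(baseline_minutes: float, factor: float) -> float:
    """Run time after dividing a baseline run time by a speedup factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_minutes / factor


baseline = 120.0  # the 2-hour report on the single-drive box

for factor in (4, 10):
    print(f"{factor}x faster: {sped_up_minutes(baseline, factor):.0f} min")
```

Under that assumption the report would land somewhere between roughly
12 and 30 minutes on the RAID-10.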

You're absolutely right though, we really need to know the value of
fast performance here.

If you're monitoring industrial systems you need fast enough response
to spot problems before they escalate to disasters.

If you're running aggregations of numbers used for filling out
quarterly reports, not so much.
