Re: Which hardware ? - Mailing list pgsql-performance
From | Scott Marlowe |
---|---|
Subject | Re: Which hardware ? |
Date | |
Msg-id | dcc563d10806171007o719a2171o34e492f463a6c2f2@mail.gmail.com |
In response to | Re: Which hardware ? (Greg Smith <gsmith@gregsmith.com>) |
List | pgsql-performance |
On Tue, Jun 17, 2008 at 10:56 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Tue, 17 Jun 2008, Scott Marlowe wrote:
>
>> We had a reporting server with about 80G of data on a machine with 4G
>> ram last place I worked, and it could take it a few extra seconds to
>> hit the old data, but the SW RAID-10 on it made it much faster at
>> reporting than it would have been with a single disk.
>
> I agree with your statement above, that query time could likely be dropped a
> few seconds with a better disk setup.  I just question whether that's
> necessary given the performance target here.
>
> Right now the app is running on an underpowered Windows box and is returning
> results in around 10s, on a sample data set that sounds like 1/8 of a year
> worth of data (1/40 of the total).  It is seemingly CPU bound with not
> enough processor to handle concurrent queries being the source of the
> worst-case behavior.  The target is keeping that <30s on more powerful
> hardware, with at least 6X as much processor power and a more efficient OS,
> while using yearly partitions to keep the amount of data to juggle at once
> under control.  That seems reasonable to me, and while better disks would be
> nice I don't see any evidence they're really needed here.  This application
> sounds a batch processing/reporting one where plus or minus a few seconds
> doesn't have a lot of business value.

I think you're making a big assumption that this is CPU bound.  And it
may be that when all the queries are operating on current data, it
is.  But as soon as a few ugly queries fire that need to read tens of
gigs of data off the drives, you'll start to switch to I/O bound
and the system will slow a lot.

We had a single-drive box doing work on an 80G set that was just fine
with the most recent bits.  Until I ran a report that ran across the
last year instead of the last two days, and it took 2 hours to run.
All the queries that had run really quickly on the recent data
were suddenly going from 1 or 2 seconds to 2 or 3 minutes.  And I'd
have to kill my reporting query.

Moved it to the same exact hardware but with a 4-disk RAID-10, and the
little queries stayed at 1-2 seconds while the reporting queries were cut
down by factors of about 4 to 10.  RAID-1 will be somewhere between
them, I'd imagine.  RAID-10 has an amazing ability to handle parallel
accesses without falling over performance-wise.

You're absolutely right though, we really need to know the value of
fast performance here.  If you're monitoring industrial systems, you
need fast enough response to spot problems before they escalate to
disasters.  If you're running aggregations of numbers used for filling
out quarterly reports, not so much.
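
For a rough sense of scale, the figures above can be turned into a back-of-the-envelope calculation. This is only a sketch: the function names are made up for illustration, and it assumes the 2-hour report read the whole 80G set, which the post does not actually say, so the throughput figure is at best an upper bound.

```python
def effective_throughput_mb_s(data_gb: float, seconds: float) -> float:
    """Average MB/s needed to read data_gb gigabytes in the given time."""
    return data_gb * 1024 / seconds


def runtime_after_speedup(seconds: float, factor: float) -> float:
    """Runtime in seconds once a query is cut down by the given factor."""
    return seconds / factor


two_hours = 2 * 60 * 60  # the single-drive report runtime, in seconds

# ASSUMPTION: the report touched all 80G; the real amount read is unknown.
print(f"{effective_throughput_mb_s(80, two_hours):.1f} MB/s effective")

# Speedup factors of about 4 to 10 are the ones quoted for the RAID-10 box.
for factor in (4, 10):
    minutes = runtime_after_speedup(two_hours, factor) / 60
    print(f"{factor}x faster: about {minutes:.0f} minutes")
```

Even under that generous assumption the single drive averages only about 11 MB/s, which would be consistent with the report being dominated by seeks rather than sequential reads, and a 4x to 10x improvement brings the same report down to roughly 30 or 12 minutes.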