Re: slow query - will CLUSTER help? - Mailing list pgsql-performance

From Shaun Thomas
Subject Re: slow query - will CLUSTER help?
Date
Msg-id 52B338A6.1010309@optionshouse.com
Whole thread Raw
In response to slow query - will CLUSTER help?  (Sev Zaslavsky <sevzas@gmail.com>)
List pgsql-performance
On 12/12/2013 11:30 AM, Sev Zaslavsky wrote:

> _First question_ is: Does loading 24Gb of data in 21 sec seem "about
>  right" (hardware specs at bottom of email)?

That's actually pretty good. 24GB is a lot of data to process.

> _Second question_: Is it possible to tell postgres to physically store
> the data in such a way that it parallels an index?

Yes and no. Unlike Sybase or SQL Server, CLUSTERed indexes in PostgreSQL
are not maintained in the index pages. When you CLUSTER a table by a
particular index, it's only sorted in that order initially. New inserts
and updates no longer honor that ordering.

However, since you said you're inserting data by date, your data should
already be naturally sorted. Your query plan also looked right to me.
You may have some excess expectations for your hardware, though.

A RAID-1 of 15K drives can deliver, at most, 1000 reads per second
depending on your drives and the controller cache. That's a very
optimistic assumption. The plan said it fetched 1631 rows from the
index. In order to weed out dead pages, it verifies data by checking the
data pages, which is another 1631 fetches at minimum. All by itself,
that's about three seconds of IO from a cold cache.

I agree that 21 seconds is rather high for this workload, but Windows
handles data caching and data elevator algorithms much differently than
Linux, so I can't say if anything else is going on.

> Top of the line HP DL380 G7 server with 32 Gb Ram, P410i RAID, 10K
> SAS drives in Raid-1 config. Wal on separate Raid-1 volume with 15K
> SAS drives.The only unusual thing here is that I'm running on Windows
> Server 2008 R2.

In any case, you should really consider upgrading both your hardware,
and switching your DB server to Linux. If you are handling millions of
rows on a regular basis, 32GB will not be sufficient for longer than a
few months. Eventually your data will no longer fit in memory, and
you'll see more and more disk-related delays.

Further, a RAID1 is not good enough for that kind of data volume. If you
cant afford a RAID1+0 consisting of several spindles, NVRAM-based
solution (SSD or PCIe card), or a SAN, you simply do not have enough
IOPS to supply a fast database of any description.

I only suggest Linux as your OS because that's the primary use case.
Most testing, development, and users have that setup. You're much more
likely to get meaningful feedback if you follow the herd. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


pgsql-performance by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: Optimizing a query
Next
From: Jeff Janes
Date:
Subject: Re: query plan not optimal