Thread: dbt2 performance

dbt2 performance

From: Yu-Ju Hong
Hi,

I have a few questions about dbt2 performance.

1. I tested dbt2 + PostgreSQL 8.4.2 on my server, but the NOTPM is only around 320~390 with 10 connections and 30 warehouses, and increasing the number of connections did not improve the throughput. The NOTPM number does not seem very high to me. Should I try more configurations to see if it can be improved? Are there any numbers I can compare against (NOTPM and response time)?

2. Moreover, the disk utilization was high and the "await" time from iostat is around 500 ms. Could disk I/O limit the overall throughput? The server has 2 SATA disks, one for the system and PostgreSQL, the other dedicated to logging (pg_xlog). As far as I understand, modern database systems should be CPU-bound rather than I/O-bound; is it because I did not perform adequate performance tuning?
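
(For reference, I collected the disk numbers roughly as follows:

    iostat -dx 5

reading the await and %util columns for each of the two disks.)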

3. From "vmstat", the CPUs spent around 72% of their time idle, 25% waiting for I/O, and only 2~3% doing real work. I was surprised that the CPU utilization was so low. Is that normal, or could it be due to misconfiguration? In my opinion, even if disk I/O was stressed, 70% idle time still seems too high.
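
(The CPU numbers above came from sampling along the lines of:

    vmstat 5

reading the us, sy, id, and wa columns.)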


Below are some specs/configurations that I used. Any suggestion is welcome. Thanks!

server spec:
4 cores (2*Dual-Core AMD Opteron, 800MHz), 12GB RAM
2 SATA disks, one for the system and PostgreSQL, the other dedicated to logging (pg_xlog)

postgres configuration:
30 warehouses
256MB shared_buffers
768MB effective_cache_size
checkpoint_timeout 1hr (all my tests run within a 10-minute interval, so checkpointing should not interfere with performance)
I turned off fsync to see whether the performance could be improved.
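
In postgresql.conf terms, the non-default settings above would look roughly like this (the warehouse count is a dbt2 parameter, not a postgres setting):

    shared_buffers = 256MB
    effective_cache_size = 768MB
    checkpoint_timeout = 1h
    fsync = off    # testing only, never in production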

Yu-Ju

Re: dbt2 performance

From: Greg Smith
Yu-Ju Hong wrote:
> 2. Moreover, the disk utilization was high and the "await" time from
> iostat is around 500 ms. Could disk I/O limit the overall throughput?
> The server has 2 SATA disks, one for the system and PostgreSQL, the
> other dedicated to logging (pg_xlog). As far as I understand, modern
> database systems should be CPU-bound rather than I/O-bound; is it
> because I did not perform adequate performance tuning?

dbt2 is almost exclusively disk I/O bound once the data set gets big
enough.  There are some applications where most of the data fits in RAM
and therefore CPU performance is the limiter.  dbt2 is exactly the
opposite of such an application though, and the idea that "modern
database systems should be CPU bound" is not really true at all.  That's
only the case if the data you're operating on fits in RAM.  Otherwise,
databases are just as I/O bound as they've always been.  Main thing
that's changed is there's a lot more RAM in systems nowadays.
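
A quick sanity check for which situation you're in is to compare the
database size against RAM, for example:

    SELECT pg_size_pretty(pg_database_size(current_database()));

If that comes out well under your 12GB, reads will mostly be served
from cache and the write path becomes the limit instead.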

By the way:  a large increase in checkpoint_segments is the first thing
you should do.  If you check the database logs, they're probably filled
with complaints about it being too low.  32 would be a useful starting
value; going much higher for a test that's only 10 minutes long is
probably cheating.
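
That is, in postgresql.conf:

    checkpoint_segments = 32

The complaints in the log look something like this:

    LOG:  checkpoints are occurring too frequently (22 seconds apart)
    HINT:  Consider increasing the configuration parameter "checkpoint_segments".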

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: dbt2 performance

From: Yu-Ju Hong
Thanks for the reply.

On Thu, Feb 25, 2010 at 5:48 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Yu-Ju Hong wrote:
>> 2. Moreover, the disk utilization was high and the "await" time from iostat is around 500 ms. Could disk I/O limit the overall throughput? The server has 2 SATA disks, one for the system and PostgreSQL, the other dedicated to logging (pg_xlog). As far as I understand, modern database systems should be CPU-bound rather than I/O-bound; is it because I did not perform adequate performance tuning?

> dbt2 is almost exclusively disk I/O bound once the data set gets big enough.  There are some applications where most of the data fits in RAM and therefore CPU performance is the limiter.  dbt2 is exactly the opposite of such an application though, and the idea that "modern database systems should be CPU bound" is not really true at all.  That's only the case if the data you're operating on fits in RAM.  Otherwise, databases are just as I/O bound as they've always been.  Main thing that's changed is there's a lot more RAM in systems nowadays.

In my test, there were almost no disk reads (mostly disk writes), so I assumed the size of the database wasn't the performance bottleneck. Maybe I was wrong. If so, should I increase shared_buffers?

Assuming that dbt2 was limited by disk I/O in my experiments, do you think the numbers I got with my server configuration are reasonable?

Also, would you mind giving some examples of applications that are CPU-bound? That could be useful information to me.

> By the way:  a large increase in checkpoint_segments is the first thing you should do.  If you check the database logs, they're probably filled with complaints about it being too low.  32 would be a useful starting value; going much higher for a test that's only 10 minutes long is probably cheating.


I had increased checkpoint_segments to 10 when I ran the tests. I'll certainly increase it to 32 and give it a try.
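
(To check whether the segment limit rather than the timeout is forcing
checkpoints, I suppose I can look at the bgwriter statistics, something
like:

    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

where a growing checkpoints_req count would mean checkpoints are being
requested because WAL segments ran out.)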

 


Thanks,
Yu-Ju

Re: dbt2 performance

From: Robert Haas
On Thu, Feb 25, 2010 at 6:29 PM, Yu-Ju Hong <yuru.hong@gmail.com> wrote:
> Thanks for the reply.
>
> On Thu, Feb 25, 2010 at 5:48 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>>
>> Yu-Ju Hong wrote:
>>>
>>> 2. Moreover, the disk utilization was high and the "await" time from
>>> iostat is around 500 ms. Could disk I/O limit the overall throughput? The
>>> server has 2 SATA disks, one for the system and PostgreSQL, the other
>>> dedicated to logging (pg_xlog). As far as I understand, modern database
>>> systems should be CPU-bound rather than I/O-bound; is it because I did
>>> not perform adequate performance tuning?
>>
>> dbt2 is almost exclusively disk I/O bound once the data set gets big
>> enough.  There are some applications where most of the data fits in RAM and
>> therefore CPU performance is the limiter.  dbt2 is exactly the opposite of
>> such an application though, and the idea that "modern database systems
>> should be CPU bound" is not really true at all.  That's only the case if the
>> data you're operating on fits in RAM.  Otherwise, databases are just as I/O
>> bound as they've always been.  Main thing that's changed is there's a lot
>> more RAM in systems nowadays.
>
> In my test, there were almost no disk reads (mostly disk writes), so I
> assumed the size of the database wasn't the performance bottleneck.
> Maybe I was wrong. If so, should I increase shared_buffers?

Well, if you're writing a lot of stuff to disk, you could easily be I/O-limited.

> Assuming that dbt2 was limited by disk I/O in my experiments, do you think
> the numbers I got with my server configuration are reasonable?

Since you've provided no details on your hardware configuration, I'm
not sure how anyone could express an educated opinion on this
(personally I wouldn't know anyway, but others here would).

> Also, would you mind giving some examples of applications that are
> CPU-bound? That could be useful information to me.

You'll typically be CPU-bound when you're not I/O-bound - i.e., when
the data that you're accessing is small enough to fit in memory.

...Robert