dbt-2 tuning results with postgresql-8.3.5

From
"Mark Wong"
Date:
Hi all,

So after a long hiatus after running this OLTP workload at the OSDL,
many of you know the community has had some equipment donated by HP: a
DL380 G5 and an MSA70 disk array.  We are currently using the hardware
to do some tuning exercises to show the effects of various GUC
parameters.  I wanted to share what I've started with for input for
what is realistic to tune an OLTP database on a single large LUN.  The
initial goal is to show how much can (or can't) be tuned on an OLTP
type workload with just database and kernel parameters before
physically partitioning the database.  I hope this is actually a
useful exercise (it has certainly helped get the kit updated a little
bit).

To recap, dbt2 is a fair-use derivative of the TPC-C benchmark.  We
are using a 1000 warehouse database, which amounts to about 100GB of
raw text data. The DL380 G5 is an 8 core Xeon E5405 with 32GB of
memory.  The MSA70 is a 25-disk 15,000 RPM SAS array, currently
configured as a 25-disk RAID-0 array.  More specific hardware details
can be found here:

http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide#Hardware_Details

The first task is to show the confidence of the results.  Here is a
link to a few repeated runs using all default GUC values except the
number of connections (250):

http://pugs.postgresql.org/node/502

Here are links to how the throughput changes when increasing shared_buffers:

http://pugs.postgresql.org/node/505

And another series of tests to show how throughput changes when
checkpoint_segments are increased:

http://pugs.postgresql.org/node/503

The links go to a graphical summary and raw data.  Note that the
maximum theoretical throughput at this scale factor is approximately
12000 notpm.
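For reference, the baseline above amounts to a stock 8.3.5
postgresql.conf with only the connection limit raised; a sketch, not a
dump of the actual config file:

```
# baseline postgresql.conf: everything at 8.3.5 defaults except
max_connections = 250
```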

My first glance tells me that the system performance is quite
erratic when increasing the shared_buffers.  I'm also not sure what to
gather from increasing the checkpoint_segments.  Is it simply that the
more checkpoint segments you have, the more time the database spends
fsyncing when at a checkpoint?

Moving forward, what other parameters (or combinations of) do people
feel would be valuable to illustrate with this workload?

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
Gregory Stark
Date:
"Mark Wong" <markwkm@gmail.com> writes:

> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark.  We
> are using a 1000 warehouse database, which amounts to about 100GB of
> raw text data.

Really? Do you get conforming results with 1,000 warehouses? What's the 95th
percentile response time?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: dbt-2 tuning results with postgresql-8.3.5

From
Mark Wong
Date:
On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote:

> "Mark Wong" <markwkm@gmail.com> writes:
>
>> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark.  We
>> are using a 1000 warehouse database, which amounts to about 100GB of
>> raw text data.
>
> Really? Do you get conforming results with 1,000 warehouses? What's
> the 95th
> percentile response time?

No, the results are not conforming.  You and others have pointed that
out already.  The 95th percentile response times are calculated on each
page of the previous links.

I find your questions a little odd for the input I'm asking for.  Are
you under the impression we are trying to publish benchmarking
results?  Perhaps this is a simple misunderstanding?

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
Gregory Stark
Date:
Mark Wong <markwkm@gmail.com> writes:

> On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote:
>
>> "Mark Wong" <markwkm@gmail.com> writes:
>>
>>> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark.  We
>>> are using a 1000 warehouse database, which amounts to about 100GB of
>>> raw text data.
>>
>> Really? Do you get conforming results with 1,000 warehouses? What's  the 95th
>> percentile response time?
>
> No, the results are not conforming.  You and others have pointed that out
> already.  The 95th percentile response times are calculated on each page of
> the previous links.

Where exactly? Maybe I'm blind but I don't see them.

>
> I find your questions a little odd for the input I'm asking for.  Are  you
> under the impression we are trying to publish benchmarking  results?  Perhaps
> this is a simple misunderstanding?

Hm, perhaps. The "conventional" way to run TPC-C is to run it with larger and
larger scale factors until you find out the largest scale factor you can get a
conformant result at. In other words the scale factor is an output, not an
input variable.

You're using TPC-C just as an example workload and looking to see how to
maximize the TPM for a given scale factor. I guess there's nothing wrong with
that as long as everyone realizes it's not a TPC-C benchmark.

Except that if the 95th percentile response times are well above a second I
have to wonder whether the situation reflects an actual production OLTP system
well. It implies there are so many concurrent sessions that any given query is
being context switched out for seconds at a time.

I have to imagine that a real production system would consider the system
overloaded as soon as queries start taking significantly longer than they take
on an unloaded system. People monitor the service wait times and queue depths
for i/o systems closely and having several seconds of wait time is a highly
abnormal situation.

I'm not sure how bad that is for the benchmarks. The only effect that comes to
mind is that it might exaggerate the effects of some i/o intensive operations
that under normal conditions might not cause any noticeable impact like wal
log file switches or even checkpoints.

If you have a good i/o controller it might confuse your results a bit when
you're comparing random and sequential i/o because the controller might be
able to sort requests by physical position better than in a typical oltp
environment where the wait queues are too short to effectively do that.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: dbt-2 tuning results with postgresql-8.3.5

From
"Mark Wong"
Date:
On Sun, Dec 21, 2008 at 10:56 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> Mark Wong <markwkm@gmail.com> writes:
>
>> On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote:
>>
>>> "Mark Wong" <markwkm@gmail.com> writes:
>>>
>>>> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark.  We
>>>> are using a 1000 warehouse database, which amounts to about 100GB of
>>>> raw text data.
>>>
>>> Really? Do you get conforming results with 1,000 warehouses? What's  the 95th
>>> percentile response time?
>>
>> No, the results are not conforming.  You and others have pointed that out
>> already.  The 95th percentile response times are calculated on each page of
>> the previous links.
>
> Where exactly? Maybe I'm blind but I don't see them.

Here's an example:

http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/

The links on the blog entries should be pointing to their respective
reports.  I spot checked a few and it seems I got some right.  I
probably didn't make it clear you needed to click on the results to
see the reports.

>> I find your questions a little odd for the input I'm asking for.  Are  you
>> under the impression we are trying to publish benchmarking  results?  Perhaps
>> this is a simple misunderstanding?
>
> Hm, perhaps. The "conventional" way to run TPC-C is to run it with larger and
> larger scale factors until you find out the largest scale factor you can get a
> conformant result at. In other words the scale factor is an output, not an
> input variable.
>
> You're using TPC-C just as an example workload and looking to see how to
> maximize the TPM for a given scale factor. I guess there's nothing wrong with
> that as long as everyone realizes it's not a TPC-C benchmark.

Perhaps, but we're not trying to run a TPC-C benchmark.  We're trying
to illustrate how performance changes with an understood OLTP
workload.  The purpose is to show how the system behaves more so than
what the maximum throughput is.  We advertise the kit and the work as
being for self-learning; we never try to pass dbt-2 off as a
benchmarking kit.

> Except that if the 95th percentile response times are well above a second I
> have to wonder whether the situation reflects an actual production OLTP system
> well. It implies there are so many concurrent sessions that any given query is
> being context switched out for seconds at a time.
>
> I have to imagine that a real production system would consider the system
> overloaded as soon as queries start taking significantly longer than they take
> on an unloaded system. People monitor the service wait times and queue depths
> for i/o systems closely and having several seconds of wait time is a highly
> abnormal situation.

We attempt to illustrate the response times on the reports.  For
example, there is a histogram (drawn as a scatter plot) illustrating
the number of transactions vs. the response time for each transaction.
This is for the New Order transaction:

http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/dist_n.png

We also plot the response time for a transaction vs the elapsed time
(also as a scatter plot).  Again, this is for the New Order
transaction:

http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/rt_n.png

> I'm not sure how bad that is for the benchmarks. The only effect that comes to
> mind is that it might exaggerate the effects of some i/o intensive operations
> that under normal conditions might not cause any noticeable impact like wal
> log file switches or even checkpoints.

I'm not sure I'm following.  Is this something that can be shown by
any stats collection or profiling?  This vaguely reminds me of the
significant spikes in system time (and dips everywhere else) when the
operating system is fsyncing during a checkpoint, which we've always
observed when running this in the past.

> If you have a good i/o controller it might confuse your results a bit when
> you're comparing random and sequential i/o because the controller might be
> able to sort requests by physical position better than in a typical oltp
> environment where the wait queues are too short to effectively do that.

Thanks for the input.

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
Greg Smith
Date:
On Sat, 20 Dec 2008, Mark Wong wrote:

> Here are links to how the throughput changes when increasing
> shared_buffers: http://pugs.postgresql.org/node/505 My first glance
> tells me that the system performance is quite erratic when
> increasing the shared_buffers.

If you smooth that curve out a bit, you have to throw out the 22528MB
figure as meaningless--particularly since it's way too close to the cliff
where performance dives hard.  The sweet spot looks to me like 11264MB to
17408MB.  I'd say 14336MB is the best performing setting that's in the
middle of a stable area.

> And another series of tests to show how throughput changes when
> checkpoint_segments are increased: http://pugs.postgresql.org/node/503
> I'm also not sure what to gather from increasing the checkpoint_segments.

What was shared_buffers set to here?  Those two settings are not
completely independent, for example at a tiny buffer size it's not as
obvious there's a win in spreading the checkpoints out more.  It's
actually a 3-D graph, with shared_buffers and checkpoint_segments as two
axes and the throughput as the Z value.

Since that's quite time consuming to map out in its entirety, the way I'd
suggest navigating the territory more efficiently is to ignore the
defaults altogether.  Start with a configuration that someone familiar
with tuning the database would pick for this hardware:  8192MB for
shared_buffers and 100 checkpoint segments would be a reasonable base
point.  Run the same tests you did here, but with the value you're not
changing set to those much larger values rather than the database
defaults, and then I think you'd end up with something more interesting.
Also, I think the checkpoint_segments values >500 are a bit much, given
what level of recovery time would come with a crash at that setting.
Smaller steps from a smaller range would be better there I think.
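As a postgresql.conf sketch, that suggested base point would be (the
values are Greg's suggestions for this hardware, not measured optima):

```
shared_buffers = 8192MB      # vs. the tiny default
checkpoint_segments = 100    # vs. the default of 3
```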

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: dbt-2 tuning results with postgresql-8.3.5

From
Gregory Stark
Date:
"Mark Wong" <markwkm@gmail.com> writes:

>> I'm not sure how bad that is for the benchmarks. The only effect that comes to
>> mind is that it might exaggerate the effects of some i/o intensive operations
>> that under normal conditions might not cause any noticeable impact like wal
>> log file switches or even checkpoints.
>
> I'm not sure I'm following.

All I'm saying is that the performance characteristics won't be the same when
the service wait times are 1-10 seconds rather than the 20-30ms at which alarm
bells would start to ring on a real production system.

I'm not exactly sure what changes it might make though.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: dbt-2 tuning results with postgresql-8.3.5

From
Gregory Stark
Date:
"Mark Wong" <markwkm@gmail.com> writes:

> Thanks for the input.

In a more constructive vein:

1) autovacuum doesn't seem to be properly tracked. It looks like you're just
   tracking the autovacuum process and not the actual vacuum subprocesses
   which it spawns.

2) The response time graphs would be more informative if you excluded the
   ramp-up portion of the test. As it is there are big spikes at the low end
   but it's not clear whether they're really part of the curve or due to
   ramp-up. This is especially visible in the stock-level graph where it
   throws off the whole y scale.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: dbt-2 tuning results with postgresql-8.3.5

From
"Kevin Grittner"
Date:
>>> "Mark Wong" <markwkm@gmail.com> wrote:

> The DL380 G5 is an 8 core Xeon E5405 with 32GB of
> memory.  The MSA70 is a 25-disk 15,000 RPM SAS array, currently
> configured as a 25-disk RAID-0 array.

> number of connections (250):

> Moving forward, what other parameters (or combinations of) do people
> feel would be valuable to illustrate with this workload?

To configure PostgreSQL for OLTP on that hardware, I would strongly
recommend the use of a connection pool which queues requests above
some limit on concurrent queries.  My guess is that you'll see best
results with a limit somewhere around 40, based on my tests indicating
that performance drops off above (cpucount * 2) + spindlecount.
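As a worked example of Kevin's rule of thumb applied to the hardware
described in this thread (the function name here is made up for
illustration):

```python
def pool_limit(cpu_count: int, spindle_count: int) -> int:
    """Rule of thumb from this thread: performance drops off
    above (cpucount * 2) + spindlecount concurrent queries."""
    return cpu_count * 2 + spindle_count

# DL380 G5 (8 cores) + MSA70 (25 spindles)
print(pool_limit(8, 25))  # 41, i.e. "somewhere around 40"
```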

I wouldn't consider tests of the other parameters as being very useful
before tuning this.  This is more or less equivalent to the "engines"
configuration in Sybase, for example.

-Kevin

Re: dbt-2 tuning results with postgresql-8.3.5

From
"Mark Wong"
Date:
On Mon, Dec 22, 2008 at 12:59 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Sat, 20 Dec 2008, Mark Wong wrote:
>
>> Here are links to how the throughput changes when increasing
>> shared_buffers: http://pugs.postgresql.org/node/505 My first glance
>> tells me that the system performance is quite erratic when increasing the
>> shared_buffers.
>
> If you smooth that curve out a bit, you have to throw out the 22528MB figure
> as meaningless--particularly since it's way too close to the cliff where
> performance dives hard.  The sweet spot looks to me like 11264MB to 17408MB.
>  I'd say 14336MB is the best performing setting that's in the middle of a
> stable area.
>
>> And another series of tests to show how throughput changes when
>> checkpoint_segments are increased: http://pugs.postgresql.org/node/503 I'm
>> also not sure what to gather from increasing the checkpoint_segments.
>
> What was shared_buffers set to here?  Those two settings are not completely
> independent, for example at a tiny buffer size it's not as obvious there's a
> win in spreading the checkpoints out more.  It's actually a 3-D graph, with
> shared_buffers and checkpoint_segments as two axes and the throughput as the
> Z value.

The shared_buffers are the default, 24MB.  The database parameters are
saved, probably unclearly, here's an example link:

http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/db/param.out

> Since that's quite time consuming to map out in its entirety, the way I'd
> suggest navigating the territory more efficiently is to ignore the defaults
> altogether.  Start with a configuration that someone familiar with tuning
> the database would pick for this hardware:  8192MB for shared_buffers and
> 100 checkpoint segments would be a reasonable base point.  Run the same
> tests you did here, but with the value you're not changing set to those much
> larger values rather than the database defaults, and then I think you'd end
> with something more interesting. Also, I think the checkpoint_segments
> values >500 are a bit much, given what level of recovery time would come
> with a crash at that setting. Smaller steps from a smaller range would be
> better there I think.

I should probably run your pgtune script, huh?

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
"Mark Wong"
Date:
On Mon, Dec 22, 2008 at 2:56 AM, Gregory Stark <stark@enterprisedb.com> wrote:
> "Mark Wong" <markwkm@gmail.com> writes:
>
>> Thanks for the input.
>
> In a more constructive vein:
>
> 1) autovacuum doesn't seem to be properly tracked. It looks like you're just
>   tracking the autovacuum process and not the actual vacuum subprocesses
>   which it spawns.

Hrm, tracking just the launcher process certainly doesn't help.  Are
the spawned processes short-lived?  I take a snapshot of
/proc/<pid>/io data every 60 seconds.  The only thing I see named
autovacuum is the launcher process.  Or perhaps I can't read?  Here is
the raw data of the /proc/<pid>/io captures:

http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/db/iopp.out

> 2) The response time graphs would be more informative if you excluded the
>   ramp-up portion of the test. As it is there are big spikes at the low end
>   but it's not clear whether they're really part of the curve or due to
>   ramp-up. This is especially visible in the stock-level graph where it
>   throws off the whole y scale.

Ok, we'll take note and see what we can do.

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
"Mark Wong"
Date:
On Mon, Dec 22, 2008 at 7:27 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>>>> "Mark Wong" <markwkm@gmail.com> wrote:
>
>> The DL380 G5 is an 8 core Xeon E5405 with 32GB of
>> memory.  The MSA70 is a 25-disk 15,000 RPM SAS array, currently
>> configured as a 25-disk RAID-0 array.
>
>> number of connections (250):
>
>> Moving forward, what other parameters (or combinations of) do people
>> feel would be valuable to illustrate with this workload?
>
> To configure PostgreSQL for OLTP on that hardware, I would strongly
> recommend the use of a connection pool which queues requests above
> some limit on concurrent queries.  My guess is that you'll see best
> results with a limit somewhere around 40, based on my tests indicating
> that performance drops off above (cpucount * 2) + spindlecount.

Yeah, we are using a homegrown connection concentrator as part of the
test kit, but it's not very intelligent.

> I wouldn't consider tests of the other parameters as being very useful
> before tuning this.  This is more or less equivalent to the "engines"
> configuration in Sybase, for example.

Right, I have the database configured for 250 connections but I'm
using 200 of them.  I'm pretty sure for this scale factor 200 is more
than enough.  Nevertheless I should go through the exercise.

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
Greg Smith
Date:
On Mon, 22 Dec 2008, Mark Wong wrote:

> The shared_buffers are the default, 24MB.  The database parameters are
> saved, probably unclearly, here's an example link:
>
> http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/db/param.out

That's a bit painful to slog through to find what was changed from the
defaults.  How about saving the output from this query instead, or in
addition to the version sorted by name:

select name, setting, source, short_desc
from pg_settings
order by source, name;

Makes it easier to ignore everything that isn't set.
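A variant that filters out the defaults entirely might be even easier
to scan (a sketch against the pg_settings view, not tested here):

```sql
select name, setting, source, short_desc
from pg_settings
where source <> 'default'
order by source, name;
```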

> I should probably run your pgtune script, huh?

That's basically where the suggestions for center points I made came from.
The only other thing it does that might be interesting to examine is
that it bumps up checkpoint_completion_target to 0.9 once you've got a
large number of checkpoint_segments.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: dbt-2 tuning results with postgresql-8.3.5

From
Alvaro Herrera
Date:
Mark Wong escribió:

> Hrm, tracking just the launcher process certainly doesn't help.  Are
> the spawned processes short-lived?  I take a snapshot of
> /proc/<pid>/io data every 60 seconds.

The worker processes can be short-lived, but if they are, obviously they
are not vacuuming the large tables.  If you want to track all autovacuum
actions, set log_autovacuum_min_duration to 0.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: dbt-2 tuning results with postgresql-8.3.5

From
Simon Riggs
Date:
Hi Mark,

Good to see you producing results again.


On Sat, 2008-12-20 at 16:54 -0800, Mark Wong wrote:
> Here are links to how the throughput changes when increasing shared_buffers:
>
> http://pugs.postgresql.org/node/505

The only strange thing here is the result at 22528MB, and only because
it's the only normal-looking one.  There seems to be a freeze occurring
on most of the other tests around the 30 minute mark, which delays many
backends and reduces writes.

Reduction in performance as shared_buffers increases looks normal.

Increase wal_buffers, but look for something else as well. Try to get a
backtrace from when the lock up happens. It may not be Postgres?

> And another series of tests to show how throughput changes when
> checkpoint_segments are increased:
>
> http://pugs.postgresql.org/node/503
>
> The links go to a graphical summary and raw data.  Note that the
> maximum theoretical throughput at this scale factor is approximately
> 12000 notpm.
>
> My first glance tells me that the system performance is quite
> erratic when increasing the shared_buffers.  I'm also not sure what to
> gather from increasing the checkpoint_segments.  Is it simply that the
> more checkpoint segments you have, the more time the database spends
> fsyncing when at a checkpoint?

I would ignore the checkpoint_segments tests because you aren't using a
realistic value of shared_buffers.  I doubt any such effect is noticeable
when you use a realistic value determined from the set of tests in 505.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: dbt-2 tuning results with postgresql-8.3.5

From
"Mark Wong"
Date:
On Mon, Dec 22, 2008 at 7:27 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>>>> "Mark Wong" <markwkm@gmail.com> wrote:
>
>> The DL380 G5 is an 8 core Xeon E5405 with 32GB of
>> memory.  The MSA70 is a 25-disk 15,000 RPM SAS array, currently
>> configured as a 25-disk RAID-0 array.
>
>> number of connections (250):
>
>> Moving forward, what other parameters (or combinations of) do people
>> feel would be valuable to illustrate with this workload?
>
> To configure PostgreSQL for OLTP on that hardware, I would strongly
> recommend the use of a connection pool which queues requests above
> some limit on concurrent queries.  My guess is that you'll see best
> results with a limit somewhere aound 40, based on my tests indicating
> that performance drops off above (cpucount * 2) + spindlecount.

It appears to peak around 220 database connections:

http://pugs.postgresql.org/node/514

Of course the system still isn't really tuned all that much...  I
wouldn't be surprised if the workload peaked at a different number of
connections as it is tuned more.

> I wouldn't consider tests of the other parameters as being very useful
> before tuning this.  This is more or less equivalent to the "engines"
> configuration in Sybase, for example.

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
"Kevin Grittner"
Date:
>>> "Mark Wong" <markwkm@gmail.com> wrote:

> It appears to peak around 220 database connections:
>
> http://pugs.postgresql.org/node/514

Interesting.  What did you use for connection pooling?

My tests have never stayed that flat as the connections in use
climbed.  I'm curious why we're seeing such different results.

-Kevin

Re: dbt-2 tuning results with postgresql-8.3.5

From
"Mark Wong"
Date:
On Tue, Jan 13, 2009 at 7:40 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>>>> "Mark Wong" <markwkm@gmail.com> wrote:
>
>> It appears to peak around 220 database connections:
>>
>> http://pugs.postgresql.org/node/514
>
> Interesting.  What did you use for connection pooling?

It's a fairly dumb but custom built C program for the test kit:

http://git.postgresql.org/?p=~markwkm/dbt2.git;a=summary

I think the bulk of the logic is in src/client.c, src/db_threadpool.c,
and src/transaction_queue.c.

> My tests have never stayed that flat as the connections in use
> climbed.  I'm curious why we're seeing such different results.

I'm sure the difference in workloads makes a difference.  Like you
implied earlier, I think we each have to figure out what works best
for our own workloads.

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
Mark Wong
Date:
On Mon, Dec 22, 2008 at 12:59 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Sat, 20 Dec 2008, Mark Wong wrote:
>
>> Here are links to how the throughput changes when increasing
>> shared_buffers: http://pugs.postgresql.org/node/505 My first glance
>> tells me that the system performance is quite erratic when increasing the
>> shared_buffers.
>
> If you smooth that curve out a bit, you have to throw out the 22528MB figure
> as meaningless--particularly since it's way too close to the cliff where
> performance dives hard.  The sweet spot looks to me like 11264MB to 17408MB.
>  I'd say 14336MB is the best performing setting that's in the middle of a
> stable area.
>
>> And another series of tests to show how throughput changes when
>> checkpoint_segments are increased: http://pugs.postgresql.org/node/503 I'm
>> also not sure what to gather from increasing the checkpoint_segments.
>
> What was shared_buffers set to here?  Those two settings are not completely
> independent, for example at a tiny buffer size it's not as obvious there's a
> win in spreading the checkpoints out more.  It's actually a 3-D graph, with
> shared_buffers and checkpoint_segments as two axes and the throughput as the
> Z value.
>
> Since that's quite time consuming to map out in its entirety, the way I'd
> suggest navigating the territory more efficiently is to ignore the defaults
> altogether.  Start with a configuration that someone familiar with tuning
> the database would pick for this hardware:  8192MB for shared_buffers and
> 100 checkpoint segments would be a reasonable base point.  Run the same
> tests you did here, but with the value you're not changing set to those much
> larger values rather than the database defaults, and then I think you'd end
> with something more interesting. Also, I think the checkpoint_segments
> values >500 are a bit much, given what level of recovery time would come
> with a crash at that setting. Smaller steps from a smaller range would be
> better there I think.

Sorry for the long delay.  I have a trio of results (that I actually
ran about four weeks ago) setting shared_buffers to 7680MB (I don't
remember why it wasn't set to 8192MB :() and checkpoint_segments to
100:

http://pugs.postgresql.org/node/517

I'm also capturing the PostgreSQL parameters as suggested so we can
see what's set in the config file, default, command line etc.  It's
the "Settings" link in the "System Summary" section on the report web
page.

So about a 7% change for this particular workload:

http://pugs.postgresql.org/node/502

We're re-running some filesystem tests for an upcoming conference, so
we'll get back to it shortly...

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
Greg Smith
Date:
On Thu, 22 Jan 2009, Mark Wong wrote:

> I'm also capturing the PostgreSQL parameters as suggested so we can
> see what's set in the config file, default, command line etc.  It's
> the "Settings" link in the "System Summary" section on the report web
> page.

Those look good, much easier to pick out the stuff that's been customized.
I note that the Linux "Settings" links seem to be broken though.

To recap a summary here, what you had before were:

shared_buffers=24MB checkpoint_segments=100 notpm=7527
shared_buffers=8192MB checkpoint_segments=3 notpm=7996

And the new spots show:
shared_buffers=7680MB checkpoint_segments=100 notpm=9178

What's neat about your graphs now is that I think you can see the
checkpoints happening in the response time graphs.  For example, if you
look at
http://207.173.203.223/~markwkm/community6/dbt2/pgtune.1000.100.1/report/rt_d.png
and you focus on what happens just before each 10 minute mark, I'm
guessing that response time spike is the fsync phase at the end of the
checkpoint.  That's followed by a period where response time is really
fast.  That's because those writes are all pooling into the now cleared
out Linux buffer cache, but pdflush isn't really being aggressive about
writing them out yet.  On your server that can absorb quite a few writes
before clients start blocking on them, which is when response time climbs
back up.

A particularly interesting bit is to compare against the result with the
peak notpm you had in your earlier tests, where shared_buffers=15360MB:
http://207.173.203.223/~markwkm/community6/dbt2/shared_buffers/shared_buffers.15360MB/report/rt_d.png

While the average speed was faster on that one, the worst-case response
time was much worse.  You can really see this by comparing the response
time distribution.

Big shared buffers but low checkpoint_segments:
http://207.173.203.223/~markwkm/community6/dbt2/shared_buffers/shared_buffers.15360MB/report/dist_d.png

Medium shared buffers and medium checkpoint_segments:
http://207.173.203.223/~markwkm/community6/dbt2/pgtune.1000.100.1/report/dist_d.png

The checkpoint spreading logic is making a lot more transactions suffer
moderate write delays in order to get a big improvement in worst-case
behavior.

The next fine-tuning bit I'd normally apply in this situation is to see if
increasing checkpoint_completion_target from the default (0.5) to 0.9 does
anything to flatten out that response time graph.  I've seen a modest
increase in wal_buffers (from the default to, say, 1MB) help smooth out
the rough spots too.
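In postgresql.conf terms, those two tweaks would look something like
this (8.3-era defaults noted in the comments; the values are Greg's
suggestions, not measured results):

```
checkpoint_completion_target = 0.9   # default is 0.5
wal_buffers = 1MB                    # default is 64kB
```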

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: dbt-2 tuning results with postgresql-8.3.5

From
Mark Wong
Date:
On Thu, Jan 22, 2009 at 7:44 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Thu, 22 Jan 2009, Mark Wong wrote:
>
>> I'm also capturing the PostgreSQL parameters as suggested so we can
>> see what's set in the config file, default, command line etc.  It's
>> the "Settings" link in the "System Summary" section on the report web
>> page.
>
> Those look good, much easier to pick out the stuff that's been customized. I
> note that the Linux "Settings" links seem to be broken though.

Oh fudge, I think I see where my scripts are broken.  We're running
with a different Linux kernel now than before so I don't want to grab
the parameters yet.  I'll switch to the previous kernel to get the
parameters after the current testing is done, and fix the scripts in
the meantime.

Regards,
Mark

Re: dbt-2 tuning results with postgresql-8.3.5

From
Mark Wong
Date:
On Thu, Jan 22, 2009 at 10:10 PM, Mark Wong <markwkm@gmail.com> wrote:
> On Thu, Jan 22, 2009 at 7:44 PM, Greg Smith <gsmith@gregsmith.com> wrote:
>> On Thu, 22 Jan 2009, Mark Wong wrote:
>>
>>> I'm also capturing the PostgreSQL parameters as suggested so we can
>>> see what's set in the config file, default, command line etc.  It's
>>> the "Settings" link in the "System Summary" section on the report web
>>> page.
>>
>> Those look good, much easier to pick out the stuff that's been customized. I
>> note that the Linux "Settings" links seem to be broken though.
>
> Oh fudge, I think I see where my scripts are broken.  We're running
> with a different Linux kernel now than before so I don't want to grab
> the parameters yet.  I'll switch to the previous kernel to get the
> parameters after the current testing is done, and fix the scripts in
> the meantime.

Sorry for the continuing delays.  I have to make more time to spend on
this part.  One of the problems is that my scripts are listing the OS
settings of the main driver system, as opposed to the db system.
Meanwhile, I've attached the sysctl output from the kernel running on
the database system, 2.6.27-gentoo-r2.

Regards,
Mark

Attachment

Re: dbt-2 tuning results with postgresql-8.3.5

From
Greg Smith
Date:
On Thu, 5 Feb 2009, Mark Wong wrote:

> One of the problems is that my scripts are listing the OS of the main
> driver system, as opposed to the db system.

That's not a fun problem to deal with.  Last time I ran into it, I ended
up writing a little PL/PerlU function that gathered all the kernel-level
info I wanted, and then just pulled the info over by running that on the
driver system.
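A hedged sketch of that trick (the function name is hypothetical; it
requires the untrusted plperlu language and appropriate privileges on
the database host):

```sql
-- Runs on the database server, so a driver machine can pull the DB
-- host's kernel settings over an ordinary libpq connection.
create or replace function kernel_sysctl() returns text as $$
    return `/sbin/sysctl -a 2>/dev/null`;
$$ language plperlu;

-- Then, from the driver system:
--   psql -h dbhost -c "select kernel_sysctl()" > db-sysctl.out
```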

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: dbt-2 tuning results with postgresql-8.3.5

From
Mark Wong
Date:
On Thu, Jan 22, 2009 at 7:44 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> The next fine-tuning bit I'd normally apply in this situation is to see if
> increasing checkpoint_completion_target from the default (0.5) to 0.9 does
> anything to flatten out that response time graph.  I've seen a modest
> increase in wal_buffers (from the default to, say, 1MB) help smooth out the
> rough spots too.

Hi all,

After yet another delay, I have results for checkpoint_completion_target
from 0.6 to 0.9 (I forgot 0.5 :():

http://pugs.postgresql.org/node/526

I don't think the effects of the checkpoint_completion_target are
significant, and I sort of feel it's because the entire database is on
a single device.  I've started doing some runs with the database log
on a separate device, so I'll be trying some of these parameters
again.

Regards,
Mark