Re: Tuning scenarios (was Changing the default configuration)

Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > You don't have any real confidence in pgbench.  Fair enough.  What
> > *would* you have confidence in?
>
> Measurements on your actual application?

That unfortunately doesn't help us a whole lot in figuring out
defaults that will perform reasonably well under broad conditions,
unless there's some way to identify a consistent pattern (or set of
patterns) across a lot of those applications.

> In fairness to pgbench, most of its problems come from people running
> it at tiny scale factors, where it reduces to an exercise in how many
> angels can dance on the same pin (or, how many backends can contend to
> update the same row).

This isn't easy to fix, but I don't think it's impossible either.
It's probably sufficient to make the defaults dependent on information
gathered about the system.  I'd think total system memory would be the
primary thing to consider, since most database engines are pretty fast
once all the data and indexes are cached.  :-)
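
Something along these lines, say (a rough Python sketch; it assumes
Linux's /proc/meminfo, and the 25%/50% ratios are illustrative
placeholders, not tuning recommendations):

    #!/usr/bin/env python3
    # Sketch: derive starter PostgreSQL settings from total system RAM.
    # Assumes Linux /proc/meminfo; the 25%/50% ratios are guesses for
    # illustration only.

    def total_mem_bytes():
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    return int(line.split()[1]) * 1024  # reported in kB
        raise RuntimeError("MemTotal not found")

    mem_mb = total_mem_bytes() // (1024 * 1024)
    print("shared_buffers = %dMB" % (mem_mb // 4))        # ~25% of RAM
    print("effective_cache_size = %dMB" % (mem_mb // 2))  # ~50% of RAM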

> And in that regime it runs into two or three different Postgres
> limitations that might or might not have any relevance to your
> real-world application --- dead-index-row references used to be the
> worst, but I think probably aren't anymore in 7.3.  But those same
> limitations cause the results to be unstable from run to run, which
> is why I don't have a lot of faith in reports of pgbench numbers.
> You need to work quite hard to get reproducible numbers out of it.

The interesting question is whether that's more an indictment of how
PG does things or of how pgbench does things.  I imagine it's
difficult to get truly reproducible numbers out of almost any
benchmark coupled with almost any database engine.  There are simply
far too many parameters to tweak on any but the simplest database
engines, and we haven't even started talking about tuning the OS
around the database...
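
One way to at least quantify that instability would be to run pgbench
several times and look at the spread.  A minimal sketch (Python;
assumes pgbench is on the PATH, a hypothetical "bench" database is
already initialized, and that we can parse the "tps = ..." line it
prints):

    import re
    import statistics
    import subprocess

    RUNS = 5
    results = []
    for _ in range(RUNS):
        out = subprocess.run(
            ["pgbench", "-c", "10", "-t", "1000", "bench"],
            capture_output=True, text=True, check=True,
        ).stdout
        # pgbench prints a line like "tps = 123.456 (...)"
        results.append(float(re.search(r"tps = ([\d.]+)", out).group(1)))

    print("mean tps %.1f, stddev %.1f"
          % (statistics.mean(results), statistics.stdev(results)))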

And benchmarks (as well as real-world applications) will always run
into limitations of the database itself (locking mechanisms, IPC
limits, etc.).  In fact, that's another useful purpose for a
benchmark: seeing where those limits are.
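
For instance, ramping the client count until throughput stops scaling
is one crude way to find such a limit.  Another sketch under the same
assumptions as above (hypothetical "bench" database, parsing pgbench's
"tps = ..." output):

    import re
    import subprocess

    def tps(clients):
        out = subprocess.run(
            ["pgbench", "-c", str(clients), "-t", "500", "bench"],
            capture_output=True, text=True, check=True,
        ).stdout
        return float(re.search(r"tps = ([\d.]+)", out).group(1))

    # Double the client count until throughput regresses; that knee
    # is usually a lock/contention or resource limit.  Keep the scale
    # factor (-s at init time) >= the client count, or you're mostly
    # measuring contention on the same branch rows.
    best, clients = 0.0, 1
    while clients <= 128:
        t = tps(clients)
        print("%3d clients: %.1f tps" % (clients, t))
        if t < best * 0.95:
            break
        best = max(best, t)
        clients *= 2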

Despite those limits, it's probably better to have a benchmark that
gives you an order-of-magnitude idea of what to expect than to have
nothing at all.  And thus we're more or less right back where we
started: what kinds of tests should go into a benchmark for tuning a
database system (PG in particular, though the answer might easily
apply to others as well) so that it performs decently, if not
optimally, under the most likely loads?

I think we might be able to come up with some reasonable answers to
that, as long as we don't expect too much out of the resulting
benchmark.  The right people to ask are probably the people who are
actually running production databases.

Anyone wanna chime in here with some opinions and perspectives?

--
Kevin Brown                          kevin@sysexperts.com
