Re: Moving postgresql.conf tunables into 2003... - Mailing list pgsql-performance

From Sean Chittenden
Subject Re: Moving postgresql.conf tunables into 2003...
Msg-id 20030706002413.GZ72567@perrin.int.nxad.com
In response to Re: Moving postgresql.conf tunables into 2003...  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Moving postgresql.conf tunables into 2003...
List pgsql-performance
> > The SGML docs aren't in the DBA's face and are way out of the way
> > for DBAs rolling out a new system or who are tuning the system.
> > SGML == Developer, conf == DBA.
>
> That's exactly my point.  We cannot provide enough documentation in
> the CONF file without septupling its length.  If we remove all
> commentary, and instead provide a pointer to the documentation, more
> DBAs will read it.

I don't think that would happen, which is why I think the terse bits
that are included are worthwhile.  :)

> > Some of those parameters are based on hardware constraints and
> > should be pooled and organized as such.
> >
> > random_page_cost ==
> >     avg cost of a random disk seek/read (eg: disk seek time) ==
> >     constant integer for a given piece of hardware
>
> But, you see, this is exactly what I'm talking about.
> random_page_cost isn't static to a specific piece of hardware ... it
> depends as well on what else is on:

*) the disk/array

translation: how fast data is accessed and over how many drives.

*) concurrent disk activity

A disk/database activity metric is different from the cost of a seek
on the platters.  :) Just because PostgreSQL doesn't currently support
such a disk concurrency metric doesn't mean that its definition should
get rolled into a different number to compensate for the lack
thereof.

*) disk controller settings

This class of settings falls into the same class as the settings that
affect random seeks on the platters/disk array(s).

*) filesystem

Again, this influences avg seek time.

*) OS

Again, avg seek time.

*) distribution of records and tables

This has nothing to do with PostgreSQL's random_page_cost setting
other than that if data is fragmented on the platter, the disk is
going to have to do a lot of seeking.  This is a stat that should get
set by ANALYZE, not by a human.

*) arrangement of the partitions on disk

Again, avg seek time.
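
On the record-distribution point above, the kind of stat ANALYZE could
hand to the planner can be illustrated with a small Python sketch.  It
computes the correlation between the physical position of each row and
the sort rank of its value, similar in spirit to the correlation column
in pg_stats.  The function name and the rank-based formula are
assumptions made for illustration; this is not PostgreSQL's
implementation.

```python
def physical_order_correlation(values):
    """Correlation between on-disk position and sort rank of a column.

    `values` holds the column values in the order the rows sit on disk.
    Returns 1.0 when rows are stored in sorted order, -1.0 when stored
    in reverse order, and something near 0 when they are scattered.
    Hypothetical helper for illustration only.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two rows")

    # Rank each row by value; sorted() is stable, so ties keep disk order.
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r

    # Positions and ranks are both permutations of 0..n-1, so they share
    # the same mean and variance and Pearson's r reduces to cov / var.
    mean = (n - 1) / 2
    cov = sum((i - mean) * (rank[i] - mean) for i in range(n))
    var = sum((i - mean) ** 2 for i in range(n))
    return cov / var
```

A value close to 1.0 or -1.0 means an index scan touches pages mostly
in order and does little seeking; a value near 0 means most fetches are
effectively random.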

> One can certainly get a "good enough" value by benchmarking the
> disk's random seek and calculating based on that ... but to get an
> "ideal" value requires a long interactive session by someone with
> experience and in-depth knowledge of the machine and database.

An "ideal" value isn't obtained via guess and check.  Checking is only
the verification of some calculable set of settings ... though right
now those calculated settings are guessed, unfortunately.
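
As a rough illustration of the "good enough" calculation, here is a
minimal Python sketch that times sequential and random 8 kB page reads
against one file and reports the ratio, which is what random_page_cost
expresses relative to a sequential page fetch.  The function names, the
sample count, and the choice of test file are all assumptions.  The
result is only meaningful when the file is much larger than RAM, since
otherwise the OS cache hides the seeks.  It relies on os.pread, so it
is Unix-only.

```python
import os
import random
import time

PAGE_SIZE = 8192  # PostgreSQL's default block size


def cost_ratio(seq_seconds, rand_seconds):
    """Random page cost expressed relative to one sequential page read."""
    if seq_seconds <= 0:
        raise ValueError("sequential time must be positive")
    return rand_seconds / seq_seconds


def read_pages(fd, page_numbers):
    """Read the given pages and return the mean seconds per page."""
    start = time.perf_counter()
    for n in page_numbers:
        os.pread(fd, PAGE_SIZE, n * PAGE_SIZE)
    return (time.perf_counter() - start) / len(page_numbers)


def estimate_random_page_cost(path, samples=2000, seed=0):
    """Hypothetical helper: ratio of random to sequential read time."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total_pages = os.fstat(fd).st_size // PAGE_SIZE
        if total_pages == 0:
            raise ValueError("file is smaller than one page")
        samples = min(samples, total_pages)
        rng = random.Random(seed)
        scattered = [rng.randrange(total_pages) for _ in range(samples)]
        seq_time = read_pages(fd, range(samples))   # pages 0..samples-1
        rand_time = read_pages(fd, scattered)       # same count, random order
    finally:
        os.close(fd)
    return cost_ratio(seq_time, rand_time)
```

Running it a few times on an otherwise idle machine and averaging gives
a per-hardware number; it says nothing about concurrent disk activity,
which is the separate metric argued for above.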

> > There are other settings that are RAM based as well, which should
> > be formulaic and derived, though a formula hasn't been defined to
> > date.
>
> You seem pretty passionate about this ... how about you help me and
> Kevin define a benchmarking suite when I get back into the country
> (July 17)?  If we're going to define formulas, it requires that we
> have a near-comprehensive and consistent test database and test
> battery that we can run on a variety of machines and platforms.

Works for me, though a benchmark will be less valuable than adding a
disk concurrency stat, improving data trend/distribution analysis, and
using numbers that are concrete and obtainable through the OS kernel
API or an admin manually plunking numbers in.  I'm still recovering
from my move from Cali to WA so with any luck, I'll be settled in by
then.

-sc

--
Sean Chittenden
