Thread: PostgreSQL Configuration Tool for Dummies

PostgreSQL Configuration Tool for Dummies

From

"Campbell, Lance"

Date:

19 June 2007, 15:15:01

Below is a link to the HTML JavaScript configuration page I am creating:

http://www.webservices.uiuc.edu/postgresql/

I had many suggestions. Based on the feedback I received, I put together the initial list of questions. This list of questions can be change.

Memory

There are many different ways to ask about memory. Rather than ask a series of questions I went with a single question, #2. If it is better to ask about the memory in a series of questions then please give me the questions you would ask and why you would ask each of them. From my understanding the primary memory issue as it relates to PostgreSQL is “how much memory is available to PostgreSQL”. Remember that this needs to be as simple as possible.

My next step is to list the PostgreSQL parameters found in the postgresql.conf file and how I will generate their values based on the questions I have so far. I will primarily focus on PostgreSQL 8.2.x. Once I have a consensus from everyone then I will put functionality behind the “Generate Suggested Settings” button.

Thanks for all of the feedback,

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

Re: PostgreSQL Configuration Tool for Dummies

From

"Y Sidhu"

Date:

19 June 2007, 15:38:00

On 6/19/07, Campbell, Lance <lance@uiuc.edu> wrote:

Below is a link to the HTML JavaScript configuration page I am creating:

http://www.webservices.uiuc.edu/postgresql/

I had many suggestions. Based on the feedback I received, I put together the initial list of questions. This list of questions can be change.

Memory
There are many different ways to ask about memory. Rather than ask a series of questions I went with a single question, #2. If it is better to ask about the memory in a series of questions then please give me the questions you would ask and why you would ask each of them. From my understanding the primary memory issue as it relates to PostgreSQL is "how much memory is available to PostgreSQL". Remember that this needs to be as simple as possible.

My next step is to list the PostgreSQL parameters found in the postgresql.conf file and how I will generate their values based on the questions I have so far. I will primarily focus on PostgreSQL 8.2.x. Once I have a consensus from everyone then I will put functionality behind the "Generate Suggested Settings" button.

Thanks for all of the feedback,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

Lance,

Simply awesome!

--
Yudhvir Singh Sidhu
408 375 3134 cell

Re: PostgreSQL Configuration Tool for Dummies

From

Francisco Reyes

Date:

19 June 2007, 16:00:40

Campbell, Lance writes:

For the "6) Are your searches:"
How about having "many simple"

Re: PostgreSQL Configuration Tool for Dummies

From

david@lang.hm

Date:

19 June 2007, 16:11:28

On Tue, 19 Jun 2007, Campbell, Lance wrote:

> Memory
>
> There are many different ways to ask about memory.  Rather than ask a
> series of questions I went with a single question, #2.  If it is better
> to ask about the memory in a series of questions then please give me the
> questions you would ask and why you would ask each of them.  From my
> understanding the primary memory issue as it relates to PostgreSQL is
> "how much memory is available to PostgreSQL".  Remember that this needs
> to be as simple as possible.

there are three catagories of memory useage

1. needed by other software
2. available for postgres
3. needed by the OS

it's not clear if what you are asking is #2 or a combination of #2 and #3

IMHO you should ask for #2 and #3, possibly along the lines of "how much
memory is in the machine that isn't already used by other applications"

David Lang

Re: PostgreSQL Configuration Tool for Dummies

From

Josh Berkus

Date:

19 June 2007, 16:53:24

> there are three catagories of memory useage
>
> 1. needed by other software
> 2. available for postgres
> 3. needed by the OS

There's actually only two required memory questions:

M1) How much RAM do you have on this machine?
M2) Is this:
    () Dedicated PostgreSQL Server?
    () Server shared with a few other applications?
    () Desktop?

I don't think the "mostly reads / mostly writes" question covers anything,
nor is it likely to produce accurate answers.  Instead, we need to ask the
users to characterize what type of application they are running:

T1) Please characterize the general type of workload you will be running on
this database.  Choose one of the following four:
() WEB: any scripting-language application which mainly needs to support
90% or more data reads, and many rapid-fire small queries over a large
number of connections. Examples: forums, content management systems,
directories.
() OLTP: this application involves a large number of INSERTs, UPDATEs and
DELETEs because most users are modifying data instead of just reading it.
Examples: accounting, ERP, logging tools, messaging engines.
() Data Warehousing: also called "decision support" and "BI", these
database support a fairly small number of large, complicated reporting
queries, very large tables, and large batch data loads.
() Mixed/Other: if your application doesn't fit any of the above, our
script will try to pick "safe, middle-of-the-road" values.

Hmmm, drop question (6) too.

(2) should read: "What is the maximum number of database connections which
you'll need to support?  If you don't know, we'll pick a default."

Other questions we need:

How many/how fast processors do you have?  Pick the option which seems
closest to what you have:
() A single laptop processor
() Single or dual older processors (1ghz)
() Dual or quad current consumer processors (2ghz+)
() Large, recent multi-core server system

"What OS Are You Using", of course, needs to have Linux, Solaris, BSD, OSX
and Windows.  At some point, this tool will also need to generate for the
user any shmem settings that they need to make on the OS.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: PostgreSQL Configuration Tool for Dummies

From

Charles Sprickman

Date:

19 June 2007, 20:50:31

On Tue, 19 Jun 2007, Josh Berkus wrote:

> "What OS Are You Using", of course, needs to have Linux, Solaris, BSD, OSX
> and Windows.  At some point, this tool will also need to generate for the
> user any shmem settings that they need to make on the OS.

I also noticed that on FreeBSD (6.2) at least the stock config simply
won't run without building a new kernel that bumps up all the SHM stuff or
dropping down resource usage in the postgres config...

Overall, I like the idea.  I've been slowly working on weaning myself off
of mysql and I think removing any roadblocks that new users might stumble
upon seems like an excellent way to get more exposure.

Charles

> --
> --Josh
>
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                http://www.postgresql.org/about/donate
>

Re: PostgreSQL Configuration Tool for Dummies

From

Greg Smith

Date:

20 June 2007, 03:24:10

On Tue, 19 Jun 2007, Josh Berkus wrote:

> I don't think the "mostly reads / mostly writes" question covers anything,
> nor is it likely to produce accurate answers.  Instead, we need to ask the
> users to characterize what type of application they are running:
> T1) Please characterize the general type of workload you will be running on
> this database.  Choose one of the following four...

We've hashed through this area before, but for Lance's benefit I'll
reiterate my dissenting position on this subject.  If you're building a
"tool for dummies", my opinion is that you shouldn't ask any of this
information.  I think there's an enormous benefit to providing something
that takes basic sizing information and gives conservative guidelines
based on that--as you say, "safe, middle-of-the-road values"--that are
still way, way more useful than the default values.  The risk in trying to
make a complicated tool that satisfies all the users Josh is aiming his
more sophisticated effort at is that you'll lose the newbies.

Scan the archives of this mailing list for a bit.  If you look at what
people discover they've being nailed by, it's rarely because they need to
optimize something like random_page_cost.  It's usually because they have
a brutally wrong value for one of the memory or vacuum parameters that are
very easy to provide reasonable suggestions for without needing a lot of
information about the server.

I wouldn't even bother asking how many CPUs somebody has for what Lance is
building.  The kind of optimizations you'd do based on that are just too
complicated to expect a tool to get them right and still be accessible to
a novice.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: PostgreSQL Configuration Tool for Dummies

From

Scott Marlowe

Date:

21 June 2007, 12:12:37

Greg Smith wrote:
> On Tue, 19 Jun 2007, Josh Berkus wrote:
>
>> I don't think the "mostly reads / mostly writes" question covers
>> anything,
>> nor is it likely to produce accurate answers.  Instead, we need to
>> ask the
>> users to characterize what type of application they are running:
>> T1) Please characterize the general type of workload you will be
>> running on
>> this database.  Choose one of the following four...
>
> We've hashed through this area before, but for Lance's benefit I'll
> reiterate my dissenting position on this subject.  If you're building
> a "tool for dummies", my opinion is that you shouldn't ask any of this
> information.  I think there's an enormous benefit to providing
> something that takes basic sizing information and gives conservative
> guidelines based on that--as you say, "safe, middle-of-the-road
> values"--that are still way, way more useful than the default values.
> The risk in trying to make a complicated tool that satisfies all the
> users Josh is aiming his more sophisticated effort at is that you'll
> lose the newbies.
Generally I agree, however, how about a first switch, for beginner /
intermediate / advanced.

The choice you make determines how much detail we ask you about your
setup.  Beginners get two or three simple questions, intermediate a
handful, and advanced gets grilled on everything.  Then, just write the
beginner and maybe intermediate to begin with and ghost out the advanced
until it's ready.

Re: PostgreSQL Configuration Tool for Dummies

From

Greg Smith

Date:

22 June 2007, 03:32:05

On Thu, 21 Jun 2007, Scott Marlowe wrote:

> Generally I agree, however, how about a first switch, for beginner /
> intermediate / advanced.

You're describing a perfectly reasonable approach for a second generation
tool in this area.  I think it would be very helpful for the user
community to get a first generation one that works fairly well before
getting distracted at all by things like this.  The people capable of
filling out the intermediate/advanced settings can probably just do a bit
of reading and figure out most of what they should be doing themselves.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: PostgreSQL Configuration Tool for Dummies

From

Adam Tauno Williams

Date:

22 June 2007, 09:07:35

On Fri, 2007-06-22 at 02:32 -0400, Greg Smith wrote:
> On Thu, 21 Jun 2007, Scott Marlowe wrote:
>
> > Generally I agree, however, how about a first switch, for beginner /
> > intermediate / advanced.
>
> You're describing a perfectly reasonable approach for a second generation
> tool in this area.  I think it would be very helpful for the user
> community to get a first generation one that works fairly well before
> getting distracted at all by things like this.  The people capable of
> filling out the intermediate/advanced settings can probably just do a bit
> of reading and figure out most of what they should be doing themselves.

Just as an aside; how come the installation/setup "Tutorial" section -
http://www.postgresql.org/docs/8.2/interactive/tutorial-start.html -
doesn't mention setting some rough reasonable defaults in
postgresql.conf or even a reference to the parameter documentation
section.  It seems like such a reference should exist between -
http://www.postgresql.org/docs/8.2/interactive/tutorial-arch.html - and
- http://www.postgresql.org/docs/8.2/interactive/tutorial-accessdb.html

At least something along those lines should be said at
http://www.postgresql.org/docs/8.2/interactive/install-post.html

Personally, as DBA for more than a decade,  I've got 0 sympathy for
people who setup a database but can't be bothered to read the
documentation.  But in the case of PostgreSQL the documentation could do
a better job of driving users to even the existence [and importance of]
postgresql.conf and routine maintenance techniques.
http://www.postgresql.org/docs/8.2/interactive/runtime-config.html
http://www.postgresql.org/docs/8.2/interactive/maintenance.html

Seems to me that even a remake of something like -
http://www.iiug.org/~waiug/old/forum2000/SQLTunning/sld001.htm - focused
on PostgreSQL would be novel and very interesting.

Just my two cents.

PostgreSQL is awesome, BTW.

Re: PostgreSQL Configuration Tool for Dummies - feedback adjustable control

From

"Sabin Coanda"

Date:

22 June 2007, 10:24:56

""Campbell, Lance"" <lance@uiuc.edu> wrote in message news:A3AC4FA47DC0B1458C3E5396E685E63302395E63@SAB-DC1.sab.uiuc.edu...
Below is a link to the HTML JavaScript configuration page I am creating:

http://www.webservices.uiuc.edu/postgresql/

I had many suggestions. Based on the feedback I received, I put together the initial list of questions. This list of questions can be change.

Instead of (or in addition to) configure dozens of settings, what do you say about a feedback adjustable control based on the existing system statistics and parsing logs (e.g http://pgfouine.projects.postgresql.org/index.html ) ?

Such an application improved with notifications would be useful for experimented users, too. A database is not static and it may evolve to different requirements. The initial configuration may be deprecated after one year.

Regards,

Sabin

Re: PostgreSQL Configuration Tool for Dummies - feedback adjustable control

From

david@lang.hm

Date:

22 June 2007, 15:47:20

On Fri, 22 Jun 2007, Sabin Coanda wrote:

> Instead of (or in addition to) configure dozens of settings, what do you
> say about a feedback adjustable control based on the existing system
> statistics and parsing logs (e.g
> http://pgfouine.projects.postgresql.org/index.html ) ?

something like this would be useful for advanced tuneing, but the biggest
problem is that it's so difficult to fingoure out a starting point. bad
choices at the starting point can cause several orders of magnatude
difference in the database performsnce. In addition we know that the
current defaults are bad for just about everyone (we just can't decide
what better defaults would be)

this horrible starting point gives people a bad first impression that a
simple tool like what's being discussed can go a long way towards solving.

David Lang

Re: PostgreSQL Configuration Tool for Dummies

From

Greg Smith

Date:

23 June 2007, 16:32:49

On Fri, 22 Jun 2007, Adam Tauno Williams wrote:

> Just as an aside; how come the installation/setup "Tutorial" section -
> http://www.postgresql.org/docs/8.2/interactive/tutorial-start.html -
> doesn't mention setting some rough reasonable defaults in
> postgresql.conf or even a reference to the parameter documentation
> section.

I think that anyone who has been working with the software long to know
what should go into such a section has kind of forgotten about this part
of the documentation by the time they get there.  It is an oversight and
yours is an excellent suggestion.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: PostgreSQL Configuration Tool for Dummies - feedback adjustable control

From

Greg Smith

Date:

23 June 2007, 16:44:17

On Fri, 22 Jun 2007, Sabin Coanda wrote:

> Instead of (or in addition to) configure dozens of settings, what do you
> say about a feedback adjustable control based on the existing system
> statistics and parsing logs

Take a look at the archive of this list for the end of April/Early May.
There's a thread there named "Feature Request --- was: PostgreSQL
Performance Tuning" that addressed this subject in length I think you'll
find interesting reading.

I personally feel there's much more long-term potential for a tool that
inspects the database, but the needs of something looking for getting good
starting configuration file (before there necessarily is even a populated
database) is different enough that it may justify building two different
tools.

I would suggest you or anything else building the starter configuration
tool not stray from the path of getting the most important variables set
to reasonable values.  Trying to satisfy every possible user is the path
that leads to a design so complicated that it's unlikely you'll ever get a
finished build done at all.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: PostgreSQL Configuration Tool for Dummies - feedback adjustable control

From

"Sabin Coanda"

Date:

25 June 2007, 08:47:43

<david@lang.hm> wrote in message
news:Pine.LNX.4.64.0706221144290.14942@asgard.lang.hm...
> On Fri, 22 Jun 2007, Sabin Coanda wrote:
>
>> Instead of (or in addition to) configure dozens of settings, what do you
>> say about a feedback adjustable control based on the existing system
>> statistics and parsing logs (e.g
>> http://pgfouine.projects.postgresql.org/index.html ) ?
>
> something like this would be useful for advanced tuneing, but the biggest
> problem is that it's so difficult to fingoure out a starting point. bad
> choices at the starting point can cause several orders of magnatude
> difference in the database performsnce. In addition we know that the
> current defaults are bad for just about everyone (we just can't decide
> what better defaults would be)
>

You are right. But an automatic tool beeing able to take decisions by
different inputs, would be able to set a startup configuration too, based on
the hw/sw environment, and interactive user requirements.

> this horrible starting point gives people a bad first impression that a
> simple tool like what's being discussed can go a long way towards solving.
>

Well, I think to an automatic tool, not an utopian application good for
everything. For instance the existing automatic daemon have some abilities,
bat not all of the VACUUM command. I'm realistic that good things may be
done in steps, not once.

I would be super happy if an available automatic configuration tool would be
able to set for the beginning just the shared_buffers or  max_fsm_pages
based on the available memory. Adjustments can be done later.

Regards,
Sabin

Re: PostgreSQL Configuration Tool for Dummies

From

Josh Berkus

Date:

26 June 2007, 12:25:47

Greg,

> We've hashed through this area before, but for Lance's benefit I'll
> reiterate my dissenting position on this subject.  If you're building a
> "tool for dummies", my opinion is that you shouldn't ask any of this
> information.  I think there's an enormous benefit to providing something
> that takes basic sizing information and gives conservative guidelines
> based on that--as you say, "safe, middle-of-the-road values"--that are
> still way, way more useful than the default values.  The risk in trying to
> make a complicated tool that satisfies all the users Josh is aiming his
> more sophisticated effort at is that you'll lose the newbies.

The problem is that there are no "safe, middle-of-the-road" values for some
things, particularly max_connections and work_mem.  Particularly, there are
very different conf profiles between reporting applications and OLTP/Web.
We're talking about order-of-magnitude differences here, not just a few
points.  e.g.:

Web app, typical machine:
max_connections = 200
work_mem = 256kb
default_statistics_target=100
autovacuum=on

Reporting app, same machine:
max_connections = 20
work_mem = 32mb
default_statistics_target=500
autovacuum=off

Possibly we could make the language of the "application type" selection less
technical, but I don't see it as dispensible even for a basic tool.

> I wouldn't even bother asking how many CPUs somebody has for what Lance is
> building.  The kind of optimizations you'd do based on that are just too
> complicated to expect a tool to get them right and still be accessible to
> a novice.

CPUs affects the various cpu_cost parameters, but I can but the idea that this
should only be part of the "advanced" tool.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: PostgreSQL Configuration Tool for Dummies

From

Greg Smith

Date:

26 June 2007, 15:14:29

On Tue, 26 Jun 2007, Josh Berkus wrote:

> The problem is that there are no "safe, middle-of-the-road" values for some
> things, particularly max_connections and work_mem.

Your max_connections concern is one fact that haunts the idea of just
giving out some sample configs for people.  Lance's tool asks outright the
expectation for max_connections which I think is the right thing to do.

> Web app, typical machine:
> work_mem = 256kb
> default_statistics_target=100
> autovacuum=on

> Reporting app, same machine:
> work_mem = 32mb
> default_statistics_target=500
> autovacuum=off

I think people are stuck with actually learning a bit about work_mem
whether they like it or not, because it's important to make it larger but
we know going too high will be a problem with lots of connections doing
sorts.

As for turning autovacuum on/off and the stats target, I'd expect useful
defaults for those would come out of how the current sample is asking
about read vs. write workloads and expected database size.  Those simple
to understand questions might capture enough of the difference between
your two types here.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: PostgreSQL Configuration Tool for Dummies

From

Josh Berkus

Date:

26 June 2007, 16:25:24

Greg,

> Your max_connections concern is one fact that haunts the idea of just
> giving out some sample configs for people.  Lance's tool asks outright the
> expectation for max_connections which I think is the right thing to do.
...
> I think people are stuck with actually learning a bit about work_mem
> whether they like it or not, because it's important to make it larger but
> we know going too high will be a problem with lots of connections doing
> sorts.

I find it extremely inconsistent that you want to select "middle-of-the-road"
defaults for some values and ask users detailed questions for other values.
Which are we trying to do, here?

Given an "application type" selection, which is a question which can be
written in easy-to-understand terms, these values can be set at reasonable
defaults.  In fact, for most performance tuning clients I had, we never
actually looped back and tested the defaults by monitoring pg_temp, memstat
and the log; performance was acceptable with the approximate values.

> As for turning autovacuum on/off and the stats target, I'd expect useful
> defaults for those would come out of how the current sample is asking
> about read vs. write workloads and expected database size.  Those simple
> to understand questions might capture enough of the difference between
> your two types here.

Both of the questions you cite above are unlikely to result in accurate
answers from users, and the read vs. write answer is actually quite useless
except for the extreme cases (e.g. read-only or mostly-write).  The deciding
answer in turning autovacuum off is whether or not the user does large bulk
loads / ETL operations, which autovac would interfere with.

The fact that we can't expect an accurate answer on database size (except from
the minority of users who already have a full production DB) will be a
chronic stumbling block for any conf tool we build.  Quite a number of
settings want to know this figure: max_fsm_pages, maintenance_work_mem,
max_freeze_age, etc.  Question is, would order-of-magnitude answers be likely
to have better results?  i.e.:

How large is your database expected to grow?
[] Less than 100MB / thousands of rows
[] 100mb to 1gb / tens to hundreds of thousands of rows
[] 1 gb to 10 gb / millions of rows
[] 10 gb to 100 gb / tens to hundreds of millions of rows
[] 100 gb to 1 TB / billions of rows
[] more than 1 TB / many billions of rows

... users might have better guesses within those rough ranges, and it would
give us enough data to set rough values.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: PostgreSQL Configuration Tool for Dummies

From

Greg Smith

Date:

26 June 2007, 19:05:25

On Tue, 26 Jun 2007, Josh Berkus wrote:

> I find it extremely inconsistent that you want to select "middle-of-the-road"
> defaults for some values and ask users detailed questions for other values.
> Which are we trying to do, here?

I'd like to see people have a really simple set of questions to get them
past the completely undersized initial configuration phase, then ship them
toward resources to help educate about the parts that could be problematic
for them based on what they do or don't know.  I don't see an
inconsistancy that I'd expect people to have a reasonable guess for
max_connections, while also telling them that setting sort_mem is
important, a middle value has been assigned, but a really correct setting
isn't something they can expect the simple config tool to figure out for
them; here's a pointer to the appropriate documentation to learn more.

> The fact that we can't expect an accurate answer on database size
> (except from the minority of users who already have a full production
> DB) will be a chronic stumbling block for any conf tool we build.

I'm still of the opinion that recommendations for settings like
max_fsm_pages and maintenance_work_mem should come out of a different type
of tool that connects to the database.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: PostgreSQL Configuration Tool for Dummies

From

Josh Berkus

Date:

27 June 2007, 15:01:59

Greg,

> I'd like to see people have a really simple set of questions to get them
> past the completely undersized initial configuration phase, then ship them
> toward resources to help educate about the parts that could be problematic
> for them based on what they do or don't know.  I don't see an
> inconsistancy that I'd expect people to have a reasonable guess for
> max_connections, while also telling them that setting sort_mem is
> important, a middle value has been assigned, but a really correct setting
> isn't something they can expect the simple config tool to figure out for
> them; here's a pointer to the appropriate documentation to learn more.

I disagree that this is acceptable, especially when we could set a better
value using an easy-to-understand question. It's also been my experience (in
3 years of professional performance tuning) that most users *don't* have an
accurate guess for max_connections.

I'm really not clear on why you think "what flavor of application do you
have?" is a difficult question.  It's certainly one that my clients were able
to answer easily.  Overall, it seems like you're shooting for a conf tool
which only really works for web apps, which isn't my personal goal or I think
a good use of our time.

> I'm still of the opinion that recommendations for settings like
> max_fsm_pages and maintenance_work_mem should come out of a different type
> of tool that connects to the database.

Well, there's several steps to this:

1) Run conf tool when installing PG;
2) Run conf tool++ after application is first up and running;
3) Run conf tool++ after application has been in production

The (1) tool should at least provide a configuration which isn't going to lead
to long term issues.  For example, dramatically underallocating fsm_pages can
result in having to run VACUUM FULL and the associated downtime, so it's
something we want to avoid at the outset.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco