Greg,
> Your max_connections concern is one fact that haunts the idea of just
> giving out some sample configs for people. Lance's tool asks outright the
> expectation for max_connections which I think is the right thing to do.
...
> I think people are stuck with actually learning a bit about work_mem
> whether they like it or not, because it's important to make it larger but
> we know going too high will be a problem with lots of connections doing
> sorts.
I find it inconsistent that you want to select "middle-of-the-road"
defaults for some values but ask users detailed questions for others.
Which are we trying to do here?
Given an "application type" selection -- a question which can be written in
easy-to-understand terms -- these values can be set to reasonable defaults.
In fact, for most of the performance tuning clients I had, we never
actually looped back and tested those defaults by monitoring pg_temp, memory
stats and the log; performance was acceptable with the approximate values.
> As for turning autovacuum on/off and the stats target, I'd expect useful
> defaults for those would come out of how the current sample is asking
> about read vs. write workloads and expected database size. Those simple
> to understand questions might capture enough of the difference between
> your two types here.
Both of the questions you cite above are unlikely to get accurate
answers from users, and the read vs. write answer is actually quite useless
except in the extreme cases (e.g. read-only or mostly-write). The deciding
factor in turning autovacuum off is whether or not the user does large bulk
loads / ETL operations, which autovacuum would interfere with.
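If the tool asked that question directly, the decision rule is a one-liner; this is just a sketch with a hypothetical question name, since no actual tool interface has been settled here:

```python
# Sketch of the deciding rule argued for above: autovacuum goes off only
# when the user reports large bulk loads / ETL, which autovacuum would
# interfere with. The parameter name is a hypothetical questionnaire field.

def autovacuum_setting(does_bulk_loads: bool) -> str:
    """Map the bulk-load/ETL answer to the autovacuum GUC value."""
    return "off" if does_bulk_loads else "on"
```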
The fact that we can't expect an accurate answer on database size (except from
the minority of users who already have a full production DB) will be a
chronic stumbling block for any conf tool we build. Quite a number of
settings want to know this figure: max_fsm_pages, maintenance_work_mem,
max_freeze_age, etc. The question is whether order-of-magnitude answers would
be likely to produce better results, e.g.:
How large is your database expected to grow?
[] Less than 100 MB / thousands of rows
[] 100 MB to 1 GB / tens to hundreds of thousands of rows
[] 1 GB to 10 GB / millions of rows
[] 10 GB to 100 GB / tens to hundreds of millions of rows
[] 100 GB to 1 TB / billions of rows
[] More than 1 TB / many billions of rows
... users might have better guesses within those rough ranges, and it would
give us enough data to set rough values.
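As a sketch of how those rough buckets could feed the size-sensitive settings, something like the following would do; the bucket thresholds mirror the checkboxes above, but the multipliers and the cap are illustrative assumptions, not recommendations from this thread:

```python
# Sketch: map the order-of-magnitude size answer to rough values for the
# size-sensitive settings named above. Multipliers and caps are
# illustrative assumptions only.

SIZE_BUCKET_UPPER_MB = [100, 1_024, 10_240, 102_400, 1_048_576]

def size_bucket(expected_size_mb):
    """Return 0..5, matching the six checkbox ranges in the question."""
    for i, upper in enumerate(SIZE_BUCKET_UPPER_MB):
        if expected_size_mb <= upper:
            return i
    return len(SIZE_BUCKET_UPPER_MB)  # more than 1 TB

def rough_settings(bucket):
    # max_fsm_pages roughly tracks database size;
    # maintenance_work_mem grows with size but is capped.
    return {
        "max_fsm_pages": 200_000 * (4 ** bucket),
        "maintenance_work_mem_mb": min(16 * (2 ** bucket), 1024),
    }
```

Even if the user's guess is off within a bucket, adjacent buckets produce settings of the same order of magnitude, which is all a conf tool needs.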
--
Josh Berkus
PostgreSQL @ Sun
San Francisco