Re: go for a script! / ex: PostgreSQL vs. MySQL - Mailing list pgsql-performance

From Josh Berkus
Subject Re: go for a script! / ex: PostgreSQL vs. MySQL
Date
Msg-id 200310121330.45115.josh@agliodbs.com
Whole thread Raw
In response to Re: go for a script! / ex: PostgreSQL vs. MySQL  (Nick Barr <nicky@chuckie.co.uk>)
List pgsql-performance
Nick,

> I reckon do a system scan first, and parse the current PostgreSQL conf
> file to figure out what the settings are. Also back it up with a date
> and time appended to the end to make sure there is a backup before
> overwriting the real conf file. Then a bunch of questions. What sort of
> questions would need to be asked and which parameters would these
> questions affect? So far, and from my limited understanding of the .conf
> file, I reckon there should be the following

Hmmm ... but I do think that there should be a file to store the user's
previous answers.   That way, the script can easily be re-run to fix config
issues.

> Here is your config of your hardware as detected. Is this correct ?
>
>     This could potentially be several questions, i.e. one for proc, mem,
> os, hdd etc
>     Would affect shared_buffers, sort_mem, effective_cache_size,
> random_page_cost

Actually, I think this would break down into:
-- Are Proc & Mem correct?  If not, type in correct values
-- Is OS correct? If not, select from list
-- Your HDD: is it:
    1) IDE
    2) Fast multi-disk SCSI or low-end RAID
    3) Medium-to-high-end RAID

Other things, we don't care about.

> How was PostgreSQL compiled?
>
>     This would be parameters such as the block size and a few other
> compile time parameters. If we can get to some of these read-only
> parameters than that would make this step easier, certainly for the new
> recruits amongst us.

Actually, from my perspective, we shouldn't bother with this; if an admin
knows enough to set an alternate blaock size for PG, then they know enough to
tweak the Conf file by hand.  I think we should just issue a warning that
this script:
1) does not work for anyone who is using non-default block sizes,
2) may not work well for anyone using unusual locales, optimization flags, or
other non-default compile options except for language interfaces.
3) cannot produce good settings for embedded systems;
4) will not work well for systems which are extremely low on disk space,
memory, or other resouces.
    Basically, the script only really needs to work for the people who are
installing PostgreSQL with the default options or from RPM on regular server
or workstation machines with plenty of disk space for normal database
purposes.  People who have more complicated setups can read the darned
documentation and tune the conf file by hand.

> Is PostgreSQL the only thing being run on this computer?

First, becuase it affects a couple of other variables:

What kind of database server are you expecting to run?
A) Web Server (many small fast queries from many users, and not much update
activity)
B) Online Transaction Processing (OLTP) database (many small updates
constantly from many users; think "accounting application").
C) Online Analytical Reporting (OLAP) database (a few large and complicated
read-only queries aggregating large quantites of data for display)
D) Data Transformation tool (loading large amounts of data to process,
transform, and output to other software)
E) Mixed-Use Database Server (a little of all of the above)
F) Workstation (installing this database on a user machine which also has a
desktop, does word processing, etc.)

If the user answers anything but (F), then we ask:

Will you be running any other signficant software on this server, such as a
web server, a Java runtime engine, or a reporting application? (yes|no)

If yes, then:

How much memory do you expect this other software, in total, to regularly use
while PostgreSQL is in use?  (# in MB; should offer default of 50% of the RAM
scanned).

> How are the clients going to connect?
>
>     i.e. TCP or Unix sockets

We should warn them that they will still need to configure pg_hba.conf.

> How many clients can connect to this database at once?
>
>     Affects max_connections

Should add a parenthetical comment that for applications which use pooled
connections, or intermittent connection, such as Web applications, the number
of concurrent connections is often much lower than the number of concurrent
users.

> How many databases and how many tables in each database are going to be
> present?
>
>     Affects max_fsm_pages, checkpoint_segments, checkpoint_timeout

Also need to ask if they have an idea of the total size of all databases, in
MB or GB, which has a stronger relationship to those variables.

Also, this will give us a chance to check the free space on the PGDATA
partition, and kick the user out with a warning if there is not at least
2xExpected Size available.

> Do you want to vacuum you database regularly?
>
>     Initial question for cron job
>
> It is recomended that you vacuum analyze every night, do you want to do
> this?
> It is also recomended that you vacuum full every month, do you want to
> do this?

Depends on size/type of database.  For large OLTP databases, I recommend
vacuum as often as every 5 mintues, analyze every hour, and Vacuum Full +
Reindex once a week.   For a workstation database, your frequencies are
probably OK.

--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Index/Foreign Key Question
Next
From: Josh Berkus
Date:
Subject: Re: go for a script! / ex: PostgreSQL vs. MySQL