Re: go for a script! / ex: PostgreSQL vs. MySQL - Mailing list pgsql-performance
From | Josh Berkus |
---|---|
Subject | Re: go for a script! / ex: PostgreSQL vs. MySQL |
Date | |
Msg-id | 200310121330.45115.josh@agliodbs.com Whole thread Raw |
In response to | Re: go for a script! / ex: PostgreSQL vs. MySQL (Nick Barr <nicky@chuckie.co.uk>) |
List | pgsql-performance |
Nick, > I reckon do a system scan first, and parse the current PostgreSQL conf > file to figure out what the settings are. Also back it up with a date > and time appended to the end to make sure there is a backup before > overwriting the real conf file. Then a bunch of questions. What sort of > questions would need to be asked and which parameters would these > questions affect? So far, and from my limited understanding of the .conf > file, I reckon there should be the following Hmmm ... but I do think that there should be a file to store the user's previous answers. That way, the script can easily be re-run to fix config issues. > Here is your config of your hardware as detected. Is this correct ? > > This could potentially be several questions, i.e. one for proc, mem, > os, hdd etc > Would affect shared_buffers, sort_mem, effective_cache_size, > random_page_cost Actually, I think this would break down into: -- Are Proc & Mem correct? If not, type in correct values -- Is OS correct? If not, select from list -- Your HDD: is it: 1) IDE 2) Fast multi-disk SCSI or low-end RAID 3) Medium-to-high-end RAID Other things, we don't care about. > How was PostgreSQL compiled? > > This would be parameters such as the block size and a few other > compile time parameters. If we can get to some of these read-only > parameters than that would make this step easier, certainly for the new > recruits amongst us. Actually, from my perspective, we shouldn't bother with this; if an admin knows enough to set an alternate blaock size for PG, then they know enough to tweak the Conf file by hand. I think we should just issue a warning that this script: 1) does not work for anyone who is using non-default block sizes, 2) may not work well for anyone using unusual locales, optimization flags, or other non-default compile options except for language interfaces. 3) cannot produce good settings for embedded systems; 4) will not work well for systems which are extremely low on disk space, memory, or other resouces. Basically, the script only really needs to work for the people who are installing PostgreSQL with the default options or from RPM on regular server or workstation machines with plenty of disk space for normal database purposes. People who have more complicated setups can read the darned documentation and tune the conf file by hand. > Is PostgreSQL the only thing being run on this computer? First, becuase it affects a couple of other variables: What kind of database server are you expecting to run? A) Web Server (many small fast queries from many users, and not much update activity) B) Online Transaction Processing (OLTP) database (many small updates constantly from many users; think "accounting application"). C) Online Analytical Reporting (OLAP) database (a few large and complicated read-only queries aggregating large quantites of data for display) D) Data Transformation tool (loading large amounts of data to process, transform, and output to other software) E) Mixed-Use Database Server (a little of all of the above) F) Workstation (installing this database on a user machine which also has a desktop, does word processing, etc.) If the user answers anything but (F), then we ask: Will you be running any other signficant software on this server, such as a web server, a Java runtime engine, or a reporting application? (yes|no) If yes, then: How much memory do you expect this other software, in total, to regularly use while PostgreSQL is in use? (# in MB; should offer default of 50% of the RAM scanned). > How are the clients going to connect? > > i.e. TCP or Unix sockets We should warn them that they will still need to configure pg_hba.conf. > How many clients can connect to this database at once? > > Affects max_connections Should add a parenthetical comment that for applications which use pooled connections, or intermittent connection, such as Web applications, the number of concurrent connections is often much lower than the number of concurrent users. > How many databases and how many tables in each database are going to be > present? > > Affects max_fsm_pages, checkpoint_segments, checkpoint_timeout Also need to ask if they have an idea of the total size of all databases, in MB or GB, which has a stronger relationship to those variables. Also, this will give us a chance to check the free space on the PGDATA partition, and kick the user out with a warning if there is not at least 2xExpected Size available. > Do you want to vacuum you database regularly? > > Initial question for cron job > > It is recomended that you vacuum analyze every night, do you want to do > this? > It is also recomended that you vacuum full every month, do you want to > do this? Depends on size/type of database. For large OLTP databases, I recommend vacuum as often as every 5 mintues, analyze every hour, and Vacuum Full + Reindex once a week. For a workstation database, your frequencies are probably OK. -- Josh Berkus Aglio Database Solutions San Francisco
pgsql-performance by date: