Performace Optimization for Dummies - Mailing list pgsql-performance
From | Carlo Stonebanks |
---|---|
Subject | Performace Optimization for Dummies |
Date | |
Msg-id | efgu4v$276m$1@news.hub.org Whole thread Raw |
Responses |
Re: Performace Optimization for Dummies
Re: Performace Optimization for Dummies Re: Performace Optimization for Dummies Re: Performace Optimization for Dummies Re: Performace Optimization for Dummies Re: Performace Optimization for Dummies |
List | pgsql-performance |
I am a software developer who is acting in a (temporary) dba role for a project. I had recommended PostgreSQL be brought in to replace the proposed MySQL DB - I chose PostgreSQL because of its reputation as a more stable solution than MySQL. At this early stage in the project, we are initializing our portal's database with millions of rows of imported data in over 50 different flattened tables; each table's structure is unique to the data provider. This requires a pretty complex import program, because the data must be matched semantically, not literally. Even with all of the expression matching and fuzzy logic in the code,our performance statistics show that the program spends over 75% of its time in SQL queries looking for matching and/or duplicate data. The import is slow - and degrades as the tables grow. With even more millions of rows in dozens of import tables to come, the imports will take forever. My ability to analyse the queries is limited; because of the nature of the import process, the SQL queries are mutable, every imported row can change the structure of a SQL query as the program adds and subtracts search conditions to the SQL command text before execution. The import program is scripted in Tcl. An attempt to convert our queries to prepared queries (curiousy) did not bring any performance improvements, and we converted back to simplify the code. We urgently need a major performance improvement. We are running the PostgreSQL 8.1.4 on a Windows 2003 x64 Server on a dual processor, dual core 3.2Ghz Xeon box with 4gb RAM and a RAID (sorry, I don't know what type) disc subsystem. Sorry about the long intro, but here are my questions: 1) Are we paying any big penalties by running Windows vs LINUX (or any other OS)? 2) Has the debate over PostgreSQL and Xeon processors been settled? Is this a factor? 3) Are there any easy-to-use performance analysis/optimisation tools that we can use? I am dreaming of one that could point out problems and suggest and.or effect solutions. 4) Can anyone recommend any commercial PostgreSQL service providers that may be able to swiftly come in and assist us with our performance issues? Below, please find what I believe are the configuration settings of interest in our system Any help and advice will be much appreciated. TIA, Carlo max_connections = 100 shared_buffers = 50000 work_mem = 32768 maintenance_work_mem = 32768 checkpoint_segments = 128 effective_cache_size = 10000 random_page_cost = 3 stats_start_collector = on stats_command_string = on stats_row_level = on autovacuum = on
pgsql-performance by date: