Thread: Recommendations for configuring a 200 GB database
We have had four databases serving our web site, but due to licensing issues, we have had to take two out of production, and we are looking to bring those two onto PostgreSQL very quickly, with an eye toward moving everything in the longer term. The central web DBs are all copies of the same data, drawn from 72 servers at remote locations. We replicate modifications made at these 72 remote sites real-time to all central servers.
On each central server, there are 352 tables and 412 indexes holding about 700 million rows, taking almost 200 GB of disk space. The largest table has about 125 million of those rows, with several indexes. There are about 3 million database transactions modifying each central database every day, with each transaction typically containing many inserts and/or updates -- deletes are sparse. During idle time the replication process compares tables in the source databases to the central databases to log any differences and correct the central copies. To support the 2 million browser and SOAP hits per day, the web sites spread about 6 million SELECT statements across available central servers, using load balancing. Many of these queries involve a 10 or more tables with many subqueries; some involve unions.
The manager of the DBA team is reluctant to change both the OS and the DBMS at the same time, so unless I can make a strong case for why it is important to run postgresql under Linux, we will be running this on Windows. Currently, there are two Java-based middle tier processes running on each central database server, one for the replication and one for the web. We expect to keep it that way, so the database needs to play well with these processes.
I've been reading everything I can find on postgresql configuration, but would welcome any specific suggestions for this environment. I'd also be really happy to hear that we're not the first to use postgresql with this much data and load.
Thanks for any info you can provide.
-Kevin
Kevin Grittner wrote: > > The manager of the DBA team is reluctant to change both the OS and the > DBMS at the same time, so unless I can make a strong case for why it is > important to run postgresql under Linux, we will be running this on > Windows. Currently, there are two Java-based middle tier processes > running on each central database server, one for the replication and one > for the web. We expect to keep it that way, so the database needs to > play well with these processes. Well, there's a lot more experience running PG on various *nix systems and a lot more help available. Also, I don't think performance on Windows is as good as on Linux/*BSD yet. Against switching OS is the fact that you presumably don't have the skills in-house for it, and the hardware was chosen for Windows compatibility/performance. Speaking of which, what sort of hardware are we talking about? -- Richard Huxton Archonet Ltd