Thread: hardware configuration for the database
I am setting up a project using Apache, PHP, and PostgreSQL.
This application will be used by about 30 users.
The database looks roughly like this:
between 12GB and 15GB
4 tables will have 1 million rows and 1,000 columns, with 90% of the values INT2 and the rest float (20% of all the data will be 0)
the other tables have fewer than 10,000 rows
Most of the queries will be SELECTs that are not very complicated (I think, at this point).
I have a couple of questions regarding the hardware configuration:
DELL
dual 2.8GHz processors
4GB RAM
76GB HD using RAID 5
Linux version to be defined (Red Hat?)
Do you think this configuration is enough to get good performance once the database is set up properly?
Do you think the big tables should be split so that they have fewer columns? That would mean some of my queries would need JOINs, as in the sketch below.
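For illustration, the kind of split I have in mind might look like this -- all table and column names here are just invented examples:

    -- split the 1,000 columns across narrower tables sharing the same key
    CREATE TABLE big_part1 (
        id integer PRIMARY KEY,
        c1 int2,
        c2 int2
        -- ... more of the INT2 columns ...
    );

    CREATE TABLE big_part2 (
        id   integer PRIMARY KEY REFERENCES big_part1(id),
        c501 int2,
        c502 real
        -- ... the remaining columns ...
    );

    -- a query touching columns from both halves then needs a JOIN:
    SELECT p1.c1, p2.c501
      FROM big_part1 p1
      JOIN big_part2 p2 ON p2.id = p1.id;

Queries that only touch one half of the columns would avoid the JOIN entirely.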
Thank you for your help!
If you are looking for speed, I would keep the whole thing as arrays in memory in C++ and just do backups to the database on a regular basis. If you need continuous durability (no data loss), or true SQL compatibility and/or portability in your application, that won't apply.

Guillaume Houssay wrote:
<snip>
> I am setting up a project using Apache, PHP, and PostgreSQL.
> This application will be used by about 30 users.
</snip>
<snip>
> Linux version to be defined (Red Hat?)
>
> Do you think this configuration is enough to get good performance once
> the database is set up properly?
</snip>

Don't ignore your OS choice and assume that hardware is all that matters. Hardware is certainly important, but your choice of OS can have a big impact on your server too. That said, you should look at as many different distros as possible, install a few, and pick the one that best fits your needs. For instance:

(My personal choice:) If security is a big concern, you may want to consider OpenBSD (www.openbsd.com). Because the project is not based in the US, it isn't restricted in what kind of cryptographic software it can include in the distro, so it touts proactive security measures. This includes, but is not limited to, ipsec -- a protocol that encrypts all of your packets.[0] OpenBSD also includes a bunch of other security goodies. Not only that, OpenBSD was recently awarded funding by DARPA -- the United States Defense Advanced Research Projects Agency[1] (http://www.darpa.mil/) -- because its security is just THAT good.

Slackware (www.slackware.com) and Debian (www.debian.org) let you live close to your hardware.[2] As a matter of fact, Debian lets you select exactly which pieces you want built into the kernel when you install it.[3] I'm not sure how much overhead you will shave off a 2U box with 4GB RAM, but a streamlined kernel can't hurt.

Mandrake (www.mandrake.com) is super friendly for noobs, while still letting you do advanced things -- if and only if you want to. Very good if you've never used *nix at all before.

NetBSD (www.netbsd.org) is extremely conservative about upgrading. If you are super worried about getting r00t3d, this may be the distro for you.

FreeBSD (www.freebsd.com) is a favorite of many, including Apache, Yahoo, Sony, and a few others I can't remember off the top of my head.

So take your time deciding on an OS, download and install as many as possible, and take some time to think about your requirements. What kind of network are you using? It's relatively hard to splice into a hardwired network cable, but a wireless LAN (802.11?) is accessible to anybody in range with an antenna.[4] Better go with OpenBSD in that case -- or install a plethora of crypto software. Are you a noob? Better check out Mandrake or another user-friendly distro. Are you an old hand with a PhD in Computer Engineering? Debian will seem like child's play and give you the kind of customization only accessible to a PhD C.E. And, of course, if you're planning on doing extensive customization (i.e. code), you may want to consider a *BSD box -- because the GNU General Public License requires derivative works to be open source! (You don't want to explain to your boss why the competitors can use your software.)[5]

[0] Some purists will no doubt point out that Mandrake, or another distro, comes with ipsec preinstalled. ipsec was no more than an example of one of the programs that a US-based distro may not be able to include due to silly US restrictions on cryptography. (You may have visited the Penguin Liberation Front (http://plf.zarb.org/) to download code to play DVDs on a *nix box, said code being unavailable on US servers due to illegality.) One other thing I should mention about ipsec is that you need a client with ipsec in order to take advantage of encrypted packets. This could mean setting up an 802.11b or 802.11g network and running OpenBSD on all the clients (or just clients with ipsec installed). Some of the other features of OpenBSD may require that the clients have OpenBSD installed too -- so if you're just worried about random passersby on the 'net looking into your database, forget about it.

[1] This is as close as a wing of the American military comes to an equivalent of James Bond's "Q Branch".

[2] i.e. /Not/ for newbies.

[3] Although I can't confirm it for a fact, I would assume that most distros make educated guesses about what you will and won't need. And I think every other distro lets you use custom-compiled kernels. However, there is something satisfying about running through the list of kernel modules and selecting exactly what you want.

[4] There is a way to require a key to access the network, but I /think/ there was an article on Slashdot (www.slashdot.com) a while ago about it being hacked.

[5] Of course, the competition would actually have to get hold of the software first, but, hey, why take chances?

Hope that helps,

--
Daniel R. Anderson
Great Lakes Industries, Inc.
80 Pineview Ave.
Buffalo, NY 14218
(716) 691-5900 x218

"Never let your schooling interfere with your education" -- Mark Twain
On Fri, 2003-03-21 at 15:28, Dennis Gearon wrote:
> If you are looking for speed, I would keep the whole thing as arrays in
> memory in C++ and just do backups to the database on a regular basis.

You'd suggest storing "12 to 15GB" of data in main memory on an x86 machine with 4GB of RAM?

> Guillaume Houssay wrote:
> > 4 tables will have 1 million rows and 1,000 columns, with 90% of the
> > values INT2 and the rest float (20% of all the data will be 0)

1,000 columns? That doesn't sound like the result of good database design...

And if you'd like to try micro-optimizations, multiple NULL values in a single tuple are stored efficiently -- so if those "0" values show up more than once per tuple, consider storing them in the DB as NULL and converting them back to 0 (perhaps using COALESCE) on output.

> > DELL
> > dual 2.8GHz processors
> > 4GB RAM
> > 76GB HD using RAID 5
> > Linux version to be defined (Red Hat?)
> >
> > Do you think this configuration is enough to get good performance once
> > the database is set up properly?

Without more information on how frequently your clients are going to be accessing the DB, it's really impossible to say.

Cheers,

Neil
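P.S. A minimal sketch of that micro-optimization (table and column names are invented):

    -- store 0 as NULL on the way in; multiple NULLs in a tuple are kept
    -- compactly in the per-row null bitmap
    INSERT INTO samples (id, c1, c2)
    VALUES (1, NULLIF(42, 0), NULLIF(0, 0));

    -- turn the NULLs back into 0 on the way out
    SELECT id, COALESCE(c1, 0) AS c1, COALESCE(c2, 0) AS c2
      FROM samples;

NULLIF does the 0-to-NULL conversion on input, and COALESCE reverses it on output.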
<snip>
> > If you are looking for speed, I would keep the whole thing as arrays in
> > memory in C++ and just do backups to the database on a regular basis.
>
> You'd suggest storing "12 to 15GB" of data in main memory on an x86
> machine with 4GB of RAM?
</snip>

If I remember correctly, Sparc-based computers can hold insane amounts of memory. They're insanely expensive, though; even the decade-old ones on eBay. I don't suppose it's possible to RAID (or is it Beowulf?) enough computers together to get a giant RAM disk?

--
Daniel R. Anderson
Great Lakes Industries, Inc.
80 Pineview Ave.
Buffalo, NY 14218
(716) 691-5900 x218

"Never let your schooling interfere with your education" -- Mark Twain
Ever hear of swap space? Your application couldn't possibly be working on all 12 gigabytes at one time, so whatever it is working on would be in memory. But this only applies if you can tolerate the loss of LOTS of data in the event of a power failure, memory corruption, etc.

Neil Conway wrote:
<snip>
> You'd suggest storing "12 to 15GB" of data in main memory on an x86
> machine with 4GB of RAM?
</snip>