Thread: configuration according to the database

configuration according to the database

From
"Guillaume Houssay"
Date:
I am setting up a project using Apache, PHP and PostgreSQL.
This application will be used by about 30 users.
 
The database is about this type :
 
between 12GB and 15GB
4 tables will have 1 million rows and 1000 columns, with 90% INT2 and the rest float (20% of all the data will be 0)
the other tables have fewer than 10 000 rows
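
As a rough sanity check on the figures above, the raw column payload can be estimated with a little arithmetic. This is only a sketch under stated assumptions: each of the four big tables is taken to have 1 million rows and 1000 columns, INT2 is 2 bytes, and both 4-byte and 8-byte floats are tried because the post does not say which is used. Row headers, alignment padding and indexes are ignored, so the real on-disk size would be larger.

```python
# Back-of-the-envelope size of the four wide tables described above.
# Assumptions (not stated in the post): every big table has the full
# 1,000,000 x 1000 shape, and floats are either 4 or 8 bytes wide.
# Row headers, alignment padding and indexes are ignored.

ROWS_PER_TABLE = 1_000_000
COLUMNS = 1000
TABLES = 4
INT2_BYTES = 2
INT2_SHARE = 0.90  # 90% of the columns are INT2


def raw_bytes(float_bytes: int) -> int:
    """Raw column payload for all four tables, in bytes."""
    int2_cols = round(COLUMNS * INT2_SHARE)
    float_cols = COLUMNS - int2_cols
    row_bytes = int2_cols * INT2_BYTES + float_cols * float_bytes
    return row_bytes * ROWS_PER_TABLE * TABLES


for width in (4, 8):
    print(f"float{width}: {raw_bytes(width) / 1e9:.1f} GB")
```

Under these assumptions the payload comes to roughly 9 to 10 GB before overhead, which is in line with the 12GB to 15GB total quoted here.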
 
Most of the queries will be SELECTs that are not very complicated (I think, at this time)
 
I have 1 question regarding the hardware configuration :
 
DELL
bi-processor 2.8GHz
4GB RAM
76GB HD using Raid 5
Linux version to be defined (Redhat ?)
 
Do you think this configuration is enough to get good performance once the database is properly set up?
 
Do you think the big tables should be split in order to have fewer columns? This could mean that I would have some queries with JOINs.
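
To make the trade-off in this question concrete, here is a minimal sketch of splitting one wide table into two narrower ones that share a key. It uses Python's built-in SQLite module as a stand-in for PostgreSQL, and the table and column names (sample_part1, sample_part2, c1, c502 and so on) are hypothetical; only two columns per half are shown.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL here; table and column
# names are hypothetical and only two columns per half are shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample_part1 (id INTEGER PRIMARY KEY, c1 SMALLINT, c2 SMALLINT);
    CREATE TABLE sample_part2 (id INTEGER PRIMARY KEY, c501 SMALLINT, c502 REAL);
    INSERT INTO sample_part1 VALUES (1, 10, 20), (2, 30, 40);
    INSERT INTO sample_part2 VALUES (1, 7, 1.5), (2, 8, 2.5);
""")

# A query that touches only one half needs no JOIN.
one_half = conn.execute(
    "SELECT c1 FROM sample_part1 WHERE id = 2"
).fetchall()

# A query that needs columns from both halves joins on the shared key.
both_halves = conn.execute("""
    SELECT p1.id, p1.c1, p2.c502
    FROM sample_part1 AS p1
    JOIN sample_part2 AS p2 ON p2.id = p1.id
    ORDER BY p1.id
""").fetchall()

print(one_half)
print(both_halves)
```

Whether the split pays off depends on how often a single query needs columns from more than one half; queries confined to one half read narrower rows, while the others pay for the JOIN.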
 
Thank you for your help !

Re: configuration according to the database

From
Dennis Gearon
Date:
If you are looking for speed, I would make the whole thing as arrays in memory
in C++, and just do backups to the database on a regular basis.

If you cannot tolerate any loss of data, or you need true SQL compatibility
and/or portability in your application, that won't apply.



Re: configuration according to the database

From
"Daniel R. Anderson"
Date:
<snip>
> Linux version to be defined (Redhat ?)
>
> Do you think this configuration is enough to have good performance after setting up properly the database ?
</snip>

Don't ignore your OS choice and assume that hardware is all important.
It's certainly important; but your choice of OS can have a big impact on
your server too.  That being said, you should look at as many different
distros as possible, install a few, and pick the one which is best for
your needs.  For instance:

(My Personal Choice:)  If security is a big concern you may want to
consider OpenBSD (www.openbsd.com).  Because they are not based in the
US they aren't restricted about what kind of cryptographic software they
can include in the distro, so they tout proactive security measures.
This includes, but is not limited to, ipsec -- a protocol that
completely encrypts all of your packets.[0]  OpenBSD also includes a
bunch of other security goodies.  Not only that, OpenBSD was recently
awarded funding by DARPA -- The United States Defense Advanced Research
Projects Agency[1] (http://www.darpa.mil/) -- because its security is
just THAT good.

Slackware (www.slackware.com) and Debian (www.debian.org) allow you to
live close to your hardware.[2]  As a matter of fact, Debian allows you
to select exactly which packages you want installed into the kernel when
you install it[3].  I'm not sure how much you will reduce the overhead
on a 2U w/ 4GB RAM, but a streamlined kernel can't hurt.

Mandrake (www.mandrake.com) is super friendly for noobs, while still
allowing you to do enough advanced things -- if and only if -- you want
to.  Very good if you've never used *nix at all before.

NetBSD (www.netbsd.org) is extremely conservative about upgrading.  If
you are super worried about getting r00t3d this may be the distro for
you.

FreeBSD (www.freebsd.com) is a favorite of many, including Apache,
Yahoo, Sony, and a few I can't remember off the top of my head.

So take your time deciding on an OS, download and install as many as
possible, and take some time to think about your requirements.  What
kind of network are you using?  It's relatively hard to splice into a
hardwired network cable, but wireless LAN (802.11?) is accessible by
anybody in range with an antenna.[4]  Better go with OpenBSD in that
case -- or install a plethora of crypto software.  Are you a noob?
Better check out Mandrake or another user friendly distro.  Are you an
old hand with a PhD in Computer Engineering?  Debian will seem like
child's play and give you the kind of customization only accessible to a
PhD C.E.  And, of course, if you're planning on doing extensive
customization (i.e. code) you may want to consider a *BSD box -- because
the GNU General Public License requires derivative works you distribute
to be open source too!  (You don't want to have to explain to your boss
why the competitors can use your software.)[5]


[0]  Some purists will no doubt point out that Mandrake, or another
distro, comes with ipsec preinstalled.  ipsec was no more than an
example of one of the programs that a US based distro may not be able to
include due to silly US restrictions on cryptography.  (You may have
visited the Penguin Liberation Front (http://plf.zarb.org/) to download
code to play dvds on a *nix box, said code being unavailable on US
servers due to illegality).

     One other thing I should mention about ipsec is that you need a
client with ipsec in order to take advantage of encrypted packets.  This
could mean setting up an 802.11b or 802.11g network and running OpenBSD
on all the clients (or just clients with ipsec installed).  Some of the
other features of OpenBSD may require that the clients have OpenBSD
installed too -- so if you're just looking at random passersby on the
'net looking into your database, forget about it.


[1]  This is as close as a wing of the American Military comes to
equivalency to James Bond's "Q Branch".

[2]  i.e. /Not/ for newbies.

[3]  Although I can't confirm it for a fact, I would assume that most
distros make educated guesses about what you will and won't need.  And I
think every other distro lets you use custom compiled kernels.  However,
there is something satisfying about running through the list of kernel
modules and selecting exactly what you want.

[4]  There is a way to require a key to access the network, but I
/think/ there was an article on slashdot (www.slashdot.com) a while ago
about it being hacked.

[5]  Of course the competition would actually have to get a hold of the
software first, but, hey, why take chances?

Hope that helps,

--
Daniel R. Anderson
Great Lakes Industries, Inc.
80 Pineview Ave.
Buffalo, NY 14218
(716) 691-5900 x218

"Never let your schooling interfere with your education"
    -- Mark Twain


Re: configuration according to the database

From
Neil Conway
Date:
On Fri, 2003-03-21 at 15:28, Dennis Gearon wrote:
> If you are looking for speed, I would make the whole thing as arrays in memory
> in C++, and just do backups to the database on a regular basis.

You'd suggest storing "12 to 15GB" of data in main memory on an x86
machine with 4GB of RAM?

> Guillaume Houssay wrote:
> > 4 tables will have 1Million rows and 1000 columns with 90% of INT2 and
> > the rest of float (20% of all the data will be 0)

1,000 columns? That doesn't sound like the result of good database
design...

And if you'd like to try micro-optimizations, multiple NULL values in a
single tuple are stored efficiently -- so if those "0" values show up
more than once per tuple, consider storing them in the DB as NULL and
then converting them back to 0 (perhaps using COALESCE) on output.
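
A minimal sketch of that round trip, writing zeros as NULL and turning them back into 0 with COALESCE on output, might look like the following. It uses Python's built-in SQLite module as a stand-in for PostgreSQL, and the table and column names (readings, a, b) are hypothetical. It also assumes the data contains no genuinely missing values, since those would become indistinguishable from 0.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, a SMALLINT, b SMALLINT)"
)

rows = [(1, 5, 0), (2, 0, 0), (3, 7, 9)]

# On the way in, every 0 is written as NULL (None in Python).
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(i, a or None, b or None) for i, a, b in rows],
)

# What is actually stored: NULLs where the zeros were.
stored = conn.execute("SELECT a, b FROM readings ORDER BY id").fetchall()

# On the way out, COALESCE turns each NULL back into 0.
restored = conn.execute(
    "SELECT id, COALESCE(a, 0), COALESCE(b, 0) FROM readings ORDER BY id"
).fetchall()

print(stored)
print(restored)
```

One thing to watch with this approach: aggregates such as AVG skip NULLs, so they would need the same COALESCE wrapping to give the results the zeros would have produced.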

> > DELL
> > bi-processor 2.8GHz
> > 4GB RAM
> > 76GB HD using Raid 5
> > Linux version to be defined (Redhat ?)
> >
> > Do you think this configuration is enough to have good performance after
> > setting up properly the database ?

Without telling us more information on how frequently your clients are
going to be accessing the DB, it's really impossible to say.

Cheers,

Neil


Re: configuration according to the database

From
"Daniel R. Anderson"
Date:
<snip>
> > If you are looking for speed, I would make the whole thing as arrays in memory
> > in C++, and just do backups to the database on a regular basis.
>
> You'd suggest storing "12 to 15GB" of data in main memory on an x86
> machine with 4GB of RAM?
</snip>

If I remember correctly, Sparc based computers can hold insane amounts
of memory.  They're insanely expensive though; even the decade old ones
on eBay.  I don't suppose it's possible to RAID (or is it Beowulf?)
enough computers together to get a giant RAM Disk?



Re: configuration according to the database

From
Dennis Gearon
Date:
Ever hear of swap space? Your application couldn't possibly be working
on all 12 gig at one time. So what it is working on would be in memory.
But this is only if you can tolerate the loss of LOTS of data in the
event of power failure, memory corruption, etc.
