How to handle a large DB and simultaneous accesses?

From: "Charles A. Landemaine"
Date: Tue, 10 Jan 2006

Hello,

I have to develop a company search engine (similar to the Yellow
Pages). We're using PostgreSQL at the company, and the initial DB is
2GB, as it holds companies from all over the world with a fair amount
of information about each.

What reading do you suggest to help us develop the search engine
core, so that result pages show up instantly regardless of load and
DB size? The DB is 2GB now but should grow to about 10GB over the
next 2 years, and we expect 250,000 unique visitors per month by the
end of the year.

Are there special techniques? Maybe there's a way to sort of cache
search results? We're using PHP5 + phpAccelerator.
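
To make the caching idea concrete, here is a minimal sketch of what I
have in mind: a file-based cache keyed on the search term. The table
and column names are just placeholders, not our real schema:

<?php
// Minimal sketch: cache search results in files keyed on the term.
// Table and column names below are placeholders.
function cached_search($conn, $term)
{
    $cache_file = '/tmp/search_' . md5($term) . '.cache';
    $ttl = 600; // seconds a cached result stays fresh

    // Serve from the cache if a fresh copy exists.
    if (file_exists($cache_file) && time() - filemtime($cache_file) < $ttl) {
        return unserialize(file_get_contents($cache_file));
    }

    // Otherwise query the database and cache the rows.
    $sql = "SELECT id, name, city FROM companies WHERE name ILIKE '%"
         . pg_escape_string($term) . "%'";
    $result = pg_query($conn, $sql);
    $rows = array();
    while ($row = pg_fetch_assoc($result)) {
        $rows[] = $row;
    }
    file_put_contents($cache_file, serialize($rows));
    return $rows;
}

$conn = pg_connect('dbname=companies');
$hits = cached_search($conn, 'Acme');
?>
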
Thanks,

--
Charles A. Landemaine.

Re: How to handle a large DB and simultaneous accesses?

From: David Lang
Date:

On Tue, 10 Jan 2006, Charles A. Landemaine wrote:

> Hello,
>
> I have to develop a company search engine (similar to the Yellow
> Pages). We're using PostgreSQL at the company, and the initial DB is
> 2GB, as it holds companies from all over the world with a fair amount
> of information about each.
>
> What reading do you suggest to help us develop the search engine
> core, so that result pages show up instantly regardless of load and
> DB size? The DB is 2GB now but should grow to about 10GB over the
> next 2 years, and we expect 250,000 unique visitors per month by the
> end of the year.
>
> Are there special techniques? Maybe there's a way to sort of cache
> search results? We're using PHP5 + phpAccelerator.
> Thanks,

Frankly, that is a small enough amount of data compared to available
memory sizes that I think your best bet is to plan on having enough
RAM that you only do disk I/O for writes and at boot.
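
As a rough sanity check, you can estimate how much RAM the tables and
indexes occupy straight from pg_class. This assumes the default 8KB
block size, and relpages is only as current as your last
VACUUM/ANALYZE; the connection details are placeholders:

<?php
// Rough estimate of how much RAM the data and indexes need.
// relpages counts 8KB blocks (the default block size) and is
// updated by VACUUM/ANALYZE, so run those first.
$conn = pg_connect('dbname=companies'); // placeholder dbname

$result = pg_query($conn,
    'SELECT sum(relpages) * 8192 AS bytes FROM pg_class');
$row = pg_fetch_assoc($result);
printf("data + indexes: %.1f MB\n", $row['bytes'] / (1024 * 1024));
?>
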

A dual-socket Opteron system can hold 16GB with 2GB memory modules
(32GB as 4GB modules become readily available over the next couple of
years). This should be enough to keep your data and indexes in RAM at
all times. If you find that other system processes push the data out
of RAM, consider loading the data from disk into a ramfs filesystem;
just make sure you don't update the RAM-only copy (or, if you do,
that you have replication set up to replicate from the RAM copy to a
copy on real disks somewhere). Depending on your load you could go
with single-core or dual-core chips (and the CPUs are a small enough
cost compared to this much RAM that you may as well go with the
dual-core CPUs).

Even with your data in RAM you can slow down if your queries,
indexes, and other settings are wrong, but if performance is
important you should be able to essentially eliminate disk reads for
a database of this size.
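
For instance, EXPLAIN ANALYZE shows the plan a query actually uses,
so you can check that a search hits an index instead of scanning the
whole table. The table and column names here are made up:

<?php
// Verify that a search uses an index rather than a sequential scan.
// Table and column names are made up for illustration.
$conn = pg_connect('dbname=companies');

// An expression index so case-insensitive lookups can use it.
pg_query($conn,
    'CREATE INDEX companies_name_idx ON companies (lower(name))');

// EXPLAIN ANALYZE runs the query and reports the real plan and timings.
$result = pg_query($conn,
    "EXPLAIN ANALYZE SELECT * FROM companies
     WHERE lower(name) = lower('Acme Corp')");
while ($row = pg_fetch_row($result)) {
    echo $row[0] . "\n"; // want 'Index Scan', not 'Seq Scan'
}
?>
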

David Lang