Re: large database - Mailing list pgsql-general
From: Chris Travers
Subject: Re: large database
Msg-id: CAKt_ZfsY3qf+mR5tDrV_Oj_4fa+MrM5q2rNLRK_oNP92df+C-A@mail.gmail.com
In response to: large database ("Mihai Popa" <mihai@lattica.com>)
List: pgsql-general
On Mon, Dec 10, 2012 at 12:26 PM, Mihai Popa <mihai@lattica.com> wrote:
> Hi,
>
> I've recently inherited a project that involves importing a large set of
> Access mdb files into a Postgres or MySQL database.
> The process is to export the mdb's to comma separated files, then import
> those into the final database.
> We are now at the point where the csv files are all created and amount
> to some 300 GB of data.
OK, this doesn't sound like anything that would require MySQL-only features.
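
For the import step itself, COPY (or psql's \copy) is usually the fastest way to get 300 GB of CSV into PostgreSQL. A minimal sketch, with the table definition and file path as placeholders:

    -- hypothetical table; match the columns to each CSV's layout
    CREATE TABLE imported_rows (
        id     integer,
        name   text,
        amount numeric
    );

    -- server-side load; use \copy from psql when the file is on the client
    COPY imported_rows FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true);

At this volume it generally pays to load first and build indexes afterwards.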
> I would like to get some advice on the best deployment option.
>
> First, the project has been started using MySQL. Is it worth switching
> to Postgres and if so, which version should I use?
Probably. I say probably because there are costs. See below.
> Second, where should I deploy it? The cloud or a dedicated box?
The cloud is best suited to low-end databases; once you need high disk I/O it breaks down, and a database this large implies significant disk I/O.
> Amazon seems like the sensible choice; you can scale it up and down as
> needed and backup is handled automatically.
Keep in mind there is a huge complexity cost there when it comes to maintaining adequate performance.
> I was thinking of an x-large RDS instance with 10000 IOPS and 1 TB of
> storage. Would this do, or will I end up with a larger/more expensive
> instance?
>
> Alternatively I looked at a Dell server with 32 GB of RAM and some
> really good hard drives. But such a box does not come cheap and I don't
> want to keep the pieces if it doesn't cut it.
Without knowing your expected usage pattern, it is hard to estimate the memory needed. I would, however, suggest that the first thing to do is estimate the frequently used working set under normal operations, and then multiply that by a reasonable factor to give you reasonable assurance that the data you want stays in cache.
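
Once the data is loaded and you are running a representative workload, you can sanity-check that estimate from the statistics collector; a minimal sketch:

    -- rough per-table cache hit ratio; low ratios on hot tables
    -- suggest shared_buffers (or total RAM) is undersized
    SELECT relname,
           heap_blks_hit::float8
               / NULLIF(heap_blks_hit + heap_blks_read, 0) AS hit_ratio
      FROM pg_statio_user_tables
     ORDER BY heap_blks_hit + heap_blks_read DESC
     LIMIT 20;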
Secondly, you want to look at the expected number of parallel queries and plan the number of processors around that.
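
Those two estimates map fairly directly onto the main configuration knobs. The values below are illustrative starting points for a 32 GB box, not tuned recommendations:

    # postgresql.conf -- illustrative starting points, not tuned values
    shared_buffers = 8GB           # a common rule of thumb is ~25% of RAM
    effective_cache_size = 24GB    # roughly what the OS page cache can hold
    max_connections = 100          # size to expected concurrency; put a
                                   # pooler (e.g. pgbouncer) in front beyond that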
As for the costs of switching, there are some other things you need to be aware of:
1: MySQL and PostgreSQL have completely different assumptions about standard table usage, and these impact indexing strategies and bulk-load performance. Expect to do some relearning regarding performance tuning and indexing: you typically create far more indexes on MySQL than on PostgreSQL, which also tends to make bulk inserts a bit faster on PostgreSQL (see the first sketch after this list).
2: MySQL and PostgreSQL have completely different assumptions about what you are doing with your data. MySQL assumes it is a single-app database; PostgreSQL assumes it is a multi-app database. This explains a lot of the sql_mode weirdness in MySQL and why the PostgreSQL team will never go down that path. It also means you typically have more mature tools for manipulating data inside the db (via views, functions, and the like) than you do in MySQL, while MySQL has some additional flexibility if you are just porting to it from another db (see the second sketch after this list).
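
On the indexing point, PostgreSQL supports things like partial indexes that let you index only the rows your queries actually filter on; a small sketch with hypothetical table and column names:

    -- index only the rows most queries care about; the narrower index
    -- keeps bulk inserts cheap compared to indexing everything
    CREATE INDEX orders_open_idx ON orders (customer_id)
        WHERE status = 'open';

On the in-database tools point, a hypothetical example of pushing shared logic into the database where every application can reach it:

    -- a view and a function shared by all front ends
    CREATE VIEW active_accounts AS
        SELECT * FROM accounts WHERE closed_at IS NULL;

    CREATE FUNCTION account_balance(acct integer) RETURNS numeric AS $$
        SELECT coalesce(sum(amount), 0) FROM ledger WHERE account_id = acct;
    $$ LANGUAGE sql STABLE;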
If it were me, I would definitely migrate. It sounds like these are internal line-of-business databases, and they may come from different MS Access apps. For this reason, I think you will have more options for centralizing your code and managing your front-end access (I assume through MS Access?) in PostgreSQL than you would in MySQL. However, the choice is yours.
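
If the front end does stay in MS Access, the usual route is linked tables over the psqlODBC driver; an illustrative connection string, with server, database, and credentials as placeholders:

    Driver={PostgreSQL Unicode};Server=db.example.com;Port=5432;
    Database=appdb;Uid=app_user;Pwd=secret;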
Best Wishes,
Chris Travers