On Mon, Dec 10, 2012 at 12:26 PM, Mihai Popa <mihai@lattica.com> wrote:
> Hi,
>
> I've recently inherited a project that involves importing a large set of
> Access mdb files into a Postgres or MySQL database.
> The process is to export the mdbs to comma-separated files, then import
> those into the final database.
> We are now at the point where the csv files are all created and amount
> to some 300 GB of data.
Compressed or uncompressed?
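For what it's worth, if you do land on Postgres, the csv import itself is
usually the easy part; COPY gets through csv files quickly, especially if you
load with indexes dropped, recreate them afterwards, and bump
maintenance_work_mem for the build. A minimal per-file sketch using psycopg2
(the table name, file path, and connection string are made up, and it assumes
the target table already exists with columns matching the csv):

    import psycopg2

    # Rough sketch: bulk-load one exported csv into Postgres with COPY.
    # Table and file names here are placeholders; the target table must
    # already exist with columns that match the csv layout.
    conn = psycopg2.connect("dbname=target")  # adjust connection details
    cur = conn.cursor()
    with open("/data/exports/orders.csv") as f:
        cur.copy_expert(
            "COPY orders FROM STDIN WITH (FORMAT csv, HEADER true)", f)
    conn.commit()
    conn.close()
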
> I would like to get some advice on the best deployment option.
>
> First, the project has been started using MySQL. Is it worth switching
> to Postgres and if so, which version should I use?
Why did you originally choose MySQL? What has changed that causes you
to rethink that decision? Does your team have experience with MySQL
but not with PostgreSQL?
I like PostgreSQL, of course, but if I already had a functioning app on
MySQL I'd be reluctant to change it.
If I were going to do so, though, I'd use 9.2. No reason to develop
against something other than the latest stable version.
> Second, where should I deploy it? The cloud or a dedicated box?
>
> Amazon seems like the sensible choice; you can scale it up and down as
> needed and backup is handled automatically.
> I was thinking of an x-large RDS instance with 10000 IOPS and 1 TB of
> storage. Would this do, or will I end up with a larger/more expensive
> instance?
My understanding is that RDS does not support Postgres, so if you go
that route the decision is already made for you. Or am I wrong here?
1TB of storage sounds desperately small for loading 300GB of csv files.
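To put a very rough number on that (every factor below is a guess, not a
measurement): the loaded tables are often comparable in size to the raw csv,
indexes add a large fraction on top, and you want headroom for WAL, temp
space during index builds and sorts, dumps, and growth.

    # Back-of-envelope disk estimate; all of these multipliers are assumptions.
    csv_gb       = 300
    table_factor = 1.0   # heap roughly comparable to the raw csv (varies a lot)
    index_factor = 0.5   # indexes as a fraction of table size (depends on schema)
    headroom     = 2.0   # WAL, temp space for index builds, dumps, growth

    needed_gb = csv_gb * (table_factor + index_factor) * headroom
    print(f"rough working estimate: {needed_gb:.0f} GB")  # ~900 GB on these guesses

On guesses like those you're already brushing up against 1TB before the data
grows at all.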
IOPS would mostly depend on how you are using the system, not how large it is.
> Alternatively, I looked at a Dell server with 32 GB of RAM and some
> really good hard drives. But such a box does not come cheap, and I don't
> want to keep the pieces if it doesn't cut it.
An xlarge RDS instance with 1TB of storage and 10000 IOPS doesn't come cheap, either.
Cheers,
Jeff