Re: large database - Mailing list pgsql-general

From Tony CL Chan
Subject Re: large database
Date
Msg-id DD79AEAA-64B9-400B-BEB6-10D2F2319A76@emblocsoft.com
In response to Re: large database  (Bill Moran <wmoran@potentialtech.com>)
List pgsql-general
Hi, 

If you have a big table you could also consider Hadoop/HBase or Cassandra, but do not put a large data set in MySQL. I agree with Bill that "lots of people have been able to make it (MySQL) work" (me too, another example), yet there are issues with it. I have been using MySQL for a number of years to handle large DBs with large numbers of users, and MySQL has been the bottleneck: CPU and I/O load go up, especially when running table joins over large data sets.

If switching to PostgreSQL, PostgreSQL 9.1.x is a very good choice for production deployment.

Thanks
Tony


P.S.  Today I ran some stress tests on my PostgreSQL staging server: a) insert 2 billion records into the test table, b) full-scan the table. Here are some test results:

Facts:  
Number of records: 2 billion records inserted today
Full table scan: about 16.76 minutes to scan 2 billion rows, really AMAZING!
Database size: 109 GB
PostgreSQL: 9.2.1
Physical RAM: 8 GB
CPU: i5


########

EXPLAIN ANALYZE SELECT COUNT(*) FROM test;
                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33849559.60..33849559.61 rows=1 width=0) (actual time=1006476.308..1006476.309 rows=1 loops=1)
   ->  Seq Scan on test  (cost=0.00..28849559.28 rows=2000000128 width=0) (actual time=47.147..903264.427 rows=2000000000 loops=1)
 Total runtime: 1006507.963 ms
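
For anyone curious how a table like that can be populated for a quick test, below is a minimal sketch using generate_series; the single-bigint-column table definition and the batch size are illustrative assumptions, not the exact script used for the numbers above.

-- Illustrative sketch only: table layout and batch size are assumptions.
CREATE TABLE test (id bigint);

-- Load in batches rather than one giant transaction; repeat with shifted
-- ranges (1..100M, 100M+1..200M, ...) until 2 billion rows are in place.
INSERT INTO test
SELECT g FROM generate_series(1, 100000000) AS g;

-- Full table scan, as in the plan above.
EXPLAIN ANALYZE SELECT COUNT(*) FROM test;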





On 11 Dec 2012, at 8:27 PM, Bill Moran wrote:

On Mon, 10 Dec 2012 15:26:02 -0500 (EST) "Mihai Popa" <mihai@lattica.com> wrote:

Hi,

I've recently inherited a project that involves importing a large set of
Access mdb files into a Postgres or MySQL database.
The process is to export the mdb's to comma-separated files, then import
those into the final database.
We are now at the point where the csv files are all created and amount
to some 300 GB of data.

I would like to get some advice on the best deployment option.

First, the project has been started using MySQL. Is it worth switching
to Postgres and if so, which version should I use?

I've been managing a few large databases this year, on both PostgreSQL and
MySQL.

Don't put your data in MySQL.  Ever.  If you feel like you need to use
something like MySQL, just go straight to a system that was designed with
no constraints right off the bat, like Mongo or something.

Don't put large amounts of data in MySQL.  There are lots of issues with it.
Despite the fact that lots of people have been able to make it work (me,
for example) it's a LOT harder to keep running well than it is on
PostgreSQL.  MySQL just isn't designed to deal with large data.  As some
examples: lack of CREATE INDEX CONCURRENTLY, the fact that the default
configuration stores everything in a single file, the fact that any table
changes (including simple things like adding a comment, or seemingly
unrelated things like adding an index) require a complete table rebuild,
and the fact that if you use anything other than INT AUTO_INCREMENT for
your primary key you're liable to hit on awful inefficiencies.

PostgreSQL has none of these problems.
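
To make that concrete, here is a minimal PostgreSQL sketch of the operations mentioned above; the table, index, and column names are made up purely for illustration.

-- Build an index without blocking writes (the CREATE INDEX CONCURRENTLY
-- that MySQL lacks, as noted above).
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);

-- Attach a comment: metadata-only, no table rebuild.
COMMENT ON TABLE orders IS 'Imported from the Access mdb export';

-- Add a nullable column with no default: also metadata-only, no rewrite.
ALTER TABLE orders ADD COLUMN note text;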

--
Bill Moran <wmoran@potentialtech.com>


