Data Warehouse Reevaluation - MySQL vs Postgres - Mailing list pgsql-performance
From: Mark Cotner
Subject: Data Warehouse Reevaluation - MySQL vs Postgres
Msg-id: 20040912042442.49915.qmail@web41508.mail.yahoo.com
List: pgsql-performance
Hi all,

I had a difficult time deciding which list to post this to, so please forgive me if this list doesn't perfectly match my questions. My decision will not be based solely on performance, but it is the primary concern. I would be very appreciative if you all could comment on my test plan. Real-world examples of a Postgres implementation of >=600G with a web front-end would be great, or any data warehouse of some size.

The dilemma:

The time has come to reevaluate/rearchitect an application I built about 3 years ago. There are no performance concerns with MySQL, but the application would benefit greatly from stored procedures, views, etc. It is a very large rolling data warehouse that inserts about 4.5 million rows every 2 hours and subsequently rolls this data off the back end of a 90-day window. A web interface has been designed for querying the data warehouse.

Migration planning is much easier with views and stored procedures, and this is my primary reason for evaluating Postgres once again. As the application grows I want the ability to provide backward-compatible views for those who are accustomed to the current structure. This is not possible in MySQL. Some of the mining we do could benefit from stored procedures as well. MySQL may have these in the works, but we won't be able to move to a version of MySQL that supports stored procs for another year or two.

Requirements:

- A merge table definition equivalent. We use these extensively.
- The merge table equivalent, with all tables containing over 100M rows (and about 40 columns, some quite wide), will need to complete index scans within 5 seconds and return ~200 rows (MySQL currently does it in 2, but we can live with 5).
- This is going to sound silly, but the web interface has to remain "snappy" under load. I don't see this as a major concern, since Postgres doesn't require table locking.
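For context on the first requirement: one rough Postgres analogue of a MySQL MERGE table is a UNION ALL view over the per-period tables. This is a minimal sketch only; the table and column names are hypothetical, not from the application described here.

```sql
-- Hypothetical per-day slices of the rolling window, with identical
-- schemas (names and columns are illustrative only).
CREATE TABLE stats_20040910 (node_id integer, sampled_at timestamp, bytes_in bigint);
CREATE TABLE stats_20040911 (node_id integer, sampled_at timestamp, bytes_in bigint);

-- The "merge table": a view unioning the slices. Rolling data off the
-- back of the 90-day window amounts to dropping the oldest table and
-- recreating the view without it.
CREATE VIEW stats_all AS
    SELECT * FROM stats_20040910
    UNION ALL
    SELECT * FROM stats_20040911;
```

Whether the planner can satisfy the per-query speed target through such a view depends on the indexes on the underlying tables and on how selective the WHERE clause is.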
If business logic is moved to the database (likely with Postgres), insert performance with light logic on each insert has to keep up with the 4.5M inserts per 2 hours (which MySQL currently completes in ~35 min). Acceptable numbers for this aggregation would be 45-55 min using stored procedures.

About 3 years ago I did some performance characterizations of Postgres vs. MySQL and didn't feel Postgres was the best solution. 3 years later we've won runner-up for MySQL application of the year (behind Saabre). Oddly enough, this reevaluating of database strategy is right on the coattails of that award.

I'll begin writing my business logic within the next week and start migrating test data shortly thereafter. Case studies would be very beneficial as I put together my analysis.

Also, this is for a Fortune 500 company that uses the data warehouse extensively. It is an internal application that is widely used and gets about 4 hits per employee per day. Much of customer care, data engineering, plant engineering (it's a cable company), and marketing use the interface.

I've done a great deal of press for MySQL and would be equally willing to tout the benefits of Postgres to trade rags, magazines, etc., provided the results are favorable. Here's our case study if you're interested: http://www.mysql.com/customers/customer.php?id=16

Thoughts, suggestions?

'njoy,
Mark
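The "light logic on each insert" requirement could be prototyped as a PL/pgSQL function wrapping the INSERT. This is only a sketch of the pattern; the function, table, and column names are hypothetical, and the clamping step stands in for whatever per-row logic the application actually needs.

```sql
-- Hypothetical insert wrapper applying light per-row logic before the
-- insert. All identifiers are illustrative, not from the real schema.
CREATE FUNCTION insert_sample(p_node integer, p_at timestamp, p_bytes bigint)
RETURNS void AS $$
DECLARE
    v_bytes bigint := p_bytes;
BEGIN
    -- Example of "light logic": clamp obviously bad counter values.
    IF v_bytes < 0 THEN
        v_bytes := 0;
    END IF;

    INSERT INTO stats_20040911 (node_id, sampled_at, bytes_in)
    VALUES (p_node, p_at, v_bytes);
END;
$$ LANGUAGE plpgsql;
```

For the 4.5M-rows-per-2-hours target, batching many such calls per transaction (rather than one transaction per row) is usually the first thing to tune.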