Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables - Mailing list pgsql-performance
From: Mischa Sandberg
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Msg-id: pB21d.193816$X12.122954@edtnps84
In response to: Data Warehouse Reevaluation - MySQL vs Postgres (Mark Cotner <mcotner@yahoo.com>)
Responses: Re: Data Warehouse Reevaluation - MySQL vs Postgres --
List: pgsql-performance
Mark Cotner wrote:

> Hi all,
> I had a difficult time deciding which list to post
> this to, so please forgive me if this list doesn't
> perfectly match my questions. My decision will not
> solely be based on performance, but it is the primary
> concern. I would be very appreciative if you all
> could comment on my test plan. Real world examples of
> a Postgres implementation of >=600G with a web
> front-end would be great, or any data warehouse with
> some size to it.

I'm only in the 30GB range of database, in case that's a consideration for the comments that follow.

At this time, I'm working out the best ROLAP options for our PG transaction store. The transaction store is highly volatile -- the longest a txn stays in it is 15 days ... so you can imagine the need for historic summaries :-)

I've also implemented multiple data servers, including a federated server that had to make the best of existing engines (like MySQL, PG, and everything from MSJet to RedBrick in the commercial world).

> The time has come to reevaluate/rearchitect an
> application which I built about 3 years ago. There
> are no performance concerns with MySQL, but it would
> benefit greatly from stored procedures, views, etc.

If your company is currently happy with MySQL, there are probably other (nontechnical) reasons to stick with it. I'm impressed that you'd consider reconsidering PG.

> Some of the mining that we do could benefit from
> stored procedures as well. MySQL may have these in
> the works, but we won't be able to move to a version
> of MySQL that supports stored procs for another year
> or two.

And PG lets you back-end with some powerful pattern- and aggregate-handling languages, like Perl. This was definitely a plus for data mining of web traffic, for example. The power of server-side extensibility for bailing you out of a design dead-end is not inconsequential.

PG doesn't have PIVOT operators (qv Oracle and MSSQL), but it makes the translation from data to column fairly painless otherwise.

> Requirements:
> Merge table definition equivalent. We use these
> extensively.

Looked all over mysql.com etc., and AFAICS a merge table is indeed exactly a view of a union-all. Is that right?

PG supports views, of course, as well (now) as tablespaces, allowing you to split tables/tablesets across multiple disk systems. PG is also pretty efficient in query plans on such views, where (say) you make one column a constant (identifier, sort of) per input table. Something like the sketch below.
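For instance -- the table and column names here are invented, just to show the shape of the thing:

  -- Monthly detail tables; the view plays the role of MySQL's merge table.
  CREATE TABLE traffic_2004_08 (stamp timestamp, bytes bigint);
  CREATE TABLE traffic_2004_09 (stamp timestamp, bytes bigint);

  CREATE VIEW traffic AS
      SELECT '2004_08' AS src, stamp, bytes FROM traffic_2004_08
      UNION ALL
      SELECT '2004_09' AS src, stamp, bytes FROM traffic_2004_09;

  -- A filter on the constant column becomes '2004_08' = '2004_09'
  -- inside the branch that can't match, which folds to false, so
  -- that branch costs next to nothing:
  SELECT sum(bytes) FROM traffic WHERE src = '2004_09';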
> Merge table equivalent with all tables containing over
> 100M rows (and about 40 columns, some quite wide) will
> need to do index scans in at least 5 seconds (MySQL
> currently does 2, but we can live with 5) and return
> ~200 rows.

PG has TOAST for handling REALLY BIG columns, and the generic TEXT type is as efficient as any size-specific VARCHAR() type ... should make things easier for you.

> Um, gonna sound silly, but the web interface has to
> remain "snappy" under load. I don't see this as a
> major concern since you don't require table locking.

Agreed. It's more a matter of your warehouse design, and intelligent bounding of queries. I'd say PG's query analyzer is a few years ahead of MySQL's for large and complex queries.

> If business logic is moved to the database (likely with
> Postgres) performance for inserting with light logic
> on each insert has to keep up with the 4.5M inserts
> per 2 hours (which MySQL completes in ~35min
> currently). Acceptable numbers for this aggregation
> would be 45-55min using stored procedures.

Again, it's a matter of pipeline design. The tools for creating an efficient pipeline are at least as good in PG as in MySQL.

If you try to insert and postprocess information one row at a time, procedures or no, there's no offhand way to guarantee your performance without a test/prototype. On the other hand, if you do warehouse-style loading (INSERT, or PG COPY, into a temp table, and then 'upsert' into the perm table), I can guarantee 2500 inserts/sec is no problem; see the sketch at the end of this note.

> Here's our case study if you're interested . . .
> http://www.mysql.com/customers/customer.php?id=16
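PS: here's roughly what I mean by warehouse-style loading. Table and column names are made up, and the COPY path assumes a server-side file; the pattern is the point, not the details:

  -- Stage the raw batch with COPY (far faster than row-at-a-time INSERTs):
  CREATE TEMP TABLE stage (cust_id integer, day date, hits bigint);
  COPY stage FROM '/tmp/batch.dat';

  -- 'Upsert' into the permanent table: update the rows that exist ...
  UPDATE hits_summary h
     SET hits = h.hits + s.hits
    FROM stage s
   WHERE h.cust_id = s.cust_id AND h.day = s.day;

  -- ... then insert the ones that don't.
  INSERT INTO hits_summary (cust_id, day, hits)
  SELECT s.cust_id, s.day, s.hits
    FROM stage s
   WHERE NOT EXISTS (SELECT 1 FROM hits_summary h
                      WHERE h.cust_id = s.cust_id
                        AND h.day = s.day);

Wrap the two statements in one transaction and readers never see a half-applied batch.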