Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables - Mailing list pgsql-performance
From | Christopher Browne |
---|---|
Subject | Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables |
Date | |
Msg-id | m3brgajzh5.fsf@wolfe.cbbrowne.com Whole thread Raw |
In response to | Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables (Christopher Browne <cbbrowne@acm.org>) |
Responses |
Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
|
List | pgsql-performance |
A long time ago, in a galaxy far, far away, mcotner@yahoo.com (Mark Cotner) wrote: > Agreed, I did some preliminary testing today and am very impressed. > I wasn't used to running analyze after a data load, but once I did > that everything was snappy. Something worth observing is that this is true for _any_ of the database systems supporting a "cost-based" optimization system, including Oracle and DB2. When working with SAP R/3 Payroll, on one project, we found that when the system was empty of data, the first few employee creates were quick enough, but it almost immediately got excruciatingly slow. One of the DBAs told the Oracle instance underneath to collect statistics on the main table, and things _immediately_ got snappy again. But it didn't get snappy until the conversion folk had run the conversion process for several minutes, to the point to which it would get painfully slow :-(. There, with MILLIONS of dollars worth of license fees being paid, across the various vendors, it still took a fair bit of manual fiddling. MySQL(tm) is just starting to get into cost-based optimization; in that area, they're moving from where the "big DBs" were about 10 years ago. It was either version 7 or 8 where Oracle started moving to cost-based optimization, and (as with the anecdote above) it took a release or two for people to get accustomed to the need to 'feed' the optimizer with statistics. This is a "growing pain" that bites users with any database where this optimization gets introduced. It's worthwhile, but is certainly not costless. I expect some forseeable surprises will be forthcoming for MySQL AB's customers in this regard... > My best results from MySQL bulk inserts was around 36k rows per > second on a fairly wide table. Today I got 42k using the COPY > command, but with the analyze post insert the results were similar. > These are excellent numbers. It basically means we could have our > cake(great features) and eat it too(performance that's good enough > to run the app). In the end, performance for inserts is always fundamentally based on how much disk I/O there is, and so it should come as no shock that when roughly the same amount of data is getting laid down on disk, performance won't differ much on these sorts of essentials. There are a few places where there's some need for cleverness; if you see particular queries running unusually slowly, it's worth doing an EXPLAIN or EXPLAIN ANALYZE on them, to see how the query plans are being generated. There's some collected wisdom out here on how to encourage the right plans. There are also unexpected results that are OK. We did a system upgrade a few days ago that led to one of the tables starting out totally empty. A summary report that looks at that table wound up with a pretty wacky looking query plan (compared to what's usual) because the postmaster knew that the query would be reading in essentially the entire table. You'd normally expect an index scan, looking for data for particular dates. In this case, it did a "scan the whole table; filter out a few irrelevant entries" plan. It looked wacky, compared to what's usual, but it ran in about 2 seconds, which was way FASTER than what's usual. So the plan was exactly the right one. Telling the difference between the right plan and a poor one is a bit of an art; we quite regularly take a look at query plans on this list to figure out what might be not quite right. If you find slow ones, make sure you have run ANALYZE on the tables recently, to be sure that the plans are sane, and you may want to consider posting some of them to see if others can point to improvements that can be made. -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://linuxfinances.info/info/linuxdistributions.html "I can't believe my room doesn't have Ethernet! Why wasn't it wired when the house was built?" "The house was built in 1576." -- Alex Kamilewicz on the Oxford breed of `conference American.'
pgsql-performance by date: