Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables - Mailing list pgsql-performance

From Mark Cotner
Subject Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Date
Msg-id 20040914073943.95193.qmail@web41502.mail.yahoo.com
In response to Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables  (Christopher Browne <cbbrowne@acm.org>)
Responses Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
List pgsql-performance
You all have been so very helpful so far and I really
appreciate it.

The data in these tables is thankfully static since
they are logging tables and an analyze only takes
about 4 minutes for the largest of them.
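
For reference, that statistics refresh is a single
statement per table.  A minimal sketch, assuming a
logging table named raw_poll_log (the name is made up
for illustration):

  -- Refresh planner statistics for one large, static logging table.
  ANALYZE VERBOSE raw_poll_log;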

I've finished porting the schema and am importing the
data now.  Importing just two-thirds (60 of the 90
days) of one of our 30 cable systems (MySQL dbs) is
estimated to take about 16 hours.  This may seem like
a lot, but I'm satisfied with the performance.  I've
created a slightly normalized version and some stored
procedures to help me normalize the data.  When this
finishes I'm going to query the data as is with the
views as you suggested, and I'm going to create views
for the normalized version to test that as well.  This
will then be contrasted to the MySQL query results and
I plan to write a white paper of my findings.
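
To give a flavor of what I mean by querying the data
as is through a view, here's a rough sketch of the
merge-table-style view.  The table names are invented
for illustration, not our real schema:

  -- One view fronting the per-period log tables,
  -- roughly what a MySQL merge table gave us.
  CREATE VIEW poll_log AS
      SELECT * FROM poll_log_2004_07
      UNION ALL
      SELECT * FROM poll_log_2004_08;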

I don't have any doubt that Postgres will do fine,
but if I run into any performance problems I'll be
sure to post them here first.

It should be noted that our development life cycle is
currently severely hindered by lack of features in
MySQL like views and stored procedures.  Frankly I've
implemented some pretty ugly SQL using as many as 5
temp tables to generate a result set with MySQL.
Having stored procedures and views is going to help us
tremendously.  This performance evaluation is to
verify that Postgres can handle what we're going to
throw at it, not to find out if it's faster in
milliseconds than MySQL.  We love the speed and ease
of maintenance with MySQL, but have simply outgrown
it.  This will be reflected in the white paper.
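
For what it's worth, here's the flavor of the thing,
sketched as a plpgsql function instead of a chain of
temp tables.  The names and the calculation are
invented for illustration:

  CREATE OR REPLACE FUNCTION node_day_avg_snr(integer, date)
  RETURNS numeric AS '
  DECLARE
      v_node ALIAS FOR $1;
      v_day  ALIAS FOR $2;
      v_avg  numeric;
  BEGIN
      -- One pass over the log table replaces what took
      -- several temp tables to express in MySQL.
      SELECT avg(snr) INTO v_avg
        FROM raw_poll_log
       WHERE node_id = v_node
         AND sample_time >= v_day
         AND sample_time <  v_day + 1;
      RETURN v_avg;
  END;
  ' LANGUAGE plpgsql;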

I have already imported our customer tables, which
aren't too small (2.4M rows x 3 tables), and stuck a
view in front of them.  The view queried faster than
MySQL would query a pre-joined flat table.
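
A quick timing check against a view like that is just
an EXPLAIN ANALYZE of the query it fronts (illustrative
names again):

  EXPLAIN ANALYZE
  SELECT account_no, node_id, service_tier
    FROM customer_summary_v
   WHERE account_no = 123456;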

Getting carried away . . . needless to say I'm really
excited about the possibilities with Postgres, but I won't
bore you with the details just yet.  I'll send the
link out to the white paper so you all can review it
before I send it anywhere else.  If anything could
have been optimized more please let me know and I'll
see that it gets updated before it's widely published.

Thanks again for all the great feedback!

'njoy,
Mark

--- Christopher Browne <cbbrowne@acm.org> wrote:

> A long time ago, in a galaxy far, far away,
> mcotner@yahoo.com (Mark Cotner) wrote:
> > Agreed, I did some preliminary testing today and am very impressed.
> > I wasn't used to running analyze after a data load, but once I did
> > that everything was snappy.
>
> Something worth observing is that this is true for _any_ of the
> database systems supporting a "cost-based" optimization system,
> including Oracle and DB2.
>
> When working with SAP R/3 Payroll, on one project, we found that when
> the system was empty of data, the first few employee creates were
> quick enough, but it almost immediately got excruciatingly slow.  One
> of the DBAs told the Oracle instance underneath to collect statistics
> on the main table, and things _immediately_ got snappy again.  But it
> didn't get snappy until the conversion folk had run the conversion
> process for several minutes, to the point to which it would get
> painfully slow :-(.  There, with MILLIONS of dollars worth of license
> fees being paid, across the various vendors, it still took a fair bit
> of manual fiddling.
>
> MySQL(tm) is just starting to get into cost-based optimization; in
> that area, they're moving from where the "big DBs" were about 10
> years ago.  It was either version 7 or 8 where Oracle started moving
> to cost-based optimization, and (as with the anecdote above) it took
> a release or two for people to get accustomed to the need to 'feed'
> the optimizer with statistics.  This is a "growing pain" that bites
> users with any database where this optimization gets introduced.
> It's worthwhile, but is certainly not costless.
>
> I expect some foreseeable surprises will be forthcoming for MySQL
> AB's customers in this regard...
>
> > My best results from MySQL bulk inserts was around 36k rows per
> > second on a fairly wide table.  Today I got 42k using the COPY
> > command, but with the analyze post insert the results were similar.
> > These are excellent numbers.  It basically means we could have our
> > cake (great features) and eat it too (performance that's good
> > enough to run the app).
>
> In the end, performance for inserts is always fundamentally based on
> how much disk I/O there is, and so it should come as no shock that
> when roughly the same amount of data is getting laid down on disk,
> performance won't differ much on these sorts of essentials.
>
> There are a few places where there's some need for cleverness; if
> you see particular queries running unusually slowly, it's worth
> doing an EXPLAIN or EXPLAIN ANALYZE on them, to see how the query
> plans are being generated.  There's some collected wisdom out here
> on how to encourage the right plans.
>
> There are also unexpected results that are OK.  We did a system
> upgrade a few days ago that led to one of the tables starting out
> totally empty.  A summary report that looks at that table wound up
> with a pretty wacky looking query plan (compared to what's usual)
> because the postmaster knew that the query would be reading in
> essentially the entire table.  You'd normally expect an index scan,
> looking for data for particular dates.  In this case, it did a "scan
> the whole table; filter out a few irrelevant entries" plan.
>
> It looked wacky, compared to what's usual, but it ran in about 2
> seconds, which was way FASTER than what's usual.  So the plan was
> exactly the right one.
>
> Telling the difference between the right plan and a poor one is a
> bit of an art; we quite regularly take a look at query plans on this
> list to figure out what might be not quite right.  If you find slow
> ones, make sure you have run ANALYZE on the tables recently, to be
> sure that the plans are sane, and you may want to consider posting
> some of them to see if others can point to improvements that can be
> made.
> --
> If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
> http://linuxfinances.info/info/linuxdistributions.html
> "I can't believe my room doesn't have Ethernet!  Why wasn't it wired
> when the house was built?"
> "The house was built in 1576."
> -- Alex Kamilewicz on the Oxford breed of `conference American.'

