Re: Perfomance Tuning - Mailing list pgsql-performance

From Christopher Browne
Subject Re: Perfomance Tuning
Date
Msg-id 603cg5v15d.fsf@dev6.int.libertyrms.info
Whole thread Raw
In response to Re: Perfomance Tuning  (Christopher Browne <cbbrowne@acm.org>)
Responses Re: Perfomance Tuning
Re: Perfomance Tuning
List pgsql-performance
threshar@torgo.978.org (Jeff) writes:
> On Tue, 12 Aug 2003, Christopher Browne wrote:
>> Are you _certain_ that's still true?  Have you a metric that shows
>> Informix being 10x faster on a modern system?  That would be quite
>> surprising...

> We were forced (for budget reason) to switch from raw disk to cooked
> files on our informix db. We took a huge hit - about 5-6x slower.
> Granted part of that was because informix takes number of spindles,
> etc into account when generating query plans and the fact running
> UPDATE STATISTICS (think Vacuum analyze) on the version we run locks
> the table exclusively. And it is unacceptable to have our "main
> table" unavailable for hours and hours while the update runs. (For
> the record: its a 8cpu sun e4500 running sol2.6.  The raw disks were
> on a hitachi fibre array and the cooked files were on a raid5
> (scsi). Forget how many spindles in the raid.  There were 20 raw
> disks)

Sounds like what you were forced to do was to do TWO things:

 1.  Switch from raw disk to cooked files, and
 2.  Switch from a fibre array to a RAID array

You're attributing the 5-6x slowdown to 1., when it seems likely that
2. is a far more significant multiple.

What with there being TWO big changes that took place that might be
expected to affect performance, it seems odd to attribute a
factor-of-many change to just one aspect of that.

> Informix, etc. have spent a lot of time and money working on it.
> They also have the advantage of having many paid fulltime developers
> who are doing this for a job, not as a weekend hobby (Compared to
> the what? 2-3 full time PG developers).

<flame on>
Sure, and I'm sure the PG developers hardly know _anything_ about
implementing databases, either.
<flame off>

Certainly IBM (who bought Informix) has lots of time and money to
devote to enhancements.  But I think you underestimate the time,
skill, and effort involved with PG work.  It's quite typical for
people to imagine free software projects to basically be free-wheeling
efforts mostly involving guys that still have acne that haven't much
knowledge of the area.  Reality, for the projects that are of some
importance, is staggeringly different from that.  The number of people
with graduate degrees tends to surprise everyone.

The developers may not have time to add frivolous things to the
system, like building sophisticated Java-based GUI installers, XML
processors, or such.  That does, however, improve their focus, and so
PostgreSQL does not suffer from the way Oracle has fifty different
bundlings most of which nobody understands.

> The other advantage (which I hinted to above) with raw disks is
> being able to optimize queries to take advantage of it.  Informix is
> multithreaded and it will spawn off multiple "readers" to do say, a
> seq scan (and merge the results at the end).
>
> So if you have a table across say, 3 disks and you need to do a seq
> scan it will spawn three readers to do the read. Result: nice and
> fast (Yes, It may not always spawn the three readers, only when it
> thinks it will be a good thing to do)

Andrew Sullivan's fairly regular response is that he tried (albeit not
VASTLY extensively) to distinguish between disks when working with
fibre arrays, and he couldn't measure an improvement in shifting WAL
(the OBVIOUS thing to shift) to separate disks.

There's a lot of guesswork as to precisely why that result falls out.

One of the better guesses seems to be that if you've got enough
battery-backed memory cache on the array, that lets updates get pushed
to cache so fast that it doesn't too much matter which disk they hit.

If you've got enough spindles, and build much of the array in a
striped manner, you'll get data splitting across disks without having
to specify any "table options" to force it to happen.

You raise a good point vis-a-vis the thought of spawning multiple
readers; that could conceivably be a useful approach to improve
performance for very large queries.  If you could "stripe" the tables
in some manner so they could be doled out to multiple worker
processes, that could indeed provide some benefits.  If there are
three workers, they might round-robin to grab successive pages from
the table to do their work, and then end with a merge step.

That's probably a 7.7 change, mind you :-), but once other simpler
approaches to making the engine faster have been exhausted, that's the
sort of thing to look into next.

> I think for PG the effort would be much better spent on other
> features...  like replication and whatnot.

At this point, sure.
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/lisp.html
"Using Java  as a general purpose application  development language is
like  going big  game  hunting  armed with  Nerf  weapons."
-- Author Unknown

pgsql-performance by date:

Previous
From: Christopher Browne
Date:
Subject: Re: Perfomance Tuning
Next
From: Ron Johnson
Date:
Subject: Re: Perfomance Tuning