Re: Perfomance Tuning - Mailing list pgsql-performance
From | Christopher Browne |
---|---|
Subject | Re: Perfomance Tuning |
Date | |
Msg-id | 604r0ki8wy.fsf@dev6.int.libertyrms.info |
In response to | Re: Perfomance Tuning (Christopher Browne <cbbrowne@acm.org>) |
List | pgsql-performance |
threshar@torgo.978.org (Jeff) writes:
> On Wed, 13 Aug 2003, Christopher Browne wrote:
>> You raise a good point vis-a-vis the thought of spawning multiple
>> readers; that could conceivably be a useful approach to improve
>> performance for very large queries.  If you could "stripe" the tables
>> in some manner so they could be doled out to multiple worker
>> processes, that could indeed provide some benefits.  If there are
>> three workers, they might round-robin to grab successive pages from
>> the table to do their work, and then end with a merge step.
>
> The way informix does this is two fold:
> 1. it handles the raw disks, it knows where table data is

The thing is, this isn't something where there is guaranteed to be a
permanent _massive_ difference in performance between "raw" and
"cooked."

Traditionally, "handling raw disks" was a big deal because the DBMS
could then decide where to stick the data, possibly down to specifying
what sector of what track of what spindle.  There are four reasons for
this to not be such a big deal anymore:

1.  Disk drives lie to you.  They don't necessarily provide
    information that even _resembles_ their true geometry.  So the
    best you can get is to be sure that "this block was on drive 4,
    that block was on drive 7."

2.  On a big system, you're more than likely using hardware RAID,
    where there's further caching, and where the disk array may not be
    honest to the DBMS about where the drives actually are.

3.  The other traditional benefit to "raw" disks was that they allowed
    the DBMS to be _certain_ that data was committed in some
    particular order.  But 1. and 2. provide regrettable opportunities
    for the DBMS' belief to be forlorn.

    (With the degree to which disk drives lie about things, I have to
    be a bit skeptical of some of the BSD FFS claims which at least
    appear to assume that they _do_ control the disk drive...  This is
    NOT a reason, by the way, to consider FFS to be, in any way,
    "bad," but rather just that some of the guarantees may get stolen
    by your disk drive...)

4.  Today's filesystems _aren't_ Grandpa's UFS.  We've got better
    stuff than we had back in the Ultrix days.

> 2. it can "partition" tables in a number of ways: round robin,
> concatenation or expression (Expression is nifty, allows you to use a
> basic "where" clause to decide where to put data), i.e.
> create table foo (
> a int,
> b int,
> c int ) fragment on c > 0 and c < 100 in dbspace1, c > 100 and c < 200 in
> dbspace2;
>
> that kind of thing.

I remember thinking this was rather neat when I first saw it.

The "fragment on" part was most interesting at the time, when everyone
else (including filesystem makers) was decrying fragmentation as the
ultimate evil.  In effect, Informix was saying that they would
_improve_ performance through fragmentation...  Sort of like the rash
claim that performance can be improved _without_ resorting to a
threading-based model...

> and yeah, I would not expect to see it for a long time..  Without
> threading it would be rather difficult to implement.. but who knows
> what the future will bring us.

The typical assumption is that threading is a magical talisman that
will bring all sorts of benefits.  There have been enough cases where
PostgreSQL has demonstrated stunning improvements _without_ threading
that I am very skeptical that it is necessarily necessary.
-- 
output = reverse("gro.gultn" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/sap.html
Rules of the Evil Overlord #204. "I will hire an entire squad of blind
guards.  Not only is this in keeping with my status as an equal
opportunity employer, but it will come in handy when the hero becomes
invisible or douses my only light source."
<http://www.eviloverlord.com/>
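
For illustration only, the round-robin striping idea quoted in the message above (three workers each taking successive pages, followed by a merge step) might be sketched roughly as below. This is not how PostgreSQL or Informix implements anything; the names `scan_stripe` and `striped_scan` are hypothetical, a "page" is assumed to be just a small list of integer rows, and the workers are run one after another here rather than as separate processes.

```python
import heapq


def scan_stripe(pages, worker_id, n_workers):
    """Hypothetical worker: take pages worker_id, worker_id + n_workers, ...
    and return the rows found there in sorted order."""
    rows = []
    for page in pages[worker_id::n_workers]:
        rows.extend(page)
    rows.sort()
    return rows


def striped_scan(pages, n_workers=3):
    """Dole pages out round-robin to n_workers, then merge the sorted partials.

    Assumption: workers are simulated sequentially; a real system would run
    each stripe in its own worker process.
    """
    partials = [scan_stripe(pages, w, n_workers) for w in range(n_workers)]
    # Merge step: combine the already-sorted per-worker results.
    return list(heapq.merge(*partials))


# Five made-up "pages" of two rows each.
pages = [[5, 1], [9, 3], [2, 8], [7, 4], [6, 0]]
result = striped_scan(pages)
```

With three workers, worker 0 reads pages 0 and 3, worker 1 reads pages 1 and 4, and worker 2 reads page 2; the merge step then yields all rows in sorted order.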