Re: [HACKERS] Optimizer fails? - Mailing list pgsql-hackers
From: Michal Mosiewicz
Subject: Re: [HACKERS] Optimizer fails?
Date:
Msg-id: 351C26F6.F7ADD7B6@interdata.com.pl
In response to: Re: [HACKERS] Optimizer fails? (dg@illustra.com (David Gould))
Responses: Re: [HACKERS] Optimizer fails?
List: pgsql-hackers
David Gould wrote:
> > Now, let's note that there have been only a few IO transfers by now, no
> > more than a few pages. And we have tuple identifiers pointing us to 64
> > records. Now we may sort these TIDs in ascending order to optimise IO.
>
> But, we do not do this TID sort. It really isn't easy as you might have
> millions of TIDs, not just a few. Which would mean doing an external sort.
> This might be a nice thing to do, but it isn't there now as far as I know.

No, you don't need the full set. You can sort the TIDs in portions of a
predefined size. You might even alternate ascending and descending order to
optimise head movement, though that is probably not a good idea, since it
works against the read-ahead done by most disk IO and may hurt ORDER BY
performance. In any case you can achieve a sawtooth reading pattern, which
certainly decreases the access time.

> > Everything took us 3 + 128 nodes from the index + 64 records from the
> > table. This is definitely better than reading all 1024 records.
>
> I am not trying to flame you, but it seems to me that you have some
> misconceptions about how the indexes work and I am trying only to explain
> them a little better.
> [cut]
> Using index (with cache):
>
>   IOs     time     why
>     1   20msec     read root
>     1   20msec     read leaf0
>    10  200msec     read 10 unique data pages (assumes desired keys are
>                    not uniformly distributed, this favors index case)
>   ---  ---------

OK, this example may be simple. In fact I would agree that in this case the
figures suggest a sequential scan is sufficient. But this all started from my
example of a 250MB file with about 2M records, from which I wanted to select
only about 25k. (I omit the fact that those records were naturally clustered
by the time they came into the database, so those 20,000 records were pretty
much contiguous.) As I observed, postgres scanned this table at a rate of
100kBps (sequentially), much less than the actual I/O throughput of this
machine. Even when I supplied a condition that returned no records, it still
scanned sequentially, while with the index it would cost only ~20msec.

Anyhow... I have to admit that a similar question asked of mysql takes...

mysql> select count(*) from log where dt < 19980209000000 and dt > 19980208000000;
+----------+
| count(*) |
+----------+
|    26707 |
+----------+
1 row in set (7.61 sec)

Of course, if I ask it without the index it takes ~3 minutes. That's why I
expected that postgres would make some use of the index. (The table is the
same in both cases.)

Mike

--
WWW: http://www.lodz.pdi.net/~mimo       tel: Int. Acc. Code + 48 42 148340
add: Michal Mosiewicz * Bugaj 66 m.54 * 95-200 Pabianice * POLAND
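
To make the "sort the TIDs in portions" idea above concrete, here is a
minimal C sketch. It is not PostgreSQL source; the Tid struct, the BATCH
size and the fetch_batch() helper are invented for illustration. It only
shows that sorting a fixed-size batch of (block, offset) pairs before
touching the heap turns random page reads into one ascending sweep per
batch, the sawtooth pattern described in the message.

#include <stdio.h>
#include <stdlib.h>

/* hypothetical tuple identifier: heap block number plus offset within it */
typedef struct { unsigned block; unsigned offset; } Tid;

static int tid_cmp(const void *a, const void *b)
{
    const Tid *x = a;
    const Tid *y = b;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

#define BATCH 64    /* sort and fetch index results in fixed-size portions */

static void fetch_batch(Tid *tids, int n)
{
    qsort(tids, n, sizeof(Tid), tid_cmp);   /* put TIDs in ascending disk order */
    for (int i = 0; i < n; i++) {
        /* a real executor would read the heap page for tids[i].block here;
         * consecutive TIDs on the same block hit the same, already-read page */
        printf("fetch block %u, offset %u\n", tids[i].block, tids[i].offset);
    }
}

int main(void)
{
    Tid batch[BATCH];

    /* pretend these TIDs arrived in index-key order, i.e. random disk order */
    for (int i = 0; i < BATCH; i++) {
        batch[i].block  = (unsigned) (rand() % 1000);
        batch[i].offset = (unsigned) (rand() % 100);
    }
    fetch_batch(batch, BATCH);
    return 0;
}

Because each portion is bounded (BATCH entries here), the memory needed for
the sort stays fixed no matter how many TIDs the index returns, which is one
possible answer to the objection that millions of TIDs would force an
external sort.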