Re: [HACKERS] Optimizer fails? - Mailing list pgsql-hackers

From Michal Mosiewicz
Subject Re: [HACKERS] Optimizer fails?
Date
Msg-id 351C26F6.F7ADD7B6@interdata.com.pl
In response to Re: [HACKERS] Optimizer fails?  (dg@illustra.com (David Gould))
Responses Re: [HACKERS] Optimizer fails?
List pgsql-hackers
David Gould wrote:

> > Now, let's note that there have been only a few IO transfers by now. No
> > more than a few pages. And we have tuple identifiers pointing us to 64
> > records. Now we may sort these tids in ascending order to optimise IO.
>
> But, we do not do this tid sort. It really isn't easy as you might have
> millions of tids, not just a few. Which would mean doing an external sort.
> This might be a nice thing to do, but it isn't there now as far as I know.

No, you don't need the full set. You can sort the TIDs in portions of a
predefined size. You could even alternate ascending/descending order to
optimise the disk head movement, although that may not be a good idea,
since it works against the read-ahead behaviour of most disk IO and may
hurt ORDER BY performance. In any case you can achieve sawtooth reads,
which certainly reduce the access time.
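Roughly what I have in mind, as a sketch only (the Tid struct and
fetch_tuple() are made-up names for illustration, not the real backend
code): collect a portion of TIDs from the index, qsort it by block
number, fetch, and repeat with the next portion.

#include <stdio.h>
#include <stdlib.h>

typedef struct { unsigned block; unsigned offset; } Tid;  /* made-up layout */

static void fetch_tuple(const Tid *t)       /* stand-in for the heap fetch */
{
    printf("fetch block %u, offset %u\n", t->block, t->offset);
}

static int tid_cmp(const void *a, const void *b)
{
    const Tid *x = (const Tid *) a, *y = (const Tid *) b;

    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

/* Sort one fixed-size portion of TIDs, then fetch them in page order. */
static void fetch_portion(Tid *tids, size_t n)
{
    size_t i;

    qsort(tids, n, sizeof(Tid), tid_cmp);
    for (i = 0; i < n; i++)
        fetch_tuple(&tids[i]);
}

int main(void)
{
    Tid batch[] = { {42, 3}, {7, 1}, {42, 1}, {7, 9} };

    fetch_portion(batch, sizeof(batch) / sizeof(batch[0]));
    return 0;
}

Sorting only a portion at a time keeps the memory bounded, so no
external sort is needed, and the reads within each portion still come
out in ascending block order.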

> > Everything took us 3 + 128 nodes from index + 64 records from table.
> > This is definitely better than reading all 1024 records.
>
> I am not trying to flame you, but it seems to me that you have some
> misconceptions about how the indexes work and I am trying only to explain
> them a little better.
>[cut]
>  Using index (with cache):
>
>     IOs        time       why
>
>      1         20msec     read root
>      1         20msec     read leaf0
>     10        200msec     read 10 unique data pages (assumes desired keys are
>                           not uniformly distributed, this favors index case)
>    ---      ---------

OK, this example may be simple. In fact I would agree that in this case
the figures make a seq scan look sufficient. But this all started from
my example of a 250MB file with about 2M records, of which I wanted to
select only about 25k. (I omit the fact that those records were
naturally clustered by the time they came into the database, so those
20,000 records were pretty much contiguous.)
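Just a back-of-envelope sketch of why I expected the index to win here
(the 8kB page size and the 20msec-per-IO figure from your example are
assumptions, and I ignore the few index pages themselves):

#include <stdio.h>

int main(void)
{
    /* Assumed figures, only for illustration. */
    double table_bytes = 250.0 * 1024 * 1024;  /* ~250MB table            */
    double nrecords    = 2e6;                  /* ~2M records             */
    double selected    = 25e3;                 /* ~25k records wanted     */
    double page_bytes  = 8192.0;               /* assumed 8kB pages       */
    double io_msec     = 20.0;                 /* assumed 20msec per IO   */

    double rec_bytes   = table_bytes / nrecords;    /* ~131 bytes/record  */
    double recs_per_pg = page_bytes / rec_bytes;    /* ~62 records/page   */
    double seq_pages   = table_bytes / page_bytes;  /* 32000 pages        */
    double idx_pages   = selected / recs_per_pg;    /* ~400 pages if the
                                                       records are clustered */

    printf("seq scan:   ~%.0f pages, ~%.0f sec\n",
           seq_pages, seq_pages * io_msec / 1000.0);
    printf("index scan: ~%.0f data pages, ~%.0f sec\n",
           idx_pages, idx_pages * io_msec / 1000.0);
    return 0;
}

Even charging every data page as a separate random IO, the clustered
index case comes out some 80 times cheaper here.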

As I observed, postgres scanned this table sequentially at a rate of
about 100kB/s, much less than the actual I/O throughput of this
machine. Even when I prepared a condition that returns no records it
still scanned sequentially, while via the index it would cost only some
20msec.

Anyhow... I have to admit that a similar question asked of mysql takes...
mysql> select count(*) from log where dt < 19980209000000 and
dt>19980208000000;
+----------+
| count(*) |
+----------+
|    26707 |
+----------+
1 row in set (7.61 sec)

Of course, if I ask it without the index it takes ~3 minutes. That's why
I expected postgres to make some use of the index. (The table is the
same in both cases.)

Mike

--
WWW: http://www.lodz.pdi.net/~mimo  tel: Int. Acc. Code + 48 42 148340
add: Michal Mosiewicz  *  Bugaj 66 m.54 *  95-200 Pabianice  *  POLAND
