Re: Optimizing further - Mailing list pgsql-general

From Curt Sampson
Subject Re: Optimizing further
Date
Msg-id Pine.NEB.4.43.0206141451280.664-100000@angelic.cynic.net
In response to Optimizing further  (Ken Williams <ken@mathforum.org>)
List pgsql-general
On Fri, 14 Jun 2002, Ken Williams wrote:

>    select price, date FROM foo
>     WHERE date < '2000-07-01 00:29:00+10' and code='UCL'
>     ORDER by date DESC, stock DESC
>     LIMIT 100;
>
> I've got over 7 million rows in the table, and a btree
> index(date, code).  According to "explain", the query will be
> executed as follows:
>
>    Limit  (cost=0.00..393.45 rows=100 width=32)
>      ->  Index Scan Backward using foo_date_code on trades
>                                 (cost=0.00..12309.13 rows=3128 width=32)
>
> When I run the query, it takes a really long time (10 seconds)
> the first time I execute it, and then returns instantly the
> second time.

My first suspicion is that, as someone else explained, the data is
being dragged (very slowly) off the disk by the first query and
then read from the cache by the second. So it's disk I/O that's
your problem. You can do a "set show_query_stats = true", run the
query, and check the log to find out how many I/O operations
you're doing.
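For example (a sketch, assuming a server of this era where the
show_query_stats setting is available; the counts show up in the
server log, not in your psql session):

```sql
-- Enable per-query statistics, then run the query in question.
SET show_query_stats = true;

SELECT price, date FROM foo
 WHERE date < '2000-07-01 00:29:00+10' AND code = 'UCL'
 ORDER BY date DESC, stock DESC
 LIMIT 100;
-- Now check the server log for the block read counts.
```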

So let's see how we might process this query. (Experts should
correct me if I go wrong, here.) First we find the last occurrence
of the given date in the index, thus doing perhaps 2-4 reads
(depending on the depth of the btree). Then we scan backwards from
that point, looking at the code value in the index, and collecting
up the entries that have a code of UCL. This should be pretty
quick as, even though we probably are not using OS read-ahead, the
blocks are going to be darn close together and the index entries
are all clustered together in a few blocks. The planner estimates
3128 rows read, so let's say it's 300 or so blocks we read. (It's
probably less.)

Once we've got a hundred of these babies, we have to go and fetch
their tuples, which could be up to 100 reads, if they're scattered
randomly about the table.

Then we sort the whole lot, and we're done.

So one would think you'd see only about 400 blocks read for the
query, 300 of which are very close together. On modern disks this
should only take a couple of seconds. You don't happen to have this on
a really old, slow disk, do you? I suspect not.
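As a sanity check, here's the back-of-envelope arithmetic; the
per-read costs (10 ms for a scattered read, 2 ms for a
near-sequential one) are my assumptions, not measurements:

```python
# Rough I/O timing for the query, assuming ~10 ms per scattered
# heap-tuple fetch and ~2 ms per near-sequential index-block read.
RANDOM_READ_MS = 10.0      # scattered reads of heap tuples
SEQUENTIAL_READ_MS = 2.0   # index blocks, close together on disk

index_blocks = 300   # estimate above for the backward index scan
tuple_reads = 100    # one heap fetch per returned row, worst case

total_s = (index_blocks * SEQUENTIAL_READ_MS
           + tuple_reads * RANDOM_READ_MS) / 1000.0
print(f"estimated I/O time: {total_s:.1f} s")  # about 1.6 s
```

Which is indeed "a couple of seconds", not ten.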

Maybe it's reading a lot more than 400 blocks? Perhaps the optimizer
doesn't realize it can get the value of code from the index, and
is instead reading the tuple for every item in the index it scans.
If that's the case, maybe reversing the two values in your index
(foo_code_date instead of foo_date_code) would help?
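Concretely, the reversed index would be something like:

```sql
-- Hypothetical replacement index: code first, so the scan can jump
-- straight to code = 'UCL' and walk the dates backward from there.
CREATE INDEX foo_code_date ON foo (code, date);
```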

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

