From: Tom Lane
Subject: Re: Index Scans become Seq Scans after VACUUM ANALYSE
Msg-id: 6599.1019056576@sss.pgh.pa.us
In response to: Re: Index Scans become Seq Scans after VACUUM ANALYSE ("Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>)
Responses: Re: Index Scans become Seq Scans after VACUUM ANALYSE (Hannu Krosing <hannu@tm.ee>)
           Re: Index Scans become Seq Scans after VACUUM ANALYSE (Mark Pritchard <mark@tangent.net.au>)
List: pgsql-hackers
"Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> writes:
> In my own limited experience, I think this could be solved by
> decreasing random_page_cost: if you prefer indexes to seq scans, you
> can lower random_page_cost to a point at which Postgres works as you
> want.  The planner would then prefer indexes where under standard
> conditions it would prefer seq scans.

It's entirely possible that the default value of random_page_cost is too
high, at least for many modern machines.  The experiments I did to get
the 4.0 figure were done a couple years ago, on hardware that wasn't
exactly new at the time.  I have not heard of anyone else trying to
measure it though.

I don't think I have the source code I used anymore, but the principle
is simple enough (a rough sketch follows the steps below):

1. Make a large file (several times the size of your machine's RAM, to
ensure you swamp out kernel disk buffering effects).  Fill it with random
data.  (NB: do not fill it with zeroes; some filesystems optimize that away.)

2. Time reading the file sequentially, 8K per read request.
Repeat enough to get a statistically trustworthy number.

3. Time reading randomly-chosen 8K pages from the file.  Repeat
enough to get a trustworthy number (the total volume of pages read
should be several times the size of your RAM).

4. Divide.
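
For illustration, here is a minimal sketch of such a test program.  It
is not the original code (which is gone); it assumes a POSIX system
(open/pread/gettimeofday), and the data file is assumed to already
exist, filled with random data and sized to several times RAM:

/*
 * randcost.c - rough sketch of the test above, not the original program.
 * Assumes POSIX.  Scale the data file to several times physical RAM, e.g.
 *
 *   cc -O2 -o randcost randcost.c
 *   dd if=/dev/urandom of=bigfile bs=1M count=8192    # example: 8GB file
 *   ./randcost bigfile
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

#define BLCKSZ 8192				/* 8K, same as a PostgreSQL page */

static double
elapsed(struct timeval *t0, struct timeval *t1)
{
	return (t1->tv_sec - t0->tv_sec) + (t1->tv_usec - t0->tv_usec) / 1e6;
}

int
main(int argc, char **argv)
{
	char		buf[BLCKSZ];
	struct stat st;
	struct timeval t0, t1;
	long		nblocks, i;
	double		seq_time, rand_time;
	int			fd;

	if (argc != 2)
	{
		fprintf(stderr, "usage: %s datafile\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0)
	{
		perror(argv[1]);
		return 1;
	}
	nblocks = st.st_size / BLCKSZ;

	/* Step 2: read the whole file sequentially, 8K per request. */
	gettimeofday(&t0, NULL);
	for (i = 0; i < nblocks; i++)
	{
		if (read(fd, buf, BLCKSZ) != BLCKSZ)
		{
			perror("sequential read");
			return 1;
		}
	}
	gettimeofday(&t1, NULL);
	seq_time = elapsed(&t0, &t1);

	/* Step 3: the same number of 8K reads, at random block offsets. */
	srandom((unsigned) time(NULL));
	gettimeofday(&t0, NULL);
	for (i = 0; i < nblocks; i++)
	{
		off_t		blk = random() % nblocks;

		if (pread(fd, buf, BLCKSZ, blk * (off_t) BLCKSZ) != BLCKSZ)
		{
			perror("random read");
			return 1;
		}
	}
	gettimeofday(&t1, NULL);
	rand_time = elapsed(&t0, &t1);

	/*
	 * Step 4: both passes issue the same number of reads, so the ratio of
	 * total times is the ratio of per-page times: the random_page_cost.
	 */
	printf("seq: %.1f s  random: %.1f s  ratio: %.2f\n",
		   seq_time, rand_time, rand_time / seq_time);
	close(fd);
	return 0;
}

Running each pass several times and averaging, per steps 2 and 3, would
give more trustworthy numbers.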

The only tricky thing about this is making sure you are measuring disk
access times and not being fooled by re-accessing pages the kernel still
has cached from a previous access.  (The PG planner does try to account
for caching effects, but that's a separate estimate; the value of
random_page_cost isn't supposed to include caching effects.)  AFAIK the
only good way to do that is to use a large test, which means it takes
a while to run; and you need enough spare disk space for a big test file.

It'd be interesting to get some numbers for this across a range of
hardware, filesystems, etc ...
        regards, tom lane

