From: Tom Lane
Subject: Re: Index Scans become Seq Scans after VACUUM ANALYSE
Msg-id: 6599.1019056576@sss.pgh.pa.us
In response to: Re: Index Scans become Seq Scans after VACUUM ANALYSE ("Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>)
Responses: Re: Index Scans become Seq Scans after VACUUM ANALYSE (Hannu Krosing <hannu@tm.ee>)
           Re: Index Scans become Seq Scans after VACUUM ANALYSE (Mark Pritchard <mark@tangent.net.au>)
List: pgsql-hackers
"Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> writes:
> In my own limited experience, I think this could be solved by
> decreasing random_page_cost: if you prefer indexes to seq scans, you
> can lower random_page_cost to a point at which Postgres works as you
> want.  The planner would then prefer indexes where under standard
> conditions it would prefer seq scans.

It's entirely possible that the default value of random_page_cost is too
high, at least for many modern machines.  The experiments I did to get
the 4.0 figure were done a couple years ago, on hardware that wasn't
exactly new at the time.  I have not heard of anyone else trying to
measure it though.

I don't think I have the source code I used anymore, but the principle
is simple enough (a rough sketch follows the steps below):

1. Make a large file (several times the size of your machine's RAM, to
ensure you swamp out kernel disk buffering effects).  Fill it with random
data.  (NB: do not fill it with zeroes; some filesystems optimize that away.)

2. Time reading the file sequentially, 8K per read request.
Repeat enough to get a statistically trustworthy number.

3. Time reading randomly-chosen 8K pages from the file.  Repeat
enough to get a trustworthy number (the total volume of pages read
should be several times the size of your RAM).

4. Divide.
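
For illustration, here is a minimal sketch of such a test program.  It
is not the original code (which is gone); it assumes a POSIX system
(open/pread/gettimeofday), and the data file is assumed to already
exist, filled with random data and sized to several times RAM:

/*
 * randcost.c - rough sketch of the test above, not the original program.
 * Assumes POSIX.  Scale the data file to several times physical RAM, e.g.
 *
 *   cc -O2 -o randcost randcost.c
 *   dd if=/dev/urandom of=bigfile bs=1M count=8192    # example: 8GB file
 *   ./randcost bigfile
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

#define BLCKSZ 8192				/* 8K, same as a PostgreSQL page */

static double
elapsed(struct timeval *t0, struct timeval *t1)
{
	return (t1->tv_sec - t0->tv_sec) + (t1->tv_usec - t0->tv_usec) / 1e6;
}

int
main(int argc, char **argv)
{
	char		buf[BLCKSZ];
	struct stat st;
	struct timeval t0, t1;
	long		nblocks, i;
	double		seq_time, rand_time;
	int			fd;

	if (argc != 2)
	{
		fprintf(stderr, "usage: %s datafile\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0)
	{
		perror(argv[1]);
		return 1;
	}
	nblocks = st.st_size / BLCKSZ;

	/* Step 2: read the whole file sequentially, 8K per request. */
	gettimeofday(&t0, NULL);
	for (i = 0; i < nblocks; i++)
	{
		if (read(fd, buf, BLCKSZ) != BLCKSZ)
		{
			perror("sequential read");
			return 1;
		}
	}
	gettimeofday(&t1, NULL);
	seq_time = elapsed(&t0, &t1);

	/* Step 3: the same number of 8K reads, at random block offsets. */
	srandom((unsigned) time(NULL));
	gettimeofday(&t0, NULL);
	for (i = 0; i < nblocks; i++)
	{
		off_t		blk = random() % nblocks;

		if (pread(fd, buf, BLCKSZ, blk * (off_t) BLCKSZ) != BLCKSZ)
		{
			perror("random read");
			return 1;
		}
	}
	gettimeofday(&t1, NULL);
	rand_time = elapsed(&t0, &t1);

	/*
	 * Step 4: both passes issue the same number of reads, so the ratio of
	 * total times is the ratio of per-page times: the random_page_cost.
	 */
	printf("seq: %.1f s  random: %.1f s  ratio: %.2f\n",
		   seq_time, rand_time, rand_time / seq_time);
	close(fd);
	return 0;
}

Running each pass several times and averaging, per steps 2 and 3, would
give more trustworthy numbers.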

The only tricky thing about this is making sure you are measuring disk
access times and not being fooled by re-accessing pages the kernel still
has cached from a previous access.  (The PG planner does try to account
for caching effects, but that's a separate estimate; the value of
random_page_cost isn't supposed to include caching effects.)  AFAIK the
only good way to do that is to use a large test, which means it takes
a while to run; and you need enough spare disk space for a big test file.

It'd be interesting to get some numbers for this across a range of
hardware, filesystems, etc ...
        regards, tom lane

