Re: Index Scans become Seq Scans after VACUUM ANALYSE - Mailing list pgsql-hackers

From mlw
Subject Re: Index Scans become Seq Scans after VACUUM ANALYSE
Msg-id 3CBDEF7F.6C40DFCA@mohawksoft.com
In response to Re: Index Scans become Seq Scans after VACUUM ANALYSE  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian wrote:
> 
> mlw wrote:
> > Bruce Momjian wrote:
> > >
> > > mlw wrote:
> > > > Now, given the choice of the two strategies on a table, both pretty close to
> > > > one another, the risk of poor performance for using the index scan is minimal
> > > > based on the statistics, but the risk of poor performance for using the
> > > > sequential scan is quite high on a large table.
> >
> > > My second point, that index scan is more risky than sequential scan, is
> > > outlined above.  A sequential scan reads each page once, and uses the
> > > file system read-ahead code to prefetch the disk buffers.  Index scans
> > > are random, and could easily re-read disk pages to plow through a
> > > significant portion of the table, and because the reads are random,
> > > the file system will not prefetch the rows so the index scan will have
> > > to wait for each non-cache-resident row to come in from disk.
> >
> > That is a very interesting point, but shouldn't that be factored into the cost
> > (random_tuple_cost?) In which case my point still stands.
> 
> Yes, I see your point.  I think on the high end that index scans can get
> very expensive if you start to do lots of cache misses and have to wait
> for i/o.  I know the random cost is 4, but I think that number is not
> linear.  It can be much higher for lots of cache misses and waiting for
> I/O, and I think that is why it feels more risky to do an index scan on a
> sample size that is not perfectly known.

In an active system, a sequential scan is still random access to the file as
far as the OS is concerned: two or more queries running concurrently will
interleave their reads and wipe out most of the expected read-ahead gain.

> 
> Actually, you pretty much can know sequential scan size because you know
> the number of blocks in the table.  It is index scan that is more
> unknown because you don't know how many index lookups you will need, and
> how well they will stay in the cache.

Again, shouldn't that be factored into the cost?
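
As I understand it, the sequential side really is that predictable; a
back-of-the-envelope version of the planner's arithmetic (hypothetical table
name, cpu_tuple_cost at its 0.01 default) would be roughly:

    -- rough seq scan cost: one page fetch per block plus per-tuple CPU cost
    SELECT relpages + reltuples * 0.01 AS est_seqscan_cost
    FROM pg_class
    WHERE relname = 'big_table';

It is the index side, as you say, where the page fetch count and the cache
behavior are the guesswork.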

> 
> Does that help?  Wow, this _is_ confusing.  I am still looking for that
> holy grail that will allow this all to be codified so others can learn
> from it and we don't have to rehash this repeatedly, but frankly, this
> whole discussion is covering new ground that we haven't covered yet.

Path planning by probabilities derived from statistical analysis is big
science, regardless of the application. The cost-based optimizer will *never*
be finished, because it can never be perfect.

When all is said and done, it could very well be as good as it ever needs to
be, and a method for giving hints to the optimizer, a la Oracle, may be the
answer.
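
We do not have Oracle-style hint syntax, but the per-session enable_* settings
already act as a blunt form of hint; for example (query is just an
illustration):

    -- steer the planner away from a sequential scan for one session
    SET enable_seqscan = off;
    EXPLAIN SELECT * FROM big_table WHERE indexed_col = 42;
    SET enable_seqscan = on;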

