Re: Why is the query not using the index for sorting? - Mailing list pgsql-performance

From Craig Ringer
Subject Re: Why is the query not using the index for sorting?
Date
Msg-id 4B093BBD.3020505@postnewspapers.com.au
Whole thread Raw
In response to Why is the query not using the index for sorting?  (Jonathan Blitz <jblitz@013.net>)
Responses Re: Why is the query not using the index for sorting?  (Jonathan Blitz <jblitz@013.net>)
List pgsql-performance
On 22/11/2009 8:50 PM, Jonathan Blitz wrote:
> I have a table with a number of columns.
>
> I perform
>
> Select *
> from table
> order by a,b
>
> There is an index on a,b which is clustered (as well as indexes on a and
> b alone).
> I have issued the cluster and anyalze commands.
>
> Nevertheless, PostgreSQL performs a Sequential Scan on the table and
> then performs a sort.

PostgreSQL's query planner probably thinks it'll be faster to read the
pages off the disk sequentially then sort them in memory. To use an
index instead, Pg would have to read the whole index from disk
(sequentially) then fetch all the pages off the disk in a probably
near-random order. So it'd be doing more disk I/O, and much more of it
would be random I/O, which is a LOT slower.

So Pg does it the fast way, reading the table into memory then sorting
it there.

The most important thing to understand is that sometimes, a sequential
scan is just the fastest way to do the job.

I suspect you're working on the assumption that Pg can get all the data
it needs from the index, so it doesn't need to read the tables proper.
In some other database systems this *might* be possible if you had an
index on fields "a" and "b" and issued a "select a,b from table" instead
of a "select *". PostgreSQL, though, can not do this. PostgreSQL's
indexes do not contain all the information required to return values
from queries, only enough information to find the places in the main
tables where those values are to be found.

If you want to know more and understand why that's the case, search for
the phrase "covered index" and the words "index visibility". Suffice it
to say that there are pretty good reasons why it works how it does, and
there would be very large downsides to changing how it works as well as
large technical problems to solve to even make it possible. It's to do
with the trade-off between update/insert/delete speeds and query speeds,
the cost of "fatter" indexes taking longer to read from disk, and lots more.

By the way, if you want to test out different query plans for a query to
see which way is faster, you can use the "enable_" parameters like
"enable_seqscan", "enable_hashjoin" etc to control how PostgreSQL
performs queries. There's *LOTS* to be learned about this in the mailing
list archives. You should also read the following page:

 http://www.postgresql.org/docs/current/static/runtime-config-query.html

but understand that the planner method configuration parameters are
intended mostly for testing and performance analysis, not for production
use.

If you find a query that's lots faster with a particular enable_
parameter set to "off", try increasing your statistics targets on the
tables / columns of interest, re-ANALYZEing, and re-testing. See these
pages re statistics:

http://www.postgresql.org/docs/current/static/using-explain.html
http://www.postgresql.org/docs/current/static/planner-stats.html
http://www.postgresql.org/docs/current/static/planner-stats-details.html

If after increasing your stats targets the planner still picks a vastly
slower plan, consider posting to the mailing list with the full output
of "EXPLAIN ANALYZE SELECT myquery....", the full exact text of your
query, and your table schema as shown by "\d tablename" in psql. Someone
may be able to help you or at least explain why it's happening.

--
Craig Ringer

pgsql-performance by date:

Previous
From: Thom Brown
Date:
Subject: Re: Why is the query not using the index for sorting?
Next
From: Jonathan Blitz
Date:
Subject: Re: Why is the query not using the index for sorting?