Re: What is the right way to deal with a table with rows that are not in a random order? - Mailing list pgsql-general

From Jeff Davis
Subject Re: What is the right way to deal with a table with rows that are not in a random order?
Date
Msg-id 1243472067.11796.2.camel@monkey-cat.sm.truviso.com
Whole thread Raw
In response to What is the right way to deal with a table with rows that are not in a random order?  (Douglas Alan <darkwater42@gmail.com>)
Responses Re: What is the right way to deal with a table with rows that are not in a random order?  (Douglas Alan <darkwater42@gmail.com>)
List pgsql-general
On Wed, 2009-05-27 at 19:53 -0400, Douglas Alan wrote:
> We have a very large table (150 million rows) where the rows are not
> in a random order.  Some common queries will have millions of results,
> and this of course is slow.  For an interactive interface to the
> database, we want to put a limit on all queries so that queries will
> return quickly even if there are millions of results.
>
> The problem we are seeing at the moment is that the statistics
> histograms are actually often hurting us.  Postgres sees that for some
> queries there are so many results that it decides to do a sequential
> scan, rather than using the index.  Unfortunately, sometimes all of
> these millions of results are at the end of the table, rather than
> being randomly distributed in the table, so a sequential scan is the
> worst possible approach.

If you're putting a LIMIT on it, why does it return millions of results?

> To fix this, we could use an "order by" clause to force Postgres to
> use the index, but this isn't so easy as we are using Postgres through
> an ORM (i.e, Django in this case), and it's not so easy to try to
> jury-rig  things this way on a per-query basis.
>

ORDER BY won't always choose an index scan, for instance if the
correlation is low and the expected results are many.

Can you pick out an interesting query and give some specifics, like:
* the query
* the EXPLAIN ANALYZE output (or EXPLAIN without ANALYZE if it takes too
long to even run once)
* EXPLAIN ANALYZE output if you force the index scan
* the statistics for the relevant columns, such as histogram and
correlation

Regards,
    Jeff Davis


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Bloated Table
Next
From: Douglas Alan
Date:
Subject: Re: What is the right way to deal with a table with rows that are not in a random order?