Home > mailing lists

Large tables, ORDER BY and sequence/index scans - Mailing list pgsql-general

From	Milan Zamazal
Subject	Large tables, ORDER BY and sequence/index scans
Date	January 5, 2010 09:03:49
Msg-id	873a2k9ae9.fsf@blackbird.nest.zamazal.org Whole thread Raw
Responses	Re: Large tables, ORDER BY and sequence/index scans Re: Large tables, ORDER BY and sequence/index scans Re: Large tables, ORDER BY and sequence/index scans
List	pgsql-general

Tree view

My problem is that retrieving sorted data from large tables is sometimes
very slow in PostgreSQL (8.4.1, FWIW).

I typically retrieve the data using cursors, to display them in UI:

  BEGIN;
  DECLARE ... SELECT ... ORDER BY ...;
  FETCH ...;
  ...

On a newly created table of about 10 million rows the FETCH command
takes about one minute by default, with additional delay during the
contingent following COMMIT command.  This is because PostgreSQL uses
sequence scan on the table even when there is an index on the ORDER BY
column.  When I can force PostgreSQL to perform index scan (e.g. by
setting one of the options enable_seqscan or enable_sort to off), FETCH
response is immediate.

PostgreSQL manual explains motivation for sequence scans of large tables
and I can understand the motivation.  Nevertheless such behavior leads
to unacceptably poor performance in my particular case.  It is important
to get first resulting rows quickly, to display them to the user without
delay.

My questions are:

- What is your experience with using ORDER BY + indexes on large tables?

- Is there a way to convince PostgreSQL to use index scans automatically
  in cases where it is much more efficient?  I tried using ANALYZE,
  VACUUM and SET STATISTICS, but without success.

- Is it a good idea to set enable_seqscan or enable_sort to "off"
  globally in my case?  Or to set them to "off" just before working with
  large tables?  My databases contain short and long tables, often
  connected through REFERENCES or joined into views and many of shorter
  tables serve as codebooks.  Can setting one of the parameters to off
  have clearly negative impacts?

- Is there a recommended way to keep indexes in good shape so that the
  performance of initial rows retrievals remains good?  The large tables
  are typically append-only tables with a SERIAL primary key.

Thanks for any tips.

pgsql-general by date:

From: Greg Smith
Date: 05 January 2010, 06:38:43
Subject: Re: timestams in the the pg_standby output

From: Alban Hertroys
Date: 05 January 2010, 10:04:25
Subject: Re: PostgreSQL Write Performance

Large tables, ORDER BY and sequence/index scans - Mailing list pgsql-general

Previous

Next