Thread: Large tables, ORDER BY and sequence/index scans

Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
My problem is that retrieving sorted data from large tables is sometimes
very slow in PostgreSQL (8.4.1, FWIW).

I typically retrieve the data using cursors, to display it in the UI:

  BEGIN;
  DECLARE ... SELECT ... ORDER BY ...;
  FETCH ...;
  ...
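
For instance (a minimal sketch; the table and column names are made up):

  BEGIN;
  DECLARE c SCROLL CURSOR FOR SELECT * FROM bigtable ORDER BY idxcol;
  FETCH 100 FROM c;   -- first page displayed in the UI
  FETCH 100 FROM c;   -- next page
  COMMIT;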

On a newly created table of about 10 million rows the FETCH command
takes about one minute by default, with an additional delay during the
COMMIT command that may follow.  This is because PostgreSQL uses a
sequential scan on the table even when there is an index on the ORDER BY
column.  When I force PostgreSQL to perform an index scan (e.g. by
setting one of the options enable_seqscan or enable_sort to off), the
FETCH response is immediate.

The PostgreSQL manual explains the motivation for sequential scans of
large tables, and I can understand it.  Nevertheless, this behavior
leads to unacceptably poor performance in my particular case.  It is
important to get the first resulting rows quickly, to display them to
the user without delay.

My questions are:

- What is your experience with using ORDER BY + indexes on large tables?

- Is there a way to convince PostgreSQL to use index scans automatically
  in cases where they are much more efficient?  I tried using ANALYZE,
  VACUUM and SET STATISTICS, but without success.

- Is it a good idea to set enable_seqscan or enable_sort to "off"
  globally in my case?  Or to set them to "off" just before working with
  large tables?  My databases contain small and large tables, often
  connected through REFERENCES or joined into views, and many of the
  smaller tables serve as codebooks.  Can setting either parameter to
  off have clearly negative impacts?

- Is there a recommended way to keep indexes in good shape so that the
  performance of initial row retrieval remains good?  The large tables
  are typically append-only tables with a SERIAL primary key.

Thanks for any tips.


Re: Large tables, ORDER BY and sequence/index scans

From: Filip Rembiałkowski

2010/1/5 Milan Zamazal <pdm@brailcom.org>:
> My problem is that retrieving sorted data from large tables is sometimes
> very slow in PostgreSQL (8.4.1, FWIW).
> [...]
> - What is your experience with using ORDER BY + indexes on large tables?

Without a WHERE clause postgres will almost always choose a sequential scan.



> - Is there a way to convince PostgreSQL to use index scans automatically
>   in cases where they are much more efficient?  I tried using ANALYZE,
>   VACUUM and SET STATISTICS, but without success.


By using cursors you take some responsibility away from the planner.
It has no idea that you want the first 100 rows quickly; it just tries
to optimize the whole operation.

> - Is it a good idea to set enable_seqscan or enable_sort to "off"
>   globally in my case?  Or to set them to "off" just before working with
>   large tables?  My databases contain small and large tables, often
>   connected through REFERENCES or joined into views, and many of the
>   smaller tables serve as codebooks.  Can setting either parameter to
>   off have clearly negative impacts?

IMHO, no, no and yes.
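
(If you do experiment with those switches, SET LOCAL at least keeps the
change scoped to one transaction rather than the whole session; a
minimal sketch with hypothetical names:

  BEGIN;
  SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
  DECLARE c CURSOR FOR SELECT * FROM bigtable ORDER BY idxcol;
  FETCH 100 FROM c;
  COMMIT;

That way your small codebook tables are not penalized elsewhere.)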
 

> - Is there a recommended way to keep indexes in good shape so that the
>   performance of initial row retrieval remains good?  The large tables
>   are typically append-only tables with a SERIAL primary key.


Use partitioning.
If that's not possible, REINDEX periodically to avoid a sub-optimal
btree layout.  But that's only half a solution.
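
For example, assuming an index named bigtable_idxcol_idx (note that
REINDEX locks out writes to the table while it runs, so schedule it
off-peak):

  REINDEX INDEX bigtable_idxcol_idx;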




> Thanks for any tips.


tips:

1. get rid of cursors, unless you have a strong need for them (e.g. seeking back and forth, or updating).

2. switch to "chunked" processing, like this:

SELECT * FROM bigtable ORDER BY idxcol LIMIT 1000;
(process the records)
SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch] ORDER BY idxcol LIMIT 1000;
... and so on.



Regards,
Filip


--
Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/

Re: Large tables, ORDER BY and sequence/index scans

From: Pavel Stehule
Hello

Please send the EXPLAIN result:

postgres=# explain analyze declare x cursor for select * from foo;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.010..0.012 rows=2 loops=1)
 Total runtime: 0.033 ms
(2 rows)

regards
Pavel Stehule

2010/1/5 Milan Zamazal <pdm@brailcom.org>:
> My problem is that retrieving sorted data from large tables is sometimes
> very slow in PostgreSQL (8.4.1, FWIW).
> [...]

Re: Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
>>>>> "FR" == Filip Rembiałkowski <plk.zuber@gmail.com> writes:

    FR> 2010/1/5 Milan Zamazal <pdm@brailcom.org>
    >> - Is it a good idea to set enable_seqscan or enable_sort to "off"
    >> globally in my case?  Or to set them to "off" just before working
    >> with large tables?  My databases contain small and large tables,
    >> often connected through REFERENCES or joined into views, and many
    >> of the smaller tables serve as codebooks.  Can setting either
    >> parameter to off have clearly negative impacts?

    FR> IMHO, no, no and yes.

Why (especially the "yes" part)?  Any details and/or pointers?

    FR> 1. get rid of cursors, unless you have a strong need for them
    FR> (e.g. seeking back and forth, or updating).

Cursors are very convenient for me, because they allow easy browsing of
data in the user interface (fetching limited sets of rows while seeking
forward and backward), and they avoid possible seeking glitches and
other troubles when concurrent updates happen.

    FR> 2. switch to "chunked" processing, like this:

    FR> SELECT * FROM bigtable ORDER BY idxcol LIMIT 1000;
    FR> (process the records)
    FR> SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch]
    FR> ORDER BY idxcol LIMIT 1000;
    FR> ... and so on.

Leaving aside the convenience of cursors, this wouldn't work, as the
values in idxcol needn't be unique.
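
Though I suppose the SERIAL primary key could serve as a tie-breaker.
A sketch, assuming the key column is named id and a composite index
exists on (idxcol, id); the row-wise comparison can still use that
index:

  SELECT * FROM bigtable
  WHERE (idxcol, id) > ([last idxcol], [last id])
  ORDER BY idxcol, id
  LIMIT 1000;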

Thanks,
Milan Zamazal


Re: Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

    PS> Please send the EXPLAIN result:

For the ~10 million row table:

  explain declare c cursor for select * from foo2 order by value;
                                QUERY PLAN
  ----------------------------------------------------------------------
   Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10)
     Sort Key: value
     ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10)
  (3 rows)

For a ~1 million row table, containing the first million rows from foo2
(the `value' column contains random integer data):

  explain declare c cursor for select * from foo order by value;
                                      QUERY PLAN
  -----------------------------------------------------------------------------------
   Index Scan using foo_value_idx on foo  (cost=0.00..47604.02 rows=999999 width=10)
  (1 row)

When seqscan is disabled for the 10 million rows table:

  set enable_seqscan = off;
  explain declare c cursor for select * from foo2 order by value;
                                         QUERY PLAN
  -----------------------------------------------------------------------------------------
   Index Scan using foo2_value_idx on foo2  (cost=0.00..32220140.73 rows=9999999 width=10)
  (1 row)

Regards,
Milan Zamazal


Re: Large tables, ORDER BY and sequence/index scans

From: Albe Laurenz
Milan Zamazal wrote:
> My problem is that retrieving sorted data from large tables is sometimes
> very slow in PostgreSQL (8.4.1, FWIW).
> [...]
> It is important to get the first resulting rows quickly, to display
> them to the user without delay.

Did you try to reduce the cursor_tuple_fraction parameter?
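
A minimal sketch; the default is 0.1, i.e. the planner assumes that 10%
of a cursor's rows will actually be fetched, so a lower value biases
cursor plans toward fast startup:

  SET cursor_tuple_fraction = 0.01;
  BEGIN;
  DECLARE c CURSOR FOR SELECT * FROM foo2 ORDER BY value;
  FETCH 100 FROM c;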

Yours,
Laurenz Albe

Re: Large tables, ORDER BY and sequence/index scans

From: Grzegorz Jaśkiewicz
2010/1/5 Milan Zamazal <pdm@brailcom.org>:

> Cursors are very convenient for me, because they allow easy browsing of
> data in the user interface (fetching limited sets of rows while seeking
> forward and backward), and they avoid possible seeking glitches and
> other troubles when concurrent updates happen.
>

Sounds to me like a borked app design.  Do you seriously need to walk
the user through a couple of million rows of data?

I mean, databases are designed to work on data and give you the result
back.  Use this capability intelligently.


--
GJ

Re: Large tables, ORDER BY and sequence/index scans

From: Pavel Stehule
Please run EXPLAIN ANALYZE.

Pavel

2010/1/5 Milan Zamazal <pdm@brailcom.org>:
> [...]

Re: Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

    PS> Please run EXPLAIN ANALYZE.

I see, I'm sorry.  Here are the results:

  set enable_seqscan = on;
  explain analyze declare c cursor for select * from foo2 order by value;
                                                        QUERY PLAN
  -------------------------------------------------------------------------------------------------------------------------
   Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10) (actual time=43432.727..49303.902 rows=9999999 loops=1)
     Sort Key: value
     Sort Method:  external merge  Disk: 204208kB
     ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10) (actual time=0.058..1775.928 rows=9999999 loops=1)
   Total runtime: 53693.012 ms
  (5 rows)

  set enable_seqscan = off;
  explain analyze declare c cursor for select * from foo2 order by value;
                                                        QUERY PLAN
  ---------------------------------------------------------------------------------------------------------------------------------------------
   Index Scan using foo2_value_idx on foo2  (cost=0.00..32220140.73 rows=9999999 width=10) (actual time=0.225..55055.583 rows=9999999 loops=1)
   Total runtime: 56394.283 ms
  (2 rows)


Re: Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
>>>>> "AL" == Albe Laurenz <laurenz.albe@wien.gv.at> writes:

    AL> Did you try to reduce the cursor_tuple_fraction parameter?

No, good idea, thanks.  It helps.

The question is whether it's a good idea to reduce cursor_tuple_fraction
universally, without knowing the table size beforehand (and I'm not
going to use SELECT COUNT(*), for well-known reasons).  But I can
probably experiment a bit and see; it looks promising.
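
Perhaps SET LOCAL is a way to avoid a universal change, scoping it to
the transaction that declares the cursor; a sketch:

  BEGIN;
  SET LOCAL cursor_tuple_fraction = 0.01;  -- reverts at COMMIT/ROLLBACK
  DECLARE c CURSOR FOR SELECT * FROM foo2 ORDER BY value;
  FETCH 100 FROM c;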


Re: Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
>>>>> "GJ" == Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:

    GJ> Do you seriously need to walk the user through a couple of
    GJ> million rows of data?

Typically not.  But the data can be of any size, some tables may be
large, and I'd like to understand what happens.  It is a general data
browser.

Re: Large tables, ORDER BY and sequence/index scans

From: Pavel Stehule
2010/1/5 Milan Zamazal <pdm@brailcom.org>:
>>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:
>
>    PS> Please run EXPLAIN ANALYZE.
>
> I see, I'm sorry.  Here are the results:

Do you have the original values of random_page_cost and seq_page_cost
in postgresql.conf?

It is strange.

Pavel




Re: Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

    PS> Do you have the original values of random_page_cost and
    PS> seq_page_cost in postgresql.conf?

Yes.  To be sure, I uncommented the values in postgresql.conf

  seq_page_cost = 1.0            # measured on an arbitrary scale
  random_page_cost = 4.0        # same scale as above

and restarted PostgreSQL.  The result looks basically the same:

  explain analyze declare c cursor for select * from foo2 order by value;
                                                        QUERY PLAN
  -------------------------------------------------------------------------------------------------------------------------
   Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10) (actual time=43709.313..49265.244 rows=9999999 loops=1)
     Sort Key: value
     Sort Method:  external merge  Disk: 204208kB
     ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10) (actual time=0.072..1760.585 rows=9999999 loops=1)
   Total runtime: 54399.967 ms
  (5 rows)


Re: Large tables, ORDER BY and sequence/index scans

From: Pavel Stehule
2010/1/5 Milan Zamazal <pdm@brailcom.org>:
>>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:
>
>    PS> Do you have the original values of random_page_cost and
>    PS> seq_page_cost in postgresql.conf?
>
> Yes.  To be sure I uncommented the values in postgresql.conf
>
>  seq_page_cost = 1.0                   # measured on an arbitrary scale
>  random_page_cost = 4.0                # same scale as above

And the value of effective_cache_size?

What is the CREATE INDEX statement for the index?

Pavel


Re: Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

    PS> And the value of effective_cache_size?

effective_cache_size = 128MB

    PS> What is the CREATE INDEX statement for the index?

create index foo2_value_idx on foo2(value);


Re: Large tables, ORDER BY and sequence/index scans

From: John R Pierce
Milan Zamazal wrote:
>     PS> And the value of effective_cache_size?
>
> effective_cache_size = 128MB
>

That's rather small, unless your system is very memory constrained.
Assuming postgres is the primary disk I/O consumer on this system, take
a look at the 'cached' value in top (or whatever) after it has been
running for a while; that's a good first-order estimate of
effective_cache_size.... this is often half or more of your physical
memory.
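
A sketch with hypothetical numbers: on an 8 GB box where top reports
about 5 GB cached, one might set in postgresql.conf

  effective_cache_size = 5GB   # a planner hint only; allocates no memory

and reload.  A larger value makes index scans look cheaper to the
planner, because repeated page fetches are assumed to be cache hits.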



Re: Large tables, ORDER BY and sequence/index scans

From: Milan Zamazal
>>>>> "JRP" == John R Pierce <pierce@hogranch.com> writes:

    >> effective_cache_size = 128MB

    JRP> That's rather small, unless your system is very memory
    JRP> constrained.  Assuming postgres is the primary disk I/O
    JRP> consumer on this system, take a look at the 'cached' value in
    JRP> top (or whatever) after it has been running for a while; that's
    JRP> a good first-order estimate of effective_cache_size.... this is
    JRP> often half or more of your physical memory.

Indeed, increasing effective_cache_size helps, thanks.

Thank you all for the tips, I hope I understand the planner behavior for
my queries better now.

Regards,
Milan Zamazal