Home > mailing lists
Re: [GENERAL] index on large table - Mailing list pgsql-general

From	posting-system@google.com
Subject	Re: [GENERAL] index on large table
Date	March 22, 2002 13:24:10
Msg-id	467a1aa7bc75bbd7755a40d317d6180f Whole thread
List	pgsql-general
Tree view
From: n.j.andrews@investsystems.co.uk (Nigel Andrews)
Newsgroups: muc.lists.postgres.questions
Subject: Re: [GENERAL] index on large table
References: <200203132051.g2DKphU12227@candle.pha.pa.us>
NNTP-Posting-Host: 62.49.196.164
Message-ID: <1b03419b.0203191759.28e1e543@posting.google.com>

Bruce Momjian <pgman@candle.pha.pa.us> wrote in message news:<200203132051.g2DKphU12227@candle.pha.pa.us>...
> Tom Lane wrote:
> > >> "explain select * from events order by oid limit 10 offset 1000000"
> > >> NOTICE:  QUERY PLAN:
> > >> Limit  (cost=424863.54..424863.54 rows=10 width=130)
> > >> ->  Sort  (cost=424863.54..424863.54 rows=1025245 width=130)
> > >> ->  Seq Scan on events  (cost=0.00..35645.45 rows=1025245 width=130)
> > >>
> > >> Bummer. This is very slow again, sequential scan again. Why the index is
> > >> not used for this query? Use of index would make it very fast!
> >
> > Not necessarily.  Using the index for this would require fetching
> > 1000000+10 values in the indexscan (and throwing away all but 10).
> >
> > The planner is counting on its fingers and guessing that the sort
> > is faster.  It might or might not be right about that (have you
> > compared timings?) but certainly the index method won't be
> > instantaneous.
>
> This question is being asked a lot. I hope my new FAQ item 4.8 wording
> helps, but it will take time for people to read the new version.  I will
> add it to 7.2.X CVS.

I've read lots of messages on this subject, and the FAQ, recently but I'm still
confused. None of what I have read seems to account for the execution plans I'm
seeing, except for this mention of scanning a million values in an index and
discarding all but a small number. However, even this I can not see applies to
a primary key on a million row table. I upgraded from 7.0.x to 7.2 because I
was thinking it was an oddity that was probably fixed.

First, my apologies for the length of this posting.

Next some background:

I have a database where one of it's tables records 'sessions', called
chat_sessions. It has an integer field, session_id, declared as primary key.
This table is the small table in the example with only about 2000 rows.

There is another table called chat_post, the large table holding about 1
million rows. It has two integer fields, session_id and post_number, which
together form the primary key. session_id is further constrained to reference
session_id in chat_session. There are also indexes created on time and
poster_name fields in this table, although these are not unique indexes.

Both tables are loaded in a 'natural' order, i.e. in increasing time value,
although the large table may have some groups of rows out of order. That is
doing a select on the tables without ordering by a time will probably yield
rows in ascending time order, mostly. And within the larget table rows are
added already grouped by session_id.

Now, below is the output from a little sequence of queries, with comments and
plans (not all queries I have shown plans for have the results of execution,
mainly because they're not of interest).

The main point is illustrated by the last few queries shown. These show that
when selecting from the large table using the primary key, searching for known
session_id and post_number values, a sequential scan is performed, even though
the planner has thinks it's only going to be interested in 1 row, which isn't
surprising by definition of a primary key.

Earlier queries show that joining the large table with the small by testing the
session_id fields uses the primary key of chat_post, the large table, provided
that the session_id tested against the known value is the one from the small
table.

The question being, why would a plain look up of a primary key require a
sequential scan and given that this is deemed best plan in this simple case why
would the more complicated case of using a join to specify one of the two
columns in the primary key cause the planner to switch to using the index?

Another question raised is that I have read how the planner will use a
sequential scan if it thinks more than a small proportion of the table's rows
are likely to be selected. However, selecting by time range in the tests below,
a significant proportion of the time range covered by the tables contents are
selected, approx. a third of the range in fact. The query returns about a third
of the rows from the table as well and the planner has noted that it is
probably interested in about a third of the table. The planner, however,
chooses an index scan not a sequential scan. Why?

I look forward to understanding this more; yes I know I'm sick.

Nigel Andrews


-- script output

Large table size and distribution;

select count(*) as total_rows, min(time) as earliest, max(time) as
latest from chat_post;
 total_rows |        earliest        |         latest
------------+------------------------+------------------------
    1003159 | 1998-03-05 21:41:00+00 | 2002-03-19 21:27:00+00
(1 row)



Small table size

select count(*) from chat_session;
 count
-------
  2176
(1 row)



Planner settings

show enable_seqscan;
NOTICE:  enable_seqscan is on
SHOW VARIABLE
show enable_indexscan;
NOTICE:  enable_indexscan is on
SHOW VARIABLE


Analyse (just to check)

vacuum analyze;
NOTICE:  Skipping "pg_group" --- only table or database owner can
VACUUM it
NOTICE:  Skipping "pg_database" --- only table or database owner can
VACUUM it
NOTICE:  Skipping "pg_shadow" --- only table or database owner can
VACUUM it
VACUUM


Show looking up in large table by time, selecting approx. 1.5 years
from the 4 covered, uses an index (non unique)

explain analyze select count(*) from chat_post cp where cp.time >
1998-01-01 12:00:00+00 and cp.time < 1999-10-01 10:00:00+00 ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=23065.45..23065.45 rows=1 width=0) (actual
time=18846.78..18846.78 rows=1 loops=1)
  ->  Index Scan using chat_post_time_key on chat_post cp
(cost=0.00..22109.54 rows=382364 width=0) (actual time=44.25..14509.81
rows=382941 loops=1)
Total runtime: 18847.45 msec

EXPLAIN
 count
--------
 382941
(1 row)



Show looking up in large table by poster_name uses an index (non
unique)

explain analyze select count(*) as count, min(time) as earliest,
max(time) as latest from chat_post cp where cp.poster_name = fatbull ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=1051.43..1051.43 rows=1 width=8) (actual
time=87.39..87.40 rows=1 loops=1)
  ->  Index Scan using chat_post_user_key on chat_post cp
(cost=0.00..1049.43 rows=265 width=8) (actual time=79.15..85.94
rows=99 loops=1)
Total runtime: 88.03 msec

EXPLAIN
 count |        earliest        |         latest
-------+------------------------+------------------------
    99 | 1999-04-13 00:36:00+01 | 1999-08-06 01:03:00+01
(1 row)



Show looking up in large table by joining through foriegn key to small
table, selecting on partial primary (also foriegn) key in large, uses
sequential scan on large and index on small

explain analyze select count(*) from chat_post cp, chat_session cs
where cp.session_id = cs.session_id and cp.session_id = 123;
NOTICE:  QUERY PLAN:

Aggregate  (cost=20494.67..20494.67 rows=1 width=4) (actual
time=25826.39..25826.40 rows=1 loops=1)
  ->  Merge Join  (cost=20413.91..20494.64 rows=13 width=4) (actual
time=25814.88..25823.51 rows=321 loops=1)
        ->  Index Scan using chat_session_pkey on chat_session cs
(cost=0.00..74.94 rows=2176 width=2) (actual time=19.70..55.51
rows=122 loops=1)
        ->  Sort  (cost=20413.91..20413.91 rows=77 width=2) (actual
time=25754.39..25756.81 rows=321 loops=1)
              ->  Seq Scan on chat_post cp  (cost=0.00..20411.49
rows=77 width=2) (actual time=350.35..25750.71 rows=321 loops=1)
Total runtime: 25827.84 msec

EXPLAIN
 count
-------
   321
(1 row)



Show looking up in large table by joining through foriegn key to small
table, selecting on primary key in small, uses sequential scan on
small and index on large

explain analyze select count(*) from chat_post cp, chat_session cs
where cp.session_id = cs.session_id and cs.session_id = 123;
NOTICE:  QUERY PLAN:

Aggregate  (cost=311.03..311.03 rows=1 width=4) (actual
time=283.82..283.83 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..310.83 rows=79 width=4) (actual
time=71.70..277.81 rows=321 loops=1)
        ->  Seq Scan on chat_session cs  (cost=0.00..69.20 rows=1
width=2) (actual time=0.59..161.40 rows=1 loops=1)
        ->  Index Scan using chat_post_pkey on chat_post cp
(cost=0.00..240.64 rows=79 width=2) (actual time=71.04..107.89
rows=321 loops=1)
Total runtime: 284.56 msec

EXPLAIN
 count
-------
   321
(1 row)



Show looking up in large table by joining through foriegn key to small
table, selecting on primary key in large, uses sequential scan on
large and index on small

explain analyze select count(*) from chat_post cp, chat_session cs
where cp.session_id = cs.session_id and cp.session_id = 123 and
cp.post_number = 10;
NOTICE:  QUERY PLAN:

Aggregate  (cost=22922.41..22922.41 rows=1 width=4) (actual
time=31371.73..31371.74 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..22922.41 rows=1 width=4) (actual
time=1821.99..31371.64 rows=1 loops=1)
        ->  Seq Scan on chat_post cp  (cost=0.00..22919.39 rows=1
width=2) (actual time=1796.64..31346.22 rows=1 loops=1)
        ->  Index Scan using chat_session_pkey on chat_session cs
(cost=0.00..3.01 rows=1 width=2) (actual time=25.24..25.27 rows=1
loops=1)
Total runtime: 31372.46 msec

EXPLAIN
 count
-------
     1
(1 row)



Show looking up in large table by joining through foriegn key to small
table, selecting on primary key in small and parital primary key in
large, uses sequential scan on small and index on large

explain analyze select count(*) from chat_post cp, chat_session cs
where cp.session_id = cs.session_id and cs.session_id = 123 and
cp.post_number = 10;
NOTICE:  QUERY PLAN:

Aggregate  (cost=310.06..310.06 rows=1 width=4) (actual
time=304.09..304.10 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..310.05 rows=1 width=4) (actual
time=103.50..303.99 rows=1 loops=1)
        ->  Seq Scan on chat_session cs  (cost=0.00..69.20 rows=1
width=2) (actual time=23.38..194.53 rows=1 loops=1)
        ->  Index Scan using chat_post_pkey on chat_post cp
(cost=0.00..240.84 rows=1 width=2) (actual time=76.66..105.96 rows=1
loops=1)
Total runtime: 304.82 msec

EXPLAIN
 count
-------
     1
(1 row)



Show looking up in large table, selecting on partial primary key, uses
sequential scan on large

explain analyze select count(*) from chat_post cp where cp.session_id
= 123;
NOTICE:  QUERY PLAN:

Aggregate  (cost=20411.68..20411.68 rows=1 width=0) (actual
time=31691.92..31691.93 rows=1 loops=1)
  ->  Seq Scan on chat_post cp  (cost=0.00..20411.49 rows=77 width=0)
(actual time=1736.29..31688.80 rows=321 loops=1)
Total runtime: 31692.35 msec

EXPLAIN


Show looking up in large table, selecting on primary key, uses
sequential scan on large

explain analyze select count(*) from chat_post cp where cp.session_id
= 123 and cp.post_number = 10;
NOTICE:  QUERY PLAN:

Aggregate  (cost=22919.39..22919.39 rows=1 width=0) (actual
time=32304.62..32304.63 rows=1 loops=1)
  ->  Seq Scan on chat_post cp  (cost=0.00..22919.39 rows=1 width=0)
(actual time=1884.73..32304.49 rows=1 loops=1)
Total runtime: 32305.17 msec

EXPLAIN


suggest not to use sequential scanning

set enable_seqscan to off;
SET VARIABLE


Show looking up in large table, selecting on partial primary key, uses
sequential scan on large

explain analyze select count(*) from chat_post cp where cp.session_id
= 123;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100020411.68..100020411.68 rows=1 width=0) (actual
time=31086.04..31086.05 rows=1 loops=1)
  ->  Seq Scan on chat_post cp  (cost=100000000.00..100020411.49
rows=77 width=0) (actual time=1822.31..31078.88 rows=321 loops=1)
Total runtime: 31086.53 msec

EXPLAIN


Show looking up in large table, selecting on primary key, uses
sequential scan on large

explain analyze select count(*) from chat_post cp where cp.session_id
= 123 and cp.post_number = 10;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100022919.39..100022919.39 rows=1 width=0) (actual
time=34508.52..34508.53 rows=1 loops=1)
  ->  Seq Scan on chat_post cp  (cost=100000000.00..100022919.39
rows=1 width=0) (actual time=1804.37..34508.40 rows=1 loops=1)
Total runtime: 34509.05 msec

EXPLAIN


try and lower cost of random page fetches to further remove the
preference for sequential scanning

set random_page_cost to 1;
SET VARIABLE


Show looking up in large table, selecting on partial primary key, uses
sequential scan on large

explain analyze select count(*) from chat_post cp where cp.session_id
= 123;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100020411.68..100020411.68 rows=1 width=0) (actual
time=31131.36..31131.37 rows=1 loops=1)
  ->  Seq Scan on chat_post cp  (cost=100000000.00..100020411.49
rows=77 width=0) (actual time=1796.37..31128.28 rows=321 loops=1)
Total runtime: 31131.90 msec

EXPLAIN


Show looking up in large table, selecting on primary key, uses
sequential scan on large

explain analyze select count(*) from chat_post cp where cp.session_id
= 123 and cp.post_number = 10;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100022919.39..100022919.39 rows=1 width=0) (actual
time=31445.16..31445.17 rows=1 loops=1)
  ->  Seq Scan on chat_post cp  (cost=100000000.00..100022919.39
rows=1 width=0) (actual time=1889.51..31445.03 rows=1 loops=1)
Total runtime: 31445.68 msec

EXPLAIN

-- end of output
pgsql-general by date:
From: Masaru Sugawara
Date: 22 March 2002, 13:17:05
Subject: Re: Huge Performance Difference on Similar Query in Pg7.2
From: Tom Lane
Date: 22 March 2002, 13:44:15
Subject: Re: Small question
Re: [GENERAL] index on large table - Mailing list pgsql-general

Previous

Next