Thread: Question on Explain : Index Scan

Question on Explain : Index Scan

From

Date:

20 October 2010, 22:47:39

Question on Index scan:
--------------------------------------------------------------------------->
test=# \d test_seqindex1
       Table "public.test_seqindex1"
Column |         Type          | Modifiers
--------+-----------------------+-----------
sid    | character varying(13) | not null
name   | character varying(80) |
Indexes:
    "test_seqindex1_pkey" PRIMARY KEY, btree (sid)

test=# explain select * from test_seqindex1 where sid='AA023';
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
Index Scan using test_seqindex1_pkey on test_seqindex1 (cost=0.00..8.27 rows=1 width=28)
   Index Cond: ((sid)::text = 'AA023'::text)

test=# \d test_seqindex2
       Table "public.test_seqindex2"
Column |         Type          | Modifiers
--------+-----------------------+-----------
eid    | integer               | not null
sid    | character varying(13) |
ename | character varying(80) |
Indexes:
    "test_seqindex2_pkey" PRIMARY KEY, btree (eid)
    "idx_test_seqindex2_sid" btree (sid)

test=# explain select * from test_seqindex2 where sid='AA023';
                                      QUERY PLAN
--------------------------------------------------------------------------------------
Bitmap Heap Scan on test_seqindex2 (cost=4.95..275.53 rows=73 width=30)
   Recheck Cond: ((sid)::text = 'AA023'::text)
   -> Bitmap Index Scan on idx_test_seqindex2_sid (cost=0.00..4.93 rows=73 width=0)
         Index Cond: ((sid)::text = 'AA023'::text)

test=explain select * from test_seqindex1 t1,test_seqindex2 t2 where t1.sid=t2.sid;
                                     QUERY PLAN
------------------------------------------------------------------------------------
Hash Join (cost=1231.55..46386.19 rows=920544 width=58)
   Hash Cond: ((t2.sid)::text = (t1.sid)::text)
   -> Seq Scan on test_seqindex2 t2 (cost=0.00..16225.97 rows=920697 width=30)
   -> Hash (cost=581.80..581.80 rows=33580 width=28)
         -> Seq Scan on test_seqindex1 t1 (cost=0.00..581.80 rows=33580 width=28)
(5 rows)

I was hoping the optimizer would do a join using index scan.

Could some one please explain me why its not doing an index scan rather than sequential scan .

Thanks
Deepak

Re: Question on Explain : Index Scan

From

Mathieu De Zutter

Date:

21 October 2010, 07:21:14

On Thu, Oct 21, 2010 at 3:47 AM, DM <dm.aeqa@gmail.com> wrote:
> I was hoping the optimizer would do a join using index scan.
>
> Could some one please explain me why its not doing an index scan rather than
> sequential scan .

A index scan would be probably slower here because you're asking for a
lot of rows. A lot of rows means a lot of I/O, and an index scan is
more I/O intensive (since it has to read the index too). If you limit
the result (by being more selective in your where clause, just like
you do in the first two queries), postgres will most likely switch to
index scan.

You can see for yourself if index-scan would be faster in your case by
running the following command before "explain (analyze)":

set enable_seqscan = off;

BTW, try to use explain analyze instead of explain, that way you'll
see the actual timings too instead of just the planner estimates.

Kind regards,
Mathieu

Re: Question on Explain : Index Scan

From

Date:

21 October 2010, 15:10:07

perfecto, thank you for the explanation.

- Deepak

On Thu, Oct 21, 2010 at 3:20 AM, Mathieu De Zutter <mathieu@dezutter.org> wrote:

On Thu, Oct 21, 2010 at 3:47 AM, DM <dm.aeqa@gmail.com> wrote:
> I was hoping the optimizer would do a join using index scan.
>
> Could some one please explain me why its not doing an index scan rather than
> sequential scan .

A index scan would be probably slower here because you're asking for a
lot of rows. A lot of rows means a lot of I/O, and an index scan is
more I/O intensive (since it has to read the index too). If you limit
the result (by being more selective in your where clause, just like
you do in the first two queries), postgres will most likely switch to
index scan.

You can see for yourself if index-scan would be faster in your case by
running the following command before "explain (analyze)":

set enable_seqscan = off;

BTW, try to use explain analyze instead of explain, that way you'll
see the actual timings too instead of just the planner estimates.

Kind regards,
Mathieu

Re: Question on Explain : Index Scan

From

Date:

21 October 2010, 15:51:16

Why is the difference in query plan, and the total runtime. Both tables have the same btree index

test=# explain analyze select * from test_seqindex1 where sid='AA023';
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Scan using test_seqindex1_pkey on test_seqindex1 (cost=0.00..8.27 rows=1 width=28) (actual time=0.017..0.018 rows=1 loops=1)
   Index Cond: ((sid)::text = 'AA023'::text)
Total runtime: 0.035 ms
(3 rows)

test=# explain analyze select * from test_seqindex2 where sid='AA023';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_seqindex2 (cost=4.95..275.53 rows=73 width=30) (actual time=57.833..71.577 rows=2 loops=1)
   Recheck Cond: ((sid)::text = 'AA023'::text)
   -> Bitmap Index Scan on idx_test_seqindex2_sid (cost=0.00..4.93 rows=73 width=0) (actual time=34.374..34.374 rows=2 loops=1)
         Index Cond: ((sid)::text = 'AA023'::text)
Total runtime: 71.599 ms
(5 rows)

test=# \d test_seqindex1
       Table "public.test_seqindex1"
Column |         Type          | Modifiers
--------+-----------------------+-----------
sid    | character varying(13) | not null
name   | character varying(80) |
Indexes:
    "test_seqindex1_pkey" PRIMARY KEY, btree (sid)

test=# \d test_seqindex2
       Table "public.test_seqindex2"
Column |         Type          | Modifiers
--------+-----------------------+-----------
eid    | integer               | not null
sid    | character varying(13) |
ename | character varying(80) |
Indexes:
    "test_seqindex2_pkey" PRIMARY KEY, btree (eid)
    "idx_test_seqindex2_sid" btree (sid)

===============================================================================================

On Thu, Oct 21, 2010 at 11:09 AM, DM <dm.aeqa@gmail.com> wrote:

perfecto, thank you for the explanation.

- Deepak

On Thu, Oct 21, 2010 at 3:20 AM, Mathieu De Zutter <mathieu@dezutter.org> wrote:
On Thu, Oct 21, 2010 at 3:47 AM, DM <dm.aeqa@gmail.com> wrote:
> I was hoping the optimizer would do a join using index scan.
>
> Could some one please explain me why its not doing an index scan rather than
> sequential scan .

A index scan would be probably slower here because you're asking for a
lot of rows. A lot of rows means a lot of I/O, and an index scan is
more I/O intensive (since it has to read the index too). If you limit
the result (by being more selective in your where clause, just like
you do in the first two queries), postgres will most likely switch to
index scan.

You can see for yourself if index-scan would be faster in your case by
running the following command before "explain (analyze)":

set enable_seqscan = off;

BTW, try to use explain analyze instead of explain, that way you'll
see the actual timings too instead of just the planner estimates.

Kind regards,
Mathieu

Re: Question on Explain : Index Scan

From

Mathieu De Zutter

Date:

22 October 2010, 05:27:03

On Thu, Oct 21, 2010 at 8:51 PM, DM <dm.aeqa@gmail.com> wrote:
> Why is the difference in query plan, and the total runtime. Both tables have
> the same  btree index
>
>
> test=# explain analyze select * from test_seqindex1 where sid='AA023';
>                                                              QUERY
> PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_seqindex1_pkey on test_seqindex1  (cost=0.00..8.27
> rows=1 width=28) (actual time=0.017..0.018 rows=1 loops=1)
>    Index Cond: ((sid)::text = 'AA023'::text)
>  Total runtime: 0.035 ms
> (3 rows)
>
> test=# explain analyze select * from test_seqindex2 where sid='AA023';
>                                                             QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on test_seqindex2  (cost=4.95..275.53 rows=73 width=30)
> (actual time=57.833..71.577 rows=2 loops=1)
>    Recheck Cond: ((sid)::text = 'AA023'::text)
>    ->  Bitmap Index Scan on idx_test_seqindex2_sid  (cost=0.00..4.93 rows=73
> width=0) (actual time=34.374..34.374 rows=2 loops=1)
>          Index Cond: ((sid)::text = 'AA023'::text)
>  Total runtime: 71.599 ms
> (5 rows)
>
>
> test=# \d test_seqindex1
>        Table "public.test_seqindex1"
>  Column |         Type          | Modifiers
> --------+-----------------------+-----------
>  sid    | character varying(13) | not null
>  name   | character varying(80) |
> Indexes:
>     "test_seqindex1_pkey" PRIMARY KEY, btree (sid)
>
> test=# \d test_seqindex2
>        Table "public.test_seqindex2"
>  Column |         Type          | Modifiers
> --------+-----------------------+-----------
>  eid    | integer               | not null
>  sid    | character varying(13) |
>  ename  | character varying(80) |
> Indexes:
>     "test_seqindex2_pkey" PRIMARY KEY, btree (eid)
>     "idx_test_seqindex2_sid" btree (sid)


Index scans are very efficient when they return very few rows. If they
return more (like in this case: estimate is 73 rows), the cost goes up
linearly (here: 73 times slower). An index does not contain the
elements according to their physical location, so it results in a lot
of random I/O. Here the planner thinks it's better to do a bitmap
index scan (first mark all pages that contain results and then read
all these pages sequentially) so to have sequential I/O instead, which
is usually faster. If your query really returns 73 rows, it would be a
good choice.

However, for your case, the estimates are off (73 matches estimated vs
2 matches actual). Therefore the planner needs better statistics on
the actual values in your table.

Look for "SET STATISTICS" here:
http://www.postgresql.org/docs/9.0/static/sql-altertable.html

Kind regards,
Mathieu