Thread: Query Plan - Index Scan & Seq Scan

Query Plan - Index Scan & Seq Scan

From
Prasanth
Date:
When joining two tables the query plan is doing a seq scan rather than index
scan. I do have indexes on the columns used for joining the tables.

Example:
SELECT a.id FROM a, b WHERE a.id = b.id;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Hash Join  (cost=13865.30..326413.23 rows=6451 width=18)
   Hash Cond: ("outer".id = "inner".id)
   ->  Seq Scan on a  (cost=0.00..125076.37 rows=6450937 width=18)
   ->  Hash  (cost=10168.64..10168.64 rows=500664 width=4)
         ->  Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4)
(5 rows)


The planner used to perform a index scan. I have added a lot of data in those
two tables. Right now both tables have millions of records. After adding the new
records the planner is going for a seq scan while doing the join.

Is there any tunning I can do so that the query planner would do a index scan?

I did a vacuum analyze but no change.

Thanks,
-Prasanth.

Re: Query Plan - Index Scan & Seq Scan

From
Tom Lane
Date:
Prasanth <dbadmin@nqadmin.com> writes:
>  Hash Join  (cost=13865.30..326413.23 rows=6451 width=18)
>    Hash Cond: ("outer".id = "inner".id)
>    ->  Seq Scan on a  (cost=0.00..125076.37 rows=6450937 width=18)
>    ->  Hash  (cost=10168.64..10168.64 rows=500664 width=4)
>          ->  Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4)
> (5 rows)

> The planner used to perform a index scan. I have added a lot of data in those
> two tables. Right now both tables have millions of records.

It is highly unlikely that you want an index scan for joining millions
of records ...

However, if you do have millions in both tables, why does the planner
think there are only 500664 rows in b?  Maybe you are overdue for ANALYZE.

            regards, tom lane

Re: Query Plan - Index Scan & Seq Scan

From
Scott Marlowe
Date:
On Thu, 2005-05-12 at 10:05, Prasanth wrote:
> When joining two tables the query plan is doing a seq scan rather than index
> scan. I do have indexes on the columns used for joining the tables.
>
> Example:
> SELECT a.id FROM a, b WHERE a.id = b.id;
>
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Hash Join  (cost=13865.30..326413.23 rows=6451 width=18)
>    Hash Cond: ("outer".id = "inner".id)
>    ->  Seq Scan on a  (cost=0.00..125076.37 rows=6450937 width=18)
>    ->  Hash  (cost=10168.64..10168.64 rows=500664 width=4)
>          ->  Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4)
> (5 rows)
>
>
> The planner used to perform a index scan. I have added a lot of data in those
> two tables. Right now both tables have millions of records. After adding the new
> records the planner is going for a seq scan while doing the join.
>
> Is there any tunning I can do so that the query planner would do a index scan?
>
> I did a vacuum analyze but no change.

try this:

explain analyze SELECT a.id FROM a, b WHERE a.id = b.id;
set enable_seqscan=off;
explain analyze SELECT a.id FROM a, b WHERE a.id = b.id;

and see which is faster.

It's quite likely that using an index here makes no sense, since there's
no selectivity happening, and you need all the data anyway.

Re: Query Plan - Index Scan & Seq Scan

From
Prasanth
Date:
Thanks for the prompt reply.

Table a has about 6 million and table b had a little more than half a million.

Sorry I wasn't exact about my numbers before.

I will be having the where conditions on both the tables that would bring down
the count drastically. Even in this case the planner is going for a seq scan.

A where condition I always use is shown below. This is bringing down the number
of rows from 6.5m to 1210. I have an index on code also. Even here it is going
for seq scan.

EXPLAIN ANALYZE SELECT count(*) fROM a where Code >2;
                                                          QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=141321.09..141321.09 rows=1 width=0) (actual
time=6454.063..6454.064 rows=1 loops=1)
   ->  Seq Scan on a  (cost=0.00..141318.40 rows=1076 width=0) (actual
time=15.687..6453.292 rows=1210 loops=1)
         Filter: (code > 2)
 Total runtime: 6454.140 ms
(4 rows)


Below is example where I have where conditions on both the tables this in effect
is limiting the number of rows from each tables a & b to 171 & 1076 respectively.

EXPLAIN (SELECT fund_value FROM b INNER JOIN a ON a.id = b.id WHERE code >2 AND
b.account_id = 16221);
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=141372.58..141462.28 rows=1 width=8)
   Merge Cond: ("outer".id = "inner".id)
   ->  Index Scan using id_idx on b  (cost=0.00..14415.96 rows=171 width=4)
         Filter: (account_id = 16221)
   ->  Sort  (cost=141372.58..141375.27 rows=1076 width=12)
         Sort Key: a.id
         ->  Seq Scan on a  (cost=0.00..141318.40 rows=1076 width=12)
               Filter: (code > 2)
(8 rows)


Thanks,
-Prasanth.

Tom Lane wrote:
> Prasanth <dbadmin@nqadmin.com> writes:
>
>> Hash Join  (cost=13865.30..326413.23 rows=6451 width=18)
>>   Hash Cond: ("outer".id = "inner".id)
>>   ->  Seq Scan on a  (cost=0.00..125076.37 rows=6450937 width=18)
>>   ->  Hash  (cost=10168.64..10168.64 rows=500664 width=4)
>>         ->  Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4)
>>(5 rows)
>
>
>>The planner used to perform a index scan. I have added a lot of data in those
>>two tables. Right now both tables have millions of records.
>
>
> It is highly unlikely that you want an index scan for joining millions
> of records ...
>
> However, if you do have millions in both tables, why does the planner
> think there are only 500664 rows in b?  Maybe you are overdue for ANALYZE.
>
>             regards, tom lane
>
>

Re: Query Plan - Index Scan & Seq Scan

From
Prasanth
Date:
I agree with you.

But I have the where conditions on the tables I was expecting the planner to
user index scan but it went for seq scan.

I did a little testing using what you said.

Below are the results.

SELECT a.id FROM a,b WHERE a.id = b.id AND a.code >2 AND b.account_id = 16221;

Total runtime: 18194.936 ms

Then I set the seqscan off and ran the same query.

Total runtime: 27.554 ms

Thanks,
-Prasanth.

Scott Marlowe wrote:
> On Thu, 2005-05-12 at 10:05, Prasanth wrote:
>
>>When joining two tables the query plan is doing a seq scan rather than index
>>scan. I do have indexes on the columns used for joining the tables.
>>
>>Example:
>>SELECT a.id FROM a, b WHERE a.id = b.id;
>>
>>                                       QUERY PLAN
>>-----------------------------------------------------------------------------------------
>> Hash Join  (cost=13865.30..326413.23 rows=6451 width=18)
>>   Hash Cond: ("outer".id = "inner".id)
>>   ->  Seq Scan on a  (cost=0.00..125076.37 rows=6450937 width=18)
>>   ->  Hash  (cost=10168.64..10168.64 rows=500664 width=4)
>>         ->  Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4)
>>(5 rows)
>>
>>
>>The planner used to perform a index scan. I have added a lot of data in those
>>two tables. Right now both tables have millions of records. After adding the new
>>records the planner is going for a seq scan while doing the join.
>>
>>Is there any tunning I can do so that the query planner would do a index scan?
>>
>>I did a vacuum analyze but no change.
>
>
> try this:
>
> explain analyze SELECT a.id FROM a, b WHERE a.id = b.id;
> set enable_seqscan=off;
> explain analyze SELECT a.id FROM a, b WHERE a.id = b.id;
>
> and see which is faster.
>
> It's quite likely that using an index here makes no sense, since there's
> no selectivity happening, and you need all the data anyway.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>
>

Re: Query Plan - Index Scan & Seq Scan

From
Scott Marlowe
Date:
On Thu, 2005-05-12 at 10:51, Prasanth wrote:
> I agree with you.
>
> But I have the where conditions on the tables I was expecting the planner to
> user index scan but it went for seq scan.
>
> I did a little testing using what you said.
>
> Below are the results.
>
> SELECT a.id FROM a,b WHERE a.id = b.id AND a.code >2 AND b.account_id = 16221;
>
> Total runtime: 18194.936 ms
>
> Then I set the seqscan off and ran the same query.
>
> Total runtime: 27.554 ms

Good!  This tells us two things, 1:  Your database can use the indexes
(sometimes indexes can't be used for various reasons, which are quickly
disappearing by the way.) and 2:  Your database is making the wrong
choice about when to use a seq scan versus an index.

What does the explain analyze output from that query say about row
estimates versus actual rows returned?

Re: Query Plan - Index Scan & Seq Scan

From
Tom Lane
Date:
Prasanth <dbadmin@nqadmin.com> writes:
> A where condition I always use is shown below. This is bringing down the number
> of rows from 6.5m to 1210. I have an index on code also. Even here it is going
> for seq scan.

> EXPLAIN ANALYZE SELECT count(*) fROM a where Code >2;
>                                                           QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=141321.09..141321.09 rows=1 width=0) (actual
> time=6454.063..6454.064 rows=1 loops=1)
>    ->  Seq Scan on a  (cost=0.00..141318.40 rows=1076 width=0) (actual
> time=15.687..6453.292 rows=1210 loops=1)
>          Filter: (code > 2)
>  Total runtime: 6454.140 ms
> (4 rows)

I'm going to hazard a guess that "code" is not of type integer, and that
you're using a pre-8.0 PG release.  Cross-type comparisons are not
indexable before 8.0, so you need to cast the integer constant 2 to
whatever type "code" is.

            regards, tom lane

Re: Query Plan - Index Scan & Seq Scan

From
Prasanth
Date:
EXPLAIN ANALYZE (SELECT id FROM a,b WHERE a.id = b.id AND code >2 AND
b.account_id = 16221);

    QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=141372.58..141462.28 rows=1 width=8) (actual
time=726.172..726.172 rows=0 loops=1)
   Merge Cond: ("outer".id = "inner".id)
   ->  Index Scan using id_idx on b  (cost=0.00..14415.96 rows=171 width=4)
(actual time=726.168..726.168 rows=0 loops=1)
         Filter: (account_id = 16221)
   ->  Sort  (cost=141372.58..141375.27 rows=1076 width=12) (never executed)
         Sort Key: a.id
         ->  Seq Scan on a  (cost=0.00..141318.40 rows=1076 width=12) (never
executed)
               Filter: (code > 2)
 Total runtime: 726.253 ms
(9 rows)

Thanks,
-Prasanth.

Scott Marlowe wrote:
> On Thu, 2005-05-12 at 10:51, Prasanth wrote:
>
>>I agree with you.
>>
>>But I have the where conditions on the tables I was expecting the planner to
>>user index scan but it went for seq scan.
>>
>>I did a little testing using what you said.
>>
>>Below are the results.
>>
>>SELECT a.id FROM a,b WHERE a.id = b.id AND a.code >2 AND b.account_id = 16221;
>>
>>Total runtime: 18194.936 ms
>>
>>Then I set the seqscan off and ran the same query.
>>
>>Total runtime: 27.554 ms
>
>
> Good!  This tells us two things, 1:  Your database can use the indexes
> (sometimes indexes can't be used for various reasons, which are quickly
> disappearing by the way.) and 2:  Your database is making the wrong
> choice about when to use a seq scan versus an index.
>
> What does the explain analyze output from that query say about row
> estimates versus actual rows returned?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>
>

Re: Query Plan - Index Scan & Seq Scan

From
Prasanth
Date:
code data type is int2.

I am sorry that I did not mention the version number

I am using 7.4.7.

Thanks,
-Prasanth.

Tom Lane wrote:
> Prasanth <dbadmin@nqadmin.com> writes:
>
>>A where condition I always use is shown below. This is bringing down the number
>>of rows from 6.5m to 1210. I have an index on code also. Even here it is going
>>for seq scan.
>
>
>>EXPLAIN ANALYZE SELECT count(*) fROM a where Code >2;
>>                                                          QUERY PLAN

>>-------------------------------------------------------------------------------------------------------------------------------
>> Aggregate  (cost=141321.09..141321.09 rows=1 width=0) (actual
>>time=6454.063..6454.064 rows=1 loops=1)
>>   ->  Seq Scan on a  (cost=0.00..141318.40 rows=1076 width=0) (actual
>>time=15.687..6453.292 rows=1210 loops=1)
>>         Filter: (code > 2)
>> Total runtime: 6454.140 ms
>>(4 rows)
>
>
> I'm going to hazard a guess that "code" is not of type integer, and that
> you're using a pre-8.0 PG release.  Cross-type comparisons are not
> indexable before 8.0, so you need to cast the integer constant 2 to
> whatever type "code" is.
>
>             regards, tom lane
>
>

Re: Query Plan - Index Scan & Seq Scan

From
Tom Lane
Date:
Prasanth <dbadmin@nqadmin.com> writes:
> code data type is int2.

Ah-hah.  So "where code > 2::int2" should work noticeably better for
you.  Or you could do "where code > '2'" to avoid hard-wiring the data
type knowledge into your queries.  Or just change it to int4 ;-)
Or update to 8.0.

            regards, tom lane

Re: Query Plan - Index Scan & Seq Scan

From
Prasanth
Date:
Sorry seems like I am missing some thing here.

What is the difference between int2 & int4 as far as index scan is concerned?

I did try to update to 8.0 but it is not taking my dump from 7.4.7. If I
remember right it was complaining about pg_database being not present. I posted
a message on forums no response so I had to abort it. I will give it a short
again when I have some time.

I think pg_database belongs to pgadmin. I can drop those tables and create a
dump but then if for some reason I have to restore a old dump them I will be in
trouble.

Thanks,
-Prasanth.

Tom Lane wrote:
> Prasanth <dbadmin@nqadmin.com> writes:
>
>>code data type is int2.
>
>
> Ah-hah.  So "where code > 2::int2" should work noticeably better for
> you.  Or you could do "where code > '2'" to avoid hard-wiring the data
> type knowledge into your queries.  Or just change it to int4 ;-)
> Or update to 8.0.
>
>             regards, tom lane
>
>

Re: Query Plan - Index Scan & Seq Scan

From
Prasanth
Date:
Thanks Tom.

Seems like having int2 after code is doing the trick.

Can you please explain the reasons behind this.

I Really appreciate your time.

Thanks,
-Prasanth.

Tom Lane wrote:
> Prasanth <dbadmin@nqadmin.com> writes:
>
>>code data type is int2.
>
>
> Ah-hah.  So "where code > 2::int2" should work noticeably better for
> you.  Or you could do "where code > '2'" to avoid hard-wiring the data
> type knowledge into your queries.  Or just change it to int4 ;-)
> Or update to 8.0.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>
>

Re: Query Plan - Index Scan & Seq Scan

From
Prasanth
Date:
Ok read about the indexes and type matches.

So is this fixed in 8.0?

Thanks,
-Prasanth.

Tom Lane wrote:
> Prasanth <dbadmin@nqadmin.com> writes:
>
>>code data type is int2.
>
>
> Ah-hah.  So "where code > 2::int2" should work noticeably better for
> you.  Or you could do "where code > '2'" to avoid hard-wiring the data
> type knowledge into your queries.  Or just change it to int4 ;-)
> Or update to 8.0.
>
>             regards, tom lane
>
>

Re: Query Plan - Index Scan & Seq Scan

From
Tom Lane
Date:
Prasanth <dbadmin@nqadmin.com> writes:
> Seems like having int2 after code is doing the trick.
> Can you please explain the reasons behind this.

When you write "int2col > 2", the operator that is selected is int2-gt-int4.
However the index on an int2 column can only deal with int2-gt-int2.

8.0 generalized the index mechanism enough to allow int2 indexes to deal
with int2-gt-int4, but in earlier releases you have to take care to make
the constant an int2.

int8 columns have the same issue from the other direction ...

            regards, tom lane