Thread: Indices arent being used

Indices arent being used

From
rantunes@student.dei.uc.pt
Date:
Hi guys

Im running a Datawarehouse benchmark (APB-1) on PostgreSql. The objective is to
choose which of the to main db (PostgreSQL, MySQL) is fastest. I've run into a
small problem which I hope could be resolved here.

I'm trying to speed up this query:

select count(*) from actvars, prodlevel where
actvars.product_level=prodlevel.code_level and
prodlevel.division_level='OY3S5LAPALL6';

ACTVARS is a fact table of aproximatly 16 million rows, PRODLEVEL has 20700
rows. Both have btree indices.

I executed the query and it took me almost half an hour to execute! Running the
same query on MySQL the result came 6 seconds after. As you can see there is a
large differences between execution times.

After running an explain:

Aggregate  (cost=3123459.62..3123459.62 rows=1 width=32)
   ->  Merge Join  (cost=3021564.79..3119827.17 rows=1452981 width=32)
         Merge Cond: ("outer".product_level = "inner".code_level)
         ->  Sort  (cost=3020875.00..3060938.81 rows=16025523 width=16)
               Sort Key: actvars.product_level
               ->  Seq Scan on actvars  (cost=0.00..365711.23 rows=16025523
width=16)
         ->  Sort  (cost=689.79..694.48 rows=1877 width=16)
               Sort Key: prodlevel.code_level
               ->  Seq Scan on prodlevel  (cost=0.00..587.75 rows=1877 width=16)
                     Filter: (division_level = 'OY3S5LAPALL6'::bpchar)

I found that the indices werent being used.

The database has been vacuumed and analyze has been executed.

I tried disabling the seqscan, so as to force index usage. The planner uses
index scans but the query stil takes a very long time to execute.

Any suggestions on resolving this would would be appreciated.

P.S: Im running PostgrSQL
7.3.2

---------------------------------------------
This message was sent using Endymion MailMan.
http://www.endymion.com/products/mailman/



Re: Indices arent being used

From
Rod Taylor
Date:
> Im running a Datawarehouse benchmark (APB-1) on PostgreSql. The objective is to
> choose which of the to main db (PostgreSQL, MySQL) is fastest. I've run into a
> small problem which I hope could be resolved here.
>
> I'm trying to speed up this query:
>
> select count(*) from actvars, prodlevel where
> actvars.product_level=prodlevel.code_level and
> prodlevel.division_level='OY3S5LAPALL6';

How about EXPLAIN ANALYZE output?

> ACTVARS is a fact table of aproximatly 16 million rows, PRODLEVEL has 20700
> rows. Both have btree indices.

> The database has been vacuumed and analyze has been executed.

The usual postgresql.conf adjustments have also been made?


Attachment

Re: Indices arent being used

From
Shridhar Daithankar
Date:
rantunes@student.dei.uc.pt wrote:
> Hi guys
>
> Im running a Datawarehouse benchmark (APB-1) on PostgreSql. The objective is to
> choose which of the to main db (PostgreSQL, MySQL) is fastest. I've run into a
> small problem which I hope could be resolved here.
>
> I'm trying to speed up this query:
>
> select count(*) from actvars, prodlevel where
> actvars.product_level=prodlevel.code_level and
> prodlevel.division_level='OY3S5LAPALL6';
>
> ACTVARS is a fact table of aproximatly 16 million rows, PRODLEVEL has 20700
> rows. Both have btree indices.
>
> I executed the query and it took me almost half an hour to execute! Running the
> same query on MySQL the result came 6 seconds after. As you can see there is a
> large differences between execution times.
>
> After running an explain:
>
> Aggregate  (cost=3123459.62..3123459.62 rows=1 width=32)
>    ->  Merge Join  (cost=3021564.79..3119827.17 rows=1452981 width=32)
>          Merge Cond: ("outer".product_level = "inner".code_level)
>          ->  Sort  (cost=3020875.00..3060938.81 rows=16025523 width=16)
>                Sort Key: actvars.product_level
>                ->  Seq Scan on actvars  (cost=0.00..365711.23 rows=16025523
> width=16)

Damn.. Seq. scan for actvars? I would say half an hour is a good throughput.

Are there any indexes on both actvars.product_level and prodlevel.code_level?
Are they exactly compatible type? int2 and int4 are not compatible in postgresql
lingo.

That plan should go for index scan. Can you show us the table definitions?

And yes, what tuning you did to postgresql?

  Shridhar


Re: Indices arent being used

From
rantunes@student.dei.uc.pt
Date:
>Damn.. Seq. scan for actvars? I would say half an hour is a good throughput.
>
>Are there any indexes on both actvars.product_level and prodlevel.code_level?
Are >they exactly compatible type? int2 and int4 are not compatible in
postgresql >lingo.
>
>That plan should go for index scan. Can you show us the table definitions?
>
>And yes, what tuning you did to postgresql?
>
>Shridhar
The alterations done upon postgresql.conf with 512 RAM were these:

max_connections = 3
shared_buffers = 6000
wal_buffers = 32
sort_mem = 2048
fsync = false
effective_cache_size = 44800
random_page_cost = 3
default_statistics_target = 50

Yes I have an index on actvars.product_level and an index on
prodlevel.code_level.Both indices have character(12) data types.



---------------------------------------------
This message was sent using Endymion MailMan.
http://www.endymion.com/products/mailman/



Re: Indices arent being used

From
Tom Lane
Date:
rantunes@student.dei.uc.pt writes:
> sort_mem = 2048

2 meg sort_mem seems on the small side.

> Yes I have an index on actvars.product_level and an index on
> prodlevel.code_level.Both indices have character(12) data types.

Can you force an indexscan to be chosen by setting enable_seqscan off?
If so, what does the explain look like?

BTW, it's always much more useful to show EXPLAIN ANALYZE output than
plain EXPLAIN.  The issue is generally "why did the planner misestimate"
and so knowing how its estimates diverge from reality is always a
critical bit of information.

            regards, tom lane

Re: Indices arent being used

From
rantunes@student.dei.uc.pt
Date:
Here is the explain analyze of the query:

explain analyze select count(*) from actvars, prodlevel where
actvars.product_level=prodlevel.code_level and
prodlevel.division_level='OY3S5LAPALL6';

 Aggregate  (cost=3123459.62..3123459.62 rows=1 width=32) (actual
time=1547173.60..1547173.60 rows=1 loops=1)
   ->  Merge Join  (cost=3021564.79..3119827.17 rows=1452981 width=32) (actual
time=1400269.29..1545793.13 rows=1918466 loops=1)
         Merge Cond: ("outer".product_level = "inner".code_level)
         ->  Sort  (cost=3020875.00..3060938.81 rows=16025523 width=16) (actual
time=1400117.06..1518059.84 rows=16020985 loops=1)
               Sort Key: actvars.product_level
               ->  Seq Scan on actvars  (cost=0.00..365711.23 rows=16025523
width=16) (actual time=29.14..51259.82 rows=16025523 loops=1)
         ->  Sort  (cost=689.79..694.48 rows=1877 width=16) (actual
time=92.90..1217.15 rows=1917991 loops=1)
               Sort Key: prodlevel.code_level
               ->  Seq Scan on prodlevel  (cost=0.00..587.75 rows=1877 width=16)
(actual time=16.48..82.72 rows=1802 loops=1)
                     Filter: (division_level = 'OY3S5LAPALL6'::bpchar)
 Total runtime: 1547359.08 msec

I have tried diabeling the seqscan:

set enable_seqscan=false;

explain select count(*) from actvars, prodlevel where
actvars.product_level=prodlevel.code_level and
prodlevel.division_level='OY3S5LAPALL6';

Aggregate  (cost=6587448.25..6587448.25 rows=1 width=32)
   ->  Nested Loop  (cost=0.00..6583815.80 rows=1452981 width=32)
         ->  Index Scan using division_level_prodlevel_index on prodlevel
(cost=0.00..999.13 rows=1877 width=16)
               Index Cond: (division_level = 'OY3S5LAPALL6'::bpchar)
         ->  Index Scan using product_level_actvars_index on actvars
(cost=0.00..3492.95 rows=1161 width=16)
               Index Cond: (actvars.product_level = "outer".code_level)

This method forces the indices to work but it looks like it takes a long to
finish executing, I had to cancel the query after 10 min. Using vmstat i found
that there were alot of swap outs and swap ins, affecting the overall performance.

How can i speed this
up?

---------------------------------------------
This message was sent using Endymion MailMan.
http://www.endymion.com/products/mailman/



Re: Indices arent being used

From
Tom Lane
Date:
rantunes@student.dei.uc.pt writes:
> Here is the explain analyze of the query:
> explain analyze select count(*) from actvars, prodlevel where
> actvars.product_level=prodlevel.code_level and
> prodlevel.division_level='OY3S5LAPALL6';

> [ slow merge join ]

I wonder whether a hash join wouldn't work better.  Can you force a hash
join?  (Try "enable_mergejoin = 0" and if needed "enable_nestloop = 0";
don't disable seqscans though.)  If you can get such a plan, please post
the explain analyze results for it.

> This method forces the indices to work but it looks like it takes a long to
> finish executing, I had to cancel the query after 10 min.

"Force use of the indexes" is not always an answer to performance issues.

            regards, tom lane