Thread: Indices arent being used
Hi guys Im running a Datawarehouse benchmark (APB-1) on PostgreSql. The objective is to choose which of the to main db (PostgreSQL, MySQL) is fastest. I've run into a small problem which I hope could be resolved here. I'm trying to speed up this query: select count(*) from actvars, prodlevel where actvars.product_level=prodlevel.code_level and prodlevel.division_level='OY3S5LAPALL6'; ACTVARS is a fact table of aproximatly 16 million rows, PRODLEVEL has 20700 rows. Both have btree indices. I executed the query and it took me almost half an hour to execute! Running the same query on MySQL the result came 6 seconds after. As you can see there is a large differences between execution times. After running an explain: Aggregate (cost=3123459.62..3123459.62 rows=1 width=32) -> Merge Join (cost=3021564.79..3119827.17 rows=1452981 width=32) Merge Cond: ("outer".product_level = "inner".code_level) -> Sort (cost=3020875.00..3060938.81 rows=16025523 width=16) Sort Key: actvars.product_level -> Seq Scan on actvars (cost=0.00..365711.23 rows=16025523 width=16) -> Sort (cost=689.79..694.48 rows=1877 width=16) Sort Key: prodlevel.code_level -> Seq Scan on prodlevel (cost=0.00..587.75 rows=1877 width=16) Filter: (division_level = 'OY3S5LAPALL6'::bpchar) I found that the indices werent being used. The database has been vacuumed and analyze has been executed. I tried disabling the seqscan, so as to force index usage. The planner uses index scans but the query stil takes a very long time to execute. Any suggestions on resolving this would would be appreciated. P.S: Im running PostgrSQL 7.3.2 --------------------------------------------- This message was sent using Endymion MailMan. http://www.endymion.com/products/mailman/
> Im running a Datawarehouse benchmark (APB-1) on PostgreSql. The objective is to > choose which of the to main db (PostgreSQL, MySQL) is fastest. I've run into a > small problem which I hope could be resolved here. > > I'm trying to speed up this query: > > select count(*) from actvars, prodlevel where > actvars.product_level=prodlevel.code_level and > prodlevel.division_level='OY3S5LAPALL6'; How about EXPLAIN ANALYZE output? > ACTVARS is a fact table of aproximatly 16 million rows, PRODLEVEL has 20700 > rows. Both have btree indices. > The database has been vacuumed and analyze has been executed. The usual postgresql.conf adjustments have also been made?
Attachment
rantunes@student.dei.uc.pt wrote: > Hi guys > > Im running a Datawarehouse benchmark (APB-1) on PostgreSql. The objective is to > choose which of the to main db (PostgreSQL, MySQL) is fastest. I've run into a > small problem which I hope could be resolved here. > > I'm trying to speed up this query: > > select count(*) from actvars, prodlevel where > actvars.product_level=prodlevel.code_level and > prodlevel.division_level='OY3S5LAPALL6'; > > ACTVARS is a fact table of aproximatly 16 million rows, PRODLEVEL has 20700 > rows. Both have btree indices. > > I executed the query and it took me almost half an hour to execute! Running the > same query on MySQL the result came 6 seconds after. As you can see there is a > large differences between execution times. > > After running an explain: > > Aggregate (cost=3123459.62..3123459.62 rows=1 width=32) > -> Merge Join (cost=3021564.79..3119827.17 rows=1452981 width=32) > Merge Cond: ("outer".product_level = "inner".code_level) > -> Sort (cost=3020875.00..3060938.81 rows=16025523 width=16) > Sort Key: actvars.product_level > -> Seq Scan on actvars (cost=0.00..365711.23 rows=16025523 > width=16) Damn.. Seq. scan for actvars? I would say half an hour is a good throughput. Are there any indexes on both actvars.product_level and prodlevel.code_level? Are they exactly compatible type? int2 and int4 are not compatible in postgresql lingo. That plan should go for index scan. Can you show us the table definitions? And yes, what tuning you did to postgresql? Shridhar
>Damn.. Seq. scan for actvars? I would say half an hour is a good throughput. > >Are there any indexes on both actvars.product_level and prodlevel.code_level? Are >they exactly compatible type? int2 and int4 are not compatible in postgresql >lingo. > >That plan should go for index scan. Can you show us the table definitions? > >And yes, what tuning you did to postgresql? > >Shridhar The alterations done upon postgresql.conf with 512 RAM were these: max_connections = 3 shared_buffers = 6000 wal_buffers = 32 sort_mem = 2048 fsync = false effective_cache_size = 44800 random_page_cost = 3 default_statistics_target = 50 Yes I have an index on actvars.product_level and an index on prodlevel.code_level.Both indices have character(12) data types. --------------------------------------------- This message was sent using Endymion MailMan. http://www.endymion.com/products/mailman/
rantunes@student.dei.uc.pt writes: > sort_mem = 2048 2 meg sort_mem seems on the small side. > Yes I have an index on actvars.product_level and an index on > prodlevel.code_level.Both indices have character(12) data types. Can you force an indexscan to be chosen by setting enable_seqscan off? If so, what does the explain look like? BTW, it's always much more useful to show EXPLAIN ANALYZE output than plain EXPLAIN. The issue is generally "why did the planner misestimate" and so knowing how its estimates diverge from reality is always a critical bit of information. regards, tom lane
Here is the explain analyze of the query: explain analyze select count(*) from actvars, prodlevel where actvars.product_level=prodlevel.code_level and prodlevel.division_level='OY3S5LAPALL6'; Aggregate (cost=3123459.62..3123459.62 rows=1 width=32) (actual time=1547173.60..1547173.60 rows=1 loops=1) -> Merge Join (cost=3021564.79..3119827.17 rows=1452981 width=32) (actual time=1400269.29..1545793.13 rows=1918466 loops=1) Merge Cond: ("outer".product_level = "inner".code_level) -> Sort (cost=3020875.00..3060938.81 rows=16025523 width=16) (actual time=1400117.06..1518059.84 rows=16020985 loops=1) Sort Key: actvars.product_level -> Seq Scan on actvars (cost=0.00..365711.23 rows=16025523 width=16) (actual time=29.14..51259.82 rows=16025523 loops=1) -> Sort (cost=689.79..694.48 rows=1877 width=16) (actual time=92.90..1217.15 rows=1917991 loops=1) Sort Key: prodlevel.code_level -> Seq Scan on prodlevel (cost=0.00..587.75 rows=1877 width=16) (actual time=16.48..82.72 rows=1802 loops=1) Filter: (division_level = 'OY3S5LAPALL6'::bpchar) Total runtime: 1547359.08 msec I have tried diabeling the seqscan: set enable_seqscan=false; explain select count(*) from actvars, prodlevel where actvars.product_level=prodlevel.code_level and prodlevel.division_level='OY3S5LAPALL6'; Aggregate (cost=6587448.25..6587448.25 rows=1 width=32) -> Nested Loop (cost=0.00..6583815.80 rows=1452981 width=32) -> Index Scan using division_level_prodlevel_index on prodlevel (cost=0.00..999.13 rows=1877 width=16) Index Cond: (division_level = 'OY3S5LAPALL6'::bpchar) -> Index Scan using product_level_actvars_index on actvars (cost=0.00..3492.95 rows=1161 width=16) Index Cond: (actvars.product_level = "outer".code_level) This method forces the indices to work but it looks like it takes a long to finish executing, I had to cancel the query after 10 min. Using vmstat i found that there were alot of swap outs and swap ins, affecting the overall performance. How can i speed this up? --------------------------------------------- This message was sent using Endymion MailMan. http://www.endymion.com/products/mailman/
rantunes@student.dei.uc.pt writes: > Here is the explain analyze of the query: > explain analyze select count(*) from actvars, prodlevel where > actvars.product_level=prodlevel.code_level and > prodlevel.division_level='OY3S5LAPALL6'; > [ slow merge join ] I wonder whether a hash join wouldn't work better. Can you force a hash join? (Try "enable_mergejoin = 0" and if needed "enable_nestloop = 0"; don't disable seqscans though.) If you can get such a plan, please post the explain analyze results for it. > This method forces the indices to work but it looks like it takes a long to > finish executing, I had to cancel the query after 10 min. "Force use of the indexes" is not always an answer to performance issues. regards, tom lane