Re: Analyzer is clueless - Mailing list pgsql-performance
From | Joshua D. Drake |
---|---|
Subject | Re: Analyzer is clueless |
Date | |
Msg-id | 419BBC83.7020802@commandprompt.com Whole thread Raw |
In response to | Analyzer is clueless (David Brown <time@bigpond.net.au>) |
List | pgsql-performance |
Hello, Have you tried increasing the statistics target for orderdate and rerunning analyze? Sincerely, Joshua D. Drake David Brown wrote: > I'm doing some performance profiling with a simple two-table query: > > SELECT L."ProductID", sum(L."Amount") > FROM "drinv" H > JOIN "drinvln" L ON L."OrderNo" = H."OrderNo" > WHERE > ("OrderDate" between '2003-01-01' AND '2003-04-30') > GROUP BY L."ProductID" > > drinv and drinvln have about 100,000 and 3,500,000 rows respectively. Actual data size in the large table is 500-600MB.OrderNo is indexed in both tables, as is OrderDate. > > The environment is PGSQL 8 on Win2k with 512MB RAM (results are similar to 7.3 from Mammoth). I've tried tweaking variousconf parameters, but apart from using up memory, nothing seems to have had a tangible effect - the Analyzer doesn'tseem to take resources into account like some of the doco suggests. > > The date selection represents about 5% of the range. Here's the plan summaries: > > Three months (2003-01-01 to 2003-03-30) = 1 second > > HashAggregate (cost=119365.53..119368.74 rows=642 width=26) > -> Nested Loop (cost=0.00..118791.66 rows=114774 width=26) > -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..200.27 rows=3142 width=8) > Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-03-30'::date)) > -> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..28.73 rows=721 width=34) > Index Cond: (l."OrderNo" = "outer"."OrderNo") > > > Four months (2003-01-01 to 2003-04-30) = 60 seconds > > HashAggregate (cost=126110.53..126113.74 rows=642 width=26) > -> Hash Join (cost=277.55..125344.88 rows=153130 width=26) > Hash Cond: ("outer"."OrderNo" = "inner"."OrderNo") > -> Seq Scan on drinvln l (cost=0.00..106671.35 rows=3372935 width=34) > -> Hash (cost=267.07..267.07 rows=4192 width=8) > -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8) > Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date)) > > > Four months (2003-01-01 to 2003-04-30) with Seq_scan disabled = 75 seconds > > > HashAggregate (cost=130565.83..130569.04 rows=642 width=26) > -> Merge Join (cost=519.29..129800.18 rows=153130 width=26) > Merge Cond: ("outer"."OrderNo" = "inner"."OrderNo") > -> Sort (cost=519.29..529.77 rows=4192 width=8) > Sort Key: h."OrderNo" > -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8) > Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date)) > -> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..119296.29 rows=3372935 width=34) > > Statistics were run on each table before query execution. The random page cost was lowered to 2, but as you can see, theestimated costs are wild anyway. > > As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster. > > MySQL (InnoDB) took 2 seconds, which is 30 times faster. > > The query looks straightforward to me (it might be clearer with a subselect), so what on earth is wrong? > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- Command Prompt, Inc., home of PostgreSQL Replication, and plPHP. Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Attachment
pgsql-performance by date: