how to interpret/improve bad row estimates - Mailing list pgsql-performance
From | Robert Treat |
---|---|
Subject | how to interpret/improve bad row estimates |
Date | |
Msg-id | 1140712182.2190.297.camel@camel Whole thread Raw |
List | pgsql-performance |
postgresql 8.1, I have two tables, bot hoth vacuumed and analyzed. on msg307 I have altered the entityid and msgid columns statistics values to 400. dev20001=# explain analyze SELECT ewm.entity_id, m.agentname, m.filecreatedate AS versioninfo FROM msg307 m join entity_watch_map ewm on (ewm.entity_id = m.entityid AND ewm.msgid = m.msgid AND ewm.msg_type = 307); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=6.62..5227.40 rows=1 width=36) (actual time=0.583..962.346 rows=75322 loops=1) -> Bitmap Heap Scan on entity_watch_map ewm (cost=6.62..730.47 rows=748 width=8) (actual time=0.552..7.017 rows=1264loops=1) Recheck Cond: (msg_type = 307) -> Bitmap Index Scan on ewm_msg_type (cost=0.00..6.62 rows=748 width=0) (actual time=0.356..0.356 rows=1264 loops=1) Index Cond: (msg_type = 307) -> Index Scan using msg307_entityid_msgid_idx on msg307 m (cost=0.00..6.00 rows=1 width=40) (actual time=0.011..0.295rows=60 loops=1264) Index Cond: (("outer".entity_id = m.entityid) AND ("outer".msgid = m.msgid)) Total runtime: 1223.469 ms (8 rows) I guess that the planner can not tell there is no correlation between the distinctness of those two columns, and so makes a really bad estimate on the indexscan, and pushes that estimate up into the nested loop? (luckily in this case doing an index scan is generally a good idea, so it works out, but it wouldn't always be a good idea) some pg_statistics information for those two columns entityid: starelid | 25580 staattnum | 1 stanullfrac | 0 stawidth | 4 stadistinct | 1266 stakind1 | 1 stakind2 | 2 stakind3 | 3 stakind4 | 0 staop1 | 96 staop2 | 97 staop3 | 97 staop4 | 0 stanumbers1 | {0.00222976,0.00222976,0.00153048,0.00137216,0.00137216} stanumbers2 | stanumbers3 | {0.100312} stanumbers4 | msgid: starelid | 25580 staattnum | 2 stanullfrac | 0 stawidth | 4 stadistinct | 1272 stakind1 | 1 stakind2 | 2 stakind3 | 3 stakind4 | 0 staop1 | 96 staop2 | 97 staop3 | 97 staop4 | 0 stanumbers1 | {0.00164923,0.00163604,0.00163604,0.00163604,0.00137216} stanumbers2 | stanumbers3 | {-0.0660856} stanumbers4 | is my interpretation of why i am seeing such bad estimates correct? I don't really think it is, because looking at a similar scenario on a 7.3 machine: ------------------------------------------------------------------------------------------------------------------------------------------------------ Merge Join (cost=1531.39..5350.90 rows=1 width=48) (actual time=118.44..899.37 rows=58260 loops=1) Merge Cond: (("outer".entityid = "inner".entity_id) AND ("outer".msgid = "inner".msgid)) -> Index Scan using msg307_entityid_msgid_idx on msg307 m (cost=0.00..3669.42 rows=58619 width=40) (actual time=0.31..390.01rows=58619 loops=1) -> Sort (cost=1531.39..1533.16 rows=709 width=8) (actual time=118.09..157.45 rows=58218 loops=1) Sort Key: ewm.entity_id, ewm.msgid -> Seq Scan on entity_watch_map ewm (cost=0.00..1497.80 rows=709 width=8) (actual time=0.14..114.74 rows=1157loops=1) Filter: (msg_type = 307) Total runtime: 951.23 msec (8 rows) It still has the bad estimate at the nested loop stage, but it does seem to have a better understanding of the # of rows it will return in the index scan on msg307. This leads me to wonder if there something I could do to improve the estimates on the 8.1 machine? Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
pgsql-performance by date: