Thread: Is dump-reload the only cure?
Hi,

For a particular table, only a dump and reload of the table got the index used again. I tried VACUUM ANALYZE and even recreating the indexes, but neither worked. Why does the planner use the index like a miser? Details are below — was there anything better I could have done to get the index used?

Regds
Mallah.

Query:

explain SELECT count( email_id ) from email_bank_mailing_lists where query_id='499';
NOTICE:  QUERY PLAN:

Aggregate  (cost=4330.48..4330.48 rows=1 width=4)
  ->  Index Scan using email_bank_ml_qid on email_bank_mailing_lists  (cost=0.00..4327.28 rows=1282 width=4)

EXPLAIN

Distribution of query_id in the table (total: 256419 rows):

 query_id | count(*)
----------+---------
      298 |    6167
      328 |    2083
      354 |    9875
      404 |    6974
      432 |    5059
      437 |    2497
      440 |    2837
      448 |   14624
      449 |   13053
      454 |     409
      455 |    3725
      456 |     560
      458 |    3477
      460 |    5561
      486 |   41842
      488 |   63642
      492 |    2244
      493 |    6047
      494 |   37415
      499 |   25010
      501 |    3318

Before dump/reload:

tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
NOTICE:  --Relation email_bank_mailing_lists--
NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
        Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
NOTICE:  Analyzing email_bank_mailing_lists
VACUUM
tradein_clients=# explain SELECT count( email_id ) from email_bank_mailing_lists where query_id=499;
NOTICE:  QUERY PLAN:

Aggregate  (cost=6863.24..6863.24 rows=1 width=4)
  ->  Seq Scan on email_bank_mailing_lists  (cost=0.00..6788.24 rows=30001 width=4)

EXPLAIN

-----------------------------------------
Get your free web based email at trade-india.com.
"India's Leading B2B eMarketplace.!"
http://www.trade-india.com/
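One way to tell a costing borderline apart from a genuinely broken index, without dumping and reloading, is to push the planner away from sequential scans for a single session and compare the two plans. This is a sketch using the table and column from the post above; the `enable_seqscan` setting is assumed to be available in this PostgreSQL version:

```sql
-- Disable seq scans for this session only, so the planner is forced to
-- show what the index-scan plan would cost; then compare with the default.
SET enable_seqscan = off;
EXPLAIN SELECT count(email_id)
  FROM email_bank_mailing_lists
 WHERE query_id = 499;

SET enable_seqscan = on;   -- restore the default before doing anything else
EXPLAIN SELECT count(email_id)
  FROM email_bank_mailing_lists
 WHERE query_id = 499;
```

If the two estimated costs come out close, the planner is making a judgment call on clustering and selectivity, not ignoring the index.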
Hi Rod,

Does it mean that an index scan is used for less frequently occurring data?
Yes, my table was not clustered.

Can you tell me what 0.00..6788.24, rows, and width mean in the EXPLAIN
output "cost=0.00..6788.24 rows=30001 width=4"?

I have one more table where I face a similar problem. I have not
dump-reloaded it yet; I will post again if I face the problem there.

Thanks,
Regds
Mallah.

> On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:
>
> Looks like a borderline case.  The costs of the index scan and sequential
> scan are very similar.  Since 499 covers nearly 1 in 10 tuples, it's
> likely found on nearly every page.  That should make a sequential scan
> much cheaper.
>
> However, if the data is clumped together (not distributed throughout the
> table) then an index scan may be preferable.  So... CLUSTER may be useful
> to you.
>
> In the future please 'explain analyze' the queries you're looking at to
> see actual costs as compared to the estimated costs.
>
> [snip: quoted EXPLAIN output]
>
> --
> Rod Taylor
Rod,

Clustering did work for my other case ;-)

tradein_clients=> explain analyze SELECT count(*) from email_source where source_id=173;
NOTICE:  QUERY PLAN:

Aggregate  (cost=13042.91..13042.91 rows=1 width=0) (actual time=1415.32..1415.32 rows=1 loops=1)
  ->  Seq Scan on email_source  (cost=0.00..12964.48 rows=31375 width=0) (actual time=1.19..1368.58 rows=32851 loops=1)
Total runtime: 1415.42 msec

EXPLAIN
tradein_clients=> \d email_source
     Table "email_source"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 email_id  | integer |
 source_id | integer |
Indexes: email_source_sid
Unique keys: email_source_idx

tradein_clients=> CLUSTER email_source_sid on email_source ;
CLUSTER
tradein_clients=> explain analyze SELECT count(*) from email_source where source_id=173;
NOTICE:  QUERY PLAN:

Aggregate  (cost=11458.83..11458.83 rows=1 width=0) (actual time=207.73..207.73 rows=1 loops=1)
  ->  Index Scan using email_source_sid on email_source  (cost=0.00..11449.76 rows=3627 width=0) (actual time=0.27..161.04 rows=32851 loops=1)
Total runtime: 207.90 msec

EXPLAIN

Does it mean that clustered indexes are guaranteed to be used for an index
scan? One more thing: does clustering mean that all future data additions
will happen in an ordered manner only, i.e. consecutively in terms of
source_id?

Regds
Mallah.

> On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:
>
> [snip: earlier reply and quoted EXPLAIN output]
>
> --
> Rod Taylor
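On the question of future additions: as far as I know, CLUSTER is a one-time physical reorder — rows inserted afterwards land in arrival order, not in source_id order, so the benefit degrades as the table grows. A maintenance sketch (assuming re-clustering is acceptable during a quiet window, since CLUSTER takes an exclusive lock and rewrites the table):

```sql
-- Repeat after bulk loads: physically reorder the heap by the index,
-- then refresh planner statistics to reflect the new layout.
CLUSTER email_source_sid ON email_source;
ANALYZE email_source;
```

Re-running ANALYZE matters here because the planner's clustering estimate is part of what makes the index scan look cheap.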
Thanks for the insight, Rod. Is there any other place where I can learn
more about these principles?

But if the table is clustered, then the pages are stored contiguously with
respect to that column, right?

> On Fri, 2002-11-01 at 08:03, mallah@trade-india.com wrote:
>
> [snip: my earlier questions]

> Keep in mind that an index scan is very expensive on a per-tuple basis.
> It has to run through (fetch) the index pages, then fetch the matching
> pages from the table.  Since the table fetches are random, the hard drive
> will probably incur a seek for each tuple found in the index.  The seeks
> add up much more quickly than a sequential scan (which needs far fewer
> seeks and drive-head movements).
>
> --
> Rod Taylor
On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:

Looks like a borderline case.  The costs of the index scan and sequential
scan are very similar.  Since 499 covers nearly 1 in 10 tuples, it's likely
found on nearly every page.  That should make a sequential scan much
cheaper.

However, if the data is clumped together (not distributed throughout the
table) then an index scan may be preferable.  So... CLUSTER may be useful
to you.

In the future, please 'explain analyze' the queries you're looking at to
see actual costs as compared to the estimated costs.

> 499 | 25010
> 501 | 3318
>
> before dump reload:
> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
> NOTICE:  --Relation email_bank_mailing_lists--
> NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
>         Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
> NOTICE:  Analyzing email_bank_mailing_lists
> VACUUM
> tradein_clients=# explain SELECT count( email_id ) from email_bank_mailing_lists where query_id=499;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=6863.24..6863.24 rows=1 width=4)
>   ->  Seq Scan on email_bank_mailing_lists  (cost=0.00..6788.24 rows=30001 width=4)
>
> EXPLAIN

--
Rod Taylor
See paragraph 2 of the Description section:
http://www.postgresql.org/idocs/index.php?sql-explain.html

That page has a good explanation of 'cost'.  Rows is the number of rows the
node is estimated to return, and width is the estimated average width, in
bytes, of the rows it produces at that point.

On Fri, 2002-11-01 at 08:03, mallah@trade-india.com wrote:
>
> Hi Rod,
>
> Does it mean that an index scan is used for less frequently occurring
> data?  Yes, my table was not clustered.
>
> Can you tell me what 0.00..6788.24, rows, and width mean?
>
> In the EXPLAIN output: cost=0.00..6788.24 rows=30001 width=4
>
> I have one more table where I face a similar problem. I have not
> dump-reloaded it yet; I will post again if I face the problem.
>
> [snip: earlier quoted messages]

--
Rod Taylor
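To make that reading concrete, here is the plan line from this thread annotated field by field (interpretation per the EXPLAIN documentation linked above):

```
Seq Scan on email_bank_mailing_lists  (cost=0.00..6788.24 rows=30001 width=4)
                                            |       |          |        |
           startup cost (before first row) -+       |          |        |
           total cost (to return all rows) ---------+          |        |
           estimated number of rows returned ------------------+        |
           estimated average row width, in bytes -----------------------+
```

The cost numbers are in arbitrary planner units (roughly, sequential page fetches), which is why comparing them between the two candidate plans is what matters, not their absolute values.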
On Fri, 2002-11-01 at 08:03, mallah@trade-india.com wrote:
>
> Does it mean that an index scan is used for less frequently occurring
> data?  Yes, my table was not clustered.
>
> [snip]

Keep in mind that an index scan is very expensive on a per-tuple basis.  It
has to run through (fetch) the index pages, then fetch the matching pages
from the table.  Since the table fetches are random, the hard drive will
probably incur a seek for each tuple found in the index.  Those seeks add
up much more quickly than the cost of a sequential scan, which needs far
fewer seeks and drive-head movements.

--
Rod Taylor
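Rod's seek argument can be illustrated with a toy cost model. This is not PostgreSQL's actual arithmetic — the constants and the page counts are illustrative, with only the table size (3583 pages) and match count (~25000 rows for query_id 499) taken from the thread:

```python
# Toy illustration of why index scans lose at high selectivity:
# a sequential scan pays one cheap sequential read per table page, while an
# index scan pays one expensive random fetch per matching tuple (plus the
# index pages themselves).  Constants are assumptions, not planner values.
SEQ_PAGE_COST = 1.0      # cost of one sequential page read (baseline unit)
RANDOM_PAGE_COST = 4.0   # assumed penalty for a random fetch (disk seek)

def seq_scan_cost(table_pages):
    # Read every page of the table, in order.
    return table_pages * SEQ_PAGE_COST

def index_scan_cost(matching_tuples, index_pages):
    # Walk the index pages, then do one random heap fetch per match.
    return (index_pages + matching_tuples) * RANDOM_PAGE_COST

# Numbers in the ballpark of the thread: 3583 heap pages, ~25000 matches.
seq = seq_scan_cost(3583)
idx = index_scan_cost(25000, 100)
print(seq < idx)          # prints True: the seq scan comes out far cheaper

# At low selectivity the comparison flips, which is Rod's point about
# "less frequently occurring data":
print(index_scan_cost(10, 100) < seq)   # prints True
```

The crossover moves around with how clumped the matching rows are on disk, which is exactly what CLUSTER changes.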
On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:
> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
> NOTICE:  --Relation email_bank_mailing_lists--
> NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
>         Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
> NOTICE:  Analyzing email_bank_mailing_lists
> VACUUM

I'd suggest running a VACUUM FULL and then running VACUUM ANALYZE more
often.  44822 unused tuples seems quite excessive to me...

Robert Treat
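Robert's suggestion as a concrete session — a sketch only, since VACUUM FULL takes an exclusive lock on the table and should run in a quiet window:

```sql
-- One-time: compact the table, reclaiming the ~44822 unused tuple slots,
-- then refresh statistics so the planner sees the smaller table.
VACUUM FULL email_bank_mailing_lists;
ANALYZE email_bank_mailing_lists;

-- Routine: run this often enough that dead tuples never pile up again.
VACUUM ANALYZE email_bank_mailing_lists;
```

Keeping the dead-tuple count low also keeps the page count honest, which feeds directly into the seq-scan cost the planner was comparing against the index scan.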