Thread: out of memory
Good morning,

I've increased sort_mem up to 2GB, and the "out of memory" error still
appears.  Here is the query I'm trying to run, with its EXPLAIN plan:

 Nested Loop  (cost=2451676.23..2454714.73 rows=1001 width=34)
   ->  Subquery Scan "day"  (cost=2451676.23..2451688.73 rows=1000 width=16)
         ->  Limit  (cost=2451676.23..2451678.73 rows=1000 width=12)
               ->  Sort  (cost=2451676.23..2451684.63 rows=3357 width=12)
                     Sort Key: sum(occurence)
                     ->  HashAggregate  (cost=2451471.24..2451479.63 rows=3357 width=12)
                           ->  Index Scan using test_date on queries_detail_statistics  (cost=0.00..2449570.55 rows=380138 width=12)
                                 Index Cond: ((date >= '2006-01-01'::date) AND (date <= '2006-01-30'::date))
                                 Filter: (((portal)::text = '1'::text) OR ((portal)::text = '2'::text))
   ->  Index Scan using query_string_pkey on query_string  (cost=0.00..3.01 rows=1 width=34)
         Index Cond: ("outer".query = query_string.id)
(11 rows)

Any new ideas?  Thanks, MB.

> On Tue, 2006-02-14 at 10:32, martial.bizel@free.fr wrote:
> > The EXPLAIN ANALYZE command crashes with the "out of memory" error.
> >
> > To be precise, I've tried a lot of values for the shared_buffers and
> > sort_mem parameters.
> >
> > Right now, the values in the config file are:
> > sort_mem = 32768
> > shared_buffers = 30000
>
> OK, on the command line, try increasing the sort_mem until hash_agg can
> work.  With a 4 gig machine, you should be able to go as high as needed
> here, I'd think.  Try as high as 500000 or so or more.  Then when
> explain analyze works, compare the actual versus estimated number of
> rows.
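In practice, Scott's suggestion (quoted above) amounts to a session-local
override at the psql prompt.  A sketch: sort_mem is measured in kilobytes
on this era of PostgreSQL, so 500000 is roughly 500MB, and the query text
here is the 30-day variant of the query posted further down the thread:

    SET sort_mem = 500000;            -- session-local; no restart needed
    EXPLAIN ANALYZE
    SELECT query_string, DAY.ocu
    FROM search_data.query_string,
         (SELECT SUM(occurence) AS ocu, query
            FROM daily.queries_detail_statistics
           WHERE date >= '2006-01-01' AND date <= '2006-01-30'
             AND portal IN (1, 2)
           GROUP BY query
           ORDER BY ocu DESC
           LIMIT 1000) AS DAY
    WHERE DAY.query = id;
    RESET sort_mem;                   -- back to the postgresql.conf value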
On Wed, 2006-02-15 at 09:55, martial.bizel@free.fr wrote:
> Good morning,
>
> I've increased sort_mem up to 2GB, and the "out of memory" error still
> appears.
>
> Here is the query I'm trying to run, with its EXPLAIN plan:
>
> [EXPLAIN plan snipped -- same as above]

OK, so it looks like something is horribly wrong here.  Try running the
explain analyze query after running the following:

set enable_hashagg=off;

and see what you get then.
Here is the result with enable_hashagg set to false:

 Nested Loop  (cost=2487858.08..2490896.58 rows=1001 width=34) (actual time=1028044.781..1030251.260 rows=1000 loops=1)
   ->  Subquery Scan "day"  (cost=2487858.08..2487870.58 rows=1000 width=16) (actual time=1027996.748..1028000.969 rows=1000 loops=1)
         ->  Limit  (cost=2487858.08..2487860.58 rows=1000 width=12) (actual time=1027996.737..1027999.199 rows=1000 loops=1)
               ->  Sort  (cost=2487858.08..2487866.47 rows=3357 width=12) (actual time=1027996.731..1027998.066 rows=1000 loops=1)
                     Sort Key: sum(occurence)
                     ->  GroupAggregate  (cost=2484802.05..2487661.48 rows=3357 width=12) (actual time=810623.035..914550.262 rows=19422774 loops=1)
                           ->  Sort  (cost=2484802.05..2485752.39 rows=380138 width=12) (actual time=810612.248..845427.013 rows=36724340 loops=1)
                                 Sort Key: query
                                 ->  Index Scan using test_date on queries_detail_statistics  (cost=0.00..2449570.55 rows=380138 width=12) (actual time=25.393..182029.205 rows=36724340 loops=1)
                                       Index Cond: ((date >= '2006-01-01'::date) AND (date <= '2006-01-30'::date))
                                       Filter: (((portal)::text = '1'::text) OR ((portal)::text = '2'::text))
   ->  Index Scan using query_string_pkey on query_string  (cost=0.00..3.01 rows=1 width=34) (actual time=2.244..2.246 rows=1 loops=1000)
         Index Cond: ("outer".query = query_string.id)
 Total runtime: 1034357.390 ms
(14 rows)

Thanks.  For context: the daily.queries_detail_statistics table has 250
million records, the query field (bigint) has about 2 million distinct
values, and occurence is an int.

The query runs fine with HashAggregate when the date range is restricted
to about 15 days, like this:

SELECT query_string, DAY.ocu
FROM search_data.query_string,
     (SELECT SUM(occurence) AS ocu, query
        FROM daily.queries_detail_statistics
       WHERE date >= '2006-01-01' AND date <= '2006-01-15'
         AND portal IN (1, 2)
       GROUP BY query
       ORDER BY ocu DESC
       LIMIT 1000) AS DAY
WHERE DAY.query = id;

> On Wed, 2006-02-15 at 09:55, martial.bizel@free.fr wrote:
> > [earlier message and EXPLAIN plan snipped]
>
> OK, so it looks like something is horribly wrong here.  Try running the
> explain analyze query after running the following:
>
> set enable_hashagg=off;
>
> and see what you get then.
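One way to see what the planner believes about those columns is the
standard pg_stats view; the n_distinct estimate for query is what drives
HashAggregate's memory planning.  A sketch, with schema and column names
taken from the query above:

    SELECT attname, n_distinct, null_frac
      FROM pg_stats
     WHERE schemaname = 'daily'
       AND tablename = 'queries_detail_statistics'
       AND attname IN ('date', 'portal', 'query');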
On Wed, 2006-02-15 at 11:18, martial.bizel@free.fr wrote:
> Here is the result with enable_hashagg set to false:
>
> [EXPLAIN ANALYZE output snipped -- see above]

OK, in the index scan using test_date, you get 36724340 rows when the
planner expects 380138.  That's off by a factor of almost 100, so I'm
guessing that your statistics aren't reflecting what's really in your
db.  You said before you'd run analyze, so I'd try increasing the stats
target on that column and rerunning analyze to see if things get any
better.
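Concretely, Scott's suggestion would look something like the following; a
sketch, where the choice of columns and the target value 100 are
assumptions (the default statistics target in this era is 10):

    ALTER TABLE daily.queries_detail_statistics
          ALTER COLUMN date SET STATISTICS 100;
    ALTER TABLE daily.queries_detail_statistics
          ALTER COLUMN portal SET STATISTICS 100;
    ANALYZE daily.queries_detail_statistics;  -- rebuild stats with the new target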
Hello,

I want to split a partitioned table across two PostgreSQL servers (two
hosts).  To query this remote object, I want to create a view that UNIONs
the two servers via two dblink calls.  But how can I be sure that the
optimizer plan on the remote node is the same as on the local node, i.e.
that the optimizer scans only the selected partitions and does not do a
full scan of the remote object?

Example:

server 1: table test, partitioned on the field number, 1 < number < 10
server 2: table test, partitioned on the field number, 10 < number < 20
server 3: has a view like

create view remote_test as
  select * from dblink('conn_server1', 'select * from test') as test_server1(....)
  union
  select * from dblink('conn_server2', 'select * from test') as test_server2(....);

If I run a select on the view remote_test like

select * from remote_test where number < 5 or number > 15;

does the optimizer do a full scan of all partitions on all servers, or
does it scan only partitions 1 to 4 on server1 and partitions 16 to 19 on
server2, and union the results?

In fact, I don't know how to get the explain plan of the remote node.

Thanks a lot.
MB
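To make the question concrete, here is a runnable version of that view
with an assumed column list of just (number int); the real table
presumably has more columns:

    CREATE VIEW remote_test AS
      SELECT * FROM dblink('conn_server1', 'SELECT number FROM test')
               AS test_server1(number int)
      UNION
      SELECT * FROM dblink('conn_server2', 'SELECT number FROM test')
               AS test_server2(number int);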
martial.bizel@free.fr writes:
> In fact, I don't know how to get the explain plan of the remote node.

You send it an EXPLAIN.

You can *not* use a view defined as you suggest if you want decent
performance --- the dblink functions will fetch the entire table
contents and the filtering will be done locally.  You'll need to
pass the WHERE conditions over to the remote servers, which more
or less means that you have to give them to the dblink functions
as text.

			regards, tom lane
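A sketch of what Tom means by passing the conditions as text: each WHERE
clause travels inside the query string handed to dblink, so the remote
server plans (and prunes) the query itself.  The (number int) column list
is an assumption carried over from above:

    SELECT * FROM dblink('conn_server1',
                         'SELECT number FROM test WHERE number < 5')
             AS t1(number int)
    UNION ALL
    SELECT * FROM dblink('conn_server2',
                         'SELECT number FROM test WHERE number > 15')
             AS t2(number int);

UNION ALL is deliberate here: the two ranges are disjoint, so the
duplicate-eliminating sort that plain UNION performs would be pure
overhead.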
Selon Tom Lane <tgl@sss.pgh.pa.us>:

> martial.bizel@free.fr writes:
> > In fact, I don't know how to get the explain plan of the remote node.
>
> You send it an EXPLAIN.

Please, could you tell me what to put at the end of the request:

select * from dblink('my_connexion', 'EXPLAIN select * from test where number = 1') as ........

I want to be sure that the remote test table is seen as a partitioned
object.

Thanks a lot.

> You can *not* use a view defined as you suggest if you want decent
> performance --- the dblink functions will fetch the entire table
> contents and the filtering will be done locally.  You'll need to
> pass the WHERE conditions over to the remote servers, which more
> or less means that you have to give them to the dblink functions
> as text.
>
>			regards, tom lane
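For what it's worth, dblink returns EXPLAIN output as ordinary rows of
text, so the missing alias only needs a single text column.  A sketch,
reusing the connection name from the message above:

    SELECT * FROM dblink('my_connexion',
                         'EXPLAIN SELECT * FROM test WHERE number = 1')
             AS remote_plan(plan_line text);

Each resulting row is one line of the remote plan, which shows directly
whether the remote server scans only the matching partitions.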