Re: BUG #6763: Severe memory leak with arrays and hstore - Mailing list pgsql-bugs
| From | luben karavelov |
| --- | --- |
| Subject | Re: BUG #6763: Severe memory leak with arrays and hstore |
| Date | |
| Msg-id | E067C89B-B29E-4464-9CE9-5EEC78764610@mail.bg |
| In response to | Re: BUG #6763: Severe memory leak with arrays and hstore (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: BUG #6763: Severe memory leak with arrays and hstore |
| List | pgsql-bugs |
On Jul 27, 2012, at 8:47 AM, Tom Lane wrote:

> Craig Ringer <ringerc@ringerc.id.au> writes:
>> OK, it's certainly leaking, but not in the same drastic way I was able
>> to reproduce manually a couple of times earlier. Self-contained test
>> case attached.
>
> Using HEAD with stock parameters, I don't see any significant change in
> allocated address space (VSZ): it sits right around 170MB. The reported
> resident set size (RSS) starts from very little and rises to about
> 140MB, but I think that's just an artifact of the process touching more
> and more of the shared-buffers array as it runs. The actual backend
> memory consumption seems to be just a few meg.
>
> I can get it to blow out memory if I set work_mem large enough to
> persuade the planner to use hash aggregation (whereupon it tries to run
> all the array_agg aggregates in parallel). However, that requires
> work_mem set to a couple of GB, and I don't think it's really a bug when
> the backend goes ahead and uses a couple of GB after I told it it could.
>
> It's possible that the OP's problem boiled down to the planner making
> a drastic underestimate of the number of GROUP BY groups, which could
> mislead it into applying hash aggregation when there's not room; or
> if the space used per aggregate was a lot more than the 8K that the
> planner assumes when dealing with array_agg. But neither of those
> errors seems to be happening in this example case.
>
> 			regards, tom lane

It's good that the bug is not in HEAD. I was testing on 9.1.4. The RSS is definitely not just references to shared buffers, because shared_buffers is 1.8 GB and the backend RSS got to 4 GB. My setting for work_mem is 64MB, so it is quite conservative - the server was tuned for max concurrency, not for max throughput per single query.

Here is the plan of the insert:

    => explain INSERT INTO new_preferences SELECT user_id, hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id < 200000 GROUP BY user_id;
                                                      QUERY PLAN
    --------------------------------------------------------------------------------------------------------
     Insert on new_preferences  (cost=65615.89..65617.73 rows=67 width=36)
       ->  HashAggregate  (cost=65615.89..65617.06 rows=67 width=68)
             ->  Bitmap Heap Scan on old_prefs  (cost=17645.25..56555.65 rows=1208032 width=68)
                   Recheck Cond: (user_id < 200000)
                   ->  Bitmap Index Scan on old_prefs_user_id_ids  (cost=0.00..17343.24 rows=1208032 width=0)
                         Index Cond: (user_id < 200000)
    (6 rows)

So it is using hash aggregation, as you suggested. I have tried the query with hash aggregation disabled and it consumes a lot less memory - a single query to migrate the whole table finishes with 900 MB RSS.

After "ANALYZE old_prefs" the planner chooses GroupAggregate instead of HashAggregate - you were right about the missing statistics on old_prefs.

Thank you for figuring out this case.

Best regards
--
luben karavelov
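For readers following the thread, here is a minimal psql sketch of the two remedies discussed in this reply: refreshing the table statistics with ANALYZE and, if necessary, steering the planner away from hash aggregation for the current session. The table and column names (old_prefs, new_preferences, user_id, name, value) are taken from the thread; the session settings shown are illustrative and not part of the original server configuration.

```sql
-- Sketch only: names come from the thread; settings are illustrative.

-- Refresh planner statistics so the number of GROUP BY groups is estimated correctly.
ANALYZE old_prefs;

-- If the planner still picks HashAggregate, disable it for this session only.
SET enable_hashagg = off;

-- The plan should now show GroupAggregate instead of HashAggregate.
EXPLAIN
INSERT INTO new_preferences
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
WHERE user_id < 200000
GROUP BY user_id;

-- Restore the default afterwards.
RESET enable_hashagg;
```

Note that enable_hashagg is a blunt instrument; once the statistics are accurate, the planner normally chooses GroupAggregate on its own, as reported above.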