Re: BUG #6763: Severe memory leak with arrays and hstore - Mailing list pgsql-bugs
| From | luben karavelov |
|---|---|
| Subject | Re: BUG #6763: Severe memory leak with arrays and hstore |
| Date | |
| Msg-id | E067C89B-B29E-4464-9CE9-5EEC78764610@mail.bg |
| In response to | Re: BUG #6763: Severe memory leak with arrays and hstore (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: BUG #6763: Severe memory leak with arrays and hstore |
| List | pgsql-bugs |
On Jul 27, 2012, at 8:47 AM, Tom Lane wrote:
> Craig Ringer <ringerc@ringerc.id.au> writes:
>> OK, it's certainly leaking, but not in the same drastic way I was able
>> to reproduce manually a couple of times earlier. Self-contained test
>> case attached.
>
> Using HEAD with stock parameters, I don't see any significant change in
> allocated address space (VSZ): it sits right around 170MB. The reported
> resident set size (RSS) starts from very little and rises to about
> 140MB, but I think that's just an artifact of the process touching more
> and more of the shared-buffers array as it runs. The actual backend
> memory consumption seems to be just a few meg.
>=20
> I can get it to blow out memory if I set work_mem large enough to
> persuade the planner to use hash aggregation (whereupon it tries to run
> all the array_agg aggregates in parallel). However, that requires
> work_mem set to a couple of GB, and I don't think it's really a bug when
> the backend goes ahead and uses a couple of GB after I told it it could.
>=20
> It's possible that the OP's problem boiled down to the planner making
> a drastic underestimate of the number of GROUP BY groups, which could
> mislead it into applying hash aggregation when there's not room; or
> if the space used per aggregate was a lot more than the 8K that the
> planner assumes when dealing with array_agg. But neither of those
> errors seems to be happening in this example case.
>=20
> regards, tom lane
It's good that the bug is not in HEAD; I was testing on 9.1.4. The RSS is definitely not just references to shared buffers, because shared_buffers is 1.8G while the backend's RSS got up to 4G. My work_mem setting is 64M, so it's quite conservative - the server was tuned for maximum concurrency, not for maximum throughput of a single query.
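For reference, the effective value is easy to confirm from the session itself; this is just a generic sketch using the standard pg_settings view, not something from the original report:

    SHOW work_mem;                          -- reports 64MB on this server
    SELECT name, setting, unit, source      -- same value, plus where it was set
    FROM pg_settings
    WHERE name = 'work_mem';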
Here is the plan of the insert:
=> explain INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id<200000 GROUP BY user_id;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Insert on new_preferences  (cost=65615.89..65617.73 rows=67 width=36)
   ->  HashAggregate  (cost=65615.89..65617.06 rows=67 width=68)
         ->  Bitmap Heap Scan on old_prefs  (cost=17645.25..56555.65 rows=1208032 width=68)
               Recheck Cond: (user_id < 200000)
               ->  Bitmap Index Scan on old_prefs_user_id_ids  (cost=0.00..17343.24 rows=1208032 width=0)
                     Index Cond: (user_id < 200000)
(6 rows)
So it is using a hash aggregate, as you suggested. I have tried the query with hash aggregation disabled and it consumes a lot less memory - a single query migrating the whole table finishes with 900M RSS.
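For anyone reproducing this, hash aggregation can be disabled per session via the standard enable_hashagg planner setting; a minimal sketch (not necessarily the exact commands used here), reusing the query from the plan above:

    SET enable_hashagg = off;              -- planner falls back to GroupAggregate

    EXPLAIN INSERT INTO new_preferences
    SELECT user_id, hstore(array_agg(name), array_agg(value))
    FROM old_prefs
    WHERE user_id < 200000
    GROUP BY user_id;

    RESET enable_hashagg;                  -- restore the default afterwards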
After "ANALYZE old_prefs" the planner chooses GroupAggregate instead of Has=
hAggregate - you were right about missing statistics of old_prefs.
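In other words, refreshing the statistics is enough to get the better plan without touching any planner switches; a short sketch of the check:

    ANALYZE old_prefs;   -- refresh row-count and distinct-value statistics
    -- re-running the EXPLAIN above now shows GroupAggregate instead of HashAggregate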
Thank you for figuring out this case.
Best regards
--
luben karavelov