Re: BUG #6763: Severe memory leak with arrays and hstore - Mailing list pgsql-bugs

From luben karavelov
Subject Re: BUG #6763: Severe memory leak with arrays and hstore
Date
Msg-id 10398EEF-03DE-405E-B276-06F73ABF9E3A@mail.bg
In response to Re: BUG #6763: Severe memory leak with arrays and hstore  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-bugs
On Jul 26, 2012, at 11:17 AM, Craig Ringer wrote:

> On 07/26/2012 09:32 AM, karavelov@mail.bg wrote:
>> Finally I have managed to migrate it in batches of 100-200k user ids and
>> disconnecting after each query in order to free the backend and leaked
>> memory.
> If you do it in batches, but you do NOT disconnect and reconnect, does the backend continue to grow?
>
> What's the output of:
>
> SELECT count(sub.user_id), to_char(AVG(sub.n_prefs), '99999.99') FROM (
> SELECT user_id, count(name) AS n_prefs FROM old_prefs GROUP BY user_id) AS sub;
>
> and
>
> SELECT pg_size_pretty(pg_total_relation_size('old_prefs'));
>
> ?
>
> --
> Craig Ringer
>

Sorry for the broken formatting in the last message. Here it is again:

OK, I will do the procedure again, taking notes at each step.

First, here are the results of the queries you asked for:

pg=> SELECT count(sub.user_id), to_char(AVG(sub.n_prefs), '99999.99') FROM (
          SELECT user_id, count(name) AS n_prefs FROM old_prefs GROUP BY user_id) AS sub;

  count  |  to_char
---------+-----------
 1257262 |      2.26
(1 row)

pg=> SELECT pg_size_pretty(pg_total_relation_size('old_prefs'));
 pg_size_pretty
----------------
 264 MB
(1 row)

pg=> \d old_prefs
        Table "public.old_prefs"
 Column  |       Type        | Modifiers
---------+-------------------+-----------
 user_id | integer           | not null
 name    | character varying | not null
 value   | character varying | not null
Indexes:
    "old_prefs_user_id_ids" btree (user_id)

Also, there is a maximum of 34 rows per user_id in old_prefs.
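For reference, that per-user maximum can be reproduced with a query along these lines (not part of the session transcript below):

pg=> SELECT max(n) FROM (SELECT user_id, count(*) AS n FROM old_prefs GROUP BY user_id) AS s;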

Here is the new table I just created:

pg=> \d new_preferences
Table "public.new_preferences"
 Column  |  Type   | Modifiers
---------+---------+-----------
 user_id | integer | not null
 prefs   | hstore  |
Indexes:
    "new_preferences_pkey" PRIMARY KEY, btree (user_id)
Foreign-key constraints:
    "new_preferences_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
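For completeness, the equivalent DDL is roughly the following; this is reconstructed from the \d output above, not necessarily the exact statement I ran:

CREATE TABLE new_preferences (
    user_id integer NOT NULL PRIMARY KEY
            REFERENCES users(user_id) ON DELETE CASCADE,
    prefs   hstore
);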


Here is the newly connected backend:

root@pg:/var/log# ps axu | egrep '10.0.2.71|USER' | grep -v grep
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  0.0  0.0 2266944 3448 ?        Ss   15:23   0:00 postgres: pg pg 10.0.2.71(51734) idle


Migrating the first 200k user ids to the new schema:

pg=> select count(*) from old_prefs where user_id<200000;
 count
--------
 174767
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id<200000 GROUP BY user_id;
INSERT 0 48993
pg=> commit;
COMMIT

Here is the backend:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  0.8  7.1 3081772 582712 ?      Ss   15:23   0:02 postgres: pg pg 10.0.2.71(51734) idle

Migrating another batch of users:

pg=> select count(*) from old_prefs where user_id>=200000 and user_id<600000;
 count
--------
 193824
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=200000 AND user_id<600000 GROUP BY user_id;
INSERT 0 54157
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  1.1  8.5 3176164 697444 ?      Ss   15:23   0:05 postgres: pg pg 10.0.2.71(51734) idle

Another batch:
pg=> select count(*) from old_prefs where user_id>=600000 and user_id<1100000;
 count
--------
 190504
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=600000 AND user_id<1100000 GROUP BY user_id;
INSERT 0 56199
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  0.7  9.6 3210224 791404 ?      Ss   15:23   0:08 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=1100000 and user_id<1600000;
 count
--------
 194965
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=1100000 AND user_id<1600000 GROUP BY user_id;
INSERT 0 60257
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  0.9 10.8 3277412 889860 ?      Ss   15:23   0:11 postgres: pg pg 10.0.2.71(51734) idle


So the Pg backend keeps growing by roughly 100 MB per 200k rows from the old table, which become 50-60k rows in the new table (for example, 697444 kB - 582712 kB is about 112 MB of RSS growth for the 193,824-row batch above).
Proceeding with another batch:

pg=> select count(*) from old_prefs where user_id>=1600000 and user_id<2400000;
 count
--------
 170858
(1 row)

Time: 83,994 ms
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=1600000 AND user_id<2400000 GROUP BY user_id;
INSERT 0 55447
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  0.9 11.5 3277412 945560 ?      Ss   15:23   0:15 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=2400000 and user_id<3400000;
 count
--------
 200614
(1 row)

Time: 83,409 ms
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=2400000 AND user_id<3400000 GROUP BY user_id;
INSERT 0 87940
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  1.2 16.2 3736968 1331796 ?     Ss   15:23   0:20 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=3400000 and user_id<3800000;
 count
--------
 161390
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3400000 AND user_id<3800000 GROUP BY user_id;
ERROR:  insert or update on table "new_preferences" violates foreign key constraint "new_preferences_user_id_fkey"
DETAIL:  Key (user_id)=(3615131) is not present in table "users".
pg=> rollback;
ROLLBACK

Oops, I have to clean up old_prefs; some users were deleted in the meantime:
pg=> delete from old_prefs where user_id not in (select user_id from users);
DELETE 7
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  1.4 26.4 4469520 2157588 ?     Ss   15:23   0:29 postgres: pg pg 10.0.2.71(51734) idle

Nearly 1 GB of growth from a rolled-back transaction...

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3400000 AND user_id<3800000 GROUP BY user_id;
INSERT 0 131803
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  1.7 26.6 4479944 2180536 ?     Ss   15:23   0:35 postgres: pg pg 10.0.2.71(51734) idle

Another batch, bigger this time:

pg=> select count(*) from old_prefs where user_id>=3800000 and user_id<4200000;
 count
--------
 327374
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3800000 AND user_id<4200000 GROUP BY user_id;
INSERT 0 177044
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  1.9 33.1 5238968 2710756 ?     Ss   15:23   0:45 postgres: pg pg 10.0.2.71(51734) idle

Another big batch:

pg=> select count(*) from old_prefs where user_id>=4200000 and user_id<4400000;
 count
--------
 375352
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4200000 AND user_id<4400000 GROUP BY user_id;
INSERT 0 189095
pg=> commit;
COMMIT

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  2.2 35.7 5438412 2918720 ?     Ss   15:23   0:55 postgres: pg pg 10.0.2.71(51734) idle

Now a smaller batch:

pg=> select count(*) from old_prefs where user_id>=4400000 and user_id<4500000;
 count
--------
 219249
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4400000 AND user_id<4500000 GROUP BY user_id;
INSERT 0 99782
pg=> commit;
COMMIT

RSS keeps growing:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  2.2 36.0 5438412 2943784 ?     Ss   15:23   1:00 postgres: pg pg 10.0.2.71(51734) idle

Let's see if a bigger batch will pass:

pg=> select count(*) from old_prefs where user_id>=4500000;
 count
--------
 631911
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4500000 GROUP BY user_id;
INSERT 0 296541
pg=> commit;
COMMIT

OK, this time it passed, but the backend is now over 4 GB:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 19121  2.2 50.0 7227968 4088928 ?     Ss   15:23   1:17 postgres: pg pg 10.0.2.71(51734) idle

Some observations:

1. The backend does not free the allocated memory between transactions.
2. Rolled-back transactions also leak memory.
3. The leaked memory is not linear in the work done: two transactions with 200k keys each leak less than one transaction with 400k keys.
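
For anyone trying to reproduce this, the per-backend numbers above can also be sampled from within the same psql session instead of grepping ps by client address. A rough sketch (pg_backend_pid() and psql's \! shell escape are standard; the exact ps options may differ per platform):

pg=> SELECT pg_backend_pid();   -- returns 19121 for the session above
pg=> \! ps -o vsz=,rss= -p 19121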


Regarding Tom's question:
The old_prefs table does not fit in work_mem but is quite small relative to the total RAM. Isn't "work_mem" a limit on the memory each backend can allocate for sorting, grouping and aggregation? My understanding is that bigger allocations will spill to disk and will not kill the server. I could be wrong, though.
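
For reference, the per-session limit can be checked and adjusted like this (the value shown is only illustrative, not what this server uses):

pg=> SHOW work_mem;
pg=> SET work_mem = '64MB';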

Thanks in advance and best regards

--
Luben Karavelov
