Thread: BUG #6763: Severe memory leak with arrays and hstore
The following bug has been logged on the website:

Bug reference: 6763
Logged by: Luben Karavelov
Email address: karavelov@mail.bg
PostgreSQL version: 9.1.4
Operating system: Debian Linux
Description:

I was trying to migrate a big EAV table (user preferences) into a new table.

Old table:

old_prefs (
    user_id integer NOT NULL,
    name    varchar NOT NULL,
    value   varchar NOT NULL
);

New table:

preferences (
    user_id integer PRIMARY KEY,
    prefs   hstore
);

The query I tried to use is:

INSERT INTO preferences
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
GROUP BY user_id;

But the postgres backend consumed all the available memory (6GB free + 4GB swap) and was finally killed by the kernel. It's an 8GB RAM machine, and here are the memory options from the config:

shared_buffers = 1800MB
temp_buffers = 16MB
work_mem = 64MB
maintenance_work_mem = 256MB
max_stack_depth = 2MB

Finally I managed to migrate it in batches of 100-200k user ids, disconnecting after each query in order to free the backend and the leaked memory.

Best regards
Luben
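In plain terms, the query collapses each user's per-row name/value pairs into a single key-value map. A minimal Python sketch of the equivalent aggregation (the sample rows are invented for illustration):

```python
from collections import defaultdict

# Each row of old_prefs: (user_id, name, value)
rows = [
    (1, "lang", "bg"),
    (1, "theme", "dark"),
    (2, "lang", "en"),
]

# Equivalent of: SELECT user_id, hstore(array_agg(name), array_agg(value))
#                FROM old_prefs GROUP BY user_id
prefs = defaultdict(dict)
for user_id, name, value in rows:
    prefs[user_id][name] = value

print(dict(prefs))  # {1: {'lang': 'bg', 'theme': 'dark'}, 2: {'lang': 'en'}}
```

The aggregate itself is tiny per user; the question in the thread is why the backend's total footprint keeps growing across groups and transactions.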
karavelov@mail.bg writes:
> The query I have tried to use is:
> INSERT INTO preferences
> SELECT user_id,hstore(array_agg(name), array_agg(value))
> FROM old_prefs
> GROUP BY user_id;
> But the postgres backend consumed all the available memory (6G free + 4G
> swap) and finally was killed by the kernel.

Is there any reason to think this is an actual leak, and not just "you were trying to compute an impractically large value"?

			regards, tom lane
On 07/26/2012 09:32 AM, karavelov@mail.bg wrote:
> Finally I have managed to migrate it in batches of 100-200k user ids and
> disconnecting after each query in order to free the backend and leaked
> memory.

If you do it in batches, but you do NOT disconnect and reconnect, does the backend continue to grow?

What's the output of:

SELECT count(sub.user_id), to_char(AVG(sub.n_prefs), '99999.99')
FROM (SELECT user_id, count(name) AS n_prefs
      FROM old_prefs GROUP BY user_id) AS sub;

and

SELECT pg_size_pretty(pg_total_relation_size('old_prefs'));

?

--
Craig Ringer
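The first diagnostic query just counts distinct users and the average number of preference rows each one has. The same computation over in-memory tuples, as a sketch (sample data invented):

```python
from collections import Counter

# Rows of old_prefs as (user_id, name) pairs; values are omitted since
# only the counts matter for this diagnostic.
rows = [(1, "lang"), (1, "theme"), (2, "lang"), (3, "lang"), (3, "theme"), (3, "tz")]

# Equivalent of: SELECT user_id, count(name) AS n_prefs ... GROUP BY user_id
n_prefs = Counter(user_id for user_id, _ in rows)

# Equivalent of the outer count(sub.user_id) / AVG(sub.n_prefs)
n_users = len(n_prefs)
avg_prefs = sum(n_prefs.values()) / n_users
print(n_users, avg_prefs)  # 3 2.0
```

The point of the query is to bound the problem size: many users with only a handful of prefs each means no single hstore can plausibly be huge.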
On Jul 26, 2012, at 11:17 AM, Craig Ringer wrote:
> On 07/26/2012 09:32 AM, karavelov@mail.bg wrote:
>> Finally I have managed to migrate it in batches of 100-200k user ids and
>> disconnecting after each query in order to free the backend and leaked
>> memory.
> If you do it in batches, but you do NOT disconnect and reconnect, does the backend continue to grow?
>
> What's the output of:
>
> SELECT count(sub.user_id), to_char(AVG(sub.n_prefs), '99999.99') FROM (
> SELECT user_id, count(name) AS n_prefs FROM old_prefs GROUP BY user_id) AS sub;
>
> and
>
> SELECT pg_size_pretty(pg_total_relation_size('old_prefs'));
>
> ?
>
> --
> Craig Ringer

Sorry for the broken formatting in the last message. Here it is again:

Ok, I will do the procedure again, taking notes on each step. First, here are the results of the queries you asked for:

pg=> SELECT count(sub.user_id), to_char(AVG(sub.n_prefs), '99999.99') FROM (
         SELECT user_id, count(name) AS n_prefs FROM old_prefs GROUP BY user_id) AS sub;
  count  |  to_char
---------+-----------
 1257262 |      2.26
(1 row)

pg=> SELECT pg_size_pretty(pg_total_relation_size('old_prefs'));
 pg_size_pretty
----------------
 264 MB
(1 row)

pg=> \d old_prefs
         Table "public.old_prefs"
 Column  |       Type        | Modifiers
---------+-------------------+-----------
 user_id | integer           | not null
 name    | character varying | not null
 value   | character varying | not null
Indexes:
    "old_prefs_user_id_ids" btree (user_id)

Also, there are at most 34 rows per user_id in old_prefs.

Here is the new table I just created:

pg=> \d new_preferences
   Table "public.new_preferences"
 Column  |  Type   | Modifiers
---------+---------+-----------
 user_id | integer | not null
 prefs   | hstore  |
Indexes:
    "new_preferences_pkey" PRIMARY KEY, btree (user_id)
Foreign-key constraints:
    "new_preferences_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE

Here is the newly connected backend:

root@pg:/var/log# ps axu | egrep '10.0.2.71|USER' | grep -v grep
USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START TIME COMMAND
postgres 19121  0.0  0.0 2266944    3448 ?    Ss   15:23 0:00 postgres: pg pg 10.0.2.71(51734) idle

Migrating the first 200k of the users to the new scheme:

pg=> select count(*) from old_prefs where user_id<200000;
 count
--------
 174767
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id<200000 GROUP BY user_id;
INSERT 0 48993
pg=> commit;
COMMIT

Here is the backend:

postgres 19121  0.8  7.1 3081772  582712 ?    Ss   15:23 0:02 postgres: pg pg 10.0.2.71(51734) idle

Migrating another batch of users:

pg=> select count(*) from old_prefs where user_id>=200000 and user_id<600000;
 count
--------
 193824
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=200000 AND user_id<600000 GROUP BY user_id;
INSERT 0 54157
pg=> commit;
COMMIT

postgres 19121  1.1  8.5 3176164  697444 ?    Ss   15:23 0:05 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=600000 and user_id<1100000;
 count
--------
 190504
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=600000 AND user_id<1100000 GROUP BY user_id;
INSERT 0 56199
pg=> commit;
COMMIT

postgres 19121  0.7  9.6 3210224  791404 ?    Ss   15:23 0:08 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=1100000 and user_id<1600000;
 count
--------
 194965
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=1100000 AND user_id<1600000 GROUP BY user_id;
INSERT 0 60257
pg=> commit;
COMMIT

postgres 19121  0.9 10.8 3277412  889860 ?    Ss   15:23 0:11 postgres: pg pg 10.0.2.71(51734) idle

So the Pg backend keeps growing by about 100MB per 200k rows from the old table, which become 50-60k rows in the new table.

Proceeding with another batch:

pg=> select count(*) from old_prefs where user_id>=1600000 and user_id<2400000;
 count
--------
 170858
(1 row)

Time: 83,994 ms
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=1600000 AND user_id<2400000 GROUP BY user_id;
INSERT 0 55447
pg=> commit;
COMMIT

postgres 19121  0.9 11.5 3277412  945560 ?    Ss   15:23 0:15 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=2400000 and user_id<3400000;
 count
--------
 200614
(1 row)

Time: 83,409 ms
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=2400000 AND user_id<3400000 GROUP BY user_id;
INSERT 0 87940
pg=> commit;
COMMIT

postgres 19121  1.2 16.2 3736968 1331796 ?    Ss   15:23 0:20 postgres: pg pg 10.0.2.71(51734) idle

Another batch:

pg=> select count(*) from old_prefs where user_id>=3400000 and user_id<3800000;
 count
--------
 161390
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3400000 AND user_id<3800000 GROUP BY user_id;
ERROR:  insert or update on table "new_preferences" violates foreign key constraint "new_preferences_user_id_fkey"
DETAIL:  Key (user_id)=(3615131) is not present in table "users".
pg=> rollback;
ROLLBACK

Oops... have to clean up old_prefs; some users were deleted in the meantime:

pg=> delete from old_prefs where user_id not in (select user_id from users);
DELETE 7
pg=> commit;
COMMIT

postgres 19121  1.4 26.4 4469520 2157588 ?    Ss   15:23 0:29 postgres: pg pg 10.0.2.71(51734) idle

Nearly 1GB of growth on a rolled-back transaction...

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3400000 AND user_id<3800000 GROUP BY user_id;
INSERT 0 131803
pg=> commit;
COMMIT

postgres 19121  1.7 26.6 4479944 2180536 ?    Ss   15:23 0:35 postgres: pg pg 10.0.2.71(51734) idle

Another batch, bigger this time:

pg=> select count(*) from old_prefs where user_id>=3800000 and user_id<4200000;
 count
--------
 327374
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3800000 AND user_id<4200000 GROUP BY user_id;
INSERT 0 177044
pg=> commit;
COMMIT

postgres 19121  1.9 33.1 5238968 2710756 ?    Ss   15:23 0:45 postgres: pg pg 10.0.2.71(51734) idle

Another big batch:

pg=> select count(*) from old_prefs where user_id>=4200000 and user_id<4400000;
 count
--------
 375352
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4200000 AND user_id<4400000 GROUP BY user_id;
INSERT 0 189095
pg=> commit;
COMMIT

postgres 19121  2.2 35.7 5438412 2918720 ?    Ss   15:23 0:55 postgres: pg pg 10.0.2.71(51734) idle

Now a smaller batch:

pg=> select count(*) from old_prefs where user_id>=4400000 and user_id<4500000;
 count
--------
 219249
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4400000 AND user_id<4500000 GROUP BY user_id;
INSERT 0 99782
pg=> commit;
COMMIT

RSS keeps growing:

postgres 19121  2.2 36.0 5438412 2943784 ?    Ss   15:23 1:00 postgres: pg pg 10.0.2.71(51734) idle

Let's see if a bigger batch will pass:

pg=> select count(*) from old_prefs where user_id>=4500000;
 count
--------
 631911
(1 row)

pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4500000 GROUP BY user_id;
INSERT 0 296541
pg=> commit;
COMMIT

Ok, this time it passed, but the backend is over 4GB:

postgres 19121  2.2 50.0 7227968 4088928 ?    Ss   15:23 1:17 postgres: pg pg 10.0.2.71(51734) idle

Some observations:
1. The backend does not free allocated memory between transactions.
2. Rolled-back transactions also leak memory.
3. Leaked memory is not linear to work done - two transactions with 200k keys will leak less than one transaction with 400k keys.

Regarding Tom's question: old_prefs does not fit in work_mem but is quite small relative to the total RAM. Isn't work_mem a limit on the memory each backend can allocate for sorting, grouping and aggregation? My understanding is that a bigger allocation will spill to disk and will not kill the server. I could be wrong, though.

Thanks in advance and best regards
--
Luben Karavelov
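The workaround described above is worth spelling out: split the key space into half-open id ranges and migrate one range per connection, so the OS reclaims whatever the backend leaked when each connection closes. A sketch of the range-splitting part (the batch size and id bound here are illustrative, not the ones used in the thread):

```python
def id_ranges(max_id, batch):
    """Yield half-open (lo, hi) ranges covering [0, max_id)."""
    lo = 0
    while lo < max_id:
        yield lo, min(lo + batch, max_id)
        lo += batch

# Each range would drive one
#   INSERT ... WHERE user_id >= lo AND user_id < hi GROUP BY user_id
# executed on a fresh connection that is closed afterwards.
batches = list(id_ranges(1_000_000, 400_000))
print(batches)  # [(0, 400000), (400000, 800000), (800000, 1000000)]
```

Half-open ranges avoid both gaps and double-migrated users at the batch boundaries, which matters here because new_preferences has user_id as its primary key.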
Woah. Your email client did something insane, and I cannot read your message. See below:

On 07/26/2012 09:37 PM, karavelov@mail.bg wrote:
> ----- Craig Ringer (ringerc@ringerc.id.au), on 26.07.2012 at 11:17 -----
> [...]
Hi all

Here's a fully self-contained, automated test case that demonstrates the leak. Example output on my system, pasted as a quote to stop Thunderbird mangling it:

> $ ./hstore-leak-demo.sh
> NOTICE:  extension "hstore" already exists, skipping
> CREATE EXTENSION
> DROP TABLE
> CREATE TABLE
> CREATE TABLE
> INSERT 0 100000
> INSERT 0 80175
> INSERT 0 72267
> INSERT 0 50649
> INSERT 0 30430
>         avg         | max
> --------------------+-----
>  3.3352100000000000 |  20
> (1 row)
>
> Backend PID is: 3167
> USER      PID %CPU %MEM     VSZ     RSS TTY  STAT START TIME COMMAND
> postgres 3167  0.0  0.0  504276    4368 ?    Ss   10:29 0:00 postgres: craig regress [local] idle
> postgres 3167 91.0 13.0 2163384 1055652 ?    Ss   10:29 0:00 postgres: craig regress [local] idle
> postgres 3167 77.5 13.2 2163408 1071496 ?    Ss   10:29 0:01 postgres: craig regress [local] idle
> postgres 3167  108 13.4 2163408 1084056 ?    Ss   10:29 0:02 postgres: craig regress [local] idle
> postgres 3167 93.0 13.5 2163408 1092244 ?    Ss   10:29 0:02 postgres: craig regress [local] idle
> postgres 3167 86.0 13.6 2163408 1100444 ?    Ss   10:29 0:03 postgres: craig regress [local] idle
> postgres 3167  101 13.7 2163408 1108704 ?    Ss   10:29 0:04 postgres: craig regress [local] idle
> postgres 3167 93.8 13.8 2163408 1116896 ?    Ss   10:29 0:04 postgres: craig regress [local] idle
> postgres 3167 88.8 13.9 2163408 1125048 ?    Ss   10:29 0:05 postgres: craig regress [local] idle
> postgres 3167 99.1 14.0 2163408 1133228 ?    Ss   10:29 0:05 postgres: craig regress [local] idle
On 07/26/2012 09:55 PM, luben karavelov wrote:
> Ok, I will do the procedure again with taking notes on each step.

Thank you for taking the time to do this in detail.

> First, here are the results of the queries you asked:
>   count  |  to_char
> ---------+-----------
>  1257262 |      2.26
>
>  pg_size_pretty
> ----------------
>  264 MB

OK, none of that looks obviously insanely huge to me. That's a lot of hstores, but with your query I wouldn't expect them all to sit around in memory, and nothing individually is particularly huge.

> Also there are max of 34 rows per user_id in old_prefs

Thanks, I forgot to ask that. Again, nothing particularly big.

> USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START TIME COMMAND
> postgres 19121  0.0  0.0 2266944    3448 ?    Ss   15:23 0:00 postgres: pg pg 10.0.2.71(51734) idle
> postgres 19121  0.8  7.1 3081772  582712 ?    Ss   15:23 0:02 postgres: pg pg 10.0.2.71(51734) idle
> postgres 19121  1.1  8.5 3176164  697444 ?    Ss   15:23 0:05 postgres: pg pg 10.0.2.71(51734) idle
> postgres 19121  0.7  9.6 3210224  791404 ?    Ss   15:23 0:08 postgres: pg pg 10.0.2.71(51734) idle
> postgres 19121  0.9 10.8 3277412  889860 ?    Ss   15:23 0:11 postgres: pg pg 10.0.2.71(51734) idle
> postgres 19121  0.9 11.5 3277412  945560 ?    Ss   15:23 0:15 postgres: pg pg 10.0.2.71(51734) idle
> postgres 19121  1.2 16.2 3736968 1331796 ?    Ss   15:23 0:20 postgres: pg pg 10.0.2.71(51734) idle

> 1. Backend does not free allocated memory between transactions.
> 2. Rolled back transactions also leak memory.
> 3. Leaked memory is not linear to work done - 2 transactions with 200k keys will leak less than 1 transaction with 400k keys

Ouch. Sure looks like a leak to me, yeah.

Thank you for taking the time to do this. It's common to see "leaks" reported here that are really just queries that require lots of memory, so the first response tends to be somewhat cautious. This doesn't look like one of those reports.

I don't know if I can be much use with the actual tracking down of the leak, but there certainly appears to be one, and you've provided a pretty clear illustration of it.

--
Craig Ringer
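The whole diagnosis above rests on sampling the backend's resident set size with ps. The same sampling can be scripted; a small sketch that shells out to ps the way the thread does by hand (demonstrated on the current process's PID, since we have no backend here; for the bug you would pass the backend's PID):

```python
import os
import subprocess

def rss_kb(pid):
    """Resident set size of `pid` in kB, read from ps (as the thread
    does manually with `ps axu | egrep ...`)."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

print(rss_kb(os.getpid()))
```

Sampling once per transaction, on an otherwise idle backend, gives exactly the growth curve shown in the quoted ps lines.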
On 07/27/2012 10:30 AM, Craig Ringer wrote:
> Hi all
>
> Here's a fully self contained, automated test case that demonstrates
> the leak.

Gah. Except it doesn't now, as shown by the text I pasted. WTF? I was *definitely* seeing this on my system. What's changed?

Will follow up.
OK, it's certainly leaking, but not in the same drastic way I was able to reproduce manually a couple of times earlier. Self-contained test case attached.

$ bash hstore-leak-demo.sh
NOTICE:  extension "hstore" already exists, skipping
CREATE EXTENSION
DROP TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 100000
INSERT 0 80014
INSERT 0 72434
INSERT 0 50340
INSERT 0 30077
        avg         | max
--------------------+-----
 3.3286500000000000 |  21
(1 row)

Backend PID is: 4823

               USER      PID %CPU %MEM     VSZ     RSS TTY STAT START  TIME COMMAND
Before first   postgres 4823  0.0  0.0  504276    4312 ?   Ss   11:19  0:00 postgres: craig regress [local] idle
Iteration 1    postgres 4823  0.0  0.3  536908   25416 ?   Rs   11:19  0:00 postgres: craig regress [local] INSERT
Iteration 2    postgres 4823 33.0 13.1 2163384 1056560 ?   Rs   11:19  0:00 postgres: craig regress [local] INSERT
Iteration 3    postgres 4823 56.0 13.3 2163408 1072300 ?   Rs   11:19  0:01 postgres: craig regress [local] INSERT
Iteration 4    postgres 4823 58.7 13.4 2163408 1084936 ?   Rs   11:19  0:02 postgres: craig regress [local] INSERT
Iteration 20   postgres 4823 85.3 14.3 2173776 1156784 ?   Rs   11:19  0:13 postgres: craig regress [local] INSERT
Iteration 40   postgres 4823 92.0 16.3 2176848 1314700 ?   Rs   11:19  0:28 postgres: craig regress [local] INSERT
Iteration 60   postgres 4823 94.1 16.4 2173776 1322208 ?   Rs   11:19  0:43 postgres: craig regress [local] INSERT
Iteration 80   postgres 4823 96.0 16.4 2173776 1323768 ?   Rs   11:19  0:58 postgres: craig regress [local] INSERT
Iteration 100  postgres 4823 95.7 16.5 2176848 1329880 ?   Rs   11:19  1:14 postgres: craig regress [local] INSERT
Iteration 120  postgres 4823 97.1 16.4 2176848 1329132 ?   Rs   11:19  1:31 postgres: craig regress [local] INSERT
Iteration 140  postgres 4823 96.8 16.4 2176848 1329524 ?   Rs   11:19  1:48 postgres: craig regress [local] INSERT
Attachment
Craig Ringer <ringerc@ringerc.id.au> writes:
> OK, it's certainly leaking, but not in the same drastic way I was able
> to reproduce manually a couple of times earlier. Self-contained test
> case attached.

Using HEAD with stock parameters, I don't see any significant change in allocated address space (VSZ): it sits right around 170MB. The reported resident set size (RSS) starts from very little and rises to about 140MB, but I think that's just an artifact of the process touching more and more of the shared-buffers array as it runs. The actual backend memory consumption seems to be just a few meg.

I can get it to blow out memory if I set work_mem large enough to persuade the planner to use hash aggregation (whereupon it tries to run all the array_agg aggregates in parallel). However, that requires work_mem set to a couple of GB, and I don't think it's really a bug when the backend goes ahead and uses a couple of GB after I told it it could.

It's possible that the OP's problem boiled down to the planner making a drastic underestimate of the number of GROUP BY groups, which could mislead it into applying hash aggregation when there's not room; or the space used per aggregate was a lot more than the 8K that the planner assumes when dealing with array_agg. But neither of those errors seems to be happening in this example case.

regards, tom lane
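The work_mem experiment Tom describes can be reproduced interactively. The statements below are a sketch using the schema from earlier in the thread; the exact work_mem threshold at which the planner switches strategies depends on the table's statistics and is illustrative only:

```sql
-- Sketch: the planner's aggregate strategy depends on work_mem and on its
-- estimate of the number of GROUP BY groups (schema from this thread).
SET work_mem = '64MB';
EXPLAIN
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
GROUP BY user_id;
-- At modest work_mem, typically a sort-based GroupAggregate, which only
-- holds one group's aggregate state at a time.

SET work_mem = '2GB';
EXPLAIN
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
GROUP BY user_id;
-- With enough work_mem the planner may pick HashAggregate, which keeps the
-- array_agg state for *every* group in memory simultaneously.
RESET work_mem;
```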
On 07/27/2012 01:47 PM, Tom Lane wrote:
> Craig Ringer <ringerc@ringerc.id.au> writes:
>> OK, it's certainly leaking, but not in the same drastic way I was able
>> to reproduce manually a couple of times earlier. Self-contained test
>> case attached.
>
> Using HEAD with stock parameters, I don't see any significant change in
> allocated address space (VSZ): it sits right around 170MB. The reported
> resident set size (RSS) starts from very little and rises to about
> 140MB, but I think that's just an artifact of the process touching more
> and more of the shared-buffers array as it runs.

Gah. I should know better than that. Sorry.

This makes me wonder if the "leak-like" pattern I saw earlier was just a similar growth in shared_buffers, and carried on more steeply rather than tapering off because I was working with a smaller data set.

--
Craig Ringer
On 07/27/2012 10:31 AM, Craig Ringer wrote:
> Ouch. Sure looks like a leak to me, yeah.

... but it turns out I'm not thinking very straight. I forgot to check the size of your `shared_buffers' or `work_mem', and forgot to get you to report `free -m' output after each run to measure _real_ memory use.

During testing I did get a backend crash when running an INSERT - which I didn't expect given that I have only 20MB of work_mem and 256MB of shared_buffers. I was surprised by that, as I would not have expected that query to require a huge gob of RAM. I didn't dig much further as I'm on a swapless system with overcommit enabled ('cos the Java VM doesn't work with overcommit off), which isn't exactly a recommended Pg configuration.

--
Craig Ringer
On Jul 27, 2012, at 8:47 AM, Tom Lane wrote:
> Craig Ringer <ringerc@ringerc.id.au> writes:
>> OK, it's certainly leaking, but not in the same drastic way I was able
>> to reproduce manually a couple of times earlier. Self-contained test
>> case attached.
>
> Using HEAD with stock parameters, I don't see any significant change in
> allocated address space (VSZ): it sits right around 170MB. The reported
> resident set size (RSS) starts from very little and rises to about
> 140MB, but I think that's just an artifact of the process touching more
> and more of the shared-buffers array as it runs. The actual backend
> memory consumption seems to be just a few meg.
>
> I can get it to blow out memory if I set work_mem large enough to
> persuade the planner to use hash aggregation (whereupon it tries to run
> all the array_agg aggregates in parallel). However, that requires
> work_mem set to a couple of GB, and I don't think it's really a bug when
> the backend goes ahead and uses a couple of GB after I told it it could.
>
> It's possible that the OP's problem boiled down to the planner making
> a drastic underestimate of the number of GROUP BY groups, which could
> mislead it into applying hash aggregation when there's not room; or
> if the space used per aggregate was a lot more than the 8K that the
> planner assumes when dealing with array_agg. But neither of those
> errors seems to be happening in this example case.
>
> regards, tom lane

It's good that the bug is not in HEAD. I was testing on 9.1.4. Definitely the size of RSS is not just references to shared buffers, because they are 1.8G and the backend RSS got to 4G. My setting for work_mem is 64M, so it's quite conservative - the server was tuned for max concurrency, not for max throughput per single query.
Here is the plan of the insert:

=> explain INSERT INTO new_preferences SELECT user_id, hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id < 200000 GROUP BY user_id;
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Insert on new_preferences  (cost=65615.89..65617.73 rows=67 width=36)
   ->  HashAggregate  (cost=65615.89..65617.06 rows=67 width=68)
         ->  Bitmap Heap Scan on old_prefs  (cost=17645.25..56555.65 rows=1208032 width=68)
               Recheck Cond: (user_id < 200000)
               ->  Bitmap Index Scan on old_prefs_user_id_ids  (cost=0.00..17343.24 rows=1208032 width=0)
                     Index Cond: (user_id < 200000)
(6 rows)

So, it is using hash aggregate as you have suggested. I have tried the query with hash aggregation disabled and it consumes a lot less memory - a single query migrating the whole table finishes with 900M RSS.

After "ANALYZE old_prefs" the planner chooses GroupAggregate instead of HashAggregate - you were right about the missing statistics on old_prefs.

Thank you for figuring out this case.

Best regards
--
luben karavelov
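The two mitigations described here can be written out as standalone statements. This is a sketch using the table names from the thread; enable_hashagg is a session-local planner setting, while ANALYZE fixes the underlying row-group estimate and is the proper long-term fix:

```sql
-- Fix the planner's statistics so it estimates the GROUP BY group count
-- correctly and stops choosing HashAggregate when it won't fit in memory.
ANALYZE old_prefs;

-- Alternatively, force a sort-based GroupAggregate for this session only.
SET enable_hashagg = off;
INSERT INTO new_preferences
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
GROUP BY user_id;
RESET enable_hashagg;
```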
On 07/27/2012 07:52 PM, luben karavelov wrote:
>
> It's good that the bug is not in HEAD. I was testing on 9.1.4.

So was I, and while I thought I'd reproduced it I now suspect I was just seeing shared_buffers touching.

Are you able to produce a self-contained SQL test that demonstrates the leak? Does the test program I posted behave differently on your system?

--
Craig Ringer
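A self-contained test along the lines Craig asks for might start from synthetic data like this. The data volumes and name patterns are illustrative, not taken from the thread; the point is to skip ANALYZE so the planner underestimates the number of groups, mimicking the reported setup:

```sql
-- Sketch of a self-contained reproduction: synthetic old_prefs with many
-- users and a handful of prefs each, aggregated into hstore without
-- first running ANALYZE.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE old_prefs (user_id int NOT NULL, name text NOT NULL, value text NOT NULL);
INSERT INTO old_prefs
SELECT i % 100000,                -- ~100k distinct users
       'pref_' || (i % 30),       -- up to ~30 distinct pref names
       md5(i::text)
FROM generate_series(1, 1000000) AS i;
-- Deliberately no ANALYZE here: with default stats the planner may badly
-- underestimate the GROUP BY group count and choose HashAggregate.

CREATE TABLE preferences (user_id int PRIMARY KEY, prefs hstore);
INSERT INTO preferences
SELECT user_id, hstore(array_agg(name), array_agg(value))
FROM old_prefs
GROUP BY user_id;
-- Watch the backend's RSS (e.g. with ps) while the INSERT runs.
```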