Thread: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Frank van Vugt
Date:
L.S.

Something seems wrong here.... when applying array_agg() on a large recordset,
above a certain size 'things fall over' and memory usage races off until the
system runs out of it:

# select version();
                                      version
-----------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.1, 64-bit

# \! free -g
             total       used       free     shared    buffers     cached
Mem:            31          1         30          0          0          0
-/+ buffers/cache:           0         30
Swap:           31          0         31

====================
Create concatenate function and aggregate:

CREATE OR REPLACE FUNCTION comma_cat(text, text)
RETURNS text
LANGUAGE 'plpgsql'
IMMUTABLE
STRICT
SECURITY INVOKER
AS '
BEGIN
    IF $1 = '''' THEN
        RETURN $2;
    ELSIF $2 = '''' THEN
        RETURN $1;
    ELSE
        RETURN $1 || '', '' || $2;
    END IF;
END;';

CREATE AGGREGATE comcat(text) ( SFUNC = comma_cat, STYPE = text );

Activate timing:

\timing
Timing is on.

Create demo data:

create temp table f as
    select id, random() as value
    from generate_series(1, 1e7::int) as f(id);

Time: 7036,917 ms

====================
Don't mind the 'usefulness' of the exact query below, I ran into this issue
when experimenting a bit using random().
On my system, using the comcat() aggregate is no problem regardless of the
size of the recordset:

with g as (select * from f limit 1e5)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 189,835 ms

with g as (select * from f limit 1e6)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 1815,756 ms

with g as (select * from f)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 18660,326 ms

====================
However, using array_agg() this breaks (on my system) on the largest set:

with g as (select * from f limit 1e5)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: 361,242 ms

with g as (select * from f limit 1e6)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: 3310,347 ms

with g as (select * from f)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: <none, fails>

=> the last query eats up all 32GB main memory in seconds, then starts on the
32GB swap (which obviously takes a bit longer to digest) until eventually the
child process gets killed by the oom-killer and postgresql restarts....

--
Best,

Frank.
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Valentine Gogichashvili
Date:
Hi Frank,

this is a little bit beside the question itself, but to prevent the OOM
killer from corrupting your database, please consider this for your
production environments:
http://www.postgresql.org/docs/9.1/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

Valentine Gogichashvili

On Sat, Oct 19, 2013 at 3:38 PM, Frank van Vugt <ftm.van.vugt@foxi.nl> wrote:
> L.S.
>
> Something seems wrong here.... when applying array_agg() on a large
> recordset, above a certain size 'things fall over' and memory usage races
> off until the system runs out of it:
> [...]
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Tomas Vondra
Date:
Hi,

On 19.10.2013 21:02, Valentine Gogichashvili wrote:
> Hi Frank,
>
> this is a little bit not relevant to the question itself. But to prevent
> the OOM killer from corrupting your database please consider this for
> your production environments:
> http://www.postgresql.org/docs/9.1/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

Not sure what your suggestion is? The problem is that the database is
actually using the memory, so overcommit is not directly applicable here.
He might easily add more swap to "fix" the issue, but it would take ages to
actually execute the query because of the overhead.

Tomas
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Tomas Vondra
Date:
Hi,

On 19.10.2013 15:38, Frank van Vugt wrote:
> L.S.
>
> Something seems wrong here.... when applying array_agg() on a large
> recordset, above a certain size 'things fall over' and memory usage races
> off until the system runs out of it:
...
> with g as (select * from f)
> select comcat(id::text), min(value)
> from g
> group by g.value
> having count(1) > 1;
> Time: 18660,326 ms
...
> with g as (select * from f)
> select array_agg(id), min(value)
> from g
> group by g.value
> having count(1) > 1;
> Time: <none, fails>

Hmmmm. I initially thought that this was because of the 2x resizing of the
array in accumArrayResult, but that clearly is not the case (e.g. because
the example is constructed so that groups with multiple elements are very
unlikely).

Then I thought that it's because the array is pre-allocated with 64
elements, but that's not an issue either (at least not the main one),
because even with 1e7 elements this amounts to about 10M * 8 * 64 = ~5GB,
and the report says that it consumed ~32GB RAM and swap (not sure how much,
but probably not a small amount). On my machine it easily ate 8GB of RAM
and 4GB of swap (and then got shot by the OOM killer).

Anyway, I disabled the preallocation (i.e. 1 element initially, +1 for each
iteration), which should be ~80MB of data, but even then I was unable to
execute that query.

The effect on smaller queries (say, 1e6 rows) was negligible too - it
consumed more or less the same amount of memory, irrespective of the
preallocation.
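[The "~5GB" preallocation estimate above can be checked with a bit of
arithmetic. This is only a sketch of the numbers quoted in the mail - the
64-slot preallocation and 8-byte element size are taken from the text, not
verified against the 9.2 sources:]

```python
# Back-of-the-envelope check of the preallocation estimate quoted above.
# Assumptions (from the mail): ~1e7 groups, each preallocating 64 array
# slots of 8 bytes (one Datum on a 64-bit build) apiece.
groups = 10**7
prealloc_slots = 64
datum_bytes = 8

total_bytes = groups * prealloc_slots * datum_bytes
print(total_bytes)           # 5120000000 bytes
print(total_bytes / 2**30)   # ~4.77 GiB, i.e. the "~5GB" in the mail
```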
Then I started to wonder about the local memory context, which is defined
like this:

arr_context = AllocSetContextCreate(rcontext,
                                    "accumArrayResult",
                                    ALLOCSET_DEFAULT_MINSIZE,
                                    ALLOCSET_DEFAULT_INITSIZE,
                                    ALLOCSET_DEFAULT_MAXSIZE);

which means

arr_context = AllocSetContextCreate(rcontext,
                                    "accumArrayResult",
                                    0,
                                    (8*1024),
                                    (8*1024*1024));

I may be wrong, but it seems to me that each group has its own allocation
context, and once something gets allocated (which is pretty much guaranteed
for each group), it allocates at least 8kB of memory. With 1e6 groups
that's ~8GB (not really far from what I see here), and with 1e7 groups it's
~80GB. Not the most efficient approach for 80MB of values.

I tried this:

#define ALLOCSET_TINY_MINSIZE   0
#define ALLOCSET_TINY_INITSIZE  (1 * 256)
#define ALLOCSET_TINY_MAXSIZE   (8 * 1024)

arr_context = AllocSetContextCreate(rcontext,
                                    "accumArrayResult",
                                    ALLOCSET_TINY_MINSIZE,
                                    ALLOCSET_TINY_INITSIZE,
                                    ALLOCSET_TINY_MAXSIZE);

and after a minor tweak in aset.c, which by default enforces that init size
>= 1024:

if (initBlockSize < 1024)
    initBlockSize = 1024;

I'm able to process even the 1e7 values, although it still consumes a
significant amount of memory.

The failing query is slightly artificial, but pretty much any array_agg
query with a large number of groups is going to fail exactly the same way.
So if we could improve that somehow, that'd be nice.

However, the more I think about the local memory context, the more I think
the idea of using a dedicated memory context for each group is flawed. I
think a single context for the whole aggregate would work much better, and
indeed - after replacing the local memory context by rcontext (which is
just the aggcontext passed from array_agg_transfn) the memory consumption
was cut in half.

The other thing I think we should reevaluate is the preallocation, i.e. the
initial size (currently 64) and whether growing to 2x the size is a good
idea.
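[The per-group context overhead described above can be estimated the same
way. A sketch assuming, as the mail does, that every group's private
context ends up holding one ~8kB initial block:]

```python
# Rough arithmetic behind the "~8GB / ~80GB" estimates above.
# Assumption (from the mail): each group's private allocation context pays
# for at least one initial block of 8kB, since the default init size is
# 8kB and aset.c forces initBlockSize up to at least 1kB anyway.
INIT_BLOCK = 8 * 1024

def context_overhead_gib(n_groups, init_block=INIT_BLOCK):
    """GiB consumed if every group pays for one init_block-sized block."""
    return n_groups * init_block / 2**30

print(context_overhead_gib(10**6))   # ~7.6 GiB -> the "~8GB" in the mail
print(context_overhead_gib(10**7))   # ~76 GiB  -> the "~80GB" in the mail
```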
Based on experience and measurements with multiple custom aggregates I did
in the past few weeks, I think a significantly lower initial size (say 16,
or maybe even 8) would work equally well. We're talking about a difference
in percents vs. a higher probability of running out of memory. But let's
say it's a minor issue, and by switching to a shared memory context we've
already saved ~50% memory for cases like these (single element in a group,
1kB chunk - 64*8B = 512B).

Regarding the growth rate - I think once the array reaches a certain size
(e.g. 1024 elements), we should probably decrease the growth rate. For
example +128 instead of x2, or something like that. With the current
approach, we're pretty much bound to have ~50% overhead because of the
doubling.

regards
Tomas
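[The growth-rate trade-off Tomas describes above can be sketched in a few
lines. The thresholds here (doubling from 64 slots; switching to +128
increments past 1024 elements) are the ones suggested in the mail, not what
PostgreSQL actually implements:]

```python
# Sketch of the growth-rate trade-off: pure capacity doubling vs. switching
# to fixed +128 increments once the array reaches 1024 elements.

def cap_doubling(n, init=64):
    """Capacity after storing n elements with pure doubling."""
    cap = init
    while cap < n:
        cap *= 2
    return cap

def cap_hybrid(n, init=64, threshold=1024, step=128):
    """Double up to `threshold`, then grow by fixed `step` increments."""
    cap = init
    while cap < n and cap < threshold:
        cap *= 2
    while cap < n:
        cap += step
    return cap

n = 1025  # just past a power of two: worst case for doubling
print(cap_doubling(n) - n)  # 1023 wasted slots (~50% of the allocation)
print(cap_hybrid(n) - n)    # 127 wasted slots
```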
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Frank van Vugt
Date:
Hi Valentine,

On Saturday 19 October 2013 21:02:25, Valentine Gogichashvili wrote:
> this is a little bit not relevant to the question itself. But to prevent
> the OOM killer from corrupting your database please consider this for
> your production environments:

Sure, I understand why you'd want to mention this 'on the side', but I'm
aware of OOM-tuning. In production, I use it wherever it's _really_ needed,
but mind that the oom-killer in newer kernels is already selecting
processes a bit smarter than it used to. In the example I gave, the correct
child process was killed.

--
Best,

Frank.
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Frank van Vugt
Date:
Hi,

Interesting read so far!

On Sunday 20 October 2013 04:48:01, Tomas Vondra wrote:
> because the example is constructed so that groups with multiple elements
> are very unlikely).

Yep, the original intention of the 'experiment' was to investigate how
often random() would 'clash' on various machines with/without hardware
number generators, etc.

> the report says that it consumed ~32GB RAM and swap (not sure how
> much, but probably not a small amount). On my machine it easily ate 8GB
> of RAM and 4GB of swap (and then got shot by OOM).

The amount of swap is 32GB as well (see the output of 'free' at the top of
the report ;) ).

> Anyway, I disabled the preallocation (i.e. 1 element initially, +1 for
> each iteration) which should be ~80MB of data, but even then I was
> unable to execute that query.
>
> The effect on smaller queries (say, 1e6 rows) was negligible too - it
> consumed more or less the same amount of memory, irrespective of the
> preallocation.

On my setup, running the array_agg() over 1e6 records uses up ~5GB.

> With 1e6 groups that's ~8GB (not really far from what I see here), and
> with 1e7 groups it's ~80GB. Not the most efficient approach for 80MB of
> values.

Exactly, the 'scaling' of memory hunger was not what I'd expect from such a
recordset, which made me write up the report.

> The failing query is slightly artificial, but pretty much any array_agg
> query with a large number of groups is going to fail exactly the same.
> So if we could improve that somehow, that'd be nice.

Exactly.

> But let's say it's a minor issue, and by switching to a shared memory
> context we've already saved ~50% memory for cases like these (single
> element in a group, 1kB chunk - 64*8B = 512B).
I never ran into this earlier (and I've been using PostgreSQL for quite
some time now), and even now this is not about a 'production query', so I'm
happy with any and all benefits for future versions that come from this ;)

Having said that, should you want me to check the effects of some patch,
just let me know, I'd be happy to do that.

--
Best,

Frank.
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
"Tomas Vondra"
Date:
On 20 October 2013, 10:26, Frank van Vugt wrote:
>
> I never ran into this earlier (and I've been using PostgreSQL for quite
> some time now) and even now, this is not about a 'production query', so
> I'm happy with any and all benefits for future versions that come from
> this ;)
>
> Having said that, should you want me to check the effects of some patch,
> just let me know, I'd be happy to do that.

Attached is a quick patch removing the local memory context and using
aggcontext instead. I've also tuned down the preallocation. It's against
9.2 stable, so it should apply fine against your 9.2.4. But be careful,
it's not really tested.

Tomas
Attachment
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Valentine Gogichashvili
Date:
Hi Frank,

On Sun, Oct 20, 2013 at 10:26 AM, Frank van Vugt <ftm.van.vugt@foxi.nl> wrote:
> Sure, I understand why you'd want to mention this 'on the side', but I'm
> aware of OOM-tuning. In production, I use it wherever it's _really_
> needed, but mind that the oom-killer in newer kernels is already
> selecting processes a bit smarter than it used to. In the example I gave,
> the correct child process was killed.

I personally prefer to see ERROR: out of memory coming from Postgres,
rather than the OOM killer killing processes that it thinks should be
killed :)

Apropos, how does the string_agg aggregate behave in your example? It would
be nice to know without experimenting myself and looking into the sources :)

Regards,

-- Valentine
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Stephen Frost
Date:
* Frank van Vugt (ftm.van.vugt@foxi.nl) wrote:
> On Saturday 19 October 2013 21:02:25, Valentine Gogichashvili wrote:
> > this is a little bit not relevant to the question itself. But to
> > prevent the OOM killer from corrupting your database please consider
> > this for your production environments:
>
> Sure, I understand why you'd want to mention this 'on the side', but I'm
> aware of OOM-tuning. In production, I use it wherever it's _really_
> needed, but mind that the oom-killer in newer kernels is already
> selecting processes a bit smarter than it used to. In the example I gave,
> the correct child process was killed.

The correct child being killed doesn't actually mean that it's a *good
idea* to kill off PG child processes, in general, particularly in the way
that the OOM killer goes about it (kill -9). If it were possible to tune
the OOM killer to use a different signal, which would allow PG to actually
clean things up, it *might* be reasonable to allow it, but I still wouldn't
recommend it.

In production, for my part, proper memory accounting (and disabling of OOM)
should *always* be used.

Thanks,

Stephen
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Frank van Vugt
Date:
Hi Valentine,

On Sunday 20 October 2013 14:06:19, Valentine Gogichashvili wrote:
> Apropos, how does the string_agg aggregate behave in your example? It
> would be nice to know without experimenting myself and looking into the
> sources :)

Interesting.... On the 1e7 dataset, our custom comcat() function runs for
~18.5 seconds and eats up to ~6,5 GB of memory, while string_agg() on the
same dataset runs for ~23.5 seconds and eats up to ~16 GB of memory.

--
Best,

Frank.
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Frank van Vugt
Date:
Hi Stephen,

On Monday 21 October 2013 10:47:26, Stephen Frost wrote:
> > In production, I use it wherever it's _really_ needed, but mind
> > that the oom-killer in newer kernels is already selecting processes
> > a bit smarter than it used to. In the example I gave,
> > the correct child process was killed.
> The correct child being killed doesn't actually mean that it's a *good
> idea* to kill off PG child processes, in general, particularly in the
> way that the OOM killer goes about it (kill -9). If it was possible to
> tune the OOM killer to use a different signal, which would allow PG to
> actually clean things up, it *might* be reasonable to allow it, but I
> still wouldn't recommend it.
>
> In production, for my part, proper memory accounting (and disabling of
> OOM) should *always* be used.

I did not describe this the way I should have; I meant to say that I do
exactly that: always disable the OOM-killer in a production situation,
since there it is _really_ needed. I noticed on the development machine
where I did the testing that the correct child process was being killed,
which used to not be the case.

Thanks for emphasizing this, though.

--
Best,

Frank.
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Frank van Vugt
Date:
Hi,

On Sunday 20 October 2013 12:57:43, Tomas Vondra wrote:
> Attached is a quick patch removing the local memory context and using
> aggcontext instead. I've also tuned down the preallocation. It's against
> 9.2 stable, so it should apply fine against your 9.2.4. But be careful,
> it's not really tested.

I applied the patch and ran my queries again, with the following results:

> with g as (select * from f limit 1e5)
> select array_agg(id), min(value)
used to be: Time: 361,242 ms
now:        Time: 363,767 ms

> with g as (select * from f limit 1e6)
> select array_agg(id), min(value)
used to be: Time: 3310,347 ms
now:        Time: 2134,688 ms

> with g as (select * from f limit 1e7)
> select array_agg(id), min(value)
used to be: Time: <none, fails>
now:        Time: 23234,045 ms

The last query now uses up ~3.5GB of memory.

And as for the comcat() / string_agg() comparison:

> with g as (select * from f limit 1e7)
> select comcat(id::text), min(value)
used to be / still is: Time: ~18.5 seconds, Mem: ~6,5 GB

> with g as (select * from f limit 1e7)
> select string_agg(id::text, ', '), min(value)
used to be: Time: ~28.5 seconds, Mem: ~16 GB
now:        Time: ~28.5 seconds, Mem: ~12 GB

So, the patch seems to have the desired effect ;)

It looks like this didn't go into git yet. Will it be in v9.2.6/v9.3.1?
Since it's still a 'rough' patch, I reversed it on our development server
for now.

--
Best,

Frank.
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> On Sunday 20 October 2013 12:57:43, Tomas Vondra wrote:
>> Attached is a quick patch removing the local memory context and using
>> aggcontext instead.

> It looks like this didn't go into git yet. Will it be in v9.2.6/v9.3.1?

Certainly not. We can consider it for 9.4, if Tomas submits it to the
commitfest process, but it needs review and testing. In particular I'm
suspicious that it probably outright breaks some cases (look at
makeMdArrayResult) and results in substantial added bloat, not savings, in
others.

regards, tom lane
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
Frank van Vugt
Date:
On Thursday 31 October 2013 10:05:16, Tom Lane wrote:
> > It looks like this didn't go into git yet. Will it be in v9.2.6/v9.3.1?
>
> Certainly not. We can consider it for 9.4, if Tomas submits it to the
> commitfest process, but it needs review and testing. In particular I'm
> suspicious that it probably outright breaks some cases (look at
> makeMdArrayResult) and results in substantial added bloat, not savings,

Clear; please mind the fact that Tomas did point out to me that it was an
untested patch ;)

However, although the example with which I ran into this issue was a bit
contrived, the current behaviour of the array functions in terms of memory
usage does seem to be a bit greedy. So at least it _looks_ like a good idea
to want to do something about it, since triggering this issue in the real
world doesn't seem that far-fetched...

Tomas, are you planning to submit a patch?

--
Best,

Frank.
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
From
"Tomas Vondra"
Date:
On 31 October 2013, 15:05, Tom Lane wrote:
> Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
>> On Sunday 20 October 2013 12:57:43, Tomas Vondra wrote:
>>> Attached is a quick patch removing the local memory context and using
>>> aggcontext instead.
>
>> It looks like this didn't go into git yet. Will it be in v9.2.6/v9.3.1?
>
> Certainly not. We can consider it for 9.4, if Tomas submits it to the
> commitfest process, but it needs review and testing. In particular I'm
> suspicious that it probably outright breaks some cases (look at
> makeMdArrayResult) and results in substantial added bloat, not savings,
> in others.

Yes, I'll submit it to the next commitfest. And you're right that the
makeMdArrayResult is faulty - it shouldn't really do the
MemoryContextDelete, as I reused the context of the aggregation function.
Actually, I'm quite surprised it did not fail, even though my tests were
only really sketchy.

As for the 9.2 / 9.3 branches, I think we should discuss whether the
current state is actually correct. I'm inclined to say that code which
manages to crash a server with 64GB of virtual memory, although with very
light tweaking it can happily complete with ~3.5GB of RAM, is not exactly
right.

It seems to me that the code somehow assumes the accumulated arrays are
going to be quite large - in that case preallocating > 1kB per group is
probably fine. I'm fine with "reasonable" preallocation, but 1kB seems way
too aggressive to me. But it's not possible to use a smaller preallocation,
because this is determined by the block size check in aset.c.
So I see three options for <9.4 versions:

(a) keeping the current implementation, which crashes with cases like this
    (and I don't really like this option, because the code seems broken to
    me)

(b) keeping the local memory context but tuning it somehow - this however
    means changes in aset.c, which impacts a significant part of the
    codebase (not sure how much internal / external code relies on this
    behavior)

(c) removing the local memory context (I'm really wondering what other
    benefit it might have in this particular case) and maybe tuning the
    other parameters (e.g. the initial size / growth rate of the array,
    etc.)

I'd certainly vote for (c) in this case.

Tomas