PATCH: decreasing memory needlessly consumed by array_agg - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | PATCH: decreasing memory needlessly consumed by array_agg |
Date | |
Msg-id | 5334D7A5.2000907@fuzzy.cz Whole thread Raw |
Responses |
Re: PATCH: decreasing memory needlessly consumed by array_agg
Re: PATCH: decreasing memory needlessly consumed by array_agg |
List | pgsql-hackers |
Hi, this is a patch for issue reported in October 2013 in pgsql-bugs: http://www.postgresql.org/message-id/3839201.Nfa2RvcheX@techfox.foxi Frank van Vugt reported that a simple query with array_agg() and large number of groups (1e7) fails because of OOM even on machine with 32GB of RAM. So for example doing this: CREATE TABLE test (a INT, b INT); INSERT INTO test SELECT i, i FROM generate_series(1,10000000) s(i); SELECT a, array_agg(b) FROM test GROUP BY a; allocates huge amounts of RAM and easily forces the machine into swapping and eventually gets killed by OOM (on my workstation with 8GB of RAM that happens almost immediately). Upon investigation, it seems caused by a combination of issues: (1) per-group memory contexts - each group state uses a dedicated memory context, which is defined like this (in accumArrayResult): arr_context = AllocSetContextCreate(rcontext, "accumArrayResult", ALLOCSET_DEFAULT_MINSIZE, ALLOCSET_DEFAULT_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE); which actually means this arr_context = AllocSetContextCreate(rcontext, "accumArrayResult", 0, (8*1024), (8*1024*1024)); so each group will allocate at least 8kB of memory (of the first palloc call). With 1e7 groups, that's ~80GB of RAM, even if each group contains just 1 item. (2) minimum block size in aset.c - The first idea I got was to decrease the block size in the allocator. So I decreased it to 256B but I was still getting OOM. Then I found that aset.c contains this: if (initBlockSize < 1024) initBlockSize = 1024; so effectively the lowest allowed block size is 1kB. Which means ~10GB of memory for the state data (i.e. not considering overhead of the hash table etc., which is not negligible). Considering we're talking about 1e7 32-bit integers, i.e. 40MB of raw data, that's still pretty excessive (250x more). While I question whether the 1kB minimum block size makes sense, I really think per-group memory contexts don't make much sense here. What is the point of per-group memory contexts? The memory will be allocated when the first row of the group is received, and won't be allocated until the whole result set is processed. At least that's how it works for Hash Aggregate. However that's exactly how it would work with a single memory context, which has the significant benefit that all the groups share the same memory (so the minimum block size is not an issue). That is exactly what the patch aims to do - it removes the per-group memory contexts and reuses the main memory context of the aggregate itself. The patch also does one more thing - it changes how the arrays (in the aggregate state) grow. Originally it worked like this /* initial size */ astate->alen = 64; /* when full, grow exponentially */ if (astate->nelems >= astate->alen) astate->alen *= 2; so the array length would grow like this 64 -> 128 -> 256 -> 512 ... (note we're talking about elements, not bytes, so with with 32-bit integers it's actually 256B -> 512B -> 1024B -> ...). While I do understand the point of this (minimizing palloc overhead), I find this pretty dangerous, especially in case of array_agg(). I've modified the growth like this: /* initial size */ astate->alen = 4; /* when full, grow exponentially */ if (astate->nelems >= astate->alen) astate->alen += 4; I admit that might be a bit too aggressive, and maybe there's a better way to do this - with better balance between safety and speed. I was thinking about something like this: /* initial size */ astate->alen = 4; /* when full, grow exponentially */ if (astate->nelems >= astate->alen) if (astate->alen < 128) astate->alen *= 2; else astate->alen += 128; i.e. initial size with exponential growth, but capped at 128B. regards Tomas
Attachment
pgsql-hackers by date: