Re: PATCH: decreasing memory needlessly consumed by array_agg - Mailing list pgsql-hackers
From | Ali Akbar |
---|---|
Subject | Re: PATCH: decreasing memory needlessly consumed by array_agg |
Date | |
Msg-id | CACQjQLrsbbN=MQxOAAnS1k10gkK3H-EH3ERHELkWmqp-dh2dNA@mail.gmail.com Whole thread Raw |
In response to | Re: PATCH: decreasing memory needlessly consumed by array_agg (Ali Akbar <the.apaan@gmail.com>) |
Responses |
Re: PATCH: decreasing memory needlessly consumed by array_agg
|
List | pgsql-hackers |
2014-12-16 11:01 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:
Looks like with master, postgres resort to disk, but with the patch it fits in memory.2014-12-16 10:47 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:2014-12-16 6:27 GMT+07:00 Tomas Vondra <tv@fuzzy.cz>:Just fast-viewing the patch.
The patch is not implementing the checking for not creating new context in initArrayResultArr. I think we should implement it also there for consistency (and preventing future problems).
Testing the performance with your query, looks promising: speedup is between 12% ~ 15%.
Because i'm using 32-bit systems, setting work_mem to 1024GB failed:
Because i'm using 32-bit systems, setting work_mem to 1024GB failed:
ERROR: 1073741824 is outside the valid range for parameter "work_mem" (64 .. 2097151)
STATEMENT: SET work_mem = '1024GB';
psql:/media/truecrypt1/oss/postgresql/postgresql/../patches/array-agg.sql:20: ERROR: 1073741824 is outside the valid range for parameter "work_mem" (64 .. 2097151)
Maybe because of that, in the large groups a test, the speedup is awesome:
master: 16,819 ms
with patch: 1,720 ms
Note: I hasn't tested the large dataset.
As expected, testing array_agg(anyarray), the performance is still the same, because the subcontext hasn't implemented there (test script modified from Tomas', attached).
I implemented the subcontext checking in initArrayResultArr by changing the v3 patch like this:
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4797,10 +4797,11 @@ initArrayResultArr(Oid array_type, Oid element_type, MemoryContext rcontext,
bool subcontext)
{
ArrayBuildStateArr *astate;
- MemoryContext arr_context;
+ MemoryContext arr_context = rcontext; /* by default use the parent ctx */
/* Make a temporary context to hold all the junk */
- arr_context = AllocSetContextCreate(rcontext,
+ if (subcontext)
+ arr_context = AllocSetContextCreate(rcontext,
"accumArrayResultArr",
ALLOCSET_DEFAULT_MINSIZE,
ALLOCSET_DEFAULT_INITSIZE,
Testing the performance, it got the 12%~15% speedup. Good. (patch attached)
Looking at the modification in accumArrayResult* functions, i don't really comfortable with:Just an idea: why don't we minimize the change in API like this:
- Code that calls accumArrayResult* after explicitly calling initArrayResult* must always passing subcontext, but it has no effect.
- All existing codes that calls accumArrayResult must be changed.
Other codes that calls directly to accumArrayResult can only be changed in the call to makeArrayResult* (with release=true parameter). In places that we don't want to create subcontext (as in array_agg_transfn), modify it to use initArrayResult* before calling accumArrayResult*.
- Adding parameter bool subcontext, only in initArrayResult* functions but not in accumArrayResult*
- Code that want to not creating subcontext must calls initArrayResult* explicitly.
What do you think?
As per your concern about calling initArrayResult* with subcontext=false, while makeArrayResult* with release=true:
Also, it seems that using 'subcontext=false' and then 'release=true'
would be a bug. Maybe it would be appropriate to store the
'subcontext' value into the ArrayBuildState and then throw an error
if makeArrayResult* is called with (release=true && subcontext=false).
Yes, i think we should do that to minimize unexpected coding errors. In makeArrayResult*, i think its better to not throwing an error, but using assertions:
Assert(release == false || astate->subcontext == true);
Regards,
--
Ali Akbar
Attachment
pgsql-hackers by date: