Re: PATCH: decreasing memory needlessly consumed by array_agg - Mailing list pgsql-hackers

From Ali Akbar
Subject Re: PATCH: decreasing memory needlessly consumed by array_agg
Date
Msg-id CACQjQLrsbbN=MQxOAAnS1k10gkK3H-EH3ERHELkWmqp-dh2dNA@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: decreasing memory needlessly consumed by array_agg  (Ali Akbar <the.apaan@gmail.com>)
Responses Re: PATCH: decreasing memory needlessly consumed by array_agg  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers

2014-12-16 11:01 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:


2014-12-16 10:47 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:

2014-12-16 6:27 GMT+07:00 Tomas Vondra <tv@fuzzy.cz>:
Just fast-viewing the patch.

The patch is not implementing the checking for not creating new context in initArrayResultArr. I think we should implement it also there for consistency (and preventing future problems).

 
Testing the performance with your query, looks promising: speedup is between 12% ~ 15%.

Because i'm using 32-bit systems, setting work_mem to 1024GB failed:
ERROR:  1073741824 is outside the valid range for parameter "work_mem" (64 .. 2097151)
STATEMENT:  SET work_mem = '1024GB';
psql:/media/truecrypt1/oss/postgresql/postgresql/../patches/array-agg.sql:20: ERROR:  1073741824 is outside the valid range for parameter "work_mem" (64 .. 2097151)

Maybe because of that, in the large groups a test, the speedup is awesome:
master: 16,819 ms
with patch: 1,720 ms
Looks like with master, postgres resort to disk, but with the patch it fits in memory.

Note: I hasn't tested the large dataset.

As expected, testing array_agg(anyarray), the performance is still the same, because the subcontext hasn't implemented there (test script modified from Tomas', attached).

I implemented the subcontext checking in initArrayResultArr by changing the v3 patch like this:
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4797,10 +4797,11 @@ initArrayResultArr(Oid array_type, Oid element_type, MemoryContext rcontext,
                   bool subcontext)
 {
    ArrayBuildStateArr *astate;
-   MemoryContext arr_context;
+   MemoryContext arr_context = rcontext;   /* by default use the parent ctx */
 
    /* Make a temporary context to hold all the junk */
-   arr_context = AllocSetContextCreate(rcontext,
+   if (subcontext)
+       arr_context = AllocSetContextCreate(rcontext,
                                        "accumArrayResultArr",
                                        ALLOCSET_DEFAULT_MINSIZE,
                                        ALLOCSET_DEFAULT_INITSIZE,

Testing the performance, it got the 12%~15% speedup. Good. (patch attached)


Looking at the modification in accumArrayResult* functions, i don't really comfortable with:
  1. Code that calls accumArrayResult* after explicitly calling initArrayResult* must always passing subcontext, but it has no effect.
  2. All existing codes that calls accumArrayResult must be changed.
Just an idea: why don't we minimize the change in API like this:
  1. Adding parameter bool subcontext, only in initArrayResult* functions but not in accumArrayResult*
  2. Code that want to not creating subcontext must calls initArrayResult* explicitly.
Other codes that calls directly to accumArrayResult can only be changed in the call to makeArrayResult* (with release=true parameter). In places that we don't want to create subcontext (as in array_agg_transfn), modify it to use initArrayResult* before calling accumArrayResult*.

What do you think?


As per your concern about calling initArrayResult* with subcontext=false, while makeArrayResult* with release=true:
  Also, it seems that using 'subcontext=false' and then 'release=true'
  would be a bug. Maybe it would be appropriate to store the
  'subcontext' value into the ArrayBuildState and then throw an error
  if makeArrayResult* is called with (release=true && subcontext=false).

Yes, i think we should do that to minimize unexpected coding errors. In makeArrayResult*, i think its better to not throwing an error, but using assertions:
Assert(release == false || astate->subcontext == true);

Regards,
--
Ali Akbar
Attachment

pgsql-hackers by date:

Previous
From: Christian Ullrich
Date:
Subject: Re: New Python vs. old PG on raccoon and jaguarundi
Next
From: Martijn van Oosterhout
Date:
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}