Re: PATCH: decreasing memory needlessly consumed by array_agg - Mailing list pgsql-hackers

From Ali Akbar
Subject Re: PATCH: decreasing memory needlessly consumed by array_agg
Date
Msg-id CACQjQLq523T8NDG0fT+inxjOLGTKidJRsF78+kxUiBCQt8rFxQ@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: decreasing memory needlessly consumed by array_agg  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: PATCH: decreasing memory needlessly consumed by array_agg  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Re: PATCH: decreasing memory needlessly consumed by array_agg  (Ali Akbar <the.apaan@gmail.com>)
List pgsql-hackers
Another positive benefit is that this won't break the code unless it
uses the new API. This is a problem especially with external code (e.g.
extensions), but the new API (initArray*) is not part of 9.4 so there's
no such code. So that's nice.

The one annoying thing is that this makes the API slighly unbalanced.
With the new API you can use a shared memory context, which with the old
one (not using the initArray* methods) you can't.

But I'm OK with that, and it makes the patch smaller (15kB -> 11kB).

Yes, with this API, we can backpatch this patch to 9.4 (or below) if we need it there.

I think this API is a good compromise of old API and new API. Ideally if we can migrate all code to new API (all code must call initArrayResult* before accumArrayResult*), we can remove parameter MemoryContext rcontext from accumArrayResult. Currently, the code isn't using the rcontext for anything except for old API calls (in first call to accumArrayResult).

2014-12-21 20:38 GMT+07:00 Tomas Vondra <tv@fuzzy.cz>:
On 21.12.2014 02:54, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> Attached is v5 of the patch, fixing an error with releasing a shared
>> memory context (invalid flag values in a few calls).
>
> The functions that gain a new argument should get their comment updated,
> to explain what the new argument is for.

Right. I've added a short description of the 'subcontext' parameter to
all three variations of the initArray* function, and a more thorough
explanation to initArrayResult().


With this API, i think we should make it clear if we call initArrayResult with subcontext=false, we can't call makeArrayResult, but we must use makeMdArrayResult directly.

Or better, we can modify makeArrayResult to release according to astate->private_cxt:
@@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate,
    dims[0] = astate->nelems;
    lbs[0] = 1;
 
-   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true);
+   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, astate->private_cxt);



Or else we implement what you suggest below (more comments below):

Thinking about the 'release' flag a bit more - maybe we could do this
instead:

    if (release && astate->private_cxt)
        MemoryContextDelete(astate->mcontext);
    else if (release)
    {
        pfree(astate->dvalues);
        pfree(astate->dnulls);
        pfree(astate);
    }

i.e. either destroy the whole context if possible, and just free the
memory when using a shared memory context. But I'm afraid this would
penalize the shared memory context, because that's intended for cases
where all the build states coexist in parallel and then at some point
are all converted into a result and thrown away. Adding pfree() calls is
no improvement here, and just wastes cycles.

As per Tom's comment, i'm using "parent memory context" instead of "shared memory context" below.

In the future, if some code writer decided to use subcontext=false, to save memory in cases where there are many array accumulation, and the parent memory context is long-living, current code can cause memory leak. So i think we should implement your suggestion (pfreeing astate), and warn the implication in the API comment. The API user must choose between release=true, wasting cycles but preventing memory leak, or managing memory from the parent memory context.

In one possible use case, for efficiency maybe the caller will create a special parent memory context for all array accumulation. Then uses makeArrayResult* with release=false, and in the end releasing the parent memory context once for all.

Regards,
--
Ali Akbar

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: parallel mode and parallel contexts
Next
From: Michael Paquier
Date:
Subject: Re: btree_gin and ranges