Re: PATCH: decreasing memory needlessly consumed by array_agg - Mailing list pgsql-hackers

From Ali Akbar
Subject Re: PATCH: decreasing memory needlessly consumed by array_agg
Date
Msg-id CACQjQLoYZeBBH3noRoRxafoMbxXpreBCe6JiyYN6iLTSm3Ep2A@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: decreasing memory needlessly consumed by array_agg  (Ali Akbar <the.apaan@gmail.com>)
Responses Re: PATCH: decreasing memory needlessly consumed by array_agg
List pgsql-hackers
In the CF, the status becomes "Needs Review". Let's continue our discussion of makeArrayResult* behavior if subcontext=false and release=true (more below):
2014-12-22 8:08 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:

With this API, i think we should make it clear if we call initArrayResult with subcontext=false, we can't call makeArrayResult, but we must use makeMdArrayResult directly.

Or better, we can modify makeArrayResult to release according to astate->private_cxt:
@@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate,
    dims[0] = astate->nelems;
    lbs[0] = 1;
 
-   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true);
+   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, astate->private_cxt);

Or else we implement what you suggest below (more comments below):

Thinking about the 'release' flag a bit more - maybe we could do this
instead:

    if (release && astate->private_cxt)
        MemoryContextDelete(astate->mcontext);
    else if (release)
    {
        pfree(astate->dvalues);
        pfree(astate->dnulls);
        pfree(astate);
    }

i.e. either destroy the whole context if possible, and just free the
memory when using a shared memory context. But I'm afraid this would
penalize the shared memory context, because that's intended for cases
where all the build states coexist in parallel and then at some point
are all converted into a result and thrown away. Adding pfree() calls is
no improvement here, and just wastes cycles.

As per Tom's comment, i'm using "parent memory context" instead of "shared memory context" below.

In the future, if some code writer decided to use subcontext=false, to save memory in cases where there are many array accumulation, and the parent memory context is long-living, current code can cause memory leak. So i think we should implement your suggestion (pfreeing astate), and warn the implication in the API comment. The API user must choose between release=true, wasting cycles but preventing memory leak, or managing memory from the parent memory context.

In one possible use case, for efficiency maybe the caller will create a special parent memory context for all array accumulation. Then uses makeArrayResult* with release=false, and in the end releasing the parent memory context once for all.

As for the v6 patch:
- the patch applies cleanly to master
- make check is successfull
- memory benefit is still there
- performance benefit i think is negligible

Reviewing the code, found this:
@@ -573,7 +578,22 @@ array_agg_array_transfn(PG_FUNCTION_ARGS)
         elog(ERROR, "array_agg_array_transfn called in non-aggregate context");
     }
 
-    state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *) PG_GETARG_POINTER(0);
+
+    if (PG_ARGISNULL(0))
+    {
+        Oid            element_type = get_element_type(arg1_typeid);
+
+        if (!OidIsValid(element_type))
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("data type %s is not an array type",
+                            format_type_be(arg1_typeid))));

digging more, it looks like those code required because accumArrayResultArr checks the element type:
        /* First time through --- initialize */
        Oid            element_type = get_element_type(array_type);

        if (!OidIsValid(element_type))
            ereport(ERROR,
                    (errcode(ERRCODE_DATATYPE_MISMATCH),
                     errmsg("data type %s is not an array type",
                            format_type_be(array_type))));
        astate = initArrayResultArr(array_type, element_type, rcontext, true);

I think it should be in initArrayResultArr, because it is an initialization-only check, shouldn't it?

Regards,
--
Ali Akbar

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Fillfactor for GIN indexes
Next
From: Ashutosh Bapat
Date:
Subject: Re: Transactions involving multiple postgres foreign servers