Memory leak due to thinko in PL/Python error handling - Mailing list pgsql-hackers

From Tom Lane
Subject Memory leak due to thinko in PL/Python error handling
Date
Msg-id 1203918.1761184159@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
I have realized that there's an oversight in my recent commit
c6f7f11d8, which so far from preventing edge-case leaks as intended,
actually introduces a memory leak in the normal error-catching path.
The leakage is easy to see if you extract the
catch_python_unique_violation() test case from plpython_error.sql
and run it in a loop, like

pl_regression=# do $$                                  
begin
for i in 1..1000000 loop
perform catch_python_unique_violation();
end loop; end$$;

The backend's VIRT size as reported by top(1) will grow steadily.

The problem is that I did not think through the fact that
PyObject_GetAttrString() acquires a refcount on the returned object,
so that after advancing to the next frame object with

         tb = PyObject_GetAttrString(tb, "tb_next");

we have an extra refcount on the new "tb" object, and there's no logic
that will get rid of that.  The loop that I introduced with an eye to
cleaning things up:

        /* Must release all the objects in the traceback stack */
        while (tb != NULL && tb != Py_None)
        {
            PyObject   *tb_prev = tb;

            tb = PyObject_GetAttrString(tb, "tb_next");
            Py_DECREF(tb_prev);
        }

isn't right either, since it likewise doesn't account for the
extra refcount added by PyObject_GetAttrString.

The correct fix, I believe, is as attached.  If we avoid collecting
extra refcounts during PLy_traceback(), then PLy_elog_impl() can go
back to simply doing Py_XDECREF(tb) on the first frame.  Any
additional frames will go away when the previous frame object is
cleaned up and drops its refcount.

I also added a couple of comments explaining why PLy_elog_impl()
doesn't try to free the strings acquired from PLy_get_spi_error_data()
or PLy_get_error_data().  That's because I got here by looking at a
Coverity complaint about how those strings might get leaked.  They
are not leaked, but in testing that I discovered this other leak.

            regards, tom lane

diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
index a7c664f1ca1..e109888cced 100644
--- a/src/pl/plpython/plpy_elog.c
+++ b/src/pl/plpython/plpy_elog.c
@@ -143,14 +143,7 @@ PLy_elog_impl(int elevel, const char *fmt,...)
     {
         Py_XDECREF(exc);
         Py_XDECREF(val);
-        /* Must release all the objects in the traceback stack */
-        while (tb != NULL && tb != Py_None)
-        {
-            PyObject   *tb_prev = tb;
-
-            tb = PyObject_GetAttrString(tb, "tb_next");
-            Py_DECREF(tb_prev);
-        }
+        Py_XDECREF(tb);
         /* For neatness' sake, also release our string buffers */
         if (fmt)
             pfree(emsg.data);
@@ -343,6 +336,17 @@ PLy_traceback(PyObject *e, PyObject *v, PyObject *tb,
         tb = PyObject_GetAttrString(tb, "tb_next");
         if (tb == NULL)
             elog(ERROR, "could not traverse Python traceback");
+
+        /*
+         * Release the refcount that PyObject_GetAttrString acquired on the
+         * next frame object.  We don't need it, because our caller has a
+         * refcount on the first frame object and the frame objects each have
+         * a refcount on the next one.  If we tried to hold this refcount
+         * longer, it would greatly complicate cleanup in the event of a
+         * failure in the above PG_TRY block.
+         */
+        Py_DECREF(tb);
+
         (*tb_depth)++;
     }

@@ -376,6 +380,10 @@ PLy_get_sqlerrcode(PyObject *exc, int *sqlerrcode)

 /*
  * Extract the error data from a SPIError
+ *
+ * Note: the returned string values are pointers into the given PyObject.
+ * They must not be free()'d, and are not guaranteed to be valid once
+ * we stop holding a reference on the PyObject.
  */
 static void
 PLy_get_spi_error_data(PyObject *exc, int *sqlerrcode, char **detail,
@@ -412,6 +420,11 @@ PLy_get_spi_error_data(PyObject *exc, int *sqlerrcode, char **detail,
  *
  * Note: position and query attributes are never set for Error so, unlike
  * PLy_get_spi_error_data, this function doesn't return them.
+ *
+ * Note: the returned string values are palloc'd in the current context.
+ * While our caller could pfree them later, there's no real need to do so,
+ * and it would be complicated to handle both this convention and that of
+ * PLy_get_spi_error_data.
  */
 static void
 PLy_get_error_data(PyObject *exc, int *sqlerrcode, char **detail, char **hint,

pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: Include extension path on pg_available_extensions
Next
From: Tatsuo Ishii
Date:
Subject: Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options