Memory leak due to thinko in PL/Python error handling - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Memory leak due to thinko in PL/Python error handling |
| Date | |
| Msg-id | 1203918.1761184159@sss.pgh.pa.us Whole thread Raw |
| List | pgsql-hackers |
I have realized that there's an oversight in my recent commit
c6f7f11d8, which so far from preventing edge-case leaks as intended,
actually introduces a memory leak in the normal error-catching path.
The leakage is easy to see if you extract the
catch_python_unique_violation() test case from plpython_error.sql
and run it in a loop, like
pl_regression=# do $$
begin
for i in 1..1000000 loop
perform catch_python_unique_violation();
end loop; end$$;
The backend's VIRT size as reported by top(1) will grow steadily.
The problem is that I did not think through the fact that
PyObject_GetAttrString() acquires a refcount on the returned object,
so that after advancing to the next frame object with
tb = PyObject_GetAttrString(tb, "tb_next");
we have an extra refcount on the new "tb" object, and there's no logic
that will get rid of that. The loop that I introduced with an eye to
cleaning things up:
/* Must release all the objects in the traceback stack */
while (tb != NULL && tb != Py_None)
{
PyObject *tb_prev = tb;
tb = PyObject_GetAttrString(tb, "tb_next");
Py_DECREF(tb_prev);
}
isn't right either, since it likewise doesn't account for the
extra refcount added by PyObject_GetAttrString.
The correct fix, I believe, is as attached. If we avoid collecting
extra refcounts during PLy_traceback(), then PLy_elog_impl() can go
back to simply doing Py_XDECREF(tb) on the first frame. Any
additional frames will go away when the previous frame object is
cleaned up and drops its refcount.
I also added a couple of comments explaining why PLy_elog_impl()
doesn't try to free the strings acquired from PLy_get_spi_error_data()
or PLy_get_error_data(). That's because I got here by looking at a
Coverity complaint about how those strings might get leaked. They
are not leaked, but in testing that I discovered this other leak.
regards, tom lane
diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
index a7c664f1ca1..e109888cced 100644
--- a/src/pl/plpython/plpy_elog.c
+++ b/src/pl/plpython/plpy_elog.c
@@ -143,14 +143,7 @@ PLy_elog_impl(int elevel, const char *fmt,...)
{
Py_XDECREF(exc);
Py_XDECREF(val);
- /* Must release all the objects in the traceback stack */
- while (tb != NULL && tb != Py_None)
- {
- PyObject *tb_prev = tb;
-
- tb = PyObject_GetAttrString(tb, "tb_next");
- Py_DECREF(tb_prev);
- }
+ Py_XDECREF(tb);
/* For neatness' sake, also release our string buffers */
if (fmt)
pfree(emsg.data);
@@ -343,6 +336,17 @@ PLy_traceback(PyObject *e, PyObject *v, PyObject *tb,
tb = PyObject_GetAttrString(tb, "tb_next");
if (tb == NULL)
elog(ERROR, "could not traverse Python traceback");
+
+ /*
+ * Release the refcount that PyObject_GetAttrString acquired on the
+ * next frame object. We don't need it, because our caller has a
+ * refcount on the first frame object and the frame objects each have
+ * a refcount on the next one. If we tried to hold this refcount
+ * longer, it would greatly complicate cleanup in the event of a
+ * failure in the above PG_TRY block.
+ */
+ Py_DECREF(tb);
+
(*tb_depth)++;
}
@@ -376,6 +380,10 @@ PLy_get_sqlerrcode(PyObject *exc, int *sqlerrcode)
/*
* Extract the error data from a SPIError
+ *
+ * Note: the returned string values are pointers into the given PyObject.
+ * They must not be free()'d, and are not guaranteed to be valid once
+ * we stop holding a reference on the PyObject.
*/
static void
PLy_get_spi_error_data(PyObject *exc, int *sqlerrcode, char **detail,
@@ -412,6 +420,11 @@ PLy_get_spi_error_data(PyObject *exc, int *sqlerrcode, char **detail,
*
* Note: position and query attributes are never set for Error so, unlike
* PLy_get_spi_error_data, this function doesn't return them.
+ *
+ * Note: the returned string values are palloc'd in the current context.
+ * While our caller could pfree them later, there's no real need to do so,
+ * and it would be complicated to handle both this convention and that of
+ * PLy_get_spi_error_data.
*/
static void
PLy_get_error_data(PyObject *exc, int *sqlerrcode, char **detail, char **hint,
pgsql-hackers by date: