Thread: [PATCH] nodeindexscan with reorder memory leak
Hey!
I was investigating a leak reported in the PostGIS issues tracker [1] which led me to the Postgres side where the problem really is. The leak is reproducible with query from original ticket [1]:
[1] https://trac.osgeo.org/postgis/ticket/4720
I was investigating a leak reported in the PostGIS issues tracker [1] which led me to the Postgres side where the problem really is. The leak is reproducible with query from original ticket [1]:
WITH latitudes AS ( SELECT generate_series AS latitude FROM generate_series(-90, 90, 0.1) ), longitudes AS ( SELECT generate_series AS longitude FROM generate_series(-180, 180, 0.1) ), points AS ( SELECT ST_SetSRID(ST_Point(longitude, latitude), 4326)::geography AS geog FROM latitudes CROSS JOIN longitudes ) SELECT geog, ( SELECT name FROM ne_110m_admin_0_countries AS ne ORDER BY p.geog <-> ne.geog LIMIT 1 ) FROM points AS p ;The leak is only noticeable when index scan with reorder happens as part of subquery plan which is explained by the fact that heap tuples cloned in reorderqueue_push are not freed during flush of reorder queue in ExecReScanIndex.
[1] https://trac.osgeo.org/postgis/ticket/4720
Attachment
Aliaksandr Kalenik <akalenik@kontur.io> writes: > The leak is only noticeable when index scan with reorder happens as part of > subquery plan which is explained by the fact that heap tuples cloned in > reorderqueue_push are not freed during flush of reorder queue in > ExecReScanIndex. Hmm ... I see from the code coverage report[1] that that part of ExecReScanIndexScan isn't even reached by our existing regression tests. Seems bad :-( regards, tom lane [1] https://coverage.postgresql.org/src/backend/executor/nodeIndexscan.c.gcov.html#577
Aliaksandr Kalenik <akalenik@kontur.io> writes: > I was investigating a leak reported in the PostGIS issues tracker [1] which > led me to the Postgres side where the problem really is. The leak is > reproducible with query from original ticket [1]: > ... > The leak is only noticeable when index scan with reorder happens as part of > subquery plan which is explained by the fact that heap tuples cloned in > reorderqueue_push are not freed during flush of reorder queue in > ExecReScanIndex. Actually, that code has got worse problems than that. I tried to improve our regression tests to exercise that code path, as attached. What I got was +SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 <-> p oint(x,x) LIMIT 1) as c FROM generate_series(0,1000,1) x; +ERROR: index returned tuples in wrong order (The error doesn't always appear depending on what generate_series parameters you use, but it seems to show up consistently with a step of 1 and a limit of 1000 or more.) Fixing this is well beyond my knowledge of that code, so I'm punting it to the original authors. regards, tom lane diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql index 8b353be16e..0b60caabe4 100644 --- a/src/test/regress/sql/create_index.sql +++ b/src/test/regress/sql/create_index.sql @@ -255,6 +255,10 @@ EXPLAIN (COSTS OFF) SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10; SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10; +EXPLAIN (COSTS OFF) +SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 <-> point(x,x) LIMIT 1) as c FROM generate_series(0,1000,1)x; +SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 <-> point(x,x) LIMIT 1) as c FROM generate_series(0,1000,1)x; + -- Now check the results from bitmap indexscan SET enable_seqscan = OFF; SET enable_indexscan = OFF;
Thanks for your review! On Sun, Jan 30, 2022 at 7:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Actually, that code has got worse problems than that. I tried to improve > our regression tests to exercise that code path, as attached. What I got > was > > +SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 <-> p > oint(x,x) LIMIT 1) as c FROM generate_series(0,1000,1) x; > +ERROR: index returned tuples in wrong order I tried to figure out what is the problem with this query. This error happens when actual distance is less than estimated distance. For this specific query it happened while comparing these values: 50.263279680219099532223481219262 (actual distance returned by dist_cpoint in geo_ops.c) and 50.263279680219113743078196421266 (bounding box distance returned by computeDistance in gistproc.c). So for me it looks like this error is not really related to KNN scan code but to some floating-arithmetic issue in distance calculation functions for geometry type.Would be great to figure out a fix but for now I didn’t manage to find a better way than comparing the difference of distance with FLT_EPSILON which definitely doesn't seem like the way to fix :(
Hi! On Mon, Feb 7, 2022 at 11:42 AM Aliaksandr Kalenik <akalenik@kontur.io> wrote: > Thanks for your review! > > On Sun, Jan 30, 2022 at 7:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Actually, that code has got worse problems than that. I tried to improve > > our regression tests to exercise that code path, as attached. What I got > > was > > > > +SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 <-> p > > oint(x,x) LIMIT 1) as c FROM generate_series(0,1000,1) x; > > +ERROR: index returned tuples in wrong order > > I tried to figure out what is the problem with this query. This error > happens when actual distance is less than estimated distance. > For this specific query it happened while comparing these values: > 50.263279680219099532223481219262 (actual distance returned by > dist_cpoint in geo_ops.c) and 50.263279680219113743078196421266 > (bounding box distance returned by computeDistance in gistproc.c). > > So for me it looks like this error is not really related to KNN scan > code but to some floating-arithmetic issue in distance calculation > functions for geometry type.Would be great to figure out a fix but > for now I didn’t manage to find a better way than comparing the > difference of distance with FLT_EPSILON which definitely doesn't seem > like the way to fix :( Probably, this is caused by some compiler optimization. Could you re-check the issue with different compilers and optimization levels? Regarding the memory leak, could you add a corresponding regression test to the patch (probably similar to Tom's query upthread)? ------ Regards, Alexander Korotkov
On Mon, Feb 7, 2022 at 11:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Regarding the memory leak, could you add a corresponding regression > test to the patch (probably similar to Tom's query upthread)? Yes, added similar to Tom's query but with polygons instead of circles.
Attachment
On Thu, Feb 10, 2022 at 1:19 AM Aliaksandr Kalenik <akalenik@kontur.io> wrote: > On Mon, Feb 7, 2022 at 11:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Regarding the memory leak, could you add a corresponding regression > > test to the patch (probably similar to Tom's query upthread)? > > Yes, added similar to Tom's query but with polygons instead of circles. I've rechecked that the now patched code branch is covered by regression tests. I think the memory leak issue is independent of the computational errors we've observed. So, I'm going to push and backpatch this if no objections. ------ Regards, Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes: > I've rechecked that the now patched code branch is covered by > regression tests. I think the memory leak issue is independent of the > computational errors we've observed. > So, I'm going to push and backpatch this if no objections. +1. We should work on the roundoff-error issue as well, but as you say, it's an independent problem. regards, tom lane
On Thu, Feb 10, 2022 at 2:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <aekorotkov@gmail.com> writes: > > I've rechecked that the now patched code branch is covered by > > regression tests. I think the memory leak issue is independent of the > > computational errors we've observed. > > So, I'm going to push and backpatch this if no objections. > > +1. We should work on the roundoff-error issue as well, but > as you say, it's an independent problem. Pushed, thank you. ------ Regards, Alexander Korotkov