Thread: BUG #8410: out of binary heap slots
The following bug has been logged on the website:

Bug reference:      8410
Logged by:          Terje Elde
Email address:      terje@elde.net
PostgreSQL version: Unsupported/Unknown
Operating system:   FreeBSD
Description:

Running: PostgreSQL 9.3beta1 on amd64-portbld-freebsd9.1, compiled by cc
(GCC) 4.2.1 20070831 patched [FreeBSD], 64-bit

I'm getting "out of binary heap slots", which of course spoils the fun of
the query. I'm having trouble reproducing it, as I'm only seeing the issue
in about 1 in 20 000 queries. I can get the error, turn right around and
run the same query again manually, and it'll run just fine. I'd love to see
if I can reproduce it on 9.3rc1, but it's rare enough as it is.

Slightly anonymised from the logs:

ERROR:  out of binary heap slots
CONTEXT:  PL/pgSQL function foo(integer,bigint,character[],timestamp
without time zone,integer,timestamp without time zone,integer,timestamp
without time zone,integer,inet,character varying) line 233 at FETCH
STATEMENT:  SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o
    FROM foo(
        100,
        2221::bigint,
        ARRAY['m', 'f', '', ' ', NULL]::char[],
        '2013-07-05T19:11:41.958154'::timestamp,
        30::int,
        NULL::timestamp,
        10::int,
        '2013-08-30T19:11:41.958168'::timestamp::timestamp,
        100::int,
        '123.123.123.123'::inet,
        'FOO/1.1 CFNetwork/609.1.4 Darwin/13.0.0'::varchar
    )

The FETCH line is running off of a cursor, with a query going pretty much
like this:

OPEN curs FOR
    SELECT * FROM (
        SELECT 'n'::char AS noo, p.id, p.pub, p.details, i.hash AS foo,
               p.bam, p.goo, e.name AS bar, p.meh, p.startt, p.endt, p.v,
               ps.s_likes AS likes, p.last_change,
               ARRAY( SELECT tag FROM foo_tag WHERE barzz = p.id ) AS tags
        FROM p, ps, i, e, s
        WHERE s.tihi = i_cid
          AND s.v = True
          AND p.pub = s.id
          AND p.id = ps.id
          AND i.id = p.foo
          AND e.id = p.bar
          AND i.baz IS NOT NULL
          AND p.bam = ANY( i_bam )
          AND endt > NOW()
          AND startt < NOW() + INTERVAL '15 minutes'
          AND p.startt > i_newerthan
        ORDER BY p.startt DESC
        LIMIT i_maxnew
        FOR UPDATE OF ps
    ) AS newer_than
    UNION
    SELECT * FROM (
        SELECT 'u'::char AS noo, p.id, p.pub, p.details, i.hash AS foo,
               p.bam, p.goo, e.name AS bar, p.meh, p.startt, p.endt, p.v,
               ps.s_likes AS likes, p.last_change,
               ARRAY( SELECT tag FROM foo_tag WHERE bazz = p.id ) AS tags
        FROM p, ps, i, e, s
        WHERE s.tihi = i_cid
          AND s.v = True
          AND p.pub = s.id
          AND p.id = ps.id
          AND i.id = p.foo
          AND e.id = p.bar
          AND i.baz IS NOT NULL
          AND p.bam = ANY( i_bam )
          AND endt > NOW()
          AND startt < NOW() + INTERVAL '15 minutes'
          AND startt > i_oldest
          AND p.last_change > i_newerthan
        ORDER BY p.last_change DESC
        LIMIT i_maxupdates
        FOR UPDATE OF ps
    ) AS updated;

Yeah, I know. Mangled/obfuscated bits in bug reports are no fun, but the
code isn't mine to put on a public list. If you'd like to take a look, I
can't imagine it'd be much of a problem for me to mail you the proper query
(and whole function) off-list.

This is not a crisis for us, and I've been meaning to clean up or rewrite
the query anyway, but I figured I should fire off an email just in case
there's a bug in PostgreSQL that hasn't been caught since beta1. (Again,
sorry about the old version.)
Explain analyze gives this plan (again anonymized a bit, but I can send the
proper one off-list):

                                  QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=423.84..424.15 rows=31 width=223) (actual time=0.668..0.675 rows=30 loops=1)
   ->  Append  (cost=35.50..422.67 rows=31 width=223) (actual time=0.123..0.608 rows=30 loops=1)
         ->  Subquery Scan on apples  (cost=35.50..412.18 rows=30 width=223) (actual time=0.123..0.600 rows=30 loops=1)
               ->  Limit  (cost=35.50..411.88 rows=30 width=217) (actual time=0.122..0.591 rows=30 loops=1)
                     ->  LockRows  (cost=35.50..2494.50 rows=196 width=217) (actual time=0.121..0.584 rows=30 loops=1)
                           ->  Nested Loop  (cost=35.50..2492.54 rows=196 width=217) (actual time=0.113..0.544 rows=30 loops=1)
                                 ->  Nested Loop  (cost=35.23..410.51 rows=196 width=207) (actual time=0.085..0.343 rows=30 loops=1)
                                       ->  Nested Loop  (cost=34.95..319.23 rows=202 width=164) (actual time=0.078..0.264 rows=32 loops=1)
                                             ->  Nested Loop  (cost=34.81..263.23 rows=211 width=147) (actual time=0.073..0.203 rows=32 loops=1)
                                                   ->  Merge Append  (cost=34.66..172.40 rows=354 width=141) (actual time=0.067..0.126 rows=32 loops=1)
                                                         Sort Key: p.startt
                                                         ->  Sort  (cost=0.01..0.02 rows=1 width=669) (actual time=0.016..0.016 rows=0 loops=1)
                                                               Sort Key: p.startt
                                                               Sort Method: quicksort  Memory: 25kB
                                                               ->  Seq Scan on cars p  (cost=0.00..0.00 rows=1 width=669) (actual time=0.001..0.001 rows=0 loops=1)
                                                                     Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
                                                         ->  Index Scan Backward using cars_startt on cars_2013 p_1  (cost=0.28..129.49 rows=351 width=136) (actual time=0.043..0.096 rows=32 loops=1)
                                                               Index Cond: ((startt < (now() + '00:15:00'::interval)) AND (startt > '2013-07-05 19:11:41.958154'::timestamp without time zone))
                                                               Filter: ((endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])))
                                                         ->  Sort  (cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0 loops=1)
                                                               Sort Key: p_2.startt
                                                               Sort Method: quicksort  Memory: 25kB
                                                               ->  Seq Scan on cars_2014 p_2  (cost=0.00..13.44 rows=1 width=669) (actual time=0.000..0.000 rows=0 loops=1)
                                                                     Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
                                                         ->  Sort  (cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0 loops=1)
                                                               Sort Key: p_3.startt
                                                               Sort Method: quicksort  Memory: 25kB
                                                               ->  Seq Scan on cars_2015 p_3  (cost=0.00..13.44 rows=1 width=669) (actual time=0.000..0.000 rows=0 loops=1)
                                                                     Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
                                                   ->  Index Scan using oranges_pkey on oranges s  (cost=0.14..0.25 rows=1 width=10) (actual time=0.002..0.002 rows=1 loops=32)
                                                         Index Cond: (id = p.agent)
                                                         Filter: (visible AND (carpool = 100))
                                             ->  Index Scan using roadworkers_pkey on roadworkers e  (cost=0.14..0.26 rows=1 width=25) (actual time=0.001..0.001 rows=1 loops=32)
                                                   Index Cond: (id = p.employee)
                                       ->  Index Scan using bikes_pkey on bikes i  (cost=0.28..0.44 rows=1 width=55) (actual time=0.002..0.002 rows=1 loops=32)
                                             Index Cond: (id = p.bike)
                                             Filter: (source IS NOT NULL)
                                             Rows Removed by Filter: 0
                                 ->  Index Scan using cars_stats_pkey on cars_stats ps  (cost=0.28..0.48 rows=1 width=18) (actual time=0.003..0.003 rows=1 loops=30)
                                       Index Cond: (id = p.id)
                                 SubPlan 2
                                   ->  Index Only Scan using fruit_tags_pkey on fruit_tags fruit_tags_1  (cost=0.28..10.14 rows=3 width=4) (actual time=0.001..0.002 rows=2 loops=30)
                                         Index Cond: (fruit = p.id)
                                         Heap Fetches: 60
         ->  Subquery Scan on updated  (cost=10.16..10.18 rows=1 width=221) (actual time=0.005..0.005 rows=0 loops=1)
               ->  Limit  (cost=10.16..10.17 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
                     ->  LockRows  (cost=10.16..10.17 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
                           ->  Sort  (cost=10.16..10.16 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
                                 Sort Key: last_change
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Result  (cost=0.00..10.15 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
                                       One-Time Filter: false
                                 SubPlan 1
                                   ->  Index Only Scan using fruit_tags_pkey on fruit_tags  (cost=0.28..10.14 rows=3 width=4) (never executed)
                                         Index Cond: (fruit = id)
                                         Heap Fetches: 0
 Total runtime: 0.885 ms
(57 rows)

Terje
On Aug 30, 2013, at 10:46 PM, terje@elde.net wrote:
> The following bug has been logged on the website:

That didn't work out too well, rendering-wise. Sending the explain analyze
by email as well, hoping it'll be more readable. (It's the same plan as the
one above.)
Hi,

On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:
> I'm getting "out of binary heap slots", which of course spoils the fun of
> the query.

> Explain analyze gives this plan (again anonymized a bit, but can send
> proper off-list):

Since I reviewed the patch that introduced that message, I'd be interested
in getting that. Ideally in a state where I can reproduce the issue in a
new cluster.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-30 23:00:15 +0200, Andres Freund wrote:
> Hi,
>
> On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:
> > I'm getting "out of binary heap slots", which of course spoils the fun of
> > the query.
>
> > Explain analyze gives this plan (again anonymized a bit, but can send
> > proper off-list):
>
> Since I reviewed the patch that introduced that message, I'd be
> interested in getting that. Ideally in a state where I can reproduce the
> issue in a new cluster.

No need, found the bug. And I think I can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 30. aug. 2013, at 23:05, Andres Freund <andres@2ndquadrant.com> wrote:
> No need, found the bug. And I think I can build a testcase myself.

Sounds good. :) Please do let me know if there's anything I can do.

Terje
Andres Freund wrote:
> No need, found the bug. And I think I can build a testcase myself.
>
> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> binaryheap. Thus no new elements fit.

Um. Are we missing a binaryheap_clear() method?

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> No need, found the bug. And I think I can build a testcase myself.
> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> binaryheap. Thus no new elements fit.

Sounds like a bug all right, but I'm not convinced it explains Terje's
report. The thing this theory doesn't explain is why would Terje be
having trouble reproducing the failure? Seems like re-running the same
query ought to produce the same failure.

regards, tom lane
On 2013-08-30 17:15:32 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > No need, found the bug. And I think I can build a testcase myself.
> >
> > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> > binaryheap. Thus no new elements fit.
>
> Um. Are we missing a binaryheap_clear() method?

In my patch-to-be it's binaryheap_reset(), but yes ;). Are you already
patching it, or do you want me to finish it?

I have a not-so-nice testcase, and I can confirm that this is the issue and
that a binaryheap_reset() fixes it:

SELECT
    (SELECT g.i
     FROM ((SELECT random()::int ORDER BY 1 OFFSET 0)
           UNION ALL
           (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i)
     ORDER BY f.i LIMIT 1)
FROM generate_series(1, 10) g(i);

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-30 17:23:51 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > No need, found the bug. And I think I can build a testcase myself.
>
> > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> > binaryheap. Thus no new elements fit.
>
> Sounds like a bug all right, but I'm not convinced it explains Terje's
> report. The thing this theory doesn't explain is why would Terje be
> having trouble reproducing the failure? Seems like re-running the same
> query ought to produce the same failure.

The number of rescans can be rather data-dependent, so I'd guess that's the
reason. If many of the subplans don't return a tuple, it can take several
resets to actually reach the heap's limit, as we don't add the subplan to
the heap in that case.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Aug 30, 2013, at 11:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sounds like a bug all right, but I'm not convinced it explains Terje's
> report. The thing this theory doesn't explain is why would Terje be
> having trouble reproducing the failure? Seems like re-running the same
> query ought to produce the same failure.

Same query, but possibly different context. The query originally runs as
part of a larger stored procedure. (I'm mailing the thing out in a separate
mail, omitting the list for that part.)

Other than the context, the explain analyze was run with the same
parameters as the failed query (copy-pasted from the logged failure). It's
possible that execution might be different, as I ran it perhaps half an
hour later. Possible, but not likely.

Terje
On 2013-08-30 23:05:25 +0200, Andres Freund wrote:
> On 2013-08-30 23:00:15 +0200, Andres Freund wrote:
> > On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:
> > > I'm getting "out of binary heap slots", which of course spoils the fun of
> > > the query.
> >
> > > Explain analyze gives this plan (again anonymized a bit, but can send
> > > proper off-list):
> >
> > Since I reviewed the patch that introduced that message, I'd be
> > interested in getting that. Ideally in a state where I can reproduce the
> > issue in a new cluster.
>
> No need, found the bug. And I think I can build a testcase myself.
>
> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> binaryheap. Thus no new elements fit.

Ok, patch for that attached. Should we add

SELECT
    (SELECT g.i
     FROM ((SELECT random()::int ORDER BY 1 OFFSET 0)
           UNION ALL
           (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i)
     ORDER BY f.i LIMIT 1)
FROM generate_series(1, 10) g(i);

as a regression test? I'm slightly on the "no" side...

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2013-08-30 17:23:51 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > No need, found the bug. And I think I can build a testcase myself.
>
> > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> > binaryheap. Thus no new elements fit.
>
> Sounds like a bug all right, but I'm not convinced it explains Terje's
> report. The thing this theory doesn't explain is why would Terje be
> having trouble reproducing the failure? Seems like re-running the same
> query ought to produce the same failure.

Even better explanation: the merge append is some steps below a LockRows
node, so the number of rescans might depend on the concurrency, because
we'll do the EvalPlanQual dance on a concurrent row update.

Terje, do you use read committed or repeatable read/serializable?

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Aug 30, 2013, at 11:49 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Even better explanation: the merge append is some steps below a LockRows
> node, so the number of rescans might depend on the concurrency, because
> we'll do the EvalPlanQual dance on a concurrent row update.
>
> Terje, do you use read committed or repeatable read/serializable?

Using Read Committed.

Terje
Andres Freund <andres@2ndquadrant.com> writes:
> Terje, do you use read committed or repeatable read/serializable?

Or even more to the point, can you apply the just-posted patch and see
if the problem goes away for you?

regards, tom lane
On Aug 31, 2013, at 12:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Or even more to the point, can you apply the just-posted patch and see
> if the problem goes away for you?

Will do.

At our current organic load though, we likely wouldn't get any good
confirmation either way this week. I'll see if I can set up some synthetic
load to provoke the error more rapidly, then patch and re-check.

It's getting quite late, and it has been a bit of a long day, so it'll
unfortunately have to wait until tomorrow, I think. Sorry. :(

Terje
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-30 23:05:25 +0200, Andres Freund wrote:
>> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
>> binaryheap. Thus no new elements fit.

> Ok, patch for that attached.

I think the comments need a bit of copy-editing, but looks good otherwise.
Will fix and commit.

> Should we add
> SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
> as a regression test? I'm slightly on the "no" side...

Not sure. It's pretty disturbing that this wasn't caught earlier; it seems
to me that means there's no regression coverage that hits
ExecReScanMergeAppend. However, I don't much like this specific test case
because it seems like hitting the bug could depend on what series of
random values you get.

regards, tom lane
On 2013-08-30 18:55:53 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-08-30 23:05:25 +0200, Andres Freund wrote:
> >> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> >> binaryheap. Thus no new elements fit.
>
> > Ok, patch for that attached.
>
> I think the comments need a bit of copy-editing, but looks good otherwise.
> Will fix and commit.

Thanks.

> > Should we add
> > SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
> > as a regression test? I'm slightly on the "no" side...
>
> Not sure. It's pretty disturbing that this wasn't caught earlier;
> it seems to me that means there's no regression coverage that hits
> ExecReScanMergeAppend. However, I don't much like this specific test case
> because it seems like hitting the bug could depend on what series of
> random values you get.

Hm, that should be fixable. How about:

SELECT
    -- correlated subquery, with dependency on outer query, to force
    -- rescans; planned as a merge-append.
    (SELECT g.i FROM (
        (SELECT * FROM generate_series(1, 2) ORDER BY 1)
        UNION ALL
        (SELECT * FROM generate_series(1, 2) ORDER BY 1)
    ) f(i)
    ORDER BY f.i LIMIT 1)
FROM generate_series(1, 3) g(i);

I couldn't find a simpler testcase within some minutes...

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-30 18:55:53 -0400, Tom Lane wrote:
>> Not sure. It's pretty disturbing that this wasn't caught earlier;
>> it seems to me that means there's no regression coverage that hits
>> ExecReScanMergeAppend. However, I don't much like this specific test case
>> because it seems like hitting the bug could depend on what series of
>> random values you get.

> Hm, that should be fixable. How about:

Looks good, applied.

regards, tom lane
On 2013-08-30 19:28:39 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-08-30 18:55:53 -0400, Tom Lane wrote:
> >> Not sure. It's pretty disturbing that this wasn't caught earlier;
> >> it seems to me that means there's no regression coverage that hits
> >> ExecReScanMergeAppend. However, I don't much like this specific test case
> >> because it seems like hitting the bug could depend on what series of
> >> random values you get.
>
> > Hm, that should be fixable. How about:
>
> Looks good, applied.

On second thought, it might not be so good-looking: the query's results are
independent of the data from the merge-append. So we only check that we
don't crash, not that the results make any sense. How about the attached
patch? I verified that it fails without the binaryheap_reset().

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Andres Freund <andres@2ndquadrant.com> writes:
> On second thought, it might not be so good-looking: the query's results
> are independent of the data from the merge-append. So we only check that
> we don't crash, not that the results make any sense. How about the
> attached patch?

Good point --- pushed.

regards, tom lane