Thread: BUG #8410: out of binary heap slots
The following bug has been logged on the website:

Bug reference:      8410
Logged by:          Terje Elde
Email address:      terje@elde.net
PostgreSQL version: Unsupported/Unknown
Operating system:   FreeBSD
Description:

Running: PostgreSQL 9.3beta1 on amd64-portbld-freebsd9.1, compiled by cc
(GCC) 4.2.1 20070831 patched [FreeBSD], 64-bit

I'm getting "out of binary heap slots", which of course spoils the fun of
the query. I'm having trouble reproducing it, as I'm only seeing the issue
in about 1 in 20 000 queries. I can get the error, turn right around and
run the same query again manually, and it'll run just fine. I'd love to see
if I can reproduce it on 9.3rc1, but it's rare enough as it is.

Slightly anonymised from the logs:

ERROR:  out of binary heap slots
CONTEXT:  PL/pgSQL function foo(integer,bigint,character[],timestamp
without time zone,integer,timestamp without time zone,integer,timestamp
without time zone,integer,inet,character varying) line 233 at FETCH
STATEMENT:  SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o
    FROM foo(
        100,
        2221::bigint,
        ARRAY['m', 'f', '', ' ', NULL]::char[],
        '2013-07-05T19:11:41.958154'::timestamp,
        30::int,
        NULL::timestamp,
        10::int,
        '2013-08-30T19:11:41.958168'::timestamp::timestamp,
        100::int,
        '123.123.123.123'::inet,
        'FOO/1.1 CFNetwork/609.1.4 Darwin/13.0.0'::varchar
    )

The FETCH line is running off of a cursor, with a query going pretty much
like this:

OPEN curs FOR
    SELECT * FROM (
        SELECT 'n'::char AS noo, p.id, p.pub, p.details, i.hash AS foo,
               p.bam, p.goo, e.name AS bar, p.meh, p.startt, p.endt, p.v,
               ps.s_likes AS likes, p.last_change,
               ARRAY( SELECT tag FROM foo_tag WHERE barzz = p.id ) AS tags
        FROM p, ps, i, e, s
        WHERE s.tihi = i_cid
          AND s.v = True
          AND p.pub = s.id
          AND p.id = ps.id
          AND i.id = p.foo
          AND e.id = p.bar
          AND i.baz IS NOT NULL
          AND p.bam = ANY( i_bam )
          AND endt > NOW()
          AND startt < NOW() + INTERVAL '15 minutes'
          AND p.startt > i_newerthan
        ORDER BY p.startt DESC
        LIMIT i_maxnew
        FOR UPDATE OF ps
    ) AS newer_than
    UNION
    SELECT * FROM (
        SELECT 'u'::char AS noo, p.id, p.pub, p.details, i.hash AS foo,
               p.bam, p.goo, e.name AS bar, p.meh, p.startt, p.endt, p.v,
               ps.s_likes AS likes, p.last_change,
               ARRAY( SELECT tag FROM foo_tag WHERE bazz = p.id ) AS tags
        FROM p, ps, i, e, s
        WHERE s.tihi = i_cid
          AND s.v = True
          AND p.pub = s.id
          AND p.id = ps.id
          AND i.id = p.foo
          AND e.id = p.bar
          AND i.baz IS NOT NULL
          AND p.bam = ANY( i_bam )
          AND endt > NOW()
          AND startt < NOW() + INTERVAL '15 minutes'
          AND startt > i_oldest
          AND p.last_change > i_newerthan
        ORDER BY p.last_change DESC
        LIMIT i_maxupdates
        FOR UPDATE OF ps
    ) AS updated;

Yeah, I know. Mangled/obfuscated bits in bug reports are no fun, but the
code isn't mine to put on a public list. If you'd like to take a look, I
can't imagine it'd be much of a problem for me to mail you the proper query
(and whole function) off-list.

This is not a crisis for us, and I've been meaning to clean up or rewrite
the query anyway, but I figured I should fire off an email just in case
there's a bug in PostgreSQL that hasn't been caught since beta1. (Again,
sorry about the old version.)
Explain analyze gives this plan (again anonymized a bit, but I can send the
proper one off-list):

                                  QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=423.84..424.15 rows=31 width=223) (actual time=0.668..0.675 rows=30 loops=1)
   ->  Append  (cost=35.50..422.67 rows=31 width=223) (actual time=0.123..0.608 rows=30 loops=1)
         ->  Subquery Scan on apples  (cost=35.50..412.18 rows=30 width=223) (actual time=0.123..0.600 rows=30 loops=1)
               ->  Limit  (cost=35.50..411.88 rows=30 width=217) (actual time=0.122..0.591 rows=30 loops=1)
                     ->  LockRows  (cost=35.50..2494.50 rows=196 width=217) (actual time=0.121..0.584 rows=30 loops=1)
                           ->  Nested Loop  (cost=35.50..2492.54 rows=196 width=217) (actual time=0.113..0.544 rows=30 loops=1)
                                 ->  Nested Loop  (cost=35.23..410.51 rows=196 width=207) (actual time=0.085..0.343 rows=30 loops=1)
                                       ->  Nested Loop  (cost=34.95..319.23 rows=202 width=164) (actual time=0.078..0.264 rows=32 loops=1)
                                             ->  Nested Loop  (cost=34.81..263.23 rows=211 width=147) (actual time=0.073..0.203 rows=32 loops=1)
                                                   ->  Merge Append  (cost=34.66..172.40 rows=354 width=141) (actual time=0.067..0.126 rows=32 loops=1)
                                                         Sort Key: p.startt
                                                         ->  Sort  (cost=0.01..0.02 rows=1 width=669) (actual time=0.016..0.016 rows=0 loops=1)
                                                               Sort Key: p.startt
                                                               Sort Method: quicksort  Memory: 25kB
                                                               ->  Seq Scan on cars p  (cost=0.00..0.00 rows=1 width=669) (actual time=0.001..0.001 rows=0 loops=1)
                                                                     Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
                                                         ->  Index Scan Backward using cars_startt on cars_2013 p_1  (cost=0.28..129.49 rows=351 width=136) (actual time=0.043..0.096 rows=32 loops=1)
                                                               Index Cond: ((startt < (now() + '00:15:00'::interval)) AND (startt > '2013-07-05 19:11:41.958154'::timestamp without time zone))
                                                               Filter: ((endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])))
                                                         ->  Sort  (cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0 loops=1)
                                                               Sort Key: p_2.startt
                                                               Sort Method: quicksort  Memory: 25kB
                                                               ->  Seq Scan on cars_2014 p_2  (cost=0.00..13.44 rows=1 width=669) (actual time=0.000..0.000 rows=0 loops=1)
                                                                     Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
                                                         ->  Sort  (cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0 loops=1)
                                                               Sort Key: p_3.startt
                                                               Sort Method: quicksort  Memory: 25kB
                                                               ->  Seq Scan on cars_2015 p_3  (cost=0.00..13.44 rows=1 width=669) (actual time=0.000..0.000 rows=0 loops=1)
                                                                     Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
                                                   ->  Index Scan using oranges_pkey on oranges s  (cost=0.14..0.25 rows=1 width=10) (actual time=0.002..0.002 rows=1 loops=32)
                                                         Index Cond: (id = p.agent)
                                                         Filter: (visible AND (carpool = 100))
                                             ->  Index Scan using roadworkers_pkey on roadworkers e  (cost=0.14..0.26 rows=1 width=25) (actual time=0.001..0.001 rows=1 loops=32)
                                                   Index Cond: (id = p.employee)
                                       ->  Index Scan using bikes_pkey on bikes i  (cost=0.28..0.44 rows=1 width=55) (actual time=0.002..0.002 rows=1 loops=32)
                                             Index Cond: (id = p.bike)
                                             Filter: (source IS NOT NULL)
                                             Rows Removed by Filter: 0
                                 ->  Index Scan using cars_stats_pkey on cars_stats ps  (cost=0.28..0.48 rows=1 width=18) (actual time=0.003..0.003 rows=1 loops=30)
                                       Index Cond: (id = p.id)
                                 SubPlan 2
                                   ->  Index Only Scan using fruit_tags_pkey on fruit_tags fruit_tags_1  (cost=0.28..10.14 rows=3 width=4) (actual time=0.001..0.002 rows=2 loops=30)
                                         Index Cond: (fruit = p.id)
                                         Heap Fetches: 60
         ->  Subquery Scan on updated  (cost=10.16..10.18 rows=1 width=221) (actual time=0.005..0.005 rows=0 loops=1)
               ->  Limit  (cost=10.16..10.17 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
                     ->  LockRows  (cost=10.16..10.17 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
                           ->  Sort  (cost=10.16..10.16 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
                                 Sort Key: last_change
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Result  (cost=0.00..10.15 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
                                       One-Time Filter: false
                                 SubPlan 1
                                   ->  Index Only Scan using fruit_tags_pkey on fruit_tags  (cost=0.28..10.14 rows=3 width=4) (never executed)
                                         Index Cond: (fruit = id)
                                         Heap Fetches: 0
 Total runtime: 0.885 ms
(57 rows)

Terje
On Aug 30, 2013, at 10:46 PM, terje@elde.net wrote:
> The following bug has been logged on the website:

That didn't work out too well, rendering-wise. Sending the explain analyze
by email as well, hoping it'll be more readable. (It's the same plan as the
one above.)
Hi,

On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:
> I'm getting "out of binary heap slots", which of course spoils the fun of
> the query.

> Explain analyze gives this plan (again anonymized a bit, but can send
> proper off-list):

Since I reviewed the patch that introduced that message, I'd be interested
in getting that. Ideally in a state where I can reproduce the issue in a
new cluster.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-30 23:00:15 +0200, Andres Freund wrote:
> Hi,
>
> On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:
> > I'm getting "out of binary heap slots", which of course spoils the fun of
> > the query.
>
> > Explain analyze gives this plan (again anonymized a bit, but can send
> > proper off-list):
>
> Since I reviewed the patch that introduced that message, I'd be
> interested in getting that. Ideally in a state where I can reproduce the
> issue in a new cluster.

No need, found the bug. And I think I can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 30. aug. 2013, at 23:05, Andres Freund <andres@2ndquadrant.com> wrote:
> No need, found the bug. And I think I can build a testcase myself.

Sounds good. :) Please do let me know if there's anything I can do.

Terje
Andres Freund wrote:
> No need, found the bug. And I think I can build a testcase myself.
>
> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> binaryheap. Thus no new elements fit.

Um. Are we missing a binaryheap_clear() method?

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> No need, found the bug. And I think I can build a testcase myself.
> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> binaryheap. Thus no new elements fit.

Sounds like a bug all right, but I'm not convinced it explains Terje's
report. The thing this theory doesn't explain is why would Terje be
having trouble reproducing the failure? Seems like re-running the same
query ought to produce the same failure.

regards, tom lane
On 2013-08-30 17:15:32 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > No need, found the bug. And I think I can build a testcase myself.
> >
> > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> > binaryheap. Thus no new elements fit.
>
> Um. Are we missing a binaryheap_clear() method?

In my patch-to-be it's binaryheap_reset(), but yes ;). Are you already
patching it, or do you want me to finish it?

I have a not-so-nice testcase, and I can confirm that this is the issue and
that a binaryheap_reset() fixes it:

SELECT
    (SELECT g.i
     FROM ((SELECT random()::int ORDER BY 1 OFFSET 0)
           UNION ALL
           (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i)
     ORDER BY f.i LIMIT 1)
FROM generate_series(1, 10) g(i);

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-30 17:23:51 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > No need, found the bug. And I think I can build a testcase myself.
>
> > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> > binaryheap. Thus no new elements fit.
>
> Sounds like a bug all right, but I'm not convinced it explains Terje's
> report. The thing this theory doesn't explain is why would Terje be
> having trouble reproducing the failure? Seems like re-running the same
> query ought to produce the same failure.

The number of rescans can be rather data-dependent, so I'd guess that's the
reason. If many of the subplans don't return a tuple, it can take several
resets to actually reach the heap's limit, as we don't add the subplan to
the heap in that case.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Aug 30, 2013, at 11:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sounds like a bug all right, but I'm not convinced it explains Terje's
> report. The thing this theory doesn't explain is why would Terje be
> having trouble reproducing the failure? Seems like re-running the same
> query ought to produce the same failure.

Same query, but possibly different context. The query originally runs as
part of a larger stored procedure. (I'm mailing the thing out in a separate
mail, omitting the list for that part.)

Other than the context, the explain analyze was run with the same
parameters as the failed query (copy-pasted from the logged failure). It's
possible that execution might be different, as I ran it perhaps half an
hour later. Possible, but not likely.

Terje
On 2013-08-30 23:05:25 +0200, Andres Freund wrote:
> On 2013-08-30 23:00:15 +0200, Andres Freund wrote:
> > On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:
> > > I'm getting "out of binary heap slots", which of course spoils the fun of
> > > the query.
> >
> > > Explain analyze gives this plan (again anonymized a bit, but can send
> > > proper off-list):
> >
> > Since I reviewed the patch that introduced that message, I'd be
> > interested in getting that. Ideally in a state where I can reproduce the
> > issue in a new cluster.
>
> No need, found the bug. And I think I can build a testcase myself.
>
> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> binaryheap. Thus no new elements fit.

Ok, patch for that attached. Should we add

SELECT
    (SELECT g.i
     FROM ((SELECT random()::int ORDER BY 1 OFFSET 0)
           UNION ALL
           (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i)
     ORDER BY f.i LIMIT 1)
FROM generate_series(1, 10) g(i);

as a regression test? I'm slightly on the "no" side...

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2013-08-30 17:23:51 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > No need, found the bug. And I think I can build a testcase myself.
>
> > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> > binaryheap. Thus no new elements fit.
>
> Sounds like a bug all right, but I'm not convinced it explains Terje's
> report. The thing this theory doesn't explain is why would Terje be
> having trouble reproducing the failure? Seems like re-running the same
> query ought to produce the same failure.

Even better explanation: the merge append is some steps below a LockRows
node, so the number of rescans might depend on the concurrency, because
we'll do the EvalPlanQual dance on a concurrent row update.

Terje, do you use read committed or repeatable read/serializable?

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Aug 30, 2013, at 11:49 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Even better explanation: the merge append is some steps below a LockRows
> node, so the number of rescans might depend on the concurrency, because
> we'll do the EvalPlanQual dance on a concurrent row update.
>
> Terje, do you use read committed or repeatable read/serializable?

Using Read Committed.

Terje
Andres Freund <andres@2ndquadrant.com> writes:
> Terje, do you use read committed or repeatable read/serializable?

Or even more to the point, can you apply the just-posted patch and see
if the problem goes away for you?

regards, tom lane
On Aug 31, 2013, at 12:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Or even more to the point, can you apply the just-posted patch and see
> if the problem goes away for you?

Will do.

At our current organic load though, we likely wouldn't get any good
confirmation either way this week. I'll see if I can set up some synthetic
load to provoke the error more rapidly, then patch and re-check.

It's getting quite late, and it has been a bit of a long day, so it'll
unfortunately have to wait until tomorrow, I think. Sorry. :(

Terje
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-30 23:05:25 +0200, Andres Freund wrote:
>> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
>> binaryheap. Thus no new elements fit.

> Ok, patch for that attached.

I think the comments need a bit of copy-editing, but looks good otherwise.
Will fix and commit.

> Should we add
> SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
> as a regression test? I'm slightly on the "no" side...

Not sure. It's pretty disturbing that this wasn't caught earlier; it seems
to me that means there's no regression coverage that hits
ExecReScanMergeAppend. However, I don't much like this specific test case
because it seems like hitting the bug could depend on what series of
random values you get.

regards, tom lane
On 2013-08-30 18:55:53 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-08-30 23:05:25 +0200, Andres Freund wrote:
> >> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> >> binaryheap. Thus no new elements fit.
>
> > Ok, patch for that attached.
>
> I think the comments need a bit of copy-editing, but looks good otherwise.
> Will fix and commit.

Thanks.

> > Should we add
> > SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
> > as a regression test? I'm slightly on the "no" side...
>
> Not sure. It's pretty disturbing that this wasn't caught earlier;
> it seems to me that means there's no regression coverage that hits
> ExecReScanMergeAppend. However, I don't much like this specific test case
> because it seems like hitting the bug could depend on what series of
> random values you get.

Hm, that should be fixable. How about:

SELECT
    -- correlated subquery, with dependency on outer query, to force
    -- rescans; planned as a merge-append.
    (SELECT g.i FROM (
        (SELECT * FROM generate_series(1, 2) ORDER BY 1)
        UNION ALL
        (SELECT * FROM generate_series(1, 2) ORDER BY 1)
    ) f(i)
    ORDER BY f.i LIMIT 1)
FROM generate_series(1, 3) g(i);

I couldn't find a simpler testcase within some minutes...

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-30 18:55:53 -0400, Tom Lane wrote:
>> Not sure. It's pretty disturbing that this wasn't caught earlier;
>> it seems to me that means there's no regression coverage that hits
>> ExecReScanMergeAppend. However, I don't much like this specific test case
>> because it seems like hitting the bug could depend on what series of
>> random values you get.

> Hm, that should be fixable. How about:

Looks good, applied.

regards, tom lane
On 2013-08-30 19:28:39 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-08-30 18:55:53 -0400, Tom Lane wrote:
> >> Not sure. It's pretty disturbing that this wasn't caught earlier;
> >> it seems to me that means there's no regression coverage that hits
> >> ExecReScanMergeAppend. However, I don't much like this specific test case
> >> because it seems like hitting the bug could depend on what series of
> >> random values you get.
>
> > Hm, that should be fixable. How about:
>
> Looks good, applied.

On second thought, it might not be so good-looking: the query's results are
independent of the data from the merge-append. So we only check that we
don't crash, not that the results make any sense. How about the attached
patch? I verified that it fails without the binaryheap_reset().

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Andres Freund <andres@2ndquadrant.com> writes:
> On second thought, it might not be so good-looking: the query's results
> are independent of the data from the merge-append. So we only check that
> we don't crash, not that the results make any sense. How about the
> attached patch?

Good point --- pushed.

regards, tom lane