Thread: [BUGS] Crash with a CUBE query on 9.6

[BUGS] Crash with a CUBE query on 9.6

From
Heikki Linnakangas
Date:
The attached test case crashes on REL9_6_STABLE and master. On 9.5, it 
worked.

Stack trace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000555d4f8c64b6 in tuplesort_performsort (state=0x0) at 
tuplesort.c:1746
1746        MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
(gdb) bt
#0  0x0000555d4f8c64b6 in tuplesort_performsort (state=0x0) at 
tuplesort.c:1746
#1  0x0000555d4f59f739 in process_ordered_aggregate_single 
(aggstate=0x555d51d561d0, pertrans=0x555d51d7be78, 
pergroupstate=0x555d51d64bc0) at nodeAgg.c:1169
#2  0x0000555d4f5a033f in finalize_aggregates (aggstate=0x555d51d561d0, 
peraggs=0x555d51d647a8, pergroup=0x555d51d64bc0, currentSet=0) at 
nodeAgg.c:1588
#3  0x0000555d4f5a102f in agg_retrieve_direct (aggstate=0x555d51d561d0) 
at nodeAgg.c:2197
#4  0x0000555d4f5a09db in ExecAgg (node=0x555d51d561d0) at nodeAgg.c:1877
#5  0x0000555d4f58c2c3 in ExecProcNode (node=0x555d51d561d0) at 
execProcnode.c:503
#6  0x0000555d4f5ba47e in ExecSort (node=0x555d51d55f00) at nodeSort.c:103
#7  0x0000555d4f58c299 in ExecProcNode (node=0x555d51d55f00) at 
execProcnode.c:495
#8  0x0000555d4f58833f in ExecutePlan (estate=0x555d51d55de8, 
planstate=0x555d51d55f00, use_parallel_mode=0 '\000', 
operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=0,
      direction=ForwardScanDirection, dest=0x555d51d4ba08) at 
execMain.c:1569
#9  0x0000555d4f5862da in standard_ExecutorRun 
(queryDesc=0x555d51c5c128, direction=ForwardScanDirection, count=0) at 
execMain.c:338
#10 0x0000555d4f586153 in ExecutorRun (queryDesc=0x555d51c5c128, 
direction=ForwardScanDirection, count=0) at execMain.c:286
#11 0x0000555d4f7327f8 in PortalRunSelect (portal=0x555d51c37468, 
forward=1 '\001', count=0, dest=0x555d51d4ba08) at pquery.c:948
#12 0x0000555d4f73247f in PortalRun (portal=0x555d51c37468, 
count=9223372036854775807, isTopLevel=1 '\001', dest=0x555d51d4ba08, 
altdest=0x555d51d4ba08, completionTag=0x7ffc26ce4350 "")
      at pquery.c:789
#13 0x0000555d4f72c2b6 in exec_simple_query (
      query_string=0x555d51cb8898 "SELECT CASE WHEN sale.qty < 10 THEN 1 
ELSE 2 END as newalias1, TO_CHAR(COALESCE(STDDEV(DISTINCT 
floor(sale.cn)),0),'99999999.9999999'),TO_CHAR(COALESCE(AVG(DISTINCT 
floor(sale.vn*sale.vn)),0),'9999999"...) at postgres.c:1094

- Heikki


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

Re: [BUGS] Crash with a CUBE query on 9.6

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> The attached test case crashes on REL9_6_STABLE and master. On 9.5, it
> worked.

As best I can tell, this is the fault of commit 804163bc2.  The problem
query can be simplified to

SELECT
  STDDEV(DISTINCT floor(sale.cn)),
  AVG(DISTINCT floor(sale.cn))
FROM sale,vendor
WHERE sale.vn=vendor.vn
GROUP BY CUBE((sale.cn,sale.dt,sale.dt),(sale.pn,sale.prc),(sale.qty,sale.vn)),sale.qty order by 1,2 ;

The two aggregates share a "pertrans" state, but finalize_aggregates
does not account for that and calls process_ordered_aggregate_single()
twice on the same pertrans state.  The second time crashes because we
already deleted the tuplesort object the first time.

Probably, the loop in finalize_aggregates needs to be split into two,
one over the pertrans states and then a second one over the peragg states.
But this code has been hacked up enough since I last looked at it that
I'm hesitant to try to fix it myself.

            regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Crash with a CUBE query on 9.6

From
Heikki Linnakangas
Date:
On 12/19/2016 09:37 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> The attached test case crashes on REL9_6_STABLE and master. On 9.5, it
>> worked.
>
> As best I can tell, this is the fault of commit 804163bc2.

Oh, the ball is right back in my court, then.

> The problem query can be simplified to
>
> SELECT
>   STDDEV(DISTINCT floor(sale.cn)),
>   AVG(DISTINCT floor(sale.cn))
> FROM sale,vendor
> WHERE sale.vn=vendor.vn
> GROUP BY CUBE((sale.cn,sale.dt,sale.dt),(sale.pn,sale.prc),(sale.qty,sale.vn)),sale.qty order by 1,2 ;

And further:

SELECT
   STDDEV(DISTINCT g::float8),
   AVG(DISTINCT g::float8)
FROM generate_series(1, 1000) g;

> The two aggregates share a "pertrans" state, but finalize_aggregates
> does not account for that and calls process_ordered_aggregate_single()
> twice on the same pertrans state.  The second time crashes because we
> already deleted the tuplesort object the first time.
>
> Probably, the loop in finalize_aggregates needs to be split into two,
> one over the pertrans states and then a second one over the peragg states.
> But this code has been hacked up enough since I last looked at it that
> I'm hesitant to try to fix it myself.

Yes, that seems straightforward. I came up with the attached. Will 
commit tomorrow, barring objections.

Thanks for the debugging!

- Heikki

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

Re: [BUGS] Crash with a CUBE query on 9.6

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 12/19/2016 09:37 PM, Tom Lane wrote:
>> Probably, the loop in finalize_aggregates needs to be split into two,
>> one over the pertrans states and then a second one over the peragg states.
>> But this code has been hacked up enough since I last looked at it that
>> I'm hesitant to try to fix it myself.

> Yes, that seems straightforward. I came up with the attached. Will 
> commit tomorrow, barring objections.

Code patch looks reasonable, but I do not get a crash on the proposed test
case with current code.  I suspect that integer avg() and sum() don't
actually share transstates, making the test case a bit off-point.

            regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Crash with a CUBE query on 9.6

From
Heikki Linnakangas
Date:
On 12/19/2016 11:22 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 12/19/2016 09:37 PM, Tom Lane wrote:
>>> Probably, the loop in finalize_aggregates needs to be split into two,
>>> one over the pertrans states and then a second one over the peragg states.
>>> But this code has been hacked up enough since I last looked at it that
>>> I'm hesitant to try to fix it myself.
>
>> Yes, that seems straightforward. I came up with the attached. Will
>> commit tomorrow, barring objections.
>
> Code patch looks reasonable, but I do not get a crash on the proposed test
> case with current code.  I suspect that integer avg() and sum() don't
> actually share transstates, making the test case a bit off-point.

Hmm, it does crash here:

postgres=# select my_avg(distinct one),my_sum(distinct one) from 
(values(1),(3),(1)) t(one);
NOTICE:  avg_transfn called with 1
NOTICE:  avg_transfn called with 3
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

In the patch I sent yesterday, the expected output was wrong, though. 
The query there didn't use DISTINCT, so it didn't crash.

Also  note that the test case uses my_avg() and my_sum() rather than the 
built-ins. If you replace them with the built-ins, it indeed doesn't 
crash because sum(int) and avg(int) have different transition functions. 
But this does crash with the built-ins, too:

postgres=# select avg(distinct one::numeric), sum(distinct one::numeric) 
from (values(1),(3),(1)) t(one);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Fixed the expected output, and committed. Thanks!

- Heikki



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Crash with a CUBE query on 9.6

From
Andres Freund
Date:
Hi,

On 2016-12-19 22:49:10 +0200, Heikki Linnakangas wrote:
> On 12/19/2016 09:37 PM, Tom Lane wrote:
> > Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > > The attached test case crashes on REL9_6_STABLE and master. On 9.5, it
> > > worked.
> > 
> > As best I can tell, this is the fault of commit 804163bc2.
> 
> Oh, the ball is right back in my court, then.

> Yes, that seems straightforward. I came up with the attached. Will commit
> tomorrow, barring objections.

No objections.

But an observation about how the code evolved lately (partially by me,
don't get me wrong): I think we've too much smarts at execution
time. IMO deduplicating transition states and such should much rather
done at plan time, and if we do it right we'd also get rid of the uggly
way AggState->aggs is built up...

Andres


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Crash with a CUBE query on 9.6

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> But an observation about how the code evolved lately (partially by me,
> don't get me wrong): I think we've too much smarts at execution
> time. IMO deduplicating transition states and such should much rather
> done at plan time, and if we do it right we'd also get rid of the uggly
> way AggState->aggs is built up...

I've looked at that before, and concluded that it wouldn't do that much
beyond shuffling code from point A to point B.  What it would do that
I wouldn't like is widen the API between the planner and executor
(and everything else that looks at plan trees).  For instance, most of
nodeAgg.c's internal structs would have to become public, creating
hazards if we need to change them in a minor release.

You could make an argument that having a better handle on the number of
transition states would give the planner a more accurate gauge of the
use of workmem in HashAgg --- but AFAICT, that is far from the number
one problem we have in hashagg memory estimation, so I can't get excited
about it.

Not saying "no", but I think it'd be a lot of work for not much return.
We have lots of other fish more worth frying.

            regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Crash with a CUBE query on 9.6

From
David Rowley
Date:
On 20 December 2016 at 20:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> postgres=# select avg(distinct one::numeric), sum(distinct one::numeric)
> from (values(1),(3),(1)) t(one);
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> Fixed the expected output, and committed. Thanks!

Many thanks for fixing this Heikki, and Tom for the investigation.

Sorry for not chipping in or helping. It was summer holiday time down
this end of the planet.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs