Thread: Backend crash with user defined aggregate

Backend crash with user defined aggregate

From: Ron Mayer

I tried making an aggregate function that concatenates
any non-null strings it encounters, as shown below.

The function works fine on small tables, but when I run
it on one of my larger tables it appears to crash the
backend - though I don't see anything obvious in the log.


According to 'top', the postmaster grows pretty quickly,
so perhaps it ran out of memory?   The query does a GROUP BY,
and no individual group should produce that big a result,
so in principle it shouldn't need to keep everything in
memory if it sorted the rows first.

Any workarounds or other clever ways of doing this
kind of aggregation?  I guess a stored procedure that
reads one group at a time?
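
Something like this is roughly what I had in mind as a workaround - a
rough, untested sketch that loops over one group at a time, so only a
single group's strings are ever aggregated in memory at once.  It uses
the aggregate and operator defined below, and it assumes ext_doc_id is
an integer and that plpgsql is installed; the function name is just
made up:

CREATE TABLE new_keywords (ext_doc_id integer, keywords text);

CREATE OR REPLACE FUNCTION build_new_keywords() RETURNS void AS '
DECLARE
    doc RECORD;
BEGIN
    -- one pass per distinct ext_doc_id, so only one group is
    -- aggregated at a time
    FOR doc IN SELECT DISTINCT ext_doc_id FROM facet_raw LOOP
        INSERT INTO new_keywords
            SELECT doc.ext_doc_id,
                   strcat_agg(nam) ||+ strcat_agg(val)
              FROM facet_raw
             WHERE ext_doc_id = doc.ext_doc_id;
    END LOOP;
END;
' LANGUAGE plpgsql;

SELECT build_new_keywords();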

   Thanks in advance,
   Ron

------------------------------------------------------------
--- The definition of my aggregate
------------------------------------------------------------
-- transition function: concatenate two text values, skipping NULLs
CREATE OR REPLACE FUNCTION nonull_append_strings (
        text, text )
RETURNS text AS '
SELECT CASE WHEN $1 IS NULL THEN $2
        WHEN $2 IS NULL THEN $1
        ELSE $1 || '' '' || $2
        END;
' LANGUAGE sql IMMUTABLE;

-- convenience operator for the same function
CREATE OPERATOR ||+ (
        LEFTARG = TEXT,
        RIGHTARG = TEXT,
        PROCEDURE = nonull_append_strings
);

-- the aggregate itself (no initcond, so the state starts out NULL)
CREATE AGGREGATE strcat_agg (
    SFUNC = nonull_append_strings,
    BASETYPE = text,
    STYPE = text
);
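
Just to show the sort of small case that works fine for me - a tiny
made-up test, so the temp table and values here are hypothetical:

CREATE TEMP TABLE strcat_test (grp int, s text);
INSERT INTO strcat_test VALUES (1, 'foo');
INSERT INTO strcat_test VALUES (1, NULL);
INSERT INTO strcat_test VALUES (1, 'bar');
INSERT INTO strcat_test VALUES (2, NULL);
SELECT grp, strcat_agg(s) FROM strcat_test GROUP BY grp;
-- group 1 yields 'foo bar' (concatenation order follows row order,
-- which isn't guaranteed); group 2 yields NULL since it has no
-- non-null strings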



------------------------------------------------------------
--- Example using the function
------------------------------------------------------------
fli=#
fli=#   create table new_keywords as
        select ext_doc_id,strcat_agg(nam)||+strcat_agg(val)
        from facet_raw group by ext_doc_id;
FATAL:  terminating connection due to administrator command
CONTEXT:  SQL function "nonull_append_strings" statement 1
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
fli=#


Re: Backend crash with user defined aggregate

From: Tom Lane
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> I tried making an aggregate function that concatenates
> any non-null strings it encounters, as shown below.
> The function works fine on small tables; but when I run
> it on one of my larger tables it seems to me that it
> crashes the backend - though I don't see anything obvious
> in the log.

I tried this example in CVS tip and didn't see any particular misbehavior.
What PG version are you running?  Does EXPLAIN show that the query is
being done with a group or hash aggregation?  What do you have sort_mem
set to?

            regards, tom lane

Re: Backend crash with user defined aggregate

From: Ron Mayer
Tom Lane wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>>...aggregate function that concatenates
>>any non-null strings it encounters...crashes...
>
> I tried this example in CVS tip and didn't see any particular misbehavior.

One thing to note is that my table is quite large (20 GB or so of data,
most of it in the 'val' column being aggregated).  When I try it on
small tables (100k rows) I see no problems.


> What PG version are you running?

fli=# select version();
                                        version
-------------------------------------------------------------------------------------
  PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)
(1 row)

> Does EXPLAIN show that the query is
> being done with a group or hash aggregation?

It fails the same way in each case.  It was doing a HashAggregate last
time, but I just reproduced it now with a GroupAggregate.


fli=#
fli=# explain select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id;
                                 QUERY PLAN
---------------------------------------------------------------------------
  HashAggregate  (cost=216543.88..216737.30 rows=9671 width=46)
    ->  Seq Scan on facet_raw  (cost=0.00..118746.32 rows=5588432 width=46)
(2 rows)

fli=# set enable_hashagg = false;
SET
fli=# explain select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id;
                                    QUERY PLAN
---------------------------------------------------------------------------------
  GroupAggregate  (cost=951426.23..1063388.29 rows=9671 width=46)
    ->  Sort  (cost=951426.23..965397.31 rows=5588432 width=46)
          Sort Key: ext_doc_id
          ->  Seq Scan on facet_raw  (cost=0.00..118746.32 rows=5588432 width=46)
(4 rows)

fli=# explain analyze select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
fli=#


>  What do you have sort_mem set to?


fli=# show sort_mem;
  work_mem
----------
  51829
(1 row)

fli=#


Re: Backend crash with user defined aggregate

From: Tom Lane
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> Fails the same way in each case.  It was doing a HashAggregate last time; but
> I just reproduced it now with a GroupAggregate.

> fli=# explain analyze select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id;
> FATAL:  terminating connection due to administrator command

Hmm.  Now that I look at it again, that message implies that something
sent SIGTERM to the backend.  You need to figure out where that signal
is coming from and why.  I think I have heard of platforms on which
SIGTERM is generated if a process exceeds a preset run-time limit;
is it possible that you've started your postmaster with finite limits
on process runtime?  What is the platform exactly, anyway?

            regards, tom lane

Re: Backend crash with user defined aggregate

From: Ron Mayer
Tom Lane wrote:
>
> Hmm.  Now that I look at it again, that message implies that something
> sent SIGTERM to the backend.  You need to figure out where that signal
> is coming from and why.  ...
> is it possible that you've started your postmaster with finite limits
> on process runtime?  What is the platform exactly, anyway?

Thanks. That's something for me to look into Monday.  I don't think
there was a process runtime limit, but there might have been a memory
limit.  The platform was SuSE 9.1 on x86, but with so many updates &
upgrades from so many sources (rpm, apt, rcd, yast, tar.gz) that it's
hard to tell what it really is anymore.   Monday I'll try to reproduce
on a cleaner system and/or look for the SIGTERM with strace or something.