Thread: Backend crash with user defined aggregate
I tried making an aggregate function that concatenates any non-null strings it encounters, as shown below. The function works fine on small tables; but when I run it on one of my larger tables it seems to me that it crashes the backend - though I don't see anything obvious in the log. According to 'top', the postmaster grows pretty quickly so perhaps it ran out of memory? The query I ran is doing a group-by and no individual value in the group by clause should have that big a result; so hypothetically it seems it wouldn't need to put everything in memory if it sorted them first. Any workarounds or other clever ways of doing this kind of aggregation? I guess a stored procedure that reads one group at a time? Thanks in advance, Ron ------------------------------------------------------------ --- The definition of my aggregte ------------------------------------------------------------ CREATE OR REPLACE FUNCTION nonull_append_strings ( text, text ) RETURNS text AS ' SELECT CASE WHEN $1 IS NULL THEN $2 WHEN $2 IS NULL THEN $1 ELSE $1 || '' '' || $2 END; ' LANGUAGE sql IMMUTABLE; CREATE OPERATOR ||+ ( LEFTARG = TEXT, RIGHTARG = TEXT, PROCEDURE = nonull_append_strings ); create aggregate strcat_agg ( sfunc = nonull_append_strings, basetype = text, stype = text ); ------------------------------------------------------------ --- Example using the function ------------------------------------------------------------ fli=# fli=# create table new_keywords as select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id; FATAL: terminating connection due to administrator command CONTEXT: SQL function "nonull_append_strings" statement 1 server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. fli=#
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > I tried making an aggregate function that concatenates > any non-null strings it encounters, as shown below. > The function works fine on small tables; but when I run > it on one of my larger tables it seems to me that it > crashes the backend - though I don't see anything obvious > in the log. I tried this example in CVS tip and didn't see any particular misbehavior. What PG version are you running? Does EXPLAIN show that the query is being done with a group or hash aggregation? What do you have sort_mem set to? regards, tom lane
Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: >>...aggregate function that concatenates >>any non-null strings it encounters...crashes... > > I tried this example in CVS tip and didn't see any particular misbehavior. One thing is that my table is quite large (20 GB or so of data in that table with most of it in that column called 'val' being aggregated). When I try it on small tables (100k rows) I get no problems. > What PG version are you running? fli=# select version(); version ------------------------------------------------------------------------------------- PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux) (1 row) > Does EXPLAIN show that the query is > being done with a group or hash aggregation? Fails the same way in each case. It was doing a HashAggregate last time; but I just reproduced it now with a GroupAggregate. fli=# fli=# explain select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id; QUERY PLAN --------------------------------------------------------------------------- HashAggregate (cost=216543.88..216737.30 rows=9671 width=46) -> Seq Scan on facet_raw (cost=0.00..118746.32 rows=5588432 width=46) (2 rows) fli=# set enable_hashagg = false; SET fli=# explain select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id; QUERY PLAN --------------------------------------------------------------------------------- GroupAggregate (cost=951426.23..1063388.29 rows=9671 width=46) -> Sort (cost=951426.23..965397.31 rows=5588432 width=46) Sort Key: ext_doc_id -> Seq Scan on facet_raw (cost=0.00..118746.32 rows=5588432 width=46) (4 rows) fli=# explain analyze select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id; FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. fli=# > What do you have sort_mem set to? fli=# show sort_mem; work_mem ---------- 51829 (1 row) fli=#
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Fails the same way in each case. It was doing a HashAggregate last time; but > I just reproduced it now with a GroupAggregate. > fli=# explain analyze select ext_doc_id,strcat_agg(nam)||+strcat_agg(val) from facet_raw group by ext_doc_id; > FATAL: terminating connection due to administrator command Hmm. Now that I look at it again, that message implies that something sent SIGTERM to the backend. You need to figure out where that signal is coming from and why. I think I have heard of platforms on which SIGTERM is generated if a process exceeds a preset run-time limit; is it possible that you've started your postmaster with finite limits on process runtime? What is the platform exactly, anyway? regards, tom lane
Tom Lane wrote: > > Hmm. Now that I look at it again, that message implies that something > sent SIGTERM to the backend. You need to figure out where that signal > is coming from and why. ... > is it possible that you've started your postmaster with finite limits > on process runtime? What is the platform exactly, anyway? Thanks. That's something for me to look into monday. I don't think there was a process runtime, but there might have been a memory limit. The platform was a SuSE 9.1 on x86 but with so many updates & upgrades from so many sources (rpm, apt, rcd, yast, tar.gz) that it's hard to tell what it really is anymore. Monday I'll try to reproduce on a cleaner system and/or look for the SIGTERM with strace or something.