Thread: array_agg crash?
I had one simple query that kept crashing the connection. It crashes after several minutes.
Tried restarting, it still error’d at the same place.
Tried recreating the table it was selecting from, it still error’d at the same place.
I rewrote the query with an ARRAY subselect and it finished flawlessly in a few seconds.
There is about 4 million records in the table its selecting from. No array ends up with more than 4 elements.
For some reason, the log indicates this is causing an issue with autovacuum…it says it was -9’d, but it wasn’t by or any other physical person.
Here is the problem query…
create table public.temptrips
as
select trip_id,array_agg(customer_upload_id)
from
trip_ids_to_customer_upload_ids
group by trip_id;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
LOG: 00000: autovacuum launcher process (PID 10264) was terminated by signal 9: Killed
2009-07-21 08:44:26 EDT - LOCATION: LogChildExit, postmaster.c:2673
2009-07-21 08:44:26 EDT - LOG: 00000: terminating any other active server processes
2009-07-21 08:44:26 EDT - LOCATION: HandleChildCrash, postmaster.c:2500
2009-07-21 08:44:26 EDT - WARNING: 57P02: terminating connection because of crash of another server process
2009-07-21 08:44:26 EDT - DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2009-07-21 08:44:26 EDT - HINT: In a moment you should be able to reconnect to the database and repeat your command.
2009-07-21 08:44:26 EDT - LOCATION: quickdie, postgres.c:2495
2009-07-21 08:44:27 EDT - postgres postgres /usr/local/pgsql/bin/postmaster FATAL: 57P03: the database system is in recovery mode
2009-07-21 08:44:27 EDT - postgres postgres /usr/local/pgsql/bin/postmaster LOCATION: ProcessStartupPacket, postmaster.c:1721
2009-07-21 08:44:26 EDT - postgres postgres startup WARNING: 57P02: terminating connection because of crash of another server process
2009-07-21 08:44:27 EDT - postgres postgres startup DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2009-07-21 08:44:27 EDT - postgres postgres startup HINT: In a moment you should be able to reconnect to the database and repeat your command.
2009-07-21 08:44:27 EDT - postgres postgres startup LOCATION: quickdie, postgres.c:2495
2009-07-21 08:44:28 EDT - LOG: 00000: all server processes terminated; reinitializing
2009-07-21 08:44:28 EDT - LOCATION: PostmasterStateMachine, postmaster.c:2858
2009-07-21 08:44:28 EDT - LOG: 00000: database system was interrupted; last known up at 2009-07-21 08:41:32 EDT
2009-07-21 08:44:28 EDT - LOCATION: StartupXLOG, xlog.c:5236
2009-07-21 08:44:28 EDT - LOG: 00000: database system was not properly shut down; automatic recovery in progress
2009-07-21 08:44:28 EDT - LOCATION: StartupXLOG, xlog.c:5410
2009-07-21 08:44:28 EDT - LOG: 00000: redo starts at 76/4380AC70
2009-07-21 08:44:28 EDT - LOCATION: StartupXLOG, xlog.c:5493
2009-07-21 08:44:29 EDT - LOG: 00000: record with zero length at 76/438869D0
2009-07-21 08:44:29 EDT - LOCATION: ReadRecord, xlog.c:3532
2009-07-21 08:44:29 EDT - LOG: 00000: redo done at 76/438869A0
2009-07-21 08:44:29 EDT - LOCATION: StartupXLOG, xlog.c:5625
2009-07-21 08:44:29 EDT - LOG: 00000: last completed transaction was at log time 2009-07-21 08:41:49.707423-04
2009-07-21 08:44:29 EDT - LOCATION: StartupXLOG, xlog.c:5629
2009-07-21 08:44:30 EDT - LOG: 00000: autovacuum launcher started
2009-07-21 08:44:30 EDT - LOCATION: AutoVacLauncherMain, autovacuum.c:529
2009-07-21 08:44:30 EDT - LOG: 00000: database system is ready to accept connections
2009-07-21 08:44:30 EDT - LOCATION: reaper, postmaster.c:2272
It looks like this is causing the autovacuum to crash, what could cause this?
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8.0.1), 64-bit
(1 row)
Chris
"Chris Spotts" <rfusca@gmail.com> writes: > LOG: 00000: autovacuum launcher process (PID 10264) was terminated by > signal 9: Killed Looks like the OOM killer is loose on your system. Disable memory overcommit in the kernel and things will get better. http://www.postgresql.org/docs/8.4/static/kernel-resources.html#AEN24152 regards, tom lane
> "Chris Spotts" <rfusca@gmail.com> writes: > > LOG: 00000: autovacuum launcher process (PID 10264) was terminated > by > > signal 9: Killed > > Looks like the OOM killer is loose on your system. Disable memory > overcommit in the kernel and things will get better. > http://www.postgresql.org/docs/8.4/static/kernel- > resources.html#AEN24152 > > regards, tom lane [Spotts, Christopher] That seems to have fixed it in the sense that it doesn't crash, but I killed it after 15 minutes and no results...rather I tried to kill it, but it looks like I'm going to have to -9 it...
"Chris Spotts" <rfusca@gmail.com> writes: > That seems to have fixed it in the sense that it doesn't crash, but I killed > it after 15 minutes and no results...rather I tried to kill it, but it looks > like I'm going to have to -9 it... How big were the arrays you were trying to push around here? I tried interrupting a similar query and it came right back; but if you were building some really enormous arrays I could see the final array build loop taking awhile, and there's no CHECK_FOR_INTERRUPTS in it... regards, tom lane
> killed > > it after 15 minutes and no results...rather I tried to kill it, but > it looks > > like I'm going to have to -9 it... > > How big were the arrays you were trying to push around here? I tried > interrupting a similar query and it came right back; but if you were > building some really enormous arrays I could see the final array > build loop taking awhile, and there's no CHECK_FOR_INTERRUPTS in it... > > regards, tom lane [Spotts, Christopher] Like I'd said originally, there were no arrays that ended up being more 4 elements long - all integers. The vast majority of them were 1 or 2 long. If it was having trouble allocating the memory for arrays, why doesn't using a ARRAY(subselect) yield the same issue?
"Chris Spotts" <rfusca@gmail.com> writes: >>> killed >>> it after 15 minutes and no results...rather I tried to kill it, but >>> it looks like I'm going to have to -9 it... >> >> How big were the arrays you were trying to push around here? > Like I'd said originally, there were no arrays that ended up being more 4 > elements long - all integers. The vast majority of them were 1 or 2 long. Hm, maybe the problem is not so much array size as too many arrays. How many groups are you expecting in that query? Does the plan for the array_agg query show hash or group aggregation? regards, tom lane
> >> How big were the arrays you were trying to push around here? > > > Like I'd said originally, there were no arrays that ended up being > more 4 > > elements long - all integers. The vast majority of them were 1 or 2 > long. > > Hm, maybe the problem is not so much array size as too many arrays. > How > many groups are you expecting in that query? Does the plan for the > array_agg query show hash or group aggregation? > > regards, tom lane [Spotts, Christopher] GroupAggregate
"Chris Spotts" <rfusca@gmail.com> writes: >> many groups are you expecting in that query? Does the plan for the >> array_agg query show hash or group aggregation? > GroupAggregate Huh, there's no reason it should take much memory then. Maybe you've found a memory leak. Can you put together a self-contained test case? regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, July 21, 2009 12:16 PM > To: Chris Spotts > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] array_agg crash? > > "Chris Spotts" <rfusca@gmail.com> writes: > >> many groups are you expecting in that query? Does the plan for the > >> array_agg query show hash or group aggregation? > > > GroupAggregate > > Huh, there's no reason it should take much memory then. Maybe you've > found a memory leak. Can you put together a self-contained test case? > > regards, tom lane [Spotts, Christopher] I'll try to put something together tonight and let you know tomorrow.
Tom Lane wrote: > "Chris Spotts" <rfusca@gmail.com> writes: > >>> many groups are you expecting in that query? Does the plan for the >>> array_agg query show hash or group aggregation? >>> > > >> GroupAggregate >> > > Huh, there's no reason it should take much memory then. Maybe you've > found a memory leak. Can you put together a self-contained test case? > > regards, tom lane > What do you want specifically as far as details for the test case? I exported just the table that that was reading from. Installed a new clean virtual machine ubuntu (jaunty) and then installed 8.4.0. Imported the table and definition. Ran the same query and the same thing happened. Table its selecting from is: Table "public.trip_ids_to_customer_upload_ids" Column | Type | Modifiers | Storage | Description --------------------+---------+-----------+---------+------------- trip_id | bigint | | plain | customer_upload_id | integer | | plain | Indexes: "trips_customer_id" btree (trip_id, customer_upload_id) Has OIDs: no There is 3801347 rows in the table. There are 3773039 distinct trip_id values. So you can see that the vast majority of rows here are just a single element array.
Chris Spotts <rfusca@gmail.com> writes: > What do you want specifically as far as details for the test case? I > exported just the table that that was reading from. Installed a new > clean virtual machine ubuntu (jaunty) and then installed 8.4.0. > Imported the table and definition. Ran the same query and the same > thing happened. The table dump and the query would be enough then. Can you send it to me off-list? regards, tom lane