Thread: autovacuum process (PID ...) was terminated by signal 11
Hi. When I'm doing a database load of a 5gb database, autovacuum always segfaults shortly after the load finishes. The load is being done via slony during the initial copy set command while building a slave, not through pg_restore. LOG: autovacuum process (PID ...) was terminated by signal 11 LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. This problem sounds similiar to these reports: http://archives.postgresql.org/pgsql-bugs/2005-09/msg00200.php and http://archives.postgresql.org/pgsql-bugs/2005-11/msg00276.php here is the stack for the autovacuum process. Program received signal SIGSEGV, Segmentation fault. CopySnapshot (snapshot=0x0) at tqual.c:1301 1301 newsnap = (Snapshot) palloc(sizeof(SnapshotData) + (gdb) bt #0 CopySnapshot (snapshot=0x0) at tqual.c:1301 #1 0x0812388c in _SPI_execute_plan (plan=0x83cd3e0, Values=0x841adf0, Nulls=0x841ae00 " ", snapshot=0x0, crosscheck_snapshot=0x0, read_only=1 '\001', tcount=1) at spi.c:1433 #2 0x0812215a in SPI_execute_plan (plan=0x83cd3e0, Values=0x841adf0, Nulls=0x841ae00 " ", read_only=1 '\001', tcount=1) at spi.c:336 #3 0x403cc0ef in ?? () #4 0x403c90e9 in ?? () #5 0x403c8668 in ?? () #6 0x403c84e7 in ?? () #7 0x403c83f1 in ?? () #8 0x403c7651 in ?? () #9 0x403c442a in ?? () #10 0x081137c8 in ExecMakeFunctionResult (fcache=0x83fcb88, econtext=0x83fcb18, isNull=0xbfffe94b "?\020???", isDone=0x0) at execQual.c:1095 #11 0x0811554e in ExecEvalExprSwitchContext (expression=0x83fcb88, econtext=0x83fcb18, isNull=0xbfffe94b "?\020???", isDone=0x0) at execQual.c:2864 #12 0x080b20e6 in FormIndexDatum (indexInfo=0x83ad7d0, slot=0x83fea98, estate=0x83fca90, values=0xbfffea10, isnull=0xbfffe9f0 "??:\b\n") at index.c:971 #13 0x080dd832 in compute_index_stats (onerel=0x401f6f58, totalrows=11323, ---Type <return> to continue, or q <return> to quit--- indexdata=0x83ad3d0, nindexes=4, rows=0x83e9940, numrows=3000, col_context=0x83fab88) at analyze.c:540 #14 0x080dd4b7 in analyze_rel (relid=32465292, vacstmt=0x83a48d8) at analyze.c:387 #15 0x08109365 in vacuum (vacstmt=0x83a48d8, relids=0x83a5540) at vacuum.c:476 #16 0x0815b4bd in autovacuum_do_vac_analyze (relids=0x83a5540, dovacuum=0, doanalyze=1, freeze=0) at autovacuum.c:907 #17 0x0815b16d in do_autovacuum (dbentry=0x82fb410) at autovacuum.c:681 #18 0x0815acb7 in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:423 #19 0x0815aa01 in autovac_start () at autovacuum.c:170 #20 0x0815fe84 in ServerLoop () at postmaster.c:1269 #21 0x0815f80f in PostmasterMain (argc=2, argv=0x82d6158) at postmaster.c:943 #22 0x0812a9fb in main (argc=2, argv=0x82d6158) at main.c:256 #23 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6 (gdb) -------------------------------------------- MobyGames http://www.mobygames.com The world's largest and most comprehensive gaming database project
Brian Hirt <bhirt@mobygames.com> writes: > When I'm doing a database load of a 5gb database, autovacuum always > segfaults shortly after the load finishes. This sure looks like the same bug already fixed in 8.1.1: 2005-11-28 08:35 alvherre * src/backend/postmaster/autovacuum.c: Set a snapshot before running analyze on a single table, to avoid a crash when analyzing tables with expressional indexes. Per report from Frank van Vugt. regards, tom lane
that's strange, because I'm running 8.1.1. [root@pgdb01 root]# /usr/pg-8.1/bin/postmaster --version postmaster (PostgreSQL) 8.1.1 Is there more information i can provide to help find the problem? On Jan 4, 2006, at 10:04 AM, Tom Lane wrote: > Brian Hirt <bhirt@mobygames.com> writes: >> When I'm doing a database load of a 5gb database, autovacuum always >> segfaults shortly after the load finishes. > > This sure looks like the same bug already fixed in 8.1.1: > > 2005-11-28 08:35 alvherre > > * src/backend/postmaster/autovacuum.c: Set a snapshot before > running analyze on a single table, to avoid a crash when analyzing > tables with expressional indexes. > > Per report from Frank van Vugt. > > regards, tom lane -------------------------------------------- MobyGames http://www.mobygames.com The world's largest and most comprehensive gaming database project
On 1/4/06, Brian Hirt <bhirt@mobygames.com> wrote: > that's strange, because I'm running 8.1.1. > what Tom is saying is that a patch was applied after 8.1.1 was launched... it will be fixed in 8.1.2 and if you are installing from sources you can apply yourself the patch to your tree source recompile and you will have your problem solved now... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Brian Hirt <bhirt@mobygames.com> writes: > Is there more information i can provide to help find the problem? How about the schema of the table in question? If the backtrace is to be trusted, it's OID 32465292. Does it crash if you ANALYZE that table manually? regards, tom lane
On Wed, Jan 04, 2006 at 12:20:28PM -0500, Jaime Casanova wrote: > On 1/4/06, Brian Hirt <bhirt@mobygames.com> wrote: > > that's strange, because I'm running 8.1.1. > > what Tom is saying is that a patch was applied after 8.1.1 was > launched... Is that what Tom is saying? The commit message he posted had a date of 2005-11-28; 8.1.1 wasn't tagged until 2005-12-08. -- Michael Fuhr
On 1/4/06, Michael Fuhr <mike@fuhr.org> wrote: > On Wed, Jan 04, 2006 at 12:20:28PM -0500, Jaime Casanova wrote: > > On 1/4/06, Brian Hirt <bhirt@mobygames.com> wrote: > > > that's strange, because I'm running 8.1.1. > > > > what Tom is saying is that a patch was applied after 8.1.1 was > > launched... > > Is that what Tom is saying? The commit message he posted had a > date of 2005-11-28; 8.1.1 wasn't tagged until 2005-12-08. > > -- > Michael Fuhr > sorry... you are right... my memory is not as good as some years ago... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Tom, I can analyze that table without problems. I don't know if it's the same table every time. I'm trying to set up a development environment where i can test this stuff better without messing up our production systems. The table does have an expresional index. basement=# select relname from pg_class where oid = 32465292; relname --------------- developer_aka (1 row) basement=# ANALYZE verbose developer_aka; ANALYZE basement=# \d developer_aka Table "public.developer_aka" Column | Type | Modifiers ------------------+------------------------ +---------------------------------------------------------------- developer_id | integer | developer_aka_id | integer | not null default nextval (('developer_id_seq'::text)::regclass) first_name | character varying(255) | last_name | character varying(255) | search_name | character varying(255) | Indexes: "developer_aka_pkey" PRIMARY KEY, btree (developer_aka_id) "developer_aka_developer_id_inde" btree (developer_id) "developer_aka_name_idx" btree (name_idx(first_name, last_name)) "developer_aka_search" btree (search_name) Foreign-key constraints: "$1" FOREIGN KEY (developer_id) REFERENCES developer(developer_id) Triggers: developer_modified AFTER INSERT OR UPDATE ON developer_aka FOR EACH ROW EXECUTE PROCEDURE add_developer_mod() function definition: CREATE FUNCTION name_idx(character varying, character varying) RETURNS character varying AS $_$ DECLARE n varchar; BEGIN select upper($1) || ' ' || upper($2) into n; return n; END; $_$ LANGUAGE plpgsql IMMUTABLE; --brian On Jan 4, 2006, at 10:25 AM, Tom Lane wrote: > Brian Hirt <bhirt@mobygames.com> writes: >> Is there more information i can provide to help find the problem? > > How about the schema of the table in question? If the backtrace is > to be trusted, it's OID 32465292. > > Does it crash if you ANALYZE that table manually? > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings -------------------------------------------- MobyGames http://www.mobygames.com The world's largest and most comprehensive gaming database project
Brian Hirt <bhirt@mobygames.com> writes: > I can analyze that table without problems. I don't know if it's the > same table every time. I'm trying to set up a development > environment where i can test this stuff better without messing up our > production systems. The table does have an expresional index. I've managed to reproduce this: the triggering condition is that a single autovac iteration has to VACUUM one table and then ANALYZE (no vac) another table that has an expressional index with a plpgsql function. It looks like we missed a path of control where ActiveSnapshot has to be re-set-up, but I'm not clear where. regards, tom lane
Cool, I was just writing to let you know I created an easily reproducible test case too, but I guess you don't need that now. Let me know if there is anything else I can do to help. Also, if a patch is produced, I'd love to get a copy of it. We just upgraded our production servers to 8.1.1 this morning (this issue never came up during testing) and I'l like to get this in there because it's likely to happen again. --brian On Jan 4, 2006, at 11:31 AM, Tom Lane wrote: > Brian Hirt <bhirt@mobygames.com> writes: >> I can analyze that table without problems. I don't know if it's the >> same table every time. I'm trying to set up a development >> environment where i can test this stuff better without messing up our >> production systems. The table does have an expresional index. > > I've managed to reproduce this: the triggering condition is that a > single autovac iteration has to VACUUM one table and then ANALYZE > (no vac) another table that has an expressional index with a plpgsql > function. It looks like we missed a path of control where > ActiveSnapshot has to be re-set-up, but I'm not clear where. > > regards, tom lane -------------------------------------------- MobyGames http://www.mobygames.com The world's largest and most comprehensive gaming database project
Brian Hirt <bhirt@mobygames.com> writes: > I was just writing to let you know I created an easily reproducible > test case too, but I guess you don't need that now. Does your test case agree with my description of the problem? (If you're not sure, crank up log_min_messages and watch the log to see what autovacuum does before it crashes.) The problem I'm seeing seems to be that vacuum() exits with a transaction started but no snapshot set (StartTransactionCommand will leave ActiveSnapshot set to NULL), if it's in use_own_xacts mode. autovac thinks it only has to set the snapshot once, but that's really not the case given this behavior. I'm unsure whether to fix this by adding a CopySnapshot operation right in vacuum(), or in autovacuum.c. The former is probably cleaner but it clutters vacuum.c with something only autovac really needs. Any opinions? > Also, if a patch is produced, I'd love to get a copy of it. We just > upgraded our production servers to 8.1.1 this morning (this issue > never came up during testing) and I'l like to get this in there > because it's likely to happen again. Definitely a big risk of that :-(. It'll be a one-liner patch in any case, we just have to decide where ... regards, tom lane
Brian Hirt <bhirt@mobygames.com> writes: > Also, if a patch is produced, I'd love to get a copy of it. I concluded that patching vacuum.c was much the cleanest way to do it. Here's the patch against 8.1 branch. regards, tom lane Index: src/backend/commands/vacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v retrieving revision 1.317.2.1 diff -c -r1.317.2.1 vacuum.c *** src/backend/commands/vacuum.c 22 Nov 2005 18:23:08 -0000 1.317.2.1 --- src/backend/commands/vacuum.c 4 Jan 2006 19:10:35 -0000 *************** *** 510,515 **** --- 510,523 ---- * PostgresMain(). */ StartTransactionCommand(); + /* + * Re-establish the transaction snapshot. This is wasted effort + * when we are called as a normal utility command, because the + * new transaction will be dropped immediately by PostgresMain(); + * but it's necessary if we are called from autovacuum because + * autovacuum might continue on to do an ANALYZE-only call. + */ + ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } if (vacstmt->vacuum)
On Jan 4, 2006, at 11:47 AM, Tom Lane wrote: > Brian Hirt <bhirt@mobygames.com> writes: >> I was just writing to let you know I created an easily reproducible >> test case too, but I guess you don't need that now. > > Does your test case agree with my description of the problem? > (If you're not sure, crank up log_min_messages and watch the log > to see what autovacuum does before it crashes.) > Yes, I see the same exact same thing you describe. > > Definitely a big risk of that :-(. It'll be a one-liner patch in any > case, we just have to decide where ... > I applied your patch and I no longer see the problem with my test case. Thanks for the quick responses and help! --brian -------------------------------------------- MobyGames http://www.mobygames.com The world's largest and most comprehensive gaming database project
> I concluded that patching vacuum.c was much the cleanest way to do it. > Here's the patch against 8.1 branch. Great, it looks like this patch fixes the remainder of my original problem as well ! ( http://archives.postgresql.org/pgsql-bugs/2005-11/msg00276.php ) -- Best, Frank.