Thread: BUG #16461: Segfault in autovacuum process
The following bug has been logged on the website:

Bug reference:      16461
Logged by:          Michael Schanne
Email address:      mschanne@kns.com
PostgreSQL version: 9.6.10
Operating system:   Linux - Ubuntu 14.04 - 64bit
Description:

I encountered a crash in the autovacuum process which caused all active database sessions to be terminated. The postgres log contained the following:

2020-05-22 06:51:11.371 UTC,,,4316,,5ec4fbd0.10dc,4,,2020-05-20 09:43:44 UTC,,0,LOG,00000,"server process (PID 28964) was terminated by signal 11: Segmentation fault","Failed process was running: autovacuum: ANALYZE myschema.mytable",,,,,,,,""
2020-05-22 06:51:11.371 UTC,,,4316,,5ec4fbd0.10dc,5,,2020-05-20 09:43:44 UTC,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,""
2020-05-22 06:51:11.610 UTC,,,4323,,5ec4fbd1.10e3,2,,2020-05-20 09:43:45 UTC,1/0,0,WARNING,57P02,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,,,""

I was able to obtain a core dump and extract the following backtrace:

Core was generated by `postgres: autovacuum worker process postgres '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000555c1183877b in pfree ()
(gdb) bt
#0  0x0000555c1183877b in pfree ()
#1  0x0000555c1158e3e1 in ?? ()
#2  0x0000555c11590bb5 in ?? ()
#3  0x0000555c11591c8e in analyze_rel ()
#4  0x0000555c115ef796 in vacuum ()
#5  0x0000555c116a4606 in ?? ()
#6  0x0000555c116a4aa4 in ?? ()
#7  0x0000555c116a4b79 in StartAutoVacWorker ()
#8  0x0000555c116b285a in ?? ()
#9  <signal handler called>
#10 0x00007f2c09eda8f3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000555c11495fc4 in ?? ()
#12 0x0000555c116b381d in PostmasterMain ()
#13 0x0000555c11497772 in main ()

This is the schema of the table being analyzed:

    Column     |            Type             |                           Modifiers
---------------+-----------------------------+----------------------------------------------------------------
 colA          | integer                     |
 colB          | integer                     | not null default nextval('myschema.mytable_colB_seq'::regclass)
 colC          | integer                     |
 colD          | json                        |
 colE          | integer                     |
 colF          | timestamp without time zone |
 colG          | integer                     |

I am currently using 9.6.10. I realize this is a few versions off of the latest 9.6.*, but I skimmed through the changelogs for later patch releases and did not see any bugs that looked like they match this.

I attempted to reproduce the issue with a manual "ANALYZE" of the table in question, but it did not segfault again.

Please let me know if there is any additional information I can provide for this.

Thanks,
Mike
On Tue, 26 May 2020 at 21:53, PG Bug reporting form <noreply@postgresql.org> wrote:
> Core was generated by `postgres: autovacuum worker process postgres '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x0000555c1183877b in pfree ()

> I attempted to reproduce the issue with a manual "ANALYZE" of the table in
> question, but it did not segfault again.

That does not really mean that autovacuum is to blame. Both autovacuum analyze and manual ANALYZE just take random samples of rows to include in the statistics. You may just not have hit the same rows with the manual ANALYZE as autovacuum did.

I'd suggest checking that you can read all rows and columns from the table without getting a crash. So, providing you can tolerate another crash, you could do:

COPY myschema.mytable TO stdout;

If that crashes then it seems unlikely that what autovacuum is doing is to blame for your issue. Or if the table is large then you might want to try pg_dump.

> Please let me know if there is any additional information I can provide for
> this.

It would be good if you could report back to mention if the COPY crashed the server again or if it worked without any error.

David
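The sampling argument above can be made concrete with a rough calculation. The sketch below is illustrative only and is not taken from the thread: it assumes ANALYZE reads a uniform random sample without replacement (the real sampler works in two stages over blocks, so this is an approximation), and it assumes a sample of 30,000 rows, which corresponds to 300 times the default statistics target of 100. The function name `prob_sample_hits_bad_row` and the row counts are hypothetical.

```python
def prob_sample_hits_bad_row(total_rows: int, bad_rows: int, sample_size: int) -> float:
    """Chance that a uniform random sample (without replacement) of
    sample_size rows includes at least one of bad_rows problem rows.

    Simplifying assumption: every row is equally likely to be sampled.
    """
    sample_size = min(sample_size, total_rows)
    # Probability that the sample misses every bad row:
    #   C(N - k, n) / C(N, n) = prod_{i=0}^{k-1} (N - n - i) / (N - i)
    miss = 1.0
    for i in range(bad_rows):
        miss *= (total_rows - sample_size - i) / (total_rows - i)
    return 1.0 - miss


if __name__ == "__main__":
    # Hypothetical table sizes; 30,000 is the assumed ANALYZE sample size.
    for total in (100_000, 1_000_000, 10_000_000):
        p = prob_sample_hits_bad_row(total, bad_rows=1, sample_size=30_000)
        print(f"{total:>10} rows: P(sample includes the bad row) = {p:.4f}")
```

Under these assumptions, a single problematic row in a one-million-row table is included in only about 3% of ANALYZE runs, so one clean manual ANALYZE says little. A full read such as COPY touches every row and is the more reliable check.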
Hi all,
I wanted to close the loop on this... after seeing a few other processes mysteriously segfault on the same machine, I ran a memory test (memtest86+) which found 2 bad memory addresses. I replaced the RAM and had no more issues with processes segfaulting. So, this was a hardware issue, not a bug in postgresql.
Thanks,
Mike
Michael Schanne <michael.schanne@gmail.com> writes:
> I wanted to close the loop on this... after seeing a few other processes
> mysteriously segfault on the same machine, I ran a memory test (memtest86+)
> which found 2 bad memory addresses. I replaced the RAM and had no more
> issues with processes segfaulting. So, this was a hardware issue, not a
> bug in postgresql.

Thanks for following up!

regards, tom lane