Thread: BUG #16461: Segfault in autovacuum process

BUG #16461: Segfault in autovacuum process

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16461
Logged by:          Michael Schanne
Email address:      mschanne@kns.com
PostgreSQL version: 9.6.10
Operating system:   Linux - Ubuntu 14.04 - 64bit
Description:

I encountered a crash in the autovacuum process which caused all active
database sessions to be terminated.  The postgres log contained the
following:

2020-05-22 06:51:11.371 UTC,,,4316,,5ec4fbd0.10dc,4,,2020-05-20 09:43:44 UTC,,0,LOG,00000,"server process (PID 28964) was terminated by signal 11: Segmentation fault","Failed process was running: autovacuum: ANALYZE myschema.mytable",,,,,,,,""
2020-05-22 06:51:11.371 UTC,,,4316,,5ec4fbd0.10dc,5,,2020-05-20 09:43:44 UTC,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,""
2020-05-22 06:51:11.610 UTC,,,4323,,5ec4fbd1.10e3,2,,2020-05-20 09:43:45 UTC,1/0,0,WARNING,57P02,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,,,""
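
For anyone who wants to scan a csvlog file for crash reports like the one above, here is a minimal Python sketch. It is not part of the original report. The function name find_crashes is hypothetical, and the column positions (message in the 14th field, detail in the 15th) are an assumption read off the sample records shown here, so check them against your own log format before relying on it.

```python
import csv
import io
import re

# Matches the postmaster's crash message as it appears in the sample log.
CRASH_RE = re.compile(r"server process \(PID (\d+)\) was terminated by signal (\d+)")


def find_crashes(csvlog_text):
    """Return (log_time, pid, signal, detail) for each crash report found.

    Assumes the field layout seen in the sample records:
    index 0 = log time, index 13 = message, index 14 = detail.
    """
    crashes = []
    for row in csv.reader(io.StringIO(csvlog_text)):
        if len(row) < 15:
            continue  # not a complete csvlog record
        match = CRASH_RE.search(row[13])
        if match:
            pid = int(match.group(1))
            signal_number = int(match.group(2))
            crashes.append((row[0], pid, signal_number, row[14]))
    return crashes
```

Calling find_crashes(open(path).read()) on a csvlog file would list each crashed backend together with the "Failed process was running" detail, which makes it easier to see whether the crashes cluster on one table or are spread across unrelated work.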

I was able to obtain a core dump and extract the following backtrace:

Core was generated by `postgres: autovacuum worker process   postgres  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000555c1183877b in pfree ()
(gdb) bt
#0  0x0000555c1183877b in pfree ()
#1  0x0000555c1158e3e1 in ?? ()
#2  0x0000555c11590bb5 in ?? ()
#3  0x0000555c11591c8e in analyze_rel ()
#4  0x0000555c115ef796 in vacuum ()
#5  0x0000555c116a4606 in ?? ()
#6  0x0000555c116a4aa4 in ?? ()
#7  0x0000555c116a4b79 in StartAutoVacWorker ()
#8  0x0000555c116b285a in ?? ()
#9  <signal handler called>
#10 0x00007f2c09eda8f3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000555c11495fc4 in ?? ()
#12 0x0000555c116b381d in PostmasterMain ()
#13 0x0000555c11497772 in main ()

This is the schema of the table being analyzed:
    Column     |            Type             |                           Modifiers
---------------+-----------------------------+----------------------------------------------------------------
 colA          | integer                     |
 colB          | integer                     | not null default nextval('myschema.mytable_colB_seq'::regclass)
 colC          | integer                     |
 colD          | json                        |
 colE          | integer                     |
 colF          | timestamp without time zone |
 colG          | integer                     |

I am currently using 9.6.10.  I realize this is a few versions off of the
latest 9.6.*, but I skimmed through the changelogs for later patch releases
and did not see any bugs that looked like they match this.

I attempted to reproduce the issue with a manual "ANALYZE" of the table in
question, but it did not segfault again.

Please let me know if there is any additional information I can provide for
this.

Thanks,
Mike


Re: BUG #16461: Segfault in autovacuum process

From
David Rowley
Date:
On Tue, 26 May 2020 at 21:53, PG Bug reporting form
<noreply@postgresql.org> wrote:
> Core was generated by `postgres: autovacuum worker process   postgres  '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000555c1183877b in pfree ()


> I attempted to reproduce the issue with a manual "ANALYZE" of the table in
> question, but it did not segfault again.

That does not really mean that autovacuum is to blame.  Both
autovacuum analyze and manual ANALYZE just take random samples of rows
to include in the statistics. You may just not have hit the same rows
with the manual ANALYZE as autovacuum did.

I'd suggest checking that you can read all rows and columns from the
table without getting a crash.

So, providing you can tolerate another crash, you could do:

COPY myschema.mytable TO stdout;

If that crashes then it seems unlikely that what autovacuum is doing
is to blame for your issue. Or if the table is large then you might
want to try pg_dump.
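
If a full-table COPY does crash, one way to narrow down which rows are involved is to read the table in key ranges. The Python sketch below is not from the thread. It only builds the COPY statements; the helper name chunked_copy_statements is hypothetical, and using colB as the range key is an assumption based on its not-null nextval() default. Identifiers are interpolated as given, so quote them yourself if they need it.

```python
def chunked_copy_statements(table, key_column, key_min, key_max, chunk_size):
    """Build one COPY statement per half-open key range [start, end).

    Running the statements one at a time (for example with psql) shows
    which key range, if any, triggers a crash when its rows are read.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    statements = []
    start = key_min
    while start <= key_max:
        end = start + chunk_size
        statements.append(
            f"COPY (SELECT * FROM {table} "
            f"WHERE {key_column} >= {start} AND {key_column} < {end}) TO stdout;"
        )
        start = end
    return statements


if __name__ == "__main__":
    # Hypothetical key bounds; look up the real min/max of the column first.
    for statement in chunked_copy_statements("myschema.mytable", "colB", 1, 250, 100):
        print(statement)
```

A range that crashes repeatedly points at specific stored data, while crashes that move around between runs point away from the table contents.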

> Please let me know if there is any additional information I can provide for
> this.

It would be good if you could report back to mention if the COPY
crashed the server again or if it worked without any error.

David



RE: BUG #16461: Segfault in autovacuum process

From
Michael Schanne
Date:
Hi all,

I wanted to close the loop on this... after seeing a few other processes mysteriously segfault on the same machine, I ran a memory test (memtest86+) which found 2 bad memory addresses.  I replaced the RAM and had no more issues with processes segfaulting.  So, this was a hardware issue, not a bug in postgresql.

Thanks,
Mike

Re: BUG #16461: Segfault in autovacuum process

From
Tom Lane
Date:
Michael Schanne <michael.schanne@gmail.com> writes:
> I wanted to close the loop on this... after seeing a few other processes
> mysteriously segfault on the same machine, I ran a memory test (memtest86+)
> which found 2 bad memory addresses.  I replaced the RAM and had no more
> issues with processes segfaulting.  So, this was a hardware issue, not a
> bug in postgresql.

Thanks for following up!

            regards, tom lane