pg_restore crash when there is a failure before all child process is created - Mailing list pgsql-hackers

From vignesh C
Subject pg_restore crash when there is a failure before all child process is created
Msg-id CALDaNm1Luv-E3sarR+-unz-BjchquHHyfP+YC+2FS2pt_J+wxg@mail.gmail.com
List pgsql-hackers
Hi,

I found a crash in pg_restore. It occurs when a failure happens before all the child workers have been created. The backtrace is given below:
#0  0x00007f9c6d31e337 in raise () from /lib64/libc.so.6
#1  0x00007f9c6d31fa28 in abort () from /lib64/libc.so.6
#2  0x00007f9c6d317156 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f9c6d317202 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000407c9e in WaitForTerminatingWorkers (pstate=0x14af7f0) at parallel.c:515
#5  0x0000000000407bf9 in ShutdownWorkersHard (pstate=0x14af7f0) at parallel.c:451
#6  0x0000000000407ae9 in archive_close_connection (code=1, arg=0x6315a0 <shutdown_info>) at parallel.c:368
#7  0x000000000041a7c7 in exit_nicely (code=1) at pg_backup_utils.c:99
#8  0x0000000000408180 in ParallelBackupStart (AH=0x14972e0) at parallel.c:967
#9  0x000000000040a3dd in RestoreArchive (AHX=0x14972e0) at pg_backup_archiver.c:661
#10 0x0000000000404125 in main (argc=6, argv=0x7ffd5146f308) at pg_restore.c:443

The sequence leading to the crash is:
  • pstate->numWorkers is set to the requested number of workers at the start of ParallelBackupStart.
  • The workers are then created one by one.
  • A failure occurs before all the worker processes have been created.
  • The parent then terminates the child processes and waits for all of them to exit.
  • WaitForTerminatingWorkers checks whether every process has terminated by calling HasEveryWorkerTerminated.
  • HasEveryWorkerTerminated always returns false because it checks numWorkers slots rather than the number of processes actually forked, so execution reaches the next assertion, "Assert(j < pstate->numWorkers);", which fails.
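
The mismatch described in the steps above can be illustrated with a small stand-alone model. This is only a sketch based on the description in this message, not the code in parallel.c: ModelState, SlotStatus, and the model_* functions are hypothetical names, and the real slot states and termination logic may differ.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 16

/* Hypothetical slot states; not the real parallel.c definitions. */
typedef enum { SLOT_UNUSED, SLOT_RUNNING, SLOT_TERMINATED } SlotStatus;

typedef struct
{
    int        numWorkers;          /* count the parent iterates over */
    SlotStatus status[MAX_SLOTS];   /* one entry per requested worker */
} ModelState;

/* Set numWorkers up front, then "create" only the first `created` workers. */
void model_start(ModelState *st, int requested, int created)
{
    assert(requested >= 0 && requested <= MAX_SLOTS);
    assert(created >= 0 && created <= requested);
    st->numWorkers = requested;
    for (int i = 0; i < requested; i++)
        st->status[i] = (i < created) ? SLOT_RUNNING : SLOT_UNUSED;
}

/* Mark every worker that was really created as terminated. */
void model_reap_all(ModelState *st)
{
    for (int i = 0; i < st->numWorkers; i++)
        if (st->status[i] == SLOT_RUNNING)
            st->status[i] = SLOT_TERMINATED;
}

/* Models the described check: all numWorkers slots must be terminated. */
bool model_all_terminated(const ModelState *st)
{
    for (int i = 0; i < st->numWorkers; i++)
        if (st->status[i] != SLOT_TERMINATED)
            return false;
    return true;
}
```

In this model, calling model_start(&st, 4, 2) followed by model_reap_all(&st) leaves model_all_terminated(&st) false even though no child is left to wait for, which is the situation in which the parent keeps waiting and then trips the assertion.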

The attached patch fixes this by setting pstate->numWorkers to the actual worker count as each child process is created.
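
The idea behind the fix can be sketched as follows. The patch itself is not reproduced in this message, so this is an assumption about its general shape rather than its actual contents: FixedState, start_workers, and the spawn callbacks are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parent's bookkeeping; not the real ParallelState. */
typedef struct
{
    int numWorkers;   /* workers actually created so far */
} FixedState;

/*
 * Try to create `requested` workers. `spawn` is a hypothetical callback
 * that returns false when creating worker i fails.
 * Returns true only if all requested workers were created.
 */
bool start_workers(FixedState *st, int requested, bool (*spawn)(int))
{
    assert(requested >= 0);
    st->numWorkers = 0;
    for (int i = 0; i < requested; i++)
    {
        if (!spawn(i))
            return false;         /* numWorkers still equals the created count */
        st->numWorkers = i + 1;   /* count the worker only once it exists */
    }
    return true;
}

/* Example callback: the third worker (index 2) fails to start. */
bool spawn_fails_at_two(int i)
{
    return i != 2;
}

/* Example callback: every worker starts successfully. */
bool spawn_ok(int i)
{
    (void) i;
    return true;
}
```

With start_workers(&st, 4, spawn_fails_at_two), st.numWorkers ends up as 2, so any cleanup loop bounded by numWorkers visits only workers that were actually created.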

Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
