Thread: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode

[PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode

From
Yulin PEI
Date:
Hi hackers,

I found that when running command vaccum full in stand-alone mode there will be a core dump.
The core stack looks like this:
------------------------------------
backend> vacuum full
TRAP: FailedAssertion("IsUnderPostmaster", File: "dsm.c", Line: 439)
./postgres(ExceptionalCondition+0xac)[0x55c664c36913]
./postgres(dsm_create+0x3c)[0x55c664a79fee]
./postgres(GetSessionDsmHandle+0xdc)[0x55c6645f8296]
./postgres(InitializeParallelDSM+0xf9)[0x55c6646c59ca]
./postgres(+0x16bdef)[0x55c664692def]
./postgres(+0x16951e)[0x55c66469051e]
./postgres(btbuild+0xcc)[0x55c6646903ec]
./postgres(index_build+0x322)[0x55c664719749]
./postgres(reindex_index+0x2ee)[0x55c66471a765]
./postgres(reindex_relation+0x1e5)[0x55c66471acca]
./postgres(finish_heap_swap+0x118)[0x55c6647c8db1]
./postgres(+0x2a0848)[0x55c6647c7848]
./postgres(cluster_rel+0x34e)[0x55c6647c727f]
./postgres(+0x3414cf)[0x55c6648684cf]
./postgres(vacuum+0x4f3)[0x55c664866591]
./postgres(ExecVacuum+0x736)[0x55c66486609b]
./postgres(standard_ProcessUtility+0x840)[0x55c664ab74fc]
./postgres(ProcessUtility+0x131)[0x55c664ab6cb5]
./postgres(+0x58ea69)[0x55c664ab5a69]
./postgres(+0x58ec95)[0x55c664ab5c95]
./postgres(PortalRun+0x307)[0x55c664ab5184]
./postgres(+0x587ef6)[0x55c664aaeef6]
./postgres(PostgresMain+0x819)[0x55c664ab3271]
./postgres(main+0x2e1)[0x55c6648f9df5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f65376192e1]
./postgres(_start+0x2a)[0x55c6645df86a]
Aborted (core dumped)
------------------------------------

I think that when there is a btree index in a table, vacuum full tries to rebuild the btree index in a parallel way.

This will launch several workers and each of then will try to apply a dynamic shared memory segment, which is not allowed in stand-alone mode. 

I think it is better not to use btree index build in parallel in stand-alone mode. My patch is attached below.





Best Regards!
Yulin PEI










Attachment

Re: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode

From
Masahiko Sawada
Date:


On Mon, Nov 30, 2020 at 5:45 PM Yulin PEI <ypeiae@connect.ust.hk> wrote:
Hi hackers,

I found that when running command vaccum full in stand-alone mode there will be a core dump.
The core stack looks like this:
------------------------------------
backend> vacuum full
TRAP: FailedAssertion("IsUnderPostmaster", File: "dsm.c", Line: 439)
./postgres(ExceptionalCondition+0xac)[0x55c664c36913]
./postgres(dsm_create+0x3c)[0x55c664a79fee]
./postgres(GetSessionDsmHandle+0xdc)[0x55c6645f8296]
./postgres(InitializeParallelDSM+0xf9)[0x55c6646c59ca]
./postgres(+0x16bdef)[0x55c664692def]
./postgres(+0x16951e)[0x55c66469051e]
./postgres(btbuild+0xcc)[0x55c6646903ec]
./postgres(index_build+0x322)[0x55c664719749]
./postgres(reindex_index+0x2ee)[0x55c66471a765]
./postgres(reindex_relation+0x1e5)[0x55c66471acca]
./postgres(finish_heap_swap+0x118)[0x55c6647c8db1]
./postgres(+0x2a0848)[0x55c6647c7848]
./postgres(cluster_rel+0x34e)[0x55c6647c727f]
./postgres(+0x3414cf)[0x55c6648684cf]
./postgres(vacuum+0x4f3)[0x55c664866591]
./postgres(ExecVacuum+0x736)[0x55c66486609b]
./postgres(standard_ProcessUtility+0x840)[0x55c664ab74fc]
./postgres(ProcessUtility+0x131)[0x55c664ab6cb5]
./postgres(+0x58ea69)[0x55c664ab5a69]
./postgres(+0x58ec95)[0x55c664ab5c95]
./postgres(PortalRun+0x307)[0x55c664ab5184]
./postgres(+0x587ef6)[0x55c664aaeef6]
./postgres(PostgresMain+0x819)[0x55c664ab3271]
./postgres(main+0x2e1)[0x55c6648f9df5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f65376192e1]
./postgres(_start+0x2a)[0x55c6645df86a]
Aborted (core dumped)
------------------------------------

I think that when there is a btree index in a table, vacuum full tries to rebuild the btree index in a parallel way.

This will launch several workers and each of then will try to apply a dynamic shared memory segment, which is not allowed in stand-alone mode. 

I think it is better not to use btree index build in parallel in stand-alone mode. My patch is attached below.



Good catch. This is a bug in parallel index(btree) creation. I could reproduce this assertion failure with HEAD even by using CREATE INDEX command.

- indexRelation->rd_rel->relam == BTREE_AM_OID)
+ indexRelation->rd_rel->relam == BTREE_AM_OID && IsPostmasterEnvironment && !IsBackendStandAlone())

+#define IsBackendStandAlone() (!IsBootstrapProcessingMode() && !IsPostmasterEnvironment)
 /*
  * Auxiliary-process type identifiers.  These used to be in bootstrap.h
  * but it seems saner to have them here, with the ProcessingMode stuff.

I think we can use IsUnderPostmaster instead. If it's false we should not enable parallel index creation.

Regards,

--
Masahiko Sawada
Yes, I agree because (IsNormalProcessingMode() ) means that current process is not in bootstrap mode and postmaster process will not build index.
So my new modified patch is attached.



发件人: Masahiko Sawada <sawada.mshk@gmail.com>
发送时间: 2020年11月30日 17:27
收件人: Yulin PEI <ypeiae@connect.ust.hk>
抄送: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
主题: Re: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode
 


On Mon, Nov 30, 2020 at 5:45 PM Yulin PEI <ypeiae@connect.ust.hk> wrote:
Hi hackers,

I found that when running command vaccum full in stand-alone mode there will be a core dump.
The core stack looks like this:
------------------------------------
backend> vacuum full
TRAP: FailedAssertion("IsUnderPostmaster", File: "dsm.c", Line: 439)
./postgres(ExceptionalCondition+0xac)[0x55c664c36913]
./postgres(dsm_create+0x3c)[0x55c664a79fee]
./postgres(GetSessionDsmHandle+0xdc)[0x55c6645f8296]
./postgres(InitializeParallelDSM+0xf9)[0x55c6646c59ca]
./postgres(+0x16bdef)[0x55c664692def]
./postgres(+0x16951e)[0x55c66469051e]
./postgres(btbuild+0xcc)[0x55c6646903ec]
./postgres(index_build+0x322)[0x55c664719749]
./postgres(reindex_index+0x2ee)[0x55c66471a765]
./postgres(reindex_relation+0x1e5)[0x55c66471acca]
./postgres(finish_heap_swap+0x118)[0x55c6647c8db1]
./postgres(+0x2a0848)[0x55c6647c7848]
./postgres(cluster_rel+0x34e)[0x55c6647c727f]
./postgres(+0x3414cf)[0x55c6648684cf]
./postgres(vacuum+0x4f3)[0x55c664866591]
./postgres(ExecVacuum+0x736)[0x55c66486609b]
./postgres(standard_ProcessUtility+0x840)[0x55c664ab74fc]
./postgres(ProcessUtility+0x131)[0x55c664ab6cb5]
./postgres(+0x58ea69)[0x55c664ab5a69]
./postgres(+0x58ec95)[0x55c664ab5c95]
./postgres(PortalRun+0x307)[0x55c664ab5184]
./postgres(+0x587ef6)[0x55c664aaeef6]
./postgres(PostgresMain+0x819)[0x55c664ab3271]
./postgres(main+0x2e1)[0x55c6648f9df5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f65376192e1]
./postgres(_start+0x2a)[0x55c6645df86a]
Aborted (core dumped)
------------------------------------

I think that when there is a btree index in a table, vacuum full tries to rebuild the btree index in a parallel way.

This will launch several workers and each of then will try to apply a dynamic shared memory segment, which is not allowed in stand-alone mode. 

I think it is better not to use btree index build in parallel in stand-alone mode. My patch is attached below.



Good catch. This is a bug in parallel index(btree) creation. I could reproduce this assertion failure with HEAD even by using CREATE INDEX command.

- indexRelation->rd_rel->relam == BTREE_AM_OID)
+ indexRelation->rd_rel->relam == BTREE_AM_OID && IsPostmasterEnvironment && !IsBackendStandAlone())

+#define IsBackendStandAlone() (!IsBootstrapProcessingMode() && !IsPostmasterEnvironment)
 /*
  * Auxiliary-process type identifiers.  These used to be in bootstrap.h
  * but it seems saner to have them here, with the ProcessingMode stuff.

I think we can use IsUnderPostmaster instead. If it's false we should not enable parallel index creation.

Regards,

--
Masahiko Sawada
Attachment
Yulin PEI <ypeiae@connect.ust.hk> writes:
> Yes, I agree because (IsNormalProcessingMode() ) means that current process is not in bootstrap mode and postmaster
processwill not build index. 
> So my new modified patch is attached.

This is a good catch, but the proposed fix still seems pretty random
and unlike how it's done elsewhere.  It seems to me that since
index_build() is relying on plan_create_index_workers() to assess
parallel safety, that's where to check IsUnderPostmaster.  Moreover,
the existing code in compute_parallel_vacuum_workers (which gets
this right) associates the IsUnderPostmaster check with the initial
check on max_parallel_maintenance_workers.  So I think that the
right fix is to adopt the compute_parallel_vacuum_workers coding
in plan_create_index_workers, and thereby create a model for future
uses of max_parallel_maintenance_workers to follow.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 247f7d4625..1a94b58f8b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6375,8 +6375,11 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
     double        reltuples;
     double        allvisfrac;

-    /* Return immediately when parallelism disabled */
-    if (max_parallel_maintenance_workers == 0)
+    /*
+     * We don't allow performing parallel operation in standalone backend or
+     * when parallelism is disabled.
+     */
+    if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
         return 0;

     /* Set up largely-dummy planner state */

On Tue, Dec 1, 2020 at 3:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Yulin PEI <ypeiae@connect.ust.hk> writes:
> > Yes, I agree because (IsNormalProcessingMode() ) means that current process is not in bootstrap mode and postmaster
processwill not build index.
 
> > So my new modified patch is attached.
>
> This is a good catch, but the proposed fix still seems pretty random
> and unlike how it's done elsewhere.  It seems to me that since
> index_build() is relying on plan_create_index_workers() to assess
> parallel safety, that's where to check IsUnderPostmaster.  Moreover,
> the existing code in compute_parallel_vacuum_workers (which gets
> this right) associates the IsUnderPostmaster check with the initial
> check on max_parallel_maintenance_workers.  So I think that the
> right fix is to adopt the compute_parallel_vacuum_workers coding
> in plan_create_index_workers, and thereby create a model for future
> uses of max_parallel_maintenance_workers to follow.

+1

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



I think you are right after reading code in compute_parallel_vacuum_workers() :)


发件人: Tom Lane <tgl@sss.pgh.pa.us>
发送时间: 2020年12月1日 2:54
收件人: Yulin PEI <ypeiae@connect.ust.hk>
抄送: Masahiko Sawada <sawada.mshk@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
主题: Re: 回复: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode
 
Yulin PEI <ypeiae@connect.ust.hk> writes:
> Yes, I agree because (IsNormalProcessingMode() ) means that current process is not in bootstrap mode and postmaster process will not build index.
> So my new modified patch is attached.

This is a good catch, but the proposed fix still seems pretty random
and unlike how it's done elsewhere.  It seems to me that since
index_build() is relying on plan_create_index_workers() to assess
parallel safety, that's where to check IsUnderPostmaster.  Moreover,
the existing code in compute_parallel_vacuum_workers (which gets
this right) associates the IsUnderPostmaster check with the initial
check on max_parallel_maintenance_workers.  So I think that the
right fix is to adopt the compute_parallel_vacuum_workers coding
in plan_create_index_workers, and thereby create a model for future
uses of max_parallel_maintenance_workers to follow.

                        regards, tom lane