Thread: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan
Hi,
Commit b04aeb0a053e7cf7faad89f7d47844d8ba0dc839 add assertions that we hold
some relevant lock during relation open as opening a relation with no lock
at all is unsafe;
With above commit below test case is failing and hitting the newly added
Assert().
Test case:
===========
CREATE TABLE foo (x int primary key);
INSERT INTO foo VALUES (1), (2), (3), (4), (5);
CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
AS $$
BEGIN
DELETE FROM foo where x = a;
return 0;
END;
$$ LANGUAGE plpgsql;
postgres@100858=#set plan_cache_mode = force_generic_plan;
SET
postgres@100858=#select f1(4);
f1
----
0
(1 row)
postgres@100858=#select f1(4);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Call stack:
=============
#0 0x00007f230b2d61d7 in raise () from /lib64/libc.so.6
#1 0x00007f230b2d78c8 in abort () from /lib64/libc.so.6
#2 0x0000000000a26ce5 in ExceptionalCondition (
conditionName=0xabdca8 "!(lockmode != 0 || (Mode == BootstrapProcessing) || CheckRelationLockedByMe(r, 1, 1))", errorType=0xabc529 "FailedAssertion",
fileName=0xabc668 "heapam.c", lineNumber=1146) at assert.c:54
#3 0x00000000004dacd8 in relation_open (relationId=16387, lockmode=0) at heapam.c:1144
#4 0x00000000004f94bf in index_open (relationId=16387, lockmode=0) at indexam.c:155
#5 0x0000000000701403 in ExecInitIndexScan (node=0x1cf8fd8, estate=0x1cec418, eflags=0) at nodeIndexscan.c:994
#6 0x00000000006dde31 in ExecInitNode (node=0x1cf8fd8, estate=0x1cec418, eflags=0) at execProcnode.c:217
#7 0x000000000070d7b8 in ExecInitModifyTable (node=0x1cf8da8, estate=0x1cec418, eflags=0) at nodeModifyTable.c:2190
#8 0x00000000006ddd39 in ExecInitNode (node=0x1cf8da8, estate=0x1cec418, eflags=0) at execProcnode.c:174
#9 0x00000000006d48a5 in InitPlan (queryDesc=0x1cdc378, eflags=0) at execMain.c:1024
#10 0x00000000006d36e9 in standard_ExecutorStart (queryDesc=0x1cdc378, eflags=0) at execMain.c:265
#11 0x00000000006d3485 in ExecutorStart (queryDesc=0x1cdc378, eflags=0) at execMain.c:147
#12 0x0000000000725880 in _SPI_pquery (queryDesc=0x1cdc378, fire_triggers=true, tcount=0) at spi.c:2469
#13 0x00000000007252d4 in _SPI_execute_plan (plan=0x1cc4748, paramLI=0x1ce23c8, snapshot=0x0, crosscheck_snapshot=0x0, read_only=false, fire_triggers=true,
tcount=0) at spi.c:2235
#14 0x00000000007222a1 in SPI_execute_plan_with_paramlist (plan=0x1cc4748, params=0x1ce23c8, read_only=false, tcount=0) at spi.c:516
#15 0x00007f23004a5f9a in exec_stmt_execsql (estate=0x7fff0bcaa4d0, stmt=0x1cc78f0) at pl_exec.c:4114
Here DELETE statement pick the IndexScan plan, and later during execution
when it tries to open index relation, ExecInitIndexScan() assumes that index
relation should already hold require a lock. But in this case, that's not the
case - which got exposed through the newly added ASSERT() for NoLock.
When query get's execute first time build_simple_rel() -> get_relation_info(),
takes a lock on the indexes of a relation (this do happen from the
BuildCachePlan()). When we execute the same query second time, a plan is caches
and CheckCachePlan() calls the AquireExecutorLocks() take the require locks
needed for execution of the caches plan. Here it takes a lock on the rtable
list - but doesn't take any lock on the index relation.
I observed that ExecInitModifyTable() to open the index relation if indices on
the result relation. But here also - it doesn't do anything for CMD_DELETE:
/*
* If there are indices on the result relation, open them and save
* descriptors in the result relation info, so that we can add new
* index entries for the tuples we add/update. We need not do this
* for a DELETE, however, since deletion doesn't affect indexes. Also,
* inside an EvalPlanQual operation, the indexes might be open
* already, since we share the resultrel state with the original
* query.
*/
if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
operation != CMD_DELETE &&
resultRelInfo->ri_IndexRelationDescs == NULL)
ExecOpenIndices(resultRelInfo,
node->onConflictAction != ONCONFLICT_NONE);
Now, to fix this issue either we need to hold proper lock before reaching
to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
on the scan coming from CMD_DELETE.
Commit b04aeb0a053e7cf7faad89f7d47844d8ba0dc839 add assertions that we hold
some relevant lock during relation open as opening a relation with no lock
at all is unsafe;
With above commit below test case is failing and hitting the newly added
Assert().
Test case:
===========
CREATE TABLE foo (x int primary key);
INSERT INTO foo VALUES (1), (2), (3), (4), (5);
CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
AS $$
BEGIN
DELETE FROM foo where x = a;
return 0;
END;
$$ LANGUAGE plpgsql;
postgres@100858=#set plan_cache_mode = force_generic_plan;
SET
postgres@100858=#select f1(4);
f1
----
0
(1 row)
postgres@100858=#select f1(4);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Call stack:
=============
#0 0x00007f230b2d61d7 in raise () from /lib64/libc.so.6
#1 0x00007f230b2d78c8 in abort () from /lib64/libc.so.6
#2 0x0000000000a26ce5 in ExceptionalCondition (
conditionName=0xabdca8 "!(lockmode != 0 || (Mode == BootstrapProcessing) || CheckRelationLockedByMe(r, 1, 1))", errorType=0xabc529 "FailedAssertion",
fileName=0xabc668 "heapam.c", lineNumber=1146) at assert.c:54
#3 0x00000000004dacd8 in relation_open (relationId=16387, lockmode=0) at heapam.c:1144
#4 0x00000000004f94bf in index_open (relationId=16387, lockmode=0) at indexam.c:155
#5 0x0000000000701403 in ExecInitIndexScan (node=0x1cf8fd8, estate=0x1cec418, eflags=0) at nodeIndexscan.c:994
#6 0x00000000006dde31 in ExecInitNode (node=0x1cf8fd8, estate=0x1cec418, eflags=0) at execProcnode.c:217
#7 0x000000000070d7b8 in ExecInitModifyTable (node=0x1cf8da8, estate=0x1cec418, eflags=0) at nodeModifyTable.c:2190
#8 0x00000000006ddd39 in ExecInitNode (node=0x1cf8da8, estate=0x1cec418, eflags=0) at execProcnode.c:174
#9 0x00000000006d48a5 in InitPlan (queryDesc=0x1cdc378, eflags=0) at execMain.c:1024
#10 0x00000000006d36e9 in standard_ExecutorStart (queryDesc=0x1cdc378, eflags=0) at execMain.c:265
#11 0x00000000006d3485 in ExecutorStart (queryDesc=0x1cdc378, eflags=0) at execMain.c:147
#12 0x0000000000725880 in _SPI_pquery (queryDesc=0x1cdc378, fire_triggers=true, tcount=0) at spi.c:2469
#13 0x00000000007252d4 in _SPI_execute_plan (plan=0x1cc4748, paramLI=0x1ce23c8, snapshot=0x0, crosscheck_snapshot=0x0, read_only=false, fire_triggers=true,
tcount=0) at spi.c:2235
#14 0x00000000007222a1 in SPI_execute_plan_with_paramlist (plan=0x1cc4748, params=0x1ce23c8, read_only=false, tcount=0) at spi.c:516
#15 0x00007f23004a5f9a in exec_stmt_execsql (estate=0x7fff0bcaa4d0, stmt=0x1cc78f0) at pl_exec.c:4114
Here DELETE statement pick the IndexScan plan, and later during execution
when it tries to open index relation, ExecInitIndexScan() assumes that index
relation should already hold require a lock. But in this case, that's not the
case - which got exposed through the newly added ASSERT() for NoLock.
When query get's execute first time build_simple_rel() -> get_relation_info(),
takes a lock on the indexes of a relation (this do happen from the
BuildCachePlan()). When we execute the same query second time, a plan is caches
and CheckCachePlan() calls the AquireExecutorLocks() take the require locks
needed for execution of the caches plan. Here it takes a lock on the rtable
list - but doesn't take any lock on the index relation.
I observed that ExecInitModifyTable() to open the index relation if indices on
the result relation. But here also - it doesn't do anything for CMD_DELETE:
/*
* If there are indices on the result relation, open them and save
* descriptors in the result relation info, so that we can add new
* index entries for the tuples we add/update. We need not do this
* for a DELETE, however, since deletion doesn't affect indexes. Also,
* inside an EvalPlanQual operation, the indexes might be open
* already, since we share the resultrel state with the original
* query.
*/
if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
operation != CMD_DELETE &&
resultRelInfo->ri_IndexRelationDescs == NULL)
ExecOpenIndices(resultRelInfo,
node->onConflictAction != ONCONFLICT_NONE);
Now, to fix this issue either we need to hold proper lock before reaching
to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
on the scan coming from CMD_DELETE.
Thoughts/Comments?
Thanks,
Rushabh Lathia
On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > CREATE TABLE foo (x int primary key); > INSERT INTO foo VALUES (1), (2), (3), (4), (5); > > CREATE OR REPLACE FUNCTION f1(a int) RETURNS int > AS $$ > BEGIN > DELETE FROM foo where x = a; > return 0; > END; > $$ LANGUAGE plpgsql; > > postgres@100858=#set plan_cache_mode = force_generic_plan; > SET > postgres@100858=#select f1(4); > f1 > ---- > 0 > (1 row) > > postgres@100858=#select f1(4); > server closed the connection unexpectedly > Now, to fix this issue either we need to hold proper lock before reaching > to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock > on the scan coming from CMD_DELETE. I'd say the following comment and code in nodeIndexscan.c is outdated: /* * Open the index relation. * * If the parent table is one of the target relations of the query, then * InitPlan already opened and write-locked the index, so we can avoid * taking another lock here. Otherwise we need a normal reader's lock. */ relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid); indexstate->iss_RelationDesc = index_open(node->indexid, relistarget ? NoLock : AccessShareLock); Despite what the above comment claims, these indexes have not been opened in InitPlan since 389af951552ff2. As you mentioned, they're opened in nodeModifyTable.c for non-delete statements. I think we either need to update the above code to align it to what's now in nodeModifyTable.c, or just obtain an AccessShareLock regardless. I kinda think we should just take the lock regardless as determining if the relation is a result relation may be much more expensive than just taking another lower-level lock on the index. Anyway. I've attached a small patch to update the above fragment. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes: > On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: >> Now, to fix this issue either we need to hold proper lock before reaching >> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock >> on the scan coming from CMD_DELETE. > I'd say the following comment and code in nodeIndexscan.c is outdated: I complained about this previously in https://www.postgresql.org/message-id/19465.1541636036@sss.pgh.pa.us but got nothing but crickets :-( regards, tom lane
On Fri, Nov 23, 2018 at 3:33 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> CREATE TABLE foo (x int primary key);
> INSERT INTO foo VALUES (1), (2), (3), (4), (5);
>
> CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
> AS $$
> BEGIN
> DELETE FROM foo where x = a;
> return 0;
> END;
> $$ LANGUAGE plpgsql;
>
> postgres@100858=#set plan_cache_mode = force_generic_plan;
> SET
> postgres@100858=#select f1(4);
> f1
> ----
> 0
> (1 row)
>
> postgres@100858=#select f1(4);
> server closed the connection unexpectedly
> Now, to fix this issue either we need to hold proper lock before reaching
> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
> on the scan coming from CMD_DELETE.
I'd say the following comment and code in nodeIndexscan.c is outdated:
/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here. Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
relistarget ? NoLock : AccessShareLock);
Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.
+1.
I tried the same approach and with further testing I haven't found
any issues.
I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.
Anyway. I've attached a small patch to update the above fragment.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Rushabh Lathia
While looking code further around this, I realized that we need
similar kind of fix for bitmap as well as index only scan as well.On Fri, Nov 23, 2018 at 6:47 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
On Fri, Nov 23, 2018 at 3:33 AM David Rowley <david.rowley@2ndquadrant.com> wrote:On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> CREATE TABLE foo (x int primary key);
> INSERT INTO foo VALUES (1), (2), (3), (4), (5);
>
> CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
> AS $$
> BEGIN
> DELETE FROM foo where x = a;
> return 0;
> END;
> $$ LANGUAGE plpgsql;
>
> postgres@100858=#set plan_cache_mode = force_generic_plan;
> SET
> postgres@100858=#select f1(4);
> f1
> ----
> 0
> (1 row)
>
> postgres@100858=#select f1(4);
> server closed the connection unexpectedly
> Now, to fix this issue either we need to hold proper lock before reaching
> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
> on the scan coming from CMD_DELETE.
I'd say the following comment and code in nodeIndexscan.c is outdated:
/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here. Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
relistarget ? NoLock : AccessShareLock);
Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.+1.I tried the same approach and with further testing I haven't foundany issues.I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.
Anyway. I've attached a small patch to update the above fragment.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services--Rushabh Lathia
--
Rushabh Lathia
Attachment
On Fri, Dec 14, 2018 at 11:32 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > > While looking code further around this, I realized that we need > similar kind of fix for bitmap as well as index only scan as well. > I think if we want to move forward with this patch, we need to first investigate the deadlock hazard mentioned by Tom in other related thread [1]. I and others have also responded on that thread, see if you can respond to it. [1] - https://www.postgresql.org/message-id/3046.1542990312%40sss.pgh.pa.us -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com