Thread: BUG #18314: PARALLEL UNSAFE function does not prevent parallel index build

BUG #18314: PARALLEL UNSAFE function does not prevent parallel index build

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18314
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.1
Operating system:   Ubuntu 22.04
Description:

When planner chooses to perform parallel index build, it doesn't
check whether functions used in an index expression or predicate are
parallel-safe. For example:
CREATE FUNCTION f() RETURNS int IMMUTABLE PARALLEL UNSAFE
AS $$
BEGIN    
    RETURN 0;
EXCEPTION WHEN OTHERS THEN
    RETURN 1;
END$$ LANGUAGE plpgsql;

CREATE TABLE t(i int);
INSERT INTO t SELECT g FROM generate_series(1, 300000) g;
CREATE INDEX ON t((i + f()));

results in:
ERROR:  cannot start subtransactions during a parallel operation
CONTEXT:  PL/pgSQL function f() line 2 during statement block entry
parallel worker

Although with 200000 records, the index created with no error.

Reproduced on REL_12_STABLE .. master.


Thanks for reporting this issue.

I debug the process of creating index, and in plan_create_index_workers() func, which use the planner to
decide if use parallel building index.

But the index expression passed to is_parallel_safe() has already simplified by RelationGetIndexExpressions() func.
In this case, FunExpr node will be simplified to Const node. So the check in is_parallel_safe() will return true.

The goal of plan_create_index_workers() is to decide whether it's safe to build index parallelly. I think we just check the raw index
expression no need to simplify the index expression.


PG Bug reporting form <noreply@postgresql.org> 于2024年1月29日周一 19:02写道:
The following bug has been logged on the website:

Bug reference:      18314
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.1
Operating system:   Ubuntu 22.04
Description:       

When planner chooses to perform parallel index build, it doesn't
check whether functions used in an index expression or predicate are
parallel-safe. For example:
CREATE FUNCTION f() RETURNS int IMMUTABLE PARALLEL UNSAFE
AS $$
BEGIN   
    RETURN 0;
EXCEPTION WHEN OTHERS THEN
    RETURN 1;
END$$ LANGUAGE plpgsql;

CREATE TABLE t(i int);
INSERT INTO t SELECT g FROM generate_series(1, 300000) g;
CREATE INDEX ON t((i + f()));

results in:
ERROR:  cannot start subtransactions during a parallel operation
CONTEXT:  PL/pgSQL function f() line 2 during statement block entry
parallel worker

Although with 200000 records, the index created with no error.

Reproduced on REL_12_STABLE .. master.



--
Tender Wang
OpenPie:  https://en.openpie.com/
In pg document, I found this:

Also, a block containing an EXCEPTION clause effectively forms a subtransaction that can be rolled back without affecting the outer transaction.

So it will report error for this case even though we replace PARALLEL UNSAFE with PARALLEL SAFE.

I think if PARALLEL UNSAFE is specified by users,  PG internal should not choose to build index parallel, even if the function is
too simply that can be transform to Const node.

Attached patch will fix Alexander reported issue. I choose pass the raw index expression to is_parallel_safe(). 
The comments of RelationGetIndexExpressions() doesn't say it will return optimized expression. So I add a bool argument to RelationGetIndexExpressions().
I don't decide to write a new function like RelationGetIndexRawExpression. I think the RelationGetIndexExpressions() is enough after adding a bool argument to indicate
whether caller needing a raw index expression.

But if users specify PARALLEL SAFE, like I said before, it also reports error. But I think it is another thing. Maybe it is reasonable?



Tender Wang <tndrwang@gmail.com> 于2024年1月29日周一 22:36写道:
Thanks for reporting this issue.

I debug the process of creating index, and in plan_create_index_workers() func, which use the planner to
decide if use parallel building index.

But the index expression passed to is_parallel_safe() has already simplified by RelationGetIndexExpressions() func.
In this case, FunExpr node will be simplified to Const node. So the check in is_parallel_safe() will return true.

The goal of plan_create_index_workers() is to decide whether it's safe to build index parallelly. I think we just check the raw index
expression no need to simplify the index expression.


PG Bug reporting form <noreply@postgresql.org> 于2024年1月29日周一 19:02写道:
The following bug has been logged on the website:

Bug reference:      18314
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.1
Operating system:   Ubuntu 22.04
Description:       

When planner chooses to perform parallel index build, it doesn't
check whether functions used in an index expression or predicate are
parallel-safe. For example:
CREATE FUNCTION f() RETURNS int IMMUTABLE PARALLEL UNSAFE
AS $$
BEGIN   
    RETURN 0;
EXCEPTION WHEN OTHERS THEN
    RETURN 1;
END$$ LANGUAGE plpgsql;

CREATE TABLE t(i int);
INSERT INTO t SELECT g FROM generate_series(1, 300000) g;
CREATE INDEX ON t((i + f()));

results in:
ERROR:  cannot start subtransactions during a parallel operation
CONTEXT:  PL/pgSQL function f() line 2 during statement block entry
parallel worker

Although with 200000 records, the index created with no error.

Reproduced on REL_12_STABLE .. master.



--
Tender Wang
OpenPie:  https://en.openpie.com/


--
Tender Wang
OpenPie:  https://en.openpie.com/
Attachment
On Wed, Feb 7, 2024 at 7:54 PM Tender Wang <tndrwang@gmail.com> wrote:
>
> In pg document, I found this:
>
> Also, a block containing an EXCEPTION clause effectively forms a subtransaction that can be rolled back without
affectingthe outer transaction. 
>
> So it will report error for this case even though we replace PARALLEL UNSAFE with PARALLEL SAFE.
>
> I think if PARALLEL UNSAFE is specified by users,  PG internal should not choose to build index parallel, even if the
functionis 
> too simply that can be transform to Const node.
>
> Attached patch will fix Alexander reported issue. I choose pass the raw index expression to is_parallel_safe().
> The comments of RelationGetIndexExpressions() doesn't say it will return optimized expression. So I add a bool
argumentto RelationGetIndexExpressions(). 
> I don't decide to write a new function like RelationGetIndexRawExpression. I think the RelationGetIndexExpressions()
isenough after adding a bool argument to indicate 
> whether caller needing a raw index expression.
>
> But if users specify PARALLEL SAFE, like I said before, it also reports error. But I think it is another thing. Maybe
itis reasonable? 
>

/home/jian/Downloads/patches/0001-Fix-parallel-index-build-failed.patch:273:
trailing whitespace.
BEGIN
warning: 1 line adds whitespace errors.

+CREATE TABLE btree_para_bld(i int);
+INSERT INTO btree_para_bld SELECT g FROM generate_series(1, 300000) g;
+CREATE INDEX ON btree_para_bld((i + para_unsafe_f()));
you don't need an insert of 300000 rows to test it.

you can just
CREATE TABLE btree_para_bld(i int);
ALTER TABLE btree_para_bld SET (parallel_workers = 4);
SET max_parallel_maintenance_workers TO 4;
CREATE INDEX ON btree_para_bld(i) WHERE i > para_unsafe_f();
reset max_parallel_maintenance_workers;

demo: https://dbfiddle.uk/HpLKN6zj



On Wed, Feb 7, 2024 at 11:43 PM jian he <jian.universality@gmail.com> wrote:
>
> On Wed, Feb 7, 2024 at 7:54 PM Tender Wang <tndrwang@gmail.com> wrote:
> >
> > In pg document, I found this:
> >
> > Also, a block containing an EXCEPTION clause effectively forms a subtransaction that can be rolled back without
affectingthe outer transaction. 
> >
> > So it will report error for this case even though we replace PARALLEL UNSAFE with PARALLEL SAFE.
> >
> > I think if PARALLEL UNSAFE is specified by users,  PG internal should not choose to build index parallel, even if
thefunction is 
> > too simply that can be transform to Const node.
> >
> > Attached patch will fix Alexander reported issue. I choose pass the raw index expression to is_parallel_safe().
> > The comments of RelationGetIndexExpressions() doesn't say it will return optimized expression. So I add a bool
argumentto RelationGetIndexExpressions(). 
> > I don't decide to write a new function like RelationGetIndexRawExpression. I think the
RelationGetIndexExpressions()is enough after adding a bool argument to indicate 
> > whether caller needing a raw index expression.
> >
> > But if users specify PARALLEL SAFE, like I said before, it also reports error. But I think it is another thing.
Maybeit is reasonable? 
> >

based on index_build's ereport(DEBUG1...) output information
you patch disabled parallel  index build for parallel safe function.

after apply your patch:
set log_min_messages to debug1;
begin;
drop table if exists btree_para_bld;
CREATE TABLE btree_para_bld(i int);
ALTER TABLE btree_para_bld SET (parallel_workers = 4);
SET local max_parallel_maintenance_workers TO 4;
CREATE or replace FUNCTION para_unsafe_f1() RETURNS int IMMUTABLE
PARALLEL UNSAFE
AS $$
BEGIN
RETURN 0;
END$$ LANGUAGE plpgsql;

CREATE or replace FUNCTION para_safe() RETURNS int IMMUTABLE PARALLEL UNSAFE
AS $$
BEGIN
RETURN 0;
END$$ LANGUAGE plpgsql;
CREATE INDEX ON btree_para_bld((para_unsafe_f1()));
CREATE INDEX ON btree_para_bld((para_safe()));
commit;

debug information:
2024-02-08 14:11:26.664 CST [33245] DEBUG:  building index
"btree_para_bld_para_unsafe_f1_idx" on table "btree_para_bld" serially
2024-02-08 14:11:26.664 CST [33245] DEBUG:  index
"btree_para_bld_para_unsafe_f1_idx" can safely use deduplication
CREATE INDEX
2024-02-08 14:11:26.666 CST [33245] DEBUG:  building index
"btree_para_bld_para_safe_idx" on table "btree_para_bld" serially
2024-02-08 14:11:26.666 CST [33245] DEBUG:  index
"btree_para_bld_para_safe_idx" can safely use deduplication
CREATE INDEX

the problem entry point is_parallel_safe, i think.



On Thu, Feb 8, 2024 at 2:17 PM jian he <jian.universality@gmail.com> wrote:
>
> On Wed, Feb 7, 2024 at 11:43 PM jian he <jian.universality@gmail.com> wrote:
> >
> > On Wed, Feb 7, 2024 at 7:54 PM Tender Wang <tndrwang@gmail.com> wrote:
> > >
> > > In pg document, I found this:
> > >
> > > Also, a block containing an EXCEPTION clause effectively forms a subtransaction that can be rolled back without
affectingthe outer transaction. 
> > >
> > > So it will report error for this case even though we replace PARALLEL UNSAFE with PARALLEL SAFE.
> > >
> > > I think if PARALLEL UNSAFE is specified by users,  PG internal should not choose to build index parallel, even if
thefunction is 
> > > too simply that can be transform to Const node.
> > >
> > > Attached patch will fix Alexander reported issue. I choose pass the raw index expression to is_parallel_safe().
> > > The comments of RelationGetIndexExpressions() doesn't say it will return optimized expression. So I add a bool
argumentto RelationGetIndexExpressions(). 
> > > I don't decide to write a new function like RelationGetIndexRawExpression. I think the
RelationGetIndexExpressions()is enough after adding a bool argument to indicate 
> > > whether caller needing a raw index expression.
> > >
> > > But if users specify PARALLEL SAFE, like I said before, it also reports error. But I think it is another thing.
Maybeit is reasonable? 
> > >


I think it's reasonable.
It's the function owner's responsibility to specify PARALLEL safety correctly.


> after apply your patch:
> set log_min_messages to debug1;
> begin;
> drop table if exists btree_para_bld;
> CREATE TABLE btree_para_bld(i int);
> ALTER TABLE btree_para_bld SET (parallel_workers = 4);
> SET local max_parallel_maintenance_workers TO 4;
> CREATE or replace FUNCTION para_unsafe_f1() RETURNS int IMMUTABLE
> PARALLEL UNSAFE
> AS $$
> BEGIN
> RETURN 0;
> END$$ LANGUAGE plpgsql;
>
> CREATE or replace FUNCTION para_safe() RETURNS int IMMUTABLE PARALLEL UNSAFE
> AS $$
> BEGIN
> RETURN 0;
> END$$ LANGUAGE plpgsql;
> CREATE INDEX ON btree_para_bld((para_unsafe_f1()));
> CREATE INDEX ON btree_para_bld((para_safe()));
> commit;
>
> debug information:
> 2024-02-08 14:11:26.664 CST [33245] DEBUG:  building index
> "btree_para_bld_para_unsafe_f1_idx" on table "btree_para_bld" serially
> 2024-02-08 14:11:26.664 CST [33245] DEBUG:  index
> "btree_para_bld_para_unsafe_f1_idx" can safely use deduplication
> CREATE INDEX
> 2024-02-08 14:11:26.666 CST [33245] DEBUG:  building index
> "btree_para_bld_para_safe_idx" on table "btree_para_bld" serially
> 2024-02-08 14:11:26.666 CST [33245] DEBUG:  index
> "btree_para_bld_para_safe_idx" can safely use deduplication
> CREATE INDEX
>
> the problem entry point is_parallel_safe, i think.

I am sorry, this part was wrong.
Your patch works perfectly!

if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
!is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index, false)) ||
!is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index, false)))
{
parallel_workers = 0;
goto done;
}
add more comments in above code would be perfect.
As I mentioned before, your test case can be simplified.



Hi.
one minor  issue.

-RelationGetIndexPredicate(Relation relation)
+RelationGetIndexPredicate(Relation relation, bool need_modify)
need_modify name may not be that good IMHO, but so far I can only come
up with are "raw_expr", "is_raw".





jian he <jian.universality@gmail.com> 于2024年2月27日周二 18:17写道:
Hi.
one minor  issue.

-RelationGetIndexPredicate(Relation relation)
+RelationGetIndexPredicate(Relation relation, bool need_modify)
need_modify name may not be that good IMHO, but so far I can only come
up with are "raw_expr", "is_raw".

Hi jian he,
  I'm sorry for the delayed response.  Thanks for your reviewing, and the new v2 patch include your advices.

As no others reply to this patch. I would add it to the 2024-03 commitfest.
--
Tender Wang
OpenPie:  https://en.openpie.com/
Attachment
On Thu, Feb 29, 2024 at 3:08 PM Tender Wang <tndrwang@gmail.com> wrote:
>
> Hi jian he,
>   I'm sorry for the delayed response.  Thanks for your reviewing, and the new v2 patch include your advices.
>
> As no others reply to this patch. I would add it to the 2024-03 commitfest.
> --


`
if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
!is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index, true)) ||
!is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index, true)))
{
parallel_workers = 0;
goto done;
}
`
I refactored the comments for the code above.
Feel free to change it.

I also toggled {RelationGetIndexExpressions,
RelationGetIndexPredicate}'s second argument
from false to true where it's false,
true to false where it's true.

I guess it's more intuitive to do
`
if (get_raw_expr)
  return result;
`
since at that time, the `result` is a raw expression.

Overall it looks good.

Attachment
On Thu, Feb 29, 2024 at 05:17:41PM +0800, jian he wrote:
> if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
> !is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index, true)) ||
> !is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index, true)))
> {
> parallel_workers = 0;
> goto done;
> }

Interesting bug.

> Overall it looks good.

I disagree with this idea.  Your patch is creating a shortcut in the
relcache code to retrieve data about index predicates and expressions
that make it bypass the flattening that would be done in
eval_const_expressions().  Giving the option to bypass that in the
relcache is a very bad idea, because flattening of the expressions for
the planner is not an option: it must happen.  So, I fear that your
patch could cause bugs if we introduce new code that calls
RelationGetIndexExpressions() with an incorrect option.  Not
documenting the new option is not acceptable either, with only one
comment added at the top of plan_create_index_workers().

On top of that, your approach has two bugs:
RelationGetIndexExpressions and RelationGetIndexPredicate would return
the flattened information if available directly from the relcache, so
even if you pass get_raw_expr=true you may finish with flattened
expressions and/or predicates, facing the same issues.

I think that the correct way to do that would be to get the
information from the syscache when calculating the number of parallel
workers to use for the parallel index builds, with SysCacheGetAttr(),
for example.  That would ensure that the expressions are not
flattened, letting the relcache be.

By the way, this reminds me of 7cce159349cc and its thread for a bug
fix that I did for REINDEX CONCURRENTLY a couple of years ago.  The
mistake is mostly the same:
https://www.postgresql.org/message-id/CA%2Bu7OA5Hp0ra235F3czPom_FyAd-3%2BXwSJmX95r1%2BsRPOJc9VQ%40mail.gmail.com
--
Michael

Attachment


Michael Paquier <michael@paquier.xyz> 于2024年3月5日周二 11:45写道:
On Thu, Feb 29, 2024 at 05:17:41PM +0800, jian he wrote:
> if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
> !is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index, true)) ||
> !is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index, true)))
> {
> parallel_workers = 0;
> goto done;
> }

Interesting bug.

> Overall it looks good.

I disagree with this idea.  Your patch is creating a shortcut in the
relcache code to retrieve data about index predicates and expressions
that make it bypass the flattening that would be done in
eval_const_expressions().  Giving the option to bypass that in the
relcache is a very bad idea, because flattening of the expressions for
the planner is not an option: it must happen.  So, I fear that your
patch could cause bugs if we introduce new code that calls
RelationGetIndexExpressions() with an incorrect option. 
  Not
documenting the new option is not acceptable either, with only one
comment added at the top of plan_create_index_workers().
 
Agreed! It is not friendly to others who maybe use these functions.

On top of that, your approach has two bugs:
RelationGetIndexExpressions and RelationGetIndexPredicate would return
the flattened information if available directly from the relcache, so
even if you pass get_raw_expr=true you may finish with flattened
expressions and/or predicates, facing the same issues.

Is that possible? In this case, it is CREATE INDEX, the relation in relcache should not have this index definition?
I try to create index on create table phase, but I failed. It looks like index on expression only be created using CREATE INDEX statement.
Is there something I need to learn?
 
I think that the correct way to do that would be to get the
information from the syscache when calculating the number of parallel
workers to use for the parallel index builds, with SysCacheGetAttr(),
for example.  That would ensure that the expressions are not
flattened, letting the relcache be.

By the way, this reminds me of 7cce159349cc and its thread for a bug
fix that I did for REINDEX CONCURRENTLY a couple of years ago.  The
mistake is mostly the same:
https://www.postgresql.org/message-id/CA%2Bu7OA5Hp0ra235F3czPom_FyAd-3%2BXwSJmX95r1%2BsRPOJc9VQ%40mail.gmail.com
 
In aboved link, I found that 
"Even without that, changing the APIs of the functions you've touched
here seems pretty invasive." (Tom said that). It sounds sounds reasonable.

 --
Michael


--
Tender Wang
OpenPie:  https://en.openpie.com/


Michael Paquier <michael@paquier.xyz> 于2024年3月5日周二 11:45写道:
On Thu, Feb 29, 2024 at 05:17:41PM +0800, jian he wrote:
> if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
> !is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index, true)) ||
> !is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index, true)))
> {
> parallel_workers = 0;
> goto done;
> }

Interesting bug.

> Overall it looks good.

I disagree with this idea.  Your patch is creating a shortcut in the
relcache code to retrieve data about index predicates and expressions
that make it bypass the flattening that would be done in
eval_const_expressions().  Giving the option to bypass that in the
relcache is a very bad idea, because flattening of the expressions for
the planner is not an option: it must happen.  So, I fear that your
patch could cause bugs if we introduce new code that calls
RelationGetIndexExpressions() with an incorrect option.  Not
documenting the new option is not acceptable either, with only one
comment added at the top of plan_create_index_workers().

On top of that, your approach has two bugs:
RelationGetIndexExpressions and RelationGetIndexPredicate would return
the flattened information if available directly from the relcache, so
even if you pass get_raw_expr=true you may finish with flattened
expressions and/or predicates, facing the same issues.

Yeah,  I have carefully considered what you said. It is a bad idea to change the RelationGetIndexExpressions().
Even though it can fix this thread issue, but if new code calls RelationGetIndexExpressions() and
want to get the raw index expression, so it pass get_raw_expr=tree, it maybe get optimized index expression,  that could cause some bugs.
 
I think that the correct way to do that would be to get the
information from the syscache when calculating the number of parallel
workers to use for the parallel index builds, with SysCacheGetAttr(),
for example.  That would ensure that the expressions are not
flattened, letting the relcache be.
 
The fix  here is to pass the raw index expression to is_parallel_safe().  Getting the information from the syscache sournds reasonable.

By the way, this reminds me of 7cce159349cc and its thread for a bug
fix that I did for REINDEX CONCURRENTLY a couple of years ago.  The
mistake is mostly the same:
https://www.postgresql.org/message-id/CA%2Bu7OA5Hp0ra235F3czPom_FyAd-3%2BXwSJmX95r1%2BsRPOJc9VQ%40mail.gmail.com
--
Michael


--
Tender Wang
OpenPie:  https://en.openpie.com/


Michael Paquier <michael@paquier.xyz> 于2024年3月5日周二 11:45写道:
On Thu, Feb 29, 2024 at 05:17:41PM +0800, jian he wrote:
> if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
> !is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index, true)) ||
> !is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index, true)))
> {
> parallel_workers = 0;
> goto done;
> }

Interesting bug.

> Overall it looks good.

I disagree with this idea.  Your patch is creating a shortcut in the
relcache code to retrieve data about index predicates and expressions
that make it bypass the flattening that would be done in
eval_const_expressions().  Giving the option to bypass that in the
relcache is a very bad idea, because flattening of the expressions for
the planner is not an option: it must happen.  So, I fear that your
patch could cause bugs if we introduce new code that calls
RelationGetIndexExpressions() with an incorrect option.  Not
documenting the new option is not acceptable either, with only one
comment added at the top of plan_create_index_workers().

On top of that, your approach has two bugs:
RelationGetIndexExpressions and RelationGetIndexPredicate would return
the flattened information if available directly from the relcache, so
even if you pass get_raw_expr=true you may finish with flattened
expressions and/or predicates, facing the same issues.

I think that the correct way to do that would be to get the
information from the syscache when calculating the number of parallel
workers to use for the parallel index builds, with SysCacheGetAttr(),
for example.  That would ensure that the expressions are not
flattened, letting the relcache be.
 
I refactor the v2 version patch according to your advice.  
Any thoughts?


By the way, this reminds me of 7cce159349cc and its thread for a bug
fix that I did for REINDEX CONCURRENTLY a couple of years ago.  The
mistake is mostly the same:
https://www.postgresql.org/message-id/CA%2Bu7OA5Hp0ra235F3czPom_FyAd-3%2BXwSJmX95r1%2BsRPOJc9VQ%40mail.gmail.com
--
Michael

--
Tender Wang
OpenPie:  https://en.openpie.com/
Attachment
On Tue, Mar 5, 2024 at 7:49 PM Tender Wang <tndrwang@gmail.com> wrote:
>
>>
>> On top of that, your approach has two bugs:
>> RelationGetIndexExpressions and RelationGetIndexPredicate would return
>> the flattened information if available directly from the relcache, so
>> even if you pass get_raw_expr=true you may finish with flattened
>> expressions and/or predicates, facing the same issues.
>>
>> I think that the correct way to do that would be to get the
>> information from the syscache when calculating the number of parallel
>> workers to use for the parallel index builds, with SysCacheGetAttr(),
>> for example.  That would ensure that the expressions are not
>> flattened, letting the relcache be.
>
>
> I refactor the v2 version patch according to your advice.
> Any thoughts?

in RelationGetIndexClause to, I think you can use the following to
save a SearchSysCache1 cycle?
if (relation->rd_indextuple == NULL ||
heap_attisnull(relation->rd_indextuple, Anum_pg_index_indexprs, NULL))
return NIL;

or
if (relation->rd_indextuple == NULL ||
heap_attisnull(relation->rd_indextuple, Anum_pg_index_indpred, NULL))
return NIL;

main question would be why not two functions,
like RelationGetIndexRawExpr(Relation relation),
RelationGetIndexRawPred(Relation relation)



On Tue, Mar 05, 2024 at 09:22:08PM +0800, jian he wrote:
> in RelationGetIndexClause to, I think you can use the following to
> save a SearchSysCache1 cycle?
> if (relation->rd_indextuple == NULL ||
> heap_attisnull(relation->rd_indextuple, Anum_pg_index_indexprs, NULL))
> return NIL;
>
> or
> if (relation->rd_indextuple == NULL ||
> heap_attisnull(relation->rd_indextuple, Anum_pg_index_indpred, NULL))
> return NIL;

Don't think so.  The point is to not rely on the relcache at all to
retrieve this information.

> main question would be why not two functions,
> like RelationGetIndexRawExpr(Relation relation),
> RelationGetIndexRawPred(Relation relation)

This comes down to if it is clean to have references to the catalog
pg_index in the planner, which is not the case yet so my take is that
two functions is much cleaner even if both return a List.

Anyway, why do you insist in putting the new functions in relcache.c?
I would suggest to move that to lsyscache.c instead, close to
get_index_column_opclass where there are routines for the syscache of
pg_index.  It would be possible to reuse that in the reindex code, for
example.

The patch should add a comment in in plan_create_index_workers()
explaining why we care about raw expressions and indexes rather than
the relcache information.
--
Michael

Attachment


Michael Paquier <michael@paquier.xyz> 于2024年3月6日周三 06:45写道:
On Tue, Mar 05, 2024 at 09:22:08PM +0800, jian he wrote:
> in RelationGetIndexClause to, I think you can use the following to
> save a SearchSysCache1 cycle?
> if (relation->rd_indextuple == NULL ||
> heap_attisnull(relation->rd_indextuple, Anum_pg_index_indexprs, NULL))
> return NIL;
>
> or
> if (relation->rd_indextuple == NULL ||
> heap_attisnull(relation->rd_indextuple, Anum_pg_index_indpred, NULL))
> return NIL;

Don't think so.  The point is to not rely on the relcache at all to
retrieve this information.

> main question would be why not two functions,
> like RelationGetIndexRawExpr(Relation relation),
> RelationGetIndexRawPred(Relation relation)

This comes down to if it is clean to have references to the catalog
pg_index in the planner, which is not the case yet so my take is that
two functions is much cleaner even if both return a List.

Anyway, why do you insist in putting the new functions in relcache.c?
I would suggest to move that to lsyscache.c instead, close to
get_index_column_opclass where there are routines for the syscache of
pg_index.  It would be possible to reuse that in the reindex code, for
example.
 
Make sense. 
Please review attached v4 patch.

The patch should add a comment in in plan_create_index_workers()
explaining why we care about raw expressions and indexes rather than
the relcache information.
 
Added 

--
Michael


--
Tender Wang
OpenPie:  https://en.openpie.com/
Attachment
Thanks for pushing the patch. 

I found a typos:
/*
+ * get_index_expressions
+ *
+ *             Given the index OID, its a List of its expressions or NIL if none.
+ */
"its" should be "return"?


Tender Wang <tndrwang@gmail.com> 于2024年3月6日周三 12:30写道:


Michael Paquier <michael@paquier.xyz> 于2024年3月6日周三 06:45写道:
On Tue, Mar 05, 2024 at 09:22:08PM +0800, jian he wrote:
> in RelationGetIndexClause to, I think you can use the following to
> save a SearchSysCache1 cycle?
> if (relation->rd_indextuple == NULL ||
> heap_attisnull(relation->rd_indextuple, Anum_pg_index_indexprs, NULL))
> return NIL;
>
> or
> if (relation->rd_indextuple == NULL ||
> heap_attisnull(relation->rd_indextuple, Anum_pg_index_indpred, NULL))
> return NIL;

Don't think so.  The point is to not rely on the relcache at all to
retrieve this information.

> main question would be why not two functions,
> like RelationGetIndexRawExpr(Relation relation),
> RelationGetIndexRawPred(Relation relation)

This comes down to if it is clean to have references to the catalog
pg_index in the planner, which is not the case yet so my take is that
two functions is much cleaner even if both return a List.

Anyway, why do you insist in putting the new functions in relcache.c?
I would suggest to move that to lsyscache.c instead, close to
get_index_column_opclass where there are routines for the syscache of
pg_index.  It would be possible to reuse that in the reindex code, for
example.
 
Make sense. 
Please review attached v4 patch.

The patch should add a comment in in plan_create_index_workers()
explaining why we care about raw expressions and indexes rather than
the relcache information.
 
Added 

--
Michael


--
Tender Wang
OpenPie:  https://en.openpie.com/


--
Tender Wang
OpenPie:  https://en.openpie.com/
Tender Wang <tndrwang@gmail.com> writes:
> Thanks for pushing the patch.

My apologies for not having paid much attention to this thread
earlier, but ...

I don't like anything about the committed patch.  It's a band-aid
that doesn't really solve the problem: if I simply declare the
function as PARALLEL SAFE, the failure comes right back.  Of
course you can argue that then I lied about the parallel safety
of the function, but I think a far better answer is to make such
functions actually parallel safe, which I have a patch for
awaiting review in the current commitfest [1].

Moreover I think this breaks a fairly long-standing principle of
the planner, namely that if we can inline or const-fold a function
then its original markings no longer matter.  People are likely
to be surprised that that no longer applies to parallel-safety
decisions, and even more surprised that it only doesn't apply
in this one narrow context.

I also believe that this has introduced new bugs, caused by
passing is_parallel_safe() raw parse-analysis output rather
than a tree that's been through eval_const_expressions.
For example, that processing is responsible for inserting
function default-argument expressions.  If a default argument
is parallel-unsafe, this coding would fail to notice that.
(Compare commit 743ddafc7.)  There might be other structures
in an un-simplified tree that is_parallel_safe() isn't expecting
and would spit up on.

As far as HEAD goes, I think we should revert this and then look
to get [1] committed.  I'm unsure what to do in the back branches,
but given the lack of prior complaints I suspect we could get away
with just reverting and leaving the issue unfixed.  (I wouldn't
propose back-patching [1], it seems too risky.)

If we want to do something about the more generic issue that
maybe const-folding will succeed in the leader process but
fail in workers, then I think the way to do that is to adjust
parallel index build so that the processed expression trees are
passed to the workers from the leader.  That is how it's done
with regular query trees, and index build is evidently taking
a not-very-safe shortcut by not doing it like that.  But I'm not
excited enough about the mostly-theoretical hazard to put any
work into that myself.

Another line of thought is whether we should just forbid any
parallel-unsafe functions in index expressions.  If we continue
to allow them, I think we have to reject parallelizing any
query or DDL operation that touches the associated table at all,
because there's too much risk of a worker trying to evaluate
such an expression somewhere along the line.  Again though,
given the lack of complaints I'm not excited about imposing
such a draconian policy.  (The fact that functions are marked
parallel unsafe by default means that if we did, we'd almost
certainly break cases that are working fine for users today.)

            regards, tom lane

[1] https://commitfest.postgresql.org/47/4776/



On Wed, Mar 06, 2024 at 12:28:11PM -0500, Tom Lane wrote:
> As far as HEAD goes, I think we should revert this and then look
> to get [1] committed.
>
> I'm unsure what to do in the back branches,
> but given the lack of prior complaints I suspect we could get away
> with just reverting and leaving the issue unfixed.

Still the issue fixed here and the issue of the other thread are
different?  Relying on the relcache tricks the planner to spawn
workers for nothing because they would just fail if the internals of
the function are unsafe to run in the workers, and the function is
marked as such.  In some cases users like to use properties as hints
to give to the planner about what to use or not.  I'd like to believe
(and perhaps that's a cute thought) that functions are marked as
parallel-unsafe for a purpose, which is that they cannot run in
workers as they rely on a shared state.  Whether the internals of the
functions are implemented correctly is up to the user.  I mean, it's
perfectly possible to mark a function as immutable while its internals
are volatile, and I've seen people rely on that to push down clauses.

> (I wouldn't propose back-patching [1], it seems too risky.)

Agreed.  Just looked at it and the change of blockState can be
sometimes tricky to get right.  There are a bunch of internal
assumptions as far as I recall this area of the code.

> If we want to do something about the more generic issue that
> maybe const-folding will succeed in the leader process but
> fail in workers, then I think the way to do that is to adjust
> parallel index build so that the processed expression trees are
> passed to the workers from the leader. That is how it's done
> with regular query trees, and index build is evidently taking
> a not-very-safe shortcut by not doing it like that.  But I'm not
> excited enough about the mostly-theoretical hazard to put any
> work into that myself.

You mean in the serialization and deserialization logic used when
workers are spawned and then let the workers complain when they try to
use such a function in a new path?  That would be backpatchable, but
feels rather a bit invasive.

> Another line of thought is whether we should just forbid any
> parallel-unsafe functions in index expressions.  If we continue
> to allow them, I think we have to reject parallelizing any
> query or DDL operation that touches the associated table at all,
> because there's too much risk of a worker trying to evaluate
> such an expression somewhere along the line.  Again though,
> given the lack of complaints I'm not excited about imposing
> such a draconian policy.  (The fact that functions are marked
> parallel unsafe by default means that if we did, we'd almost
> certainly break cases that are working fine for users today.)

Neither am I to restrict that moving forward.  That could hurt
existing use cases that were allowed, still were able to work.

Saying that, seeing the lack of complaints and the fact that you're
not comfortable with that, I have no objections to revert the change
on all the branches.  Will do so in a bit.
--
Michael

Attachment
Michael Paquier <michael@paquier.xyz> writes:
> Still the issue fixed here and the issue of the other thread are
> different?  Relying on the relcache tricks the planner to spawn
> workers for nothing because they would just fail if the internals of
> the function are unsafe to run in the workers, and the function is
> marked as such.

Sure, but the current patch just trades one set of errors for others.
As I said, if we care about this problem then the right fix will
involve serializing/deserializing already-const-folded expression
trees instead of expecting that the workers can regenerate those.
I don't personally think it's worth the trouble, especially not if
we can get rid of the need for exception-using functions to be marked
parallel unsafe.

> In some cases users like to use properties as hints
> to give to the planner about what to use or not.  I'd like to believe
> (and perhaps that's a cute thought) that functions are marked as
> parallel-unsafe for a purpose, which is that they cannot run in
> workers as they rely on a shared state.  Whether the internals of the
> functions are implemented correctly is up to the user.  I mean, it's
> perfectly possible to mark a function as immutable while its internals
> are volatile, and I've seen people rely on that to push down clauses.

It's the other direction that I'm concerned about: if we inline a SQL
function and find that its innards are less volatile than the original
marking, we are free to exploit that, and people will complain if we
don't.  Similarly, if the innards are more parallel safe than the
original marking, we should be able to exploit that.

            regards, tom lane



On Wed, Mar 06, 2024 at 06:01:50PM -0500, Tom Lane wrote:
> Sure, but the current patch just trades one set of errors for others.
> As I said, if we care about this problem then the right fix will
> involve serializing/deserializing already-const-folded expression
> trees instead of expecting that the workers can regenerate those.
> I don't personally think it's worth the trouble, especially not if
> we can get rid of the need for exception-using functions to be marked
> parallel unsafe.

I don't know if that's worth tackling either.  It sounds like this is
just something we should do only if someone comes up with a good use
case for it, outside of the subtransaction handling in the workers.

Anyway, I've just reverted the change for now.
--
Michael

Attachment