Thread: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
Hi, Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without if-not-exists clause, the existence of the relation gets checked during the execution of the select part and an error is thrown there. All the unnecessary rewrite and planning for the select part would have happened just to fail later. However, if if-not-exists clause is present, then a notice is issued and returned immediately without any further rewrite or planning for . This seems somewhat inconsistent to me. I propose to check the relation existence early in ExecCreateTableAs() as well as in ExplainOneUtility() and throw an error in case it exists already to avoid unnecessary rewrite, planning and execution of the select part. Attaching a patch. Note that I have not added any test cases as the existing test cases in create_table.sql and matview.sql would cover the code. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Attachment
RE: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
"Hou, Zhijie"
Date:
> Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without if-not-exists > clause, the existence of the relation gets checked during the execution > of the select part and an error is thrown there. > All the unnecessary rewrite and planning for the select part would have > happened just to fail later. However, if if-not-exists clause is present, > then a notice is issued and returned immediately without any further rewrite > or planning for . This seems somewhat inconsistent to me. > > I propose to check the relation existence early in ExecCreateTableAs() as > well as in ExplainOneUtility() and throw an error in case it exists already > to avoid unnecessary rewrite, planning and execution of the select part. > > Attaching a patch. Note that I have not added any test cases as the existing > test cases in create_table.sql and matview.sql would cover the code. > > Thoughts? Personally, I think it make sense, as other CMD(such as create extension/index ...) throw that error before any further operation too. I am just a little worried about the behavior change of [explain CTAS]. May be someone will complain the change from normal explaininfo to error output. And I took a look into the patch. + StringInfoData emsg; + + initStringInfo(&emsg); + + if (level == NOTICE) + appendStringInfo(&emsg, Using variable emsg and level seems a little complicated to me. How about just: if (!is_explain && ctas->if_not_exists) ereport(NOTICE,xxx else ereport(ERROR,xxx Best regards, houzj
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
Thanks for taking a look at this. On Fri, Dec 11, 2020 at 6:33 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without if-not-exists > > clause, the existence of the relation gets checked during the execution > > of the select part and an error is thrown there. > > All the unnecessary rewrite and planning for the select part would have > > happened just to fail later. However, if if-not-exists clause is present, > > then a notice is issued and returned immediately without any further rewrite > > or planning for . This seems somewhat inconsistent to me. > > > > I propose to check the relation existence early in ExecCreateTableAs() as > > well as in ExplainOneUtility() and throw an error in case it exists already > > to avoid unnecessary rewrite, planning and execution of the select part. > > > > Attaching a patch. Note that I have not added any test cases as the existing > > test cases in create_table.sql and matview.sql would cover the code. > > > > Thoughts? > > Personally, I think it make sense, as other CMD(such as create extension/index ...) throw that error > before any further operation too. > > I am just a little worried about the behavior change of [explain CTAS]. > May be someone will complain the change from normal explaininfo to error output. I think we are clear with the patch for plain i.e. non-EXPLAIN and EXPLAIN ANALYZE CTAS/CMV cases. The behaviour for EXPLAIN is as follows: 1)EXPLAIN without ANALYZE, without patch: select part is planned(note that the relations in the select part are checked for their existence while planning, fails any of them don't exist) , relation(CTAS/CMV being created) existence is not checked as we will not create the relation and execute the plan. 2)EXPLAIN with ANALYZE, without patch: select part is planned, as we execute the plan, relation(CTAS/CMV) existence is checked during the execution and fails there if it exists. 3) EXPLAIN without ANALYZE, with patch: relation(CTAS/CMV) existence is checked before the planning and fails if it exists, without going further to the planning for select part. 4)EXPLAIN with ANALYZE, with patch: relation(CTAS/CMV) existence is checked before the rewrite, planning and fails if it exists, without going further. IMO, let's not change the 1) behaviour to 3) with the patch. If agreed, I can do the following way in ExplainOneUtility and will add a comment on why we are doing this. if (es->analyze) (void) CheckRelExistenceInCTAS(ctas, true); Thoughts? > And I took a look into the patch. > > + StringInfoData emsg; > + > + initStringInfo(&emsg); > + > + if (level == NOTICE) > + appendStringInfo(&emsg, > > Using variable emsg and level seems a little complicated to me. > How about just: > > if (!is_explain && ctas->if_not_exists) > ereport(NOTICE,xxx > else > ereport(ERROR,xxx I will modify it in the next version of the patch which I plan to send once agreed on the above point. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
RE: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
"Hou, Zhijie"
Date:
> IMO, let's not change the 1) behaviour to 3) with the patch. If agreed,
> I can do the following way in ExplainOneUtility and will add a comment on
> why we are doing this.
>
> if (es->analyze)
> (void) CheckRelExistenceInCTAS(ctas, true);
>
> Thoughts?
Agreed.
Just in case, I took a look at Oracle 12’s behavior about [explain CTAS].
Oracle 12 will output the plan without throwing any msg in this case.
Best regards,
houzj
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Fri, Dec 11, 2020 at 12:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > IMO, let's not change the 1) behaviour to 3) with the patch. If agreed, > > > I can do the following way in ExplainOneUtility and will add a comment on > > > why we are doing this. > > > if (es->analyze) > > > (void) CheckRelExistenceInCTAS(ctas, true); > > > Thoughts? > > Agreed. Thanks! So, I will post an updated patch soon. > Just in case, I took a look at Oracle 12’s behavior about [explain CTAS]. > > Oracle 12 will output the plan without throwing any msg in this case. I'm not quite sure how other databases behave. If I go by the main intention of EXPLAIN without ANALYZE, that should do the planning, show it in the output and no execution of the query should happen. For EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and no execution happens so no existence check for the CTAS/CMV relation that will get created if the CTAS/CMV is executed. Having said that, the existence of the relations that are in the SELECT part are anyways checked during planning for EXPLAIN without ANALYZE. IMHO, let's not alter the existing behaviour, if needed, that can be discussed separately. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Fri, Dec 11, 2020 at 12:48:49PM +0530, Bharath Rupireddy wrote: > I'm not quite sure how other databases behave. If I go by the main > intention of EXPLAIN without ANALYZE, that should do the planning, > show it in the output and no execution of the query should happen. For > EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and > no execution happens so no existence check for the CTAS/CMV relation > that will get created if the CTAS/CMV is executed. Having said that, > the existence of the relations that are in the SELECT part are anyways > checked during planning for EXPLAIN without ANALYZE. I think that it is tricky to define IF NOT EXISTS for a CTAS with EXPLAIN. How would you for example treat an EXPLAIN ANALYZE with a query that includes an INSERT RETURNING in a WITH clause. Would you say that we do nothing if the relation exists? Or would you execute it, still insert nothing on the result relation because it already exists, even if the inner query may have inserted something as part of its execution on a different relation? -- Michael
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Fri, Dec 11, 2020 at 1:40 PM Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Dec 11, 2020 at 12:48:49PM +0530, Bharath Rupireddy wrote: > > I'm not quite sure how other databases behave. If I go by the main > > intention of EXPLAIN without ANALYZE, that should do the planning, > > show it in the output and no execution of the query should happen. For > > EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and > > no execution happens so no existence check for the CTAS/CMV relation > > that will get created if the CTAS/CMV is executed. Having said that, > > the existence of the relations that are in the SELECT part are anyways > > checked during planning for EXPLAIN without ANALYZE. > > I think that it is tricky to define IF NOT EXISTS for a CTAS with > EXPLAIN. How would you for example treat an EXPLAIN ANALYZE with a > query that includes an INSERT RETURNING in a WITH clause. Would you > say that we do nothing if the relation exists? Or would you execute > it, still insert nothing on the result relation because it already > exists, even if the inner query may have inserted something as part of > its execution on a different relation? I may not have got your above scenario correctly(it will be good if you can provide the use case in case I want to check something there). I tried the following way, all the involved relations are being checked for existence even though for EXPLAIN: postgres=# EXPLAIN WITH temp1 AS (SELECT * FROM t1) INSERT INTO t1_does_not_exit VALUES (1); ERROR: relation "t1_does_not_exit" does not exist LINE 1: ...LAIN WITH temp1 AS (SELECT * FROM t1) INSERT INTO t1_does_no... ^ IIUC, is it that we want the following behaviour in case the relation CTAS/CMV is trying to create does not exist? Note that the sample queries are run on latest master branch: EXPLAIN: throw an error, instead of the query showing select plan on master branch currently? postgres=# explain create table t2 as select * from t1; QUERY PLAN ---------------------------------------------------- Seq Scan on t1 (cost=0.00..2.00 rows=100 width=8) EXPLAIN ANALYZE: throw an error as it does on master branch? postgres=# explain analyze create table t2 as select * from t1; ERROR: relation "t2" already exists EXPLAIN with if-not-exists clause: throw a warning and an empty plan from ExplainOneUtility? If not an empty plan, we should be doing the relation existence check before we come to explain routines, maybe in gram.c? On the master branch it doesn't happen, the query shows the plan for select part as shown below. postgres=# explain create table if not exists t2 as select * from t1; QUERY PLAN ---------------------------------------------------- Seq Scan on t1 (cost=0.00..2.00 rows=100 width=8) EXPLAIN ANALYZE with if-not-exists clause: (ideally, for if-not-exists clause we expect a warning to be issued, but currently relation existence error is thrown) a warning and an empty plan from ExplainOneUtility? If not an empty plan, we should be doing the relation existence check before we come to explain routines, maybe in gram.c? On the master branch an ERROR is thrown. postgres=# explain analyze create table if not exists t2 as select * from t1; ERROR: relation "t2" already exists For plain CTAS -> throw an error as it happens on master branch. postgres=# create table t2 as select * from t1; ERROR: relation "t2" already exists For plain CTAS with if-not-exists clause -> a warning is issued as it happens on master branch. postgres=# create table if not exists t2 as select * from t1; NOTICE: relation "t2" already exists, skipping CREATE TABLE AS With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Fri, Dec 11, 2020 at 03:03:46PM +0530, Bharath Rupireddy wrote: > I may not have got your above scenario correctly(it will be good if > you can provide the use case in case I want to check something there). It is possible to have DML queries in WITH clauses, as long as they use RETURNING to feed tuples to the outer query. Just imagine something like that: =# explain analyze create table if not exists aa as with insert_query as (insert into aa values (1) returning a) select * from insert_query; Please note that this case fails with your patch, but the presence of IF NOT EXISTS should ensure that we don't fail and issue a NOTICE instead, no? Taking this case specifically (OK, I am playing with the rules a bit to insert data into the relation itself, still), this query may finish by adding tuples to the table whose creation should have been bypassed but the query got executed and inserted tuples. That's one example of behavior that may be confusing. There may be others, but it seems to me that it may be simpler to execute or even plan the query at all if the relation already exists. -- Michael
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote: > Please note that this case fails with your patch, but the presence of > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE > instead, no? Taking this case specifically (OK, I am playing with > the rules a bit to insert data into the relation itself, still), this > query may finish by adding tuples to the table whose creation should > have been bypassed but the query got executed and inserted tuples. > That's one example of behavior that may be confusing. There may be > others, but it seems to me that it may be simpler to execute or even > plan the query at all if the relation already exists. Er.. Sorry. I meant here to *not* execute or even *not* plan the query at all if the relation already exists. -- Michael
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Mon, Dec 14, 2020 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote: > > Please note that this case fails with your patch, but the presence of > > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE > > instead, no? Thanks for the use case. The provided use case (or for that matter any use case with explain analyze ctas if-not-exists) fails if the relation already exists. It happens on the master branch, please have a look at tests [1]. You are right in saying that whether it is explain/explain analyze ctas if there is if-not-exists we should issue notice instead of error as with plain ctas. Do we want to fix this behaviour for explain/explain analyze ctat with if-not-exists cases? Thoughts? If yes, we could change the code in ExplainOneUtility() such that we check relation existence before rewrite/planning, issue notice and return. Then. the user sees a notice and an empty plan as we are returning from ExplainOneUtility(). Is it okay to show a warning and an empty plan to the user? Thoughts? >> Taking this case specifically (OK, I am playing with > > the rules a bit to insert data into the relation itself, still), this > > query may finish by adding tuples to the table whose creation should > > have been bypassed but the query got executed and inserted tuples. IIUC, with the use case provided, the tuples will not be inserted as the query later fails (and the txn gets aborted) if the relation exists. > > That's one example of behavior that may be confusing. There may be > > others, but it seems to me that it may be simpler to execute or even > > plan the query at all if the relation already exists. > > Er.. Sorry. I meant here to *not* execute or even *not* plan the > query at all if the relation already exists. +1 to not plan and execute the query at all if the relation which ctas/cmv is trying to create already exists. [1] - postgres=# explain analyze postgres-# create table if not exists aa as postgres-# with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-# select * from insert_query; ERROR: relation "aa" already exists postgres=# explain analyze postgres-# create table aa as postgres-# with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-# select * from insert_query; ERROR: relation "aa" already exists postgres=# explain postgres-# create table aa as postgres-# with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-# select * from insert_query; QUERY PLAN ------------------------------------------------------------ CTE Scan on insert_query (cost=0.01..0.03 rows=1 width=4) CTE insert_query -> Insert on aa (cost=0.00..0.01 rows=1 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) postgres=# explain postgres-# create table if not exists aa as postgres-# with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-# select * from insert_query; QUERY PLAN ------------------------------------------------------------ CTE Scan on insert_query (cost=0.01..0.03 rows=1 width=4) CTE insert_query -> Insert on aa (cost=0.00..0.01 rows=1 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) postgres=# create table aa as postgres-# with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-# select * from insert_query; ERROR: relation "aa" already exists postgres=# create table if not exists aa as postgres-# with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-# select * from insert_query; NOTICE: relation "aa" already exists, skipping CREATE TABLE AS With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Mon, Dec 14, 2020 at 1:54 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Mon, Dec 14, 2020 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
> > On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote:
> > > Please note that this case fails with your patch, but the presence of
> > > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE
> > > instead, no?
>
> Thanks for the use case. The provided use case (or for that matter any
> use case with explain analyze ctas if-not-exists) fails if the
> relation already exists. It happens on the master branch, please have
> a look at tests [1]. You are right in saying that whether it is
> explain/explain analyze ctas if there is if-not-exists we should issue
> notice instead of error as with plain ctas.
>
> Do we want to fix this behaviour for explain/explain analyze ctat with
> if-not-exists cases? Thoughts?
>
> If yes, we could change the code in ExplainOneUtility() such that we
> check relation existence before rewrite/planning, issue notice and
> return. Then. the user sees a notice and an empty plan as we are
> returning from ExplainOneUtility(). Is it okay to show a warning and
> an empty plan to the user? Thoughts?
>
> >> Taking this case specifically (OK, I am playing with
> > > the rules a bit to insert data into the relation itself, still), this
> > > query may finish by adding tuples to the table whose creation should
> > > have been bypassed but the query got executed and inserted tuples.
>
> IIUC, with the use case provided, the tuples will not be inserted as
> the query later fails (and the txn gets aborted) if the relation
> exists.
>
> > > That's one example of behavior that may be confusing. There may be
> > > others, but it seems to me that it may be simpler to execute or even
> > > plan the query at all if the relation already exists.
> >
> > Er.. Sorry. I meant here to *not* execute or even *not* plan the
> > query at all if the relation already exists.
>
> +1 to not plan and execute the query at all if the relation which
> ctas/cmv is trying to create already exists.
Posting a v2 patch after modifying the new function CheckRelExistenceInCTAS() a bit as suggested earlier.
postgres=# create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# create table if not exists foo as select 1;
NOTICE: relation "foo" already exists, skipping
CREATE TABLE AS
postgres=# explain analyze create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain analyze create table if not exists foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain create table if not exists foo as select 1;
ERROR: relation "foo" already exists
On master/without patch:
postgres=# create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# create table if not exists foo as select 1;
NOTICE: relation "foo" already exists, skipping
CREATE TABLE AS
postgres=# explain analyze create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain analyze create table if not exists foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain create table foo as select 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
(1 row)
postgres=# explain create table if not exists foo as select 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
(1 row)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
> On Mon, Dec 14, 2020 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
> > On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote:
> > > Please note that this case fails with your patch, but the presence of
> > > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE
> > > instead, no?
>
> Thanks for the use case. The provided use case (or for that matter any
> use case with explain analyze ctas if-not-exists) fails if the
> relation already exists. It happens on the master branch, please have
> a look at tests [1]. You are right in saying that whether it is
> explain/explain analyze ctas if there is if-not-exists we should issue
> notice instead of error as with plain ctas.
>
> Do we want to fix this behaviour for explain/explain analyze ctat with
> if-not-exists cases? Thoughts?
>
> If yes, we could change the code in ExplainOneUtility() such that we
> check relation existence before rewrite/planning, issue notice and
> return. Then. the user sees a notice and an empty plan as we are
> returning from ExplainOneUtility(). Is it okay to show a warning and
> an empty plan to the user? Thoughts?
>
> >> Taking this case specifically (OK, I am playing with
> > > the rules a bit to insert data into the relation itself, still), this
> > > query may finish by adding tuples to the table whose creation should
> > > have been bypassed but the query got executed and inserted tuples.
>
> IIUC, with the use case provided, the tuples will not be inserted as
> the query later fails (and the txn gets aborted) if the relation
> exists.
>
> > > That's one example of behavior that may be confusing. There may be
> > > others, but it seems to me that it may be simpler to execute or even
> > > plan the query at all if the relation already exists.
> >
> > Er.. Sorry. I meant here to *not* execute or even *not* plan the
> > query at all if the relation already exists.
>
> +1 to not plan and execute the query at all if the relation which
> ctas/cmv is trying to create already exists.
Posting a v2 patch after modifying the new function CheckRelExistenceInCTAS() a bit as suggested earlier.
The behavior of the ctas/cmv, in case the relation already exists is as shown in [1]. The things that have been changed with the patch are: 1) In any case we do not rewrite or plan the select part if the relation already exists 2) For explain ctas/cmv (without analyze), now the relation existence is checked early and the error is thrown as highlighted in [1].
With patch, there is no behavioral change(from that of master branch) in explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the notice.
Thoughts?
[1]
With patch:postgres=# create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# create table if not exists foo as select 1;
NOTICE: relation "foo" already exists, skipping
CREATE TABLE AS
postgres=# explain analyze create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain analyze create table if not exists foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain create table if not exists foo as select 1;
ERROR: relation "foo" already exists
On master/without patch:
postgres=# create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# create table if not exists foo as select 1;
NOTICE: relation "foo" already exists, skipping
CREATE TABLE AS
postgres=# explain analyze create table foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain analyze create table if not exists foo as select 1;
ERROR: relation "foo" already exists
postgres=# explain create table foo as select 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
(1 row)
postgres=# explain create table if not exists foo as select 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
(1 row)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote: > The behavior of the ctas/cmv, in case the relation already exists is as > shown in [1]. The things that have been changed with the patch are: 1) In > any case we do not rewrite or plan the select part if the relation already > exists 2) For explain ctas/cmv (without analyze), now the relation > existence is checked early and the error is thrown as highlighted in [1]. > > With patch, there is no behavioral change(from that of master branch) in > explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the > notice. > > Thoughts? HEAD is already a mixed bad of behaviors, and the set of results you are presenting here is giving a similar impression. It brings in some sanity by just ignoring the effects of the IF NOT EXISTS clause all the time still that's not consistent with the queries not using EXPLAIN. Hmm. Looking for similar behaviors, I can see one case in select_into.sql where we just never execute the plan when using WITH NO DATA but still show the plan, meaning that the query gets planned but it just gets marked as "(never executed)" if attempting to use ANALYZE. There may be use cases for that as the user directly asked directly for an EXPLAIN. Note: the patch needs tests for all the patterns you would like to stress. This way it is easier to follow the patterns that are changing with your patch and compare them with the HEAD behavior (like looking at the diffs with the tests of the patch, but without the diffs in src/backend/). -- Michael
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Fri, Dec 18, 2020 at 7:18 AM Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote: > > The behavior of the ctas/cmv, in case the relation already exists is as > > shown in [1]. The things that have been changed with the patch are: 1) In > > any case we do not rewrite or plan the select part if the relation already > > exists 2) For explain ctas/cmv (without analyze), now the relation > > existence is checked early and the error is thrown as highlighted in [1]. > > > > With patch, there is no behavioral change(from that of master branch) in > > explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the > > notice. > > > > Thoughts? > > HEAD is already a mixed bad of behaviors, and the set of results you > are presenting here is giving a similar impression. It brings in some > sanity by just ignoring the effects of the IF NOT EXISTS clause all > the time still that's not consistent with the queries not using > EXPLAIN. I tried to make it consistent by issuing NOTICE (not an error) even for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and exit from the ExplainOneUtility, we could output an empty plan to the user because, by now ExplainResultDesc would have been called at the start of the explain via PortalStart(). I didn't find a clean way of coding if we are not okay to show notice and empty plan to the user. Any suggestions on achieving above? > Hmm. Looking for similar behaviors, I can see one case in > select_into.sql where we just never execute the plan when using WITH > NO DATA but still show the plan, meaning that the query gets planned > but it just gets marked as "(never executed)" if attempting to use > ANALYZE. Yes, with no data we would see "(never executed)" for explain analyze if the relation does not already exist. If the relation does exist, then the error/notice. >There may be use cases for that as the user directly asked directly for an EXPLAIN. IMHO, in any case checking for the existence of the relations specified in a query is must before we output something to the user. For instance, the query "explain select * from non_existent_tbl;" where non_existent_tbl doesn't exist, throws an error. Similarly, "explain create table already_existing_tbl as select * from another_tbl;" where the table ctas/select into trying to create already exists, should also throw error. But that's not happening currently on master. Which seems to be a problem to me. So, with the patch proposed here, we error out in this case. If the user really wants to see the explain plan, then he/she should use the correct query. > Note: the patch needs tests for all the patterns you would like to > stress. This way it is easier to follow the patterns that are > changing with your patch and compare them with the HEAD behavior (like > looking at the diffs with the tests of the patch, but without the > diffs in src/backend/). Sure, I will add test cases and post v3 patch. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Fri, Dec 18, 2020 at 8:15 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Fri, Dec 18, 2020 at 7:18 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote: > > > The behavior of the ctas/cmv, in case the relation already exists is as > > > shown in [1]. The things that have been changed with the patch are: 1) In > > > any case we do not rewrite or plan the select part if the relation already > > > exists 2) For explain ctas/cmv (without analyze), now the relation > > > existence is checked early and the error is thrown as highlighted in [1]. > > > > > > With patch, there is no behavioral change(from that of master branch) in > > > explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the > > > notice. > > > > > > Thoughts? > > > > HEAD is already a mixed bad of behaviors, and the set of results you > > are presenting here is giving a similar impression. It brings in some > > sanity by just ignoring the effects of the IF NOT EXISTS clause all > > the time still that's not consistent with the queries not using > > EXPLAIN. > > I tried to make it consistent by issuing NOTICE (not an error) even > for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and > exit from the ExplainOneUtility, we could output an empty plan to the > user because, by now ExplainResultDesc would have been called at the > start of the explain via PortalStart(). I didn't find a clean way of > coding if we are not okay to show notice and empty plan to the user. > > Any suggestions on achieving above? > > > Hmm. Looking for similar behaviors, I can see one case in > > select_into.sql where we just never execute the plan when using WITH > > NO DATA but still show the plan, meaning that the query gets planned > > but it just gets marked as "(never executed)" if attempting to use > > ANALYZE. > > Yes, with no data we would see "(never executed)" for explain analyze > if the relation does not already exist. If the relation does exist, > then the error/notice. > > >There may be use cases for that as the user directly asked directly for an EXPLAIN. > > IMHO, in any case checking for the existence of the relations > specified in a query is must before we output something to the user. > For instance, the query "explain select * from non_existent_tbl;" > where non_existent_tbl doesn't exist, throws an error. Similarly, > "explain create table already_existing_tbl as select * from > another_tbl;" where the table ctas/select into trying to create > already exists, should also throw error. But that's not happening > currently on master. Which seems to be a problem to me. So, with the > patch proposed here, we error out in this case. > > If the user really wants to see the explain plan, then he/she should > use the correct query. > > > Note: the patch needs tests for all the patterns you would like to > > stress. This way it is easier to follow the patterns that are > > changing with your patch and compare them with the HEAD behavior (like > > looking at the diffs with the tests of the patch, but without the > > diffs in src/backend/). > > Sure, I will add test cases and post v3 patch. Attaching v3 patch that also contains test cases. Please review it further. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Mon, Dec 21, 2020 at 12:01:38PM +0530, Bharath Rupireddy wrote: > On Fri, Dec 18, 2020 at 8:15 AM Bharath Rupireddy >> I tried to make it consistent by issuing NOTICE (not an error) even >> for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and >> exit from the ExplainOneUtility, we could output an empty plan to the >> user because, by now ExplainResultDesc would have been called at the >> start of the explain via PortalStart(). I didn't find a clean way of >> coding if we are not okay to show notice and empty plan to the user. >> >> Any suggestions on achieving above? I was looking at your patch today, and I actually found the conclusion to output an empty plan while issuing a NOTICE to be quite intuitive if the caller uses IF NOT EXISTS with EXPLAIN. > Attaching v3 patch that also contains test cases. Please review it further. Thanks for adding some test cases! Some of them were exact duplicates, so it is possible to reduce the number of queries without impacting the coverage. I have also chosen a query that forces an error within the planner. Please see the attached. IF NOT EXISTS implies that CTAS or CREATE MATVIEW will never ERROR if the relation already exists, with or without EXPLAIN, EXECUTE or WITH NO DATA, so that gets us a consistent behavior across all the patterns. Note: I'd like to think that we could choose a better name for CheckRelExistenceInCTAS(). -- Michael
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier <michael@paquier.xyz> wrote: > I was looking at your patch today, and I actually found the conclusion > to output an empty plan while issuing a NOTICE to be quite intuitive > if the caller uses IF NOT EXISTS with EXPLAIN. Thanks! > Thanks for adding some test cases! Some of them were exact > duplicates, so it is possible to reduce the number of queries without > impacting the coverage. I have also chosen a query that forces an > error within the planner. > Please see the attached. IF NOT EXISTS implies that CTAS or CREATE > MATVIEW will never ERROR if the relation already exists, with or > without EXPLAIN, EXECUTE or WITH NO DATA, so that gets us a consistent > behavior across all the patterns. LGTM. > Note: I'd like to think that we could choose a better name for > CheckRelExistenceInCTAS(). I changed it to IsCTASRelCreationAllowed() and attached a v5 patch. Please let me know if this is okay. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Tue, Dec 22, 2020 at 03:12:15PM +0530, Bharath Rupireddy wrote: > On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier <michael@paquier.xyz> wrote: >> Note: I'd like to think that we could choose a better name for >> CheckRelExistenceInCTAS(). > > I changed it to IsCTASRelCreationAllowed() and attached a v5 patch. > Please let me know if this is okay. After thinking about that, using "CTAS" while other routines in the same area use "CreateTableAs" looks inconsistent to me. So I have come up with CreateTableAsRelExists() as name. As the same time, I have looked at the git history to note 9bd27b7 where we had better not give an empty output for non-text formats. So I'd like to think that it makes sense to use ExplainDummyGroup() if the relation exists with IF NOT EXISTS, keeping some consistency. What do you think? -- Michael
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Wed, Dec 23, 2020 at 6:01 PM Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Dec 22, 2020 at 03:12:15PM +0530, Bharath Rupireddy wrote: > > On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier <michael@paquier.xyz> wrote: > >> Note: I'd like to think that we could choose a better name for > >> CheckRelExistenceInCTAS(). > > > > I changed it to IsCTASRelCreationAllowed() and attached a v5 patch. > > Please let me know if this is okay. > > After thinking about that, using "CTAS" while other routines in the > same area use "CreateTableAs" looks inconsistent to me. So I have > come up with CreateTableAsRelExists() as name. I think CreateTableAsRelExists() can return true if the relation already exists and false otherwise, to keep in sync with the function name. I updated this and attached v7 patch. > As the same time, I have looked at the git history to note 9bd27b7 > where we had better not give an empty output for non-text formats. So > I'd like to think that it makes sense to use ExplainDummyGroup() if > the relation exists with IF NOT EXISTS, keeping some consistency. > > What do you think? +1. Shall we add some test cases(with xml, yaml, json formats as is currently being done in explain.sql) to cover that? We can have the explain_filter() function to remove the unstable parts in the output, it looks something like below. If yes, please let me know I can add them to matview and select_into. postgres=# select explain_filter('explain(analyze, format xml) create table if not exists t1 as select 1;'); NOTICE: relation "t1" already exists, skipping explain_filter ------------------------------------------------------- <explain xmlns="http://www.postgresql.org/N/explain">+ <CREATE-TABLE-AS /> + </explain> (1 row) postgres=# select explain_filter('explain(analyze, format yaml) create table if not exists t1 as select 1;'); NOTICE: relation "t1" already exists, skipping explain_filter --------------------- - "CREATE TABLE AS" (1 row) postgres=# select explain_filter('explain(analyze, format json) create table if not exists t1 as select 1;'); NOTICE: relation "t1" already exists, skipping explain_filter --------------------- [ + "CREATE TABLE AS"+ ] (1 row) With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Wed, Dec 23, 2020 at 07:13:33PM +0530, Bharath Rupireddy wrote: > +1. Shall we add some test cases(with xml, yaml, json formats as is > currently being done in explain.sql) to cover that? We can have the > explain_filter() function to remove the unstable parts in the output, > it looks something like below. If yes, please let me know I can add > them to matview and select_into. I am not sure that we need tests for all the formats, but having at least one of them sounds good to me. I leave the choice up to you. What we have here looks rather committable. Let's wait until the period of vacations is over before wrapping this up to give others the occasion to comment. -- Michael
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Bharath Rupireddy
Date:
On Thu, Dec 24, 2020 at 7:40 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Dec 23, 2020 at 07:13:33PM +0530, Bharath Rupireddy wrote: > > +1. Shall we add some test cases(with xml, yaml, json formats as is > > currently being done in explain.sql) to cover that? We can have the > > explain_filter() function to remove the unstable parts in the output, > > it looks something like below. If yes, please let me know I can add > > them to matview and select_into. > > I am not sure that we need tests for all the formats, but having at > least one of them sounds good to me. I leave the choice up to you. Since I tested that with all the formats manually here and it works, so I don't want to make the test cases complicated with adding explain_filter() function into matview.sql and select_into.sql and all that. I'm okay without those test cases. > What we have here looks rather committable. Let's wait until the > period of vacations is over before wrapping this up to give others the > occasion to comment. Thanks! Happy Vacations! With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Thu, Dec 24, 2020 at 09:10:22AM +0530, Bharath Rupireddy wrote: > Since I tested that with all the formats manually here and it works, > so I don't want to make the test cases complicated with adding > explain_filter() function into matview.sql and select_into.sql and all > that. I'm okay without those test cases. Please note that I have added an entry in the CF app for the moment so as we don't lose track of it: https://commitfest.postgresql.org/31/2892/ >> What we have here looks rather committable. Let's wait until the >> period of vacations is over before wrapping this up to give others the >> occasion to comment. > > Thanks! Happy Vacations! You too! -- Michael
Attachment
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
From
Michael Paquier
Date:
On Thu, Dec 24, 2020 at 01:23:40PM +0900, Michael Paquier wrote: > Please note that I have added an entry in the CF app for the moment so > as we don't lose track of it: > https://commitfest.postgresql.org/31/2892/ I have been able to look at that again today, and applied it. I have tweaked a bit the comments, and added an elog(ERROR) as a safety net for explain.c if the IFNE code path is taken for an object type that is not expected with CreateTableAsStmt. -- Michael