Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: [Proposal] Global temporary tables
Date
Msg-id CAFj8pRDVURvQBZhGeUEBBTaqnjfK3w7u+xZfAqVB+-ncUezFZA@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  (曾文旌 <wenjing.zwj@alibaba-inc.com>)
Responses Re: [Proposal] Global temporary tables  (Prabhat Sahu <prabhat.sahu@enterprisedb.com>)
Re: [Proposal] Global temporary tables  (曾文旌 <wenjing.zwj@alibaba-inc.com>)
List pgsql-hackers


čt 23. 4. 2020 v 9:10 odesílatel 曾文旌 <wenjing.zwj@alibaba-inc.com> napsal:


2020年4月22日 下午10:50,Pavel Stehule <pavel.stehule@gmail.com> 写道:



st 22. 4. 2020 v 16:38 odesílatel Prabhat Sahu <prabhat.sahu@enterprisedb.com> napsal:


On Wed, Apr 22, 2020 at 2:49 PM 曾文旌 <wenjing.zwj@alibaba-inc.com> wrote:

Although the implementation of GTT is different, I think so TRUNCATE on Postgres (when it is really finalized) can remove session metadata of GTT too (and reduce usage's counter). It is not critical feature, but I think so it should not be hard to implement. From practical reason can be nice to have a tool how to refresh GTT without a necessity to close session. TRUNCATE can be this tool.
Sorry, I don't quite understand what you mean, could you describe it in detail? 
In my opinion the TRUNCATE GTT cannot clean up data in other sessions, especially clean up local buffers in other sessions.

It is about a possibility to force reset GTT to default empty state for all sessions.

Maybe it is some what does your TRUNCATE DROP, but I don't think so this design (TRUNCATE DROP) is good, because then user have to know implementation detail.

I prefer some like TRUNCATE tab WITH OPTION (GLOBAL, FORCE) - "GLOBAL" .. apply on all sessions, FORCE try to do without waiting on some global lock, try to do immediately with possibility to cancel some statements and rollback some session.

instead GLOBAL maybe we can use  "ALLSESSION", or "ALL SESSION" or some else

But I like possible terminology LOCAL x GLOBAL for GTT. What I mean? Some statements like "TRUNCATE" can  works (by default) in "local" mode .. it has impact to current session only. But sometimes can be executed in "global" mode with effect on all sessions.



Yes, I think we need a way to delete the GTT local storage without closing the session.

I provide the TRUNCATE tablename DROP to clear the data in the GTT and delete the storage files.
This feature requires the current transaction to commit immediately after it finishes truncate.

Hi Wenjing,
Thanks for the patch(v30) for the new syntax support for (TRUNCATE table_name DROP) for deleting storage files after TRUNCATE on GTT.

This syntax looks strange, and I don't think so it solve anything in practical life, because without lock the table will be used in few seconds by other sessions.

If a dba wants to delete or modify a GTT, he can use locks to help him make the change.

postgres=# begin;
BEGIN
postgres=*# LOCK TABLE gtt2 IN ACCESS EXCLUSIVE MODE;
postgres=*# select * from pg_gtt_attached_pids ;

Kill session or let session do TRUNCATE tablename DROP

postgres=*# drop table gtt2;
DROP TABLE
postgres=*# commit;
COMMIT

yes, user can lock a tables. But I think so it is user friendly design. I don't remember any statement in Postgres, where I have to use table locks explicitly.

For builtin commands it should be done transparently (for user).

Regards

Pavel
 


This is same topic when we talked about ALTER - when and where the changes should be applied.

The CLUSTER commands works only on session private data, so it should not to need some special lock or some special cleaning before.

Regards

Pavel
 
 
Please check below scenarios:

Case1:
-- session1:
postgres=# create global temporary table gtt2 (c1 integer) on commit preserve rows;
CREATE TABLE
postgres=# create index  idx1 on gtt2 (c1);
CREATE INDEX
postgres=# create index  idx2 on gtt2 (c1) where c1%2 =0;
CREATE INDEX
postgres=#
postgres=# CLUSTER gtt2 USING idx1;
CLUSTER
postgres=# CLUSTER gtt2 USING idx2;
ERROR:  cannot cluster on partial index "idx2"        

Case2:
-- Session2:
postgres=# CLUSTER gtt2 USING idx1;
CLUSTER
postgres=# CLUSTER gtt2 USING idx2;
CLUSTER

postgres=# insert into gtt2 values(1);
INSERT 0 1
postgres=# CLUSTER gtt2 USING idx1;
CLUSTER
postgres=# CLUSTER gtt2 USING idx2;
ERROR:  cannot cluster on partial index "idx2"

Case3:
-- Session2:
postgres=# TRUNCATE gtt2 DROP;
TRUNCATE TABLE
postgres=# CLUSTER gtt2 USING idx1;
CLUSTER
postgres=# CLUSTER gtt2 USING idx2;
CLUSTER

In Case2, Case3 we can observe, with the absence of data in GTT, we are able to "CLUSTER gtt2 USING idx2;" (having partial index)
But why does the same query fail for Case1 (absence of data)?

Thanks,
Prabhat Sahu

 


Wenjing



Regards

Pavel


All in all, I think the current implementation is sufficient for dba to manage GTT.

2020年4月2日 下午4:45,Prabhat Sahu <prabhat.sahu@enterprisedb.com> 写道:

Hi All,

I have noted down few behavioral difference in our GTT implementation in PG as compared to Oracle DB:
As per my understanding, the behavior of DROP TABLE in case of "Normal table and GTT" in Oracle DB are as below:
  1. Any tables(Normal table / GTT) without having data in a session, we will be able to DROP from another session.
  2. For a completed transaction on a normal table having data, we will be able to DROP from another session. If the transaction is not yet complete, and we are trying to drop the table from another session, then we will get an error. (working as expected)
  3. For a completed transaction on GTT with(on commit delete rows) (i.e. no data in GTT) in a session, we will be able to DROP from another session.
  4. For a completed transaction on GTT with(on commit preserve rows) with data in a session, we will not be able to DROP from any session(not even from the session in which GTT is created), we need to truncate the table data first from all the session(session1, session2) which is having data.
1. Any tables(Normal table / GTT) without having data in a session, we will be able to DROP from another session.
Session1:
create table t1 (c1 integer);
create global temporary table gtt1 (c1 integer) on commit delete rows;
create global temporary table gtt2 (c1 integer) on commit preserve rows;


Session2:
drop table t1;
drop table gtt1;
drop table gtt2;


-- Issue 1: But we are able to drop a simple table and failed to drop GTT as below.
postgres=# drop table t1;
DROP TABLE
postgres=# drop table gtt1;
ERROR:  can not drop relation gtt1 when other backend attached this global temp table
postgres=# drop table gtt2;
ERROR:  can not drop relation gtt2 when other backend attached this global temp table

3. For a completed transaction on GTT with(on commit delete rows) (i.e. no data in GTT) in a session, we will be able to DROP from another session.
Session1:
create global temporary table gtt1 (c1 integer) on commit delete rows;

Session2:
drop table gtt1;

-- Issue 2: But we are getting error for GTT with(on_commit_delete_rows) without data.
postgres=# drop table gtt1;
ERROR:  can not drop relation gtt1 when other backend attached this global temp table

4. For a completed transaction on GTT with(on commit preserve rows) with data in any session, we will not be able to DROP from any session(not even from the session in which GTT is created)

Case1:
create global temporary table gtt2 (c1 integer) on commit preserve rows;
insert into gtt2 values(100);
drop table gtt2;


SQL> drop table gtt2;
drop table gtt2
  *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table already in use


-- Issue 3: But, we are able to drop the GTT(having data) which we have created in the same session.
postgres=# drop table gtt2;
DROP TABLE

Case2: GTT with(on commit preserve rows) having data in both session1 and session2
Session1:
create global temporary table gtt2 (c1 integer) on commit preserve rows;
insert into gtt2 values(100);


Session2:
insert into gtt2 values(200);

-- If we try to drop the table from any session we should get an error, it is working fine.
drop table gtt2;
SQL> drop table gtt2;
drop table gtt2
  *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table already in use

postgres=# drop table gtt2 ;
ERROR:  can not drop relation gtt2 when other backend attached this global temp table

-- To drop the table gtt2 from any session1/session2, we need to truncate the table data first from all the session(session1, session2) which is having data.
Session1:
truncate table gtt2;
-- Session2:
truncate table gtt2;

Session 2:
SQL> drop table gtt2;

Table dropped.


-- Issue 4: But we are not able to drop the GTT, even after TRUNCATE the table in all the sessions.
-- truncate from all sessions where GTT have data.
postgres=# truncate gtt2 ;
TRUNCATE TABLE


-- try to DROP GTT still, we are getting error.
postgres=# drop table gtt2 ;
ERROR:  can not drop relation gtt2 when other backend attached this global temp table

To drop the GTT from any session, we need to exit from all other sessions.
postgres=# drop table gtt2 ;
DROP TABLE


Kindly let me know if I am missing something.


On Wed, Apr 1, 2020 at 6:26 PM Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
Hi Wenjing,
I hope we need to change the below error message.

postgres=# create global temporary table gtt(c1 int) on commit preserve rows;
CREATE TABLE

postgres=# create materialized view mvw as select * from gtt;
ERROR: materialized views must not use global temporary tables or views


Anyways we are not allowed to create a "global temporary view", 
so the above ERROR message should change(i.e. " or view" need to be removed from the error message) something like:
"ERROR: materialized views must not use global temporary tables"

--

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com



--

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com





--

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: sqlsmith crash incremental sort
Next
From: Asif Rehman
Date:
Subject: Re: WIP/PoC for parallel backup