Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Wenjing Zeng
Subject Re: [Proposal] Global temporary tables
Date
Msg-id DA46EFFC-0550-428D-BFD6-8350B503D626@gmail.com
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers


2022年2月25日 15:45,Andres Freund <andres@anarazel.de> 写道:

Hi,


This is a huge thread. Realistically reviewers and committers can't reread
it. I think there needs to be more of a description of how this works included
in the patchset and *why* it works that way. The readme does a bit of that,
but not particularly well.
Thank you for your review of the design and code.
I'm always trying to improve it. If you are confused or need clarification on something, please point it out.




On 2022-02-25 14:26:47 +0800, Wenjing Zeng wrote:
+++ b/README.gtt.txt
@@ -0,0 +1,172 @@
+Global Temporary Table(GTT)
+=========================================
+
+Feature description
+-----------------------------------------
+
+Previously, temporary tables are defined once and automatically
+exist (starting with empty contents) in every session before using them.

I think for a README "previously" etc isn't good language - if it were
commited, it'd not be understandable anymore. It makes more sense for commit
messages etc.
Thanks for pointing it out. I will adjust the description.



+Main design ideas
+-----------------------------------------
+In general, GTT and LTT use the same storage and buffer design and
+implementation. The storage files for both types of temporary tables are named
+as t_backendid_relfilenode, and the local buffer is used to cache the data.

What does "named ast_backendid_relfilenode" mean?
This is the storage file naming format for describing temporary tables.
It starts with 't', followed by backendid and relfilenode, connected by an underscore.
File naming rules are the same as LTT.
The data in the file is no different from regular tables and LTT.



+The schema of GTTs is shared among sessions while their data are not. We build
+a new mechanisms to manage those non-shared data and their statistics.
+Here is the summary of changes:
+
+1) CATALOG
+GTTs store session-specific data. The storage information of GTTs'data, their
+transaction information, and their statistics are not stored in the catalog.
+
+2) STORAGE INFO & STATISTICS INFO & TRANSACTION INFO
+In order to maintain durability and availability of GTTs'session-specific data,
+their storage information, statistics, and transaction information is managed
+in a local hash table tt_storage_local_hash.

"maintain durability"? Durable across what? In the context of databases it's
typically about crash safety, but that can't be the case here.
It means that the transaction information(relfrozenxid/relminmxid)  storage information(relfilenode)
and statistics(relpagesof GTT, which are maintained in hashtable , not pg_class.
This is to allow GTT to store its own local data in different sessions and to avoid frequent catalog changes.



+3) DDL
+Currently, GTT supports almost all table'DDL except CLUSTER/VACUUM FULL.
+Part of the DDL behavior is limited by shared definitions and multiple copies of
+local data, and we added some structures to handle this.

+A shared hash table active_gtt_shared_hash is added to track the state of the
+GTT in a different session. This information is recorded in the hash table
+during the DDL execution of the GTT.

+The data stored in a GTT can only be modified or accessed by owning session.
+The statements that only modify data in a GTT do not need a high level of
+table locking. The operations making those changes include truncate GTT,
+reindex GTT, and lock GTT.

I think you need to introduce a bit more terminology for any of this to make
sense. Sometimes GTT means the global catalog entity, sometimes, like here, it
appears to mean the session specific contents of a GTT.

What state of a GTT in a nother session?


How do GTTs handle something like BEGIN; TRUNCATE some_gtt_table; ROLLBACK;?

GTT behaves exactly like a regular table.
Specifically, the latest relfilenode for the current session is stored in the hashtable and may change it.
If the transaction rolls back, the old relfilenode is enabled again, just as it is in pg_class.



+1.2 on commit clause
+LTT's status associated with on commit DELETE ROWS and on commit PRESERVE ROWS
+is not stored in catalog. Instead, GTTs need a bool value on_commit_delete_rows
+in reloptions which is shared among sessions.

Why?
The LTT is always created and used in the current session. The on commit clause property
does not need to be shared with other sessions. This is why LTT does not record the on commit clause
in the catalog.
However, GTT's table definitions are shared between sessions, including the on commit clause,
so it needs to be saved in the catalog.





+2.3 statistics info
+1) relpages reltuples relallvisible relfilenode

?
It was mentioned above.


+3 DDL
+3.1. active_gtt_shared_hash
+This is the hash table created in shared memory to trace the GTT files initialized
+in each session. Each hash entry contains a bitmap that records the backendid of
+the initialized GTT file. With this hash table, we know which backend/session
+is using this GTT. Such information is used during GTT's DDL operations.

So there's a separate locking protocol for GTTs that doesn't use the normal
locking infrastructure? Why?


+3.7 CLUSTER GTT/VACUUM FULL GTT
+The current version does not support.

Why?
Currently, GTT cannot reuse clusters for regular table processes. I choose not to support it for now.
Also, I can't think of any scenario that would require clustering for temporary tables, which
is another reason why not support cluster first.




+4 MVCC commit log(clog) cleanup
+
+The GTT storage file contains transaction information. Queries for GTT data rely
+on transaction information such as clog. The transaction information required by
+each session may be completely different.

Why is transaction information different between sessions? Or does this just
mean that different transaction ids will be accessed?

It has the same meaning as pg_class.relfrozenxid.
For the same GTT, the first transaction to write data in each session is different and
the data is independent of each other. They have a separate frozenxid.
The vacuum clog process needs to consider it.




0003-gtt-v67-implementation.patch
71 files changed, 3167 insertions(+), 195 deletions(-)

This needs to be broken into smaller chunks to be reviewable.


@@ -677,6 +678,14 @@ _bt_getrootheight(Relation rel)
{
Buffer metabuf;

+ /*
+ * If a global temporary table storage file is not initialized in the
+ * this session, its index does not have a root page, just returns 0.
+ */
+ if (RELATION_IS_GLOBAL_TEMP(rel) &&
+ !gtt_storage_attached(RelationGetRelid(rel)))
+ return 0;
+
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
metad = _bt_getmeta(rel, metabuf);

Stuff like this seems not acceptable. Accesses would have to be prevented much
earlier. Otherwise each index method is going to need copies of this logic. I
also doubt that _bt_getrootheight() is the only place that'd need this.
You are right, this is done to solve the empty GTT being queried. I don't need it anymore,
so I'll get rid of it.



static void
index_update_stats(Relation rel,
  bool hasindex,
-   double reltuples)
+   double reltuples,
+   bool isreindex)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -2797,6 +2824,13 @@ index_update_stats(Relation rel,
Form_pg_class rd_rel;
bool dirty;

+ /*
+ * Most of the global Temp table data is updated to the local hash, and reindex
+ * does not refresh relcache, so call a separate function.
+ */
+ if (RELATION_IS_GLOBAL_TEMP(rel))
+ return index_update_gtt_relstats(rel, hasindex, reltuples, isreindex);
+

So basically every single place in the code that does catalog accesses is
going to need a completely separate implementation for GTTs? That seems
unmaintainable.
create Index on GTT and VACUUM GTT do this.
Some info of the table (relhasIndex) need to be updated to pg_class, 
while others not (relpages…).
Would you prefer to extend it on the original function?





+/*-------------------------------------------------------------------------
+ *
+ * storage_gtt.c
+ *  The body implementation of Global temparary table.
+ *
+ * IDENTIFICATION
+ *  src/backend/catalog/storage_gtt.c
+ *
+ *  See src/backend/catalog/GTT_README for Global temparary table's
+ *  requirements and design.
+ *
+ *-------------------------------------------------------------------------
+ */

I don't think that path to the readme is correct.
I tried to reorganize it.


Regards, Wenjing.




Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: psql: Make SSL info display more compact
Next
From: Masahiko Sawada
Date:
Subject: Add the replication origin name and commit-LSN to logical replication worker errcontext