Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Wenjing Zeng
Subject Re: [Proposal] Global temporary tables
Date
Msg-id 17EEA6A4-3FA5-4C01-81BB-26D32BEC6343@gmail.com
In response to Re: [Proposal] Global temporary tables  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers


On Feb 27, 2022, at 08:21, Justin Pryzby <pryzby@telsasoft.com> wrote:

I read through this.
Find attached some language fixes.  You should be able to apply each "fix"
patch on top of your own local branch with git am, and then squish them
together.  Let me know if you have trouble with that.

I think get_seqence_start_value() should be static.  (Or otherwise, it should
be in lsyscache.c).

The include added to execPartition.c seems to be unused.

+#define RELATION_IS_TEMP_ON_CURRENT_SESSION(relation) \
+#define RELATION_IS_TEMP(relation) \
+#define RelpersistenceTsTemp(relpersistence) \
+#define RELATION_GTT_ON_COMMIT_DELETE(relation)    \

=> These macros can evaluate their arguments multiple times.
You should add a comment to warn about that.  And maybe avoid passing them a
function argument, like: RelpersistenceTsTemp(get_rel_persistence(rte->relid))
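
For example, something along these lines avoids the double evaluation; this is
only a sketch, and the macro body and RELPERSISTENCE_GLOBAL_TEMP below are my
assumptions for illustration, not your actual definitions:

#include "postgres.h"
#include "catalog/pg_class.h"
#include "nodes/parsenodes.h"
#include "utils/lsyscache.h"

/* assumed macro body, for illustration only */
#define RelpersistenceTsTemp(relpersistence) \
	((relpersistence) == RELPERSISTENCE_TEMP || \
	 (relpersistence) == RELPERSISTENCE_GLOBAL_TEMP)

static bool
rte_is_temp(RangeTblEntry *rte)
{
	/*
	 * RelpersistenceTsTemp(get_rel_persistence(rte->relid)) would expand
	 * the function call twice, i.e. two syscache lookups.  Evaluating it
	 * once into a local variable keeps it to one.
	 */
	char		persistence = get_rel_persistence(rte->relid);

	return RelpersistenceTsTemp(persistence);
}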

+list_all_backend_gtt_frozenxids should return TransactionId not int.
The function name should say "oldest" and not "all" ?

I think the GUC should have a longer name.  max_active_gtt is too short for a
global var.

+#define    MIN_NUM_ACTIVE_GTT          0
+#define    DEFAULT_NUM_ACTIVE_GTT          1000
+#define    MAX_NUM_ACTIVE_GTT          1000000

+int        max_active_gtt = MIN_NUM_ACTIVE_GTT

It's being initialized to MIN, but then the GUC machinery sets it to DEFAULT.
By convention, it should be initialized to default.
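
By that convention it would look roughly like this; a sketch only, where the
longer GUC name, context, and group are placeholders rather than anything in
the patch:

/* initialize the C variable to the default, not the minimum */
int			max_active_gtt = DEFAULT_NUM_ACTIVE_GTT;

/* matching guc.c entry, whose boot_val is also the default */
	{
		{"max_active_global_temporary_tables", PGC_POSTMASTER, RESOURCES_MEM,
			gettext_noop("Sets the maximum number of active global temporary tables."),
			NULL
		},
		&max_active_gtt,
		DEFAULT_NUM_ACTIVE_GTT, MIN_NUM_ACTIVE_GTT, MAX_NUM_ACTIVE_GTT,
		NULL, NULL, NULL
	},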

fout->remoteVersion >= 140000

=> this should say 15 (i.e. 150000)

describe.c has gettext_noop("session"), which is a half-truth.  The data is
per-session but the table definition is persistent.

Thanks for your advice; I will try to merge this part of the code.


You redirect stats from pg_class and pg_statistic to a local hash table.
This is pretty hairy :(
I guess you'd also need to handle pg_statistic_ext and pg_statistic_ext_data.
pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to look
at pg_get_gtt_statistics.

I wonder if there's a better way to do it, like updating pg_statistic but
forcing the changes to be rolled back when the session ends...  But I think
that would make long-running sessions behave badly, the same as "long-running
transactions".

There are three pieces of session-level GTT data that need to be managed:
1 session-level storage info, such as relfilenode
2 session-level transaction info, such as relfrozenxid
3 session-level statistics, such as relpages or per-column stats

I think 1 and 2 are necessary, but not the statistics; a rough sketch of 1 and 2 follows below.
In a previous email it was suggested that GTT statistics not be processed at all,
meaning they would be recorded neither in the local hash nor in the catalog.
In my observation, very few users need an accurate query plan for a temporary table
badly enough to run a manual ANALYZE on it.
Of course, doing it this way also avoids catalog bloat and the associated performance problems.
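
To make 1 and 2 concrete, what has to be kept per session is roughly the
following; the struct and field names are illustrative only, not the patch's
actual code:

#include "postgres.h"
#include "storage/relfilenode.h"

typedef struct SessionGttData
{
	Oid				relid;			/* the GTT's pg_class OID */
	RelFileNode		relfilenode;	/* 1: this session's private storage */
	TransactionId	relfrozenxid;	/* 2: this session's frozen XID, which must
									 * feed into the oldest-xid calculation */
	/* 3: per-session statistics intentionally not kept, as explained above */
} SessionGttData;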



Have you looked at Gilles Darold's GTT extension?

If you are referring to https://github.com/darold/pgtt , yes.
It is clever to use an unlogged table as a template and then use an LTT (local
temporary table) to read and write the data.
For that implementation, I want to point out two things:
1 On the first insert into a GTT in each session, a CREATE TABLE or CREATE INDEX is implicitly executed.
2 The catalog bloat caused by LTTs still exists.


Regards, Wenjing.


<0002-f-0002-gtt-v64-doc.txt><0004-f-0003-gtt-v64-implementation.txt><0006-f-0004-gtt-v64-regress.txt>
