Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: [Proposal] Global temporary tables
Date
Msg-id CAFj8pRDVaNQhg1juJKuj3A2fN-TW5LsMVCqyOkEhKKjbjHEk5w@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: [Proposal] Global temporary tables  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers


po 27. 1. 2020 v 10:11 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:


On 24.01.2020 22:39, Pavel Stehule wrote:
I cannot to evaluate your proposal, and I am sure, so you know more about this code.

There is a question if we can allow to build local temp index on global temp table. It is different situation. When I work with global properties personally I prefer total asynchronous implementation of any DDL operations for other than current session. When it is true, then I have not any objection. For me, good enough design of any DDL can be based on catalog change without forcing to living tables.


From my point of view there are two difference uses cases of temp tables:
1. Backend needs some private data source which is specific to this session and has no relation with activities of other sessions.
2. We need a table  containing private session data, but which is used in the same way by all database users.

In the first case current Postgres temp tables works well (if we forget for a moment about all known issues related with temp tables).
Global temp tables are used to address the second scenario.  Assume that we write some stored procedure or implement some business logic  outside database and
what to perform some complex analtic query which requires tepmp table for storing intermediate results. In this case we can create GTT with all needed index at the moment of database initialization
and do not perform any extra DDL during query execution. If will prevent catalog bloating and makes execution of query more efficient.

I do not see any reasons to allow build local indexes for global table. Yes,it can happen that some session will have small amount of data in particular GTT and another - small amount of data in this table. But if access pattern is the same  (and nature of GTT assumes it), then index in either appreciate, either useless in both cases.




I see following disadvantage of your proposal. See scenario

1. I have two sessions

A - small GTT with active owner
B - big GTT with some active application.

session A will do new index - it is fast, but if creating index is forced on B on demand (when B was touched), then this operation have to wait after index will be created.

So I afraid build a index on other sessions on GTT when GTT tables in other sessions will not be empty.


Yes, it is true. But is is not the most realistic scenario from my point of view.
As I explained above, GTT should be used when we need temporary storage accessed in the same way by all clients.
If (as with normal tables) at some moment of time DBA realizes, that efficient execution of some queries needs extra indexes,
then it should be able to do it. It is very inconvenient and unnatural to prohibit DBA to do it until all sessions using this GTT are closed (it may never happen)
or require all sessions to restart to be able to use this index.

So it is possible to imagine different scenarios of working with GTTs.
But from my point of view the only non-contradictory model of their behavior is to make it compatible with normal tables.
And do not forget about compatibility with Oracle. Simplifying of porting existed applications from Oracle to Postgres  may be the
main motivation of adding GTT to Postgres. And making them incompatible with Oracle will be very strange.

I don't think so compatibility with Oracle is valid point in this case. We need GTT, but the mechanism of index building should be designed for Postgres, and for users.

Maybe the method proposed by you can be activated by some option like CREATE INDEX IMMEDIATELY FOR ALL SESSION. When you use GTT without index, then
it should to work some time more, and if you use short life sessions, then index build can be last or almost last operation over table and can be suboptimal.

Anyway, this behave can be changed later without bigger complications - and now I am have strong opinion to prefer don't allow to any DDL (with index creation) on any active GTT in other sessions.
Probably your proposal - build indexes on other sessions when GTT is touched can share code with just modify metadata and wait on session reset or GTT reset

Usually it is not hard problem to refresh sessions, and what I know when you update plpgsql code, it is best practice to refresh session early.




-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: our checks for read-only queries are not great
Next
From: Mark Dilger
Date:
Subject: Re: making the backend's json parser work in frontend code