Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [Proposal] Global temporary tables
Date
Msg-id 407d6499-950d-e3d9-acbc-a2e86d8b82a3@postgrespro.ru
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: [Proposal] Global temporary tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On 27.01.2020 22:44, Pavel Stehule wrote:

I don't think so compatibility with Oracle is valid point in this case. We need GTT, but the mechanism of index building should be designed for Postgres, and for users.

Maybe the method proposed by you can be activated by some option like CREATE INDEX IMMEDIATELY FOR ALL SESSION. When you use GTT without index, then
it should to work some time more, and if you use short life sessions, then index build can be last or almost last operation over table and can be suboptimal.

Anyway, this behave can be changed later without bigger complications - and now I am have strong opinion to prefer don't allow to any DDL (with index creation) on any active GTT in other sessions.
Probably your proposal - build indexes on other sessions when GTT is touched can share code with just modify metadata and wait on session reset or GTT reset

Well, compatibility with Oracle was never treated as important argument in this group:)
But I hope that you agree that it real argument against your proposal.
Much more important argument is incompatibility with behavior of regular table.
If you propose such incompatibility, then you should have some very strong arguments for such behavior which will definitely confuse users.

But I heard only two arguments:

1. Concurrent building of indexes by all backends may consume much memory (n_backends * maintenance_work_mem) and consume a lot of disk/CPU resources.

First of all it is not completely true. Indexes will be created on demand when GTT will be accessed and chance that all sessions will become building indexes simultaneously is very small.

But what will happen if we prohibit access to this index for existed sessions? If we need index for GTT, then most likely it is used for joins.
If there is no index, then optimizer has to choose some other plan to perform this join. For example use hash join. Hash join also requires memory,
so if all backends will perform such join simultaneously, then them consume (n_backends * work_mem) memory.
Yes, work_mem is used to be smaller than maintenance_work_mem. But in any case DBA has a choice to adjust this parameters to avoid this problem.
And in case of your proposal (prohibit access to this index) you give him no choice to optimize query execution in existed sessions.

Also if all sessions will simultaneously perform sequential scan of GTT instead of building index for it, then them will read the same amount of data and consume comparable CPU time.
So prohibiting access to the indexes will not save us from high resources consumption if all existed sessions are really actively working with this GTT.

2. GTT in one session can contains large amount of data and we need index for it, but small amount of data in another session and we do not need index for it.

Such situation definitely can happen. But it contradicts to the main assumption of GTT use case (that it is accessed in the same way by all sessions).
Also I may be agree with this argument if you propose to create indexes locally for each sessions.
But your proposal is to prohibit access to the index to the sessions which already have populated GTT with data but allow it for sessions which have not accessed this GTT yet.
So if some session stores some data in GTT after index was created, then it will build index for it, doesn't matter whether size of table is small or large.
Why do we make an exception for sessions which already have data in GTT in this case?

So from my point of view both arguments are doubtful and can not explain why rules of index usability for GTT should be different from regular tables.

Usually it is not hard problem to refresh sessions, and what I know when you update plpgsql code, it is best practice to refresh session early.


I know may systems where session is established once client is connected to the system and not closed until client is disconnected.
And any attempt to force termination of the session will cause application errors which are not expected by the client.


Sorry, I think that it is principle point in discussion concerning GTT design.
Implementation of GTT can be changed in future, but it is bad if behavior of GTT will be changed.
It is not clear for me why from the very beginning we should provide inconsistent behavior which is even more difficult to implement than behavior compatible with regular tables.
And say that in the future it can be changed...

Sorry, but I do not consider proposals to create indexes locally for each session (i.e. global tables but private indexes) or use some special complicated SQL syntax constructions like
CREATE INDEX IMMEDIATELY FOR ALL SESSION as some real alternatives which have to be discussed.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Physical replication slot advance is not persistent
Next
From: Luis Carril
Date:
Subject: Re: Option to dump foreign data in pg_dump