Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: [Proposal] Global temporary tables
Date
Msg-id CAFj8pRD+wXPERhmFxHr8d0Rm-3m51TUCCAJRG7WXXmQujUcL6w@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  ("曾文旌(义从)" <wenjing.zwj@alibaba-inc.com>)
List pgsql-hackers


čt 7. 11. 2019 v 13:17 odesílatel 曾文旌(义从) <wenjing.zwj@alibaba-inc.com> napsal:


2019年11月7日 下午5:40,Pavel Stehule <pavel.stehule@gmail.com> 写道:



čt 7. 11. 2019 v 10:30 odesílatel 曾文旌(义从) <wenjing.zwj@alibaba-inc.com> napsal:


> 2019年11月7日 上午12:08,Konstantin Knizhnik <k.knizhnik@postgrespro.ru> 写道:
>
>
>
> On 06.11.2019 16:24, 曾文旌(义从) wrote:
>> Dear Hackers
>>
>>
>> I attached the patch of GTT implementationI base on PG12.
>> The GTT design came from my first email.
>> Some limitations in patch will be eliminated in later versions.
>>
>> Later, I will comment on Konstantin's patch and make some proposals for cooperation.
>> Looking forward to your feedback.
>>
>> Thanks.
>>
>> Zeng Wenjing
>>
>
> Thank you for this patch.
> My first comments:
>
> 1.  I have ported you patch to the latest Postgres version (my patch is attached).
> 2. You patch is supporting only B-Tree index for GTT. All other indexes (hash, gin, gist, brin,...) are not currently supported.
Currently I only support btree index.
I noticed that your patch supports more index types, which is where I'd like to work with you.

> 3. I do not understand the reason for the following limitation:
> "We allow to create index on global temp table only this session use it"
>
> First of all it seems to significantly reduce usage of global temp tables.
> Why do we need GTT at all? Mostly because we need to access temporary data in more than one backend. Otherwise we can just use normal table.
> If temp table is expected to be larger enough, so that we need to create index for it, then it is hard to believe that it will be needed only in one backend.
>
> May be the assumption is that all indexes has to be created before GTT start to be used.
Yes, Currently, GTT's index is only supported and created in an empty table state, and other sessions are not using it.
There has two improvements pointer:
1 Index can create on GTT(A) when the GTT(A)  in the current session is not empty, requiring the GTT table to be empty in the other session.
Index_build needs to be done in the current session just like a normal table. This improvement is relatively easy.

2 Index can create on GTT(A)  when more than one session are using this GTT(A).
Because when I'm done creating an index of the GTT in this session and setting it to be an valid index, it's not true for the GTT in other sessions.
Indexes on gtt in other sessions require "rebuild_index" before using it.
I don't have a better solution right now, maybe you have some suggestions.

I think so DDL operations can be implemented in some reduced form - so DDL are active only for one session, and for other sessions are invisible. Important is state of GTT object on session start.

For example ALTER TABLE DROP COLUMN can has very fatal impact on other sessions. So I think the best of GTT can be pattern - the structure of GTT table is immutable for any session that doesn't do DDL operations.
Yes, Those ddl that need to rewrite data files will have this problem.
This is why I disabled alter GTT in the current version.
It can be improved, such as Alter GTT can also be allowed when only the current session is in use.

I think so it is acceptable solution for some first steps, but I cannot to imagine so this behave can be good for production usage. But can be good enough for some time.

Regards

Pavel

Users can also choose to kick off other sessions that are using gtt, then do alter GTT.
I provide a function(pg_gtt_attached_pid(relation, schema)) to query which session a GTT is being used by.




> But right now this check is not working correctly in any case - if you insert some data into the table, then
> you can not create index any more:
>
> postgres=# create global temp table gtt(x integer primary key, y integer);
> CREATE TABLE
> postgres=# insert into gtt values (generate_series(1,100000), generate_series(1,100000));
> INSERT 0 100000
> postgres=# create index on gtt(y);
> ERROR:  can not create index when have one or more backend attached this global temp table
>
> I wonder why do you need such restriction?
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
> <global_temporary_table_v1-pg13.patch>


pgsql-hackers by date:

Previous
From: "曾文旌(义从)"
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: Peter Eisentraut
Date:
Subject: Re: Remove HAVE_LONG_LONG_INT