Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [Proposal] Global temporary tables
Date
Msg-id 24099f4e-742c-e92b-729c-77bac1ca80d4@postgrespro.ru
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [Proposal] Global temporary tables
Re: [Proposal] Global temporary tables
Re: [Proposal] Global temporary tables
List pgsql-hackers

On 25.10.2019 18:01, Robert Haas wrote:
> On Fri, Oct 11, 2019 at 9:50 AM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Just to clarify.
>> I have now proposed several different solutions for GTT:
>>
>> Shared vs. private buffers for GTT:
>> 1. Private buffers. This is least invasive patch, requiring no changes in relfilenodes.
>> 2. Shared buffers. Requires changing relfilenode but supports parallel query execution for GTT.
> I vote for #1. I think parallel query for temp objects may be a
> desirable feature, but I don't think it should be the job of a patch
> implementing GTTs to make it happen. In fact, I think it would be an
> actively bad idea, because I suspect that if we do eventually support
> temp relations for parallel query, we're going to want a solution that
> is shared between regular temp tables and global temp tables, not
> separate solutions for each.

Sorry, may be I do not not understand you.
It seems to me that there is only one thing preventing usage of 
temporary tables in parallel plans: private buffers.
If global temporary tables are accessed as normal tables though shared 
buffers then them can be used in parallel queries
and no extra support is required for it.
At least I have checked that parallel queries are correctly worked for 
my implementation of GTT with shared buffers.
So I do not understand about which "separate solutions" you are talking 
about.

I can agree that private buffers may be  good starting point for GTT 
implementation, because it is less invasive and GTT access speed is 
exactly the same as of normal temp tables.
But I do not understand your argument why it is "actively bad idea".

>> Access to GTT at replica:
>> 1. Access is prohibited (as for original temp tables). No changes at all.
>> 2. Tuples of temp tables are marked with forzen XID.  Minimal changes, rollbacks are not possible.
>> 3. Providing special XIDs for GTT at replica. No changes in CLOG are required, but special MVCC visibility rules are
usedfor GTT. Current limitation: number of transactions accessing GTT at replica is limited by 2^32
 
>> and bitmap of correspondent size has to be maintained (tuples of GTT are not proceeded by vacuum and not frozen, so
XIDhorizon never moved).
 
> I again vote for #1. A GTT is defined to allow data to be visible only
> within one session -- so what does it even mean for the data to be
> accessible on a replica?

There are sessions at replica (in case of hot standby), aren't there?

>
>> So except the limitation mentioned above (which I do not consider as critical) there is only one problem which was
notaddressed: maintaining statistics for GTT.
 
>> If all of the following conditions are true:
>>
>> 1) GTT are used in joins
>> 2) There are indexes defined for GTT
>> 3) Size and histogram of GTT in different backends can significantly vary.
>> 4) ANALYZE was explicitly called for GTT
>>
>> then query execution plan built in one backend will be also used for other backends where it can be inefficient.
>> I also do not consider this problem as "show stopper" for adding GTT to Postgres.
> I think that's *definitely* a show stopper.
Well, if both you and Pavel think that it is really "show stopper", then 
this problem really has to be addressed.
I slightly confused about this opinion, because Pavel has told me 
himself that 99% of users never create indexes for temp tables
or run "analyze" for them. And without it, this problem is not a problem 
at all.

>> I still do not understand the opinion of community which functionality of GTT is considered to be most important.
>> But the patch with local buffers and no replica support is small enough to become good starting point.
> Well, it seems we now have two patches for this feature. I guess we
> need to figure out which one is better, and whether it's possible for
> the two efforts to be merged, rather than having two different teams
> hacking on separate code bases.

I am open for cooperations.
Source code of all my patches is available.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




pgsql-hackers by date:

Previous
From: Jehan-Guillaume de Rorthais
Date:
Subject: segmentation fault when cassert enabled
Next
From: Tom Lane
Date:
Subject: Re: define bool in pgtypeslib_extern.h