Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [Proposal] Global temporary tables
Date
Msg-id 8dc812ea-af3f-7a9d-ae9a-d89c25ca89ca@postgrespro.ru
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [Proposal] Global temporary tables
List pgsql-hackers

On 28.10.2019 15:07, Robert Haas wrote:
> On Fri, Oct 25, 2019 at 11:14 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>>> Access to GTT at replica:
>>>> 1. Access is prohibited (as for original temp tables). No changes at all.
>>>> 2. Tuples of temp tables are marked with forzen XID.  Minimal changes, rollbacks are not possible.
>>>> 3. Providing special XIDs for GTT at replica. No changes in CLOG are required, but special MVCC visibility rules
areused for GTT. Current limitation: number of transactions accessing GTT at replica is limited by 2^32
 
>>>> and bitmap of correspondent size has to be maintained (tuples of GTT are not proceeded by vacuum and not frozen,
soXID horizon never moved).
 
>>> I again vote for #1. A GTT is defined to allow data to be visible only
>>> within one session -- so what does it even mean for the data to be
>>> accessible on a replica?
>> why not? there are lot of sessions on replica servers. One usage of temp tables is fixing estimation errors. You can
createtemp table with partial query result, run ANALYZE and evaluate other steps. Now this case is not possible on
replicaservers.
 
>>
>> One motivation for GTT  is decreasing port costs from Oracle. But other motivations, like do more complex
calculationson replica are valid and valuable.
 
> Hmm, I think I was slightly confused when I wrote my previous
> response. I now see that what was under discussion was not making data
> from the master visible on the standbys, which really wouldn't make
> any sense, but rather allowing standby sessions to also use the GTT,
> each with its own local copy of the data. I don't think that's a bad
> feature, but look how invasive the required changes are. Not allowing
> rollbacks seems dead on arrival; an abort would be able to leave the
> table and index mutually inconsistent.  A separate XID space would be
> a real solution, perhaps, but it would be *extremely* complicated and
> invasive to implement.

Sorry, but both statements are not true.
As I mentioned before, I have implemented both solutions.

I am not sure how vital is lack of aborts for transactions working with 
GTT at replica.
Some people said that there is no sense in aborts of read-only 
transactions at replica (despite to the fact that them are saving 
intermediate results in GTT).
Some people said something similar with your's "dead on arrival".
But inconsistency is not possible: if such transaction is really 
aborted, then backend is terminated and nobody can see this inconsistency.

Concerning second alternative: you can check yourself that it is not 
*extremely* complicated and invasive.
I extracted changes which are related with handling transactions at 
replica and attached them to this mail.
It is just 500 lines (including diff contexts). Certainly there are some 
limitation of this implementation: number of  transactions working with 
GTT at replica is limited by 2^32
and since GTT tuples are not frozen, analog of GTT CLOG kept in memory 
is never truncated.

>
> One thing that I've learned over and over again as a developer is that
> you get a lot more done if you tackle one problem at a time. GTTs are
> a sufficiently-large problem all by themselves; a major reworking of
> the way XIDs work might be a good project to undertake at some point,
> but it doesn't make any sense to incorporate that into the GTT
> project, which is otherwise about a mostly-separate set of issues.
> Let's not try to solve more problems at once than strictly necessary.
>
I agree with it and think that implementation of GTT with private 
buffers and no replica access is good starting point.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: pgbench - extend initialization phase control
Next
From: Tom Lane
Date:
Subject: Re: Remove one use of IDENT_USERNAME_MAX