Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [Proposal] Global temporary tables
Date
Msg-id 6cbc12a0-bf0e-0205-959e-d871eb7e7c3e@postgrespro.ru
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [Proposal] Global temporary tables  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: [Proposal] Global temporary tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On 29.01.2020 17:47, Robert Haas wrote:
> On Wed, Jan 29, 2020 at 3:13 AM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> But I heard only two arguments:
>>
>> 1. Concurrent building of indexes by all backends may consume much memory (n_backends * maintenance_work_mem) and
consumea lot of disk/CPU resources.
 
>> 2. GTT in one session can contains large amount of data and we need index for it, but small amount of data in
anothersession and we do not need index for it.
 
> You seem to be ignoring the fact that two committers told you this
> probably wasn't safe.
>
> Perhaps your view is that those people made no argument, and therefore
> you don't have to respond to it. But the onus is not on somebody else
> to tell you why a completely novel idea is not safe. The onus is on
> you to analyze it in detail and prove that it is safe. What you need
> to show is that there is no code anywhere in the system which will be
> confused by an index springing into existence at whatever time you're
> creating it.
>
> One problem is that there are various backend-local data structures in
> the relcache, the planner, and the executor that remember information
> about indexes, and that may not respond well to having more indexes
> show up unexpectedly. On the one hand, they might crash; on the other
> hand, they might ignore the new index when they shouldn't. Another
> problem is that the code which creates indexes might fail or misbehave
> when run in an environment different from the one in which it
> currently runs. I haven't really studied your code, so I don't know
> exactly what it does, but for example it would be really bad to try to
> build an index while holding a buffer lock, both because it might
> cause (low-probability) undetected deadlocks and also because it might
> block another process that wants that buffer lock in a
> non-interruptible wait state for a long time.
>
> Now, maybe you can make an argument that you only create indexes at
> points in the query that are "safe." But I am skeptical, because of
> this example:
>
> rhaas=# create table foo (a int primary key, b text, c text, d text);
> CREATE TABLE
> rhaas=# create function blump() returns trigger as $$begin create
> index on foo (b); return new; end$$ language plpgsql;
> CREATE FUNCTION
> rhaas=# create trigger thud before insert on foo execute function blump();
> CREATE TRIGGER
> rhaas=# insert into foo (a) select generate_series(1,10);
> ERROR:  cannot CREATE INDEX "foo" because it is being used by active
> queries in this session
> CONTEXT:  SQL statement "create index on foo (b)"
> PL/pgSQL function blump() line 1 at SQL statement
>
> That prohibition is there for some reason. Someone did not just decide
> to arbitrarily prohibit it. A CREATE INDEX command run in that context
> won't run afoul of many of the things that might be problems in other
> places -- e.g. there won't be a buffer lock held. Yet, despite the
> fact that a trigger context is safe for executing a wide variety of
> user-defined code, this particular operation is not allowed here. That
> is the sort of thing that should worry you.
>
> At any rate, even if this somehow were or could be made safe,
> on-the-fly index creation is a feature that cannot and should not be
> combined with a patch to implement global temporary tables. Surely, it
> will require a lot of study and work to get the details right. And so
> will GTT. As I said in the other email I wrote, this feature is hard
> enough without adding this kind of thing to it. There's a reason why I
> never got around to implementing this ten years ago when I did
> unlogged tables; I was intending that to be a precursor to the GTT
> work. I found that it was too hard and I gave up. I'm glad to see
> people trying again, but the idea that we can afford to add in extra
> features, or frankly that either of the dueling patches on this thread
> are close to committable, is just plain wrong.
>

Sorry, I really didn't consider statements containing word "probably" as 
arguments.
But I agree with you: it is task of developer of new feature to prove 
that proposed approach is safe rather than of reviewers to demonstrate 
that it is unsafe.
Can I provide such proof now? I afraid that not.
But please consider two arguments:

1. Index for GTT in any case has to be initialized on demand. In case of 
regular tables index is initialized at the moment of its creation. In 
case of GTT it doesn't work.
So we should somehow detect that accessed index is not initialized and 
perform lazy initialization of the index.
The only difference with the approach proposed by Pavel  (allow index 
for empty GTT but prohibit it for GTT filled with data) is whether we 
also need to populate index with data or not.
I can imagine that implicit initialization of index in read-only query 
(select) can be unsafe and cause some problems. I have not encountered 
such problems yet after performing many tests with GTTs, but certainly I 
have not covered all possible scenarios (not sure that it is possible at 
all).
But I do not understand how populating  index with data can add some 
extra unsafety.

So I can not prove that building index for GTT on demand is safe, but it 
is not more unsafe than initialization of index on demand which is 
required in any case.

2. Actually I do not propose some completely new approach. I try to 
provide behavior with is compatible with regular tables.
If you create index for regular table, then it can be used in all 
sessions, right?
And all "various backend-local data structures in the relcache, the 
planner, and the executor that remember information about indexes"
have to be properly updated.  It is done using invalidation mechanism. 
The same mechanism is used in case of DDL operations with GTT, because 
we change system catalog.

So my point here is that creation index of GTT is almost the same as 
creation of index for regular tables and the same mechanism will be used 
to provide correctness of this operation.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: making the backend's json parser work in frontend code
Next
From: Peter Eisentraut
Date:
Subject: Re: doc: vacuum full, fillfactor, and "extra space"