Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [Proposal] Global temporary tables
Date
Msg-id 20200112015109.caab6qdr2ob6d5jv@development
Whole thread Raw
In response to Re: [Proposal] Global temporary tables  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: [Proposal] Global temporary tables  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Re: [Proposal] Global temporary tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote:
>
>
>On 09.01.2020 19:48, Tomas Vondra wrote:
>>
>>>The most complex and challenged task is to support GTT for all 
>>>kind of indexes. Unfortunately I can not proposed some good 
>>>universal solution for it.
>>>Just patching all existed indexes implementation seems to be the 
>>>only choice.
>>>
>>
>>I haven't looked at the indexing issue closely, but IMO we need to
>>ensure that every session sees/uses only indexes on GTT that were
>>defined before the seesion started using the table.
>
>Why? It contradicts with behavior of normal tables.
>Assume that you have active clients and at some point of time DBA 
>recognizes that them are spending to much time in scanning some GTT.
>It cab create index for this GTT but if existed client will not be 
>able to use this index, then we need somehow make this clients to 
>restart their sessions?
>In my patch I have implemented building indexes for GTT on demand: if 
>accessed index on GTT is not yet initialized, then it is filled with 
>local data.

Yes, I know the behavior would be different from behavior for regular
tables. And yes, it would not allow fixing slow queries in sessions
without interrupting those sessions.

I proposed just ignoring those new indexes because it seems much simpler
than alternative solutions that I can think of, and it's not like those
other solutions don't have other issues.

For example, I've looked at the "on demand" building as implemented in
global_private_temp-8.patch, I kinda doubt adding a bunch of index build
calls into various places in index code seems somewht suspicious.

* brinbuild is added to brinRevmapInitialize, which is meant to
   initialize state for scanning. It seems wrong to build the index we're
   scanning from this function (layering and all that).

* btbuild is called from _bt_getbuf. That seems a bit ... suspicious?

... and so on for other index types. Also, what about custom indexes
implemented in extensions? It seems a bit strange each of them has to
support this separately.

IMHO if this really is the right solution, we need to make it work for
existing indexes without having to tweak them individually. Why don't we
track a flag whether an index on GTT was initialized in a given session,
and if it was not then call the build function before calling any other
function from the index AM? 

But let's talk about other issues caused by "on demand" build. Imagine
you have 50 sessions, each using the same GTT with a GB of per-session
data. Now you create a new index on the GTT, which forces the sessions
to build it's "local" index. Those builds will use maintenance_work_mem
each, so 50 * m_w_m. I doubt that's expected/sensible.

So I suggest we start by just ignoring the *new* indexes, and improve
this in the future (by building the indexes on demand or whatever).

>>
>>Can't we track which indexes a particular session sees, somehow?
>>
>>>Statistic is another important case.
>>>But once again I do not completely understand why we want to 
>>>address all this issues with statistic in first version of the 
>>>patch?
>>
>>I think the question is which "issues with statistic" you mean. I'm sure
>>we can ignore some of them, e.g. the one with parallel workers not
>>having any stats (assuming we consider functions using GTT to be
>>parallel restricted).
>
>If we do not use shared buffers for GTT then parallel processing of 
>GTT is not possible at all, so there is no problem with statistic for 
>parallel workers.
>

Right.

>>
>>I think someone pointed out pushing stuff directly into the cache is
>>rather problematic, but I don't recall the details.
>>
>I have not encountered any problems, so if you can point me on what is 
>wrong with this approach, I will think about alternative solution.
>

I meant this comment by Robert:

https://www.postgresql.org/message-id/CA%2BTgmoZFWaND4PpT_CJbeu6VZGZKi2rrTuSTL-Ykd97fexTN-w%40mail.gmail.com


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: Why is pq_begintypsend so slow?
Next
From: Tom Lane
Date:
Subject: Re: Why is pq_begintypsend so slow?