Home > mailing lists

Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: [Proposal] Global temporary tables
Date	January 12, 2020 01:51:09
Msg-id	20200112015109.caab6qdr2ob6d5jv@development Whole thread Raw
In response to	Re: [Proposal] Global temporary tables (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses	Re: [Proposal] Global temporary tables Re: [Proposal] Global temporary tables
List	pgsql-hackers

Tree view

On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote:
>
>
>On 09.01.2020 19:48, Tomas Vondra wrote:
>>
>>>The most complex and challenged task is to support GTT for all 
>>>kind of indexes. Unfortunately I can not proposed some good 
>>>universal solution for it.
>>>Just patching all existed indexes implementation seems to be the 
>>>only choice.
>>>
>>
>>I haven't looked at the indexing issue closely, but IMO we need to
>>ensure that every session sees/uses only indexes on GTT that were
>>defined before the seesion started using the table.
>
>Why? It contradicts with behavior of normal tables.
>Assume that you have active clients and at some point of time DBA 
>recognizes that them are spending to much time in scanning some GTT.
>It cab create index for this GTT but if existed client will not be 
>able to use this index, then we need somehow make this clients to 
>restart their sessions?
>In my patch I have implemented building indexes for GTT on demand: if 
>accessed index on GTT is not yet initialized, then it is filled with 
>local data.

Yes, I know the behavior would be different from behavior for regular
tables. And yes, it would not allow fixing slow queries in sessions
without interrupting those sessions.

I proposed just ignoring those new indexes because it seems much simpler
than alternative solutions that I can think of, and it's not like those
other solutions don't have other issues.

For example, I've looked at the "on demand" building as implemented in
global_private_temp-8.patch, I kinda doubt adding a bunch of index build
calls into various places in index code seems somewht suspicious.

* brinbuild is added to brinRevmapInitialize, which is meant to
   initialize state for scanning. It seems wrong to build the index we're
   scanning from this function (layering and all that).

* btbuild is called from _bt_getbuf. That seems a bit ... suspicious?

... and so on for other index types. Also, what about custom indexes
implemented in extensions? It seems a bit strange each of them has to
support this separately.

IMHO if this really is the right solution, we need to make it work for
existing indexes without having to tweak them individually. Why don't we
track a flag whether an index on GTT was initialized in a given session,
and if it was not then call the build function before calling any other
function from the index AM? 

But let's talk about other issues caused by "on demand" build. Imagine
you have 50 sessions, each using the same GTT with a GB of per-session
data. Now you create a new index on the GTT, which forces the sessions
to build it's "local" index. Those builds will use maintenance_work_mem
each, so 50 * m_w_m. I doubt that's expected/sensible.

So I suggest we start by just ignoring the *new* indexes, and improve
this in the future (by building the indexes on demand or whatever).

>>
>>Can't we track which indexes a particular session sees, somehow?
>>
>>>Statistic is another important case.
>>>But once again I do not completely understand why we want to 
>>>address all this issues with statistic in first version of the 
>>>patch?
>>
>>I think the question is which "issues with statistic" you mean. I'm sure
>>we can ignore some of them, e.g. the one with parallel workers not
>>having any stats (assuming we consider functions using GTT to be
>>parallel restricted).
>
>If we do not use shared buffers for GTT then parallel processing of 
>GTT is not possible at all, so there is no problem with statistic for 
>parallel workers.
>

Right.

>>
>>I think someone pointed out pushing stuff directly into the cache is
>>rather problematic, but I don't recall the details.
>>
>I have not encountered any problems, so if you can point me on what is 
>wrong with this approach, I will think about alternative solution.
>

I meant this comment by Robert:

https://www.postgresql.org/message-id/CA%2BTgmoZFWaND4PpT_CJbeu6VZGZKi2rrTuSTL-Ykd97fexTN-w%40mail.gmail.com

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: David Fetter
Date: 12 January 2020, 01:41:08
Subject: Re: Why is pq_begintypsend so slow?

From: Tom Lane
Date: 12 January 2020, 02:43:56
Subject: Re: Why is pq_begintypsend so slow?

Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

Previous

Next